Difference in UPDATE and MODIFY statements in SAS
Intoduction :
Update and Modify are two underused statements in SAS programming
language, but the questions regarding the working of each and the differences
between them is a heavily asked question in SAS interviews, the latter one
being more frequent.
This post sheds some light on working of update and modify with extra
stress on comparing them and listing out the differences.
This post has enough information to get you through most of the interview
questions, but is not an exhaustive material on UPDATE and MODIFY statements.
For complete working of these please visit SAS website.
So let us start with explanation of each and then we’ll see the differences:
What is Updating:
Updating a SAS dataset means replacing the variable
values in one data (typically called the master dataset) with values from
another data set (the transaction data set).
If optionUPDATEMODE=MISSINGCHECK, then blankvalues in a transaction
dataset will not replace current values in a master data set.
If option UPDATEMODE=NOMISSINGCHECK, then blank values in transaction dataset replace preexisting values in the master data set.
By default it is set to MISSINGCHECK.
If option UPDATEMODE=NOMISSINGCHECK, then blank values in transaction dataset replace preexisting values in the master data set.
By default it is set to MISSINGCHECK.
You update a data set by using the
UPDATE statement along with a BY statement. Both of the input data sets must be
sorted by the variable that you use in the BY statement. The following figure
shows the results of updating a SAS data set.
As you can see the names for
accounts numbers 6 and 8 are updated according to the transaction dataset.
Also a new account (Acc no. 9) has
been added to the master dataset.
data master;
input acc_no first_name $ last_name $;
datalines;
1 A B
2 A B
3 A B
4 A B
5 A B
6 A B
7 A B
8 A B
;
run;
data transaction;
input acc_no first_name $ last_name $;
datalines;
6 C D
8 C D
9 C D
;
run;
data master;
update master transaction;
by acc_no;
run;
So think of the places where you’ll
require an UPDATE. One place is a company dataset which stores the salaries of
the employees. Every month the salaries are updated for existing employees and
new joinees and their salaries are appended to the existing dataset.
Definition
of Modifying:
When
we MODIFY a SAS data set we replace observations
or parts of them in an existing data set. Modifying a SAS data set is similar
to updating a SAS data set, but the following differences exist:
Modifying
cannot create a new data set, while updating can. Though
it is actually not a limitation as it results in less disk space being used.
Unlike
updating, modifying does not require that the master data set or the
transaction data set be sorted.
We
can change an existing dataset by using the MODIFY statement with a BY
statement. The following figure shows the results.
The
code used for this is :
data master;
input acc_no first_name $ last_name $;
datalines;
1
A B
2
A B
3
A B
4
A B
5
A B
6
A B
7
A B
8
A B
;
run;
data transaction;
input acc_no first_name $ last_name $;
datalines;
6
C D
8
C D
;
run;
data master;
modify master transaction;
by acc_no;
run;
If
we try to add a new value using MODIFY statement with BY statement we get an error as MODIFY searches a match
for all observations and if a match for any observations in the transaction
dataset is not found, it thoriws an error.
See the example below:
data transaction;
input acc_no first_name $ last_name $;
datalines;
6
C D
8
C D
9
C D
;
run;
data master;
modify master transaction;
by acc_no;
run;
ERROR:
The TRANSACTION data set observation does not exist on the MASTER data set.
ERROR: No matching observation was found
in MASTER data set.
Differences:
MODIFY statement, has overcome the limitations of SET,
MERGE, and UPDATE statements.
For eg, this statement:
For eg, this statement:
- Can be used to update a dataset in place(Does not make a copy)
- Can use sequential, matching, or direct access method.
- Sorting is not required unlike MERGE and UPDATE while match merging.
- Unlike Merge and UPDATE does not give an error if duplicate BY values exist in master or transaction dataset or both.
- Requires less disk space than UPDATE, MERGE etc. as a temporary copy of master dataset is not made.
·
The MODIFY statement can only be used to update values of
existing data set variables; no changes can be made with respect to the
structure of the program data vector that is created at execution time for the
data set being modified.
The syntax for MODIFY and update are similar when performing matching access.
In both the master dataset is mentioned followed by the transaction dataset. Both require a BY statement which lists the variables whose values from the trans. dataset are used to find observations in the master data set. However, the execution processing for both statements is not same.
Written below is how observations for update are retrieved in matching access. During execution,
In both the master dataset is mentioned followed by the transaction dataset. Both require a BY statement which lists the variables whose values from the trans. dataset are used to find observations in the master data set. However, the execution processing for both statements is not same.
Written below is how observations for update are retrieved in matching access. During execution,
- First the MODIFY statement pick up a row from the trans.n dataset;
- Secondly, MODIFY generates a WHERE condition, which specifies the value of the BY variable from the row of transaction dataset , to find and get a row from the
master data set.
The
following rules always apply when duplicate values exist:
- If there are multiple
observations with the same value for the BY variable in the master data
set, MODIFY changes only the values for the first occurrence. For UPDATE
statement is that processing does not continue if duplicate values of the
BY variable are detected in the master data set.
- If there are multiple
observations with the same value for the BY variable in the transaction
data set, MODIFY performs operations consecutively, a manner simi1ar to
that of the UPDATE statement.
I
found the table below on SAS website, which summarizes the differences between MERGE,
MODIFY and UPADTE :
Comparing Modifying, Merging, and Updating Data Sets
Comparison |
MERGE
|
UPDATE
|
MODIFY
|
Is sorting of datasets required?
|
Merge with By statement:
Required
One on one merge: Not required
|
Required
|
Not required
|
Requirement of Unique By values
|
Not required
|
Master
data set: Required
Transaction data set: Not required
|
Not required
|
Can variables be created or deleted
|
Can be created
|
Can be created
|
Cannot be created
|
Number of dsets that can be combined
|
Unlimited
|
Two
|
Two
|
Missing value handling
|
Non-missing values are overwritten
|
If there are missing values in transaction dataset the values in the maser are not updated
|
If UPDATEMODE option is MISSINGCHECK then does not overwrite non-missing else overwrite.
Default: MISSINGCHECK
|
Conclusion: Update and Modify have a lot of things in common but have very specific
differences and that is why this question comes up again and again in
interviews. Hope this post will help you get the hang of these useful
statements and gain you confidence.
Will be back with some more
SAS magic. Goodbye till then.
References :
Saurabh Singh Chauhan
(er.chauhansaurabh@gmail.com)
Note:
Comments and suggestions are always welcome.
Disclaimer :
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc.in the USA and other countries. ® Indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc.in the USA and other countries. ® Indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.
The contents of this post are the works of the author(s)and do not necessarily represent the opinions,recommendations, or practices of any organization whatsoever.
Hi Saurabh, I have a few doubts, need to contact you. I have sent you an invitation on Linked In. Please accept.
ReplyDeleteHave accepted the invitation..I have my email in my post. Please do send you questions to my mail. You are most welcome to also add them as comments here.
ReplyDeleteWill definitely try to clear you doubts. Thanks for reading.
hi saurabh i have sent invitation on linked in but not accepted..and the links you provided for faq's is not working saurabh can please upload them
ReplyDeletehi saurabh can u please upload faqs at your blog what u previously posted
ReplyDelete