SOME USEFUL SAS
PROCEDURES
Part 2
Objective : This post is a sequel to my first
post on procedures where I highlighted some underutilized procedures. The
objective of this post is to the readers basic understanding of the
capabilities of some underutilized and some common procedures which can be of
great help in day to day programming. The objective is not to give exhaustive
knowledge about these procedures but to attract readers’ attention to these
procs by interesting examples.
Procedures:
The SORT Procedure:
This procedure I guess everyone knows and uses but I
think you read on further and most probably you’ll find something interesting. The
SORT procedure orders SAS data set observations by the values of one or more
character or numeric variables. The SORT procedure either replaces the original
data set or creates a new data set.
Syntax :
PROC SORT <collating-sequence-option>
<other option(s)>;
BY <DESCENDING>
variable-1 <…<DESCENDING> variable-n>;
Example:
proc sort data=employee;
by idnumber;
run;
I included this procedure to give stress on 4
options which are generally asked in lot of interviews so you should know it.
1) DUPOUT=
: This option creates a new datasets which will contain the observations
deleted by nodupkey or noduprecs, so you can review those later.
2) NODUPKEY
: This option deletes observation based on duplicate values of variables
provided in by statement.
3) NOUPRECS
/ NODUP :- This less known option deletes an observation only when values for
all the variables match and not just for by variables.
4) TAGSORT
:- This option reduces the need for temporary disk space as it only stores the
keys while sorting instead of the complete observation. This can be used when
you have a huge dataset and not much disk space avalaible, using this can save
you from OUT OF DISK SPACE error.
The PRINT Procedure:
The print procedure like proc sort is also one of
the most used procedures. But there are lot of options and functionality in
print procedure which are unknown to most and get underutilized; I’ll be
listing those below. BASE SAS PROCDURES GUIDE has a 72 page chapter dedicated
to it so after all this can’t be such naïve procedure as we use it.
The PRINT procedure prints
the observations in a SAS data set, using all or some of the variables. You can
create a variety of reports ranging from a simple listing to a highly
customized report that groups he data and
calculates totals and subtotals for numeric variables.
Syntax :
PROC PRINT <option(s)>;
BY <DESCENDING> variable-1 <…<DESCENDING>
variable-n><NOTSORTED>;
PAGEBY BY-variable;
SUMBY BY-variable;
ID variable(s) <option>;
SUM variable(s) <option>;
VAR variable(s)
<option>;
RUN;
SUM
– The sum statement gives totals for numeric values.
SUMBY
- Limits the number of sums that appear in the report. The BY-variable
identifies a variable that appears in the BY statement in the PROC PRINT step.
If the value of the BY variable changes, or if the value of any BY variable
that precedes it in the BY statement changes, PROC PRINT prints the sums of all
variables listed in the SUM statement.(Yes right! It is not what you thought it
is, summing by each by group is done by SUM
and BY statements given separately).
PAGEBY
: Control page ejects that occur before a page is full.
VAR -
Select variables that appear in the report and determine their order.
The FORMAT Procedure:
The FORMAT procedure enables you to:
2) You
can print the parts of a catalog that contain informats or
formats.
3) Store
descriptions of informats or formats in a SAS data set, and use a SAS data set
to create informats or formats.
Syntax :
PROC
FORMAT <option(s)>;
EXCLUDE entry(s);
INVALUE <$>name
<(informat-option(s))>
value-range-set(s);
PICTURE name
<(format-option(s))>
value-range-set-1
<(picture-1-option(s) )>
<…value-range-set-n
<(picture-n-option(s))>>;
SELECT entry(s);
VALUE <$>name <(format-option(s))>
value-range-set(s);
run;
Frankly speaking the above syntax doesn’t do
anything except scare you, so I’ll be discussing only few uses of proc format
which will be very useful in your day to day programming.
1) Defining formats and informat :
I know you all know this but just read it again for
revision or for the heck of it J
Here are the 2 basic examples
Proc
format;
value gender
1='M'
2='F'
;
run;
For
informats you just exchange the value keyword by invalue.
Proc
format;
invalue $gender
1='M'
2='F'
;
run;
A good
thing to know here are MULTILABEL FORMATS.
MULTILABEL FORMATS :
Mltilabel
formats empower you to :
a)
Providing overlapping ranges across labels.
b)
Secondary labels to same values.
An example
for multilabel formats is :
Proc
format;
value agefmt (multilabel)
low - 24 = "Under 25"
low - 49 = "Under 50"
low - 74 = "Under 75"
low - 99 = "Under 100"
100
- high = "100 and
Above"
;
run;
This format
just assigns overlapping ranges to different labels.
The second
type of format is
proc
format ;
value drugcnt (multilabel)
1 = "Placebo"
2 = "Company 10mg"
2 = "Any Dose"
3 = "Company 20mg"
3 = "Any Dose"
4 = "Comparator"
;
run ;
This format
assigns multiple labels to same value.
You might
be wondering where these can be used. Their use is very limited and these can
only be used in summary procedures like means, summary and tabulate procedures.
Use them and see the magic how they solve the problems of counting a subject in
multiple categories.
2) Printing
formats from unreadable catalogs
The simple
way to print a format in catalog is :
proc
format library=work fmtlib;
select drugcnt;
run;
To print
the complete catalog you can just leave out the select statement.
3) Storing
format information to dataset and use a dataset information to create format :
To display
or use format information is the Cntlout option.
proc
format library=work cntlout=fmts;
select drugcnt;
run;
TO create a
format from a dataset we have to have a highly specialized dataset with exact
variable names as required by the format statement.
Either you
can output the dataset (from now on reffered to as control dataset) using
CNTLOUT option and use its structure add values or use its subset as the input
control dataset.
Below is
the proc contents output for a control dataset :
DATATYPE
Char 8 Date/time/datetime?
DECSEP
Char 1 Decimal separator
DEFAULT
Num 3 Default length
DIG3SEP
Char 1 Three-digit separator
EEXCL
Char 1 End exclusion
END Char 16
Ending value for format
FILL
Char 1 Fill character
FMTNAME Char
32 Format name
FUZZ
Num 8 Fuzz value
HLO
Char 11 Additional information
LABEL Char
12 Format value label
LANGUAGE
Char 8 Language for date strings
LENGTH
Num 3 Format length
MAX
Num 3 Maximum length
MIN Num
3 Minimum length
MULT
Num 8 Multiplier
NOEDIT
Num 3 Is picture string noedit?
PREFIX
Char 2 Prefix characters
SEXCL
Char 1 Start exclusion
TYPE
Char 1 Type
of format
START Char
32 Starting value for format
The Ones in bold are the minimum
required variables in the input control dataset to create a format
successfully. Afetr creating the control datset only a small piece of code is
left :
proc
format cntlin=fmts;
run;
And your format is created. This
technique is very useful in creating formats dynamically based on the values
received in a dataset.
The PWENCODE Procedure:
A small but useful procedure serving
only a single purpose, it encrypts your password using one of its 3 techniques
(Generally technique doesn’t matter). SO you can use the encoded password
instead of the real password.
Syntax
PROC
PWENCODE IN=’password’ <OUT=fileref>
<METHOD=encoding-method>;
RUN;
IN= Specify your actual password here
OUT= specifies the file where your
encoded password needs to be printed, default is log
You can use this technique to save your
encrypted password to a secure file and can read the password in a datastep and
use it anywhere without having to specify your actual password anywhere.
The OPTIONS Procedure:
Many of us will consider the heading above as a
typing mistake. Everyone has used options but is there an options procedure..??
And if any exists then what extra it will than just
specifying options which we can do with just ‘options’ statement!!
Syntax :
PROC
OPTIONS <option(s)>;
RUN;
Proc options does 3 jobs :
1) Dsiplays a list of all syste options.
PROC
OPTIONS;
RUN;
There are a host of options to control how your
listing looks. You can get the complete list and their usage in SAS® 9.2 Procedures
Guide. Your log looks something like this :
SAS (r)
Proprietary Software Release 9.2 TS1M0
Portable Options:
APPLETLOC=C:\Program
Files\SAS\SASGraphJavaApplets\9.2
Location of Java applets
ARMAGENT= ARM Agent to use to collect ARM records
ARMLOC=ARMLOG.LOG Identify location where ARM
records are to be written
ARMSUBSYS=(ARM_NONE)
Enable/Disable ARMing of SAS subsystems
NOASYNCHIO Do not enable asynchronous input/output
AUTOSAVELOC= Identifies the location where program
editor contents are auto saved
NOAUTOSIGNON SAS/CONNECT remote submit will not
automatically attempt to SIGNON
BINDING=DEFAULT Controls the binding edge for duplexed
output
BOMFILE Add Byte Order Mark when creating
Unicode files
BOTTOMMARGIN=0.000 IN
Bottom margin for printed output
BUFNO=1 Number of buffers for each SAS data
set
BUFSIZE=0 Size of buffer for page of SAS data
set
BYERR Set the error flag if a null data
set is input to the SORT procedure
BYLINE Print the by-line at the beginning
of each by-group
BYSORTED Require SAS data set observations to
be sorted for BY processing
NOCAPS Do not translate source input to
uppercase
NOCARDIMAGE Do not process SAS source and data lines
as 80-byte records
CATCACHE=0 Number of SAS catalogs to keep in cache
memory
CBUFNO=0 Number of buffers to use for each SAS
catalog
2) The second thing it does is to do the above job
for a specific option. It also displays some extra information about that
option when we use define in the end.
options linesize=80;
proc
options option=linesize define;
run;
3) Displays info about the system option groups.
proc
options listgroups;
run;
When the option issue has come up I will give a tip
which I generally use :
If you require to change the value in a program and
then restore its value back after sometime you generally declare that option 2
times once to change and other to reset it back. Though this does the job but
is not very smart and you have to go and check the original value. You can use
a trick like.
1) Get the option value dynamically. There are two
ways to this
data
_null_;
call symput(‘lnsz_vlaue’,getoption('linesize'));
run;
%put & lnsz_vlaue;
%macro
hello();
%global lnsz_vlaue ;
%let lnsz_vlaue=%sysfunc(getoption(linesize));
%put
linesize=&lnsz_vlaue;
%mend;
%hello;
2) Change to your desired value.
options linesize=100;
3) Reset it back to the original value dynamically.
options linesize=&lnsz_vlaue;
Though it does the same job but with this you can be
absolutely sure that in future also under different option values this code will
work perfectly.
Conclusion : After reading this post the users will be having a good workable
knowledge on the procedures outlined above and I will expect them to strat
utilizing the capabilities of these procs.
Will be back with some more magic of SAS. Till then Goodbye.
References :
SAS 9.2 procedures guide by SAS Institute.
Saurabh Singh Chauhan
(er.chauhansaurabh@gmail.com)
Note:
Comments and suggestions are always welcome.
Disclaimer :
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc.in the USA and other countries. ® Indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies. The contents of this post are the works of the author(s)and do not necessarily represent the opinions,recommendations, or practices of any organization whatsoever.
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc.in the USA and other countries. ® Indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies. The contents of this post are the works of the author(s)and do not necessarily represent the opinions,recommendations, or practices of any organization whatsoever.
Thank you for bringingthe up these options. Some where indeed unknown to me.
ReplyDelete-ap