Sunday, March 4, 2012

SOME USEFUL SAS PROCEDURES


SOME USEFUL SAS PROCEDURES
Part 2

Objective : This post is a sequel to my first post on procedures where I highlighted some underutilized procedures. The objective of this post is to the readers basic understanding of the capabilities of some underutilized and some common procedures which can be of great help in day to day programming. The objective is not to give exhaustive knowledge about these procedures but to attract readers’ attention to these procs by interesting examples.

Procedures:

The SORT Procedure:

This procedure I guess everyone knows and uses but I think you read on further and most probably you’ll find something interesting. The SORT procedure orders SAS data set observations by the values of one or more character or numeric variables. The SORT procedure either replaces the original data set or creates a new data set.

Syntax :
PROC SORT <collating-sequence-option> <other option(s)>;
BY <DESCENDING> variable-1 <…<DESCENDING> variable-n>;
Example:

proc sort data=employee;
  by idnumber;
run;

I included this procedure to give stress on 4 options which are generally asked in lot of interviews so you should know it.
  
  1) DUPOUT= : This option creates a new datasets which will contain the observations deleted by nodupkey or noduprecs, so you can review those later.

  2) NODUPKEY : This option deletes observation based on duplicate values of variables provided in by statement.

  3) NOUPRECS / NODUP :- This less known option deletes an observation only when values for all the variables match and not just for by variables.

  4) TAGSORT :- This option reduces the need for temporary disk space as it only stores the keys while sorting instead of the complete observation. This can be used when you have a huge dataset and not much disk space avalaible, using this can save you from OUT OF DISK SPACE error.

The PRINT Procedure:

The print procedure like proc sort is also one of the most used procedures. But there are lot of options and functionality in print procedure which are unknown to most and get underutilized; I’ll be listing those below. BASE SAS PROCDURES GUIDE has a 72 page chapter dedicated to it so after all this can’t be such naïve procedure as we use it.

The PRINT procedure prints the observations in a SAS data set, using all or some of the variables. You can create a variety of reports ranging from a simple listing to a highly customized report that groups  he data and calculates totals and subtotals for numeric variables.

Syntax :
PROC PRINT <option(s)>;
BY <DESCENDING> variable-1 <…<DESCENDING> variable-n><NOTSORTED>;
PAGEBY BY-variable;
SUMBY BY-variable;
ID variable(s) <option>;
SUM variable(s) <option>;
VAR variable(s) <option>;
RUN;

SUM – The sum statement gives totals for numeric values.

SUMBY - Limits the number of sums that appear in the report. The BY-variable identifies a variable that appears in the BY statement in the PROC PRINT step. If the value of the BY variable changes, or if the value of any BY variable that precedes it in the BY statement changes, PROC PRINT prints the sums of all variables listed in the SUM statement.(Yes right! It is not what you thought it is, summing by each by group is done by SUM and BY statements given separately).

PAGEBY : Control page ejects that occur before a page is full.

VAR - Select variables that appear in the report and determine their order.

The FORMAT  Procedure:

The FORMAT procedure enables you to:

  1) Define your own informats and formats for variables
  2) You can print the parts of a catalog that contain informats or  
    formats.
  3) Store descriptions of informats or formats in a SAS data set, and use a SAS data set to create informats or formats.

Syntax :

PROC FORMAT <option(s)>;
     EXCLUDE entry(s);
     INVALUE <$>name <(informat-option(s))>
     value-range-set(s);
     PICTURE name <(format-option(s))>
     value-range-set-1 <(picture-1-option(s) )>
     <…value-range-set-n <(picture-n-option(s))>>;
     SELECT entry(s);
     VALUE <$>name <(format-option(s))>
     value-range-set(s);
run;

Frankly speaking the above syntax doesn’t do anything except scare you, so I’ll be discussing only few uses of proc format which will be very useful in your day to day programming.

1) Defining formats and informat :

I know you all know this but just read it again for revision or for the heck of it J

Here are the 2 basic examples

Proc format;
value gender
1='M'
2='F'
;
run;

For informats you just exchange the value keyword by invalue.

Proc format;
     invalue $gender
          1='M'
          2='F'
;
run;

A good thing to know here are MULTILABEL FORMATS.

MULTILABEL FORMATS :

Mltilabel formats empower you to :

a) Providing overlapping ranges across labels.
b) Secondary labels to same values.

An example for multilabel formats is :

Proc format;
     value agefmt (multilabel)
          low - 24 = "Under 25"
          low - 49 = "Under 50"
          low - 74 = "Under 75"
          low - 99 = "Under 100"
          100 - high = "100 and Above"
     ;
run;

This format just assigns overlapping ranges to different labels.
The second type of format is

proc format ;
     value drugcnt (multilabel)
          1 = "Placebo"
          2 = "Company 10mg"
          2 = "Any Dose"
          3 = "Company 20mg"
          3 = "Any Dose"
          4 = "Comparator"
     ;
run ;

This format assigns multiple labels to same value.

You might be wondering where these can be used. Their use is very limited and these can only be used in summary procedures like means, summary and tabulate procedures. Use them and see the magic how they solve the problems of counting a subject in multiple categories.

2) Printing formats from unreadable catalogs

The simple way to print a format in catalog is :

proc format library=work fmtlib;
select drugcnt;
run;

To print the complete catalog you can just leave out the select statement.

3) Storing format information to dataset and use a dataset information to create format :

To display or use format information is the Cntlout option.

proc format library=work cntlout=fmts;
select drugcnt;
run;

TO create a format from a dataset we have to have a highly specialized dataset with exact variable names as required by the format statement.

Either you can output the dataset (from now on reffered to as control dataset) using CNTLOUT option and use its structure add values or use its subset as the input control dataset.
Below is the proc contents output for a control dataset :

   DATATYPE    Char      8    Date/time/datetime?
   DECSEP      Char      1    Decimal separator
   DEFAULT     Num       3    Default length
   DIG3SEP     Char      1    Three-digit separator
   EEXCL       Char      1    End exclusion
   END         Char     16    Ending value for format
   FILL        Char      1    Fill character
   FMTNAME     Char     32    Format name
   FUZZ        Num       8    Fuzz value
   HLO         Char     11    Additional information
   LABEL       Char     12    Format value label
   LANGUAGE    Char      8    Language for date strings
   LENGTH      Num       3    Format length
   MAX         Num       3    Maximum length
   MIN         Num       3    Minimum length
   MULT        Num       8    Multiplier
   NOEDIT      Num       3    Is picture string noedit?
   PREFIX      Char      2    Prefix characters
   SEXCL       Char      1    Start exclusion
   TYPE        Char      1    Type of format
   START       Char     32    Starting value for format

The Ones in bold are the minimum required variables in the input control dataset to create a format successfully. Afetr creating the control datset only a small piece of code is left :

proc format cntlin=fmts;
run;

And your format is created. This technique is very useful in creating formats dynamically based on the values received in a dataset.

The PWENCODE Procedure:

A small but useful procedure serving only a single purpose, it encrypts your password using one of its 3 techniques (Generally technique doesn’t matter). SO you can use the encoded password instead of the real password.

Syntax

PROC PWENCODE IN=’password’ <OUT=fileref> <METHOD=encoding-method>;
RUN;

IN= Specify your actual password here
OUT= specifies the file where your encoded password needs to be printed, default is log

You can use this technique to save your encrypted password to a secure file and can read the password in a datastep and use it anywhere without having to specify your actual password anywhere.

The OPTIONS Procedure:

Many of us will consider the heading above as a typing mistake. Everyone has used options but is there an options procedure..??
And if any exists then what extra it will than just specifying options which we can do with just ‘options’ statement!!

Syntax :

PROC OPTIONS <option(s)>;
RUN;

Proc options does 3 jobs :

1) Dsiplays a list of all syste options.

PROC OPTIONS;
RUN;

There are a host of options to control how your listing looks. You can get the complete list and their usage in SAS® 9.2 Procedures Guide. Your log looks something like this :

  SAS (r) Proprietary Software Release 9.2  TS1M0

Portable Options:

 APPLETLOC=C:\Program Files\SAS\SASGraphJavaApplets\9.2
                   Location of Java applets
 ARMAGENT=         ARM Agent to use to collect ARM records
 ARMLOC=ARMLOG.LOG Identify location where ARM records are to be written
 ARMSUBSYS=(ARM_NONE)
                   Enable/Disable ARMing of SAS subsystems
 NOASYNCHIO        Do not enable asynchronous input/output
 AUTOSAVELOC=      Identifies the location where program editor contents are auto saved
 NOAUTOSIGNON      SAS/CONNECT remote submit will not automatically attempt to SIGNON
 BINDING=DEFAULT   Controls the binding edge for duplexed output
 BOMFILE           Add Byte Order Mark when creating Unicode files
 BOTTOMMARGIN=0.000 IN
                   Bottom margin for printed output
 BUFNO=1           Number of buffers for each SAS data set
 BUFSIZE=0         Size of buffer for page of SAS data set
 BYERR             Set the error flag if a null data set is input to the SORT procedure
 BYLINE            Print the by-line at the beginning of each by-group
 BYSORTED          Require SAS data set observations to be sorted for BY processing
 NOCAPS            Do not translate source input to uppercase
 NOCARDIMAGE       Do not process SAS source and data lines as 80-byte records
 CATCACHE=0        Number of SAS catalogs to keep in cache memory
 CBUFNO=0          Number of buffers to use for each SAS catalog

2) The second thing it does is to do the above job for a specific option. It also displays some extra information about that option when we use define in the end.

options linesize=80;
proc options option=linesize define;
run;

3) Displays info about the system option groups.

proc options listgroups;
run;

When the option issue has come up I will give a tip which I generally use :

If you require to change the value in a program and then restore its value back after sometime you generally declare that option 2 times once to change and other to reset it back. Though this does the job but is not very smart and you have to go and check the original value. You can use a trick like.

1) Get the option value dynamically. There are two ways to this

data _null_;
  call symput(‘lnsz_vlaue’,getoption('linesize'));
run;
%put & lnsz_vlaue;


%macro hello();
  %global  lnsz_vlaue ;
  %let lnsz_vlaue=%sysfunc(getoption(linesize));
  %put linesize=&lnsz_vlaue;
%mend;

%hello;

2) Change to your desired value.

options linesize=100;

3) Reset it back to the original value dynamically.

options linesize=&lnsz_vlaue;

Though it does the same job but with this you can be absolutely sure that in future also under different option values this code will work perfectly.

Conclusion : After reading this post the users will be having a good workable knowledge on the procedures outlined above and I will expect them to strat utilizing the capabilities of these procs.

Will be back with some more magic of SAS. Till then Goodbye.

References :
SAS 9.2 procedures guide by SAS Institute.

Saurabh Singh  Chauhan
(er.chauhansaurabh@gmail.com)         
Note: Comments and suggestions are always welcome.

Disclaimer :
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc.in the USA and other countries. ® Indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies. 
The contents of this post are the works of the author(s)and do not necessarily represent the opinions,recommendations, or practices of any organization whatsoever.

1 comment:

  1. Thank you for bringingthe up these options. Some where indeed unknown to me.
    -ap

    ReplyDelete