Friday, February 10, 2012

SAS Procedures Part 1


SOME UNDERUTILIZED SAS PROCEDURES
Part 1

Objective: The objective of this post is to make readers aware of some SAS procedures that are either rarely used, or used but underutilized, and that can be of great help in day-to-day programming. The aim is not to give exhaustive knowledge of these procedures but to draw readers’ attention to them through interesting examples.

Theory: SAS has a long and ever-growing list of procedures. I have covered just a few of them that are not used much but are very useful.

Procedures:

1) Proc RANK
This useful procedure is underutilized, probably because assigning ranks looks like a small job.
The RANK procedure computes ranks for one or more numeric variables across the observations of a SAS data set and writes the ranks to a new SAS data set. The syntax of the procedure is:

PROC RANK <option(s)>;
BY <DESCENDING> variable-1
<…<DESCENDING> variable-n>
<NOTSORTED>;
VAR data-set-variable(s);
RANKS new-variable(s);

By - Calculates ranks separately for each BY group.
Ranks - Identifies the variable(s) that will contain the ranks. If omitted, the ranks replace the values of the original variables in the output data set.
Var - Specifies the variable(s) to be ranked.
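To see the BY statement in action, here is a minimal sketch that ranks salaries separately within each department. The dataset EMP_DEPT and its variable DEPT are hypothetical, made up for illustration. Note that the input must be sorted by the BY variable first.

```sas
/* Hypothetical data: DEPT is an illustrative grouping variable */
data emp_dept;
  input dept $ id salary;
  datalines;
A 1 2000
A 2 3000
A 3 1000
B 4 5000
B 5 1500
B 6 1200
;
run;

/* The BY statement requires the input to be sorted by DEPT */
proc sort data=emp_dept;
  by dept;
run;

/* Ranks restart within each department; DESCENDING gives the
   highest salary in each department rank 1 */
proc rank data=emp_dept out=dept_ranks descending;
  by dept;
  var salary;
  ranks salary_rank;
run;

proc print data=dept_ranks;
run;
```

In department A the 3000 salary gets rank 1, and in department B the 5000 salary gets rank 1, because ranking starts over for each BY group.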
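There are different ranking methods that can be used, for example GROUPS= to assign observations to quantile groups, FRACTION and PERCENT for fractional and percentage ranks, and TIES= to control how tied values are ranked (the default is TIES=MEAN). For full details about each ranking method, see the SAS Procedures Guide.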

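Besides ordinary ranks, PROC RANK also calculates statistical rank scores, such as normal scores (NORMAL=BLOM, TUKEY or VW) and Savage scores (SAVAGE).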
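Here is a short sketch of two of these options on a small made-up dataset SCORES (all names are hypothetical). GROUPS=4 places each observation into a quartile group numbered 0 to 3. NORMAL=BLOM replaces the ranks with normal scores.

```sas
/* Hypothetical test scores */
data scores;
  input id score;
  datalines;
1 55
2 70
3 62
4 90
5 81
6 47
7 66
8 74
;
run;

/* GROUPS=4 assigns each observation to a quartile group numbered 0 to 3 */
proc rank data=scores out=quartiles groups=4;
  var score;
  ranks score_quartile;
run;

/* NORMAL=BLOM replaces ranks with normal scores (Blom's approximation) */
proc rank data=scores out=normal_scores normal=blom;
  var score;
  ranks score_normal;
run;
```

With 8 observations, each quartile group gets two observations. The lowest score has a negative normal score and the highest has a positive one.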

Note: A question frequently asked in interviews is how to select the third highest salary from an employees table. The usual answers involve sorting and then selecting the third row, but PROC RANK provides an elegant way of doing it.


data employees;
input id salary;
datalines;
1 2000
2 3000
3 1000
4 5000
5 1500
6 1200
;
run;

/* DESCENDING gives the highest salary rank 1; WHERE= keeps only rank 3 */
proc rank data=employees out=salaries(where=(rank=3)) descending;
var salary;
ranks rank;
run;

So now you can select the nth highest salary just by changing the number in the WHERE= condition. Drop DESCENDING to get the nth lowest salary instead.
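There is one caveat with the interview answer. The default TIES=MEAN gives tied values the average of their ranks. If two employees share the second highest salary, they both get rank 2.5, and no observation has rank 3. Assuming SAS 9.3 or later, where TIES=DENSE is available, a sketch that returns the third highest distinct salary looks like this. EMPLOYEES_TIES is a hypothetical dataset.

```sas
/* Hypothetical data: two employees share the second highest salary */
data employees_ties;
  input id salary;
  datalines;
1 2000
2 3000
3 1000
4 5000
5 3000
;
run;

/* TIES=DENSE (SAS 9.3 and later) gives tied values the same rank and
   leaves no gaps, so rank 3 is the third highest distinct salary */
proc rank data=employees_ties out=third_highest(where=(rank=3))
          descending ties=dense;
  var salary;
  ranks rank;
run;
```

On SAS 9.2, where DENSE is not available, one workaround is to remove duplicate salaries first (for example with PROC SORT NODUPKEY) and then rank.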

2) Proc COMPARE:

This procedure compares the contents of two SAS datasets, or selected variables within the same dataset.
Proc COMPARE is generally used to validate newly created datasets. For example, suppose I create an inventory dataset for the current month that is to be appended to the master inventory dataset. Before appending, it is good practice to run PROC COMPARE on the two datasets while ignoring the data values. This lets you spot differences in formats, labels, data types, etc.
PROC COMPARE generates the following information about the two data sets that are being compared:

  • Whether matching variables have different values
  • Whether one data set has more observations than the other
  • What variables the two data sets have in common
  • How many variables are in one data set but not in the other
  • Whether matching variables have different formats, labels, or types.
  • A comparison of the values of matching observations.
The NOVALUES option suppresses the part of the output that shows differences in the values of matching variables.
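Here is a minimal sketch of the structure-only check described above. It assumes two small hypothetical datasets, MASTER and CURRENT, which hold the same values but differ in the length of ITEM and the label of QTY. NOVALUES keeps the output focused on attribute differences. PROC COMPARE also sets the automatic macro variable SYSINFO to a bit-coded result (for example, bit value 16 means a variable has a different length). The sketch saves it so that a later step could test it.

```sas
/* Hypothetical master data: ITEM has length 10 and QTY has a label */
data master;
  length item $ 10;
  input item $ qty;
  label qty = 'Quantity on hand';
  datalines;
bolt 100
nut 250
;
run;

/* Hypothetical monthly extract: ITEM has length 12 and QTY has no label */
data current;
  length item $ 12;
  input item $ qty;
  datalines;
bolt 100
nut 250
;
run;

/* NOVALUES suppresses the value-by-value report, so the output focuses on
   dataset and variable attributes such as length and label */
proc compare base=master compare=current novalues;
run;

/* Save the bit-coded result before another step overwrites SYSINFO */
%let compare_rc = &sysinfo;
%put NOTE: PROC COMPARE return code is &compare_rc;
```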
Syntax :

PROC COMPARE BASE=SAS-data-set <COMPARE=SAS-data-set> <option(s)>;
  BY <DESCENDING> variable-1 <…<DESCENDING> variable-n> <NOTSORTED>;
  ID <DESCENDING> variable-1 <…<DESCENDING> variable-n> <NOTSORTED>;
  VAR variable(s);
  WITH variable(s);
Run;

By - Produces a separate comparison for each BY group.
ID - Identifies the variables used to match observations. Both datasets should be sorted by these variables.
Var - Restricts the comparison to the values of specific variables.
With - Compares variables that have different names, either in two datasets or within the same dataset.
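Here is a sketch of the ID, VAR and WITH statements, again on hypothetical datasets (PRICES_OLD, PRICES_NEW and PRICES_BOTH are made-up names). The first comparison matches observations by ITEM_ID. The second omits COMPARE=, so VAR and WITH compare two variables in the same dataset.

```sas
/* Hypothetical before/after versions of a price table, sorted by ITEM_ID */
data prices_old;
  input item_id price;
  datalines;
1 10.5
2 20.0
3 15.25
;
run;

data prices_new;
  input item_id price;
  datalines;
1 10.5
2 21.0
3 15.25
;
run;

/* ID matches observations by ITEM_ID instead of by position;
   VAR limits the value comparison to PRICE */
proc compare base=prices_old compare=prices_new;
  id item_id;
  var price;
run;
%let rc_id = &sysinfo;

/* Without COMPARE=, VAR and WITH compare two variables of one dataset */
data prices_both;
  input item_id price_old price_new;
  datalines;
1 10.5 10.5
2 20.0 21.0
;
run;

proc compare base=prices_both;
  var price_old;
  with price_new;
run;
%let rc_with = &sysinfo;
```

In both runs, only the second item differs (20.0 versus 21.0). The report points to that item.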

3) Proc CONTENTS:

Syntax :
Proc contents data=dset  <out=dset1>;
Run;

The contents procedure is one of the most popular procedures in SAS. Generally this procedure is used to see the contents of a dataset in a listing output to check variable formats, informats, labels etc.

But there is another way to use this procedure. The OUT= option stores the descriptor information in an output dataset, with one observation per variable. This dataset contains very valuable information about the input dataset (variable names, types, formats, informats, labels, etc.). It also holds some less obvious but useful details, such as the SORTEDBY variable, which shows which variables the dataset is sorted by.

This output dataset can be used to automate repetitive processes. For example, suppose you need to report all datasets in a library along with the variable(s) each one is sorted by. This dataset provides everything you need.
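Here is a sketch of that sorted-by report. The code first creates a sorted copy of SASHELP.CLASS in WORK (CLASS_SORTED, WORK_META and SORT_KEYS are hypothetical names). It then runs PROC CONTENTS with OUT= on every dataset in WORK and keeps only the rows that describe sort-key variables.

```sas
/* Hypothetical setup: a WORK copy of SASHELP.CLASS sorted by two keys */
proc sort data=sashelp.class out=class_sorted;
  by sex age;
run;

/* OUT= writes one observation per variable for every dataset in WORK;
   NOPRINT suppresses the usual listing output */
proc contents data=work._all_ out=work_meta noprint;
run;

/* SORTEDBY is the variable's position in the sort key (0 otherwise),
   so SORTEDBY > 0 keeps only the sort-key variables */
proc sort data=work_meta(where=(sortedby > 0))
          out=sort_keys(keep=memname name sortedby);
  by memname sortedby;
run;

proc print data=sort_keys noobs;
run;
```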

4) Proc COPY:

Syntax :

PROC COPY OUT=libref-1 IN=libref-2 <CLONE|NOCLONE> <CONSTRAINT=YES|NO>  
           <DATECOPY> <INDEX=YES|NO>  <MEMTYPE=(mtype-1 <...mtype-n>)>
           <MOVE <ALTER=alter-password>>;
  EXCLUDE SAS-file-1 <...SAS-file-n> </ MEMTYPE=mtype>;
  SELECT SAS-file-1 <...SAS-file-n> </ <MEMTYPE=mtype>
         <ALTER=alter-password>>;
Run;

The COPY procedure copies one or more SAS files from a SAS library. Generally, the COPY procedure functions the same as the COPY statement in the DATASETS procedure. The two differences are as follows:
  • The IN= argument is required with PROC COPY. In the COPY statement of PROC DATASETS, IN= is optional. If it is omitted, the default is the library named in the PROC DATASETS statement.
  • PROC DATASETS cannot work with libraries that allow only sequential data access.
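Here is a minimal sketch of PROC COPY with SELECT. It copies two sample datasets that ship with SAS in the SASHELP library into WORK. Swap in your own librefs for IN= and OUT=.

```sas
/* Copy two sample datasets from SASHELP into WORK.
   SELECT limits the copy to the named members, and MEMTYPE=DATA
   restricts it to SAS data sets */
proc copy in=sashelp out=work memtype=data;
  select class cars;
run;
```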
The COPY procedure, along with the XPORT engine and the XML engine, can create and read transport files that can be moved from one host to another. PROC COPY can create transport files only with SAS data sets, not with catalogs or other types of SAS files.

Transporting is a three-step process:

1. Use PROC COPY to copy one or more SAS data sets to a file that is created with either the transport (XPORT) engine or the XML engine. This file is referred to as a transport file and is always a sequential file.

2. After the file is created, you can move it to another operating environment via communications software, such as FTP, or via tape. If you use communications software, be sure to move the file in binary format to avoid any type of conversion. If you are moving the file to a mainframe, the file must have certain attributes (for example, fixed-block records with a logical record length of 80).

3. After you have successfully moved the file to the receiving host, use PROC COPY to copy the data sets from the transport file to a SAS library.
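The three steps can be sketched as follows. The sketch assumes the XPORT engine and writes the transport file into the WORK directory only so that the example is self-contained. The path and the librefs XPTOUT and XPTIN are hypothetical. XPORT writes version 5 transport files, so dataset and variable names are limited to 8 characters.

```sas
/* Hypothetical transport file location; replace with a real path */
%let xpt_path = %sysfunc(pathname(work))/class.xpt;

/* Step 1: write SASHELP.CLASS to a transport file via the XPORT engine */
libname xptout xport "&xpt_path";
proc copy in=sashelp out=xptout memtype=data;
  select class;
run;
libname xptout clear;

/* Step 2 (not shown): move class.xpt to the target host in binary mode */

/* Step 3: on the receiving host, copy the data back into a SAS library */
libname xptin xport "&xpt_path";
proc copy in=xptin out=work;
run;
libname xptin clear;
```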

5) Proc DATASETS:
Proc DATASETS is a multipurpose procedure designed to accomplish various tasks for managing your SAS files. It can do the following:
  • copy SAS files from one SAS library to another
  • rename SAS files
  • repair SAS files
  • delete SAS files
  • list the SAS files that are contained in a SAS library
  • list the attributes of a SAS data set, such as:
      • the date when the data was last modified
      • whether the data is compressed
      • whether the data is indexed
  • manipulate passwords on SAS files
  • append SAS data sets
  • modify attributes of SAS data sets and variables within the data sets
  • create and delete indexes on SAS data sets
  • create and manage audit files for SAS data sets
  • create and delete integrity constraints on SAS data sets
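As a small example of modifying attributes, here is a sketch that uses the MODIFY statement to add a label, a format and a simple index to a hypothetical WORK.INVENTORY dataset, without a DATA step.

```sas
/* Hypothetical dataset to modify */
data inventory;
  input item $ qty;
  datalines;
bolt 100
nut 250
;
run;

/* MODIFY updates variable attributes in place; INDEX CREATE builds
   a simple index on ITEM */
proc datasets lib=work nolist;
  modify inventory;
    label qty = 'Quantity on hand';
    format qty comma8.;
    index create item;
run;
quit;
```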

Syntax:

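The syntax of PROC DATASETS is extensive because the procedure supports so many tasks, so there is little point in reproducing it here (see the syntax in the SAS 9.2 Procedures Guide). Instead, I’ll list a few of the statements that are used frequently.
One good use of PROC DATASETS is to delete all or selected datasets from a library. The KILL option deletes every member of the specified type (here, all data sets) in the library: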

proc datasets lib=work kill memtype=data;
run;
quit;
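To delete only selected datasets instead of the whole library, you can use the DELETE statement. Here is a minimal sketch with hypothetical dataset names TEMP1, TEMP2 and KEEPME.

```sas
/* Create three scratch datasets (hypothetical names) */
data temp1 temp2 keepme;
  x = 1;
run;

/* DELETE removes only the named members; NOLIST suppresses the
   library directory listing */
proc datasets lib=work nolist;
  delete temp1 temp2;
run;
quit;
```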

Other important statements are:

APPEND - adds the observations of one data set to the end of another. This is most useful when the base data set is large and a small data set needs to be added, because only the observations being added are read.

CHANGE - renames one or more SAS files in the input library.

COPY - copies some or all members of one SAS library to another. It is often used to move datasets from one system or version to another.
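Here is a sketch that puts these three statements together on hypothetical WORK datasets (INVENTORY_MASTER and INVENTORY_FEB are made-up names). I have given each statement its own RUN group to keep the order of execution obvious.

```sas
/* Hypothetical base dataset and a small monthly file to add */
data inventory_master;
  input item $ qty;
  datalines;
bolt 100
nut 250
;
run;

data inventory_feb;
  input item $ qty;
  datalines;
washer 75
;
run;

proc datasets lib=work nolist;
  /* APPEND adds the INVENTORY_FEB rows to the end of the base dataset */
  append base=inventory_master data=inventory_feb;
run;
  /* CHANGE renames a member: old-name = new-name */
  change inventory_feb = inventory_feb_loaded;
run;
  /* COPY with an explicit IN= library; SELECT limits the members copied */
  copy in=sashelp out=work memtype=data;
    select class;
run;
quit;
```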

Conclusion: This post aims to create awareness of some procedures that are used for only a small part of their capability, or that are very handy for a small job but remain unpopular.
This is the first post in the procedures series; more will follow soon.

Will be back with some more magic of SAS. Till then, goodbye.


Saurabh Singh Chauhan
(er.chauhansaurabh@gmail.com)
Note: Comments and suggestions are always welcome

References :
SAS Institute Inc. Base SAS 9.2 Procedures Guide.

Disclaimer :
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.
The contents of this post are the works of the author(s) and do not necessarily represent the opinions, recommendations, or practices of any organization whatsoever.