Arrays in SAS
You have probably seen arrays used in various programming languages and already know quite a bit about them, but before we go into the details let me tell you one thing:
ARRAYS IN SAS ARE DIFFERENT FROM ANY OTHER LANGUAGE..!!
In most languages an array is a data structure that holds data values, but in SAS it is not a data structure; it is just a collective name given to a group of variables. Being clear about this distinction is essential to using arrays successfully in SAS.
In SAS the most important function of an array is to reduce the lines of code where a program involves repetitive calculations on different variables. I have seen most programmers shy away from using them, but let me tell you that they are a wonderful tool for making your code simple. In some instances arrays also provide the flexibility to make your code dynamic; I'll get back to this later.
So let us get down
to the dirty details :
What are arrays?
An Array is a grouping
of variables of the same type used to perform repetitive operations on those
variables.
Syntax:
ARRAY array_name[dimension] $ length variable-list;
ARRAY - the keyword that starts the declaration
array_name - any valid SAS name (do not use function names)
dimension - number of elements in the array (if unknown, you can use *, but then you have to provide the variable list)
$ - indicates that the array is a character array
length - length of each variable
variable-list - list of variables that make up the array (variable ranges such as kg1-kg10 can be used)
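To make the syntax concrete, here is a minimal sketch of a few declarations. The variable names (score1-score3, name1-name3, visit1-visit4) are made up for illustration and are not part of any dataset used in this post.

```sas
data _null_;
  /* Numeric array of three variables */
  array scores{3} score1-score3;

  /* Character array: $ marks it as character, 8 is the length of each element */
  array names{3} $ 8 name1-name3;

  /* Asterisk dimension: SAS counts the variables in the list */
  array visits{*} visit1-visit4;

  /* DIM returns the number of elements, here 4 */
  n_visits = dim(visits);
  put n_visits=;
run;
```

Because these variables are never assigned a value, the log will also carry "uninitialized" notes; that is expected in a sketch like this.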
Usage:
We have a dataset with 10 variables (kg1 to kg10) containing the weights of patients measured over 10 consecutive weeks. We want to convert these weights to pounds. One way we can do it is:
data kg_to_lbs;
set weights;
lbs1=kg1*2.2;
lbs2=kg2*2.2;
lbs3=kg3*2.2;
...
lbs10=kg10*2.2;
run;
But this takes 13 lines of code for one simple calculation. Imagine having 100 such measurements. Here we can use arrays to make the code shorter and easier to understand:
data kg_to_lbs;
set weights;
array kg_array {10} kg1-kg10;
array lbs_array {10} lbs1-lbs10;
do i = 1 to 10;
lbs_array{i} = (kg_array{i})*2.2;
end;
run;
Cool, na? And there is nothing complicated about this: you just need to declare the arrays and use a simple do loop.
Below are a few points
about arrays which will be all you need to know about them to utilize arrays to
their full potential.
Important points to note about arrays:
1) Variables used in an array must be of the same type: either all numeric or all character.
2) Variables need not already exist; if they do not exist, SAS creates them for you. This makes arrays a handy way of creating variables.
3) SAS needs to know the size (number of elements) of the array when you create it. You can supply the size in brackets next to the array name:
Array test[10] $5;
This is an array of 10 character elements, each of length 5.
Or you can let SAS count them for you using the number of variables in the variable list:
Array test1[*] $ var1-var5;
You cannot omit both the dimension and the variable list.
4) An array does not accept a numeric variable in the brackets to define its dimension, because SAS creates the array at compile time and the value of the variable becomes available only at execution time.
Array test[num_var];
ERROR : Array requires a numeric constant
5) If you want to use all numeric or all character variables of a dataset without bothering about their names, you can declare the array like this:
Array nums[*] _NUMERIC_;
Array chars[*] _CHARACTER_;
6) Sometimes we need an array to hold values temporarily but do not want to output those variables to the final dataset. In that case we can use temporary arrays. They are declared as:
Array arr_name[10] _TEMPORARY_;
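Points 2 and 6 can be seen together in a small sketch. It assumes the weights dataset with kg1-kg10 from the usage example; the limit of 100 kg and the names limit, high1-high10 and flagged are made up for illustration.

```sas
data flagged;
  set weights;                            /* assumes kg1-kg10 exist, as in the usage example */
  array kg_array{10} kg1-kg10;

  /* Temporary array: holds a hypothetical limit of 100 kg for each week
     and is not written to the output dataset */
  array limit{10} _temporary_ (10*100);

  /* high1-high10 do not exist yet, so the ARRAY statement creates them */
  array high{10} high1-high10;

  do i = 1 to dim(kg_array);
    high{i} = (kg_array{i} > limit{i});   /* 1 if above the limit, otherwise 0 */
  end;
  drop i;
run;
```

The output dataset gets the new variables high1-high10, while the elements of limit never appear in it.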
Functions used with arrays :
DIM FUNCTION :
This function determines the number of elements in an array dynamically, so you do not have to hardcode it while looping.
For e.g.
Do i=1 to 10;
A[i] = b[i] + c[i];
End;
Can also be written as :
Do i=1 to dim(a);
A[i] = b[i] + c[i];
End;
VNAME FUNCTION :
We get the values of the variables in an array using the array name and a subscript, but if we want the name of the element (variable) at a given subscript then we can use the vname function. For e.g.
array arr_name[*] X Y Z P Q R;
i=3;
vars = Vname(arr_name[i]);
So vars will be set to the name of the third variable in the list, which is Z.
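As a rough sketch, the same idea can be run over the whole array to list every element name in the log. The length of 32 for vars is my choice here, picked because it is the maximum length of a SAS variable name.

```sas
data _null_;
  array arr_name{*} X Y Z P Q R;
  length vars $ 32;                 /* long enough for any SAS variable name */

  do i = 1 to dim(arr_name);
    vars = vname(arr_name{i});      /* name of the i-th element, not its value */
    put i= vars=;                   /* writes the subscript and the name to the log */
  end;
run;
```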
OF OPERATOR :
This operator is very useful when we have to perform an operation on all elements (or variables) of an array. For e.g., if we need the sum of all elements of an array we can write:
X=sum(a[1],a[2],a[3],a[4],a[5],a[6],a[7]…);
But if we don't know how many elements there are, or the number changes every time, then we would need to update this statement again and again, so instead we can write:
X=sum(of a[*]);
Cool and easy.. :-)
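Here is a minimal sketch of OF with a few summary functions, again assuming the weights dataset with kg1-kg10 from the usage example. The output names (totals, total_kg, mean_kg, n_weeks) are made up for illustration.

```sas
data totals;
  set weights;                        /* assumes kg1-kg10, as in the usage example */
  array kg_array{*} kg1-kg10;

  total_kg = sum(of kg_array{*});     /* sum of the non-missing values */
  mean_kg  = mean(of kg_array{*});    /* mean of the non-missing values */
  n_weeks  = n(of kg_array{*});       /* count of non-missing values */
run;
```

If the array later covers more weeks, only the ARRAY statement changes; the three function calls stay as they are.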
Here is the SAS page if you need to go
further and read more about arrays :
Conclusion: After reading this post, I hope readers will be able to use arrays in their code for more flexible and better structured programs.
That's it.
Will be back with some more SAS magic. Goodbye till then, and keep learning.
Saurabh Singh Chauhan
(er.chauhansaurabh@gmail.com)
Note: Comments and suggestions are always
welcome.
Disclaimer :
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.
The contents of this post are the works of the author(s) and do not necessarily represent the opinions, recommendations, or practices of any organization whatsoever.
Please help with the following programs: why is the second one not working?
proc means data=learn.blood noprint;
var Chol;
output out = newds(keep=AveChol)
mean = AveChol;
run;
proc means data=learn.blood noprint;
var Chol;
output out = newds(keep= mean);
run;