READING EXTERNAL DATA FILES IN SAS
PART 1 : Input methods
Objective : This article aims at discussing
various Input methods used read raw data in a SAS dataset in an objective
manner which is easy to understand for the user and also help the user
understand when and which method is to be used.
Theory : In SAS reading raw data can be
accomplished in various ways and which method to use depends purely on the type
of data being imported into SAS.
There are
four major methods of reading raw data files. These include:
1) List Input
2) Column input
3) Formatted Input
4) Named Input
A)
LIST INPUT :
List input
sequentially reads your input data lines
for data values. Data values do not have to be aligned in columns, but they have
to be separated from one another by a delimiter such as a space or other
delimiter such as a comma (,). Space or blank is the default delimiter, to specify
some othe delimiter use DLM= option in infile statement.
List input
requires only that you specify the variable names to be assigned to the data
values in your input data line.If you are creating the variable for the first
time in the input statement (variable not defined in a length or attrib
statement), a dollar sign ($) must follow the name of a character variable.
List input
is the most basic form of data reading (of course after cards and datalines..!!
) so it has certain restrictions like :
- Data
values MUST be separated by a blank or a delimiter
- A
placeholder must specifically show a missing value wherever present(e.g.
period
for numeric missing value)
- For
reading a character value greater than 8 characters long you have to
specifically define the length in a LENGTH or ATTRIB statement. Etc etc…
Below is
the sample code where an external delimited raw data file is read with list
input. Assume that an external space delimited file results.dat exists in the given
directory.
data
results;
infile 'c:\myfiles\results.dat'
DLM=’,’;
input rollno name $ age subject1 subject2 subject3 subject4
;
run;
Simple
right ?
This is the
most basic way of reading data and as data can be arranged in different ways we
have other forms of input.
B) COLUMN INPUT
Column
input reads standard data values which are aligned in columns. In a column
input a column range is specified after the variable name, if it is a character
then a $ follows the variable name which in turn is followed by the column range. Column input solves certain limitation of
the above list input method:
-
In column
input missing values need not be specifically shown with a period or
consecutive delimiters, it just have to be missing..!!
-
You can
specify column names in input statement in any order you like.
-
Character
data values can have embedded delimiters.
But it has some
limitations as well:
-
Data values
must be in same column for all data lines (That’s why
it is called
column input right!!)
-
Leading
blanks are removed from the field.
Below is a
small example for reading data through column input:
data
results;
infile 'c:\myfiles\results.dat';
input roll_no 1-6 name $ 9-20
age 7-8 subject1 21-25
subject2 26-30;
run;
C) FORMATTED INPUT
Unlike the
above two cases where we are reading only standard data, formatted input
provides us a way to read non-standard data. The @ and + pointers which we use
in formatted input provide a lot of flexibility in reading data.
Some
advantages of formatted input include:
-
Character
values can contain embedded blanks.
-
Placeholders
for missing values are not required.
-
We
can move through data in a direct manner (@) or a manner relative to
current
position from where the data is being read(+) .
-
Data
values can be specified in any order in the input statement.
@ Pointer
tells the pointer to move to the column specified after the @ sign.
+ Pointer
tells the pointer to move further columns from the current position which is
specified after the + sign.
data
patients;
infile 'c:\readdata\results.dat';
input @9 name $12. @1 roll no 6.
@7 age 2. @21
(subject1-subject3) (8. +1) ;
run;
D) MODIFIED LIST INPUT
Modified
list input is a combination of list input and formatted input. Like list input,
this style is restricted to reading variables in order. The data values do not
have to be aligned in columns, but they do have to be separated from one
another by a space or other delimiter such as a comma. Additionally, you can
include informats that allow you to read more complex data values than you can
read with simple list input.
In modified
list input we use format modifiers to read complex data. There are three types
of format modifiers that are used:
1)
AMPERSAND (&) – This modifier specified after the variable name helps in
reading data with embedded delimiters. SAS will stop reading the value when 2
consecutive delimiters are encountered.
2) TILDE
(~) This character at the top left of your keyboard just below the Esc key
tells SAS to read single and double quotation marks and delimiters as part of
the data.
3) COLON
(:) The colon modifier is used to specify informats for data values in some
specific format like dollar8. The informat is specified right after the colon.
SAS will read data until the delimiter is encountered or the width specified in
the informat is reached.
An example
using column input can be like :
data
hospital;
infile datalines delimiter=',';
input pat_name : $20.
Med_hist ~ $50.;
datalines;
George Smith,Had a long history of bronchitis
Wiber Stuart,Sensory organs stopped functioning at the age of 14
Rosy Mary,Had acute heart pain and nausea
;;;;
run;
Conclusion : The aim of this report was not to
have an exhaustive pile of all concepts to read raw data files but to give a
concise and easy understanding of options available for reading raw data files
and their merits and demerits over each other.
Reading data is a very interesting
concept in SAS and there are many challenges that you can face. So keep
practicing with whatever data you get. Other types of data reading will be
covered in a different post.
Will be
back with some SAS Magic again. Till then Goodbye..!!
Saurabh Singh
Chauhan
Note : Comments and suggestions are always welcome.
Disclaimer :
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc.in the USA and other countries. ® Indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies. The contents of this post are the works of the author(s)and do not necessarily represent the opinions,recommendations, or practices of any organization whatsoever.
Disclaimer :
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc.in the USA and other countries. ® Indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies. The contents of this post are the works of the author(s)and do not necessarily represent the opinions,recommendations, or practices of any organization whatsoever.
T-Shirt, T-Shirt, T-Shirt, T-Shirt, T-Shirt, T-Shirt, T-Shirt, T-Shirt, T-Shirt, T-Shirt, T-Shirt, T-Shirt.
ReplyDeleteT-Shirt is a stylish and unique titanium wedding bands design for women who like micro titanium trim having a close and intimate citizen eco drive titanium watch look. titanium wok T-Shirt covers the whole body and makes it citizen promaster titanium stand out Rating: 5 · 3 votes · $35.99 · In stock
additional reading cheap jerseys,wholesale jerseys from china,Cheap Jerseys china,cheap nfl jerseys,Cheap Jerseys china,Cheap Jerseys free shipping,Cheap Jerseys china,wholesale nfl jerseys from china,Cheap Jerseys free shipping,cheap nfl jerseys their explanation
ReplyDeletea152x3uxtsr444 wholesale sex toys,huge dildos,sex chair,women sex toys,sex chair,realistic dildo,dildo,realistic dildo,women sexy toys j568n5qkbjg019
ReplyDelete