1 / 23

SAS Exam Preparation Chapters 13 & 14

Learn about combining data sets using various methods such as one-to-one reading, concatenating, interleaving, and match-merging. Also, explore SAS functions for data manipulation.

candyb
Télécharger la présentation

SAS Exam Preparation Chapters 13 & 14

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SAS Exam PreparationChapters 13 & 14 Wally Altman 10 July 2007

  2. Chapter 13 – Combining Data Sets Combining two or more data sets is a common task. Data sets can be combined in a DATA step in the following ways: • One-to-one reading • Concatenating • Interleaving • Match-merging

  3. One-to-One Reading • General form: DATA output_data_set; SET input_data_set_1; SET input_data_set_2; RUN; • New data set has all variables from each input data set. When two or more data sets share a variable name, values from the last data set are kept. • Number of observations in the new data set is the same as the number of observations in the smallest input data set.

  4. One-to-One Reading data one2one; set one; set two; run;

  5. Concatenating • Observations from one (or more) data set(s) are appended to the end of another data set. • General form: DATA output_data_set; SET input_data_set_1 input_data_set_2; RUN; • New data set has all observations and all variables from each input data set.

  6. Concatenating data concat; set one two; run;

  7. Interleaving Identical to concatenating, except the output data set is sorted. Syntax is the same as concatenation with an additional BY statement. data interleave; set one two; by num; run;

  8. Match Merging • Combines observations from two or more data sets into a single observation in a new data set based on the values of a common variable. • Uses a MERGE statement instead of a SET statement. • General form: DATA output_data_set; MERGE input_data_set_1 input_data_set_2; BY <DESCENDING> variable_list; RUN;

  9. Match Merging data merged; merge a b; by num; run;

  10. Match Merging • The output data set has all variables from each input data set. For variables appearing in more than one data set, the length comes from the first data set having the variable, while the value is that of the last data set with the variable. • Every observation from all input data sets is represented in the output data set, whether it is matched or not. This behaviour can be modified using IN= data set options and subsetting IF statements.

  11. Match Merging This example uses the RENAME= and IN= options and a subsetting IF statement. data merged; merge one(in=a) two(in=b rename=(year=year2)); by num; if a; run;

  12. Chapter 14 – SAS Functions SAS Functions are built-in routines that are used in data manipulation. • SAS functions are used in DATA step statements, and occasionally in statistical procedures. • A SAS function can be specified anywhere you would use a SAS expression.

  13. SAS Functions • General form: function-name(argument_1,…,argument_n); • Arguments can be: • Variables – mean(x,y,z); • Constants – mean(456,502,612,498); • Expressions – mean(37*2,192/5,mean(22,34,56)) • Even if a function does not require any arguments,the function name still must be followed by parentheses. • Arguments are typically separated by commas, but some functions allow variable lists or arrays with the OF keyword. • mean(of x1-x3); mean(of somearray{*});

  14. Converting Data with Functions • If a character variable is referenced in a numeric context, SAS will attempt an automatic character-to-numeric conversion. The program log will show this message: NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column) • Automatic conversion uses the w.dinformat. If the values don’t fit the informat (for example, numbers with commas or currency with a dollar sign), the conversion will fail and SAS will return a missing value.

  15. Converting Data with Functions • The automatic conversion can be avoided by using the INPUT function. • General form: INPUT(source,informat); • In the following example, payrate is a character variable with a two-digit integer value: • data newtemp; set temp; salary = input(payrate,2.) * hours; run; • For non-standard numeric data, use an appropriate informat (such as COMMAw.d for numbers with commas).

  16. Converting Data with Functions • Automatic conversion will also occur if a numeric variable is referenced in a character context. The automatic conversion uses the BEST12. format, which can result in leading blanks if the numeric variable is too short. • To avoid automatic numeric-to-character conversion, use the PUT function. • General form: PUT(source,format); • Example: • data newtemp; set temp; Assignment = put(site,2.) || ’/’ || dept; run;

  17. SAS Date Functions • SAS stores dates as a numeric value equal to the number of days from January 1, 1960. • Today’s date (July 10, 2007) as a SAS date is 17357. • SAS dates can be manipulated with arithmetic operators just like any other numbers. • The MDY function converts a readable date to a SAS date: MDY(month,day,year); • The TODAY(); and DATE() functions return the current date. • The TIME() function returns the current time as a SAS time (number of seconds since midnight).

  18. SAS Date Functions • DAY(date) returns the day of the month. • QTR(date) returns the quarter of the year. • WEEKDAY(date) returns the day of the week. • MONTH(date) returns the month of the year. • YEAR(date) returns a four-digit year. • INTCK(‘interval’,start,end) gives the number of specified time units between the start and end dates. • INTCK(‘day’,date1,date2) gives the number of days between date1 and date2. • INTCX(‘interval’,start,increment) modifies the start date by adding the ‘interval’ the number of times indicated by increment.

  19. Modifying Character Values • Many functions can be used to modify character values. • SCAN(source, n <,delimiters>) retrieves the nth word of source as determined by the specified delimiters (or a set of default delimiters). • scan(“607/555-1273”,2) returns 555. • TRIM(argument) removes trailing blanks. • fname = “STEVE “ lname = “STEVESON “ fname || lname returns “STEVE STEVESON “. trim(fname) || ‘ ‘ || trim(lname) returns “STEVE STEVESON”.

  20. Modifying Character Values • The SUBSTR function can be used on the left or right side of an assignment statement, and its behavior is different in each position. • SUBSTR(argument, startpos <,n>) retrieves n characters from argument starting at startpos, when used on the right of an equals sign. • mid=substr(“ABCDEFG”,2,5) assigns “BCDEF” to mid. • SUBSTR(argument, startpos <,n>) replaces n characters of argument (which must be a character variable name), starting from startpos, with the value on the right side of the equals sign when the function is called on the left. • substr(mid,3,1) = “Z” assigns “BCZEF” to mid.

  21. Modifying Numeric Values • There are lots of functions which modify numeric values. A few examples: • INT(value) returns the integer part of the number. • int(367.89) gives 367. • ROUND(value <, round-off>) rounds value to the nearest round-off. The default for round-off is 1. • round(367.89) returns 368 • round(367.89,10) returns 370 • round(367.89,.1) returns 367.9

  22. Nesting SAS Functions • If a function accepts a SAS expression as an argument, then SAS functions can be nested within it. • name = “STEVESON, STEVEN CARL” MiddleInitial = substr(name,scan(name,3),1); • The SCAN function finds the position of the first character in the third word, which is the middle name. The SUBSTR function starts at this position and captures one character from the name: “C”, the middle initial.

  23. It’s Over!

More Related