
Introduction to SAS LISA Short Course Series



Presentation Transcript


  1. Introduction to SAS. LISA Short Course Series. Mark Seiss, Dept. of Statistics

  2. Reference Material • The Little SAS Book – Delwiche and Slaughter • SAS Programming I: Essentials • SAS Programming II: Manipulating Data with the DATA Step • Presentation and Data • http://www.lisa.stat.vt.edu/?q=node/167

  3. Presentation Outline 1. Introduction to the SAS Environment 2. Working With SAS Data Sets 3. Summary Procedures 4. Basic Statistical Analysis Procedures

  4. Presentation Outline • Questions/Comments

  5. Introduction to the SAS Environment
     1. SAS Programs
     2. SAS Data Sets and Data Libraries
     3. Creating SAS Data Sets

  6. SAS Programs
     • File extension: .sas
     • The Editor window has four uses:
       - Access and edit existing SAS programs
       - Write new SAS programs
       - Submit SAS programs for execution
       - Save SAS programs
     • A SAS program is a sequence of steps that the user submits for execution
     • A program can be submitted in two ways:
       - The entire program
       - A selection of the program

  7. SAS Programs
     • Syntax rules for SAS statements
       - Free-format: can use upper or lower case
       - Usually begin with an identifying keyword
       - Can span multiple lines
       - Always end with a semicolon
       - Multiple statements can be on the same line
     • Common errors
       - Misspelled keywords
       - Missing or invalid punctuation (a missing semicolon is common)
       - Invalid options
     • Errors are indicated in the Log window

  8. SAS Programs
     • Two basic kinds of steps in SAS programs:
       - DATA steps: typically used to create SAS data sets and manipulate data; begin with a DATA statement
       - PROC steps: typically used to process SAS data sets; begin with a PROC statement
     • The end of a DATA or PROC step is indicated by:
       - RUN statement (most steps)
       - QUIT statement (some steps)
       - The beginning of another step (a DATA or PROC statement)

  9. SAS Programs
     • Output generated from a SAS program appears in two windows
     • SAS log
       - Information about the processing of the SAS program
       - Includes any warnings or error messages
       - Accumulates in the order the DATA and PROC steps are submitted
     • SAS output
       - Reports generated by the SAS procedures
       - Accumulates in the order it is generated

  10. SAS Data Sets and Data Libraries
     • SAS data set
       - Specifically structured file that contains data values
       - File extension: .sas7bdat
       - Row-and-column format, similar to Excel
       - Columns: variables in the table, corresponding to fields of data
       - Rows: a single record or observation
     • Two types of variables
       - Character: can contain any value (letters, numbers, symbols, etc.)
       - Numeric: floating-point numbers
     • SAS data sets are located in SAS data libraries

  11. SAS Data Sets and Data Libraries
     • SAS data libraries
       - Contain SAS data sets
       - Identified by assigning a library reference name (libref)
     • Temporary
       - Work library
       - SAS data files are deleted when the session ends
       - Library reference name not necessary
     • Permanent
       - SAS data sets are saved after the session ends
       - SASUSER library
       - You can create and access your own libraries

  12. SAS Data Sets and Data Libraries
     • SAS data libraries cont.
     • Assigning library references
       - Syntax: LIBNAME libref 'SAS-data-library';
     • Rules for library references
       - 8 characters or fewer
       - Must begin with a letter or underscore
       - Other characters are letters, numbers, or underscores

  13. SAS Data Sets and Data Libraries
     • SAS data libraries cont.
     • Identifying SAS data sets within SAS data libraries: libref.filename
     • Accessing SAS data sets within SAS data libraries
       - Example: DATA new_data_set; SET libref.filename; RUN;
     • Creating SAS data sets within SAS data libraries
       - Example: DATA libref.filename; SET old_data_set; RUN;
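The libref rules and the two access patterns above can be put together in one short program. This is only a sketch: the libref `mylib`, the folder path, and the `scores` data are hypothetical names chosen for illustration, and the path must be replaced with a folder that exists on your machine.

```sas
/* Hypothetical folder; replace with a directory that exists on your machine */
LIBNAME mylib 'C:\sasdata';

/* A small temporary data set in the Work library (no libref needed) */
DATA scores;
  INPUT Name $ Score;
  DATALINES;
Ann 90
Bob 85
;
RUN;

/* Create a permanent data set in the library from the temporary one */
DATA mylib.scores;
  SET scores;
RUN;

/* Read the permanent data set back into a new temporary data set */
DATA scores_copy;
  SET mylib.scores;
RUN;
```

After the session ends, `scores` and `scores_copy` are deleted with the Work library, while `mylib.scores` remains in the folder as a .sas7bdat file.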

  14. Creating SAS Data Sets
     • Creating SAS data sets from raw data: 4 methods
       1. Importing existing raw data in a SAS program
       2. Manually entering raw data in a SAS program
       3. Importing existing data sets using the Import menu option
       4. Manually entering raw data using the Table Editor

  15. Creating SAS Data Sets
     • Importing existing raw data in a SAS program
       1. Start the DATA step and name the SAS data set to be created (include the SAS data library it will be stored in)
          DATA libref.SAS-data-set;
       2. Identify the file that contains the raw data (.dat file)
          INFILE 'raw-data-filename';
       3. Provide instructions on how to read data from the raw data file
          INPUT input-specifications;

  16. Creating SAS Data Sets
     • Input specifications
       - Specify the names of the SAS variables in the new data set
       - Specify whether the SAS variables are character or numeric
       - Identify the locations of the variables in the raw data file
     • Input styles
       - List input
       - Column input
       - Formatted input
       - Mixed input

  17. Creating SAS Data Sets
     • List input
       - Used when raw data values are separated by spaces
       - All data in a row must be read in
       - All missing data must be indicated by a period
       - Simple character data only: no embedded spaces, no lengths greater than 8
     • INPUT statement
       - List the variables after the INPUT keyword in the order they appear in the file
       - If a variable is character, place a $ after the variable name
       - Example: INPUT Name $ City $ Age Height Weight Sex $;
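A minimal sketch of list input using the INPUT statement from the slide. The two data rows are made up for illustration, and the data are entered in-stream with DATALINES so the step does not depend on an external file. The period in the second row marks a missing Age, as list input requires.

```sas
DATA people;
  /* $ marks Name, City, and Sex as character; the rest are numeric */
  INPUT Name $ City $ Age Height Weight Sex $;
  DATALINES;
Alice Roanoke 34 65 130 F
Bob Salem . 70 180 M
;
RUN;

/* Print the result to the Output window to check what was read */
PROC PRINT DATA=people;
RUN;
```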

  18. Creating SAS Data Sets
     • Column input
       - Used when the raw data file does not have delimiters between values (common in large data sets)
       - Each variable's values are found in the same columns in each row
       - Numeric data must be standard: numbers, decimals, signs, and scientific notation only
     • Advantages
       - No spaces required between values
       - Missing values can be left blank
       - Character data can have embedded spaces
       - Ability to skip unwanted variables

  19. Creating SAS Data Sets
     • Column input cont.
     • INPUT statement
       - Numeric variables: list the variable name, then the column or range of columns where the variable is found in the raw data file
       - Character variables: list the variable name, a dollar sign, and then the column or range of columns
       - Example: INPUT Name $ 1-10 Age 26-28 Sex $ 35;
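A sketch of column input with the slide's INPUT statement. The data rows are invented: the name occupies columns 1-10 (with an embedded space), a city sits in columns 12-21 and is skipped, the age is right-aligned in columns 26-28, and the sex code is in column 35.

```sas
DATA people_cols;
  /* Only the listed column ranges are read; columns 11-25 and 29-34 are ignored */
  INPUT Name $ 1-10 Age 26-28 Sex $ 35;
  DATALINES;
Mary Jones Blacksburg     34      F
Tom Lee    Roanoke         7      M
;
RUN;

PROC PRINT DATA=people_cols;
RUN;
```

Because the positions are fixed, the spacing inside each data line matters: shifting a value by one column changes what is read.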

  20. Creating SAS Data Sets
     • Formatted input is appropriate for reading:
       - Data in fixed columns
       - Standard and nonstandard character and numeric data
       - Calendar values to be converted to SAS date values
     • Data are read in using SAS informats
       - An informat is an instruction that SAS uses to read in data values
     • General forms
       - Character: $informatw.
       - Numeric: informatw.d
       - Date: informatw.

  21. Creating SAS Data Sets
     • Formatted input cont.
     • Character informats
       - $w. – character string with a width of w; trims leading blanks
       - $CHARw. – character string with a width of w; does not trim leading or trailing blanks
     • Numeric informats
       - w.d – standard numeric data with width w and d digits after the decimal point (d is applied only when the raw value contains no decimal point)
         Raw data value = 1234567 -> informat = 8.2 -> SAS data value = 12345.67
       - COMMAw.d – numeric data with embedded commas
         Raw data value = 1,000,001 -> informat = COMMA10. -> SAS data value = 1000001

  22. Creating SAS Data Sets
     • Formatted input cont.
     • SAS date values
       - Stored as numeric data
       - Number of days between January 1, 1960 and the specified date
       - Informats are used to read and convert the dates

  23. Creating SAS Data Sets
     • Formatted input cont.
     • The columns read are determined by the starting point and the width of the informat
     • Example: INPUT Name $10. Age 3. Height 5.1 BirthDate MMDDYY10.;
       - Name – character of length 10, columns 1-10
       - Age – numeric of length 3, columns 11-13
       - Height – numeric of length 5 (including the decimal point) with one decimal place (120.9, for instance), columns 14-18
       - BirthDate – date in MMDDYY form (11-04-2009, for instance), columns 19-28
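A sketch of the formatted-input example above, run against two invented data rows laid out in exactly those columns. The FORMAT statement with DATE9. is an addition for illustration: without it, BirthDate prints as the raw day count since January 1, 1960.

```sas
DATA members;
  /* Each informat reads a fixed width, so the fields are back to back */
  INPUT Name $10. Age 3. Height 5.1 BirthDate MMDDYY10.;
  /* Display the stored day count as a readable date */
  FORMAT BirthDate DATE9.;
  DATALINES;
John Smith 34120.911-04-2009
Ann Lee    29 98.502-15-1980
;
RUN;

PROC PRINT DATA=members;
RUN;
```

The first row splits as "John Smith" (columns 1-10), " 34" (11-13), "120.9" (14-18), and "11-04-2009" (19-28).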

  24. Creating SAS Data Sets
     • Formatted input cont.
     • Pointer controls
       - +n moves the pointer n positions
       - @n moves the pointer to column n
     • Example: INPUT Flight 3. +4 Date mmddyy8. @20 Destination $3.;
       - Flight – numeric of length 3, columns 1 through 3
       - Date – date in mmddyy form (11/04/09) of length 8, columns 8 through 15
       - Destination – character of length 3, columns 20 through 22
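The pointer-control example can be tried with a couple of invented flight records. In this sketch the flight number is in columns 1-3, `+4` skips columns 4-7, the date fills columns 8-15, and `@20` jumps to the destination code. The FORMAT statement is an illustrative addition so the date prints as mm/dd/yy.

```sas
DATA flights;
  INPUT Flight 3. +4 Date MMDDYY8. @20 Destination $3.;
  /* Show the stored SAS date value in mm/dd/yy form */
  FORMAT Date MMDDYY8.;
  DATALINES;
439    11/04/09    LAX
921    12/01/09    DFW
;
RUN;

PROC PRINT DATA=flights;
RUN;
```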

  25. Creating SAS Data Sets
     • Mixed input styles
       - Mix and match the previous 3 input styles
     • Example
       - Raw data: Great Smoky Mountains NC/TN 1926 520,269
       - INPUT ParkName $ 1-22 State $ Year @40 Acreage COMMA9.;
       - ParkName – character of length 22, columns 1 through 22
       - State – character, separated by spaces
       - Year – numeric, separated by spaces
       - Acreage – numeric with informat COMMA9., starts at column 40
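A sketch of the mixed-style example using the single raw data record from the slide. The exact spacing is an assumption, since the transcript collapsed it: here the acreage is padded so that it starts in column 40, which the `@40` pointer control requires.

```sas
DATA parks;
  /* Column input, then list input twice, then formatted input at column 40 */
  INPUT ParkName $ 1-22 State $ Year @40 Acreage COMMA9.;
  DATALINES;
Great Smoky Mountains NC/TN 1926       520,269
;
RUN;

PROC PRINT DATA=parks;
RUN;
```

COMMA9. strips the embedded comma, so Acreage is stored as the number 520269.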

  26. Creating SAS Data Sets
     • Manually entering raw data in a SAS program
       1. Start the DATA step and name the SAS data set to be created
          DATA library.SAS-data-set;
       2. Provide instructions on how to read the raw data
          INPUT input-specifications;
       3. Manually enter the raw data
          DATALINES;
          <Raw Data>

  27. Creating SAS Data Sets
     • Manually entering raw data in a SAS program
     • Example:
          DATA uspresidents;
          INPUT President $ Party $ Number;
          DATALINES;
          Adams F 2
          Lincoln R 16
          Grant R 18
          Kennedy D 35
          ;
          RUN;

  28. Creating SAS Data Sets
     • Using the Import Data menu option
       1. File -> Import Data
       2. Standard data source -> select the file format
       3. Specify the file location or Browse to select the file
       4. Create a name for the new SAS data set and specify its location

  29. Creating SAS Data Sets
     • Compatible file formats
       - Microsoft Excel spreadsheets
       - Microsoft Access databases
       - Comma-separated files (.csv)
       - Tab-delimited files (.txt)
       - dBASE files (.dbf)
       - JMP data sets
       - SPSS files
       - Lotus spreadsheets
       - Stata files
       - Paradox files

  30. Creating SAS Data Sets
     • Enter raw data directly into a SAS data set
       1. Tools -> Table Editor
       2. Enter data manually into the table
          - Observations in each row
          - Variables in each column
       3. Left-click Column -> Column Attributes
          - Variable name, variable label, type (character/numeric), format, informat
          - Note: informats determine how raw data is read; formats determine how a variable is displayed
       4. Close window -> Save Changes: Yes -> specify file name and directory

  31. Introduction to the SAS Environment • Questions/Comments

  32. Working With SAS Data Sets
     1. Data Set Manipulation
     2. Data Set Processing
     3. Combining Data Sets
        A. Concatenating/Appending
        B. Merging

  33. Data Set Manipulation
     • Create a new SAS data set using an existing SAS data set as input
       - Specify the name of the new SAS data set in the DATA statement
       - Use a SET statement to identify the SAS data set being read
     • Syntax:
          DATA output_data_set;
          SET input_data_set;
          <additional SAS statements>;
          RUN;
     • By default, the SET statement reads all observations and variables from the input data set into the output data set

  34. Data Set Manipulation
     • Assignment statements
       - Evaluate an expression
       - Assign the resulting value to a variable
       - General form: variable = expression;
       - Example: miles_per_hour = distance/time;
     • SAS functions
       - Perform arithmetic functions, compute simple statistics, manipulate dates, etc.
       - General form: variable = function_name(argument1, argument2, …);
       - Example: Time_worked = sum(Day1, Day2, Day3, Day4, Day5);
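A short sketch of both forms on made-up timesheet data (the data set names and values are hypothetical). It also shows one practical difference that is easy to miss: the SUM function skips missing values, while the `+` operator returns a missing result if any operand is missing.

```sas
DATA timesheet;
  INPUT Employee $ Day1 Day2 Day3 Day4 Day5 Distance Time;
  DATALINES;
Ann 8 8 . 8 8 120 2
Bob 7 9 8 8 6 90 1.5
;
RUN;

DATA timesheet2;
  SET timesheet;
  /* Assignment statement with an arithmetic expression */
  miles_per_hour = Distance / Time;
  /* SUM ignores the missing Day3 for Ann and returns 32 */
  Time_worked = SUM(Day1, Day2, Day3, Day4, Day5);
  /* The + operator returns missing for Ann because Day3 is missing */
  Time_added = Day1 + Day2 + Day3 + Day4 + Day5;
RUN;

PROC PRINT DATA=timesheet2;
RUN;
```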

  35. Data Set Manipulation
     • Selecting variables
       - Use DROP and KEEP to determine which variables are written to the new SAS data set
     • Two ways
       - DROP and KEEP as statements (no equals sign)
         Form: DROP Variable1 Variable2;
               KEEP Variable3 Variable4 Variable5;
       - DROP= and KEEP= as data set options in the SET statement
         Form: SET input_data_set (KEEP=Var1);
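A sketch contrasting the two forms on invented data (the `grades` data set and its variables are hypothetical). With the DROP statement, the dropped variables are still available for calculations inside the step; with the KEEP= option on SET, the other variables are never read in.

```sas
DATA grades;
  INPUT Name $ Test1 Test2 Test3;
  DATALINES;
Ann 90 85 88
Bob 70 75 80
;
RUN;

/* DROP as a statement: Test1-Test3 are used, then left out of the output */
DATA averages;
  SET grades;
  Average = MEAN(Test1, Test2, Test3);
  DROP Test1 Test2 Test3;
RUN;

/* KEEP= as a data set option: only Name and Test1 are read from grades */
DATA first_test;
  SET grades (KEEP=Name Test1);
RUN;
```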

  36. Data Set Manipulation
     • Conditional processing
       - Uses IF-THEN-ELSE logic
     • General form:
          IF <expression1> THEN <statement>;
          ELSE IF <expression2> THEN <statement>;
          ELSE <statement>;
     • <expression> is a true/false statement, such as:
       - Day1=Day2, Day1 > Day2, Day1 < Day2
       - Day1+Day2=10
       - Sum(day1,day2)=10
       - Day1=5 and Day2=5

  37. Data Set Manipulation
     • Conditional processing

  38. Data Set Manipulation
     • Conditional processing cont.
       - If <expression1> is true, its <statement> is processed
       - ELSE IF and ELSE are only processed if <expression1> is false
       - Only one statement can follow THEN or ELSE in this form
       - Use DO and END statements to execute a group of statements
     • General form:
          IF <expression> THEN DO;
          <statements>;
          END;
          ELSE DO;
          <statements>;
          END;
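A sketch combining the single-statement form and the DO/END form, on invented data. The LENGTH statement is an addition not covered on the slides: without it, Result would take its length from the first value assigned ('Equal', 5 characters) and longer values would be truncated.

```sas
DATA days;
  INPUT Day1 Day2;
  DATALINES;
5 5
7 3
2 9
;
RUN;

DATA compared;
  SET days;
  /* Reserve enough room for the longest value, 'Smaller' */
  LENGTH Result $ 7;
  /* Single statement after THEN */
  IF Day1 = Day2 THEN Result = 'Equal';
  /* DO/END groups several statements under one condition */
  ELSE IF Day1 > Day2 THEN DO;
    Result = 'Larger';
    Difference = Day1 - Day2;
  END;
  ELSE DO;
    Result = 'Smaller';
    Difference = Day2 - Day1;
  END;
RUN;

PROC PRINT DATA=compared;
RUN;
```

For the row where the two values are equal, Difference is never assigned and stays missing.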

  39. Data Set Manipulation
     • Subsetting rows (observations): we will look at two ways
       - Using an IF statement
       - Using the WHERE option in the SET statement
     • IF statement
       - Writes to the new data set only those observations for which the expression is true
       - General form: IF <expression>;
       - Examples: IF career = 'Teacher';
                   IF sex ne 'M';
       - In the second example, only observations where sex is not equal to 'M' are written to the output data set

  40. Data Set Manipulation
     • Subsetting rows (observations) cont.
     • WHERE option in the SET statement
       - Reads from the input data set only those rows for which the expression is true
       - General form: SET input_data_set (where=(<expression>));
       - Example: SET vacation (where=(destination='Bermuda'));
       - Only observations where the destination equals 'Bermuda' are read from the input data set
     • Comparison
       - The resulting output data sets are equivalent
       - IF statement: all rows are read from the input data set
       - WHERE option: only rows where the expression is true are read from the input data set
       - The difference shows up as processing time when working with big data sets
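A sketch of the two subsetting approaches side by side, using an invented `vacation` data set. Both steps produce the same two Bermuda observations; they differ only in whether the non-matching row is read into the step before being discarded.

```sas
DATA vacation;
  INPUT Traveler $ Destination $;
  DATALINES;
Ann Bermuda
Bob Paris
Cara Bermuda
;
RUN;

/* Subsetting IF: every row is read, only matching rows are written */
DATA bermuda_if;
  SET vacation;
  IF Destination = 'Bermuda';
RUN;

/* WHERE= option: only matching rows are read in the first place */
DATA bermuda_where;
  SET vacation (WHERE=(Destination='Bermuda'));
RUN;
```

The log note for each step reports how many observations were read, which makes the difference visible.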

  41. Data Set Manipulation
     • PROC SORT sorts data according to specified variables
     • General form:
          PROC SORT DATA=input_data_set <options>;
          BY Variable1 Variable2;
          RUN;
       - Sorts data according to Variable1 and then Variable2
     • By default, SAS sorts data in ascending order
       - Numbers low to high
       - Letters A to Z
     • Use the DESCENDING keyword before a variable in the BY statement for numbers high to low and letters Z to A
       - Example: BY City DESCENDING Population;
       - SAS sorts the data first by City A to Z and then by Population high to low

  42. Data Set Manipulation
     • Some PROC SORT options
     • NODUPKEY
       - Eliminates observations that have the same values for the BY variables
     • OUT=output_data_set
       - By default, PROC SORT replaces the input data set with the sorted data set
       - With this option, PROC SORT creates a new sorted data set and the input data set remains unchanged
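A sketch of PROC SORT with DESCENDING, OUT=, and NODUPKEY. The city names and population figures are made up purely for illustration, and one row is deliberately duplicated so that NODUPKEY has something to remove.

```sas
DATA cities;
  INPUT City $ Population;
  DATALINES;
Bray 100
Avon 250
Bray 400
Avon 250
;
RUN;

/* City A to Z, then Population high to low; the input data set is left unchanged */
PROC SORT DATA=cities OUT=cities_sorted;
  BY City DESCENDING Population;
RUN;

/* Keep one observation per City/Population combination (3 rows remain) */
PROC SORT DATA=cities OUT=cities_unique NODUPKEY;
  BY City Population;
RUN;

PROC PRINT DATA=cities_sorted;
RUN;
```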

  43. Data Set Processing
     • DATA steps read in data from existing data sets or raw data files one row at a time, like a loop
     • The DATA step reads data from the input data set in the following way:
       1. Read the current row from the input data set into the Program Data Vector (PDV)
       2. Process the SAS statements
       3. Write the PDV to the output data set
       4. Set the current row to the next row in the input data set
       5. Return to Step 1
     • Only one row is processed at a time
       - Thus we cannot simply add the value of a variable in one row to the value in another row

  44. Data Set Processing
     • Example: let the following be the input data set dfwlax:

  45. Data Set Processing
     • Example: consider the following submitted code:
          DATA onboard;
          SET dfwlax;
          Total=FirstClass+Economy;
          IF FirstClass=20 THEN FirstClassFull=1;
          ELSE FirstClassFull=0;
          RUN;

  46. Data Set Processing
     • Example: execution of the DATA step
          DATA onboard;
          Current -> SET dfwlax;
          Total=FirstClass+Economy;
          IF FirstClass=20 THEN FirstClassFull=1;
          ELSE FirstClassFull=0;
          RUN;
     • PDV and Onboard tables

  47. Data Set Processing
     • Example: execution of the DATA step
          DATA onboard;
          SET dfwlax;
          Current -> Total=FirstClass+Economy;
          IF FirstClass=20 THEN FirstClassFull=1;
          ELSE FirstClassFull=0;
          RUN;
     • PDV and Onboard tables

  48. Data Set Processing
     • Example: execution of the DATA step
          DATA onboard;
          SET dfwlax;
          Total=FirstClass+Economy;
          Current -> IF FirstClass=20 THEN FirstClassFull=1;
          ELSE FirstClassFull=0;
          RUN;
     • PDV and Onboard tables

  49. Data Set Processing
     • Example: execution of the DATA step
          DATA onboard;
          SET dfwlax;
          Total=FirstClass+Economy;
          IF FirstClass=20 THEN FirstClassFull=1;
          ELSE FirstClassFull=0;
          Current -> RUN;
     • PDV and Onboard tables

  50. Data Set Processing
     • Example: execution of the DATA step
          Current -> DATA onboard;
          SET dfwlax;
          Total=FirstClass+Economy;
          IF FirstClass=20 THEN FirstClassFull=1;
          ELSE FirstClassFull=0;
          RUN;
     • PDV and Onboard tables
