1 / 77

- 101 for CMS Employees

Learn about SAS, an integration of software products that enables data manipulation, statistical analysis, forecasting, report writing, web reporting, and more in the Windows environment.

jasonperez
Télécharger la présentation

- 101 for CMS Employees

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. - 101for CMS Employees

  2. What is SAS? • An integration of software products that enables you to perform: • data manipulation • statistical and mathematical analysis • planning, forecasting, and decision support • report writing and graphics • quality improvement • applications development • web reporting • data entry, retrieval, and management • data warehousing and data mining

  3. The Windows Environment Log Window ExplorerWindow Program Editor Output Window

  4. Opening and Saving SAS Programs • The SAS for Windows environment works much like any other Microsoft Windows application. • Click on “File” then “Open” from the program editor to open a SAS program. (Or click on the open folder icon.) • Click on “File” then “Save” from the program editor to save a SAS program. (Or click on the diskette icon.)

  5. SAS Commands • The following can be executed from the command line, a command box, or by clicking on a Windows button or pull-down menu. • SUBMIT: Click on the running man icon (or type “submit”) to execute SAS code from the Program Editor. • RECALL: Type “recall” from the program editor to bring back the code that was most recently executed. • CLEAR: Type “clear” to clear any window. • ZOOM: Click on the full screen icon or type “zoom” to make any window the full screen. (Other windows become icons across the bottom of the screen.) Command Box Running Man Icon Command Line

  6. Exercise: Open and Run a SAS Program • Open the program D:\BasicSAS\ex_01_Football.sas in the program editor. • Submit the code. • Check out the PROC PRINT report in the output window. • Check out the log window.

  7. Base SAS • SAS code primarily consists of: • DATA steps - provide a mechanism for manipulating data one record at a time and • PROCedures - processes that have been created to perform a specific task like sorting or summarizing data. • SAS DATA steps and procedures are made up of SAS statements, which are used to inform the system of a desired task.

  8. SAS Statements • SAS statements are used to inform the system of a desired task. All SAS statements end with a semicolon. • There are three SAS statements used in the procedure below: • PROC signals that a procedure is about to be executed. • BY identifies the variable that will be used to sort. • RUN tells the SAS system to execute the procedure. PROC SORT DATA = Hospital_Data_Set; BYHosp_name; RUN;

  9. SAS Statements (continued) • There are a over one hundred SAS statements listed at the SAS On-LineDoc. • New users only need concern themselves with about twenty-five. • The following statements are used within a DATA STEP: DATA - The beginning of a DATA STEP SET - Identifies an existing SAS data set MERGE - Identifies data sets to be joined together BY - Variables for by-group processing and merging ARRAY - An array of variables (var1, var2, var3) DO/END - A group of statements executed as a unit IF-THEN/ELSE - Executes statements that meet specific conditions WHERE - Selects observations that meet a particular condition OUTPUT - Writes the current observation to a SAS data set RUN - Executes the previously entered SAS statements

  10. SAS Statements (continued) • This next set of SAS statements are also used within a DATA step. FILE - Specifies the current output file for PUT statements INFILE - Identifies an external file to read with an INPUT statement PUT - Writes lines to the SAS log, to a SAS procedure output file, or to an external file specified with the FILE statement INPUT - Describes variables listed in the INFILE statement KEEP - Restricts the number of variables kept in SAS data sets LABEL - Assigns descriptive labels to variables FORMAT - Associates formats with variables (e.g. mmddyy) LENGTH - Specifies the number of bytes for storing variables RENAME - Specifies new names for variables in output SAS data sets RETAIN - Causes a variable that is created by an INPUT or assignment statement to retain its value from one iteration of the DATA step to the next

  11. SAS Statements (continued) • This next set of SAS statements are used outside of the SAS DATA Step. LIBNAME - Associates a SAS libref (a shortcut name for a directory) with data library (or directory). FILENAME - Associates a SAS fileref with an external file. OPTIONS - Changes the value of one or more SAS system options. TITLE - Prints title lines at the top of SAS output. (often used within PROCs) FOOTNOTE - Prints footnotes at the bottom of SAS output. (often used within PROCs)

  12. LIBNAME Statement • The LIBNAME statement is used to associate a directory to a library reference (libref), which is used in DATA STEPs and PROCedures. LIBNAMEhosp ‘N:\CMSSUG\SAS101’ ; DATA hosp.out_hos ; * This is a two-level data set name; SET hosp.in_hos; TotalCosts = SUM(Routine, Ancillary); RUN ; • The above code creates a SAS data set, out_hos, and stores it in the N:\CMSSUG\SAS101 directory. • Using the libref, hosp, in the SET statement tells SAS where to get the input data set, in_hos, which already exists on the N:\CMSSUG\SAS101 directory.

  13. Permanent vs. Temporary Data Sets • To work with data in SAS, you must convert your raw data into a SAS data set (or file). • Data Sets can be: • permanent • Two-level name - libref.SASdataset (e.g. mydir.BigHosps) • Stored in permanent directory - remains after the SAS session • temporary • One-level name – SASdataset or work.SASdataset (e.g. work.Bighosps) • Stored in temporary WORK directory - deleted when you exit the SAS session /* Permanent */ /* Temporary */ DATA mydir.BigHosps; SET temphosp; IF Beds > 500; RUN; DATA BigHosps; SET temphosp; IF Beds > 500; RUN; DATA work.BigHosps; SET temphosp; IF Beds > 500; RUN; =

  14. LIBNAME Statement (Exercise) • Execute the following LIBNAME statement: • In your program editor, type: LIBNAMEmydir ‘D:\Basic SAS'; • Then click on the running man (or type SUBMIT in the command box). • This code will assign the alias, mydir, to the directory D:\. We can use mydir in our SAS code. LIBNAME mydir ‘D:\’;

  15. SAS Data Sets • To work with data in SAS, you must convert your raw data into a SAS data set (or file). There are multiple ways to create a SAS data set from a raw data file: • Manually type data into a DATA step • Use a PROC IMPORT • Use a DATA step with an INPUT statement • Use the Import Wizard

  16. Turning Raw Data into Information SAS DATA SETS RAW DATA DATA STEP PROCs Information

  17. Creating SAS Data Sets • Manually typing data into a DATA step to create a SAS data set called, mydata1, with three records and three variables: DATA mydata1; LENGTH Team $ 8; Team = ‘Ravens’; Wins = 16; Losses=0; OUTPUT; Team = ‘Colts’; Wins = 0; Losses=16; OUTPUT; Team = ‘Steelers’; Wins = 8; Losses=8; OUTPUT; RUN ; PROC PRINT DATA=mydata1; RUN;

  18. Creating SAS Data Sets • Use mydata1 as the input data set to create another data set, mydata2, with an additional variable, Status: DATA mydata2; SET mydata1; IF Wins > Losses THEN Status = ‘Winning’; ELSE IF Wins < Losses THEN Status = ‘Losing’; ELSE Status = ‘Even’; RUN ; PROC PRINT DATA=mydata2; RUN;

  19. Turning Raw Data into Information SAS DATA SETS RAW DATA DATA STEP PROCs = PROC IMPORT Information

  20. Importing Data from Excel to SAS • Use Import Wizard • Click on: • File - Import Data • Check Standard data source (Microsoft Excel 97 or 2000) - Next • Browse (find your Excel file) – Open – Next • Enter a directory (default is work) and member name – Next • (Optional) Enter the filename to save PROC IMPORT code - Next • Finish • SAS pulls the variable names from the first row of the Excel file (This option can be turned off using an Import Wizard option.) • After running, check your log to see that the data set has been successfully created.

  21. Importing Data from Excel to SAS (Exercise) • Use the Import Wizard to import the following Excel file into SAS: • Excel File: D:\Basic SAS\MCR.xls • Libref (Directory): mydir (D:\Basic SAS) • SAS Data set: MCRdata • The Excel file has the following attributes: • The data source is the Medicare Cost Reports for Hospitals. • There is one record per hospital. • The variables are provider number, hospital name, urban/rural code, control type, FY begin date, FY end date, beds, Medicare payments, Medicare costs, Total revenue, and Total Costs. • Save the source code to D:\Basic SAS\ProcImport.sas

  22. Importing Data from Excel to SAS (continued) • This is the code generated by the Import Wizard (and stored in D:\Basic SAS\ProcImport.sas): PROC IMPORT OUT= mydir.MCRdata DATAFILE= “D:\Basic SAS\MCR.xls" DBMS=EXCEL2000 REPLACE; GETNAMES=YES; RUN; • You can run this code directly from the program editor.

  23. Importing Data from Excel to SAS (Exercise) • KNOW THY DATA!! • SAS determines the format of the data in a column using the first 8 rows of data. If a column has both character and numeric data missing values can occur. • To prevent this, use the “Data - Text to Columns” option in Excel. • Open the Excel spreadsheet • Highlight the entire (character / numeric) column • Click on the Data pull-down menu and the Text to columns option • Click Next twice • In Step 3 of 3 click on the Text data format • Click Finish • Import the spreadsheet N:\CMSSUG\SAS101\ex_02_import.xls using the SAS Import Wizard and look at the results • Then open the spreadsheet and following the Text to Columns procedure above, re-import the data

  24. SAS INPUT Statement • Describes the record layout of the input data and assigns values to corresponding SAS variables • Input Styles • Column Input • List Input • Formatted Input • Pointer Controls • Column Pointer Controls • Line-Hold Specifiers

  25. Column Input • The column numbers follow the variable name in the INPUT statement. These numbers indicate where the variable values are found in the input data records: INPUT name $ 1-8 age 11-12; • This INPUT statement can read the following data records: ----+----1----+----2----+ Peterson 21 Morgan 17 • Because NAME is a character variable, a $ appears between the variable name and column numbers.

  26. List Input • The variable names are simply listed in the INPUT statement. A $ follows the name of each character variable: INPUT name $ age; • This INPUT statement can read data values that are separated by blanks or aligned in columns (with at least one blank between): ----+----1----+----2----+ Peterson 21 Morgan 17

  27. Formatted Input • An INFORMAT follows the variable name in the INPUT statement. The INFORMAT gives the data type and the field width of an input value. INPUT name $char8. +2 age 2.; • This INPUT statement reads these data records correctly: ----+----1----+----2----+ Peterson 21 Morgan 17 • The pointer control of +2 moves the input pointer to the field that contains the value for the variable AGE.

  28. Pointer Controls • Column Pointer Controls indicate the column in which an input value starts. INPUT @1 name $char8. @11 age 2.; • This INPUT statement can read the following data records: ----+----1----+----2----+ Peterson 21 Morgan 17 • After reading a value for NAME, the pointer stops in column 9.

  29. Line-Hold Specifiers • Line-hold specifiers keep the pointer on the current input record when • a data record is read by more than one INPUT statement (trailing @) • Use a single trailing @ to allow the next INPUT statement to read from the same record. • Normally, each INPUT statement in a DATA step reads a new data record into the input buffer. • When you use a trailing @, the following occurs: • The pointer position does not change. • No new record is read into the input buffer.

  30. Line-Hold Specifiers Example • Assume this input data: ----+----1----+----2----+ 01 Peterson 21 02 Morgan 17 01 Johnson 69 • The following will read the DEPT variable first and check the value before reading the remainder input @1 dept $2. @; if dept=’01’ then input @4 name $8. @14 age 2.;

  31. SAS INFILE Statement • Identifies an external file to read with an INPUT statement • Useful INFILE Options • FIRSTOBS - first record to be read • OBS - last record to be read • DELIMITER - delimiter separating values • DSD - treats two consecutive delimiters as a missing value • EOF - identifies the end of the file

  32. Reading Past the End of a Line • When using @ or + pointer controls with a value that moves the pointer past the end of the current record, SAS goes to column 1 of the next record to read it. It also writes this message to the SAS log: NOTE: SAS went to a new line when INPUT statement reached past the end of a line. • Use MISSOVER in the INFILE statement to set the remaining INPUT statement variables to missing values if the pointer reaches the end of a record. INFILE ‘D:\my_file.txt’ MISSOVER; • Other options exist depending on the situation, though MISSOVER should almost always be used.

  33. INPUT Exercise #1 • Import the data located in the file: • N:\CMSSUG\SAS101\ex_01_input.txt • The record layout for this file is:VariableBegEndLengthFormat DRG 1 3 3 CHAR DIAG 5 9 5 CHAR CHRG_AMT 11 15 5 NUM ALWD_AMT 17 21 5 NUM

  34. INPUT Exercise #2 • Click on File - Import Data • Select: Comma Separated Values (*.csv) - Click Next • Click Browse • Open: N:\CMSSUG\SAS101\example.csv • Name the Member: csv_example • Click Finish • Type FSV in the Command Box and press Enter • Notice the DRG and DIAG fields were imported as numbers in lieu of characters • i.e. The leading zeros in the DRG are missing!

  35. INPUT Exercise #2 (cont.) • Close the FSVIEW window • Click on View – Enhanced Editor • Type RECALL in the Command Box and press Enter • Change the DRG FORMAT and INFORMAT to $3. • Change the DIAG FORMAT and INFORMAT to $5. • SUBMIT the program • Check the LOG for errors • Type FSV in the Command Box and press Enter • Note the difference!

  36. SAS Data Set Structure • The structure of a SAS data set includes: • Descriptor portion • Includes data set name, number of observations, number of variables, and characteristics of the variables • To view, use a PROC CONTENTS. • Data portion (a two-dimensional table) • Each row contains one observation (or record) and each column contains one variable (or field). • To view, use a PROC PRINT.

  37. PROC CONTENTS • The PROC CONTENTS prints the description of a SAS data set. PROC CONTENTS DATA= mydir.MCRdata; RUN; • To print only the variable names of a SAS data set: PROC CONTENTS DATA= mydir.MCRdataSHORT; RUN; • To print descriptor portion for all data sets in a library: PROC CONTENTS DATA=mydir._ALL_; RUN;

  38. PROC PRINT • The PROC PRINT prints the observations in a SAS data set using all or some of the variables. PROC PRINToptions; VARvariables; BYvariables; PAGEBYvariables; SUMvariables; SUMBY variables; RUN; • Common options include:DATA= (to specify the data set), (OBS=) (to specify the number of records to print) NOOBS (to suppress printing of the observation number), and SPLIT= (to split column headings).

  39. PROC PRINT • To print all observations and all variables: PROC PRINT DATA= mydir.MCRdata; RUN; • To print the first 10 observations: PROC PRINT DATA= mydir.MCRdata(OBS=10); RUN; • To print and sum the selected variables for all observations: PROC PRINT DATA= mydir.MCRdataNOOBS; VAR Prvdr_num HospName MedPay1; SUM MedPay1; RUN;

  40. SAS Variables • Variables can be character or numeric: • Character variables • Contain any value including letters, numbers, special characters, and blanks • Stored with a length of 1 – 32,767 bytes (1byte =1 character) • Numeric variables • Stored as floating point numbers in 8 bytes of storage by default. • 8 bytes provides space for 16 or 17 digits. • To create a new variable, use an assignment statement: • VariableName=expression;

  41. Naming SAS Data Sets and Variables • Rules for naming SAS data sets and variables: • Must be 1 to 32 characters in length • Must start with a letter (A-Z) or an underscore (_) • Can continue with any combination of numbers, letters, and underscores Valid Names Total Revenue TotRev96 tot_rev_1996 _totrev_ Invalid Names Total Revenue 1996TotRev tot#rev

  42. Functions • Used in an assignment statement to return values determined from a specified argument. • Examples: • TotalRevenue = SUM(Rev1, Rev2, Rev3); • TotalRevenue = SUM(OF Rev1-Rev3); • LastInitial = SUBSTR(LastName,1,1); • sas_date = INPUT(char_date, MMDDYY10.); Produce same result Can use OF and dash if variables have a common prefix.

  43. Dropping and Keeping Variables Method One • Use a DROP or KEEP statement in a DATA step to control which variables are written to a new SAS data set. Syntax: DROP variables; KEEP variables; Method Two • Use a DROP= or KEEP= data set option to control which variables are read from an existing data set or written to a new dataset. Syntax: DATA mydataset(DROP=variables); DATA mydataset(KEEP=variables);

  44. Dropping and Keeping Variables • In the data step below, we use a KEEP= data step option to control which variables are read from data set A. • We use the DROP statement to determine which variables are written to data set B . DATA B; SET A (KEEP=hospbeds snfbeds rehabbeds discharges days); Beds=SUM( hospbeds, snfbeds, rehabbeds); DROP hospbeds snfbeds rehabbeds; RUN; • We input 5 variables from A– hospbeds snfbeds, rehabbeds, discharges, days • We output 3 variables to B - discharges, days, beds

  45. Creating Subsets of Observations • To subset an existing data set, use one of the following: • Subsetting IF statement IF Payments > 0; IF State = ‘New York’; • WHERE clause WHERE Payments > 0; WHERE State = ‘New York’; • Proper Usage: PROC DATA Step

  46. Operators • Comparison, logical, and special operators are used with IF and WHERE statements. OPERATOR DEFINITION EQ (=) equal to NE (^=) not equal to GT (>) greater than LT (<) less than GE (>=) greater than or equal to LE (<=) less than or equal to IN equal to one of a list AND (&) if both are true, compound is true OR (|) if either is true, compound is true NOT (^) reverses logic of comparison BETWEEN – AND falls in a range of values, inclusively CONTAINS (?) includes the specified substring Comparison Logical Special

  47. DATA Step Exercise • Create a data set mydir.MCRdata2 using mydir.MCRdata2 as the input dataset. • Create a variable StateNum which is the first two digits of the Prvdr_num, using the SUBSTR function. • Use an IF-THEN-ELSE statement to create a variable, State, which is defined as follows (Include a LENGTH statement.) : • Create three variables: 1. MedPayments, which is the sum of MedPay1, MedPay2, and MedPay3 2. MedCosts, which is the sum of MedCost1 and MedCost2 3. ProfitMargin = 100*((SUM(MedPayments, – MedCosts))/MedPayments) • Use a subsetting IF to drop hospitals with less than 100 beds. • Use the DROP statement to drop StateNum, MedPay1 – MedPay 4, MedCost1, and MedCost2. • Print Prvdr_num, HospName, State, and ProfitMargin to the output window. Then, use the WHERE statement to print those variables for PA and VA hospitals only. (Use the IN operator.)

  48. Formatting Variables • SAS formats change the appearance of printed values in a report. • The actual stored values in the data set do not change. Syntax: FORMATvariable(s) format; PROC PRINT DATA= mydataset ; FORMAT InpMargin 6.1 ; RUN;

  49. Formatting Variables Displayed Value 12345.67 12,345.67 $12,345.67 SOPHIA Stored Data 12345.6689 12345.6689 12345.6689 SOPHIA Description Standard Numeric Format Width = 8 Decimal Places = 2 Numbers with commas Width = 9 Decimal Places = 2 Numbers with commas and $ Width = 10 Decimal Places = 2 Standard Character Format Width = 6 Format w.d 8.2 commaw.d Comma9.2 dollarw.d dollar10.2 $w. $6.

  50. SAS Dates • Stored as numeric. • Stored as the number of days between January 1, 1960 and the specified date. • (e.g. 1/1/1960 would be stored as 0; 12/31/1959 = -1; 1/2/1960 = 1 Stored Value 17323 17323 17323 17323 17323 17323 17323 17323 Format MMDDYY6. MMDDYY8. MMDDYY10. DDMMYY10. DATE7. DATE9. WORDDATE. WEEKDATE. Displayed Date (for June 6, 2007) 060607 06/06/07 06/06/2007 06/06/2007 06JUN07 06JUN2007 June 6, 2007 Wednesday, June 6, 2007 • To convert a character variable to a date variable, use: • sas_date = INPUT(char_date, informat);

More Related