Chapter 1: Getting Started

Chapter 1: Getting Started • Overview of basic components • Data Sets • Data Steps • Windowing (DM) environment • Submitting programs • Reviewing Output • System options

The SAS Language • Actually, SAS contains several languages. • SAS statements vs. SAS commands • All statements end with “;” . • Free format language. Can have • multiple statements per line • multiple lines for a single statement. • Neither is a good idea most of the time.
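
A minimal sketch of what "free format" allows. The data set names (demo1, demo2, demo3) and variables are made up for illustration only.

```sas
* Legal but hard to read: several statements on one line;
data demo1; x = 1; y = x + 1; run;

* Also legal: one statement spread over several lines;
data demo2;
  z = 1
      + 2
      + 3;
run;

* Usually preferable: one statement per line, indented within the step;
data demo3;
  x = 1;
  y = x + 1;
run;
```

All three steps are valid SAS; only the last layout is easy to scan when debugging.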

SAS Names • Used to be limited to 8 characters. With v7 the limit went to 32. • First character a letter or underscore (_). • Subsequent characters in name can be letters, digits or underscores. • Case is significant only for cosmetic/display purposes. SAS stores names in mixed case but will match totpop and TotPop.
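
A small sketch of the naming rules, assuming nothing beyond base SAS. The variable names are invented for the example.

```sas
data names;
  TotPop = 100;          /* name is stored with this capitalization   */
  totpop = totpop + 1;   /* matches TotPop: same variable, not a new one */
  _count1 = 1;           /* a leading underscore and digits are legal */
  put TotPop= _count1=;  /* writes both values to the log             */
run;
```

The data set ends up with two variables, TotPop and _count1, because totpop and TotPop refer to the same name.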

Exception: librefs & filerefs • Names associated with SAS data libraries and ASCII files are still limited to 8 characters. (Because of platform limits on MVS, CMS, others?) • Also applies to names of SAS formats created with PROC FORMAT.

SAS Comments • Two kinds: • “Statement style” comments begin with “*” and end with “;” • The other kind begins with “/*” and ends with “*/” • If you use statement-style for your real comments then you can use the other kind to “comment out” sections of code.

Ex. of “Commented Out” Code
/* ===========Begin commented out code=========
*---Step 1: Read the data--;
data one; infile 'name_of_the_file';
input a b c d e f g;
if a=1 then a=0; else if b=2 then b=3; *--edit vals;
run;
*--Step 2: Sort and Print the Data--;
proc sort data=one; by d e g; run;
proc print data=one; by d; title 'Data Set One'; run;
================end comment================= */
*---Step 3: Begin statistical analysis of the data--;
proc univariate data=one ;
----etc-----

SAS Data Sets • This is where SAS stores the data. • Statistical vs. database terminology: • Observations = Rows • Variables = Columns • Data Sets = Tables • The observations describe entities, the variables are attributes of those entities. • In our environment the rows are usually geographic areas and the variables are summary statistics regarding those areas.

Variable Attributes • Type (character or numeric) • Length (3-8 bytes for nums, 1-32,767 for character strings). • Label: Up to 256 characters. • Format: Used by default when the variable is displayed. E.g. comma9. or $mocnty. • Informat: Used to convert raw input values (read from a file or typed in interactively) into stored values.
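
A hedged sketch of setting these attributes in a data step. The data set name attrs, the variable names and the data values are all made up for illustration.

```sas
data attrs;
  length county $ 20;                  /* character variable, 20 bytes */
  input county $ pop;                  /* pop is numeric, default length 8 */
  label county = 'County name'
        pop    = 'Total population';
  format pop comma9.;                  /* default display format */
datalines;
Adair 1234
Boone 56789
;
run;

* List the type, length, format and label of each variable;
proc contents data=attrs;
run;
```

The proc contents listing is a convenient way to check that the attributes were stored as intended.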

Date Variables • No such thing as an explicit date var type. • Dates are stored as numeric values: the # of days since Jan. 1, 1960. • Informats and formats are used to read and display date variables. E.g. read it with the mmddyy6. informat and display it with the date9. format.

Sample Pgm: Dates
data dates;
input dateval mmddyy6. sales;
format dateval date9.;
datalines;
020198 1234
122501 5678
80199 725
091101 1,023
run;
options ls=80;
proc print data=dates;
title 'Listing of dates';
run;

Sample Pgm: Dates - Output
Obs      dateval    sales
  1    01FEB1998     1234
  2    25DEC2001     5678
  3    01AUG1999      725
  4    11SEP2001        .
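
The day-count storage of dates can be seen directly with date literals. This is a minimal sketch that only writes to the log; the variable names are invented.

```sas
data _null_;
  day0 = '01JAN1960'd;     /* date literal for the reference date */
  day1 = '02JAN1960'd;
  put day0= day1=;         /* log shows day0=0 day1=1 */

  sep11 = '11SEP2001'd;
  put sep11=;              /* the raw day count, no format applied */
  put sep11= date9.;       /* log shows sep11=11SEP2001 */
  put sep11= mmddyy10.;    /* log shows sep11=09/11/2001 */
run;
```

The stored value never changes; only the format chosen for display does.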

Program Steps • Data steps and Proc(edure) steps. • Some stmts (e.g. title, options, %let) are not part of any specific step. (“global statements”). • Step boundaries: • Begin with data or proc statements. • End with run stmt or next step or EOF. • Highly recommended: always use run;

How Many Steps?
data dates2;
input date mmddyy6. sales;
informat sales comma.;
format date date9.;
datalines;
020198 1234
122501 5678
10299 725
091101 1,023
run;
options ls=80;
proc print data=dates2;
proc sort data=dates2; by date;

Data Step Cycles (“Built-In Loop”) • Most data steps have 1 and only 1 data source. Usually an infile/input pair or a set or merge statement represents the data source. • SAS executes the data step stmts once for each input line/observation. • The data step stmts are compiled first and, if there are no errors, then executed -- the whole set of stmts once per input line/observation. • The automatic variable _n_ counts the cycles through the implicit loop.
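
A small sketch of the implicit loop, with made-up data. Because _n_ is automatic and is not written to the output data set, it is copied to an ordinary variable (cycle, a name chosen for the example).

```sas
data cycles;
  input x;                        /* reads one data line per cycle */
  cycle = _n_;                    /* keep the cycle number on the data set */
  put 'Cycle ' _n_ ' read ' x=;   /* trace each pass in the log */
datalines;
10
20
30
;
run;

proc print data=cycles;
run;
```

There is no explicit loop statement anywhere in the step; the three observations come from SAS repeating the step once per data line.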

SAS Windowing Environment • AKA DM - “Display Manager” • You can run SAS without using it -- edit code with a text editor and use batch mode. • It takes some getting used to, but it’s worth it. • The Windows version is different than all the rest. Platform independence vs. MS software standards clashed and MS won.

The Enhanced Editor • Only mentioned in TLSB. It is here and it makes the PROGRAM window obsolete under Windows. (But still needed for Unix and all other platforms.) • It is a Windows editor. The text editor used in the Program window was modeled after the SPF editor developed by IBM in the 70s.

Major Differences • Code does not disappear and have to be recalled when you submit it. • Code is color-coded as you type to serve as a serious debugging aid. • Does not support many of the commands that the pgm window does. New users won’t care. • You can have bunches of them open at the same time.

Other Windows • Log: see what happened with submitted code. Error messages, notes, warnings, etc. • Output: “Printed” output goes here. Results of most SAS procs. • Explorer and Results. • Notepad: another text editor; for data usually. • Keys: Define function keys. Different ones for different window types. • Filename, Libname, Dir and Var are very handy.

Ways to Issue Commands • Not only are there lots of different windows with lots of different commands, but there are lots of ways to specify those commands. • Pull down menus. (The pmenu option can be used to turn on/off these menus.) • Toolbar icons associated with commands. • Entering command in the Command box. • Function keys! (Not mentioned in TLSB).

Accessing Windows • To bring a window to the foreground and make it the “active” window: • Click within it if it is visible. • Enter the name of the window as a command. • Use the Window pull-down and select it. • Use a function key associated with the window name. (E.g. if F10 = “Log”, just hit F10 to go to the log window.) • Enter the Next command to go to the “next” window. • Click on the window name tab in the bar at the bottom.

Submitting Code • Differs somewhat between pgm window and Enhanced Editor window. • If text is selected in the window then only selected text is submitted. Otherwise, the entire program is submitted. • In Program window you need to use Recall command to bring the submitted code back.

Viewing Results of Submit • The log window tells you what happened. Rather detailed. Error messages color coded. • If no errors and code executes, “printed” results go to the Output window and/or to an HTML file (output destinations can be specified). • Results window is a sort of index to the Output window.

Compile & Go Phases • Code must be compiled prior to executing. The execution phase will be skipped if there are errors at compile phase. • In batch runs, SAS will set “options obs=0” when it detects an error. In this mode, later steps will compile but not execute. • Once a step fails, it can cause lots of bogus error messages in subsequent steps.

SAS System Options • System opts control all sorts of things regarding how SAS runs. • Options can be specified in many ways at different times (at SAS startup, or during execution.) • Can be specified via: • config file with “-set ..” stmt • as a parameter at invocation • using options statement or Options window.

Common Options • Printing options: • linesize= ; pagesize= ; date/nodate; center/nocenter; number/nonumber • DMS, DMR (invocation options) • Obs= (limit # observations to process) • (no)source (show source code in log) • (no)mprint (show code generated by macros)
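
A short sketch of setting some of these with the options statement. The particular values are arbitrary choices for illustration.

```sas
* Printing options: 80-column lines, 60 lines per page, no date or page number;
options linesize=80 pagesize=60 nodate nonumber nocenter;

* Limit processing to the first 10 observations while testing;
options obs=10;

* Echo source code and macro-generated code in the log;
options source mprint;

* Restore the default so later steps see all observations;
options obs=max;

* Display the current setting of one option in the log;
proc options option=linesize;
run;
```

An options statement is global, so it stays in effect for every later step until it is changed again.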

Sample SAS Code • Follow the URL: mcdc2.missouri.edu/cgi-bin/uexplore?/pub/data/indctrs@secure • Click on the “Tools” subdirectory and then on the mocopop.sas file. • The direct URL for this file is: mcdc2.missouri.edu/data/indctrs/Tools/mocopop.sas

Browsing .sas, .log and .lst Files • The Windows Registry may associate the SAS program with these 3 filetypes. • With IE, this can cause an instance of SAS to start up when all you want to do is browse the contents of a .sas file. • You can do a manual remove of the registry entry. • Netscape does not recognize the association.

mocopop.sas • You are NOT yet expected to understand (completely) most of what’s in the program. • It has lots of steps, and accesses a set of 5 data sources -- 4 SAS data sets from the archive and 1 dbf file. • A common key, fipco, is kept on each data set. Such keys are critical. • Step 5 uses a merge stmt to bring all the data together into a single permanent SAS data set. Note the by fipco; statement.
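
A minimal sketch of a merge on a common key. This is not the code from mocopop.sas: only the key name fipco comes from the program described above, and the data set names, other variables and all values are made up.

```sas
data pop;
  input fipco $ pop;
datalines;
00001 1000
00003 2500
;
run;

data births;
  input fipco $ births;
datalines;
00001 12
00003 31
;
run;

* Each input must be sorted by the key before a merge with a by statement;
proc sort data=pop;    by fipco; run;
proc sort data=births; by fipco; run;

data combined;
  merge pop births;   /* one output row per fipco value */
  by fipco;           /* match rows on the common key   */
run;

proc print data=combined;
run;
```

Without the by statement SAS would pair rows by position rather than by key, which is why the common key matters so much.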

mocopop.sas - 2 • Note how all data definition statements -- libname and filename statements -- are grouped at the top of the program. Not required, but a good convention. • Note (extensive) use of only statement-style comments. In debugging this setup, we used /* - */ “commenting out” extensively.

mocopop.sas - 3 • Note the “classic” SAS data step for accessing the data archive: • data <set-name>; • set / merge <set(s)>; (often with data set options specified). • where statement to filter observations. • Assignment stmts to edit data or create derived variables. Sometimes as part of if … then. • Keep or drop stmts to specify variables to be included on output set.
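
The pattern above can be sketched as follows. The first step only builds a small stand-in for an archive data set; the names counties and growth, the variables other than fipco, and all values are hypothetical and do not come from mocopop.sas.

```sas
* Stand-in for an archive data set (made-up values);
data counties;
  input fipco $ pop1990 pop2000;
datalines;
00001 1000 1100
00003 2500 2400
00005 400 450
;
run;

data growth;                                     /* data <set-name>;          */
  set counties (keep=fipco pop1990 pop2000);     /* set with a data set option */
  where pop2000 >= 1000;                         /* filter observations       */
  change = pop2000 - pop1990;                    /* derived variable          */
  if change < 0 then declined = 1;               /* assignment inside if/then */
  else declined = 0;
  keep fipco pop2000 change declined;            /* variables on output set   */
run;

proc print data=growth;
run;
```

The where statement filters rows as they are read, so the third county never enters the step at all.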

mocopop.sas - 4 • Note ability to access dbf file via proc dbf. Could also have used proc import. • Note use of attrib statements in Step 5 to establish not only the attributes of the variables (labels, length/types and formats) but also the order of the variables on the output set. • Note that the obs identifier variables are of type character, but all indicator variables are numeric.

mocopop.sas - 5 • The creation of indctrs.mocopopg as a SAS data step view is way too advanced for us now. • For now, just know that there is a way to combine data sets logically rather than physically. indctrs.mocopopg looks like a data set to SAS, but is stored as code, not actual data.

mocopop.sas - 6 • The step to aggregate the data in mocopopg to DED regions is still further beyond what we have covered so far. • Involves use of an application macro named %agg. This macro is like an extension of the language for us. • Aggregation of our data is a critically important capability.

mocopop.sas - 7 • Use the uexplore utility application to browse the indctrs data directory. • Display hypercon reports for the mocopop and mocopopg data sets. • Extract data regarding the pop change over the decade of the 90’s with components of change. Create a listing report and a csv file (opened with Excel).

mocopop.sas - Summary • A typical “real world” SAS program. • In a way, quite complex; but with SAS it becomes just a little long. • Most of the processing is fairly routine once you have mastered a small subset of the SAS language. • Organizing such applications into carefully structured and commented modules makes it easy for us to document how we got our data.

The Data Archive • The source of most data you’ll be working with. Specialists create these sets and verify the data. • The uexplore/xtract/hypercon tools are - for now - critical in making these data accessible to the outside world. Wide use helps ensure reliability. • For you, access directly via SAS is much faster and more flexible. • The key indicators data base is just one -- very important -- component of the archive.
