Statistical Methods. Lynne Stokes, Department of Statistical Science. Lecture 7: Introduction to the SAS Programming Language
Preliminaries • Create a Folder: c:/Stat6337 • Send to the Desktop • Access Blackboard • Download the Eysenck Data File • Download the lecture7Eysenck.sas File • Download the lecture7class.sas File • Download the lecture7SASSummary.doc File
Open the SAS Program • Double-click the lecture7.sas File • Press the Run Icon (Runner Image) • Editor Window • Create and Modify SAS Command Files • Can Save in the Stat6337 Folder: File / Save As … • Log Window • Messages about the Compilation and Execution of the SAS Program • Contains Error Messages (in red), if any • Can Save in the Stat6337 Folder: File / Save As … • Output Window • Results of the Execution of the SAS Program • Can Save in the Stat6337 Folder: File / Save As … • To Erase the Contents of the Log or Output Window: Right-Click, Select “Clear All”
SAS Structure • DATA Step • Describe the data, provide names for variables, define new or transformed variables • PROCs: SAS Procedures • Descriptive Statistics: Proc Univariate, Proc Means • Graphics: Proc Chart, Proc Plot • Regression: Proc Reg • Two-sample t-tests: Proc Ttest • Analysis of Variance: Proc Anova, Proc GLM, Proc Mixed • Specialized Data Operations: Proc Sort • etc.
SAS Syntax • Every statement MUST end with a semicolon • Statements can continue over two or more lines • A missing semicolon WILL be your #1, #2 & #3 mistakes!!!! • Variable names are 1-32 characters long (1-8 in SAS 6 and earlier): letters, numerals, and underscores, beginning with a letter or underscore, with no blanks or special characters • Note: values of character variables can exceed 8 characters, but list input stores only 8 characters unless a Length statement sets a longer length • Comments • Begin with *, end with ; • Can comment several lines: begin with /* and end with */
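A minimal sketch of the syntax rules above; the data set name demo and the variables x and y are hypothetical. Note that every statement, including DATA and RUN, ends with a semicolon, and that the assignment to y is one statement spread over two lines.

```sas
/* A block comment
   can span several lines. */
* A statement comment runs to the next semicolon;
data demo;        /* hypothetical data set name */
  x = 3;
  y = 2 * x
      + 1;        /* one statement continued over two lines */
run;
```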
Data Input in the SAS File • Data fname ; • creates a temporary data set containing the data described in the data step • Input name . . . name $ . . . ; • list input: lists the variable names; a name is assumed to be a quantitative (numeric) variable • name MUST be followed by $ if name is a character variable • alternatives: comma-separated (delimited) input, column input • Datalines (or Cards) ; • indicates that the data follow, line by line • ; • indicates that the last line of data has been input; the semicolon is on a line by itself • Example: lecture7class.sas • Open lecture7class.sas • Change the filename, if necessary • Clear the Output and Log windows; run lecture7class.sas
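The list-input steps above can be sketched as follows. This is not the contents of lecture7class.sas; the variable names and the three data lines are made-up values for illustration only.

```sas
data class;                          /* temporary data set WORK.CLASS */
  input name $ gender $ age height;  /* $ marks the character variables */
  datalines;
Alice f 13 56.5
Brian m 14 62.0
Chris m 12 59.0
;
run;

proc print data=class;               /* check what was read */
run;
```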
Data Input with Multiple Responses on a Single Line of the Data File • SAS Procedures Expect Each Response Value to be on a Separate Line (Observation) of the Data Set • When n Responses are on One Line of Data • Input y1 y2 … yn ; • y = y1; output; • y = y2; output; • . . . • y = yn; output; • Creates n Data Lines with 1 Response Value on Each Line • If y1 … yn Represent Responses for n Levels of a Factor • Input y1 y2 … yn ; • factor = 'Level 1'; y = y1; output; • factor = 'Level 2'; y = y2; output; • . . . • factor = 'Level n'; y = yn; output; • Creates n Data Lines with 1 Factor Level & Response Value on Each Line • Example: lecture7.sas • Data Flow2
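A minimal sketch of the second pattern with n = 3, assuming each data line holds one response for each of three factor levels. The data set name long, the level names, and the numbers are hypothetical. Two input lines produce six observations, each with one factor level and one response.

```sas
data long;
  input y1 y2 y3;
  factor = 'Level 1'; y = y1; output;   /* one observation per level */
  factor = 'Level 2'; y = y2; output;
  factor = 'Level 3'; y = y3; output;
  drop y1-y3;                           /* keep only factor and y */
  datalines;
10 12 15
11 13 14
;
run;
```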
Data Input from an External File • Filename fn 'complete directory/file specification' ; • e.g., filename eysdata 'c:/Stat6337/EysenckRecall.dat' ; • Be Careful with Spaces in Directories and File Names!!! • Data fname ; • creates a temporary data set containing the data described in the data step • Infile fn ; • reads the data from the file referenced by fn • Input name . . . name $ . . . ; • lists the variable names; a name is assumed to be a quantitative (numeric) variable • name MUST be followed by $ if name is a character variable • Run ; • indicates that the data step is complete • Example: lecture7class.sas • Data Recall
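A sketch of reading the external file. The fileref and path come from the slide; the column layout (age and group as character, recall as numeric) is an assumption based on the variable names used in the Proc Anova example, not the actual layout of EysenckRecall.dat. The Length value for group is likewise hypothetical.

```sas
filename eysdata 'c:/Stat6337/EysenckRecall.dat';  /* path from the slide */

data recall;
  infile eysdata;
  /* ASSUMED layout: age and group are character, recall is numeric.   */
  /* Character values longer than 8 characters need a LENGTH statement. */
  length age $ 8 group $ 12;
  input age $ group $ recall;
run;
```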
Program Data Vector • Holds one line of data (one observation) at a time, as read by the Input statement of the Data Step • Any calculations, deletions, etc. in the Data Step are performed on that line of data • At the end of each pass through the Data Step, the variables in the Program Data Vector are written as one observation to a temporary (WORK) data set • Can force data lines to be written at any time with the Output statement; once a Data Step contains an Output statement, observations are written only where Output is executed
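Two small sketches of the Program Data Vector, with hypothetical data set names. The first uses PUT _ALL_ to write the PDV to the Log on each pass (x, y, z plus the automatic variables _ERROR_ and _N_). The second shows that explicit Output statements replace the automatic write at the end of each pass.

```sas
/* PUT _ALL_ writes the current PDV contents to the Log on every pass */
data pdv_demo;
  input x y;
  z = x + y;
  put _all_;
  datalines;
1 2
3 4
;
run;

/* Explicit OUTPUT: each input line yields 2 observations,  */
/* so x takes the values 1, 10, 2, 20 in data set TWICE     */
data twice;
  input x;
  output;        /* write the PDV as read */
  x = x * 10;
  output;        /* write the PDV again after the change */
  datalines;
1
2
;
run;
```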
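Operations in the Data Step • Arithmetic Operations • x = u + v ; • Transformations • x = log(y) ; (natural logarithm) • Logical • If x > 0 then z = y/x ; • Recoding • If gender = 'm' then gender = 'Male'; else if gender = 'f' then gender = 'Female'; • Note: SAS sets the length of a character variable from its first appearance in the Data Step (8 for list input with $, or the length of the first value assigned) • To force a length (e.g., for a character variable), use a Length statement before the variable first appears, e.g., length gender $ 6 ;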
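A sketch combining the operations above, with made-up data. Here the recoded value goes into a new, hypothetical variable sex. Without the Length statement, sex would get length 4 from its first assignment ('Male') and 'Female' would be truncated to 'Fema'.

```sas
data ops;
  length sex $ 6;                  /* set before first use so 'Female' fits */
  input u v y gender $;
  x = u + v;                       /* arithmetic */
  logy = log(y);                   /* natural log transformation */
  if x > 0 then z = y / x;         /* z is missing when x <= 0 */
  if gender = 'm' then sex = 'Male';
  else if gender = 'f' then sex = 'Female';
  datalines;
1 2 10 m
-3 1 5 f
;
run;
```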
Titles and Labels • Title# '…' ; • Up to 10 title lines: title# 'include your title here'; • Can be placed in Data Steps or Procs • Changing Title# replaces that title and eliminates Titlex, where x > # • Label name = '…' ; • Can be in a Data Step or a Proc; Proc Print displays labels only when the label option is given
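A sketch of titles and labels using SASHELP.CLASS, a sample data set that ships with SAS (variables Name, Sex, Age, Height, Weight). The title and label text are hypothetical.

```sas
title1 'Student Class Data';
title2 'Selected Variables';
proc print data=sashelp.class label;  /* LABEL option: labels become column headings */
  var name height;
  label height = 'Student Height';
run;
title2;   /* clears TITLE2 and any higher-numbered titles; TITLE1 remains */
```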
Some Useful PROCs • Proc Chart • vertical or horizontal bar charts • Proc Freq • frequency distributions, cross tabs • Proc Means • select summary statistics • Proc Plot • scatterplots • Proc Print • prints data files • Proc Sort • sorts data files by the values of one or more variables • Proc Univariate • a wide range of summary statistics, box plots
General Form of PROCs • PROC xxxx data = fname options ; • by groups ; • proc-specific statements ; • title . . . ; • output out = fn . . . ; • run ;
Printing to the Output Window • Proc Print data = fname ; • var . . . ; lists the variables to be printed (if omitted, all variables are printed) • run ; indicates the print commands are complete
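A minimal Proc Print sketch on the SASHELP.CLASS sample data set; the OBS= data set option limits the listing to the first 5 observations.

```sas
proc print data=sashelp.class(obs=5);
  var name age height;   /* omit the VAR statement to print every variable */
run;
```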
Group Analyses • Sort the Groups • Proc Sort data= … ; • by group; • run; • Execute the Proc, by Group • Proc xxx data= … ; • by group; • . . . • run;
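A sketch of a by-group analysis on SASHELP.CLASS. SASHELP is typically read-only, so the sorted copy is written to a hypothetical WORK data set with OUT=.

```sas
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

/* One set of summary statistics for each value of SEX */
proc means data=class_sorted mean std min max maxdec=2;
  by sex;
  var height weight;
run;
```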
Summarize the Recall Data • Calculate the average, standard deviation, minimum, and maximum to 2 decimal places: Proc Means (maxdec=2 option) • Graph a histogram of the recall data: Proc Chart • Calculate frequencies for each condition/group and each age: Proc Freq
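A sketch of these three summaries, assuming a WORK data set recall with variables age, group, and recall (the names used in the Proc Anova model statement); the actual names in lecture7Eysenck.sas may differ.

```sas
proc means data=recall mean std min max maxdec=2;
  var recall;
run;

proc chart data=recall;
  vbar recall;               /* bar chart over recall midpoints (histogram) */
run;

proc freq data=recall;
  tables group age group*age;  /* one-way tables plus a cross-tab */
run;
```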
Summarize the Recall Data • Calculate descriptive statistics for each condition/group: Proc Means, Proc Univariate • Note: Sort first, then use the BY statement • Graph average recall for all combinations of recall condition/group and age, using a group identifier as the plotting symbol: Proc Plot
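A sketch under the same assumption (data set recall with age, group, recall). The cell averages are first saved with Proc Means into a hypothetical data set cellmeans, then plotted; Proc Plot uses the first character of group as the symbol, so groups sharing a first letter will share a symbol.

```sas
proc sort data=recall;
  by group;
run;

proc means data=recall mean std maxdec=2;
  by group;
  var recall;
run;

proc univariate data=recall;
  by group;
  var recall;
run;

/* Average recall for every age-by-group combination */
proc means data=recall noprint nway;
  class age group;
  var recall;
  output out=cellmeans mean=avgrecall;
run;

proc plot data=cellmeans;
  plot avgrecall*age=group;   /* vertical*horizontal=symbol variable */
run;
```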
Proc Anova • Only for Complete Factorial Experiments in Completely Randomized Designs • MUST have an Equal Number of Repeats for Each Factor-Level Combination (balanced data) • Otherwise: Proc GLM
Proc Anova • Proc Anova data = fn ; • By … ; • Separate ANOVA fits for each value of the BY variable(s); sort the data by them first • Class … ; • List all the factors; must come before the Model statement • Model … / options; • e.g., model recall = age group age*group ; • factors: list individually; e.g., age group • interactions: connect with asterisk(s); e.g., age*group • Means … / options; • e.g., means age group age*group / t bon; (t: pairwise t-tests; bon: Bonferroni t-tests) • Run; • Quit; ends the procedure, which otherwise stays active
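The two-factor example above, written out as a complete step and assuming the data set recall with variables age, group, and recall. As we understand it, SAS carries out the T and BON comparisons for the main effects and only lists the cell means for age*group.

```sas
proc anova data=recall;
  class age group;                     /* factors, before MODEL */
  model recall = age group age*group;  /* main effects and interaction */
  means age group age*group / t bon;
run;
quit;                                  /* end the interactive procedure */
```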
Eysenck’s Study of Incidental Learning • Make the analysis of variance calculations using only recall condition as the factor • Calculate the factor-level averages with the t option
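A sketch of the one-factor fit, assuming the recall condition is stored in the variable group of data set recall (as in the two-factor model statement).

```sas
proc anova data=recall;
  class group;              /* recall condition is the only factor */
  model recall = group;
  means group / t;          /* factor-level averages with t-tests */
run;
quit;
```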
Effect of Cocaine Usage on Newborn Infant Body Lengths • Usage Groups: First Trimester, Throughout Pregnancy, Drug-Free • Research Question: Do Mean Body Lengths (cm) Differ by Cocaine Usage?
Assignment • Create a Data File • Input the Data File into a SAS Program • Cocaine Usage Groups • Calculate Averages and Standard Deviations • Make Comparative Box Plots • Test the Equality of the Group Means • Email Me ONLY the FINAL .log File