Statistical Methods



  1. Statistical Methods Lynne Stokes Department of Statistical Science Lecture 7: Introduction to SAS Programming Language

  2. Preliminaries • Create a Folder: c:/Stat6337 • Send to the Desktop • Access Blackboard • Download the Eysenck Data File • Download the lecture7Eysenck.sas File • Download the lecture7class.sas File • Download the lecture7SASSummary.doc File

  3. Eysenck’s Data File

  4. Open the SAS Program
     • Double-click the lecture7.sas File
     • Press the Run Icon (Runner Image)
     • Editor
         • Create and Modify SAS Command Files
         • Can Save in the Stat 6337 Folder: File / Save As …
     • Log
         • Messages about the Compilation and Execution of the SAS Program
         • Contains Error Messages (in red), if any
         • Can Save in the Stat 6337 Folder: File / Save As …
     • Output
         • Results of the Execution of the SAS Program
         • Can Save in the Stat 6337 Folder: File / Save As …
     • To Erase the Contents of the Log or Output Files: Right Click, Select “Clear All”

  5. SAS Structure
     • DATA Step
         • Describe the data, provide names for variables, define new or transformed variables
     • PROCs: SAS Procedures
         • Descriptive Statistics: Proc Univariate, Proc Means
         • Graphics: Proc Chart, Proc Plot
         • Regression: Proc Reg
         • Two-sample t-tests: Proc Ttest
         • Analysis of Variance: Proc Anova, Proc GLM, Proc Mixed
         • Specialized Data Operations: Proc Sort
         • etc.

  6. SAS Syntax
     • Every command MUST end with a semicolon
         • Commands can continue over two or more lines
         • Forgetting the semicolon WILL be Your #1, #2 & #3 Mistakes !!!!
     • Variable names are 1-8 characters in this course (letters, numerals, and underscores, beginning with a letter or underscore), with no blanks or special characters
         • Note: SAS Version 7 and later accept names of up to 32 characters
         • Note: values for character variables can exceed 8 characters
     • Comments
         • Begin with *, end with ;
         • Can comment several lines: begin with /* and end with */
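The syntax rules above can be illustrated with a minimal sketch. The data set name `demo`, the variable names, and the two data lines are made up for illustration only.

```sas
* A one-line comment: begins with an asterisk and ends with a semicolon ;

/* A block comment can span
   several lines. */

data demo ;        /* every statement ends with a semicolon */
   input id
         score ;   /* one statement continued over two lines */
   datalines ;
1 10
2 12
;
run ;
```

If a semicolon is left off, SAS usually reads the next statement as part of the current one, so the error message in the Log often points to the line after the real mistake.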

  7. Data Input in the SAS File
     • Data fname ;
         • creates temporary file with the data that are described in the data step
     • Input name . . . name $ . . . ;
         • list input: lists the variable names (1 – 8 characters/letters), name is assumed to be a quantitative variable
         • name MUST be followed by $ if name is a character variable
         • alternatives: comma separated, column specified
     • Datalines (or Cards) ;
         • indicates that the data follow, line by line
     • ;
         • indicates that the last line of data has been input, the semicolon is on a line by itself
     • Example: lecture7class.sas
         • Open lecture7class.sas
         • Change filename, if necessary
         • Clear output and log files; Run lecture7class.sas
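A minimal sketch of list input with in-stream data follows. It is not the contents of lecture7class.sas; the data set name `class`, the variables, and the three data lines are hypothetical.

```sas
data class ;                   /* hypothetical data set name */
   input name $ age height ;   /* name is character, so it is followed by $ */
   datalines ;
Ann 21 64
Bob 23 70
Cara 22 66
;
run ;

proc print data = class ;      /* list the data to check the input */
run ;
```

The data values are separated by blanks, and the lone semicolon after the last data line ends the in-stream data.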

  8. Data Input with Multiple Responses on a Single Line of the Data File
     • SAS Requires that Each Response Value be on a Separate Line of Data
     • When n Responses are on One Line of Data (Creates n Data Lines with 1 Response Value on Each Line)
         • Input y1 y2 … yn ;
         • y = y1; output;
         • y = y2; output;
         • . . .
         • y = yn; output;
     • If y1 … yn Represent Responses for n Levels of a Factor (Creates n Data Lines with 1 Factor & Response Value on Each Line)
         • Input y1 y2 … yn ;
         • factor = 'Level 1'; y = y1; output;
         • factor = 'Level 2'; y = y2; output;
         • . . .
         • factor = 'Level n'; y = yn; output;
     • Example: lecture7.sas
         • Data Flow2
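As a sketch of the second pattern with n = 3, the step below turns each input line into three observations. The data set name `long3`, the data values, and the `drop` statement are assumptions added for illustration; this is not the Flow2 step from lecture7.sas.

```sas
data long3 ;                       /* hypothetical data set name */
   input y1 y2 y3 ;                /* three responses on each data line */
   length factor $ 7 ;             /* room for 'Level 1' */
   factor = 'Level 1' ; y = y1 ; output ;
   factor = 'Level 2' ; y = y2 ; output ;
   factor = 'Level 3' ; y = y3 ; output ;
   drop y1 y2 y3 ;                 /* keep only factor and y */
   datalines ;
5 7 9
6 8 10
;
run ;

proc print data = long3 ;
run ;
```

Each explicit `output` writes one observation, so the two data lines here yield six observations, each with one factor level and one response value.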

  9. Data Input from an External File
     • Filename fn 'complete directory/file specification' ;
         • e.g., filename eysdata 'c:/Stat6337/EysenckRecall.dat' ;
         • Be Careful with Spaces in Directories and File Names !!!
     • Data fname ;
         • creates temporary file with the data that are described in the data step
     • Infile fn ;
         • input the data from the file labeled fn
     • Input name . . . name $ . . . ;
         • lists the variable names (1 – 8 characters/letters), name is assumed to be a quantitative variable
         • name MUST be followed by $ if name is a character variable
     • Run ;
         • indicates that the data step is completed
     • Example: lecture7class.sas
         • Data Recall
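Putting these statements together gives a sketch like the one below. The file path is the one shown on the slide; the variable list on the `input` statement is only an assumption about the file layout (the variable names are borrowed from the Proc Anova slide) and must be changed to match the actual columns of EysenckRecall.dat.

```sas
filename eysdata 'c:/Stat6337/EysenckRecall.dat' ;

data recall ;
   infile eysdata ;                /* read from the file labeled eysdata */
   input age $ group $ recall ;    /* ASSUMED layout: two character columns, one numeric */
run ;

proc print data = recall ;         /* check that the data were read as intended */
run ;
```

Printing the data set right after reading it is a quick way to catch a wrong variable order or a missing $.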

  10. Program Data Vector • One line of data is stored, as indicated on the Input statement of the Data Step • Any calculations, deletions, etc. in the Data Step are performed on that line of data • When the Data Step is completed, the variables in the Program Data Vector are output to a temporary (work) file • Can force data lines to be written at any time with the Output statement

  11. Operations in the Data Step
      • Arithmetic Operations
          • x = u + v ;
      • Transformations
          • x = log(y) ;
      • Logical
          • If x > 0 then z = y/x ;
      • Recoding
          • If gender = 'm' then gender = 'Male'; else if gender = 'f' then gender = 'Female';
          • Note: SAS sets the length of a character variable when the variable first appears in the Data Step, so longer values assigned later are truncated
          • To force a length (e.g., for a character variable), use a Length statement before the variable is first used
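The operations above can be combined in one Data Step, as in this sketch. The data set name `ops`, the name `logy`, and the data lines are hypothetical; `logy` is used in place of `x` for the transformation so that the sum is not overwritten.

```sas
data ops ;
   length gender $ 6 ;           /* must precede the first use of gender */
   input gender $ u v y ;
   x = u + v ;                   /* arithmetic */
   logy = log(y) ;               /* natural logarithm */
   if x > 0 then z = y / x ;     /* z stays missing when x is not positive */
   if gender = 'm' then gender = 'Male' ;
   else if gender = 'f' then gender = 'Female' ;
   datalines ;
m 1 2 10
f 0 0 5
;
run ;

proc print data = ops ;
run ;
```

In the second data line x is 0, so the `if` condition fails and z is left as a missing value.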

  12. Titles and Labels
      • Title# '…' ;
          • Up to 10 title lines: title# 'include your title here';
          • Can be placed in Data Steps or Procs
          • Changing Title# replaces that title and eliminates Titlex, where x > #
      • Label name = '…' ;
          • Can be in a Data Step or Proc Print
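A short sketch of titles and labels follows; the data set `scores`, its values, and the title text are made up for illustration.

```sas
data scores ;
   input id score ;
   label score = 'Test score (points)' ;
   datalines ;
1 10
2 12
;
run ;

title1 'Lecture 7 demonstration' ;
title2 'Listing of the scores data' ;
proc print data = scores label ;        /* the LABEL option prints labels as column headings */
run ;

title2 'Summary of the scores data' ;   /* replaces title2; title1 is kept */
proc means data = scores ;
run ;
```

Titles stay in effect for later procedures until they are changed, which is why the second title2 statement is needed before Proc Means.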

  13. Some Useful PROCs • Proc Chart • vertical or horizontal bar charts • Proc Freq • frequency distributions, cross tabs • Proc Means • select summary statistics • Proc Plot • scatterplots • Proc Print • prints data files • Proc Sort • sorts data files by the values of one or more variables • Proc Univariate • a wide range of summary statistics, box plots

  14. General Form of PROCs
      PROC xxxx data=fname options;
         by groups;
         proc-specific statements;
         title . . . ;
         output out = fn . . . ;
      run ;
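As one possible instance of this general form, the sketch below uses Proc Means with options, a proc-specific `var` statement, a title, and an `output` statement. The data set names `scores` and `stats`, the variable names, and the data values are hypothetical.

```sas
data scores ;
   input group $ score ;
   datalines ;
A 7
A 9
B 12
B 11
;
run ;

proc means data = scores mean std maxdec = 2 ;   /* options follow data= */
   var score ;                                   /* proc-specific statement */
   title 'Summary of score' ;
   output out = stats mean = avg std = sd ;      /* save the statistics in a data set */
run ;

proc print data = stats ;                        /* view the saved statistics */
run ;
```

The `by` statement from the general form is left out here because it requires the data to be sorted first.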

  15. Printing to the Output File
      • Proc Print data = fname ;
      • var . . . ; lists the variables to be printed (can be omitted)
      • run ; indicates the print commands are complete

  16. Group Analyses • Sort the Groups • Proc Sort data= … ; • by group; • run; • Execute the Proc, by Group • Proc xxx data= … ; • by group; • . . . • run;
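The two steps can be sketched as follows, using a small made-up data set; the names `recall` and `group` follow the slides, but the values are illustrative only.

```sas
data recall ;
   input group $ recall ;
   datalines ;
A 7
B 12
A 9
B 11
;
run ;

proc sort data = recall ;    /* step 1: sort by the grouping variable */
   by group ;
run ;

proc means data = recall ;   /* step 2: run the proc separately for each group */
   by group ;
   var recall ;
run ;
```

If the data are not sorted by the `by` variable, the second step stops with an error in the Log, so the sort should come first.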

  17. Summarize the Recall Data
      • Calculate the average, standard deviation, minimum, and maximum to 2 decimal places: Proc Means
      • Graph a histogram of the recall data: Proc Chart
      • Calculate frequencies for each condition/group and each age: Proc Freq
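One way these three summaries might be coded is sketched below. The data values are made up, and the coding of `age` as 1 and 2 is an assumption; the actual Eysenck file may code the variables differently.

```sas
data recall ;
   input age group $ recall ;   /* hypothetical layout and values */
   datalines ;
1 A 7
1 A 9
1 B 12
1 B 10
2 A 6
2 A 8
2 B 15
2 B 13
;
run ;

proc means data = recall mean std min max maxdec = 2 ;   /* 2 decimal places */
   var recall ;
run ;

proc chart data = recall ;
   vbar recall ;                /* vertical bar chart (histogram) of recall */
run ;

proc freq data = recall ;
   tables group age ;           /* one frequency table per variable */
run ;
```

Writing `tables group*age ;` instead would give a cross tabulation of the two variables.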

  18. Summarize the Recall Data
      • Calculate descriptive statistics for each condition/group: Proc Means, Proc Univariate
          • Note: Sort First, then Use the BY Command.
      • Graph Average Recall for All Combinations of Recall Condition/Group and Age: Proc Plot
          • Use a Group Identifier as the Plotting Symbol
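The plot of averages could be built along the lines of the sketch below: compute the cell averages with Proc Means, save them with an `output` statement, and plot them. The data values, the numeric coding of `age`, and the names `avgs` and `avgrec` are assumptions made for illustration.

```sas
data recall ;
   input age group $ recall ;   /* hypothetical layout and values */
   datalines ;
1 A 7
1 A 9
1 B 12
1 B 10
2 A 6
2 A 8
2 B 15
2 B 13
;
run ;

proc sort data = recall ;
   by age group ;
run ;

proc means data = recall noprint ;      /* noprint: only create the output data set */
   by age group ;
   var recall ;
   output out = avgs mean = avgrec ;    /* one average per age-by-group combination */
run ;

proc plot data = avgs ;
   plot avgrec * age = group ;          /* first character of group is the plotting symbol */
run ;
```

The form `plot y * x = variable` is what makes the group identifier appear as the plotting symbol.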

  19. Proc Anova • Only for Complete Factorial Experiments in Completely Randomized Designs • Otherwise: Proc GLM • MUST have an Equal Number of Repeats for Each Factor-Level Combination

  20. Proc Anova
      • Proc Anova data = fn ;
      • By … ;
          • Separate ANOVA Fits for Each Value of the BY variable(s).
      • Class … ;
          • List all the factors.
      • Model … / options;
          • e.g., model recall = age group age*group ;
          • factors: list individually; e.g., age group
          • interactions: connect with asterisk(s); e.g., age*group
      • Means … / options;
          • e.g., means age group age*group / t bon;
      • Run;
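A sketch of a complete Proc Anova run using the model and means statements from the slide is given below. The data are made up (a balanced 2 x 2 layout with 2 repeats per combination) and the numeric coding of `age` is an assumption; this is not the Eysenck data.

```sas
data recall ;
   input age group $ recall ;   /* hypothetical balanced data */
   datalines ;
1 A 7
1 A 9
1 B 12
1 B 10
2 A 6
2 A 8
2 B 15
2 B 13
;
run ;

proc anova data = recall ;
   class age group ;                       /* list all the factors */
   model recall = age group age*group ;    /* main effects and interaction */
   means age group age*group / t bon ;     /* factor-level averages and comparisons */
run ;
quit ;                                     /* Proc Anova stays active until quit or the next step */
```

The equal number of repeats in every age-by-group combination is what makes Proc Anova appropriate here; with unequal counts, Proc GLM would be used instead.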

  21. Eysenck’s Study of Incidental Learning: Perform the analysis of variance calculations using only recall condition as the factor. Calculate the factor-level averages with the t option.

  22. Effect of Cocaine Usage on Newborn Infant Body Lengths
      • Usage Groups
          • First Trimester
          • Throughout Pregnancy
          • Drug-Free
      • Research Question: Do Mean Body Lengths (cm) Differ by Cocaine Usage?

  23. Effect of Cocaine Usage on Newborn Infant Body Lengths

  24. Assignment • Create a Data File • Input the Data File into a SAS Program • Cocaine Usage Groups • Calculate Averages and Standard Deviations • Make Comparative Box Plots • Test the Equality of the Group Means • Email Me ONLY the FINAL .log File
