Download
chapter 8 n.
Skip this Video
Loading SlideShow in 5 Seconds..
Chapter 8 PowerPoint Presentation

Chapter 8

272 Vues Download Presentation
Télécharger la présentation

Chapter 8

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Chapter 8 Producing Summary Reports

  2. Section 8.1 Introduction toSummary Reports

  3. Objectives • Identify the different report writing procedures. • Create one-way and two-way frequency tables using the FREQ procedure. • Restrict the variables processed by the FREQ procedure. • Generate simple descriptive statistics using the MEANS procedure. • Group observations of a SAS data set for analysis using the CLASS statement in the MEANS procedure.

  4. Summary Reports Small Data Set LastName FirstName Age TORRES JAN 23LANGKAMM SARAH 46SMITH MICHAEL 71WAGSCHAL NADJA 37TOERMOEN JOCHEN 16 Report Writing Step Report Large Data Set Summarize Data and Report Writing Step LastName FirstName Age TORRES JAN 23LANGKAMM SARAH 46SMITH MICHAEL 71WAGSCHAL NADJA 37TOERMOEN JOCHEN 16 . . . . . . Ingersol Hans 32 Himelewski Janice 87 ...

  5. Summary Report Procedures PROC MEANSproducessimple statistics. PROC FREQproduces frequency counts. Toolbox PROC REPORTproduces flexibledetail and summary reports.

  6. PROC FREQ Output Distribution of Job Code Values The FREQ Procedure Job Cumulative Cumulative Code Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ FLTAT1 14 20.29 14 20.29 FLTAT2 18 26.09 32 46.38 FLTAT3 12 17.39 44 63.77 PILOT1 8 11.59 52 75.36 PILOT2 9 13.04 61 88.41 PILOT3 8 11.59 69 100.00

  7. PROC MEANS Output Salary by Job Code The MEANS Procedure Analysis Variable : Salary Job N Code Obs N Mean Std Dev Minimum Maximum ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ FLTAT1 14 14 25642.86 2951.07 21000.00 30000.00 FLTAT2 18 18 35111.11 1906.30 32000.00 38000.00 FLTAT3 12 12 44250.00 2301.19 41000.00 48000.00 PILOT1 8 8 69500.00 2976.10 65000.00 73000.00 PILOT2 9 9 80111.11 3756.48 75000.00 86000.00 PILOT3 8 8 99875.00 7623.98 92000.00 112000.00 ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

  8. PROC REPORT Output Salary Analysis Job Code Home Base Salary ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ FLTAT1 CARY $131,000 FRANKFURT $100,000 LONDON $128,000 FLTAT2 CARY $245,000 FRANKFURT $181,000 LONDON $206,000 FLTAT3 CARY $217,000 FRANKFURT $134,000 LONDON $180,000 PILOT1 CARY $211,000 FRANKFURT $135,000 LONDON $210,000 PILOT2 CARY $323,000 FRANKFURT $240,000 LONDON $158,000 PILOT3 CARY $300,000 FRANKFURT $205,000 LONDON $294,000 ========== $3,598,000

  9. Section 8.2 Basic Summary Reports

  10. SAS Vocabulary • PROC FREQ • TABLES • NLEVELS • Crosstabular • * • PROC MEANS • VAR • CLASS • MAXDEC=

  11. Goal Report 1 • International Airlines wants to know how many employees are in each job code. Distribution of Job Code Values The FREQ Procedure Job Cumulative Cumulative Code Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ FLTAT1 14 20.29 14 20.29 FLTAT2 18 26.09 32 46.38 FLTAT3 12 17.39 44 63.77 PILOT1 8 11.59 52 75.36 PILOT2 9 13.04 61 88.41 PILOT3 8 11.59 69 100.00

  12. Goal Report 2 • Categorize job code and salary values to determine how many employees fall into each group. Salary Distribution by Job Codes The FREQ Procedure Table of JobCode by Salary JobCode Salary Frequency ‚ Percent ‚ Row Pct ‚ Col Pct ‚Less tha‚25,000 t‚More tha‚ Total ‚n 25,000‚o 50,000‚n 50,000‚ ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ Flight Attendant ‚ 5 ‚ 39 ‚ 0 ‚ 44 ‚ 7.25 ‚ 56.52 ‚ 0.00 ‚ 63.77 ‚ 11.36 ‚ 88.64 ‚ 0.00 ‚ ‚ 100.00 ‚ 100.00 ‚ 0.00 ‚ ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ Pilot ‚ 0 ‚ 0 ‚ 25 ‚ 25 ‚ 0.00 ‚ 0.00 ‚ 36.23 ‚ 36.23 ‚ 0.00 ‚ 0.00 ‚ 100.00 ‚ ‚ 0.00 ‚ 0.00 ‚ 100.00 ‚ ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ Total 5 39 25 69 7.25 56.52 36.23 100.00

  13. Creating a Frequency Report • PROC FREQ displays frequency counts of the data values in a SAS data set. • General form of a simple PROC FREQ step: PROC FREQ DATA=SAS-data-set; RUN; Example: procfreqdata=ia.crew; run;

  14. Creating a Frequency Report • By default, PROC FREQ • analyzes every variable in the SAS data set • displays each distinct data value • calculates the number of observations in which each data value appears (and the corresponding percentage) • Indicates, for each variable, how many observations have missing values.

  15. Distribution of HireDate Distribution of Salary Distribution of LastName Distribution of JobCode Distribution of FirstName Distribution of EmpID Distribution of Location Distribution of Phone Default Frequency Reports ia.crew procfreqdata=ia.crew; run; ...

  16. Variables to Analyze • PROC FREQ is appropriate for variables with only a few values. • For example, if you have a class list with one row for each student, it would not be very meaningful to analyze the student ID if there is one row per person in the table. • PROC FREQ enables you to choose the variables to analyze.

  17. Printing Selected Variables • SAS enables you to select the variables to display or analyze. • In PROC PRINT, what statement selected the variables for the output? ...

  18. Printing Selected Variables • SAS enables you to select the variables to display or analyze. • In PROC PRINT, what statement selected the variables for the output? • The VAR statement

  19. Printing Selected Variables • SAS enables you to select the variables to display or analyze. • In PROC FREQ, what statement selects the variables? ...

  20. Printing Selected Variables • SAS enables you to select the variables to display or analyze. • In PROC FREQ, what statement selects the variables? • The TABLES statement

  21. Printing Selected Variables • SAS enables you to select the variables to display or analyze.

  22. One-Way Frequency Report • Use the TABLES statement to limit the variables included in the frequency counts. • These are typically variables that have a limited number of distinct values. • General form of a PROC FREQ step with a TABLES statement: PROC FREQ DATA=SAS-data-set; TABLESSAS-variables < / options >; RUN; Ignore the option for now.

  23. One-Way Frequency Report • Use the TABLE statement to analyze JobCode. • For example: proc freq data=ia.crew; tables JobCode ; run;

  24. Creating a Frequency Report – Example title'Distribution of Job Code Values'; procfreqdata=ia.crew; tables JobCode; run; Distribution of Job Code Values The FREQ Procedure Job Cumulative Cumulative Code Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ FLTAT1 14 20.29 14 20.29 FLTAT2 18 26.09 32 46.38 FLTAT3 12 17.39 44 63.77 PILOT1 8 11.59 52 75.36 PILOT2 9 13.04 61 88.41 PILOT3 8 11.59 69 100.00

  25. One-Way Frequency Report • You can select more than one variable to analyze by listing them all in the TABLES statement. Separate them with a space. • This creates one report for each variable. • For example: proc freq data=ia.crew; tables JobCode Location; RUN;

  26. Creating a Frequency Report – Example title; procfreqdata=ia.crew; tables JobCode Location; run; The FREQ Procedure Job Cumulative Cumulative Code Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ FLTAT1 14 20.29 14 20.29 FLTAT2 18 26.09 32 46.38 FLTAT3 12 17.39 44 63.77 PILOT1 8 11.59 52 75.36 PILOT2 9 13.04 61 88.41 PILOT3 8 11.59 69 100.00 Cumulative Cumulative Location Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ CARY 27 39.13 27 39.13 FRANKFURT 19 27.54 46 66.67 LONDON 23 33.33 69 100.00 JobCode Report Location Report

  27. Displaying the Number of Levels – Example • Use the NLEVELS option in the PROC FREQ statement to displaythe number of levels for the variables included in the frequency counts. title'Distribution of Location Values'; procfreqdata=ia.crew nlevels; tables Location; run;

  28. Creating a Frequency Report – Example Distribution of Location Values The FREQ Procedure Number of Variable Levels Variable Levels ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Location 3 Cumulative Cumulative Location Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ CARY 27 39.13 27 39.13 FRANKFURT 19 27.54 46 66.67 LONDON 23 33.33 69 100.00

  29. Creating a Frequency Report • To display the number of levels without displaying the frequency counts, add the NOPRINT option to the TABLES statement. procfreqdata=ia.crew nlevels; tables JobCode Location / noprint; title'Number of Levels for Job Code and Location'; run; Number of Levels for Job Code and Location The FREQ Procedure Number of Variable Levels Variable Levels ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ JobCode 6 Location 3

  30. Creating a Frequency Report • To display the number of levels for all variables without displaying any frequency counts, use the _ALL_ keyword and the NOPRINT option in the TABLES statement. • (You must also use the NLEVELS option.) title'Number of Levels for All Variables'; procfreqdata=ia.crew nlevels; tables _all_ / noprint; run;

  31. Stored values Formatted values PILOT1 PILOT2 PILOT3 Pilot FLTAT1 FLTAT2 FLTAT3 Flight Attendant Analyzing Categories of Values • International Airlines wants to use formats to categorize the flight crew by job code.

  32. Analyzing Categories of Values – Example procformat; value $codefmt 'FLTAT1'-'FLTAT3'='Flight Attendant' 'PILOT1'-'PILOT3'='Pilot'; run; procfreqdata = ia.crew; format JobCode $codefmt.; tables JobCode; run;

  33. Analyzing Categories of Values – Example Distribution of Job Code Values The FREQ Procedure Cumulative Cumulative JobCode Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Flight Attendant 44 63.77 44 63.77 Pilot 25 36.23 69 100.00 PROC FREQ automatically groups the data by the formatted value of a variable if a format is associated with that variable.

  34. Exercise • This exercise reinforces the concepts discussed previously.

  35. Exercises • Use the StudyAbroad2 delimited data file to create a data set called StudyLocations. The variables in order are Country,Cost,Time,andBeginDate. • Format Cost to reflect currency values and BeginDate to a readable date value. • Change the column headings to Country, TripCost, Length of Program, and Trip Begin Date. • Create a listing report to verify all the work for #1 above. • Use PROC FREQ to determine the frequencies for Country and Time.

  36. Exercises – A Solution data StudyLocations; infile'StudyAbroad2.csv'dsd; input Country :$15. Cost Time :$8. BeginDate :mmddyy10.; format BeginDate mmddyy10. Cost dollar8.; label Cost='Trip Cost' Time='Length of Program' BeginDate = 'Trip Begin Date'; run; procprintdata= StudyLocations noobslabel; run; procfreqdata=StudyLocations; tables Country Time; run;

  37. Exercises Partial PROC PRINT Output Length Trip of Trip Begin Country Cost Program Date Germany $4,200 Semester 09/01/2007 France $8,162 Year 10/01/2007 Great Britain $8,225 Year 09/01/2007 Australia $7,500 Year 06/01/2007 Sweden $5,286 Semester 12/01/2007 Spain $3,500 Semester 09/01/2007 Mexico $2,300 Semester 09/01/2007 France $3,971 Semester 10/01/2007 Great Britain $8,225 Year 09/01/2007 Sweden $5,286 Semester 12/01/2007 Germany $4,200 Semester 09/01/2007 Great Britain $4,700 Semester 09/01/2007 Germany $7,625 Year 09/01/2007

  38. Exercises Partial PROC FREQ Output The FREQ Procedure Cumulative Cumulative Country Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Australia 11 20.75 11 20.75 France 9 16.98 20 37.74 Germany 7 13.21 27 50.94 Great Britain 10 18.87 37 69.81 Mexico 4 7.55 41 77.36 Spain 5 9.43 46 86.79 Sweden 7 13.21 53 100.00 Length of Program Cumulative Cumulative Time Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Semester 25 47.17 25 47.17 Year 28 52.83 53 100.00

  39. Crosstabular Frequency Reports • A two-way, or crosstabular, frequency report analyzes all possible combinations of the distinct values of two variables. • The asterisk (*) operator in the TABLES statement is used to cross variables. • General form of the FREQ procedure to create a crosstabular report: PROC FREQ DATA=SAS-data-set; TABLESvariable1*variable2; RUN;

  40. Crosstabular Frequency Reports – Example procformat; value $codefmt 'FLTAT1'-'FLTAT3'='Flight Attendant' 'PILOT1'-'PILOT3'='Pilot'; value money low-<25000 ='Less than 25,000' 25000-50000='25,000 to 50,000' 50000<-high='More than 50,000'; run; procfreqdata=ia.crew; tables JobCode*Salary; format JobCode $codefmt. Salary money.; title'Salary Distribution by Job Codes'; run;

  41. Crosstabular Frequency Reports Salary Distribution by Job Codes The FREQ Procedure Table of JobCode by Salary JobCode Salary Frequency ‚ Percent ‚ Row Pct ‚ Col Pct ‚Less tha‚25,000 t‚More tha‚ Total ‚n 25,000‚o 50,000‚n 50,000‚ ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ Flight Attendant ‚ 5 ‚ 39 ‚ 0 ‚ 44 ‚ 7.25 ‚ 56.52 ‚ 0.00 ‚ 63.77 ‚ 11.36 ‚ 88.64 ‚ 0.00 ‚ ‚ 100.00 ‚ 100.00 ‚ 0.00 ‚ ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ Pilot ‚ 0 ‚ 0 ‚ 25 ‚ 25 ‚ 0.00 ‚ 0.00 ‚ 36.23 ‚ 36.23 ‚ 0.00 ‚ 0.00 ‚ 100.00 ‚ ‚ 0.00 ‚ 0.00 ‚ 100.00 ‚ ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ Total 5 39 25 69 7.25 56.52 36.23 100.00 Second Variable First Variable

  42. Crosstabular Frequency Reports – Example • To display the crosstabulation results in a listing form, add the CROSSLIST option to the TABLES statement. procfreqdata=ia.crew; tables JobCode*Location / crosslist; title'Location Distribution for Job Codes'; run;

  43. Crosstabular Frequency Reports Partial Output Location Distribution for Job Codes The FREQ Procedure Table of JobCode by Location Job Row Column Code Location Frequency Percent Percent Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ FLTAT1 CARY 5 7.25 35.71 18.52 FRANKFURT 4 5.80 28.57 21.05 LONDON 5 7.25 35.71 21.74 Total 14 20.29 100.00 ------------------------------------------------------------- FLTAT2 CARY 7 10.14 38.89 25.93 FRANKFURT 5 7.25 27.78 26.32 LONDON 6 8.70 33.33 26.09 Total 18 26.09 100.00 -------------------------------------------------------------

  44. Exercise • This exercise reinforces the concepts discussed previously.

  45. Exercises Using the StudyLocationsdata set you created in a previous exercise, create a crosstabular frequency report using the CROSSLIST option. Display the length of the program by country.

  46. Exercises procfreqdata=StudyLocations; tables Time*Country /crosslist; run;

  47. Exercises

  48. Business Task • International Airlines wants to determine the minimum, maximum, and average salary for each job code.

  49. Calculating Summary Statistics – Example • The MEANS procedure displays simple descriptive statistics for the numeric variables in a SAS data set. • General form of a simple PROC MEANS step: PROC MEANS DATA=SAS-data-set; RUN; How many variables will be analyzed? proc means data=ia.crew; title 'Salary Analysis'; run;

  50. Calculating Summary Statistics Salary Analysis The MEANS Procedure Variable N Mean Std Dev Minimum Maximum ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ HireDate 69 9812.78 1615.44 7318.00 12690.00 Salary 69 52144.93 25521.78 21000.00 112000.00 ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ In this example, PROC MEANS analyzed two variables, HireDate and Salary.