Statistical Techniques I

# Statistical Techniques I

##### Presentation Transcript

1. Statistical Techniques I (EXST7005): Frequency Tables

2. Course Progression
   - Objective: hypothesis testing
   - Background
     - A probability will be involved, so we need to cover frequency distributions in general and probability distributions in particular.
     - A transformation will also be involved, so we need to cover transformations.
     - We will test primarily means, but also variances, so we will need to cover these statistics.

3. Constructing a FREQUENCY Table
   - DIVIDE the population into a number of classes or groups based on the characteristic studied.
     - Categories are often quantitative, but not necessarily.
   - DETERMINE the number of observations in each class (i.e., the frequency of occurrence of observations in each class).
   - CONSTRUCT the table with both classes and frequencies. The frequencies may also be relative (i.e., percentages) or cumulative.

4. EXAMPLE
   - Construct a frequency table for a population of fish age groups.
   - N = 10
   - Y = age of fish in years
   - Observed values: 8, 4, 4, 0, 1, 5, 6, 5, 3, 4
   - These values are placed into discrete age groups (0 to 8).

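5. FREQUENCY TABLE

   | Class value | Frequency (f) | Cumulative frequency (c.f.) |
   |---|---|---|
   | 0 | 1 | 1 |
   | 1 | 1 | 2 |
   | 2 | 0 | 2 |
   | 3 | 1 | 3 |
   | 4 | 3 | 6 |
   | 5 | 2 | 8 |
   | 6 | 1 | 9 |
   | 7 | 0 | 9 |
   | 8 | 1 | 10 |
   | SUM | 10 | |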
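The frequency table above can be rebuilt from the raw ages in a few lines. This is a minimal sketch in Python rather than the SAS used later in these slides; the variable names (`ages`, `classes`, `freq`, `cum_freq`) are illustrative and not part of the course material.

```python
from collections import Counter
from itertools import accumulate

# Fish ages in years from the example slide (N = 10)
ages = [8, 4, 4, 0, 1, 5, 6, 5, 3, 4]

# Classes are the discrete ages 0..8, including ages that never occur
classes = list(range(0, 9))
counts = Counter(ages)

# Frequency: number of observations in each class
freq = [counts.get(c, 0) for c in classes]

# Cumulative frequency: running sum up to and including each class
cum_freq = list(accumulate(freq))

print(f"{'Class':>5} {'f':>3} {'c.f.':>5}")
for c, f, cf in zip(classes, freq, cum_freq):
    print(f"{c:>5} {f:>3} {cf:>5}")
print(f"{'SUM':>5} {sum(freq):>3}")
```

Empty classes (ages 2 and 7) are listed explicitly so that the cumulative column stays aligned with the class values.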

6. Define the additional terms
   - FREQUENCY TOTAL: the total number of observations, i.e., the sum of the class frequencies.
   - FREQUENCY (f): the number of observations in each class.
   - CUMULATIVE FREQUENCY (c.f.): the sum of all class frequencies up to and including the class in question. This implies an order or rank, so it is usually done only with QUANTITATIVE VARIABLES.

7. Define the additional terms (continued)
   - RELATIVE FREQUENCY (r.f.): the ratio of each class frequency to the total frequency. These always sum to 1.0.
     - r.f. × 100% gives the percentage frequency (sums to 100%).
   - RELATIVE CUMULATIVE FREQUENCY (r.c.f.): the sum of the r.f. up to and including the class in question (for QUANTITATIVE VARIABLES).

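8. FREQUENCY TABLE

   | Class value | Frequency (f) | Cumulative frequency (c.f.) | Relative frequency (r.f.) | Relative cumulative frequency (r.c.f.) |
   |---|---|---|---|---|
   | 0 | 1 | 1 | 0.1 | 0.1 |
   | 1 | 1 | 2 | 0.1 | 0.2 |
   | 2 | 0 | 2 | 0.0 | 0.2 |
   | 3 | 1 | 3 | 0.1 | 0.3 |
   | 4 | 3 | 6 | 0.3 | 0.6 |
   | 5 | 2 | 8 | 0.2 | 0.8 |
   | 6 | 1 | 9 | 0.1 | 0.9 |
   | 7 | 0 | 9 | 0.0 | 0.9 |
   | 8 | 1 | 10 | 0.1 | 1.0 |
   | SUM | 10 | | 1.0 | |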
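The two relative columns follow directly from the class frequencies. A small Python sketch (again not SAS, with illustrative names) that uses exact fractions so the relative frequencies sum to exactly 1:

```python
from fractions import Fraction
from itertools import accumulate

# Class frequencies for fish ages 0..8, taken from the table above
freq = [1, 1, 0, 1, 3, 2, 1, 0, 1]
total = sum(freq)

# Relative frequency: class frequency divided by the total frequency
rel_freq = [Fraction(f, total) for f in freq]

# Relative cumulative frequency: running sum of the relative frequencies
rel_cum_freq = list(accumulate(rel_freq))

for age, rf, rcf in zip(range(9), rel_freq, rel_cum_freq):
    print(f"{age:>3} {float(rf):>5.1f} {float(rcf):>5.1f}")
```

Multiplying each relative frequency by 100 gives the percentage frequency described in the definitions.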

9. A SAS example (#1) from Freund & Wilson (1997) Table 1.1

   The program: Part 1 - the DATA step

       ***************************************;
       *** Data from Freund & Wilson, 1997 ***;
       *** TABLE 1.1 - HOUSES DATA         ***;
       ***************************************;
       OPTIONS NOCENTER NODATE LS=78 PS=61;
       DATA ONE; INFILE CARDS MISSOVER;
       TITLE1 'Analysis of house sale price data';
       TITLE2 'Table 1.1 from Freund & Wilson, 1997';
       INPUT OBS QUALITY $ EXTERIOR $ DSQF SP;
       * Truncate SP/10 to an integer to group sale prices into classes;
       SP_INT = INT(SP / 10);
       CARDS;
       RUN;

   The raw data lines (Part 2) are placed between CARDS; and RUN;.

10. A SAS example from Freund & Wilson (1997) Table 1.1 (#1 continued)

    The program: Part 2 - the raw DATA. Note the ending semicolon.

         1 POOR   FRAME 0.816 19.000
         2 POOR   FRAME 0.907 19.800
         3 POOR   FRAME 0.938 15.000
         4 POOR   FRAME 1.032 18.900
         5 POOR   FRAME 1.100 25.500
         6 POOR   FRAME 1.180 25.900
         7 MEDIUM BRICK 1.278 33.500
         8 MEDIUM BRICK 1.289 36.500
         9 MEDIUM BRICK 1.320 38.500
        10 MEDIUM BRICK 1.328 37.000
        11 GOOD   BRICK 1.337 34.555
        12 MEDIUM BRICK 1.366 35.500
        13 POOR   FRAME 1.382 22.000
        14 MEDIUM BRICK 1.387 39.500
        15 MEDIUM BRICK 1.426 43.600
        16 MEDIUM BRICK 1.436 39.100
        17 MEDIUM BRICK 1.440 37.000
        18 GOOD   BRICK 1.447 39.000
        19 MEDIUM BRICK 1.450 36.800
        20 GOOD   BRICK 1.498 42.800
        21 GOOD   BRICK 1.530 46.600
        22 GOOD   BRICK 1.612 43.750
        23 GOOD   BRICK 1.616 49.900
        24 GOOD   BRICK 1.675 35.000
        25 MEDIUM BRICK 1.694 44.500
        26 MEDIUM BRICK 1.704 48.300
        27 MEDIUM BRICK 1.708 42.900
        28 MEDIUM BRICK 1.711 43.500
        29 MEDIUM BRICK 1.716 49.600
        30 MEDIUM BRICK 1.731 54.000
        31 MEDIUM BRICK 1.740 48.500
        32 MEDIUM BRICK 1.741 49.900
        33 MEDIUM BRICK 1.741 48.900
        34 MEDIUM BRICK 1.764 39.500
        35 GOOD   BRICK 1.833 55.500
        36 MEDIUM BRICK 1.934 46.000
        37 MEDIUM BRICK 1.977 47.900
        38 GOOD   BRICK 2.012 52.800
        39 GOOD   BRICK 2.049 56.350
        40 GOOD   BRICK 2.054 58.500
        41 GOOD   BRICK 2.207 61.350
        42 GOOD   FRAME 2.233 75.000
        ;

11. SAS example (#1 continued)

    The program: Part 3 - the Procedures

        PROC PRINT DATA=ONE; RUN;

        PROC FREQ DATA=ONE; TABLE SP_INT;
           TITLE3 'Frequency table of Sale Price ($1000)'; RUN;
        PROC FREQ DATA=ONE; TABLE EXTERIOR;
           TITLE3 'Frequency table of house Exterior (F&W Table 1.3)'; RUN;
        PROC FREQ DATA=ONE; TABLE QUALITY;
           TITLE3 'Frequency table of house Quality (F&W Table 1.4)'; RUN;
        PROC FREQ DATA=ONE; TABLE SP_INT;
           TITLE3 'Frequency table of house Sale Price (F&W Table 1.5)'; RUN;

        PROC CHART DATA=ONE; VBAR QUALITY;
           TITLE3 'Frequency table of house Quality (F&W Fig 1.3)'; RUN;
        OPTIONS PS=40;
        PROC CHART DATA=ONE; VBAR EXTERIOR;
           TITLE3 'Bar chart of house Exterior (F&W Fig 1.1)'; RUN;
        PROC CHART DATA=ONE; VBAR SP;
           TITLE3 'Bar chart of house Sale Price (F&W Fig 1.2)'; RUN;
        PROC CHART DATA=ONE; HBAR QUALITY;
           TITLE3 'Bar chart of house Quality (F&W Fig 1.3)'; RUN;
        OPTIONS PS=61;

        PROC UNIVARIATE DATA=ONE PLOT; VAR SP;
           TITLE3 'Frequency table of house Sale Price'; RUN;

12. SAS example (#1 continued)

        PROC PRINT DATA=ONE; RUN;

    Output:

        Analysis of house sale price data
        Table 1.1 from Freund & Wilson, 1997

        OBS  OBS  QUALITY  EXTERIOR   DSQF      SP  SP_INT
          1    1  POOR     FRAME     0.816  19.000       1
          2    2  POOR     FRAME     0.907  19.800       1
          3    3  POOR     FRAME     0.938  15.000       1
          4    4  POOR     FRAME     1.032  18.900       1
          5    5  POOR     FRAME     1.100  25.500       2
          6    6  POOR     FRAME     1.180  25.900       2
          7    7  MEDIUM   BRICK     1.278  33.500       3
          8    8  MEDIUM   BRICK     1.289  36.500       3
          9    9  MEDIUM   BRICK     1.320  38.500       3
         10   10  MEDIUM   BRICK     1.328  37.000       3
         11   11  GOOD     BRICK     1.337  34.555       3
         12   12  MEDIUM   BRICK     1.366  35.500       3
         13   13  POOR     FRAME     1.382  22.000       2
         14   14  MEDIUM   BRICK     1.387  39.500       3
         15   15  MEDIUM   BRICK     1.426  43.600       4
         16   16  MEDIUM   BRICK     1.436  39.100       3
         17   17  MEDIUM   BRICK     1.440  37.000       3
         18   18  GOOD     BRICK     1.447  39.000       3
         19   19  MEDIUM   BRICK     1.450  36.800       3
         20   20  GOOD     BRICK     1.498  42.800       4
         21   21  GOOD     BRICK     1.530  46.600       4
         22   22  GOOD     BRICK     1.612  43.750       4
         23   23  GOOD     BRICK     1.616  49.900       4
         24   24  GOOD     BRICK     1.675  35.000       3
         25   25  MEDIUM   BRICK     1.694  44.500       4
         26   26  MEDIUM   BRICK     1.704  48.300       4
         27   27  MEDIUM   BRICK     1.708  42.900       4
         28   28  MEDIUM   BRICK     1.711  43.500       4
         29   29  MEDIUM   BRICK     1.716  49.600       4
         30   30  MEDIUM   BRICK     1.731  54.000       5
         31   31  MEDIUM   BRICK     1.740  48.500       4
         32   32  MEDIUM   BRICK     1.741  49.900       4
         33   33  MEDIUM   BRICK     1.741  48.900       4
         34   34  MEDIUM   BRICK     1.764  39.500       3
         35   35  GOOD     BRICK     1.833  55.500       5
         36   36  MEDIUM   BRICK     1.934  46.000       4
         37   37  MEDIUM   BRICK     1.977  47.900       4
         38   38  GOOD     BRICK     2.012  52.800       5
         39   39  GOOD     BRICK     2.049  56.350       5
         40   40  GOOD     BRICK     2.054  58.500       5
         41   41  GOOD     BRICK     2.207  61.350       6
         42   42  GOOD     FRAME     2.233  75.000       7

13. SAS example (#1 continued)

        PROC FREQ DATA=ONE; TABLE SP_INT;
           TITLE3 'Frequency table of Sale Price ($1000)'; RUN;

    Output:

        Analysis of house sale price data
        Table 1.1 from Freund & Wilson, 1997
        Frequency table of Sale Price ($1000)

                                       Cumulative   Cumulative
        SP_INT   Frequency   Percent    Frequency      Percent
        ------------------------------------------------------
             1           4       9.5            4          9.5
             2           3       7.1            7         16.7
             3          13      31.0           20         47.6
             4          15      35.7           35         83.3
             5           5      11.9           40         95.2
             6           1       2.4           41         97.6
             7           1       2.4           42        100.0
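To make the SP_INT classes less of a black box, the sketch below repeats the same truncate-then-count logic outside SAS. It is Python, not the course's SAS code: `int(sp / 10)` stands in for `INT(SP / 10)` (both truncate, and all prices here are positive), and the rounding to one decimal place is an assumption chosen to match the printed listing.

```python
from collections import Counter

# Sale prices (SP) for the 42 houses, copied from the raw data slide
sale_price = [
    19.000, 19.800, 15.000, 18.900, 25.500, 25.900, 33.500, 36.500,
    38.500, 37.000, 34.555, 35.500, 22.000, 39.500, 43.600, 39.100,
    37.000, 39.000, 36.800, 42.800, 46.600, 43.750, 49.900, 35.000,
    44.500, 48.300, 42.900, 43.500, 49.600, 54.000, 48.500, 49.900,
    48.900, 39.500, 55.500, 46.000, 47.900, 52.800, 56.350, 58.500,
    61.350, 75.000,
]

# Stand-in for the SAS statement SP_INT = INT(SP / 10)
sp_int = [int(sp / 10) for sp in sale_price]

counts = Counter(sp_int)
n = len(sp_int)

# One row per class: value, frequency, percent, cumulative frequency, cumulative percent
table = []
cum = 0
for value in sorted(counts):
    cum += counts[value]
    table.append((value, counts[value], round(100 * counts[value] / n, 1),
                  cum, round(100 * cum / n, 1)))

for row in table:
    print(row)
```

The rows agree with the PROC FREQ listing above: for example, class 4 holds 15 of the 42 houses, or 35.7%.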

14. SAS example (#1 continued)

        PROC FREQ DATA=ONE; TABLE EXTERIOR;
           TITLE3 'Frequency table of house Exterior (F&W Table 1.3)'; RUN;

    Output:

        Analysis of house sale price data
        Table 1.1 from Freund & Wilson, 1997
        Frequency table of house Exterior (F&W Table 1.3)

                                         Cumulative   Cumulative
        EXTERIOR   Frequency   Percent    Frequency      Percent
        --------------------------------------------------------
        BRICK             34      81.0           34         81.0
        FRAME              8      19.0           42        100.0

15. SAS example (#1 continued)

        PROC FREQ DATA=ONE; TABLE QUALITY;
           TITLE3 'Frequency table of house Quality (F&W Table 1.4)'; RUN;

    Output:

        Analysis of house sale price data
        Table 1.1 from Freund & Wilson, 1997
        Frequency table of house Quality (F&W Table 1.4)

                                         Cumulative   Cumulative
        QUALITY    Frequency   Percent    Frequency      Percent
        --------------------------------------------------------
        GOOD              13      31.0           13         31.0
        MEDIUM            22      52.4           35         83.3
        POOR               7      16.7           42        100.0
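The same counting applies to a qualitative variable. The Python sketch below (not SAS; names are illustrative) tabulates QUALITY in alphabetical order, which is the order shown in the listing above. The cumulative columns depend entirely on that ordering, which is why the earlier definitions reserve cumulative frequencies for quantitative variables.

```python
from collections import Counter

# QUALITY for observations 1..42, in data order (copied from the raw data slide)
quality = (["POOR"] * 6 + ["MEDIUM"] * 4 + ["GOOD"] + ["MEDIUM"] + ["POOR"]
           + ["MEDIUM"] * 4 + ["GOOD"] + ["MEDIUM"] + ["GOOD"] * 5
           + ["MEDIUM"] * 10 + ["GOOD"] + ["MEDIUM"] * 2 + ["GOOD"] * 5)

counts = Counter(quality)
n = len(quality)

# Rows in alphabetical order of the class label
rows = []
cum = 0
for level in sorted(counts):
    cum += counts[level]
    rows.append((level, counts[level], round(100 * counts[level] / n, 1),
                 cum, round(100 * cum / n, 1)))

for row in rows:
    print(row)
```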

16. SAS example (#1 continued)

        PROC FREQ DATA=ONE; TABLE SP_INT;
           TITLE3 'Frequency table of house Sale Price (F&W Table 1.5)'; RUN;

    Output:

        Analysis of house sale price data
        Table 1.1 from Freund & Wilson, 1997
        Frequency table of house Sale Price (F&W Table 1.5)

                                       Cumulative   Cumulative
        SP_INT   Frequency   Percent    Frequency      Percent
        ------------------------------------------------------
             1           4       9.5            4          9.5
             2           3       7.1            7         16.7
             3          13      31.0           20         47.6
             4          15      35.7           35         83.3
             5           5      11.9           40         95.2
             6           1       2.4           41         97.6
             7           1       2.4           42        100.0

17. SAS example (#1 continued)

        PROC CHART DATA=ONE; VBAR QUALITY;
           TITLE3 'Frequency table of house Quality'; RUN;

    Output:

        Analysis of house sale price data
        Table 1.1 from Freund & Wilson, 1997
        Frequency table of house Quality

        Frequency
        22 +             *****
           |             *****
        21 +             *****
           |             *****
        20 +             *****
           |             *****
        19 +             *****
           |             *****
        18 +             *****
           |             *****
        17 +             *****
           |             *****
        16 +             *****
           |             *****
        15 +             *****
           |             *****
        14 +             *****
           |             *****
        13 +   *****     *****
           |   *****     *****
        12 +   *****     *****
           |   *****     *****
        11 +   *****     *****
           |   *****     *****
        10 +   *****     *****
           |   *****     *****
         9 +   *****     *****
           |   *****     *****
         8 +   *****     *****
           |   *****     *****
         7 +   *****     *****     *****
           |   *****     *****     *****
         6 +   *****     *****     *****
           |   *****     *****     *****
         5 +   *****     *****     *****
           |   *****     *****     *****
         4 +   *****     *****     *****
           |   *****     *****     *****
         3 +   *****     *****     *****
           |   *****     *****     *****
         2 +   *****     *****     *****
           |   *****     *****     *****
         1 +   *****     *****     *****
           |   *****     *****     *****
           +------------------------------
               GOOD      MEDIUM    POOR
                        QUALITY

18. SAS example (#1 continued)

        OPTIONS PS=40;
        PROC CHART DATA=ONE; VBAR EXTERIOR;
           TITLE3 'Bar chart of house Exterior (F&W Fig 1.1)'; RUN;

    Output:

        Analysis of house sale price data
        Table 1.1 from Freund & Wilson, 1997
        Bar chart of house Exterior (F&W Fig 1.1)

        Frequency
           |   *****
           |   *****
        30 +   *****
           |   *****
           |   *****
           |   *****
           |   *****
        20 +   *****
           |   *****
           |   *****
           |   *****
           |   *****
        10 +   *****
           |   *****     *****
           |   *****     *****
           |   *****     *****
           |   *****     *****
           --------------------------------
               BRICK     FRAME
                   EXTERIOR

19. SAS example (#1 continued)

        OPTIONS PS=40;
        PROC CHART DATA=ONE; VBAR SP;
           TITLE3 'Bar chart of house Sale Price (F&W Fig 1.2)'; RUN;

    Output:

        Analysis of house sale price data
        Table 1.1 from Freund & Wilson, 1997
        Bar chart of house Sale Price (F&W Fig 1.2)

        Frequency
           |                   *****
           |                   *****
           |                   *****
        15 +                   *****
           |                   *****
           |                   *****
           |                   *****
           |                   *****   *****
        10 +                   *****   *****
           |                   *****   *****
           |                   *****   *****
           |                   *****   *****
           |           *****   *****   *****
         5 +   *****   *****   *****   *****
           |   *****   *****   *****   *****
           |   *****   *****   *****   *****
           |   *****   *****   *****   *****
           |   *****   *****   *****   *****   *****   *****
           ---------------------------------------------------
                18      30      42      54      66      78
                              SP Midpoint
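The bar heights in this chart can be recovered by binning SP around the printed midpoints. The Python sketch below (not SAS) assumes each class is 12 units wide, centred on a midpoint and closed on the left. That rule is inferred from the axis labels 18, 30, ..., 78 rather than stated in the slides, and no price falls exactly on a class boundary, so the choice of closed side does not affect these counts.

```python
# Sale prices (SP) for the 42 houses, copied from the raw data slide
sale_price = [
    19.000, 19.800, 15.000, 18.900, 25.500, 25.900, 33.500, 36.500,
    38.500, 37.000, 34.555, 35.500, 22.000, 39.500, 43.600, 39.100,
    37.000, 39.000, 36.800, 42.800, 46.600, 43.750, 49.900, 35.000,
    44.500, 48.300, 42.900, 43.500, 49.600, 54.000, 48.500, 49.900,
    48.900, 39.500, 55.500, 46.000, 47.900, 52.800, 56.350, 58.500,
    61.350, 75.000,
]

# Midpoints read from the chart axis; the half-width of 6 is an assumption
midpoints = [18, 30, 42, 54, 66, 78]
half_width = 6

# Count the observations falling in [midpoint - 6, midpoint + 6)
bin_counts = [
    sum(1 for sp in sale_price if m - half_width <= sp < m + half_width)
    for m in midpoints
]

# Text bar chart, one row per class
for m, c in zip(midpoints, bin_counts):
    print(f"{m:>3} | {'*' * c} {c}")
```

Under these assumptions the counts are 5, 6, 18, 11, 1 and 1, matching the bar heights in the chart.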

20. SAS example (#1 continued)

        OPTIONS PS=40;
        PROC CHART DATA=ONE; HBAR QUALITY;
           TITLE3 'Bar chart of house Quality (F&W Fig 1.3)'; RUN;
        OPTIONS PS=61;

    Output:

        Analysis of house sale price data
        Table 1.1 from Freund & Wilson, 1997
        Bar chart of house Quality (F&W Fig 1.3)

        QUALITY                                 Cum.              Cum.
                                          Freq  Freq  Percent  Percent
                 |
        GOOD     |*************             13    13    30.95    30.95
                 |
        MEDIUM   |**********************    22    35    52.38    83.33
                 |
        POOR     |*******                    7    42    16.67   100.00
                 |
                 -----+----+----+----+--
                      5   10   15   20
                       Frequency

21. GRAPHIC DISPLAYS OF FREQUENCIES using relative frequencies
    - HISTOGRAM or bar chart: a graphical representation of a frequency table.
      - The area of each bar is proportional to the relative frequency (r.f.) of the class.
    - FREQUENCY POLYGON: a variation of the histogram in which the relative frequency of each class is plotted at the class midpoint and adjacent points are connected with straight lines.

22. CHARACTERISTICS OF HISTOGRAMS
    - When a histogram is drawn with relative frequencies, the total area of the graph is 1.0.
    - Any subsection of a graph of relative frequencies will have an area such that 0 ≤ subsection area ≤ 1.0.
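Both properties can be checked numerically on the fish-age example from the earlier slides. In this Python sketch (not SAS) each age class is assumed to be one year wide, and the bar height is taken as r.f. divided by class width so that bar area equals the r.f.; the names are illustrative.

```python
from fractions import Fraction

# Fish-age class frequencies for ages 0..8 (from the frequency table slide)
freq = [1, 1, 0, 1, 3, 2, 1, 0, 1]
class_width = 1  # assumed: each age class spans one year
total = sum(freq)

# Bar height = r.f. / width, so bar area (height * width) equals the r.f.
heights = [Fraction(f, total) / class_width for f in freq]
areas = [h * class_width for h in heights]

# Property 1: the total area of the histogram
total_area = sum(areas)

# Property 2: the area of a subsection, here ages 4 through 6
subsection_area = sum(areas[4:7])

print(float(total_area), float(subsection_area))
```

Because every bar area is a nonnegative share of a total of 1, any subsection must lie between 0 and 1. That is the property which later lets areas under a distribution be read as probabilities.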

23. Summary
    - Frequencies are a common and useful technique for descriptive statistics.
    - We would usually do the calculations in SAS.
    - The distributions that we will use for hypothesis testing will be frequency distributions.