DESCRIPTIVE STATISTICS

# DESCRIPTIVE STATISTICS

##### Presentation Transcript

1. DESCRIPTIVE STATISTICS UNIT 4: Measures of Position (TouchText)

   • Z-Scores
   • Percentile Rankings
   • Problems and Exercises

2. Position of a Single Observation Within a Distribution

   If a student were told that he or she had received an 83 on an exam, the questions that student would most likely (and rightly) ask include:

   • 83 out of how many possible?
   • Is an 83 good or bad?
   • What was the highest score in the class?
   • What was the lowest score in the class?
   • What was the average score in the class?

   These are questions of position: of one observation (in this case, a single test score) relative to the rest of the population or sample.

3. Z-Score: Position of a Single Observation Standardized, Relative to the Mean and Standard Deviation

   One way of describing the position of a single observation relative to a distribution is to report its distance from the mean. To make this independent of units or scale, we measure this distance in units of standard deviation.

   An observation’s Z-score measures how many standard deviations it is away from the mean:

   $$Z_i = \frac{X_i - \bar{X}}{s}$$

4. Calculating the Z-Score

   To calculate the Z-score, first calculate the distribution’s mean $\bar{X}$ and standard deviation $s$; then calculate a specific observation’s Z-score as $Z_i = (X_i - \bar{X})/s$.

   * Because the deviations from the mean sum to zero, so too will the Z-scores.
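The two-step calculation can be sketched in Python. This is a minimal illustration, not part of the original unit: the `scores` list is hypothetical, and `statistics.stdev` (the sample standard deviation, with an n - 1 denominator) is assumed to match the s used on these slides.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return each observation's distance from the mean, in units of s."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator), an assumption
    return [(x - x_bar) / s for x in values]

# Hypothetical exam scores, used only for illustration.
scores = [56, 68, 71, 75, 83, 94]
zs = z_scores(scores)

print([round(z, 3) for z in zs])
# The Z-scores sum to zero, apart from floating-point rounding error.
print(abs(sum(zs)) < 1e-9)
```

The final line checks the footnote's claim numerically rather than relying on an exact zero, since floating-point sums rarely cancel perfectly.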

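5. Observations Measured in Standard Deviations Away From the Mean

   In the following exercises, calculate how many standard deviations away from the mean each observation is.

   Example:

   (a) Xi = 30, mean = 40, s = 10
   (b) Xi = 100, mean = 100, s = 6
   (c) Xi = 82.5, mean = 80, s = 5
   (d) Xi = 295, mean = 300, s = 25
   (e) Xi = 760, mean = 1000, s = 120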

6. Observations Measured in Standard Deviations Away From the Mean

   In the following exercises, calculate how many standard deviations away from the mean each observation is.

   Solutions:

   (a) Zi = (30 – 40)/10 = -10/10 = -1: one standard deviation below the mean.
   (b) Zi = (100 – 100)/6 = 0/6 = 0: exactly at the mean.
   (c) Zi = (82.5 – 80)/5 = 2.5/5 = 0.5: half a standard deviation above the mean.
   (d) Zi = (295 – 300)/25 = -5/25 = -0.2: 0.2 standard deviations below the mean.
   (e) Zi = (760 – 1000)/120 = -240/120 = -2: two standard deviations below the mean.
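The five solutions can be checked with a short Python sketch. The helper names `z_score` and `describe` are illustrative choices, not part of the original material; the numbers are the ones used in the solutions.

```python
def z_score(x, x_bar, s):
    """Distance of x from the mean x_bar, in units of the standard deviation s."""
    return (x - x_bar) / s

def describe(z):
    """Describe in words where a Z-score sits relative to the mean."""
    if z == 0:
        return "exactly at the mean"
    side = "above" if z > 0 else "below"
    return f"{abs(z):g} standard deviations {side} the mean"

# (observation, mean, standard deviation) for each exercise
exercises = [(30, 40, 10), (100, 100, 6), (82.5, 80, 5),
             (295, 300, 25), (760, 1000, 120)]

results = [z_score(x, x_bar, s) for x, x_bar, s in exercises]
for z in results:
    print(f"Z = {z:g}: {describe(z)}")
```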

7. Z-Score Within a Distribution

   How good or bad was the score of 83 in this distribution?

   Score = 83; high score = 94 (2 students); low score = 56; mean = 75.3.

8. Z-Score: Example

   In the previous example (score = 83; high score = 94, 2 students; low score = 56; mean = 75.3), the Z-score for 83 is Z83 = 1.033. This implies a standard deviation of about 7.45, since (83 – 75.3)/s = 1.033.

   So a positive Z-score of 1.033 tells us that the score is above average. What, if anything, can we say more precisely from the Z-score alone?

9. What Does and Doesn’t the Z-Score Say?

   It is generally not possible to translate an observation’s Z-score into a specific ranking or position in an ordered data set unless we know the exact distribution of that data. However, we can make the following statements:

   • Z-score = 0: Exactly at the mean (average).
   • Z-score > 0: Above the mean (average). The greater the Z-score, the greater (larger, higher, etc.) and generally the rarer the score.
   • Z-score < 0: Below the mean (average). The more negative the Z-score, the lesser (smaller, lower, etc.) and generally the rarer the score.

   * In a subsequent unit, we introduce the standard normal distribution, in which Z-scores can be translated exactly into a percentile ranking.

10. Distributions of Z-Scores

   Not only can a single observation be translated into a Z-score, but the entire data set can be re-characterized in terms of Z-scores.

   The following example was introduced in the previous unit. It is a sample of tourist expenditures of passengers leaving Bangkok Airport. On completing the task, you will be able to say how many of the observations are within one, two or three standard deviations of the mean (|Z| < 1, |Z| < 2, |Z| < 3). This provides more information about the relative ranking and rarity of an observation.

11. Example: Calculating Observations as Z-Scores

   Example: Expenditures of passengers leaving Bangkok Airport

   • In your previous MS Excel table, you should already have a column that calculates the deviations (Xi – X̄). From this, create another column that calculates the deviations per unit of standard deviation, (Xi – X̄)/s.
   • Use the Data Analysis > Histogram feature to create a frequency distribution. Use bins of width 0.5 between -3 and +3 (i.e. -3.0, -2.5, -2.0, -1.5, …, 0, 0.5, …, 2.5, 3.0).
   • Once the histogram is created, change the distribution to a relative distribution, so that you can calculate the percentage of scores within each group of data (bin).
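For readers working outside Excel, the same steps can be sketched in Python. The `expenditures` list below is hypothetical (it is not the Bangkok Airport sample), and the binning convention, where each bin is labelled by its upper edge and a final "More" bin catches anything above +3.0, is an assumption intended to resemble the Histogram tool's output.

```python
from bisect import bisect_left
from collections import Counter
from statistics import mean, stdev

# Hypothetical expenditures in baht; NOT the Bangkok Airport sample.
expenditures = [1200, 2100, 2800, 3300, 3900, 4200, 4400,
                4900, 5600, 6300, 7800, 9500, 14800]

# Step 1: express every observation as a Z-score.
x_bar, s = mean(expenditures), stdev(expenditures)
zs = [(x - x_bar) / s for x in expenditures]

# Step 2: bin upper edges -3.0, -2.5, ..., +3.0, plus a final "More" bin.
edges = [k * 0.5 for k in range(-6, 7)]
labels = [f"{e:+.1f}" for e in edges] + ["More"]

# A Z-score is counted in the first bin whose upper edge is >= the Z-score.
counts = Counter(bisect_left(edges, z) for z in zs)

# Step 3: convert counts to relative frequencies.
relative = {labels[i]: counts.get(i, 0) / len(zs) for i in range(len(labels))}

def share_within(z_values, k):
    """Fraction of observations with |Z| < k."""
    return sum(1 for z in z_values if abs(z) < k) / len(z_values)

for label, share in relative.items():
    if share:
        print(f"bin {label}: {share:.1%}")
for k in (1, 2, 3):
    print(f"|Z| < {k}: {share_within(zs, k):.1%}")
```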

12. Calculating Observations as Deviations From the Mean

   This is what your table should look like (table image and formulas not reproduced in this transcript). The formulas apply to cells E6, E7, etc.; cell H\$100 is the standard deviation and cell E\$100 is the mean.

13. Calculating Observations as Deviations From the Mean

   * If you did the exercise properly, you should have gotten a histogram similar to the one depicted at right.

   • What proportion of the scores are within one standard deviation of the mean? Two standard deviations? Three standard deviations?
   • Does this distribution appear to be symmetrical or skewed? Can you confirm this using statistics (mean and median)?

14. Calculating Observations as Deviations From the Mean

   The distribution results are shown below right. This distribution appears to be positively skewed.

   Positive skewness can be confirmed by generating descriptive statistics with the Data Analysis add-in, and noting that the mean (5,085 baht) is greater than the median (4,377 baht). A positive skewness statistic in the same output would corroborate this; kurtosis, by contrast, describes the weight of the tails rather than asymmetry.
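The mean-versus-median check can also be sketched in Python. The data are hypothetical (they do not reproduce the 5,085 and 4,377 baht figures), and `sample_skewness` implements the adjusted sample skewness formula, which is assumed here to be the statistic that the add-in reports.

```python
from statistics import mean, median, stdev

def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)**3)."""
    n = len(values)
    x_bar, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)

# Hypothetical expenditures in baht; NOT the Bangkok Airport sample.
expenditures = [1200, 2100, 2800, 3300, 3900, 4200, 4400,
                4900, 5600, 6300, 7800, 9500, 14800]

print(f"mean   = {mean(expenditures):.0f}")
print(f"median = {median(expenditures):.0f}")
print(f"mean > median: {mean(expenditures) > median(expenditures)}")
print(f"skewness positive: {sample_skewness(expenditures) > 0}")
```

A long right tail pulls the mean above the median, so both checks point the same way for this sample.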

15. Using Z-Scores Backwards: Retrieving the Raw Data From the Z-Score

   As we’ve seen, one can calculate a Z-score from an observation’s value, knowing the mean and standard deviation of the distribution. It is therefore also possible to retrieve the observation’s raw value, knowing its Z-score and the distribution’s mean and standard deviation: Xi = Zi(s) + X̄.

   Example: In a sample distribution with mean X̄ = 200 and standard deviation s = 25, a Z-score of Zi = -0.2 translates into a raw score of Xi = -0.2(25) + 200 = 195.
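A small Python sketch of the round trip, using the numbers from the example. The function names are illustrative only.

```python
def z_score(x, x_bar, s):
    """Forward direction: raw score to Z-score."""
    return (x - x_bar) / s

def raw_score(z, x_bar, s):
    """Backward direction: Z-score to raw score, X = Z * s + mean."""
    return z * s + x_bar

x = raw_score(-0.2, 200, 25)
print(round(x, 6))                     # raw score recovered from Z = -0.2
print(round(z_score(x, 200, 25), 6))   # converting back returns the original Z-score
```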

16. Percentiles

   Observations in quantitative data sets can be ordered from smallest/lowest to largest/highest, and then assigned percentiles (hundredths). An observation’s percentile indicates the percentage of observations that are below that particular observation.

   For example, in an ordered data set (smallest to largest values), the 64th percentile P64 is the point with about 64% of the observations below it and about 36% above it.

17. Calculating Percentiles

   Percentiles are calculated as follows:

   $$P_{X_i} = \frac{(\text{number of scores below } X_i) + 0.5}{n} \times 100\%$$

   • Adding 0.5 to the numerator essentially splits Xi in half, placing one half of Xi in the “below Xi” category and the other half in the “above Xi” category. This way, the two categories sum to 100%.
   • When there are two or more identical scores of Xi, the percentile calculation only counts scores strictly below Xi, essentially treating the observation as the lowest of all identical scores. The percentile ranking is strictly below the score, not “at or below”.

18. Calculating Percentiles: Example

   Example: Using the same data set introduced earlier, calculate the percentile of the score of 88.

   Answer: Rank the scores, revealing nine scores of 14 below X = 88. The percentile is therefore (9 + 0.5)/14 = 67.86%.

19. Calculating Percentiles: Further Examples

   Examples: Using the same data set as in the preceding example, calculate the percentile rankings of the following scores: 97, 81, 68 and 47.

20. Calculating Percentiles: Further Examples

   Examples: Using the same data set as in the preceding example, calculate the percentile rankings of the following scores: 97, 81, 68 and 47.

   Answers:

   • P97 = (13.5/14) x 100% = 96.43%
   • P81 = (7.5/14) x 100% = 53.57%
   • P68 = (3.5/14) x 100% = 25.00%
   • P47 = (0.5/14) x 100% = 3.57%

   * Because only half of each score is included in the “below” category, the highest score does not have a 100% percentile ranking, and the lowest score does not have a 0% percentile ranking.
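The percentile calculation can be sketched in Python. The transcript does not list the 14 scores, so the `scores` list below is a reconstruction: the values 47, 68, 76, 78, 81, 88, 89 and 97 sit at the ranks implied by the worked examples, and the remaining six values are hypothetical fill-ins.

```python
def percentile_rank(scores, x):
    """Percentile of x: (scores strictly below x + 0.5) / n * 100."""
    below = sum(1 for s in scores if s < x)
    return (below + 0.5) / len(scores) * 100

# 14 scores. Ranks 1, 4, 6, 7, 8, 10, 12 and 14 follow the worked examples;
# the other values (55, 62, 72, 84, the first 89, 93) are hypothetical fill-ins.
scores = [47, 55, 62, 68, 72, 76, 78, 81, 84, 88, 89, 89, 93, 97]

for x in (88, 97, 81, 68, 47):
    print(f"P{x} = {percentile_rank(scores, x):.2f}%")
```

Note how the tied scores of 89 share one percentile: both are treated as the lowest of the identical scores, as the rule for ties describes.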

21. Using MS Excel for Ranks and Percentiles

   MS Excel’s Data Analysis Add-In (Data > Data Analysis > Rank and Percentile) also calculates rank and percentile from a set of data. However, Excel (a) ranks the data from highest (#1) to lowest (#n) and (b) includes the whole score (rather than half) in the numerator of its calculation.

   • When there is more than one instance of the same value, MS Excel’s formula also calculates the percentile relative to scores strictly below the observation, plus the (one only) observation itself.
   • The two alternative formulas converge in value for large n.

22. Raw Data From Percentiles: Example

   If one wants to know the raw score associated with a particular percentile, use the percentile formula backwards:

   $$\text{Raw Score Rank} = \frac{\text{Percentile} \times n}{100}$$

   However, instead of simply adding 0.5, make the following adjustment: (a) if the Raw Score Rank is not a whole number, round up to the next whole number; or (b) if the Raw Score Rank is a whole number, take the average of the score at that rank and the next-highest score in the ordered set.

23. Raw Data From Percentiles: Example

   Example: Using the same data set introduced earlier, calculate the raw score associated with a percentile ranking of 67.86.

   Answer: The unadjusted Raw Score Rank is (67.86 x 14)/100 = 9.5. Rounding up gives a rank of 10. The 10th-ranked score (counting from the lowest) is X = 88, exactly reversing the earlier percentile calculation.

24. Calculating Raw Scores From Percentiles: Further Examples

   Examples: Using the same data set as in the preceding examples, calculate the raw score associated with each of the following percentiles: 94, 80, 50, 39.

25. Calculating Raw Scores From Percentiles: Further Examples

   Examples: Using the same data set as in the preceding examples, calculate the raw score associated with each of the following percentiles: 94, 80, 50, 39.

   Rank and raw score:

   • 94th percentile: rank = (94 x 14)/100 = 13.16. Round up: rank = 14th (top score of 97).
   • 80th percentile: rank = (80 x 14)/100 = 11.2. Round up: rank = 12th (score of 89).
   • 50th percentile: rank = (50 x 14)/100 = 7, a whole number. Average the 7th and 8th scores: (78 + 81)/2 = 79.5.
   • 39th percentile: rank = (39 x 14)/100 = 5.46. Round up: rank = 6th (score of 76).
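The round-up-or-average rule can be sketched in Python. As before, the 14 scores are not listed in the transcript, so `scores` is a reconstruction in which only the values at the ranks named in the worked examples come from the source; the rest are hypothetical. The function name and the tolerance-based whole-number test are implementation choices made for this sketch.

```python
import math

def score_at_percentile(scores, p):
    """Raw score for percentile p (0 < p <= 100) using rank = p * n / 100."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(scores)
    n = len(ordered)
    rank = p * n / 100
    nearest = round(rank)
    if math.isclose(rank, nearest):
        # Whole-number rank: average that score with the next-highest one
        # (or use the top score alone if there is no next one).
        lower = ordered[nearest - 1]
        upper = ordered[min(nearest, n - 1)]
        return (lower + upper) / 2
    # Fractional rank: round up to the next whole rank (1-based).
    return ordered[math.ceil(rank) - 1]

# Ranks 1, 4, 6, 7, 8, 10, 12 and 14 follow the worked examples;
# the other values are hypothetical fill-ins.
scores = [47, 55, 62, 68, 72, 76, 78, 81, 84, 88, 89, 89, 93, 97]

for p in (67.86, 94, 80, 50, 39):
    print(f"percentile {p}: raw score {score_at_percentile(scores, p)}")
```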

26. Special Percentiles: The Median

   Recall that the median is one of three measures of central tendency. It is the score with an equal number of scores below it as above it. In percentile terms, the median is simply the 50th percentile.

   Median: If n is an odd number, the rank n/2 is not a whole number, so rounding up lands on the middle score. If n is an even number, the rank is a whole number, so averaging that score with the next-highest score takes the average of the two middle scores. These steps are exactly what the median rule prescribes.

   Previous example: n = 14 (even), so the median is the average of the two middle scores, ranked 7th and 8th: (78 + 81)/2 = 79.5, just as above.
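A brief Python check that the 50th-percentile rule agrees with the usual median, for both even and odd n. The 14-score list is a partial reconstruction (only the 7th and 8th values, 78 and 81, are confirmed by the source), and the odd-sized list is hypothetical.

```python
import math
from statistics import median

def median_by_rank(scores):
    """Median via the percentile rule: rank = 50 * n / 100 = n / 2."""
    ordered = sorted(scores)
    n = len(ordered)
    if n % 2 == 0:
        # Whole-number rank n/2: average that score and the next-highest one.
        r = n // 2
        return (ordered[r - 1] + ordered[r]) / 2
    # Fractional rank: round up to reach the single middle score.
    return ordered[math.ceil(n / 2) - 1]

even_scores = [47, 55, 62, 68, 72, 76, 78, 81, 84, 88, 89, 89, 93, 97]  # n = 14
odd_scores = [56, 68, 71, 75, 83]                                        # n = 5, hypothetical

print(median_by_rank(even_scores), median(even_scores))
print(median_by_rank(odd_scores), median(odd_scores))
```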

27. Special Percentiles: The Median and the Interquartile Range

   The Median, the middle-ranked score with exactly one half of the scores higher and one half lower, can also be described as: (a) the 50th percentile, (b) the 5th decile, or (c) the 2nd quartile.

   The Interquartile Range (IQR) is the range of the middle two quartiles (Q3 – Q1).

28. Identifying “Outliers”: Using the Interquartile Range

   An Outlier is an extreme score either far below or far above the other scores, and is viewed as non-representative of the data set. Outliers can result in statistics misrepresenting the data set, and must be dealt with. Often, outliers are dealt with by focusing only on the interquartile range of the data set.

   Outliers are often identified using some rule comparing raw scores to the Interquartile Range. For example, the following can be considered outliers:

   • (low) Scores less than Q1 – 0.5 x (Q3 – Q1)
   • (high) Scores more than Q3 + 0.5 x (Q3 – Q1)

   The multiplier is a matter of convention; a widely used alternative (Tukey’s rule) uses 1.5 in place of 0.5.
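An IQR-based outlier rule can be sketched in Python with the multiplier left as a parameter `k`. Two assumptions to note: the data are hypothetical, and the quartiles come from `statistics.quantiles` (Python 3.8+), whose default interpolation method is not the same as the round-up-or-average rule used in this unit, so quartile values may differ slightly from a hand calculation.

```python
from statistics import quantiles

def iqr_outliers(values, k):
    """Return scores below Q1 - k*IQR or above Q3 + k*IQR, in ascending order."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method, an assumption
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in sorted(values) if x < low_fence or x > high_fence]

# Hypothetical scores with one unusually high value.
scores = [46, 52, 55, 58, 60, 60, 63, 66, 69, 71, 74, 85, 98]

print(iqr_outliers(scores, 0.5))  # narrow fences flag more scores
print(iqr_outliers(scores, 1.5))  # wider fences flag only the most extreme score
```

Comparing the two calls shows how strongly the choice of multiplier affects which scores are labelled outliers.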

29. Special Percentiles: Deciles and Quartiles

   Deciles divide a data set into ten ordered and ranked subsets, each containing 10% of the data. Nine decile markers are needed to divide the whole data set into ten parts. Quartiles divide a data set into four ordered and ranked quarters. Three quartile markers are needed to divide the whole data set into four quarters.

   To calculate decile and quartile scores, use the percentile rank formula, Rank = (Percentile x n)/100, with the same adjustment as before: round a fractional rank up, and for a whole-number rank average that score with the next-highest one.

   Example: In a data set of n = 220, the first quartile (25th percentile) would be at:

   $$\text{Rank} = \frac{25 \times 220}{100} = 55 \quad \text{(a whole number)}$$

   So the score is the average of the 55th and 56th ranked scores (counting from the lowest).
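The rank arithmetic for quartile and decile markers can be sketched in Python. The function name `marker_ranks` is hypothetical; it returns the 1-based rank or ranks (counted from the lowest score) whose scores locate the marker, without needing the data themselves.

```python
import math

def marker_ranks(p, n):
    """Ranks locating percentile p (0 < p < 100) in n ordered scores.

    One rank means "take that score"; two ranks mean "average those two scores".
    """
    rank = p * n / 100
    nearest = round(rank)
    if math.isclose(rank, nearest):
        return (nearest, nearest + 1)   # whole number: average with the next score
    return (math.ceil(rank),)           # fractional: round up

n = 220
for name, p in (("Q1", 25), ("Q2", 50), ("Q3", 75)):
    print(name, marker_ranks(p, n))
for d in range(1, 10):
    print(f"D{d}", marker_ranks(10 * d, n))
```

With n = 220 every quartile and decile rank is a whole number, so each marker is an average of two adjacent scores; with n = 14, by contrast, the first quartile rank is 3.5 and rounds up to the 4th score.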

31. Box Plots

   A Box Plot illustrates the interquartile range, the median, and the high and low values (excluding outliers) of a data set. Outliers are identified separately.

   Example: low score = 46; Q1 (median) = 60; Q3 = 71; high score (excluding outlier) = 85; outlier = 98. Because the median is towards the left of the interquartile box, this data set would appear to be positively skewed.

   * Box plots provide a useful visual aid for comparing two or more data sets. However, they are not part of the MS Excel chart menu, and are not utilized here.

32. End of Unit 4: Questions and Problems

   The following problems require the calculation of various statistics using MS Excel. The problems are linked to actual Excel spreadsheets, where students should do their work.