
Chapter 3 Data Summary Using Descriptive Measures



Presentation Transcript


  1. KVANLI PAVUR KEELING Chapter 3: Data Summary Using Descriptive Measures

  2. Chapter Objectives • At the completion of this chapter, you should be able to define and use the following measures: ∙ Measures of Central Tendency: Mean, Median, Mode and Midrange ∙ Measures of Variation: Range, Standard Deviation, Variance, and Coefficient of Variation

  3. Chapter Objectives - Continued • At the completion of this chapter, you should be able to define and use the following measures: ∙ Measures of Position: Percentiles, Quartiles, and z-scores ∙ Measures of Shape: Skewness and Kurtosis

  4. Summarizing a Sample • Chapter 2 described a sample using a graph or chart • This chapter summarizes a sample by crunching a number or two, such as an average • We refer to these numbers as descriptive measures • There are four different types of descriptive measures

  5. Descriptive Measures • There are measures of: • central tendency • variation • position • shape • Consider a sample consisting of the number of purchased textbooks this semester for 5 randomly selected students • The sample values are {6, 9, 7, 23, 5} Here, the sample size is n = 5

  6. Measures of Central Tendency • These are: • mean • median • midrange • mode • These determine where the “middle” of the sample is; that is, a “typical” value • The mode is the value that occurs the most often

  7. The Sample Mean • The sample mean is the sample average • Our sample: {6, 9, 7, 23, 5} • The sample mean is $(6 + 9 + 7 + 23 + 5)/5 = 50/5 = 10$ books • The symbol for the sample mean is $\bar{x}$, read as “x bar” • So, $\bar{x} = 10$

  8. The Sample Median • To find the median, you must first put the values in order from smallest to largest • For our sample, this would be {5, 6, 7, 9, 23} • When n is odd, the median is the value in the middle of the ordered data • The symbol for the median is Md • Here, Md = 7 • In general for n odd, Md is the $\frac{n + 1}{2}$th value in the ordered data • Here, this would be the 3rd value

  9. The Sample Median – n is Even • Consider this sample: {2, 4, 8, 12, 16, 18} (n = 6) • When n is even, the median is the average of the middle two values • Here, Md = $(8 + 12)/2 = 10$ books • In general for n even, Md is the average of the $\frac{n}{2}$th value in the ordered data and the next one

  10. The Sample Midrange • The midrange is the average of the lowest (L) and highest (H) sample values • The symbol for the midrange is Mr • Mr = $\frac{L + H}{2}$ • The textbook sample is {6, 9, 7, 23, 5}, so L = 5 and H = 23 • Here, Mr = $(5 + 23)/2 = 14$

  11. The Sample Mode • The mode is that value that occurs the most often in the sample • For the textbook example, there is no mode since there are no repeat values • If there is a 2-way tie, you state that the modes are ____ and ____ • For continuous data, don’t bother looking for a mode
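The four measures of central tendency can be checked with a few lines of Python. This is a minimal sketch using the standard `statistics` module; the helper names `midrange` and `modes` are illustrative and not part of the slides. `modes` returns an empty list when no value repeats, matching the "no mode" convention used for the textbook sample.

```python
from collections import Counter
from statistics import mean, median

def midrange(values):
    """Average of the lowest (L) and highest (H) sample values."""
    return (min(values) + max(values)) / 2

def modes(values):
    """Values that occur most often; an empty list if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:  # every value occurs once, so there is no mode
        return []
    return [value for value, count in counts.items() if count == top]

textbooks = [6, 9, 7, 23, 5]          # n = 5 (odd)
even_sample = [2, 4, 8, 12, 16, 18]   # n = 6 (even)

print("mean:", mean(textbooks))
print("median (n odd):", median(textbooks))
print("median (n even):", median(even_sample))
print("midrange:", midrange(textbooks))
print("modes:", modes(textbooks))
```

A two-way tie such as `modes([1, 2, 2, 3, 3])` returns both tied values.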

  12. More on the Sample Mode • If your company manufactures clothing, the sample mode is more likely to be of interest than the other three measures of central tendency • Example: Your company manufactures hats • The statistic of interest in a sample of head sizes would be the most popular head size, since we should manufacture more hats of that size • The mean (say, 6.82) would be of little interest • Ditto for the median and midrange

  13. Choosing between the mean, median, and midrange • Consider the textbook sample {5, 6, 7, 9, 23} • The value of 23 is called an outlier since it is unusually large and doesn’t fit with the other four values • When trying to determine the middle (a “typical value”), which of these three measures of central tendency was most affected by this outlier?

  14. The Effect of an Outlier • This outlier had the biggest impact on the midrange • Its value is 14, which exceeds 4 of the 5 sample values • This outlier also had a big impact on the mean • Its value is 10, which also exceeds 4 of the 5 sample values • But the outlier had NO effect on the sample median

  15. Outliers and the Median • To illustrate this, suppose the sample values are {5, 6, 7, 9, 2300} • The midrange and mean are considerably larger than before • But the sample median is still 7 • It didn’t even change!
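The comparison between the two samples can be reproduced directly. This is a small sketch under the same definitions as the slides; the `midrange` helper is an illustrative name.

```python
from statistics import mean, median

def midrange(values):
    """Average of the lowest and highest sample values."""
    return (min(values) + max(values)) / 2

original = [5, 6, 7, 9, 23]
extreme = [5, 6, 7, 9, 2300]   # same sample with a far larger outlier

for label, sample in (("original", original), ("extreme", extreme)):
    print(label,
          "mean =", mean(sample),
          "median =", median(sample),
          "midrange =", midrange(sample))
```

The mean and midrange jump with the outlier, while the median is 7 for both samples.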

  16. Moral to the Story • If you expect (or know) your sample contains outliers, use the median. Otherwise, use the mean. • Examples: ∙ Incomes usually contain a few very large values. Use the median. ∙ House prices in a particular neighborhood typically contain a few very large values. Use the median.

  17. Calculators • Most any calculator will work in this course. • If you prefer to use the TI-83 or TI-86, there are links on the DSCI 2710 website that show you how to crunch numbers on these two calculators. • If you’re going to purchase a calculator, I’d recommend the TI BA II Plus. It works very well in this course and is easy to use.

  18. Measures of Variation • These are: • range (R) • variance ($s^2$) • standard deviation (s) • coefficient of variation (CV) The most popular

  19. The Sample Range (R) • The range is the difference of the highest and lowest sample values • R = H – L • Textbook sample: {5, 6, 7, 9, 23} • R = 23 – 5 = 18 • The sample range is a good measure of variation (and easy to compute) for small samples (n ≤ 10)

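  20. The Sample Variance • Deviations from the sample mean ($\bar{x} = 10$) and their squares for the textbook sample:

      $x$      $x - \bar{x}$        $(x - \bar{x})^2$
      5        5 – 10 = -5          25
      6        6 – 10 = -4          16
      7        7 – 10 = -3          9
      9        9 – 10 = -1          1
      23       23 – 10 = 13         169
      Sum      0 (always is)        220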

  21. The Sample Variance • The sum of the squared deviations (220) is then divided by n – 1 (not the sample size (n) as you might expect) • This is the sample variance ($s^2$) • $s^2 = 220/(5 - 1) = 220/4 = 55$ • In general, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

  22. The Sample Standard Deviation • The sample standard deviation (s) is the square root of the variance • $s = \sqrt{s^2}$ • Here, $s = \sqrt{55} = 7.416$ • The units on the standard deviation are the same as the units on the sample data • For this example, s = 7.416 books
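The variance and standard deviation calculation for the textbook sample can be retraced step by step. The sketch below follows the slides' order (deviations, squared deviations, divide by n – 1, square root); the variable names are illustrative.

```python
from math import sqrt

sample = [5, 6, 7, 9, 23]
n = len(sample)

x_bar = sum(sample) / n                      # sample mean
deviations = [x - x_bar for x in sample]     # x - x_bar for each value
squared = [d ** 2 for d in deviations]       # squared deviations

variance = sum(squared) / (n - 1)            # divide by n - 1, not n
std_dev = sqrt(variance)                     # same units as the data (books)

print("sum of deviations:", sum(deviations))
print("sum of squared deviations:", sum(squared))
print("variance:", variance)
print("standard deviation:", round(std_dev, 3))
```

The standard library's `statistics.variance` and `statistics.stdev` use the same n – 1 divisor, so they can serve as a cross-check.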

  23. Using a Calculator • When deriving the standard deviation, we first found the variance and then found the square root of this value • When using a calculator, you reverse this sequence: ∙ enter the sample values ∙ hit the standard deviation key ∙ if you want the variance, hit the $x^2$ key • This is the way to do it!

  24. The Sample Coefficient of Variation • The coefficient of variation (CV) is useful for comparing variation in two or more samples • Consider the following two samples: ∙ Sample #1: {5, 6, 7, 9, 23} (the textbook sample) ∙ Sample #2: {500, 600, 700, 900, 2300} • Question: Which sample has more variation?

  25. The Sample Coefficient of Variation • For sample #1: $\bar{x} = 10$ and s = 7.416 • For sample #2, it turns out that the previous mean and standard deviation are simply multiplied by 100; that is, $\bar{x} = 1000$ and s = 741.6 • If you compared the two standard deviations, you might conclude that sample #2 has more variation • But these sample values are simply larger

  26. The Sample Coefficient of Variation • In fact, relative to the mean, the variation in these two samples is the same • To effectively compare the variation in these two samples, you should compute the coefficient of variation (CV) for each sample, where $CV = \frac{s}{\bar{x}} \cdot 100$

  27. The Sample Coefficient of Variation • For sample #1, $CV = \frac{7.416}{10} \cdot 100 = 74.16$ • For sample #2, $CV = \frac{741.6}{1000} \cdot 100 = 74.16$ • For both samples, the standard deviation is 74.16% of the mean • MORAL: Don’t compare sample standard deviations unless the sample means are about the same
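The two CV values can be verified with a short sketch. The function name `coefficient_of_variation` is illustrative; it relies on `statistics.stdev`, which uses the n – 1 divisor like the slides.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(values) / mean(values) * 100

sample_1 = [5, 6, 7, 9, 23]
sample_2 = [500, 600, 700, 900, 2300]

cv_1 = coefficient_of_variation(sample_1)
cv_2 = coefficient_of_variation(sample_2)

print("s for sample #1:", round(stdev(sample_1), 3))
print("s for sample #2:", round(stdev(sample_2), 1))
print("CV for sample #1:", round(cv_1, 2))
print("CV for sample #2:", round(cv_2, 2))
```

Although the standard deviations differ by a factor of 100, the CVs agree, which is the point of the comparison.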

  28. Describing a Population • Populations also have a mean, variance, and standard deviation • Population (size is N): the population mean is μ (mu, pronounced “myoo”) and the population standard deviation is σ (sigma) • Sample (size is n): the sample mean is $\bar{x}$ and the sample standard deviation is s

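  29. The Dreaded Formulas • Sample: $\bar{x} = \frac{\sum x}{n}$, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, $s = \sqrt{s^2}$ • Population: $\mu = \frac{\sum x}{N}$, $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, $\sigma = \sqrt{\sigma^2}$ • For the population, you divide by N (not N - 1) • $\sigma^2$ is the population variance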
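The difference between dividing by N and dividing by n – 1 is easy to see numerically. The sketch below treats the five textbook values first as a sample and then, purely as a hypothetical for illustration, as if they were an entire population. It uses the standard library functions `variance`/`stdev` (divisor n – 1) and `pvariance`/`pstdev` (divisor N).

```python
from statistics import pstdev, pvariance, stdev, variance

values = [5, 6, 7, 9, 23]

# Treated as a sample: sum of squared deviations (220) divided by n - 1 = 4
sample_var = variance(values)
sample_sd = stdev(values)

# Treated (hypothetically) as a whole population: 220 divided by N = 5
population_var = pvariance(values)
population_sd = pstdev(values)

print("sample variance:", sample_var, "sample sd:", round(sample_sd, 3))
print("population variance:", population_var, "population sd:", round(population_sd, 3))
```

Because the population formula divides by the larger number, it always gives the smaller of the two results for the same data.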

  30. Measures of Position • Measures of position include • Percentiles ∙ Special percentiles (these are called quartiles) • z-scores

  31. Percentiles • Consider the 50 aptitude test scores introduced in Chapter 2 (Table 3.2) • These values must be sorted

  32. Percentiles • There are two rules to apply here • What is the 35th percentile? This uses Rule #1 • To find: 50 · .35 = 17.5 • Rule #1 applies when this product is not a counting number • Rule #1: Round this product up (always up) • So, the 35th percentile is the 18th value in the ordered array

  33. The 35th Percentile

  34. Percentiles • What is the 60th percentile? This uses Rule #2 • To find: 50 · .60 = 30 • Rule #2 applies when this product is a counting number • Rule #2: The percentile is the average of this value (the 30th value) and the next one (the 31st value)

  35. The 60th Percentile
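Rule #1 and Rule #2 can be written as one small function. This is a sketch, not the textbook's code: the name `percentile` is illustrative, and because the 50 aptitude scores are not reproduced here, the data are a made-up stand-in (the integers 1 to 50) chosen so that the k-th ordered value is simply k.

```python
def percentile(values, p):
    """p-th percentile (0 < p < 100) using the two rules from the slides."""
    ordered = sorted(values)                       # the values must be sorted
    whole, remainder = divmod(len(ordered) * p, 100)
    whole = int(whole)
    if remainder:
        # Rule #1: n * p/100 is not a counting number, so round up.
        # The rounded-up position is whole + 1, which is index `whole` (0-based).
        return ordered[whole]
    # Rule #2: n * p/100 is a counting number, so average that value and the next.
    return (ordered[whole - 1] + ordered[whole]) / 2

# Hypothetical stand-in data: 50 values where the k-th ordered value equals k
scores = list(range(1, 51))

print("35th percentile:", percentile(scores, 35))   # 50 * .35 = 17.5 -> 18th value
print("60th percentile:", percentile(scores, 60))   # 50 * .60 = 30 -> average of 30th and 31st
```

With this stand-in data, the 35th percentile is the 18th value and the 60th percentile is halfway between the 30th and 31st values, mirroring the positions found on the slides.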

  36. Quartiles • These are special percentiles • There are three quartiles (Q1, Q2, and Q3) • Q1 is the 25th percentile • Q2 is the 50th percentile • Q3 is the 75th percentile

  37. Quartiles • Using the 50 aptitude scores, determine Q1 • This is the 25th percentile: 50 · .25 = 12.5 • Not a counting number. So, use Rule #1 • So, Q1 is the 13th value in the ordered array • This value is 46

  38. Quartiles • Q2 is the 50th percentile: 50 · .50 = 25 • This is a counting number. So, use Rule #2 • Q2 is the average of the 25th and the 26th values in the ordered array • Q2 = (61 + 63)/2 = 62 • This is also the sample median • These two rules guarantee that Q2 is always equal to the sample median (and the 50th percentile)

  39. Quartiles • Q3 is the 75th percentile: 50 · .75 = 37.5 • Not a counting number. So, use Rule #1 • So, Q3 is the 38th value in the ordered array • This value is 75 • It was just a fluke that the 75th percentile was equal to 75

  40. Another Measure of Position • A z-score is another measure of position • Every value in your sample has a corresponding z-score • How to find: z-score = $\frac{x - \bar{x}}{s}$, where $x$ is the sample value, $\bar{x}$ is the sample mean, and $s$ is the sample standard deviation • The value of a z-score is how many standard deviations that sample value is to the left or right of the sample mean

  41. Finding a z-score • $\bar{x} = 60.36$ • $s = 18.61$

  42. Finding a z-score • For the sample value x = 90, the corresponding z-score is $z = \frac{90 - 60.36}{18.61} = 1.59$ • 90 is 1.59 standard deviations to the right of the mean • A z-score is positive if the sample value lies to the right of the mean and is negative if the sample value lies to the left of the mean • Typically, about half the z-scores will be positive and about half will be negative
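The z-score arithmetic for the score of 90 can be checked with a one-line function. This sketch uses the mean (60.36) and standard deviation (18.61) quoted on the slides; the function name `z_score` is illustrative.

```python
def z_score(x, sample_mean, sample_sd):
    """Number of standard deviations that x lies from the sample mean."""
    return (x - sample_mean) / sample_sd

z = z_score(90, 60.36, 18.61)
print(round(z, 2))   # positive, so 90 lies to the right of the mean
```

A value below the mean, such as `z_score(40, 60.36, 18.61)`, gives a negative z-score.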

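  43. Interpreting a z-score • These results are usually true: • Approx. 68% of the z-scores lie between -1 and 1 • Approx. 95% lie between -2 and 2 • Nearly all lie between -3 and 3 • Example callout on the slide: “Your z-score is 2.3” [Figure: z-score axis running from -3 to 3]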

  44. Interpreting a z-score - Assumptions • The result on the previous slide is called the Empirical Rule • This rule assumes that the population from which you got the sample is bell-shaped • This means that if you were able to get the entire population and make a histogram of it, it would resemble the histogram on the next slide • This is generally (approximately) true, but not always

  45. Interpreting a z-score - Assumptions
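The Empirical Rule can be illustrated by simulation. The sketch below draws made-up bell-shaped data with `random.gauss` (the mean of 60 and standard deviation of 18 are arbitrary assumptions, not the aptitude scores), converts every value to a z-score, and counts the share falling within 1, 2, and 3 standard deviations of the mean.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable
# Hypothetical bell-shaped sample; the parameters 60 and 18 are arbitrary
sample = [random.gauss(60, 18) for _ in range(10_000)]

x_bar, s = mean(sample), stdev(sample)
z_scores = [(x - x_bar) / s for x in sample]

def share_within(k):
    """Proportion of z-scores between -k and k."""
    return sum(-k <= z <= k for z in z_scores) / len(z_scores)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.3f}")
```

For data that are not bell-shaped (for example, strongly skewed data), the same counts can differ noticeably from 68% and 95%, which is why the rule carries the assumption stated on the slide.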

  46. Another Measure of Variation • The interquartile range (IQR) is another measure of variation and is the difference of the third and first quartiles • IQR = Q3 – Q1 • In the aptitude test scores, Q3 = 75 (the 75th percentile) and Q1 = 46 (the 25th percentile) • IQR = 75 – 46 = 29 and so the middle 50% of the sample values cover a range of 29 • The larger this is, the more variation there is in the sample data
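As a sketch of how the IQR follows from the quartile rules, the example below computes Q1, Q3, and the IQR for a small made-up sample of ten values (hypothetical data, not the aptitude scores). The helper `percentile` is an illustrative implementation of Rule #1 and Rule #2.

```python
def percentile(values, p):
    """p-th percentile (0 < p < 100) using Rule #1 and Rule #2."""
    ordered = sorted(values)
    whole, remainder = divmod(len(ordered) * p, 100)
    whole = int(whole)
    if remainder:                                    # Rule #1: round up
        return ordered[whole]
    return (ordered[whole - 1] + ordered[whole]) / 2  # Rule #2: average two values

# Hypothetical sample with n = 10
data = [12, 15, 18, 20, 22, 25, 28, 31, 35, 40]

q1 = percentile(data, 25)   # 10 * .25 = 2.5 -> 3rd value
q3 = percentile(data, 75)   # 10 * .75 = 7.5 -> 8th value
iqr = q3 - q1

print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
```

Unlike the range, the IQR ignores the lowest and highest quarter of the data, so a single outlier has little effect on it.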

  47. Measures of Shape • There are two measures in this category • skewness – A measure of the symmetry in the sample values (histogram) • kurtosis – A measure of how peaked the sample histogram is

  48. Kurtosis • We’ll give this a very light treatment (no formulas) • These two histograms illustrate high and low kurtosis: ∙ High kurtosis – very peaked ∙ Low kurtosis – very flat

  49. Skewness • Pearson’s measure of skewness: $Sk = \frac{3(\bar{x} - Md)}{s}$ • Subtract the median from the mean, multiply by 3, and divide by the standard deviation ∙ Ranges from -3 to 3 ∙ Not the formula Excel uses • The next three slides demonstrate what Sk tells you about the shape of the histogram

  50. Histogram of Symmetric Data • $\bar{x} = Md$, so Sk ≈ 0 [Figure 3.6: histogram of symmetric data; vertical axis: frequency]
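Pearson's measure of skewness is simple to compute from the mean, median, and standard deviation. The sketch below applies it to the textbook sample and to a small symmetric sample made up for illustration; the function name `pearson_skewness` is an assumption, not something from the slides.

```python
from statistics import mean, median, stdev

def pearson_skewness(values):
    """Pearson's Sk: 3 * (mean - median) / sample standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)

textbooks = [5, 6, 7, 9, 23]    # outlier on the high side
symmetric = [1, 2, 3, 4, 5]     # hypothetical symmetric sample

print("textbook sample Sk:", round(pearson_skewness(textbooks), 2))
print("symmetric sample Sk:", pearson_skewness(symmetric))
```

The textbook sample gives a positive Sk because the outlier of 23 pulls the mean above the median, while the symmetric sample gives Sk = 0 since its mean and median coincide.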
