
Chapter 3 Data Summary Using Descriptive Measures

KVANLI PAVUR KEELING. Chapter 3 Data Summary Using Descriptive Measures. Chapter Objectives. At the completion of this chapter, you should be able to define and use the following measures: ∙ Measures of Central Tendency: Mean, Median, Mode and Midrange


Presentation Transcript


  1. KVANLI PAVUR KEELING Chapter 3 Data Summary Using Descriptive Measures

  2. Chapter Objectives • At the completion of this chapter, you should be able to define and use the following measures: ∙ Measures of Central Tendency: Mean, Median, Mode and Midrange ∙ Measures of Variation: Range, Standard Deviation, Variance, and Coefficient of Variation

  3. Chapter Objectives - Continued • At the completion of this chapter, you should be able to define and use the following measures: ∙ Measures of Position: Percentiles, Quartiles, and z-scores ∙ Measures of Shape: Skewness and Kurtosis

  4. Summarizing a Sample • Chapter 2 described a sample using a graph or chart • This chapter summarizes a sample by crunching a number or two, such as an average • We refer to these numbers as descriptive measures • There are four different types of descriptive measures

  5. Descriptive Measures • There are measures of: • central tendency • variation • position • shape • Consider a sample consisting of the number of purchased textbooks this semester for 5 randomly selected students • The sample values are {6, 9, 7, 23, 5} Here, the sample size is n = 5

  6. Measures of Central Tendency • These are: • mean • median • midrange • mode • These determine where the “middle” of the sample is; that is, a “typical” value • The mode is the value that occurs most often

  7. Example • Consider a sample consisting of the number of purchased textbooks this semester for 5 randomly selected students • The sample values are {6, 9, 7, 23, 5} Here, the sample size is n = 5

  8. The Sample Mean • The sample mean is the sample average • Our sample: {6, 9, 7, 23, 5} • The sample mean is $\frac{6 + 9 + 7 + 23 + 5}{5} = \frac{50}{5} = 10$ books • The symbol for the sample mean is $\bar{x}$, read as “x bar” • So, $\bar{x} = 10$

  9. The Sample Median • The median is the center of the ordered array • Two steps are involved here: • Step 1: Order the data from smallest to largest • For our sample, this would be {5, 6, 7, 9, 23} • Step 2: For an odd number of data values in the distribution, Median = middle data value of the ordered data • Here, Md = 7 • In general for n odd, Md is the $\frac{n+1}{2}$th value in the ordered array • Here, this would be the $\frac{5+1}{2} = 3$rd value

  10. The Sample Median – n is Even • For an even number of data values in the distribution, $\text{Median} = \frac{\text{sum of the middle two values}}{2}$ • Consider this sample: {2, 4, 8, 12, 16, 18} (n = 6) • Here, $Md = \frac{8 + 12}{2} = 10$ books • In general for n even, Md is the average of the $\frac{n}{2}$th value and the next one
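
The two median rules (odd n and even n) can be sketched in Python as follows. This is an illustration only, assuming the sample is a plain list of numbers; `sample_median` is a hypothetical helper name, and the slides' 1-based positions are converted to Python's 0-based list indexing.

```python
def sample_median(values):
    """Median by the slide rules: for odd n take the ((n + 1) / 2)th ordered
    value; for even n average the (n / 2)th ordered value and the next one."""
    ordered = sorted(values)              # Step 1: order smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # 1-based position -> 0-based index
    k = n // 2
    return (ordered[k - 1] + ordered[k]) / 2


print(sample_median([6, 9, 7, 23, 5]))       # odd n  → 7
print(sample_median([2, 4, 8, 12, 16, 18]))  # even n → 10.0
```

The standard library's `statistics.median` follows the same two rules, so it can be used as a cross-check.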

  11. The Sample Midrange • The midrange is the average of the lowest (L) and highest (H) sample values • The symbol for the midrange is Mr • $Mr = \frac{L + H}{2}$ • The textbook sample is {6, 9, 7, 23, 5}, so L = 5 and H = 23 • Here, $Mr = \frac{5 + 23}{2} = 14$
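
The mean and midrange of the textbook sample can be checked with a short Python sketch. The helper names `sample_mean` and `midrange` are hypothetical; each is a direct translation of the formula on the slides.

```python
def sample_mean(values):
    """x̄: the sum of the sample values divided by the sample size n."""
    return sum(values) / len(values)


def midrange(values):
    """Mr = (L + H) / 2: the average of the lowest and highest sample values."""
    return (min(values) + max(values)) / 2


textbooks = [6, 9, 7, 23, 5]
print(sample_mean(textbooks))  # → 10.0
print(midrange(textbooks))     # → 14.0
```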

  12. The Sample Mode • The mode (Mo) is that value that occurs the most often in the sample • For the textbook example, there is no mode since there are no repeat values • If there is a 2-way tie, you state that the modes are ____ and ____ • For continuous data, don’t bother looking for a mode
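
A minimal sketch of the mode rules in Python, assuming discrete data held in a list: no repeated values means no mode, and ties produce several modes. `sample_modes` is a hypothetical helper, and the second sample below is made-up data used only to show a 2-way tie.

```python
from collections import Counter


def sample_modes(values):
    """Most frequent value(s); an empty list means the sample has no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # No value repeats (as in the textbook sample), so there is no mode
        return []
    # Several values can share the top count (a 2-way tie gives two modes)
    return sorted(value for value, count in counts.items() if count == top)


print(sample_modes([6, 9, 7, 23, 5]))     # → []
print(sample_modes([4, 7, 7, 9, 9, 12]))  # hypothetical data, 2-way tie → [7, 9]
```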

  13. More on the Sample Mode • If your company manufactures clothing, the sample mode is more likely to be of interest than the other three measures of central tendency • Example: Your company manufactures hats • The statistic of interest in a sample of head sizes would be the most popular head size, since we should manufacture more hats of that size • The mean (say, 6.82) would be of little interest • Ditto for the median and midrange

  14. Outliers • In statistics, outliers in a data set are data values that are very different from the other measurements in the data set • Consider the sample {5, 6, 7, 9, 54} • 54 is called an outlier since it is unusually large and doesn’t fit with the other four values • When trying to determine the middle (a “typical” value), which of these three measures of central tendency ($\bar{x}$, Md, and Mr) was most affected by this outlier?

  15. The Effect of an Outlier • This outlier had the biggest impact on the midrange • This outlier also had a big impact on the mean • But the outlier had NO effect on the sample median

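  16. Outliers and the Median • To illustrate this, suppose the sample values are {5, 6, 7, 9, 54} • The midrange and mean are considerably larger than before ($Mr = \frac{5 + 54}{2} = 29.5$ and $\bar{x} = \frac{81}{5} = 16.2$, versus Mr = 14 and $\bar{x}$ = 10 for the textbook sample {5, 6, 7, 9, 23}) • But the sample median is still 7 • It didn’t even change!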
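
The effect of the outlier can be reproduced with Python's standard `statistics` module. This sketch assumes the two samples compared on the slide; `center_summary` is a hypothetical helper that returns the mean, median, and midrange together.

```python
from statistics import mean, median


def center_summary(values):
    """Return (mean, median, midrange) for a sample."""
    return mean(values), median(values), (min(values) + max(values)) / 2


for sample in ([5, 6, 7, 9, 23], [5, 6, 7, 9, 54]):
    x_bar, md, mr = center_summary(sample)
    print(f"{sample}: mean = {x_bar}, median = {md}, midrange = {mr}")
```

Replacing 23 with 54 moves the mean from 10 to 16.2 and the midrange from 14 to 29.5, while the median stays at 7.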

  17. Moral to the Story • If you expect (or know) your sample contains outliers, use the median; otherwise, use the mean • Examples: • Incomes usually contain a few very large values, so use the median • House prices in a particular neighborhood typically contain a few very large values, so use the median

  18. Calculators • Most any calculator will work in this course. • If you prefer to use the TI-83 or TI-86, there are links on the DSCI 2710 website that show you how to crunch numbers on these two calculators. • If you’re going to purchase a calculator, I’d recommend the TI BA II Plus. It works very well in this course and is easy to use.

  19. Measures of Variation • Measures of variation determine how much the sample values jump around the mean • These are: • range (R) • variance ($s^2$) • standard deviation (s), the most popular • coefficient of variation (CV)

  20. The Sample Range (R) • The range is the difference between the highest and lowest sample values of a data distribution • R = H − L • Textbook sample: {5, 6, 7, 9, 23} • R = 23 − 5 = 18 • The sample range is a good measure of variation (and easy to compute) for small samples (n ≤ 10)

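  21. The Sample Variance • Deviations from the sample mean ($\bar{x} = 10$) for the textbook sample:

$$
\begin{array}{r|r|r}
x & x - \bar{x} & (x - \bar{x})^2 \\
\hline
5 & 5 - 10 = -5 & 25 \\
6 & 6 - 10 = -4 & 16 \\
7 & 7 - 10 = -3 & 9 \\
9 & 9 - 10 = -1 & 1 \\
23 & 23 - 10 = 13 & 169 \\
\hline
\text{Sum} & 0 & 220
\end{array}
$$

  • The sum of the deviations $x - \bar{x}$ is always 0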

  22. The Sample Variance • The sum of the squared deviations (220) is then divided by $n - 1$ (not by the sample size n, as you might expect) • This is the sample variance ($s^2$) • $s^2 = \frac{220}{5 - 1} = \frac{220}{4} = 55$ • In general, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

  23. The Sample Standard Deviation • The sample standard deviation (s) is the square root of the variance • $s = \sqrt{s^2}$ • Here, $s = \sqrt{55} \approx 7.416$ • The units on the standard deviation are the same as the units on the sample data • For this example, s = 7.416 books
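
The deviation table, the division by $n - 1$, and the square root can be reproduced step by step in Python. `sample_variance` and `sample_std` are hypothetical helper names used for illustration; the standard library's `statistics.variance` and `statistics.stdev` compute the same sample quantities.

```python
import math


def sample_variance(values):
    """s² = Σ(x − x̄)² / (n − 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    x_bar = sum(values) / n
    deviations = [x - x_bar for x in values]   # these always sum to 0
    ss = sum(d * d for d in deviations)        # sum of squared deviations
    return ss / (n - 1)                        # divide by n - 1, not n


def sample_std(values):
    """s = √s², in the same units as the data."""
    return math.sqrt(sample_variance(values))


data = [5, 6, 7, 9, 23]
print(sample_variance(data))       # 220 / 4 → 55.0
print(round(sample_std(data), 3))  # → 7.416
```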

  24. Using a Calculator • When deriving the standard deviation, we first found the variance and then took its square root • When using a calculator, you reverse this sequence: ∙ enter the sample values ∙ press the standard deviation key ∙ if you want the variance, press the $x^2$ key • This is the way to do it!

  25. The Sample Coefficient of Variation • The coefficient of variation (CV) is useful for comparing variation in two or more samples • Consider the following two samples: Sample #1 {5, 6, 7, 9, 23} the textbook sample Sample #2 {500, 600, 700, 900, 2300} • Question: Which sample has more variation?

  26. The Sample Coefficient of Variation • For sample #1: $\bar{x} = 10$ and s = 7.416 • For sample #2, it turns out that the previous mean and standard deviation are simply multiplied by 100; that is, $\bar{x} = 1000$ and s = 741.6 • If you compared the two standard deviations, you might conclude that sample #2 has more variation • But these sample values are simply larger

  27. The Sample Coefficient of Variation • In fact, relative to the mean, the variation in these two samples is the same • To effectively compare the variation in these two samples, you should compute the coefficient of variation (CV) for each sample, where $CV = \frac{s}{\bar{x}} \cdot 100$

  28. The Sample Coefficient of Variation • For sample #1, $CV = \frac{7.416}{10} \cdot 100 = 74.16$ • For sample #2, $CV = \frac{741.6}{1000} \cdot 100 = 74.16$ • For both samples, the standard deviation is 74.16% of the mean • MORAL: Don’t compare sample standard deviations unless the sample means are about the same
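
A minimal Python sketch of the coefficient of variation, using the standard library's `statistics.mean` and `statistics.stdev` (sample standard deviation, divisor $n - 1$). `coefficient_of_variation` is a hypothetical helper; the guard for a zero mean is an added assumption, since the ratio is undefined there.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = (s / x̄) · 100: the standard deviation as a percentage of the mean."""
    x_bar = mean(values)
    if x_bar == 0:
        raise ValueError("CV is undefined when the sample mean is 0")
    return stdev(values) / x_bar * 100


sample1 = [5, 6, 7, 9, 23]
sample2 = [500, 600, 700, 900, 2300]
print(round(coefficient_of_variation(sample1), 2))  # → 74.16
print(round(coefficient_of_variation(sample2), 2))  # → 74.16
```

Multiplying every value by 100 multiplies both s and $\bar{x}$ by 100, so the ratio, and therefore the CV, does not change.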

  29. Describing a Population • Populations also have a mean, variance, and standard deviation • Population (size is N): the population mean is μ (mu, pronounced “myoo”) and the population standard deviation is σ (sigma) • Sample (size is n): the sample mean is $\bar{x}$ and the sample standard deviation is s

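  30. The Dreaded Formulas • Sample: $\bar{x} = \frac{\sum x}{n}$, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, $s = \sqrt{s^2}$ • Population: $\mu = \frac{\sum x}{N}$, $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, $\sigma = \sqrt{\sigma^2}$ • For a population, you divide by N (not $N - 1$) • $\sigma^2$ is the population variance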
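
Python's standard `statistics` module provides both sets of formulas: `variance` and `stdev` divide by $n - 1$ (sample), while `pvariance` and `pstdev` divide by N (population). The sketch below applies both to the textbook values purely to show the different divisors; whether those values are a sample or a whole population is an assumption you make about the data, not something the numbers decide.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [5, 6, 7, 9, 23]

# Sample formulas: the sum of squared deviations (220) divided by n - 1 = 4,
# so s² = 55 and s = √55
print(variance(data), stdev(data))

# Population formulas: the same 220 divided by N = 5,
# so σ² = 44 and σ = √44
print(pvariance(data), pstdev(data))
```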

  31. Question Belleview College must make a report to the budget committee about the average credit hour load a full-time student carries. (A 12-credit-hour load is the minimum requirement for full-time status. For the same tuition, students may take up to 20 credit hours.) A random sample of 20 students yielded the following information (in credit hours): 15 12 15 16 12 18 20 19 12 15 18 14 16 17 15 19 12 13 12 15 • Find the median credit hour load. • Find the modal credit hour load.

  32. Question • Cabela’s in Sidney, Nebraska, is a very large outfitter that carries a broad selection of fishing tackle. It markets its products nationwide through a catalog service. A random sample of 10 spinners from Cabela’s extensive spring catalog gave the following prices (in dollars): 1.69 1.49 3.09 1.79 1.39 2.89 1.49 1.39 1.49 1.99 Compute the CV for the spinner prices at Cabela’s.

  33. Question Big Blossom Greenhouse was commissioned to develop an extra-large rose for the Rose Bowl Parade. A random sample of blossoms from Hybrid A bushes yielded the following diameters (in inches) for mature peak blooms: 2 3 3 8 10 10 • Find the sample variance and standard deviation.

  34. Question Certain kinds of tumors tend to recur. The following data represent the lengths of time, in months, for a tumor to recur after chemotherapy. Using five classes, construct a frequency distribution table (class number, class, frequency and relative frequency) for the following data set. 19 18 17 1 21 22 54 46 25 49 50 1 59 39 43 39 5 9 38 18 14 45 54 46 50 29 12 19 36 38 40 43 41 10 50 41 25 19 39 27 20 59

  35. Measures of Position • Measures of position include: • percentiles, including special percentiles called quartiles • z-scores

  36. Percentiles (the most common measure of position) • Consider the 50 aptitude test scores introduced in Chapter 2 (Table 3.2) • These values must be sorted first

  37. Percentiles • There are two rules to apply here, depending on whether the product of the sample size and the percentile (as a decimal) is a counting number • What is the 35th percentile? • To find it, compute 50 · .35 = 17.5 • This is not a counting number, so Rule #1 applies • Rule #1: Round this product up (always up) • So, the 35th percentile is the 18th value in the ordered array

  38. The 35th Percentile

  39. Percentiles • What is the 60th percentile? • To find it, compute 50 · .60 = 30 • This is a counting number, so Rule #2 applies • Rule #2: The percentile is the average of this value (the 30th value) and the next one (the 31st value)
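
The two percentile rules can be sketched in Python as below. This is an illustration, not the textbook's own code: it assumes p is a whole-number percent strictly between 0 and 100 and keeps n · p as an integer so that the "counting number or not" test is exact. `percentile` is a hypothetical helper, and because the 50 aptitude scores are not reproduced here, the example uses the two small samples from earlier slides.

```python
def percentile(values, p):
    """p-th percentile by the two slide rules; p is a whole number, 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)      # the values must be sorted first
    n = len(ordered)
    scaled = n * p                # n · (p / 100), scaled by 100 to stay exact
    if scaled % 100 != 0:
        # Rule #1: not a counting number, so round up (always up)
        k = -(-scaled // 100)     # ceiling of n · p / 100
        return ordered[k - 1]     # k-th value in the ordered array
    # Rule #2: a counting number k, so average the k-th value and the next one
    k = scaled // 100
    return (ordered[k - 1] + ordered[k]) / 2


odd_sample = [6, 9, 7, 23, 5]         # textbook sample, n = 5
even_sample = [2, 4, 8, 12, 16, 18]   # n = 6
print(percentile(odd_sample, 25))   # 5 · .25 = 1.25, Rule #1, 2nd value → 6
print(percentile(even_sample, 50))  # 6 · .50 = 3, Rule #2, (8 + 12) / 2 → 10.0
```

Software packages often use other percentile definitions (NumPy's `percentile`, for example, interpolates by default), so their answers can differ slightly from these two rules.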

  40. The 60th Percentile

  41. Quartiles • These are special percentiles • There are three quartiles (Q1, Q2, and Q3) • Q1 is the 25th percentile • Q2 is the 50th percentile • Q3 is the 75th percentile

  42. Quartiles • Using the 50 aptitude scores, determine Q1 • This is the 25th percentile: 50 · .25 = 12.5 • This is not a counting number, so use Rule #1 • So, Q1 is the 13th value in the ordered array • This value is 46

  43. Quartiles • Q2 is the 50th percentile: 50 · .50 = 25 • This is a counting number, so use Rule #2 • Q2 is the average of the 25th and the 26th values in the ordered array • Q2 = (61 + 63)/2 = 62 • This is also the sample median • These two rules guarantee that Q2 is always equal to the sample median (and the 50th percentile)

  44. Quartiles • Q3 is the 75th percentile: 50 · .75 = 37.5 • This is not a counting number, so use Rule #1 • So, Q3 is the 38th value in the ordered array • This value is 75 • It was just a fluke that the 75th percentile was equal to 75

  45. Another Measure of Position • A z-score is another measure of position • Every value in your sample has a corresponding z-score • How to find: $z = \frac{x - \bar{x}}{s}$, where x is the sample value, $\bar{x}$ is the sample mean, and s is the sample standard deviation • The value of a z-score is how many standard deviations that sample value is to the left or right of the sample mean

  46. Finding a z-score • $\bar{x} = 60.36$ • s = 18.61

  47. Finding a z-score • For the sample value 90, the corresponding z-score is $z = \frac{90 - 60.36}{18.61} \approx 1.59$ • So 90 is 1.59 standard deviations to the right of the mean • A z-score is positive if the sample value lies to the right of the mean and negative if the sample value lies to the left of the mean • Typically, about half the z-scores will be positive and about half will be negative
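
The z-score formula translates directly into Python. In this sketch `z_score` and `z_scores` are hypothetical helpers; the second one computes $\bar{x}$ and s from the sample itself with the standard `statistics` module, so that every value gets its own z-score.

```python
from statistics import mean, stdev


def z_score(x, x_bar, s):
    """How many standard deviations x lies right (+) or left (-) of the mean."""
    return (x - x_bar) / s


def z_scores(values):
    """z-score for every value in a sample, using that sample's x̄ and s."""
    x_bar, s = mean(values), stdev(values)
    return [z_score(x, x_bar, s) for x in values]


print(round(z_score(90, 60.36, 18.61), 2))  # → 1.59

# Every value in the textbook sample gets its own z-score
print([round(z, 2) for z in z_scores([5, 6, 7, 9, 23])])
```

Because the deviations $x - \bar{x}$ always sum to 0, the z-scores of a sample also sum to 0, which is why roughly half tend to be positive and half negative.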

  48. Question • Consider this data set from a random sample of blossoms from Hybrid A bushes: 3 3 8 10 10 • Find the corresponding z-scores for sample values 3 and 8.

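  49. Interpreting a z-score • These results are usually true: • Approximately 68% of the z-scores lie between −1 and 1 • Approximately 95% lie between −2 and 2 • Nearly all lie between −3 and 3 • [Figure: z-score axis from −3 to 3, with an example z-score of 2.3 marked]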

  50. Interpreting a z-score - Assumptions • The results on the previous slide are called the Empirical Rule • This rule assumes that the population from which you got the sample is bell-shaped • This means that if you were able to get the entire population and make a histogram of it, it would resemble the histogram on the next slide • This is generally (approximately) true, but not always
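
One way to see the Empirical Rule at work is to simulate a bell-shaped sample and count how many z-scores fall within ±1, ±2, and ±3. This sketch is an assumption-laden illustration: the data are hypothetical draws from a normal distribution via `random.Random.gauss`, and the mean 100, standard deviation 15, and seed are arbitrary choices. `empirical_rule_shares` is a hypothetical helper.

```python
import random
from statistics import mean, stdev


def empirical_rule_shares(values):
    """Fraction of z-scores within ±1, ±2 and ±3 standard deviations."""
    x_bar, s = mean(values), stdev(values)
    z = [(x - x_bar) / s for x in values]
    return [sum(abs(v) <= k for v in z) / len(z) for k in (1, 2, 3)]


# Hypothetical bell-shaped data: 10,000 draws from a normal distribution
rng = random.Random(2710)  # fixed seed so the run is repeatable
sample = [rng.gauss(100, 15) for _ in range(10_000)]
print([round(share, 3) for share in empirical_rule_shares(sample)])
```

For bell-shaped data the three shares come out close to 68%, 95%, and nearly 100%; for a strongly skewed sample (such as incomes) they can be noticeably different, which is the "but not always" on the slide.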
