1 / 44

Chapter 3 Describing Data Using Numerical Measures

Business Statistics: A Decision-Making Approach 7 th Edition. Chapter 3 Describing Data Using Numerical Measures. Chapter Goals. After completing this chapter, you should be able to: Compute and interpret the mean, median, and mode for a set of data

vita
Télécharger la présentation

Chapter 3 Describing Data Using Numerical Measures

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Business Statistics: A Decision-Making Approach 7th Edition Chapter 3Describing Data Using Numerical Measures

  2. Chapter Goals After completing this chapter, you should be able to: • Compute and interpret the mean, median, and mode for a set of data • Compute the range, variance, and standard deviation and know what these values mean • Construct and interpret a box and whisker graph • Compute and explain the coefficient of variation and z scores • Use numerical measures along with graphs, charts, and tables to describe data

  3. Chapter Topics • Measures of Center and Location • Mean, median, mode • Other measures of Location • Weighted mean, percentiles, quartiles • Measures of Variation • Range, interquartile range, variance and standard deviation, coefficient of variation • Using the mean and standard deviation together • Coefficient of variation, z-scores

  4. Summary Measures Describing Data Numerically Center and Location Other Measures of Location Variation Mean Range Percentiles Median Interquartile Range Quartiles Mode Variance Weighted Mean Standard Deviation Coefficient of Variation

  5. Measures of Center and Location Center and Location Mode Weighted Mean Mean Median

  6. Mean (Average) (continued) • The most common measure of central tendency • Mean = sum of values divided by the number of values • Affected by extreme values (outliers) 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Mean = 3 Mean = 4

  7. Median • In an ordered array, the median is the “middle” number, i.e., the number that splits the distribution in half • The median is not affected by extreme values • Useful when data are highly skewed 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Median = 3 Median = 3

  8. Meanis generally used, unless extreme values (outliers) exist Then Medianis often used, since the median is not sensitive to extreme values. Example: Realtors say “Median Home Prices”….not average – less sensitive to outliers Which measure of location is the “best”?

  9. Shape of a Distribution based on Mean and Median • Describes how data is distributed • Symmetric or skewed Right-Skewed Symmetric Left-Skewed Mean < Median Mean = Median Median <Mean (Longer tail extends to left) (Longer tail extends to right)

  10. Mode • The value that is repeated most often • Not affected by extreme values • Used for either numerical or categorical data • There may be no mode • There may be several modes 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 1 2 3 4 5 6 Mode = 5 No Mode

  11. Five houses on a hill by the beach Implication House Prices: $2,000,000 500,000 300,000 100,000 100,000

  12. Mean: ($3,000,000/5) = $600,000 Median: middle value of ranked data = $300,000 Mode: most frequent value = $100,000 Implication House Prices: $2,000,000 500,000 300,000 100,000 100,000 Sum 3,000,000

  13. Practice • From the class website, download “describing numerical data using Excel.” • Try followings: • Mean vs. Median • Hockey • Shoes

  14. Weighted Mean • Used when values are grouped by frequency or relative importance (i.e., GPA) Example: Sample of 26 Repair Projects Weighted Mean Days to Complete:

  15. Other Location Measures • A percentile is a measure that tells us what percent of the total frequency scored at or below that measure.  • SAT score 90th percentile: the score is as high as or higher than 90% of other students who took the SAT test. Other Measures of Location Percentiles Quartiles • When statistical data is ordered sequentially, the data can be divided into 4 groups of data. • Each of the groups of data is separated by a line called a Quartile. • Example 3-8 on pp. 99

  16. Percentiles • Example 1:Find the 60th percentile in an ordered array of 19 values. So, use value in the i = 12th position If i is not an integer, round offto the next higher integer value. If i is an integer, the pth percentile is the average of the values in position i and i+1.

  17. Quartiles • Quartiles split the ranked data into 4 equal groups: • Note that the second quartile (the 50th percentile) is the median 25% 25% 25% 25% Q1 Q2 Q3

  18. Quartiles • Example: Find the first quartile (25th percentile) Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22 (n = 9) Q1 = 25th percentile, so find i : i = (9) = 2.25 so round up and use the value in the 3rd position: Q1 = 13 25 100

  19. Practice • Use “describing numerical data using Excel” file, try following: • Salary

  20. Box and Whisker Plot • Statistics assumes that your data points (the numbers in your list) are clustered around some central value. • The "box" in the box-and-whisker plot contains and highlights the middle half of data points. • It is useful for quickly finding outliers - data points out of line with the rest of the data set. • The Box and Whisker plot can be shown in either vertical or horizontal format. • Example 3-9 on pp.100

  21. Box and Whisker Plot • Outliers are plotted outside the calculated limits and the center box extends from Q1 to Q3. 25% 25% 25% 25% * * Outliers Lower 1st Median 3rd Upper Limit Quartile Quartile Limit The lower limit is Q1 – 1.5 (Q3 – Q1) The upper limit is Q3 + 1.5 (Q3 – Q1)

  22. Box-and-Whisker Plot Example • Below is a Box-and-Whisker plot for the following data: 0 2 2 2 3 3 4 5 6 11 27 • This data is right skewed, as the plot depicts Min Q1 Q2 Q3 Max * 0 2 3 6 12 27 Upper limit = Q3 + 1.5 (Q3 – Q1) = 6 + 1.5 (6 – 2) = 12 27 is above the upper limit so is shown as an outlier

  23. Distribution Shape and Box and Whisker Plot Left-Skewed Symmetric Right-Skewed Q1 Q2 Q3 Q1 Q2 Q3 Q1 Q2 Q3

  24. Practice • Using Free Online Tool • Using the data set on the website, develop a Box-and-Whisker Plot (free online tool).

  25. Measures of Variation Variation Range Variance Standard Deviation Coefficient of Variation Population Standard Deviation Interquartile Range Population Variance Sample Variance Sample Standard Deviation

  26. Measures of variation give information on the spread of the data values. Variation Same center, different variation

  27. Range (refer to Measures of Variability) • Simplest measure of variation • Difference between the largest and the smallest observations: Range = xmaximum – xminimum Example: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Range = 14 - 1 = 13

  28. Disadvantages of the Range • Ignores the way in which data are distributed • Sensitive to outliers (extreme values) 7 8 9 10 11 12 7 8 9 10 11 12 Range = 12 - 7 = 5 Range = 12 - 7 = 5 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 Range = 5 - 1 = 4 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 Range = 120 - 1 = 119

  29. Interquartile Range (refer to Measures of Variability) • Eliminate range’s susceptibility to extreme values • Can eliminate some outlier problems • Eliminate some high-and low-valued observations and calculate the range from the remaining values. • Interquartile range = 3rd quartile – 1stquartile • Example 3-10 on pp.108

  30. Interquartile Range Example Example: Median (Q2) X X Q1 Q3 maximum minimum 25% 25% 25% 25% 12 30 45 57 70 Interquartile range = 57 – 30 = 27

  31. Variance • Average of squared deviations of values from the mean • Populationvariance: • Samplevariance:

  32. Standard Deviation • Most commonly used measure of variation • Shows variation about the mean • Has the same units as the original data • Populationstandard deviation: • Samplestandard deviation:

  33. Comparing Standard Deviations Same mean, but different standard deviations: Data A Mean = 15.5 s = 3.338 11 12 13 14 15 16 17 18 19 20 21 Data B Mean = 15.5 s = .9258 11 12 13 14 15 16 17 18 19 20 21 Data C Mean = 15.5 s = 4.57 11 12 13 14 15 16 17 18 19 20 21

  34. Practice • Try the “Variance Vs. Standard Deviation” link on the class website • Use “describing numerical data using Excel” file, try following: • Std.

  35. Coefficient of Variation • The coefficient of variation measures the relative variation for distributions with different means. • In finance, CV measures the relative risk of a stock portfolio • It is usually expressed as a percentage. PopulationSample

  36. Comparing Coefficients of Variation • Stock A: • Average price last year = $50 • Standard deviation = $5 • Stock B: • Average price last year = $100 • Standard deviation = $5 Both stocks have the same standard deviation, but stock B is less variable relative to its price

  37. The Empirical Rule • If the data distribution is bell-shaped, then the interval: • contains about 68% of the values in the population or the sample 68%

  38. The Empirical Rule • contains about 95% of the values in the population or the sample • contains about 99.7% of the values in the population or the sample 95% 99.7%

  39. Practice: Getting Descriptive Statistics Using Excel • Descriptive Statistics are easy to obtain from Microsoft Excel • Use menu choice:Data / data analysis / descriptive statistics • Enter details in dialog box

  40. Open Excel and then enter the data • Select:Data / data analysis / descriptive statistics

  41. Using Excel (continued) • Enter dialog box details • Check box for summary statistics • Click OK

  42. Excel output Microsoft Excel descriptive statistics output, using the house price data: How to read “Skewness”? House Prices: $2,000,000 500,000 300,000 100,000 100,000

  43. Chapter Summary • Described measures of center and location • Mean, median, mode, weighted mean • Discussed percentiles and quartiles • Created Box and Whisker Plots • Illustrated distribution shapes • Symmetric, skewed

  44. Chapter Summary (continued) • Described measure of variation • Range, interquartile range, variance, standard deviation, coefficient of variation • Discussed Tchebysheff’s Theorem • Calculated standardized data values

More Related