
Descriptive Statistics (Part 1)

Chapter 4. Descriptive Statistics (Part 1): Numerical Description, Central Tendency, Dispersion. Statistics are descriptive measures derived from a sample (n items). Parameters are descriptive measures derived from a population (N items).


Presentation Transcript


  1. Chapter 4 • Descriptive Statistics (Part 1) • Numerical Description • Central Tendency • Dispersion

  2. Numerical Description • Statistics are descriptive measures derived from a sample (n items). • Parameters are descriptive measures derived from a population (N items).

  3. Numerical Description • Three key characteristics of numerical data:

  4. Numerical Description • Example: Vehicle Quality • Consider the data set of vehicle defect rates from J. D. Power and Associates. • Defect rate = (total no. of defects / no. inspected) x 100 • Numerical statistics can be used to summarize this random sample of brands. • Must allow for sampling error since the analysis is based on a sample.

  5. Numerical Description • Number of defects per 100 vehicles, 2004 models.

  6. To begin, sort the data in Excel.

  7. Numerical Description • Sorted data provides insight into central tendency and dispersion.

  8. Numerical Description • Visual Displays • The dot plot offers a visual impression of the data.

  9. Numerical Description • Visual Displays • Histograms with 5 bins (suggested by Sturges' Rule) and 10 bins are shown below. • Both are symmetric with no extreme values and show a modal class toward the low end.

  10. Descriptive Statistics in Excel Go to Tools | Data Analysis and select Descriptive Statistics

  11. Highlight the data range, specify a cell for the upper-left corner of the output range, check Summary Statistics and click OK.

  12. Here is the resulting analysis.

  13. Descriptive Statistics in MegaStat

  14. Here is the resulting MegaStat analysis:

  15. Central Tendency • Central tendency describes the middle or typical values of a distribution. • It can be assessed using a dot plot or histogram, or more precisely with numerical statistics.

  16. Central Tendency • Six Measures of Central Tendency

  17. Central Tendency • Six Measures of Central Tendency

  18. Central Tendency • Six Measures of Central Tendency

  19. Central Tendency • Mean • A familiar measure of central tendency. • In Excel, use function =AVERAGE(Data) where Data is an array of data values.

  20. Central Tendency • Mean • For the sample of n = 37 car brands:

  21. Central Tendency • Characteristics of the Mean • Arithmetic mean is the most familiar average. • Affected by every sample item. • The balancing point or fulcrum for the data.

  22. Central Tendency • Characteristics of the Mean • Regardless of the shape of the distribution, the signed deviations of the data points from the mean always sum to zero. • Consider the following asymmetric distribution of quiz scores whose mean = 65: 42, 60, 70, 75, 78. • Sum of deviations = (42 – 65) + (60 – 65) + (70 – 65) + (75 – 65) + (78 – 65) = (-23) + (-5) + (5) + (10) + (13) = -28 + 28 = 0
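The zero-sum property on this slide can be checked numerically. The following is a minimal Python sketch using the five quiz scores from the slide; the variable names are illustrative only.

```python
# Quiz scores from the slide; their mean is 65.
scores = [42, 60, 70, 75, 78]
mean = sum(scores) / len(scores)

# Signed deviations from the mean cancel out.
deviations = [x - mean for x in scores]
total_signed = sum(deviations)

# Absolute deviations do not cancel; they describe spread instead.
total_absolute = sum(abs(d) for d in deviations)

print(mean, total_signed, total_absolute)
```

The negative deviations (-28 in total) balance the positive ones (+28), which is why the mean acts as the fulcrum of the data.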

  23. Central Tendency • Median • The median (M) is the 50th percentile or midpoint of the sorted sample data. • M separates the upper and lower half of the sorted observations. • If n is odd, the median is the middle observation in the data array. • If n is even, the median is the average of the middle two observations in the data array.

  24. Central Tendency • Median • For n = 8, the median is between the fourth and fifth observations in the data array.

  25. Central Tendency • Median • For n = 9, the median is the fifth observation in the data array.

  26. Central Tendency • Median • For even n, Median = (x_{n/2} + x_{n/2+1})/2 • Consider the following n = 6 data values: 11 12 15 17 21 32 • What is the median? n/2 = 6/2 = 3 and n/2+1 = 6/2 + 1 = 4, so M = (x3+x4)/2 = (15+17)/2 = 16

  27. Central Tendency • Median • For odd n, Median = x_{(n+1)/2} • Consider the following n = 7 data values: 12 23 23 25 27 34 41 • What is the median? (n+1)/2 = (7+1)/2 = 8/2 = 4, so M = x4 = 25
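The even-n and odd-n rules above can be sketched as one small Python function. This is an illustration under the slides' definitions, not the routine Excel uses internally; the function name is an assumption.

```python
def median(data):
    """Return the median of a non-empty sequence of numbers."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the middle observation, 1-based position (n + 1) / 2.
        return xs[mid]
    # Even n: average of the 1-based positions n/2 and n/2 + 1.
    return (xs[mid - 1] + xs[mid]) / 2

even_result = median([11, 12, 15, 17, 21, 32])      # n = 6
odd_result = median([12, 23, 23, 25, 27, 34, 41])   # n = 7
print(even_result, odd_result)
```

Python lists are 0-based, so the 1-based positions n/2 and n/2 + 1 from the slide become indexes `mid - 1` and `mid`.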

  28. Central Tendency • Median • Use Excel’s function =MEDIAN(Data) where Data is an array of data values. • For the 37 vehicle quality ratings (odd n) the position of the median is (n+1)/2 = (37+1)/2 = 19. • So, the median is x19 = 121. • When there are several duplicate data values, the median does not provide a clean “50-50” split in the data.

  29. Central Tendency • Characteristics of the Median • The median is insensitive to extreme data values. • For example, consider the following quiz scores for 3 students: Tom’s scores: 20, 40, 70, 75, 80 (Mean = 57, Median = 70, Total = 285); Jake’s scores: 60, 65, 70, 90, 95 (Mean = 76, Median = 70, Total = 380); Mary’s scores: 50, 65, 70, 75, 90 (Mean = 70, Median = 70, Total = 350). • What does the median for each student tell you?

  30. Central Tendency • Mode • The most frequently occurring data value. • Similar to mean and median if data values occur often near the center of sorted data. • May have multiple modes or no mode.

  31. Central Tendency • Mode • For example, consider the following quiz scores for 4 students: Lee’s scores: 60, 70, 70, 70, 80 (Mean = 70, Median = 70, Mode = 70); Pat’s scores: 45, 45, 70, 90, 100 (Mean = 70, Median = 70, Mode = 45); Sam’s scores: 50, 60, 70, 80, 90 (Mean = 70, Median = 70, Mode = none); Xiao’s scores: 50, 50, 70, 90, 90 (Mean = 70, Median = 70, Modes = 50, 90). • What does the mode for each student tell you?
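The four cases on this slide (one mode, a mode far from the center, no mode, two modes) can be reproduced with a short Python sketch. The helper name `modes` and the convention of returning an empty list for "no mode" are assumptions made for illustration.

```python
from collections import Counter

def modes(data):
    """Return all modes in sorted order, or [] if no value repeats."""
    if not data:
        return []
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: treat as "no mode"
    return sorted(v for v, c in counts.items() if c == top)

lee = modes([60, 70, 70, 70, 80])
pat = modes([45, 45, 70, 90, 100])
sam = modes([50, 60, 70, 80, 90])
xiao = modes([50, 50, 70, 90, 90])
print(lee, pat, sam, xiao)
```

All four students share the same mean and median (70), so the differences in the output come entirely from how often individual scores repeat.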

  32. Central Tendency • Mode • Easy to define, not easy to calculate in large samples. • Use Excel’s function =MODE(Array): it returns #N/A if there is no mode and returns the first mode found if the data are multimodal. • May be far from the middle of the distribution and not at all typical.

  33. Central Tendency • Mode • Generally isn’t useful for continuous data since data values rarely repeat. • Best for attribute data or a discrete variable with a small range (e.g., Likert scale).

  34. Central Tendency • Example: Price/Earnings Ratios and Mode • Consider the following P/E ratios for a random sample of 68 Standard & Poor’s 500 stocks. • What is the mode?

  35. Central Tendency • Example: Price/Earnings Ratios and Mode • Excel’s descriptive statistics results are: • The mode 13 occurs 7 times, but what does the dot plot show?

  36. Central Tendency • Example: Price/Earnings Ratios and Mode • The dot plot shows local modes (a peak with valleys on either side) at 10, 13, 15, 19, 23, 26, 29. • These multiple modes suggest that the mode is not a stable measure of central tendency.

  37. Central Tendency • Example: Rose Bowl Winners’ Points • Points scored by the winning NCAA football team tend to have modes at multiples of 7 because each touchdown (with the extra point) typically yields 7 points. • Consider the dot plot of the points scored by the winning team in the first 87 Rose Bowl games. • What is the mode?

  38. Central Tendency • Mode • A bimodal distribution refers to the shape of the histogram rather than the mode of the raw data. • Occurs when dissimilar populations are combined in one sample. For example,

  39. Central Tendency • Skewness • Compare mean and median or look at histogram to determine degree of skewness.

  40. Central Tendency • Symptoms of Skewness

  41. Central Tendency • Skewness • For the sample of J.D. Power quality ratings, the mean (125.38) exceeds the median (121). What does this suggest?
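The mean-versus-median comparison can be expressed as a rough rule of thumb. The sketch below is a heuristic only, and a histogram should still be checked; the function name and the wording of the returned labels are assumptions.

```python
import statistics

def skew_hint(data):
    """Suggest a skew direction by comparing the mean with the median."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "right-skewed (mean > median)"
    if mean < median:
        return "left-skewed (mean < median)"
    return "roughly symmetric (mean = median)"

# Quiz scores from the median example earlier in the deck.
tom = skew_hint([20, 40, 70, 75, 80])    # mean 57, median 70
jake = skew_hint([60, 65, 70, 90, 95])   # mean 76, median 70
mary = skew_hint([50, 65, 70, 75, 90])   # mean 70, median 70
print(tom, jake, mary, sep="\n")
```

By the same comparison, a mean of 125.38 above a median of 121 points toward a longer right tail.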

  42. Central Tendency • Geometric Mean • The geometric mean (G) is a multiplicative average. • For the J. D. Power quality data (n=37): • In Excel use =GEOMEAN(Array) • The geometric mean tends to mitigate the effects of high outliers.
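To illustrate how the geometric mean dampens high outliers, here is a minimal Python sketch. The data values are hypothetical (they are not the J. D. Power figures), and the function is computed through logarithms, which is one common way to evaluate the n-th root of a product.

```python
import math

def geometric_mean(values):
    """Return the n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    # Averaging the logs and exponentiating avoids a huge product.
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical sample with one very high value.
data = [10, 10, 10, 1000]
arithmetic = sum(data) / len(data)
geometric = geometric_mean(data)
print(arithmetic, round(geometric, 2))
```

Here the single value of 1000 pulls the arithmetic mean far above the bulk of the data, while the geometric mean stays much closer to 10.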

  43. Central Tendency • Growth Rates • A variation on the geometric mean used to find the average growth rate for a time series. • For example, from 1998 to 2002, Spirit Airlines revenues are:

  44. Central Tendency • Growth Rates • The average growth rate is given by taking the geometric mean of the ratios of each year’s revenue to the preceding year. • Due to cancellations, only the first and last years are relevant: = 1.2421 = .242 or 24.2% per year • In Excel use =(403/131)^(1/5)-1

  45. Central Tendency • Midrange • The midrange is the point halfway between the lowest and highest values of X: Midrange = (x_min + x_max)/2 • Easy to use but sensitive to extreme data values. • For the J. D. Power quality data (n=37), the midrange (130) is higher than the mean (125.38) or median (121).
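The midrange and its sensitivity to extremes can be sketched in a few lines of Python. The scores are the quiz scores used earlier in the deck; the added value of 200 is a hypothetical outlier introduced only for this illustration.

```python
def midrange(data):
    """Return the point halfway between the smallest and largest value."""
    return (min(data) + max(data)) / 2

scores = [42, 60, 70, 75, 78]
base = midrange(scores)

# One extreme value moves the midrange substantially.
with_outlier = midrange(scores + [200])
print(base, with_outlier)
```

Because only the two end points enter the calculation, a single unusual observation is enough to shift the result.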

  46. Central Tendency • Trimmed Mean • To calculate the trimmed mean, first remove the highest and lowest k percent of the observations. • For example, for the n = 68 P/E ratios, we want a 5 percent trimmed mean (i.e., k = .05). • To determine how many observations to trim, multiply k x n = 0.05 x 68 = 3.4 or 3 observations. • So, we would remove the three smallest and three largest observations before averaging the remaining values.
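The trimming steps above can be sketched as follows. This is a minimal Python illustration that truncates k x n to a whole number, as the slide does; the sample values are hypothetical stand-ins for the 68 P/E ratios, and the function name is an assumption.

```python
def trimmed_mean(data, k):
    """Mean after dropping the lowest and highest k fraction of the data."""
    if not data:
        raise ValueError("trimmed mean requires at least one value")
    if not 0 <= k < 0.5:
        raise ValueError("k must satisfy 0 <= k < 0.5")
    xs = sorted(data)
    n = len(xs)
    cut = int(k * n)            # e.g. int(0.05 * 68) = int(3.4) = 3
    kept = xs[cut:n - cut]      # drop `cut` values from each end
    return sum(kept) / len(kept)

# Hypothetical sample of n = 68 values: 1..67 plus one extreme value.
sample = list(range(1, 68)) + [10000]
plain = sum(sample) / len(sample)
trimmed = trimmed_mean(sample, 0.05)
print(round(plain, 2), trimmed)
```

With three observations removed from each end, the extreme value no longer influences the average of the remaining 62 values.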

  47. Central Tendency • Trimmed Mean • Here is a summary of all the measures of central tendency for the n = 68 P/E values. • The trimmed mean mitigates the effects of very high values, but still exceeds the median.

  48. Central Tendency • Trimmed Mean • The Federal Reserve uses a 16% trimmed mean to mitigate the effects of extremes in its analysis of the Consumer Price Index.

  49. Dispersion • Variation is the “spread” of data points about the center of the distribution in a sample. Consider the following measures of dispersion: • Measures of Variation
