
STA 291 Fall 2009

Lecture 5. Dustin Lueker. Measures of Central Tendency.


Presentation Transcript


  1. STA 291 Fall 2009, Lecture 5. Dustin Lueker

  2. Measures of Central Tendency
     • Mean: arithmetic average
     • Median: midpoint of the observations when they are arranged in increasing order
     • Mode: most frequent value
     • Notation: subscripted variables
       • n = number of units in the sample
       • N = number of units in the population
       • x = variable to be measured
       • x_i = measurement of the ith unit

  3. Median
     • Measurement that falls in the middle of the ordered sample
     • When the sample size n is odd, there is a middle value
       • It has the ordered index (n+1)/2
       • The ordered index is where that value falls when the sample is listed from smallest to largest
       • An index of 2 means the second smallest value
     • Example: 1.7, 4.6, 5.7, 6.1, 8.3
       • n = 5, (n+1)/2 = 6/2 = 3, index = 3
       • Median = 3rd smallest observation = 5.7

  4. Median
     • When the sample size n is even, average the two middle values
     • Example: 3, 5, 6, 9
       • n = 4, (n+1)/2 = 5/2 = 2.5, index = 2.5
       • Median = midpoint between the 2nd and 3rd smallest observations = (5+6)/2 = 5.5
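The ordered-index rule from the two median slides can be sketched in Python. This is a minimal illustration, not code from the lecture; the function name `median_by_index` is made up for this example.

```python
def median_by_index(values):
    """Median via the 1-based ordered index (n + 1) / 2."""
    if not values:
        raise ValueError("median of an empty sample is undefined")
    ordered = sorted(values)
    n = len(ordered)
    index = (n + 1) / 2               # e.g. 3.0 for n = 5, 2.5 for n = 4
    lower = ordered[int(index) - 1]   # value at the whole-number part of the index
    if index == int(index):
        return lower                  # n odd: a single middle value
    upper = ordered[int(index)]       # n even: next value up
    return (lower + upper) / 2        # midpoint of the two middle values


odd_median = median_by_index([1.7, 4.6, 5.7, 6.1, 8.3])   # slide 3 data
even_median = median_by_index([3, 5, 6, 9])               # slide 4 data
print(odd_median, even_median)
```

Both slide examples are reproduced: 5.7 for the odd-sized sample and 5.5 for the even-sized one.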

  5. Mean and Median
     • For skewed distributions, the median is often a more appropriate measure of central tendency than the mean
     • The median usually better describes a “typical value” when the sample distribution is highly skewed
     • Example: monthly income for five people
       • 1,000 2,000 3,000 4,000 100,000
       • Median monthly income: 3,000
       • Does this better describe a “typical value” in the data set than the mean of 22,000?

  6. Mean and Median
     • The trimmed mean is a compromise between the median and the mean
     • Calculating the trimmed mean
       • Order the data from smallest to largest
       • Delete a selected number of values from each end of the ordered list
       • Find the mean of the remaining values
     • The trimming percentage is the percentage of values that have been deleted from each end of the ordered list
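The three trimmed-mean steps can be sketched as follows. The function name `trimmed_mean` is hypothetical, and rounding the number of deleted values down to a whole number is an assumption; the slides do not say how to handle a trimming percentage that does not give a whole count.

```python
def trimmed_mean(values, trim_percent):
    """Mean after deleting trim_percent% of the values from each end."""
    ordered = sorted(values)                 # step 1: order the data
    n = len(ordered)
    k = int(n * trim_percent / 100)          # values dropped per end (rounded down: assumption)
    if 2 * k >= n:
        raise ValueError("trimming would delete every value")
    kept = ordered[k:n - k]                  # step 2: delete k values from each end
    return sum(kept) / len(kept)             # step 3: mean of what remains


incomes = [1000, 2000, 3000, 4000, 100000]   # income data from slide 5
print(trimmed_mean(incomes, 0))              # ordinary mean
print(trimmed_mean(incomes, 20))             # drops 1,000 and 100,000
```

With 20% trimming on the income data, the extreme value 100,000 is removed and the result equals the median, 3,000.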

  7. Median for Grouped or Ordinal Data
     • Example: Highest Degree Completed

  8. Calculate the Median
     • n = 177,618
     • (n+1)/2 = 88,809.5
     • Median = midpoint between the 88,809th smallest and 88,810th smallest observations
     • Both are in the category “High school only”
     • The mean wouldn’t make sense here since the variable is only ordinal
     • Median
       • Can be used for interval data and for ordinal data
       • Cannot be used for nominal data because the observations cannot be ordered on a scale
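For grouped ordinal data, the median category can be located from cumulative counts. The sketch below is an illustration only: the frequency table is not part of this transcript, so the category names (other than “High school only”) and all counts are hypothetical, and the function names are made up.

```python
from itertools import accumulate


def category_of_rank(categories, counts, rank):
    """Category containing the rank-th smallest observation (1-based)."""
    for category, cumulative in zip(categories, accumulate(counts)):
        if rank <= cumulative:
            return category
    raise ValueError("rank exceeds the number of observations")


def median_category(categories, counts):
    """Categories of the two observations around index (n + 1) / 2."""
    n = sum(counts)
    lower_rank = (n + 1) // 2    # index rounded down
    upper_rank = (n + 2) // 2    # index rounded up (same rank when n is odd)
    return (category_of_rank(categories, counts, lower_rank),
            category_of_rank(categories, counts, upper_rank))


# Hypothetical categories in increasing order, with hypothetical counts.
degrees = ["Not a high school graduate", "High school only",
           "Some college", "Bachelor's degree", "Advanced degree"]
hypothetical_counts = [30, 70, 40, 30, 10]    # n = 180, ranks 90 and 91

print(median_category(degrees, hypothetical_counts))
```

When both ranks fall in the same category, as on the slide, that category is the median response.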

  9. Mean vs. Median
     • Mean
       • Interval data with an approximately symmetric distribution
     • Median
       • Interval data
       • Ordinal data
     • The mean is sensitive to outliers; the median is not

  10. Mean vs. Median

  11. Mean vs. Median
      • Symmetric distribution
        • Mean = Median
      • Skewed distribution
        • The mean lies farther than the median in the direction in which the distribution is skewed

  12. Median
      • Disadvantage
        • Insensitive to changes within the lower or upper half of the data
      • Example
        • 1, 2, 3, 4, 5
        • 1, 2, 3, 100, 100
      • Sometimes, the mean is more informative even when the distribution is skewed
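A quick check of the two example data sets, using Python's standard `statistics` module, shows the insensitivity concretely. This is only an illustrative sketch; the variable names are made up.

```python
from statistics import mean, median

first = [1, 2, 3, 4, 5]
second = [1, 2, 3, 100, 100]    # only the upper half of the data changed

for data in (first, second):
    print(data, "median =", median(data), "mean =", mean(data))
```

The median is 3 for both data sets, while the mean moves from 3 to 41.2, so only the mean registers the change in the upper half.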

  13. Example
      • Keeneland Sales

  14. Mode
      • Value that occurs most frequently
      • Does not need to be near the center of the distribution
        • Not really a measure of central tendency
      • Can be used for all types of data (nominal, ordinal, interval)
      • Special cases
        • Data set {2, 2, 4, 5, 5, 6, 10, 11}: Mode = 2 and 5 (two modes)
        • Data set {2, 6, 7, 10, 13}: Mode = none (no value occurs more than once)
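The two special cases can be explored with a short sketch built on `collections.Counter`. The function name `modes` is hypothetical, and returning an empty list when no value repeats is one common convention, assumed here rather than taken from the slides.

```python
from collections import Counter


def modes(values):
    """All most-frequent values; empty list if nothing occurs more than once."""
    if not values:
        raise ValueError("mode of an empty data set is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                 # assumed convention: no repeated value, no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 2, 4, 5, 5, 6, 10, 11]))   # two values tie for most frequent
print(modes([2, 6, 7, 10, 13]))            # every value occurs once
```

Because the function only counts occurrences and never does arithmetic on the values, it also works for nominal data such as category labels.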

  15. Mean vs. Median vs. Mode
      • Mean
        • Interval data with an approximately symmetric distribution
      • Median
        • Interval or ordinal data
      • Mode
        • All types of data

  16. Mean vs. Median vs. Mode
      • The mean is sensitive to outliers
        • The median and mode are not
        • Why?
      • In general, the median is more appropriate for skewed data than the mean
        • Why?
      • In some situations, the median may be too insensitive to changes in the data
      • The mode may not be unique

  17. Example
      • “How often do you read the newspaper?”
      • Identify the mode
      • Identify the median response

  18. Percentiles
      • The pth percentile (Lp) is a number such that p% of the observations take values below it, and (100-p)% take values above it
        • 50th percentile = median
        • 25th percentile = lower quartile
        • 75th percentile = upper quartile
      • The index of Lp
        • (n+1)p/100

  19. Quartiles
      • 25th percentile
        • Lower quartile
        • Q1
        • (Approximately) the median of the observations below the median
      • 75th percentile
        • Upper quartile
        • Q3
        • (Approximately) the median of the observations above the median

  20. Example
      • Find the 25th percentile of this data set
        • {3, 7, 12, 13, 15, 19, 24}

  21. Interpolation
      • Use when the index is not a whole number
      • Go to the value at the closest lower index, then move the decimal fraction of the distance toward the next value
      • If the index is 5.4, start at the 5th value and add 0.4 of the difference between the 5th and 6th values
      • In essence, we are going to the “5.4th” value

  22. Example
      • Find the 40th percentile of the same data set
        • {3, 7, 12, 13, 15, 19, 24}
      • Must use interpolation
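The index formula (n+1)p/100 and the interpolation rule can be combined in one small function. This is a sketch under the definitions given on the slides; the name `percentile` is chosen for this example, and other textbooks and software packages use different percentile conventions that can give different answers.

```python
def percentile(values, p):
    """pth percentile using the index (n + 1) * p / 100 with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    index = (n + 1) * p / 100            # 1-based, possibly fractional
    if index < 1 or index > n:
        raise ValueError("index falls outside the ordered sample")
    whole = int(index)                   # closest lower index
    fraction = index - whole             # decimal part of the index
    lower = ordered[whole - 1]
    if fraction == 0:
        return lower                     # whole-number index: no interpolation
    upper = ordered[whole]
    return lower + fraction * (upper - lower)


data = [3, 7, 12, 13, 15, 19, 24]
print(percentile(data, 25))   # index 2: the 2nd smallest value
print(percentile(data, 40))   # index 3.2: 0.2 of the way from 12 to 13
```

For the 25th percentile the index is exactly 2, giving 7. For the 40th percentile the index is 3.2, giving 12 + 0.2 × (13 − 12), or about 12.2 (floating-point arithmetic may show extra trailing digits).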

  23. Data Summary
      • Five Number Summary
        • Minimum
        • Lower Quartile
        • Median
        • Upper Quartile
        • Maximum
      • Example
        • minimum = 4
        • Q1 = 256
        • median = 530
        • Q3 = 1105
        • maximum = 320,000
      • What does this suggest about the shape of the distribution?

  24. Interquartile Range (IQR)
      • The Interquartile Range (IQR) is the difference between the upper and lower quartiles
        • IQR = Q3 – Q1
      • IQR = range of values that contains the middle 50% of the data
      • IQR increases as variability increases
      • Murder Rate Data
        • Q1 = 3.9
        • Q3 = 10.3
        • IQR = 10.3 – 3.9 = 6.4

  25. Box Plot
      • Displays the five number summary (and more) graphically
      • Consists of a box that contains the central 50% of the distribution (from lower quartile to upper quartile),
      • A line within the box that marks the median,
      • And whiskers that extend to the maximum and minimum values
        • This assumes there are no outliers in the data set

  26. Outliers
      • An observation is an outlier if it falls
        • more than 1.5 IQR above the upper quartile, or
        • more than 1.5 IQR below the lower quartile

  27. Box Plot
      • Whiskers only extend to the most extreme observations within 1.5 IQR beyond the quartiles
      • If an observation is an outlier, it is marked by an x, +, or some other identifier
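The outlier rule and the whisker rule can be sketched together. The function name `box_plot_parts`, the sample data, and the quartile values below are all hypothetical; the quartiles are passed in as given so that the example does not depend on any particular quartile method.

```python
def box_plot_parts(data, q1, q3):
    """Whisker ends and outliers under the 1.5 * IQR rule."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Observations strictly beyond a fence are outliers.
    outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
    # Whiskers stop at the most extreme observations that are not outliers.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {"whiskers": (min(inside), max(inside)), "outliers": outliers}


# Hypothetical data with assumed quartiles Q1 = 10 and Q3 = 20.
sample = [2, 8, 10, 12, 15, 18, 20, 24, 40]
parts = box_plot_parts(sample, q1=10, q3=20)
print(parts)
```

Here the IQR is 10, so the fences are at -5 and 35. The value 40 is marked as an outlier, and the upper whisker stops at 24 rather than at the maximum.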

  28. Example
      • Values
        • Min = 148
        • Q1 = 158
        • Median = Q2 = 162
        • Q3 = 182
        • Max = 204
      • Create a box plot

  29. 5 Number Summary/Box Plot
      • On right-skewed distributions, the minimum, Q1, and median will be “bunched up”, while Q3 and the maximum will be farther away.
      • For left-skewed distributions, the “mirror” is true: the maximum, Q3, and the median will be relatively close compared to the corresponding distances to Q1 and the minimum.
      • Symmetric distributions?
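The “bunched up” comparison can be turned into a rough check on a five number summary. The function name `describe_shape` and its all-or-nothing decision rule are assumptions made for this sketch; the slides describe the pattern only qualitatively.

```python
def describe_shape(minimum, q1, median, q3, maximum):
    """Rough skew direction from the gaps in a five number summary."""
    lower_box, upper_box = median - q1, q3 - median       # gaps inside the box
    lower_tail, upper_tail = q1 - minimum, maximum - q3   # gaps in the tails
    if upper_box > lower_box and upper_tail > lower_tail:
        return "right-skewed"     # lower values bunched up, upper values spread out
    if lower_box > upper_box and lower_tail > upper_tail:
        return "left-skewed"      # mirror image
    return "no clear skew"        # roughly equal or mixed gaps


print(describe_shape(4, 256, 530, 1105, 320000))   # summary from slide 23
print(describe_shape(148, 158, 162, 182, 204))     # summary from slide 28
print(describe_shape(0, 25, 50, 75, 100))          # evenly spaced summary
```

Both slide summaries have larger gaps above the median than below it, so this rule labels them right-skewed. The evenly spaced summary has matching gaps on both sides, which is the pattern expected for a symmetric distribution.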
