Mean and variations

Mean and variations. Averages (central tendency) and variability (deviation). Measures of central tendency: the “average.” The population mean: μ = ΣX/N. Alternate definitions: the proportional share; the balance point of the values.

makoto

Presentation Transcript


  1. Mean and variations • Averages or Central Tendency • Variability or Deviation

  2. Measures of Central Tendency • “Average” • The population mean: μ = ΣX/N • Alternate definitions: • the proportional share • the balance point of the values

  3. The mean is the balance point of a set of scores (a distribution). [Figure: distribution of scores on an axis from 10 to 100]

  4. The mean of a sample • The symbol for a sample mean is M • M = ΣX/N • Note the computation of a weighted mean: do not simply average the means of groups of different sizes • Instead, compute means from the raw scores.
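The weighted-mean note on slide 4 can be illustrated with a small sketch. The two groups and their scores are made-up values, not data from the presentation; the sketch only shows that a plain average of group means ignores group sizes, while a mean computed from the pooled raw scores does not.

```python
# Hypothetical scores for two groups of unequal size (illustrative values only).
group_a = [2, 4, 6]                       # n = 3, mean = 4
group_b = [10, 10, 10, 10, 10, 10, 10]    # n = 7, mean = 10


def mean(scores):
    """Sample mean: M = sum of X divided by N."""
    return sum(scores) / len(scores)


# Plain average of the two group means: treats both groups as equally large.
mean_of_means = (mean(group_a) + mean(group_b)) / 2

# Mean computed from the pooled raw scores.
pooled_mean = mean(group_a + group_b)

# Weighted mean: each group mean is weighted by its own n.
weighted_mean = (len(group_a) * mean(group_a) + len(group_b) * mean(group_b)) / (
    len(group_a) + len(group_b)
)

print(mean_of_means, pooled_mean, weighted_mean)
```

With these values the weighted mean agrees with the pooled mean, and the plain average of the two group means does not.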

  5. Properties of the mean: • Includes the exact value of every score, so is vulnerable to skew • Sum of the deviations around the mean is zero • Sum of the squared deviations is a minimum
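The last two properties on slide 5 can be checked numerically. This is a minimal sketch using the five scores from the worked example on slide 27; the alternative centers in `candidates` are arbitrary values chosen for illustration.

```python
scores = [1, 2, 3, 4, 5]           # scores from the worked example (slide 27)
m = sum(scores) / len(scores)      # the mean, 3.0


def sum_sq_dev(values, center):
    """Sum of squared deviations of the values around a given center."""
    return sum((x - center) ** 2 for x in values)


# Property: the deviations around the mean sum to zero.
sum_of_deviations = sum(x - m for x in scores)

# Property: the sum of squared deviations is smallest around the mean.
ss_at_mean = sum_sq_dev(scores, m)
candidates = [2.0, 2.5, 3.5, 4.0]  # arbitrary centers other than the mean
ss_elsewhere = [sum_sq_dev(scores, c) for c in candidates]

print(sum_of_deviations, ss_at_mean, ss_elsewhere)
```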

  6. The Median • The 50th percentile (P50), the centermost score counting frequencies • If there is an even number of scores, the median is halfway between the two centermost scores.
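As a rough sketch of the definition on slide 6, the function below returns the centermost score for an odd number of scores and the halfway point between the two centermost scores for an even number. It deliberately ignores the interpolation method covered in the later slides, and the example scores are made up.

```python
def median(scores):
    """Median as the centermost score (no interpolation within tied scores)."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: halfway between the two


odd_case = median([7, 3, 5])
even_case = median([1, 2, 3, 4])
print(odd_case, even_case)
```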

  7. The median is the score that is at the middle of the frequency distribution. [Figure: distribution of scores on an axis from 10 to 100]

  8. Properties of the median: • Less sensitive than the mean to extreme scores • May be more subject to differences between samples than is the mean

  9. The median by interpolation: Simple frequency distributions [Figure: Frequency (1 to 4) plotted against Score (X), scores 8 to 13]

  10. The median by interpolation: Another example [Figure: Frequency (1 to 4) plotted against Score (X), scores 8 to 13]

  11. The median by interpolation: Grouped frequency distributions [Figure: Frequency (1 to 4) plotted against Score (X), interval midpoints 75 to 135]

  12. Computing the median for grouped frequency distributions 1. Multiply .50 by N to find the number of scores below the median (cumfp). 2. Find the lower real limit (LRL) of the interval containing cumfp. 3. Determine the proportion of the scores in the interval needed to reach cumfp: (cumfp − cumfb) / fw. 4. Find the corresponding score, which is the median.

  13. Using the logical method, we can derive a formula, viz.: Median = XL + i × (cumfp − cumfb) / fw, from which another formula is derived, viz.: Median = LRL + i × (.5N − fb) / fw
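The four steps and the formula Median = LRL + i × (.5N − fb) / fw can be sketched as a short function. The reading of the symbols is an assumption based on common textbook usage: `cum_below` plays the role of cumfb (scores below the interval), `freq` is fw (scores within the interval), and `width` is i. The grouped distribution in the example is hypothetical, not the one plotted on slide 11.

```python
def grouped_median(intervals):
    """Median by interpolation.

    intervals: list of (lower_real_limit, width, frequency), in ascending order.
    """
    n = sum(freq for _, _, freq in intervals)
    target = 0.5 * n            # cumfp: number of scores below the median
    cum_below = 0               # cumfb: scores below the current interval
    for lrl, width, freq in intervals:
        if freq > 0 and cum_below + freq >= target:
            # Proportion of this interval needed to reach the target,
            # converted to score units by multiplying by the width.
            return lrl + width * (target - cum_below) / freq
        cum_below += freq
    raise ValueError("the distribution contains no scores")


# Hypothetical grouped distribution: intervals 70-79, 80-89, 90-99, 100-109.
example = [(69.5, 10, 1), (79.5, 10, 3), (89.5, 10, 4), (99.5, 10, 2)]
result = grouped_median(example)
print(result)
```

Here N = 10, so the target is 5 scores; 4 scores fall below the 90-99 interval, so one quarter of that interval's width is added to its lower real limit.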

  14. The Mode • The commonest score or scores • Properties of the mode: • May be multiple • Is unstable from sample to sample • In symmetrical distributions mean = median • Skew is the movement of the mean away from the median in non-symmetrical distributions.
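The point that the mode "may be multiple" can be shown with the standard library. This sketch assumes Python 3.8 or later, where `statistics.multimode` is available; the scores are made up.

```python
import statistics

one_mode = [1, 2, 2, 3]
two_modes = [1, 2, 2, 3, 3, 4]

# multimode returns every score tied for the highest frequency,
# in the order each first appears in the data.
modes_single = statistics.multimode(one_mode)
modes_double = statistics.multimode(two_modes)
print(modes_single, modes_double)
```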

  15. When should we use each average? • First, consider the shape of the distribution: • When the distribution is unimodal and symmetrical (balanced), mean, median, and mode will all be the same. You may use any one, although the mean is preferable. • When the distribution is unimodal but skewed, use the median. • When the distribution is bimodal, use the mode.

  16. Second, consider the scale • For scores on a nominal scale, use the mode only. In a nominal scale, mean and median are usually meaningless. • For an ordinal scale, use median or mode, but not the mean. • For interval or ratio scales, use any of the three measures of central tendency. • For discrete variables, consider the median, although the mean is commonly used.

  17. Third, consider the values • If there are a few extreme scores, use the median. (Note that this situation produces a skewed distribution.) • If the distribution includes undetermined values, as when timing a task that some people never complete, use the median. • If a scale has open-ended values, such as “6 or more,” use the median.

  18. APA format (“In the literature”) • Journals in all of the disciplines represented in this course use APA format. • In APA format, the mean is reported in parentheses with the abbreviation M, as follows: “This is one of the oldest statistics classes on record (M = 20.7).” • If there are several means to report, use a table.

  19. More APA format • You may report any of the averages in a written sentence using the full name of the statistic: “The mean age of the class, 20.7, is one of the oldest on record. However, the median age of 19.5 is within the normal range.” • The median should be represented in parenthetical notations as Mdn, as follows: “However, the age distribution is clearly skewed (Mdn = 19.5).” • There is no conventional abbreviation for the mode.

  20. Statistics for August, 2005

  21. Variability: Scores are not all the same. But how do they vary? • Range = High - low • Inclusive range = U.L. - L.L. • Or in textbook terms, range = URL Xmax – LRL Xmin • The range is the distance from the upper real limit of the highest score to the lower real limit of the lowest score.
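The two versions of the range on slide 21 can be sketched as follows. The `unit` parameter is an assumption introduced here: it is the precision to which scores are measured, so real limits lie half a unit above and below each score. The scores are illustrative.

```python
def simple_range(scores):
    """High score minus low score."""
    return max(scores) - min(scores)


def inclusive_range(scores, unit=1):
    """Upper real limit of the highest score minus lower real limit of the lowest."""
    half = unit / 2
    return (max(scores) + half) - (min(scores) - half)


scores = [8, 9, 10, 13]             # illustrative whole-number scores
plain = simple_range(scores)
inclusive = inclusive_range(scores)
print(plain, inclusive)
```

For whole-number scores the inclusive range is one unit larger than high minus low.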

  22. Some variations on the range • The range is increased by skew. • Interquartile range = Q3 – Q1 • Or, P75 – P25 • Semi-interquartile range is one-half of the interquartile range • Although these ranges eliminate the skew problem, they still only consider the middle 50% of the scores.

  23. Interquartile range and semi-interquartile range The distance from the first quartile, which is the 25th percentile, to the third quartile, which is the 75th percentile, is called the inter-quartile range. One-half of that distance is called the semi-interquartile range. Find the quartiles using the same method you used for the median, except that you find the score that is above 25% and 75% of the number of scores. (Compare to the median, which you found as the score that is above 50% of the number of scores.)
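A minimal sketch of finding quartiles "using the same method you used for the median": the function below interpolates within the real limits of each score to locate the point above a given proportion of the scores. It assumes scores measured to the nearest `unit` with no gaps between adjacent values, and the data are made up.

```python
from collections import Counter


def percentile_point(scores, proportion, unit=1):
    """Score below which `proportion` of the scores fall, by interpolation."""
    counts = Counter(scores)
    target = proportion * len(scores)   # number of scores below the point
    cum_below = 0
    for value in sorted(counts):
        freq = counts[value]
        if cum_below + freq >= target:
            lower_real_limit = value - unit / 2
            return lower_real_limit + unit * (target - cum_below) / freq
        cum_below += freq
    raise ValueError("no scores supplied")


scores = [1, 2, 3, 4, 5, 6, 7, 8]       # illustrative scores
q1 = percentile_point(scores, 0.25)     # 25th percentile
q3 = percentile_point(scores, 0.75)     # 75th percentile
iqr = q3 - q1
semi_iqr = iqr / 2
print(q1, q3, iqr, semi_iqr)
```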

  24. Deviation scores: X − μ or X − M • Standard deviation: • Σ(X − μ) = 0 • Σ(X − μ)² is a minimum • Σ(X − μ)² / N is the variance • √(Σ(X − μ)² / N) is the standard deviation

  25. The Sum of Squares concept • Variance: σ² = SS / N or s² = SS / (N − 1) • σ² and s² are also known as the Mean Square • Standard deviation: σ = √(σ²) or s = √(s²)

  26. Sums of Squares in Real Life • The Sum of Squares song: “Sum of squares equals the sum of X squared, minus the sum of X (quantity) squared over N…ALWAYS!” • SS = ΣX² − (ΣX)²/N • σ² = SS / N or s² = SS / (N − 1) • σ = √(σ²) or s = √(s²)

  27. An example • X: 1, 2, 3, 4, 5 (ΣX = 15, μ = 3) • X − μ: −2, −1, 0, 1, 2 (sum = 0) • (X − μ)²: 4, 1, 0, 1, 4 (sum = 10) • X²: 1, 4, 9, 16, 25 (sum = 55)

  28. And, applying the Sum of Squares Song • SS = ΣX² − (ΣX)²/N = 55 − (15)²/5 = 55 − 225/5 = 55 − 45 = 10, which is the same as Σ(X − μ)²
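The agreement between the definitional and computational sums of squares can be checked directly with the scores from slide 27. The variable names are chosen here for illustration.

```python
scores = [1, 2, 3, 4, 5]
n = len(scores)
mu = sum(scores) / n

# Definitional formula: sum of squared deviations around the mean.
ss_definitional = sum((x - mu) ** 2 for x in scores)

# Computational formula: sum of X squared, minus (sum of X) squared over N.
sum_x = sum(scores)                       # 15
sum_x_squared = sum(x ** 2 for x in scores)   # 55
ss_computational = sum_x_squared - sum_x ** 2 / n

print(ss_definitional, ss_computational)
```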

  29. Then, to find the variance… • σ² = SS/N = 10/5 = 2 (Population parameter) or • s² = SS/(N − 1) = 10/(5 − 1) = 10/4 = 2.5 (Sample statistic)

  30. And the standard deviation… • σ = √(σ²) = √2 = 1.414 (Population parameter) • or • s = √(s²) = √2.5 = 1.581 (Sample statistic)
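The same figures can be reproduced with Python's standard `statistics` module, whose `pvariance` and `pstdev` divide by N while `variance` and `stdev` divide by N − 1. This is a cross-check of the arithmetic above, not part of the original slides.

```python
import statistics

scores = [1, 2, 3, 4, 5]

population_variance = statistics.pvariance(scores)   # SS / N
sample_variance = statistics.variance(scores)        # SS / (N - 1)
population_sd = statistics.pstdev(scores)            # square root of SS / N
sample_sd = statistics.stdev(scores)                 # square root of SS / (N - 1)

print(population_variance, sample_variance)
print(round(population_sd, 3), round(sample_sd, 3))
```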

  31. Degrees of freedom (df) Why did we use (n – 1) as the denominator to compute the variance of a sample? Recall that the variance is based on the deviation of each score from the mean. Therefore, before computing the variance, you must already know the mean: It is determined. Consequently, of the scores in the distribution, only (n – 1) are free to vary. The number of scores free to vary is known as the degrees of freedom.
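The idea that only (n − 1) scores are free to vary once the mean is known can be sketched in a few lines. The helper `determined_score` is a hypothetical name introduced for this illustration.

```python
def determined_score(free_scores, mean, n):
    """Given the mean and n - 1 freely chosen scores, return the forced last score."""
    if len(free_scores) != n - 1:
        raise ValueError("expected exactly n - 1 free scores")
    # The scores must sum to mean * n, so the last one has no freedom left.
    return mean * n - sum(free_scores)


# With n = 5 and a mean of 3, any four scores fix the fifth.
last_a = determined_score([1, 2, 3, 4], mean=3, n=5)
last_b = determined_score([10, 0, 0, 0], mean=3, n=5)
print(last_a, last_b)
```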

  32. Biased and unbiased statistics In the long run, the mean of samples drawn from a population will average out to the mean of the population, so the sample mean is an unbiased estimator of the population mean. That is not true, however, for variability computed with n in the denominator. Because deviations are measured from the sample mean rather than the population mean, the sum of squares of a sample tends to be too small, so SS / n is a biased estimator: it tends to underestimate the population variance. To correct for the bias, we change the denominator for the variance, and hence the standard deviation, of a sample from n to (n − 1).
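The underestimate can be seen exactly, without simulation, by listing every possible sample from a tiny population. The setup is an assumption made for illustration: a population of the five scores 1 to 5, and all ordered samples of size 2 drawn with replacement.

```python
from itertools import product

population = [1, 2, 3, 4, 5]
N = len(population)
mu = sum(population) / N
sigma_sq = sum((x - mu) ** 2 for x in population) / N   # population variance

n = 2
samples = list(product(population, repeat=n))   # all 25 ordered samples


def ss(sample):
    """Sum of squared deviations around the sample's own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)


# Long-run average of each estimator over every possible sample.
avg_var_n = sum(ss(s) / n for s in samples) / len(samples)
avg_var_n_minus_1 = sum(ss(s) / (n - 1) for s in samples) / len(samples)

print(sigma_sq, avg_var_n, avg_var_n_minus_1)
```

Dividing by n averages out below the population variance, while dividing by (n − 1) averages out to it.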

  33. When n, and when (n − 1)? • Population: divide by N • A sample is smaller, so divide by n − 1

  34. Transformations of scale • Adding a constant • Multiplying by a constant
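The slide lists only the two transformations; the effects shown below are standard results rather than text from the presentation. As a small sketch with the scores from slide 27 and arbitrarily chosen constants: adding a constant shifts the mean and leaves the standard deviation unchanged, while multiplying by a constant multiplies both.

```python
import math
import statistics

scores = [1, 2, 3, 4, 5]
added = [x + 10 for x in scores]        # add a constant (10, chosen arbitrarily)
multiplied = [x * 3 for x in scores]    # multiply by a constant (3, chosen arbitrarily)

base_mean, base_sd = statistics.mean(scores), statistics.pstdev(scores)
added_mean, added_sd = statistics.mean(added), statistics.pstdev(added)
mult_mean, mult_sd = statistics.mean(multiplied), statistics.pstdev(multiplied)

print(base_mean, round(base_sd, 3))
print(added_mean, round(added_sd, 3))
print(mult_mean, round(mult_sd, 3))
```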

  35. In APA format In text, you may write out the name of the statistic: “The standard deviation was 1.414” or “The distribution had a mean of 6.5 and a standard deviation of 1.414.” Abbreviate standard deviation as SD in parenthetical references: “The mean was 6.5 (SD = 1.414).” All other measures of variability are written out: range, interquartile range, semi-interquartile range, mean deviation, mean square, sum of squares, variance.

  36. The Sum of Squares Song [Figure: musical notation] “Sum of squares equals the sum of X squared, minus the sum of X (quantity) squared over N…ALWAYS!” © Paul D. Young, 2001