Understanding Measures of Variation in Statistics – Key Concepts and Examples
This lecture covers vital statistical measures of variation, including range, variance, and standard deviation. It emphasizes that two distributions can have the same mean or median but different levels of variability, thereby affecting data interpretation. The lecture explores how to compute these measures, their properties, and practical examples, including the Empirical Rule for normally distributed data. Additionally, it introduces the Coefficient of Variation to compare relative variability across different data sets.
Understanding Measures of Variation in Statistics – Key Concepts and Examples
E N D
Presentation Transcript
STA 291Fall 2009 Lecture 6 Dustin Lueker
Measures of Variation • Statistics that describe variability • Two distributions may have the same mean and/or median but different variability • Mean and Median only describe a typical value, but not the spread of the data • Range • Variance • Standard Deviation • Interquartile Range • All of these can be computed for the sample or population STA 291 Fall 2009 Lecture 6
Range • Difference between the largest and smallest observation • Very much affected by outliers • A misrecorded observation may lead to an outlier, and affect the range • The range does not always reveal different variation about the mean STA 291 Fall 2009 Lecture 6
Example • Sample 1 • Smallest Observation: 112 • Largest Observation: 797 • Range = • Sample 2 • Smallest Observation: 15033 • Largest Observation: 16125 • Range = STA 291 Fall 2009 Lecture 6
Deviations • The deviation of the ith observation xi from the sample mean is the difference between them, • Sum of all deviations is zero • Therefore, we use either the sum of the absolute deviations or the sum of the squared deviations as a measure of variation STA 291 Fall 2009 Lecture 6
Sample Variance • Variance of nobservations is the sum of the squared deviations, divided by n-1 STA 291 Fall 2009 Lecture 6
Example STA 291 Fall 2009 Lecture 6
Interpreting Variance • About the average of the squared deviations • “average squared distance from the mean” • Unit • Square of the unit for the original data • Difficult to interpret • Solution • Take the square root of the variance, and the unit is the same as for the original data • Standard Deviation STA 291 Fall 2009 Lecture 6
Properties of Standard Deviation • s ≥ 0 • s = 0 only when all observations are the same • If data is collected for the whole population instead of a sample, then n-1 is replaced by n • s is sensitive to outliers STA 291 Fall 2009 Lecture 6
Variance and Standard Deviation • Sample • Variance • Standard Deviation • Population • Variance • Standard Deviation STA 291 Fall 2009 Lecture 6
Population Parameters and Sample Statistics • Population mean and population standard deviation are denoted by the Greek letters μ (mu) and σ (sigma) • They are unknown constants that we would like to estimate • Sample mean and sample standard deviation are denoted by and s • They are random variables, because their values vary according to the random sample that has been selected STA 291 Fall 2009 Lecture 6
Empirical Rule • If the data is approximately symmetric and bell-shaped then • About 68% of the observations are within one standard deviation from the mean • About 95% of the observations are within two standard deviations from the mean • About 99.7% of the observations are within three standard deviations from the mean STA 291 Fall 2009 Lecture 6
Example • SAT scores are scaled so that they have an approximate bell-shaped distribution with a mean of 500 and standard deviation of 100 • About 68% of the scores are between • About 95% of the scores are between • If you have a score above 700, you are in the top % • What percentile would this be? STA 291 Fall 2009 Lecture 6
Example • According to the National Association of Home Builders, the U.S. nationwide median selling price of homes sold in 1995 was $118,000 • Would you expect the mean to be larger, smaller, or equal to $118,000? • Which of the following is the most plausible value for the standard deviation? (a) –15,000, (b) 1,000, (c) 45,000, (d) 1,000,000 STA 291 Fall 2009 Lecture 6
Coefficient of Variation • Standardized measure of variation • Idea • A standard deviation of 10 may indicate great variability or small variability, depending on the magnitude of the observations in the data set • CV = Ratio of standard deviation divided by mean • Population and sample version STA 291 Fall 2009 Lecture 6
Example • Which sample has higher relative variability? (a higher coefficient of variation) • Sample A • mean = 62 • standard deviation = 12 • CV = • Sample B • mean = 31 • standard deviation = 7 • CV = STA 291 Fall 2009 Lecture 6