1 / 31

3.3 Measures of Spread

What is more important: potential or consistency?. 3.3 Measures of Spread. Chapter 3 - Tools for Analyzing Data Learning goal: calculate and interpret measures of spread Due now: p . 159 #4, 5, 6, 8, 10-13 MSIP / Home Learning: p. 168 #2b, 3b, 4, 6, 7, 10. What is spread?.

oswald
Télécharger la présentation

3.3 Measures of Spread

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. What is more important: potential or consistency? 3.3 Measures of Spread Chapter 3 - Tools for Analyzing Data Learning goal: calculate and interpret measures of spread Due now: p. 159 #4, 5, 6, 8, 10-13 MSIP / Home Learning: p. 168 #2b, 3b, 4, 6, 7, 10

  2. What is spread? • Measures of central tendency do not always tell you everything! • The histograms have identical mean and median, but the spread is different • Spread is how closely the values cluster around the middle value

  3. Why worry about spread? • Less spread means you have greater confidence that values will fall within a particular range • Important for making predictions

  4. Measures of Spread • We will also study 3 Measures of Spread: • Range • Interquartile Range (IQR) • Standard Deviation (Std.Dev.) • All 3 measure how spread out data is • Smaller value = less spread (more consistent) • Larger value = more spread (less consistent)

  5. Measures of Spread • 1) Range = (max) – (min) • Indicates the size of the interval that contains 100% of the data • 2) Interquartile Range • IQR = Q3 – Q1, where • Q1 is the lower half median • Q3 is the upper half median • Indicates the size of the interval that contains the middle 50% of the data

  6. Quartiles Example • 26 28 34 36 38 38 40 41 41 44 45 46 51 54 55 • Q2 = 41 Median • Q1 = 36 Lower half median • Q3 = 46 Upper half median • IQR = Q3 – Q1 • = 46 – 36 • = 10 • The middle 50% of the data is within 10 units • If a quartile occurs between 2 values, it is calculated as the average of the two values

  7. A More Useful Measure of Spread • Range is very basic • Does not take clusters nor outliers into account • Interquartile Range is somewhat useful • Takes clusters into account • Visual in Box-and-Whisker Plot • Standard deviation is very useful • Average distance from the mean for all data points

  8. Deviation • The mean of these numbers is 48 • Deviation = (data) – (mean) • The deviation for 24 is 24 - 48 = -24 • -24 • 12 24 36 48 60 72 84 • 36 • The deviation for 84 is 84 - 48 = 36

  9. Calculating Standard Deviation 1. Find the mean (average) 2. Find the deviation for each data point data point – mean 3. Square the deviations (data point – mean)2 4. Average the squares of the deviations (this is called the variance) 5. Take the square root of the variance

  10. Example of Standard Deviation • 26 28 34 36 • mean = (26 + 28 + 34 + 36) ÷ 4 = 31 • σ² = (26–31)² + (28-31)² + (34-31)² + (36-31)² 4 • σ² = 25 + 9 + 9 + 25 4 • σ² = 17 • σ = √17 = 4.1

  11. Standard Deviation • σ² (lower case sigma squared) is used to represent variance • σ is used to represent standard deviation • σ is commonly used to measure the spread of data, with larger values of σ indicating greater spread • we are using a population standard deviation

  12. Standard Deviation with Grouped Data • grouped mean = (2×2 + 3×6 + 4×6 + 5×2) / 16 = 3.5 • deviations: • 2: 2 – 3.5 = -1.5 • 3: 3 – 3.5 = -0.5 • 4: 4 – 3.5 = 0.5 • 5: 5 – 3.5 = 1.5 • σ² = 2(-1.5)² + 6(-0.5)² + 6(0.5)² + 2(1.5)² • 16 • σ² = 0.7499 • σ = √0.7499 = 0.9

  13. Measure of Spread - Recap • Measures of Spread are numbers indicating how spread out data is • Smaller value for any measure of spread means data is more consistent • 1) Range = Max – Min • 2) Interquartile Range: IQR = Q3 – Q1, where • Q1 = first half median • Q3 = second half median • 3) Standard Deviation i. Find mean (average) ii. Find all deviations (data) – (mean) iii./iv. Square all and avgthem (this is variance or σ2) v. Take the square root to get std. dev. σ

  14. MSIP / Home Learning • Read through the examples on pages 164-167 • Complete p. 168 #2b, 3b, 4, 6, 7, 10 • You are responsible for knowing how to do simple examples by hand (~6 pieces of data) • We will use technology (Fathom/Excel) to calculate larger examples • Have a look at your calculator and see if you have this feature (Σσn and Σσn-1)

  15. Different Types of Std. Dev.

  16. “Think of how stupid the average person is, and realize half of them are stupider than that.” -George Carlin 3.4 Normal Distribution Chapter 3 – Tools for Analyzing Data Learning goal: Determine the % of data within intervals of a Normal Distribution Due now: p. 168 #2b, 3b, 4, 6, 7, 10 MSIP / Home Learning: p. 176 #1, 3b, 6, 8-10

  17. Histograms • Histograms can be skewed... Right-skewed Left-skewed

  18. Histograms • ... or symmetrical

  19. Normal? • A normal distribution is a histogram that is symmetrical and has a bell shape • Used quite a bit in statistical analysis • Also called a Gaussian Distribution • Symmetrical with equal mean, median and mode that fall on the line of symmetry of the curve

  20. A Real Example • the heights of 600 randomly chosen Canadian students from the “Census at School” data set • the data approximates a normal distribution

  21. The 68-95-99.7% Rule • Area under curve is 1 (i.e. it represents 100% of the population surveyed) • Approx 68% of the data falls within 1 standard deviation of the mean • Approx 95% of the data falls within 2 standard deviations of the mean • Approx 99.7% of the data falls within 3 standard deviations of the mean • http://davidmlane.com/hyperstat/A25329.html

  22. 99.7% 95% 68% 34% 34% 0.15% 0.15% 13.5% 13.5% 2.35% 2.35% x - 3σ x - 2σ x - 1σ x x + 1σ x + 2σ x + 3σ Distribution of Data

  23. Normal Distribution Notation • The notation above is used to describe the Normal distribution where x is the mean and σ² is the variance (square of the standard deviation) • e.g. X~N (70,82) describes a Normal distribution with mean 70 and standard deviation 8 (our class at midterm?)

  24. An example • Suppose the time before burnout for an LED averages 120 months with a standard deviation of 10 months and is approximately Normally distributed. What is the length of time a user might expect an LED to last with: • a) 68% confidence? • b) 95% confidence? • So X~N(120,102)

  25. An example cont’d • 68% of the data will be within 1 standard deviation of the mean • This will mean that 68% of the bulbs will be between 120–10 = 110 months and 120+10 = 130 months • So 68% of the bulbs will last 110 - 130 months • 95% of the data will be within 2 standard deviations of the mean • This will mean that 95% of the bulbs will be between 120 – 2×10 = 100 months and 120 + 2×10 = 140 months • So 95% of the bulbs will last 100 - 140 months

  26. Example continued… • Suppose you wanted to know how long 99.7% of the bulbs will last? • This is the area covering 3 standard deviations on either side of the mean • This will mean that 99.7% of the bulbs will be between 120 – 3×10 months and 120 + 3×10 • So 99.7% of the bulbs will last 90-150 months • This assumes that all the bulbs are produced to the same standard

  27. Example continued… 99.7% 95% µ represents the population mean What % of bulbs last between: µ - σ and µ + 2σ? 34 + 34 + 13.5 = 81.5% µ - 2σand µ? 13.5 + 34 = 47.5% 34% 34% 13.5% 13.5% 2.35% 2.35% 90 100 120 140 150 months months months months months

  28. Percentage of data between two values • The area under any normal curve is 1 • The percent of data that lies between two values in a normal distribution is equivalent to the area under the normal curve between these values • See examples 2 and 3 on page 175

  29. Why is the Normal distribution so important? • Many psychological and educational variables are distributed approximately normally: • height, reading ability, memory, IQ, etc. • Normal distributions are statistically easy to work with • All kinds of statistical tests are based on it • Lane (2003)

  30. MSIP / Home Learning • Complete p. 176 #1, 3b, 6, 8-10 • http://onlinestatbook.com/

  31. References • Lane, D. (2003). What's so important about the normal distribution? Retrieved October 5, 2004 from http://davidmlane.com/hyperstat/normal_distribution.html • Wikipedia (2004). Online Encyclopedia. Retrieved September 1, 2004 from http://en.wikipedia.org/wiki/Main_Page

More Related