
2.4 Describing Distributions Numerically – cont.

Learn how to measure the center and spread of symmetric data sets using mean, median, range, and standard deviation.

jordanj

Presentation Transcript


  1. 2.4 Describing Distributions Numerically – cont. Describing Symmetric Data

  2. Symmetric Data: body temperature of 93 adults

  3. Recall: 2 characteristics of a data set to measure • center measures where the “middle” of the data is located • variability measures how “spread out” the data is

  4. Measure of Center When Data Approx. Symmetric • mean (arithmetic mean) • notation
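
The notation on this slide did not survive extraction. As a reference sketch, the standard textbook notation for the arithmetic mean of n observations (assumed here, not recovered from the slide) is:

```latex
\bar{x} \;=\; \frac{x_1 + x_2 + \cdots + x_n}{n} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i
```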

  5. Connection Between Mean and Histogram • A histogram balances when supported at the mean. Mean x̄ = 140.6

  6. Mean: balance point • Median: 50% of the area in each half • Right histogram: mean 55.26 yrs, median 57.7 yrs

  7. Properties of Mean, Median 1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal). 2. The mean uses the value of every number in the data set; the median does not.

  8. Think about mean and median • Six people in a room have a median age of 45 years and a mean age of 45 years. • One person who is 40 years old leaves the room. • Questions: • What is the median age of the 5 people remaining in the room? Answer: can't be determined from the information given. • What is the mean age of the 5 people remaining in the room? Answer: 46, since 45 × 6 = 270; 270 − 40 = 230; 230/5 = 46.
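
A minimal sketch of why the mean can be recomputed but the median cannot. The two rooms below are hypothetical age lists made up for illustration (they are not from the slides); both have six people, mean 45, median 45, and one 40-year-old.

```python
from statistics import mean, median

def after_leaving(ages, leaver=40):
    """Return a copy of the ages with one person of the given age removed."""
    remaining = list(ages)
    remaining.remove(leaver)
    return remaining

# Hypothetical rooms: both satisfy the slide's conditions.
room_a = [40, 45, 45, 45, 45, 50]
room_b = [30, 40, 44, 46, 50, 60]

for room in (room_a, room_b):
    rest = after_leaving(room)
    # The mean of the remaining five is forced by the totals (270 - 40) / 5,
    # but the median depends on the individual ages.
    print("before:", mean(room), median(room),
          "after:", mean(rest), median(rest))
```

Both rooms end with a mean of 46, while the medians of the remaining five differ, which is why the median question has no single answer.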

  9. Example: class pulse rates • 53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

  10. 2014, 2018 baseball salaries • 2014: n = 848, mean = $3,932,912, median = $1,456,250, max = $28,000,000 • 2018: n = 877, mean = $4,512,768, median = $1,450,000, max = $34,083,333

  11. Disadvantage of the mean • Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
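
To see this sensitivity on the class pulse rates from slide 9, the sketch below compares the mean and median with and without the largest value (140). The variable names are illustrative.

```python
from statistics import mean, median

# Class pulse rates from slide 9 (23 observations, already sorted).
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
         90, 90, 90, 90, 91, 96, 98, 103, 140]

# Drop the single largest observation to see how each summary reacts.
pulse_without_max = pulse[:-1]

mean_all, median_all = mean(pulse), median(pulse)
mean_trim, median_trim = mean(pulse_without_max), median(pulse_without_max)

print(f"with 140:    mean = {mean_all:.2f}, median = {median_all}")
print(f"without 140: mean = {mean_trim:.2f}, median = {median_trim}")
```

Removing one observation moves the mean by about 2.5 beats per minute but the median by only 0.5.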

  12. Mean, Median, Maximum Baseball Salaries 1985 - 2017

  13. Skewness: comparing the mean and median • Skewed to the right (positively skewed) • mean > median

  14. Skewed to the left (negatively skewed) • mean < median • mean = 86.92; median = 98.45

  15. Symmetric data • mean, median approx. equal
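
The three cases on slides 13 to 15 can be sketched as a rule of thumb. The function name `skew_direction` and the 5% cutoff for "approximately equal" are assumptions chosen for illustration; the slides do not give a numeric threshold.

```python
def skew_direction(mean, median, rel_tol=0.05):
    """Rule-of-thumb skew label from mean vs. median.

    rel_tol is an assumed cutoff for "approximately equal",
    expressed as a fraction of the median.
    """
    if abs(mean - median) <= rel_tol * abs(median):
        return "approximately symmetric"
    return "skewed right" if mean > median else "skewed left"

# 2018 baseball salaries (slide 10): mean far above median.
print(skew_direction(4_512_768, 1_450_000))
# Slide 14 example: mean below median.
print(skew_direction(86.92, 98.45))
```

This only compares two summaries, so it is a quick check rather than a substitute for looking at the histogram.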

  16. Describing Variability of Symmetric Data Describing symmetric data (cont.)

  17. Describing Symmetric Data (cont.) • Measure of center for symmetric data: • Measure of variability for symmetric data?

  18. Ways to measure variability 1. range = largest − smallest • OK sometimes; in general, too crude; sensitive to one large or small observation
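
As a quick illustration of how one observation can dominate the range, this sketch uses the class pulse rates from slide 9. The helper name `data_range` is illustrative.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)

# Class pulse rates from slide 9.
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
         90, 90, 90, 90, 91, 96, 98, 103, 140]

range_all = data_range(pulse)            # includes the 140 reading
range_without_max = data_range(pulse[:-1])  # 140 dropped

print(range_all, range_without_max)
```

Dropping the single reading of 140 shrinks the range from 87 to 50.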

  19. Example

  20. The Sample Standard Deviation, a measure of spread around the mean • Square the deviation of each observation from the mean; find the square root of the “average” of these squared deviations
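
Written as a formula, the verbal recipe above corresponds to the usual definition of the sample standard deviation. The symbols are the standard ones, with x̄ the sample mean and n the number of observations; the "average" is in quotes because the sum is divided by n − 1 rather than n:

```latex
s \;=\; \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```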

  21. Calculations … Women's height (inches) • Mean = 63.4 • Sum of squared deviations from mean = 85.2 • (n − 1) = 13; (n − 1) is called the degrees of freedom (df) • s² = variance = 85.2/13 = 6.55 inches squared • s = standard deviation = √6.55 = 2.56 inches
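
The arithmetic on this slide can be reproduced in a few lines. The individual heights are not in the transcript, so the sketch starts from the slide's summary numbers (sum of squared deviations 85.2, n − 1 = 13). The function `sample_sd` and the small check list are illustrative, not from the slides.

```python
import math
from statistics import stdev

# Summary numbers from the slide.
sum_sq_dev = 85.2
df = 13  # degrees of freedom, n - 1

variance = sum_sq_dev / df
sd = math.sqrt(variance)
print(f"s^2 = {variance:.2f} inches squared, s = {sd:.2f} inches")

def sample_sd(data):
    """Sample standard deviation computed step by step from raw data."""
    n = len(data)
    xbar = sum(data) / n
    ss = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations
    return math.sqrt(ss / (n - 1))

# Hypothetical data, used only to check the function against the library.
check = [2, 4, 4, 4, 5, 5, 7, 9]
print(math.isclose(sample_sd(check), stdev(check)))
```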

  22. Sample standard deviation s and sample variance s² • We’ll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software. • Mean ± 1 s.d.

  23. Population Standard Deviation
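
The formula on this slide was not captured in the transcript. For reference, the standard definition of the population standard deviation (assumed here, with N the population size and μ the population mean) divides by N rather than N − 1:

```latex
\sigma \;=\; \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
```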

  24. Remarks 1. Note that s and σ are always greater than or equal to zero. 2. The larger the value of s (or σ), the greater the spread of the data. When does s = 0? When does σ = 0?

  25. Remarks (cont.) 3. The standard deviation is the most commonly used measure of risk in finance and business • Stocks, mutual funds, etc. 4. Variance • s²: sample variance • σ²: population variance • Units are squared units of the original data • square $, square gallons ??

  26. Remarks 6): Why divide by n − 1 instead of n? • degrees of freedom • each observation has 1 degree of freedom • however, when you estimate an unknown population parameter like μ, you lose 1 degree of freedom

  27. Remarks 6) (cont.): Why divide by n − 1 instead of n? Example • Suppose we have 3 numbers whose average is 9 • x1 = x2 = • then x3 must be • once we selected x1 and x2, x3 was determined since the average was 9 • 3 numbers but only 2 “degrees of freedom”
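
The slide's own values for x1 and x2 are missing from the transcript, so the sketch below uses hypothetical values (7 and 10) purely to illustrate the idea: once the average and two of the three numbers are fixed, the third is forced.

```python
n = 3
average = 9

# Hypothetical choices for the first two numbers (not the slide's values).
x1, x2 = 7, 10

# The total must be n * average, so the last number is determined.
x3 = n * average - (x1 + x2)
print("x3 must be", x3)

# Equivalently, deviations from the mean always sum to zero,
# so only n - 1 of them are free to vary.
deviations = [x - average for x in (x1, x2, x3)]
print(deviations, sum(deviations))
```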

  28. Computational Example

  29. class pulse rates

  30. Example
             #1     #2     #3     #4
             32     33     38     37
             41     35     39     42
             44     45     39     45
             47     50     40     46
             50     52     56     47
             53     54     57     48
             56     58     58     50
             59     59     61     67
             68     64     62     68
      x̄      50     50     50     50
      s      10.6   10.6   10.6   10.6
      m      50     52     56     47     (m = median)
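
The summary rows can be checked directly. This sketch assumes the slide's values are read row by row into four columns (#1 to #4), nine observations each, and that the last row (m) is the median; both assumptions are consistent with the numbers printed on the slide.

```python
from statistics import mean, median, stdev

# Four data sets, read column-wise from the slide's table.
datasets = {
    "#1": [32, 41, 44, 47, 50, 53, 56, 59, 68],
    "#2": [33, 35, 45, 50, 52, 54, 58, 59, 64],
    "#3": [38, 39, 39, 40, 56, 57, 58, 61, 62],
    "#4": [37, 42, 45, 46, 47, 48, 50, 67, 68],
}

# Same mean and standard deviation, different medians and shapes.
summary = {
    name: (mean(values), round(stdev(values), 1), median(values))
    for name, values in datasets.items()
}

for name, (m, s, med) in summary.items():
    print(f"{name}: mean = {m}, s = {s}, median = {med}")
```

Identical means and standard deviations do not imply identical distributions, which is the point the boxplots on the following slides make visually.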

  31. Boxplots: same mean, standard deviation

  32. More Boxplots of the 4 data sets

  33. Review: Properties of s and σ • s and σ are always greater than or equal to 0 • When does s = 0? σ = 0? • The larger the value of s (or σ), the greater the spread of the data • The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

  34. Summary of Notation
