Putting Statistics to Work

cliftone

Presentation Transcript


  1. Putting Statistics to Work 6 Characterizing Data

  2. Definition The distribution of a variable (or data set) refers to the way its values are spread over all possible values. A distribution can be shown visually with a table or graph.

  3. Measures of Center in a Distribution • The mean is what we most commonly call the average value. It is defined as follows: mean = (sum of all values) / (total number of values) • The median is the middle value in the sorted data set (or halfway between the two middle values if the number of values is even). • The mode is the most common value (or group of values) in a distribution.

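  4. Example Eight grocery stores sell the PR energy bar for the following prices: $1.09 $1.29 $1.29 $1.35 $1.39 $1.49 $1.59 $1.79 Find the mean, median, and mode for these prices. Solution The mean is the sum of the prices divided by the number of prices: ($1.09 + $1.29 + $1.29 + $1.35 + $1.39 + $1.49 + $1.59 + $1.79) / 8 = $11.28 / 8 = $1.41.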

  5. Example (cont) Median: sort the data in ascending order: $1.09 $1.29 $1.29 $1.35 $1.39 $1.49 $1.59 $1.79 Because there are eight prices (an even number), there are two values in the middle of the list: $1.35 and $1.39, with three values below them and three values above. The median lies halfway between these two values, which we calculate by adding them and dividing by 2: ($1.35 + $1.39) / 2 = $1.37. The mode is $1.29, the only price that appears more than once.
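
The energy bar example can be checked with a short Python sketch. It uses the standard library `statistics` module; the variable names are illustrative and not part of the original slides.

```python
from statistics import mean, median, multimode

# Prices of the PR energy bar at the eight grocery stores, in dollars.
prices = [1.09, 1.29, 1.29, 1.35, 1.39, 1.49, 1.59, 1.79]

# Mean: sum of all values divided by the number of values.
mean_price = round(mean(prices), 2)

# Median: halfway between the two middle values, since the count is even.
median_price = round(median(prices), 2)

# Mode: the most common value or values (multimode returns a list).
mode_prices = multimode(prices)

print(f"mean = ${mean_price:.2f}, median = ${median_price:.2f}, mode = {mode_prices}")
```

Rounding to two decimal places keeps the results in whole cents, matching the way the slides report them.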

  6. Effects of Outliers An outlier is a data value that is much higher or much lower than almost all other values. Consider the following data set of contract offers: $0 $0 $0 $0 $10,000,000 The mean contract offer is ($0 + $0 + $0 + $0 + $10,000,000) / 5 = $2,000,000. As this shows, outliers can pull the mean upward (or downward). The median and mode of the data, both $0, are not affected.
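
A minimal sketch of the outlier effect, using the contract offers from this slide. Dropping the outlier to compare the two means is an added illustration, not a step taken in the slides.

```python
from statistics import mean, median, mode

# Contract offers from the slide, in dollars.
offers = [0, 0, 0, 0, 10_000_000]

mean_offer = mean(offers)      # pulled upward by the single large offer
median_offer = median(offers)  # middle value of the sorted data
mode_offer = mode(offers)      # most common value

# For comparison only: the same statistic with the outlier removed.
offers_without_outlier = [x for x in offers if x != 10_000_000]
mean_without_outlier = mean(offers_without_outlier)

print(f"mean = {mean_offer}, median = {median_offer}, mode = {mode_offer}")
print(f"mean without the outlier = {mean_without_outlier}")
```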

  7. Example A newspaper surveys wages for assembly workers in regional high-tech companies and reports an average of $22 per hour. The workers at one large firm immediately request a pay raise, claiming that they work as hard as employees at other companies but their average wage is only $19. The management rejects their request, telling them that they are overpaid because their average wage, in fact, is $23. Can both sides be right? Explain.

  8. Example (cont) Solution Both sides can be right if they are using different definitions of average. In this case, the workers may be using the median while management uses the mean. For example, imagine that there are only five workers at the firm and their wages are $19, $19, $19, $19, and $39. The median of these five wages is $19 (as the workers claimed), but the mean is $23 (as management claimed).
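
The five wages imagined in the solution can be checked directly. This sketch only reproduces the arithmetic; the variable names are illustrative.

```python
from statistics import mean, median

# Hourly wages of the five workers imagined in the solution, in dollars.
wages = [19, 19, 19, 19, 39]

workers_average = median(wages)    # the "average" the workers may be quoting
management_average = mean(wages)   # the "average" management may be quoting

print(f"median = ${workers_average}, mean = ${management_average}")
```

Both figures describe the same data; they differ because the single $39 wage raises the mean but not the median.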

  9. Shapes of Distributions [Figures: two single-peaked (unimodal) distributions; a double-peaked (bimodal) distribution]

  10. Symmetry A distribution is symmetric if its left half is a mirror image of its right half. A distribution that is not symmetric must have values that tend to be more spread out on one side than the other. In this case we say the distribution is skewed.

  11. Skewness A distribution is left-skewed if its values are more spread out on the left side. A distribution is right-skewed if its values are more spread out on the right side.

  12. Definition Symmetry and Skewness A single-peaked distribution is symmetric if its left half is a mirror image of its right half. A single-peaked distribution is left-skewed if its values are more spread out on the left side of the mode. A single-peaked distribution is right-skewed if its values are more spread out on the right side of the mode.

  13. Example For each of the following situations, state whether you expect the distribution to be symmetric, left-skewed, or right-skewed. Explain. a. Heights of a sample of 100 women b. Number of books read during the school year by fifth graders c. Speeds of cars on a road where a visible patrol car is using radar to detect speeders

  14. Example (cont) Solution a. The distribution of heights of women is symmetric, because roughly equal numbers of women are shorter and taller than the mean and extremes of height are rare on either side of the mean. b. The distribution of the number of books read is right-skewed. Most fifth-grade children read a moderate number of books during the school year, but a few voracious readers will read far more than most other students. These students will therefore be outliers with high values for the number of books read, creating a tail on the right side of the distribution.

  15. Example (cont) Solution c. Drivers usually slow down when they are aware of a patrol car looking for speeders. Few if any drivers will be exceeding the speed limit, but some drivers slow to well below the speed limit. The distribution of speeds is therefore left-skewed, with a mode near the speed limit but a few cars going well below the speed limit.
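
The definitions of symmetry and skewness can be turned into a rough check: compare how far the data extend to the left and to the right of the mode. This is a crude sketch, not a standard skewness statistic. The function name `skew_direction`, the `tolerance` parameter, and all three data sets are hypothetical and were made up for illustration.

```python
from statistics import mode

def skew_direction(values, tolerance=0.25):
    """Classify a single-peaked data set by its spread on each side of the mode.

    Hypothetical helper: 'tolerance' is an arbitrary cutoff for how different
    the two sides may be before the data are called skewed.
    """
    peak = mode(values)
    left_spread = peak - min(values)    # how far values reach below the mode
    right_spread = max(values) - peak   # how far values reach above the mode
    widest = max(left_spread, right_spread)
    if widest == 0 or abs(left_spread - right_spread) <= tolerance * widest:
        return "symmetric"
    return "left-skewed" if left_spread > right_spread else "right-skewed"

# Made-up data, loosely modeled on the situations in the example.
books_read = [2, 3, 3, 4, 4, 4, 5, 5, 6, 15]                # one voracious reader
car_speeds = [40, 52, 58, 60, 63, 64, 64, 65, 65, 65, 66]   # a few slow drivers
balanced = [1, 2, 2, 3, 3, 3, 4, 4, 5]                      # mirror image about 3

print(skew_direction(books_read))
print(skew_direction(car_speeds))
print(skew_direction(balanced))
```

Because it looks only at the minimum and maximum, this check is very sensitive to a single extreme value; it is meant to illustrate the definition rather than to serve as a reliable test.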

  16. Variation Variation describes how widely data values are spread out about the center of a distribution. [Figure: three distributions, with variation increasing from left to right]

  17. Example How would you expect the symmetry and variation to differ between times in the Olympic marathon and times in the New York marathon? Explain. Solution The Olympic marathon invites only elite runners, whose times are likely to be clustered not far above world record times. The New York marathon allows runners of all abilities, whose times are spread over a very wide range. Therefore, the variation among the times should be greater in the New York marathon than in the Olympic marathon.
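
The comparison of variation can be sketched numerically. The finishing times below are invented for illustration and are not real race results. The range and the standard deviation are two common measures of variation; neither is defined in these slides, so treat both as assumptions of this sketch.

```python
from statistics import pstdev

def value_range(values):
    """Range: the difference between the highest and lowest values."""
    return max(values) - min(values)

# Hypothetical finishing times in minutes (not real data).
olympic_times = [128, 129, 130, 131, 133, 135]    # elite runners, tightly clustered
new_york_times = [130, 165, 200, 240, 285, 330]   # runners of all abilities

olympic_range = value_range(olympic_times)
new_york_range = value_range(new_york_times)

# Population standard deviation: a typical distance of values from the mean.
olympic_sd = pstdev(olympic_times)
new_york_sd = pstdev(new_york_times)

print(f"Olympic:  range = {olympic_range} min, standard deviation = {olympic_sd:.1f} min")
print(f"New York: range = {new_york_range} min, standard deviation = {new_york_sd:.1f} min")
```

With these made-up numbers, both measures are much larger for the open race, which is the pattern the solution describes.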
