
Probability & Statistics for P-8 Teachers


iris-campos


Presentation Transcript


  1. Probability & Statistics for P-8 Teachers Chapter 3 Data Description

  2. What is Next? Now that we know how to organize the data and create nice graphs to present the results, we need to focus on describing patterns in the data. • Summarizing data sets numerically • Are there certain values that seem more typical for the data? • How typical are they?

  3. A number that helps describe a set of data is an AVERAGE! Sometimes called a MEASURE OF CENTRAL TENDENCY

  4. Numerical Measures of Data • Central Tendency is the value or values around which the data tend to cluster • Variability shows how strongly the data cluster around that value

  5. Finding the Center All of these are Measures of Central Tendency • MEAN • MEDIAN • MODE • MIDRANGE The question “What’s my average?” has many meanings. What we should say is “What’s my mean?”

  6. Mode Mean Mid-Range Median WHAT DO THEY ALL MEAN?

  7. Mean • Arithmetic Mean (Mean) • the measure of center obtained by adding the values and dividing the total by the number of values • What most people call an average.

  8. Notation $\Sigma$ denotes the sum of a set of values. x is the variable usually used to represent the individual data values. n represents the number of data values in a sample. N represents the number of data values in a population.

  9. Mean • The sample mean is computed using sample data. • Denoted by $\bar{x}$ • The sample mean is a statistic. If $x_1, x_2, \dots, x_n$ are the n observations of a variable from a sample, then the sample mean, $\bar{x}$, is $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x}{n}$

  10. Mean • The population mean is computed using all data points in a population. • Denoted by µ • The population mean is a parameter. If $x_1, x_2, \dots, x_N$ are the N observations of a variable from a population, then the population mean, µ, is $\mu = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{\sum x}{N}$

  11. x x x = n x µ = N Mean µis pronounced ‘mu’ and denotes themean of all values in apopulation is pronounced ‘x-bar’ and denotes the mean of a set of sample values

  12. Computing Sample Mean The following data represent the travel times (in minutes) to work for a sample of seven employees of an insurance company. 23, 36, 23, 18, 5, 26, 43 Compute the sample mean.

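  13. Computing Sample Mean $\bar{x} = \frac{\sum x}{n} = \frac{23 + 36 + 23 + 18 + 5 + 26 + 43}{7} = \frac{174}{7} \approx 24.9$ minutes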
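
The sample-mean calculation for the travel times on slide 12 can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the original slides.

```python
# Travel times (in minutes) for the seven employees from slide 12.
travel_times = [23, 36, 23, 18, 5, 26, 43]

total = sum(travel_times)      # sum of all data values
n = len(travel_times)          # number of values in the sample
sample_mean = total / n        # x-bar = (sum of x) / n

print(f"sum = {total}, n = {n}, mean = {sample_mean:.1f}")
# prints: sum = 174, n = 7, mean = 24.9
```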

  14. Mean • Regardless of the shape of the distribution, the mean is the point at which a histogram of the data would balance.

  15. Median The median represents the middle value when the original data values are arranged in increasing or decreasing order • The median will be one of the data values if there is an odd number of values. • The median will be the average of two data values if there is an even number of values.

  16. Median • The median is the value with exactly half the data values below it and half above it. • It is the middle data value (once the data values have been ordered) that divides the histogram into two equal areas. • It has the same units as the data.

  17. Computing the Median The following data represent the travel times (in minutes) to work for a sample of seven employees of an insurance company. 23, 36, 23, 18, 5, 26, 43 Determine the median of this data.

  18. Computing the Median 23, 36, 23, 18, 5, 26, 43 Step 1: Order the data: 5, 18, 23, 23, 26, 36, 43 Step 2: Locate the middle data point Median = 23

  19. Computing the Median Suppose the insurance company hires a new employee. The travel time of the new employee is 70 minutes. Determine the median of the “new” data set. 23, 36, 23, 18, 5, 26, 43, 70

  20. Computing the Median 23, 36, 23, 18, 5, 26, 43, 70 Step 1: Order the data: 5, 18, 23, 23, 26, 36, 43, 70 Step 2: Locate the middle data point Step 3: Find the mean of the two middle data points Median = (23 + 26) / 2 = 24.5
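
The two median examples follow the same steps: order the data, then take the middle value (odd count) or the mean of the two middle values (even count). A minimal sketch of those steps, with a helper name chosen here for illustration:

```python
def median(values):
    """Return the median by sorting and picking the middle value(s)."""
    ordered = sorted(values)          # Step 1: order the data
    n = len(ordered)
    mid = n // 2                      # Step 2: locate the middle position
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle value
    # Step 3 (even count): mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([23, 36, 23, 18, 5, 26, 43]))        # 23
print(median([23, 36, 23, 18, 5, 26, 43, 70]))    # 24.5
```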

  21. Describe the distribution The following data represent the asking price of homes for sale in Lincoln, NE. Source: http://www.homeseekers.com

  22. Describe the distribution Find the mean and median. Use the mean and median to identify the shape of the distribution. Verify your result by drawing a histogram of the data. The mean asking price is $168,320. The median asking price is $148,700. Because the mean is noticeably larger than the median, we would conjecture that the distribution is skewed right.

  23. Mode • The mode is the value that occurs most often in a data set. • There may be no mode, one mode (unimodal), two modes (bimodal), or many modes (multimodal).

  24. Mode NFL Signing Bonuses: Find the mode of the signing bonuses of eight NFL players for a specific year. The bonuses in millions of dollars are 18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10 You may find it easier to sort first. 10, 10, 10, 11.3, 12.4, 14.0, 18.0, 34.5 Select the value that occurs the most. The mode is 10 million dollars.

  25. Mode Coal Employees in Pennsylvania Find the mode for the number of coal employees per county for 10 selected counties in southwestern Pennsylvania. 110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752 No value occurs more than once. There is no mode.

  26. Mode Licensed Nuclear Reactors The data show the number of licensed nuclear reactors in the United States for a recent 15-year period. Find the mode. 104 104 104 104 104 107 109 109 109 110 109 111 112 111 109 The values 104 and 109 both occur the most (five times each), so the data set is said to be bimodal. The modes are 104 and 109.
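
The three mode examples (one mode, no mode, two modes) can be reproduced by counting how often each value occurs. The sketch below is one possible way to do it; the function name `modes` is an assumption made for illustration, and it follows the slides' convention that a data set in which no value repeats has no mode.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of modes; empty if no value repeats."""
    counts = Counter(values)              # value -> number of occurrences
    highest = max(counts.values())
    if highest == 1:                      # nothing occurs more than once
        return []
    return sorted(v for v, c in counts.items() if c == highest)

bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]
coal = [110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752]
reactors = [104, 104, 104, 104, 104, 107, 109, 109, 109, 110,
            109, 111, 112, 111, 109]

print(modes(bonuses))    # [10]
print(modes(coal))       # []
print(modes(reactors))   # [104, 109]
```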

  27. Modal Class Miles Run per Week Find the modal class for the frequency distribution of miles that 20 runners ran in one week. The modal class is 20.5 – 25.5. The mode, the midpoint of the modal class, is 23 miles per week.

  28. Midrange • The midrange is the average of the lowest and highest values in a data set.

  29. MidRange Water-Line Breaks In the last two winter seasons, the city of Brownsville, Minnesota, reported these numbers of water-line breaks per month. Find the midrange. 2, 3, 6, 8, 4, 1 The midrange is 4.5.
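
The midrange needs only the smallest and largest values. A minimal sketch for the water-line data, with illustrative variable names:

```python
# Water-line breaks per month from the slide.
breaks = [2, 3, 6, 8, 4, 1]

lowest = min(breaks)
highest = max(breaks)
midrange = (lowest + highest) / 2    # average of the two extremes

print(midrange)    # 4.5
```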

  30. Properties of the Mean • Uses all data values • Varies less than the median or mode from sample to sample when the samples are drawn from the same population • Used in computing other statistics, such as the variance • Unique for a given data set, and usually not one of the data values • Cannot be computed for a frequency distribution with an open-ended class • Affected by extremely high or low values, called outliers

  31. Central Tendency

  32. Properties of the Median • Gives the midpoint • Used when it is necessary to find out whether the data values fall into the upper half or lower half of the distribution. • Can be used for an open-ended distribution. • Affected less than the mean by extremely high or extremely low values.

  33. Properties of the Mode • Used when the most typical case is desired • Easiest average to compute • Can be used with nominal data • Not always unique or may not exist

  34. Properties of the Midrange • Easy to compute. • Gives the midpoint. • Affected by extremely high or low values in a data set
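
How strongly each measure reacts to an extreme value can be seen by reusing the travel-time data from the median slides, before and after the 70-minute employee is added. This is an illustrative sketch only; the helper `summarize` is a name introduced here, not something from the slides.

```python
from statistics import mean, median

def summarize(values):
    """Return (mean, median, midrange) for a list of numbers."""
    midrange = (min(values) + max(values)) / 2
    return mean(values), median(values), midrange

original = [23, 36, 23, 18, 5, 26, 43]
with_outlier = original + [70]       # the new employee's travel time

for label, data in [("original", original), ("with 70", with_outlier)]:
    m, med, mid = summarize(data)
    print(f"{label:9s} mean={m:.1f} median={med} midrange={mid}")
# prints:
# original  mean=24.9 median=23 midrange=24.0
# with 70   mean=30.5 median=24.5 midrange=37.5
```

In this example the median moves by 1.5 minutes, the mean by about 5.6 minutes, and the midrange by 13.5 minutes, which matches the properties listed on the slides.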

  35. Distributions

  36. Measure of Dispersion • The mean, median and mode give us an idea of the central tendency, or where the “middle” of the data is located • Variability gives us an idea of how spread out the data are around that middle • The combination of central tendency and dispersion provides a more complete picture of the data

  37. Measure of Dispersion • Without knowing something about how the data are dispersed, measures of central tendency may be misleading. • For example: A residential street with 20 homes on it having a mean value of $200,000 where all the homes are in a similar price range would be very different from a street with the same mean value but with 3 homes having a value of $1 million and the other 17 clustered around $60,000.
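
The two-streets example can be made concrete with invented numbers. The home values below are hypothetical, chosen only so that both streets have (almost exactly) the mean described on the slide; they are not real data.

```python
from statistics import mean, median

# Hypothetical street A: 20 homes in a similar price range.
street_a = [191_000, 211_000] * 10
# Hypothetical street B: 3 homes at $1 million, 17 homes at $60,000.
street_b = [1_000_000] * 3 + [60_000] * 17

for name, homes in [("A", street_a), ("B", street_b)]:
    spread = max(homes) - min(homes)      # range as a simple spread measure
    print(f"Street {name}: mean={mean(homes):,.0f} "
          f"median={median(homes):,.0f} range={spread:,}")
# prints:
# Street A: mean=201,000 median=201,000 range=20,000
# Street B: mean=201,000 median=60,000 range=940,000
```

The means are identical, yet the range (and the median) show that the two streets look nothing alike.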

  38. Measures of Variation How Can We Measure Variability? • Range • Variance • Standard Deviation • Coefficient of Variation • Chebyshev’s Theorem • Empirical Rule (Normal)

  39. Range • The range is the difference between the highest and lowest values in a data set. Find the range in the following test scores. 100, 68, 74, 56, 57, 68 Range = High - Low = 100 - 56 = 44
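
The range calculation for the test scores is a one-liner in Python. A minimal sketch, with an illustrative variable name:

```python
# Test scores from the slide.
scores = [100, 68, 74, 56, 57, 68]

data_range = max(scores) - min(scores)    # high minus low
print(data_range)    # 44
```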

  40. Range in a Histogram

  41. Range • Advantage: easy to compute • Disadvantages: • Not very informative • Considers only two observations (the smallest and largest)

  42. Variance & Standard Deviation • The variance is the average of the squares of the distance each value is from the mean. • The standard deviation is the square root of the variance. • The standard deviation is a measure of how spread out your data are.

  43. Variance & Standard Deviation • The population variance is $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$ • The population standard deviation is $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

  44. Variance & Standard Deviation Find the variance and standard deviation for the data set for how long paint lasts before it fades. • Mean µ: 35 • Deviations x − µ: -25, 25, 15, -5, 5, -15 • Squared deviations (x − µ)²: 625, 625, 225, 25, 25, 225 • Sum of squared deviations: 1750
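
The paint example can be carried through to the final answer in code. The original data values did not survive in this transcript, so the sketch below assumes they equal the mean (35) plus each listed deviation; it also assumes the population formulas apply, since this slide follows the population definitions.

```python
from math import sqrt

mu = 35                                    # mean shown on the slide
deviations = [-25, 25, 15, -5, 5, -15]     # x - mu, from the slide
data = [mu + d for d in deviations]        # assumed reconstruction of x

squared = [(x - mu) ** 2 for x in data]    # (x - mu)^2 for each value
sum_sq = sum(squared)                      # 1750, matching the slide
variance = sum_sq / len(data)              # population variance (divide by N)
std_dev = sqrt(variance)                   # population standard deviation

print(data)                                # [10, 60, 50, 30, 40, 20]
print(f"variance = {variance:.1f}, std dev = {std_dev:.1f}")
# prints: variance = 291.7, std dev = 17.1
```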

  45. Variance & Standard Deviation • The sample variance is $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ • The sample standard deviation is $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

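  46. Computational Formula • The sample variance is $s^2 = \frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}$ • The sample standard deviation is $s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}}$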
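
The definitional form of the sample variance, Σ(x − x̄)² / (n − 1), and a commonly used computational form, [nΣx² − (Σx)²] / [n(n − 1)], give the same result. The sketch below checks this on the travel-time sample from the earlier slides; the choice of that data set is an assumption made here for illustration.

```python
from math import sqrt

sample = [23, 36, 23, 18, 5, 26, 43]     # travel times in minutes
n = len(sample)
x_bar = sum(sample) / n

# Definitional formula: squared deviations from the mean, divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Computational formula: uses only the sum of x and the sum of x squared.
sum_x = sum(sample)
sum_x2 = sum(x * x for x in sample)
s2_shortcut = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

print(f"s^2 = {s2_definition:.1f} (definition), {s2_shortcut:.1f} (shortcut)")
print(f"s   = {sqrt(s2_shortcut):.1f}")
# prints:
# s^2 = 150.5 (definition), 150.5 (shortcut)
# s   = 12.3
```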

  47. Why n - 1? s is an estimate of the population standard deviation (σ). • A sample standard deviation computed with n in the denominator tends to underestimate the population standard deviation. • Dividing by n - 1 instead of n corrects for this: it makes the sample variance s² an unbiased estimate of the population variance σ².
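
One way to see why n − 1 is used is to list every possible sample from a very small population and average the resulting variances. The sketch below does this for a hypothetical population {1, 2, 3} and all ordered samples of size 2 drawn with replacement; both the population and the sampling scheme are assumptions chosen to keep the enumeration short.

```python
from itertools import product
from statistics import mean

population = [1, 2, 3]                    # hypothetical tiny population
mu = mean(population)
sigma_sq = sum((x - mu) ** 2 for x in population) / len(population)

def variance(sample, denominator):
    """Sum of squared deviations from the sample mean over a denominator."""
    x_bar = sum(sample) / len(sample)
    return sum((x - x_bar) ** 2 for x in sample) / denominator

# All 9 ordered samples of size n = 2, drawn with replacement.
samples = list(product(population, repeat=2))
avg_div_n = mean(variance(s, 2) for s in samples)           # divide by n
avg_div_n_minus_1 = mean(variance(s, 1) for s in samples)   # divide by n - 1

print(f"population variance      = {sigma_sq:.3f}")
print(f"average using n          = {avg_div_n:.3f}")
print(f"average using n - 1      = {avg_div_n_minus_1:.3f}")
# prints:
# population variance      = 0.667
# average using n          = 0.333
# average using n - 1      = 0.667
```

Averaged over all samples, dividing by n underestimates the population variance, while dividing by n − 1 matches it.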
