
Summary

Summary. Five-number summary, percentiles, mean. Box plot, modified box plot. Robust statistics: mean, median, trimmed mean; outliers. Measures of variability: range, IQR, average absolute deviation, variance and standard deviation.


Presentation Transcript


  1. Summary • Five-number summary, percentiles, mean • Box plot, modified box plot • Robust statistics – mean, median, trimmed mean • Outliers • Measures of variability • range, IQR, average absolute deviation, variance and standard deviation • The average (signed) distance between each data value and the mean is zero.

  2. Standard deviation – empirical rule

  3. Standard deviation – empirical rule

  4. Standard deviation – empirical rule
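The three empirical-rule slides survive only as titles, so here is a minimal sketch of where the 68-95-99.7 figures come from, assuming the data follow a normal distribution. The helper name `within_k_sd` is made up for illustration.

```python
import math

def within_k_sd(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Roughly 68%, 95% and 99.7% for k = 1, 2, 3
for k in (1, 2, 3):
    print(f"within {k} s.d.: {within_k_sd(k):.4f}")
```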

  5. Population (census) vs. sample; parameter (population) vs. statistic (sample) • Population – parameter: mean, standard deviation • Sample – statistic: sample mean, sample standard deviation

  6. Bias, sampling • Sampling – how do we construct a sample from the population? • Bias – a sample is biased if it differs from the population in a systematic way. • Unbiased standard deviation – divide by n - 1 (instead of n).

  7. SRS (simple random sample) • Sampling with replacement • Generates independent samples. • Two sample values are independent if what we get on the first one doesn't affect what we get on the second. • Sampling without replacement • Deliberately avoids choosing any member of the population more than once. • This type of sampling is not independent, but it is more common. • The error is small as long as • the sample is large • the sample size is no more than 10% of the population size
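The two sampling schemes can be sketched with Python's standard `random` module. The population of 100 labelled members and the fixed seed are assumptions chosen only for illustration.

```python
import random

population = list(range(1, 101))  # hypothetical population of 100 labelled members
rng = random.Random(0)            # fixed seed so the run is repeatable

# With replacement: each draw is independent, so a member may appear twice.
with_replacement = rng.choices(population, k=10)

# Without replacement: no member is chosen more than once.
without_replacement = rng.sample(population, k=10)

print(with_replacement)
print(without_replacement)
# A sample of 10 out of 100 sits exactly at the 10% limit named on the slide.
print(len(without_replacement) / len(population))
```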

  8. Suppose you have a bag with 3 cards in it. The cards are numbered 0, 2 and 4. • Population mean = 2 • Population variance = 8/3 • An important property of a sample statistic that estimates a population parameter is that if you evaluate the sample statistic for every possible sample and average them all, the average of the sample statistic should equal the population parameter. We want: average of the sample statistic over all possible samples = population parameter. • A statistic with this property is called unbiased.
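The card example can be checked by brute force. The sketch below assumes samples of size 2 drawn with replacement (the slide does not state the sample size) and compares dividing by n with dividing by n - 1.

```python
from fractions import Fraction
from itertools import product

cards = [0, 2, 4]
n = 2  # assumed sample size; not given on the slide

def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / divisor

samples = list(product(cards, repeat=n))  # all 9 ordered samples, with replacement
avg_div_n = sum(variance(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(avg_div_n)          # 4/3, which underestimates the population variance
print(avg_div_n_minus_1)  # 8/3, which equals the population variance
```

Under these assumptions only the n - 1 divisor averages out to the population variance of 8/3.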

  9. Bessel’s game

  10. Histogram revision • Distribution – the pattern of values in the data • Histogram – a visualization of the distribution • We can see • whether the data tend to be close to a particular value • whether the data vary a lot or a little about the most common values • whether that variation tends to be more above or below the common values • whether there are unusually large or small values in the data

  11. Life expectancy data – histogram • Use the interactive histogram applet to generate a histogram with a bin size of 10, starting at 40. (Figure: histogram of frequency vs. life expectancy.)
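The applet is not available in the transcript, so here is a rough sketch of the same binning (bin size 10, starting at 40). The values are hypothetical; the actual life expectancy dataset is not reproduced here.

```python
from collections import Counter

# Hypothetical life expectancies in years (NOT the data behind the slides).
values = [47.8, 52.1, 58.3, 61.0, 64.7, 68.9, 71.5, 73.2, 74.8, 76.7, 79.4, 81.0, 83.4]
start, width = 40, 10

# Bin index 0 covers [40, 50), index 1 covers [50, 60), and so on.
counts = Counter(int((v - start) // width) for v in values)
for i in sorted(counts):
    lo = start + i * width
    print(f"[{lo}, {lo + width}): {counts[i]}")
```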

  12. Life expectancy data – histogram (Figure: histogram of frequency vs. life expectancy.)

  13. Making conclusions from a histogram • What can you tell about the life expectancy data? • how many modes? • where is the mode? • symmetric, left skewed or right skewed? • outliers – yes or no? (Figure: histogram of frequency vs. life expectancy.)

  14. Making conclusions from a histogram • Where is the mode, the median, the mean? (Figure: histogram of frequency vs. life expectancy.)

  15. Five-number summary • Min. = 47.79 • Q1 = 64.67 • Median = 73.24 • Q3 = 76.65 • Max. = 83.39 • What is the position of the mean and the median?
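The five numbers on slide 15 are enough to compute the IQR and to apply the usual modified-box-plot check for outliers. The 1.5 × IQR fence used below is the common convention and an assumption here; the slides do not state which rule they use.

```python
# Five-number summary from slide 15
minimum, q1, median, q3, maximum = 47.79, 64.67, 73.24, 76.65, 83.39

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # assumed convention: 1.5 x IQR beyond the quartiles
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr:.2f}")
print(f"fences = ({lower_fence:.2f}, {upper_fence:.2f})")
print("values outside the fences:", minimum < lower_fence or maximum > upper_fence)
# The median sits closer to Q3 than to Q1, which hints at a longer left tail.
print("median closer to Q3 than Q1:", q3 - median < median - q1)
```

With a longer left tail, the mean would typically fall below the median.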

  16. Symmetric, left skewed or right skewed?

  17. Standardizing

  18. Playing chess • Pretend I am a chess player. • Which of the following tells you most about how good I am: • My rating is 1800. • 8110th place among world competitive chess players. • Ranked higher than 88% of competitive chess players.

  19. Distribution • Distribution of scores in one particular year • We should use relative frequencies and convert all absolute frequencies to proportions.

  20. Height data – absolute frequencies http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights

  21. Height data – relative frequencies

  22. Height data – relative frequencies What proportion of values is between 170 cm and 173.75 cm? 30%

  23. Height data – relative frequencies What proportion of values is between 170 cm and 175 cm? We can’t tell for certain.

  24. How should we modify the data or the histogram to show more detail? • Adding more values to the dataset • Increasing the bin size • Using a smaller bin size

  25. Height data – relative frequencies What proportion of values is between 170 cm and 175 cm? 36%
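Why could the 170–175 cm question not be answered from the coarser histogram? A proportion can be read off a histogram exactly only when both endpoints fall on bin edges. The sketch below assumes 170 cm is a bin edge and that the coarse bins are 3.75 cm wide (as the slide 22 question suggests); the finer width of 1.25 cm is purely hypothetical.

```python
def is_bin_edge(value: float, origin: float, width: float) -> bool:
    """True if `value` lies exactly on a bin boundary of a histogram starting at `origin`."""
    steps = (value - origin) / width
    return abs(steps - round(steps)) < 1e-9

origin = 170.0  # assumed bin edge
for width in (3.75, 1.25):  # 1.25 is a hypothetical smaller bin size
    print(width, is_bin_edge(173.75, origin, width), is_bin_edge(175.0, origin, width))
```

With 3.75 cm bins, 175 cm falls inside a bin; with the smaller bins both endpoints are edges, so the bars in between can simply be added up.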

  26. Height data – relative frequencies

  27. Decreasing bin size • Check out what happens with the smallest bin size for Physics Test Scores from http://quarknet.fnal.gov/cosmics/histo.shtml.

  28. Height

  29. Height data – relative frequencies

  30. Normal distribution • Recall the empirical rule: 68-95-99.7

  31. Empirical rule (Figure: normal curve with one axis in standard deviations from -3 to +3 and a second axis in original units from 0 to 6.)

  32. Z • Z – the number of standard deviations a value lies away from the mean • If the Z-value is 1, what percentage of values are less than that value? About 84% (Figure: normal curve with the axis in standard deviations from -3 to +3.)
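The "about 84%" figure can be reproduced with the cumulative distribution function of the standard normal distribution. This is a small sketch using `statistics.NormalDist` from the Python standard library, assuming the data are normally distributed.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of values below a given Z-value
for z in (-1, 0, 1, 2):
    print(f"P(Z < {z}) = {standard_normal.cdf(z):.4f}")
```

The value for Z = 1 is 50% (everything below the mean) plus half of the 68% that lies within one standard deviation.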

  33. Who is more popular? Let’s demonstrate the importance of Z-scores with the following example.

  34. Who is more popular? • s.d. = 36, Z = -3.53 • s.d. = 60, Z = -2.57

  35. Standardizing

  36. Formula • What formula describes what we did?

  37. Quiz • What does a negative Z-score mean? • The original value is negative. • The original value is less than mean. • The original value is less than 0. • The original value minus the mean is negative.

  38. Quiz II • If we standardize a distribution by converting every value to a Z-score, what will be the new mean of this standardized distribution? • If we standardize a distribution by converting every value to a Z-score, what will be the new standard deviation of this standardized distribution?
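Standardizing means subtracting the mean from each value and dividing by the standard deviation. The sketch below applies that to the ten values listed on slide 41, assuming the population standard deviation (dividing by n); the function name `standardize` is made up for illustration.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Convert each value to a Z-score: (value - mean) / standard deviation."""
    mean = fmean(values)
    sd = pstdev(values)  # population s.d.; an assumption for this sketch
    return [(x - mean) / sd for x in values]

data = [3, 4, 4, 5, 3, 1, 3, 2, 2, 3]  # values from slide 41
z = standardize(data)

print([round(v, 2) for v in z])
print(round(fmean(z), 10), round(pstdev(z), 10))
```

Values below the mean get negative Z-scores, and the standardized data have mean 0 and standard deviation 1 (up to rounding).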

  39. Standard normal distribution • N(μ, σ) → N(0, 1)

  40. Standard normal distribution

  41. Meaning of relative frequencies • Data: 3, 4, 4, 5, 3, 1, 3, 2, 2, 3

  42. Histogram of these data
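Converting the ten values from slide 41 into relative frequencies takes only a few lines; this is one way to do it, not necessarily how the slide's histogram was produced.

```python
from collections import Counter

data = [3, 4, 4, 5, 3, 1, 3, 2, 2, 3]  # values from slide 41

counts = Counter(data)  # absolute frequencies
relative = {value: counts[value] / len(data) for value in sorted(counts)}

print(relative)  # {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.2, 5: 0.1}
```

Each relative frequency is the proportion of the data taking that value, so the bars of the histogram add up to 1.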

  43. Probability density function (PDF)

  44. Standard normal distribution
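For a continuous distribution, proportions are areas under the density curve, the limiting case of a relative-frequency histogram with ever smaller bins. The sketch below approximates such areas for the standard normal density with a simple midpoint rule; the function names and the step count are choices made for illustration.

```python
import math

def standard_normal_pdf(z: float) -> float:
    """Density of the standard normal distribution N(0, 1)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def area(a: float, b: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the area under the density between a and b."""
    h = (b - a) / steps
    return h * sum(standard_normal_pdf(a + (i + 0.5) * h) for i in range(steps))

print(f"{area(-1, 1):.4f}")  # about 0.68, the first figure of the empirical rule
print(f"{area(-6, 6):.4f}")  # practically the whole area under the curve
```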
