1 / 29

Last lecture summary

Last lecture summary. Which measures of variability do you know? What are they advantages and disadvantages? Empirical rule. Statistical jargon. population (census) vs. sample parameter (population) vs. statistic (sample ). Population - parameter Mean Standard deviation.

lane-cherry
Télécharger la présentation

Last lecture summary

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Last lecture summary • Which measures of variability do you know? • What are they advantages and disadvantages? • Empirical rule

  2. Statistical jargon population (census) vs. sample parameter (population) vs. statistic (sample) Population - parameter Mean Standard deviation Sample - statistic Mean Standard deviation Výběr - statistika Výběrový průměr Výběrová směrodatná odchylka

  3. Statistical inference • A statistic is a value calculated from our observed data (sample). • A parameter is a value that describes the population. • We want to be able to generalize what we observe in our data to our population. In order to this, the sample needs to be representative. • How to select a representative sample? Use randomization.

  4. New stuff

  5. Random sampling • Simple Random Sampling (SRS) – each possible sample from the population is equally likely to be selected. • Stratified Sampling – simple random sample from subgroups of the population • subgroups: gender, age groups, … • Cluster sampling – divide the population into non-overlapping groups (clusters), sample is a randomly chosen cluster • example: population are all students in an area, randomly select schools and create a sample from students of the given school

  6. Simple random sampling • sampling with replacement (WR) • výběr s navrácením • Generates independent samples • Two sample values are independent if that what we get on the first one doesn't affect what we get on the second. • sampling without replacement (WOR) • výběr bez navrácení • Deliberately avoid choosing any member of the population more than once. • This type of sampling is not independent, however it is more common. • The error is small as long as • the sample is large • the sample size is no more than 10% of population size

  7. Bias • If a sample is not representative, it can introduce bias into our results. • bias – zkreslení, odchylka • A sample is biased if it differs from the population in a systematic way. • The Literary Digest poll, 1936, U. S. presidential election • surveyed 10 mil. people – subscribers • 2.3 mil. responded predicting (3:2) a Republican candidate to win • a Democrat candidate won • What went wrong? • only wealthy people were surveyed (selection bias) • survey was voluntary response (nonresponse bias) – angry people or people who want a change

  8. Bessel’s correction www.udacity.com – Statistics

  9. Sample vs. population SD • We use sample standard deviation to approximate population paramater • But don’t get confused with the actual standard deviation of a small dataset. • For example, let’s have this dataset: 5 2 1 0 7. Do you divide by or by ?

  10. Bessel's game

  11. Bessel's game • An important property of a sample statistic that estimates a population parameter is that if you evaluate the sample statistic for every possible sample and average them all, the average of the sample statistic should equal to the population parameter. We want: • This is called unbiased.

  12. Bessel’s game • List all possible samples of 2 cards. • Calculate sample averages. Population of all cards in a bag

  13. Bessel’s game • List all possible samples of 2 cards. • Calculate sample averages. • Now, half of you calculate sample variance using /n, and half of you using /(n-1). • And then average all sample variances. Population of all cards in a bag

  14. Bessel’s game

  15. Median absolute deviation (MAD) • standard deviation is not robust • IQR is robust • mean absolute deviation MAD – a robust equivalent of the standard deviation • Také your data, find median, calculate absolute deviation from the median, find the median of absolutes deviations

  16. Median absolute deviation (MAD) Median: MAD:

  17. normal distribution

  18. Playing chess • Pretend I am a chess player. • Which of the following tells you most about how good I am: • My rating is 1800. • 8110th place among world competitive chess players. • Ranked higher than 88% of competitive chess players.

  19. Distribution Distribution of scores in one particular year We should use relative frequencies and convert all absolute frequencies to proportions.

  20. Height data – absolute frequencies http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights

  21. Height data – relative frequencies

  22. Height data – relative frequencies What proportion of values is between 170 cm and 173.75 cm? 30%

  23. Height data – relative frequencies What proportion of values is between 170 cm and 175 cm? We can’t tell for certain.

  24. How should we modify data/histogram to allow us a more detail? • Adding more value to the dataset • Increasing the bin size • A smaller bin size

  25. Height data – relative frequencies What proportion of values is between 170 cm and 175 cm? 36%

  26. Height data – relative frequencies

  27. Height data – relative frequencies

  28. Normal distribution recall the empirical rule 68-95-99.7

More Related