
Statistics and Data Analysis


Presentation Transcript


    1. Statistics and Data Analysis Professor William Greene Stern School of Business IOMS Department Department of Economics

    2. Statistics and Data Analysis, Part 10 – The Law of Large Numbers and the Central Limit Theorem

    3. Sample Means and the Central Limit Theorem. Statistical inference; sampling (random sampling, biases in sampling, sampling from a particular distribution); sample statistics; sampling distributions (distribution of the mean, more general results on sampling distributions); results for sampling and sample statistics (the Law of Large Numbers, the Central Limit Theorem).

    4. Measurement as Description

    5. Measurement as Observation - Sampling

    6. Statistics as Inference

    7. A Sample of Observations

    8. Random Sampling. What makes a sample a random sample? (1) The observations are independent, and (2) the same underlying process generates each observation.

    9. Types of Samples. Cross section: the sample of operators. Time series. Panel (longitudinal): the WHO data, 5 years, 191 countries.

    10. Population. Sample from what? What is a population? The set of all possible observations from which a sample could be drawn.

    11. Overriding Principles in Statistical Inference. The characteristics of a random sample will mimic (resemble) those of the population: the mean, the median, the histogram, and so on. The sample is not a perfect picture of the population, but it gets better as the sample gets larger.

    12. Sampling From a Particular Population. X1, X2, …, XN will denote a random sample. They are N random variables with the same distribution. x1, x2, …, xN are the values taken by the random sample. Xi is the ith random variable; xi is the ith observation.

    13. Sampling from a Poisson Population. Directory assistance operators clear all calls that reach them. The number of calls that arrive at an operator’s station is Poisson distributed with a mean of 800 per day. These are the assumptions that define the population. 60 operators (stations) are observed on a given day.
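
A minimal simulation sketch of this setup, assuming numpy is available; the 60 stations and the mean of 800 calls per day come from the slide, while the seed is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Population assumption from the slide: calls per operator station per day ~ Poisson(800).
calls = rng.poisson(lam=800, size=60)   # one observed day at 60 stations

print(calls[:5])           # a few observations
print(calls.mean())        # sample mean, close to 800
print(calls.var(ddof=1))   # sample variance, also close to 800 (Poisson: variance = mean)
```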

    14. Sample from a Population. The population: the amount of cash demanded in a bank each day is normally distributed with mean $10M (million) and standard deviation $3.5M. Random variables: X1, X2, …, XN will equal the amounts of cash demanded on a set of N days. Observed sample: x1 ($12.178M), x2 ($9.343M), …, xN ($16.237M) are the values observed on those N days. X1, …, XN are a random sample from a normal population with mean $10M and standard deviation $3.5M.
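
A sketch of drawing such a sample, assuming numpy; the 30-day sample size and the seed are illustrative choices, and the printed values will only play the role of the $12.178M, $9.343M, … shown above:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Population assumption from the slide: daily cash demand ~ Normal(mean $10M, sd $3.5M).
mu, sigma, n_days = 10.0, 3.5, 30                  # amounts in $M; 30 days is arbitrary
x = rng.normal(loc=mu, scale=sigma, size=n_days)   # observed sample x1, ..., xN

print(x[:3])                     # values playing the role of x1, x2, x3
print(x.mean(), x.std(ddof=1))   # close to, but not equal to, 10 and 3.5
```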

    15. Sampling Model. For the present, we will focus on samples from normal populations. We will generalize to other populations later in the discussion.

    16. Sample Statistics. A statistic is a quantity that is computed from a sample. We will assume random samples. Examples: the sample sum Σ xi; the sample mean x̄ = (1/N) Σ xi; the sample variance s² = (1/(N−1)) Σ (xi − x̄)²; the sample minimum x[1]; the proportion of observations less than 10; the value M for which 50% of the observations are less than M. (This is a “quantile” – the median = the 50th “percentile.”)
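
All of these statistics can be computed directly from a sample; a short sketch assuming numpy, using a hypothetical sample drawn from the bank population above:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
x = rng.normal(loc=10.0, scale=3.5, size=60)   # a hypothetical random sample

sample_sum = x.sum()               # Σ xi
sample_mean = x.mean()             # x̄ = (1/N) Σ xi
sample_variance = x.var(ddof=1)    # s² = (1/(N-1)) Σ (xi - x̄)²
sample_min = x.min()               # x[1], the sample minimum
prop_below_10 = (x < 10).mean()    # proportion of observations less than 10
sample_median = np.median(x)       # M with 50% of the observations below it
```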

    17. Sampling Distribution The random sample is itself random, since each member is random. (A second sample will differ randomly from the first one.) Statistics computed from random samples will vary as well. For some statistics, the distributions of the elements in the sample will induce a distribution of the statistic.

    18. A Sample of Samples

    19. Distributions of the Sample Sum and the Sample Mean

    20. Sampling Distributions The distribution of a statistic in “repeated sampling” is the sampling distribution. The sampling distribution is the theoretical population that generates sample statistics.

    21. The Sample Sum. Mean of the sum: E[X1+X2+…+XN] = E[X1]+E[X2]+…+E[XN] = Nµ. Variance of the sum: because of independence, Var[X1+X2+…+XN] = Var[X1]+…+Var[XN] = Nσ². Standard deviation of the sum = σ√N. This result does not assume the data are normally distributed.
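
These two results can be checked by simulation; a sketch assuming numpy, deliberately using a non-normal (exponential) parent since the result does not require normality. The sample size, number of replications, and population mean are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
N, mu, reps = 25, 2.0, 100_000           # exponential(mean 2): sigma^2 = mu^2 = 4

# Draw many samples of size N and record the sample sum from each.
sums = rng.exponential(scale=mu, size=(reps, N)).sum(axis=1)

print(sums.mean(), N * mu)               # both ≈ Nµ = 50
print(sums.var(), N * mu**2)             # both ≈ Nσ² = 100
print(sums.std(), np.sqrt(N) * mu)       # both ≈ σ√N = 10
```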

    22. The Sample Mean. Note Var[(1/N)Xi] = (1/N²)Var[Xi] (product rule). Expected value of the sample mean: E[(1/N)(X1+X2+…+XN)] = (1/N){E[X1]+E[X2]+…+E[XN]} = (1/N)Nµ = µ. Variance of the sample mean: Var[(1/N)(X1+X2+…+XN)] = (1/N²){Var[X1]+…+Var[XN]} = Nσ²/N² = σ²/N. Standard deviation of the sample mean = σ/√N. The data are not necessarily assumed to come from a normal population.
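
The same kind of check for the sample mean, again with a non-normal parent (a sketch assuming numpy; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=5)
N, mu, sigma, reps = 25, 2.0, 2.0, 100_000   # exponential(mean 2) has sigma = 2

means = rng.exponential(scale=mu, size=(reps, N)).mean(axis=1)

print(means.mean(), mu)                      # ≈ µ
print(means.var(), sigma**2 / N)             # ≈ σ²/N = 0.16
print(means.std(), sigma / np.sqrt(N))       # ≈ σ/√N = 0.4
```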

    23. Sample Results

    24. Sampling Distribution The sample mean has a sampling mean and a sampling variance. The sample mean also has a probability distribution.

    25. Sampling Distribution of the Mean. Note the resemblance of the histogram to a normal distribution. In random sampling from a normal population with mean µ and variance σ², the sample mean will also have a normal distribution, with mean µ and variance σ²/N. Does this work for other distributions, such as the Poisson and the binomial? Does the mean have the same distribution as the population (Poisson or binomial)? No. Is the mean normally distributed? Approximately – to be pursued later.
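
A repeated-sampling sketch of the normal-population result, assuming numpy and reusing the bank population (µ = $10M, σ = $3.5M); the 60-day sample size and 50,000 replications are illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=6)
mu, sigma, N, reps = 10.0, 3.5, 60, 50_000

# Many samples from a normal population; each sample mean should be N(µ, σ²/N) exactly.
means = rng.normal(mu, sigma, size=(reps, N)).mean(axis=1)

se = sigma / np.sqrt(N)
print(means.mean(), means.std())   # ≈ µ and ≈ σ/√N
print((means < mu + se).mean())    # ≈ 0.8413, the normal probability below µ + 1 SE
```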

    26. What of Sampling Distributions? The sampling distribution speaks of the behavior of the sample mean in repeated samples. I have only one sample and one sample mean. Why does any of this matter to me?

    27. Implication 1 of the Sampling Results

    28. Implication 2 of the Sampling Result

    29. Sampling Distribution

    30. Two Major Theorems. Law of Large Numbers: as the sample size gets larger, sample statistics get ever closer to the population characteristics. Central Limit Theorem: sample statistics computed from means (such as the means themselves) are approximately normally distributed, regardless of the parent distribution.

    31. The Law of Large Numbers

    32. The Law of Large Numbers. An event consists of two random outcomes, YES and NO. Prob[YES occurs] = θ (θ need not be 1/2). Prob[NO occurs] = 1 − θ. The event is to be staged N times, independently. N1 = number of times YES occurs; P = N1/N. LLN: as N → ∞, Prob[|P − θ| > ε] → 0, no matter how small ε is. For any N, P will deviate from θ because of randomness. As N gets larger, the difference will disappear.
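
A small sketch of the LLN for a YES/NO event, assuming numpy; θ = 0.3 is a hypothetical choice, and the three sample sizes just illustrate the shrinking deviation:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
theta = 0.3                                      # Prob[YES]; need not be 1/2

for n in (100, 10_000, 1_000_000):
    p = rng.binomial(1, theta, size=n).mean()    # P = N1 / N
    print(n, p, abs(p - theta))                  # |P - θ| shrinks as N grows
```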

    33. The LLN at Work - Roulette

    34. Application of the LLN The casino business is nothing more than a huge application of the law of large numbers. The insurance business is close to this as well.

    35. A Sampling Experiment - LLN

    37. Implication of the Law of Large Numbers. If the sample is large enough, the difference between the sample mean and the true mean will be trivial. This follows from the fact that the variance of the mean is σ²/N → 0. An estimate of the population mean based on a large(er) sample is better than an estimate based on a small(er) one.
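
A trivial numerical sketch of why this holds, with σ = 3.5 borrowed from the bank example: the variance and standard deviation of the sample mean shrink toward zero as N grows.

```python
import math

sigma = 3.5
for n in (10, 100, 1_000, 10_000):
    print(n, sigma**2 / n, sigma / math.sqrt(n))   # Var and SD of the sample mean -> 0
```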

    38. Implication of the LLN Now, the problem of a “biased” sample: As the sample size grows, a biased sample produces a better and better estimator of the wrong quantity. Drawing a bigger sample does not make the bias go away. That was the essential fallacy of the Literary Digest poll and of the Hite Report.

    40. Central Limit Theorem. Theorem (loosely): regardless of the underlying distribution of the sample observations, if the sample is sufficiently large (generally N > 30), the sample mean will be approximately normally distributed with mean µ and standard deviation σ/√N.
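
A sketch of the CLT at work on a strongly skewed parent, assuming numpy; the exponential population, N = 40, and the number of replications are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(seed=9)
N, reps = 40, 100_000                       # N > 30, per the rule of thumb above

# Exponential parent with mean 1 and standard deviation 1 (far from normal).
means = rng.exponential(scale=1.0, size=(reps, N)).mean(axis=1)

z = (means - 1.0) / (1.0 / np.sqrt(N))      # standardize with µ and σ/√N
print((np.abs(z) < 1.96).mean())            # ≈ 0.95 if the normal approximation is good
```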

    41. The CLT At Work

    42. Implication of the Central Limit Theorem Inferences about probabilities of events based on the sample mean can use the normal approximation even if the data themselves are not drawn from a normal population.
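
For example, returning to the Poisson operators above, the probability that the average number of calls across the 60 stations exceeds some threshold can be approximated with the normal distribution. A sketch using only the standard library; the 805-call threshold is a hypothetical choice:

```python
import math

# Poisson(800) population: mu = 800, sigma = sqrt(800); N = 60 stations.
mu, sigma, N = 800.0, math.sqrt(800.0), 60
se = sigma / math.sqrt(N)                    # σ/√N ≈ 3.65

def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# P(sample mean > 805) under the CLT normal approximation.
z = (805.0 - mu) / se
print(1.0 - norm_cdf(z))                     # ≈ 0.085
```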

    43. Poisson Sample

    44. Applying the CLT

    45. Normally Distributed Data. If the random sample is not drawn from a normal population, then the mean does not (precisely) have a normal distribution. Is this a problem? Not really, because of the central limit theorem. Can I determine if my data come from a normal population? Yes, more or less. Does it matter whether the data come from a normal population? Possibly. Assuming they do when they don’t might taint your statistical procedures.

    46. Overriding Principle in Statistical Inference (Remember). The characteristics of a random sample will mimic (resemble) those of the population: the histogram of the sample resembles the distribution of the observations in the population.

    47. Finding Normality. Do the data look like they come from a normal distribution? Are they symmetrically distributed around the mean? Are there very few outliers and observations far from the mean? Does the histogram look like a normal density? Do the sample quantiles look normal – about 16% of the observations less than the mean minus 1 standard deviation, 50% less than the mean, about 84% less than the mean plus 1 standard deviation, and so on?
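
These quantile checks are easy to compute; a sketch assuming numpy, with simulated stand-in data (real data would replace the first draw):

```python
import numpy as np

rng = np.random.default_rng(seed=11)
x = rng.normal(size=500)             # stand-in for the data being checked

m, s = x.mean(), x.std(ddof=1)
print((x < m - s).mean())            # ≈ 0.16 for normal-looking data
print((x < m).mean())                # ≈ 0.50
print((x < m + s).mean())            # ≈ 0.84
```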

    48. Finding Normality

    49. Normal Probability Plot

    50. Summary. Random sampling; statistics; sampling distributions; the Law of Large Numbers; the Central Limit Theorem.
