Limits to Statistical Theory: Bootstrap analysis

Limits to Statistical Theory: Bootstrap analysis. ESM 206, 11 April 2006. Assumption of the t-test: the sample mean is a t-distributed random variable. This is guaranteed if the observations are normally distributed random variables or the sample size is very large.


Presentation Transcript


  1. Limits to Statistical Theory: Bootstrap analysis. ESM 206, 11 April 2006

  2. Assumption of t-test
     • Sample mean (once standardized by its estimated standard error) is a t-distributed random variable
     • Guaranteed if observations are normally distributed random variables; holds approximately if sample size is very large
     • In practice, OK if observations are not too skewed and sample size is reasonably large
     • This assumption also applies when using the standard formula for the 95% CI of the mean
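
For reference, the "standard formula" on slide 2 is presumably the usual t-based interval. Written out, with symbols that are not in the original slides (sample mean \bar{x}, sample standard deviation s, sample size n):

```latex
\bar{x} \;\pm\; t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
```

Here t_{0.975, n-1} is the 97.5th percentile of the t distribution with n - 1 degrees of freedom. The interval is only as trustworthy as the t-distribution assumption above.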

  3. Resampling for a confidence interval of the mean
     IN AN IDEAL WORLD
     • Take sample
     • Calculate sample mean
     • Take new sample
     • Calculate new mean
     • Repeat many times
     • Look at the distribution of sample means
     • 95% CI ranges from 2.5 percentile to 97.5 percentile
     IN THE REAL WORLD
     • Find some way to simulate taking a sample
     • Calculate the sample mean
     • Repeat many times
     • Look at the distribution of sample means
     • 95% CI ranges from 2.5 percentile to 97.5 percentile

  4. Bootstrap resampling
     PARAMETRIC BOOTSTRAP
     • Assume data are random variables from a particular distribution (e.g., log-normal)
     • Use data to estimate parameters of the distribution (e.g., mean, variance)
     • Use random number generator to create sample (same size as original)
     • Calculate sample mean
     • Allows us to ask: What if data were a random sample from specified distribution with specified parameters?
     NONPARAMETRIC BOOTSTRAP
     • Assume underlying distribution from which data come is unknown
     • Best estimate of this distribution is the data themselves: the empirical distribution function
     • Create a new dataset by sampling with replacement from the data (same size as original)
     • Calculate sample mean
     WHICH IS BETTER?
     • If underlying distribution is correctly chosen, parametric has more precision
     • If underlying distribution incorrectly chosen, parametric has more bias
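
The nonparametric recipe on slide 4 can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name, the seed, and the data are hypothetical, and the 999-replicate count and the 25th/975th cut-offs are borrowed from slide 5.

```python
import random
import statistics


def bootstrap_ci_mean(data, n_boot=999, seed=0):
    """Percentile bootstrap 95% CI for the mean (nonparametric)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(data)
    # Each replicate: resample n values with replacement, then take the mean.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    # With 999 sorted means these are the 25th and 975th values (1-based).
    lower = means[round(0.025 * (n_boot + 1)) - 1]
    upper = means[round(0.975 * (n_boot + 1)) - 1]
    return lower, upper


# Hypothetical data, not the TcCB measurements from the slides.
sample = [0.2, 0.4, 0.5, 0.9, 1.1, 1.6, 2.3, 4.8]
low, high = bootstrap_ci_mean(sample)
print(f"95% bootstrap CI of the mean: [{low:.3f}, {high:.3f}]")
```

Because resampling draws only from the observed values, the interval can never extend beyond the range of the data.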

  5. Parametric bootstrap
     • If Y is log-normal, it is specified in terms of mean and standard deviation of X = log(Y)
       Mean = -0.547, SD = 1.360
     • Use “Monte Carlo Simulation” to generate 999 replicate simulated datasets from log-normal distribution
     • Calculate mean of each replicate and sort means
     • 25th value is lower end of 95% CI
     • 975th value is upper end of 95% CI
     TcCB in the cleanup site. 95% CI: [-0.678, 8.458]
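
A minimal Python sketch of the Monte Carlo procedure on slide 5, using the log-scale mean and SD quoted there. The sample size is not stated in the slides, so `n = 50` below is a placeholder assumption; the function name and seed are also hypothetical, and the output is not expected to reproduce the slide's numbers.

```python
import random
import statistics


def parametric_bootstrap_ci(mu_log, sd_log, n, n_boot=999, seed=0):
    """95% CI of the mean from log-normal Monte Carlo replicates."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    means = []
    for _ in range(n_boot):
        # Simulate one dataset of size n from the fitted log-normal.
        replicate = [rng.lognormvariate(mu_log, sd_log) for _ in range(n)]
        means.append(statistics.fmean(replicate))
    means.sort()
    # With 999 sorted means these are the 25th and 975th values (1-based).
    lower = means[round(0.025 * (n_boot + 1)) - 1]
    upper = means[round(0.975 * (n_boot + 1)) - 1]
    return lower, upper


# Mean and SD of log(Y) from slide 5; n = 50 is an assumed sample size.
low, high = parametric_bootstrap_ci(-0.547, 1.360, n=50)
print(f"95% parametric bootstrap CI: [{low:.3f}, {high:.3f}]")
```

Since every simulated value is positive, the lower limit is always above zero.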

  6. Parametric bootstrap: results
     • 95% CI: [0.917, 2.293]

  7. Normal QQ Plot
     • Sort data
     • Index the values (i = 1, 2, …, n)
     • Calculate q = i/(n+1). This is the quantile
     • Plot quantiles against data values. This is the empirical cumulative distribution function (CDF)
     • Construct CDF of standard normal using same quantiles
     • Compare the distributions at the same quantiles
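
The QQ-plot steps on slide 7 can be sketched as follows. The function name and the example data are hypothetical; the code only computes the point pairs and leaves the plotting to whatever tool is at hand.

```python
from statistics import NormalDist


def normal_qq_points(data):
    """Return (standard-normal quantile, data value) pairs for a QQ plot."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        q = i / (n + 1)  # quantile of the i-th sorted value
        # Standard-normal value at the same quantile.
        points.append((std_normal.inv_cdf(q), value))
    return points


# Hypothetical data for illustration only.
for z, y in normal_qq_points([2.3, 0.4, 1.1, 0.9, 4.8]):
    print(f"{z:7.3f}  {y:5.2f}")
```

If the data are roughly normal the pairs fall close to a straight line; strong curvature suggests skew.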

  8. Nonparametric bootstrap: results
     • 95% CI: [0.851, 9.248]

  9. Bootstrap and hypothesis tests
     One-sample t-test
     • Calculate bootstrap CI of mean
     • Does it overlap test value?
     Paired t-test
     • Calculate differences: D_i = x_i - y_i
     • Find bootstrap CI of mean difference
     • Does it overlap zero?
     Two-sample t-test
     • Want to create simulated data where H0 is true (same mean) but allow variance and shape of distribution to differ between populations
     • Easiest with nonparametric:
       - Subtract mean from each sample. Now both samples have mean zero
       - Resample these residuals, creating simulated group A from residuals of group A and simulated group B from residuals of group B
       - Generate distribution of t values
       - P is fraction of simulated t’s that exceed t calculated from data
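
A minimal Python sketch of the two-sample procedure on slide 9. The slides do not say which t statistic is used, so the unequal-variance (Welch) form here is an assumption, chosen because the variances are allowed to differ. The one-sided "exceed" rule follows the slide; function names, seed, and data are hypothetical.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Unequal-variance t statistic (an assumed choice, see lead-in)."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    if se == 0:
        return math.nan  # degenerate resample; nan never counts as exceeding
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def bootstrap_two_sample_p(a, b, n_boot=999, seed=0):
    """Return (observed t, fraction of simulated t values that exceed it)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    t_obs = welch_t(a, b)
    # Subtract each group's mean so both groups have mean zero (H0 is true).
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    res_a = [x - mean_a for x in a]
    res_b = [x - mean_b for x in b]
    exceed = 0
    for _ in range(n_boot):
        # Resample each group's residuals separately, keeping group sizes.
        sim_a = rng.choices(res_a, k=len(a))
        sim_b = rng.choices(res_b, k=len(b))
        if welch_t(sim_a, sim_b) > t_obs:
            exceed += 1
    return t_obs, exceed / n_boot


# Hypothetical data, not the TcCB cleanup and reference samples.
group_a = [5.1, 6.3, 5.8, 7.2, 6.0, 6.6, 5.5, 6.9]
group_b = [1.2, 0.8, 1.9, 1.1, 1.5, 0.9, 1.3, 1.7]
t_obs, p = bootstrap_two_sample_p(group_a, group_b)
print(f"t = {t_obs:.2f}, P = {p:.3f}")
```

Resampling each group's residuals separately is what lets the simulated groups keep their own variance and shape while sharing a mean of zero.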

  10. TcCB: H0: cleanup mean = reference mean
      • t = 1.45
      • Bootstrapped ‘t’ values do not follow a t distribution!
      • P = 0.02
