
Bootstrapping




  1. Bootstrapping (And other statistical trickery)

  2. Reminder Of What We Do In Statistics • Null Hypothesis Statistical Test Logic • Assume that the “no effect” case is true, then ask whether our data are probable given that case. • If we fail to reject the null hypothesis: our data would not be improbable if the null hypothesis were true • If we reject: our data would be improbable if the null hypothesis were true

  3. Hypothesis Tests • The Null Hypothesis: • This is the hypothesis that we are looking to disprove • Usually, that there is “No Difference” • i.e., my sample is the same as the population (as in the Z test) • In statistics, the null hypothesis takes the form of the distribution of results that we would expect by chance [Figure: a normal curve, with “More Likely Outcomes” near the center and “Less Likely Outcomes” in the tails]

  4. Hypothesis Tests • Remember, we have to take the upside-down logic of how we would normally think about these things. • We ask: if the null hypothesis were true, would my sample be probable? [Figure: the same normal curve of more and less likely outcomes]

  5. To Make it Work • We have to make assumptions about the population from which we selected our data. • These usually take the form of parametric assumptions. • In a t-test: we assume that the null population is normal • In multiple regression: we assume that the errors are normal • In Poisson regression: we assume that the DV is Poisson

  6. T Test (Independent Samples) • Usually, the formula looks like this: t = (x̄₁ − x̄₂) / (sₚ √(1/n₁ + 1/n₂)), where the pooled variance is sₚ² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2)
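
As a quick check on that formula, here is a minimal Python sketch (assuming NumPy and SciPy are available; the two groups are simulated purely for illustration) that computes the pooled-variance t statistic by hand and compares it against scipy.stats.ttest_ind:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group1 = rng.normal(loc=0.0, scale=1.0, size=30)  # simulated sample 1
group2 = rng.normal(loc=0.5, scale=1.0, size=30)  # simulated sample 2

n1, n2 = len(group1), len(group2)
# Pooled variance (assumes equal population variances)
sp2 = ((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (group1.mean() - group2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Cross-check against SciPy's pooled-variance t test
t_scipy, p_value = stats.ttest_ind(group1, group2, equal_var=True)
print(t_manual, t_scipy, p_value)  # the two t values should match
```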

  7. The Problem • We are always making assumptions that we then bend. • In multiple regression: errors are rarely exactly normal • In Poisson regression: the mean rarely equals the variance • Many statistical procedures assume multivariate normality • In path analysis: there are situations where, even if the data were perfectly normal, the errors follow strange bimodal distributions

  8. Example • Skewed Distributions violate ALL typical parametric assumptions

  9. Early Solutions • The Monte Carlo Simulation: • Use the mean, variance, and covariance of your data to define a truly normal distribution • Sample repeatedly from these idealized distributions • Run your analyses on the simulated data • Your CIs are the middle 95% of the distribution of parameter estimates (see the sketch below)
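
A minimal sketch of this Monte Carlo procedure in Python, assuming NumPy and using a simple regression slope as the parameter of interest (the bivariate data here are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical observed data: two correlated variables
data = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=100)

# Step 1: use the data's mean and covariance to define an idealized normal
mu = data.mean(axis=0)
cov = np.cov(data, rowvar=False)

# Steps 2-3: repeatedly sample idealized data and recompute the statistic
n_sims = 1000
slopes = np.empty(n_sims)
for i in range(n_sims):
    sim = rng.multivariate_normal(mu, cov, size=len(data))
    x, y = sim[:, 0], sim[:, 1]
    slopes[i] = np.polyfit(x, y, deg=1)[0]  # simple regression slope

# Step 4: the CI is the middle 95% of the simulated parameter distribution
ci_low, ci_high = np.percentile(slopes, [2.5, 97.5])
print(ci_low, ci_high)
```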

  10. Nate Silver Example • Makes his best prediction of a candidate’s share of the vote (say 42%) • Applies a standard error to that guess (maybe he thinks this is ±5% with 95% confidence)

  11. 3. Creates a distribution of possible outcomes for this candidate [Figure: a bell curve centered at 42%, spanning roughly 37% to 47%]

  12. 4. Does this for each candidate in the nation [Figure: similar curves centered at 42%, 51%, 46%, 67%, 31%, and 62%]

  13. 5. Samples randomly from each of those distributions (each draw representing a win or loss for that candidate), then determines who won the House or Senate and by how many seats • Does this 1000 times and ends up with a distribution of possible seat totals (a toy version is sketched below)
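
This is not Silver's actual model, just a toy sketch of the same idea in Python, reusing the six vote shares from the slide (the standard errors are made up; a race counts as won above 50%):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical forecasts: predicted vote share per race (from the slide)
predicted_share = np.array([0.42, 0.51, 0.46, 0.67, 0.31, 0.62])
# Assumed uncertainty: +-5% at 95% confidence is roughly 1.96 standard errors
std_error = np.full_like(predicted_share, 0.05 / 1.96)

n_sims = 1000
seats_won = np.empty(n_sims, dtype=int)
for i in range(n_sims):
    # Draw one simulated outcome per race from its distribution
    shares = rng.normal(predicted_share, std_error)
    seats_won[i] = (shares > 0.5).sum()  # a race is "won" above 50%

# Distribution of total seats across the 1000 simulated elections
values, counts = np.unique(seats_won, return_counts=True)
for v, c in zip(values, counts):
    print(f"{v} seats: {c / n_sims:.1%} of simulations")
```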

  14. Problems • This method assumes that the original data are really multivariate normal, and that the obtained data are just a messy approximation of this. • It only solves situations where the standard errors do not follow a known distribution (but the data, in theory, do)

  15. The Jackknife • This is a good solution if your sample has outliers that exert undue influence on your results. • Recalculate the estimate after leaving out one (or more) cases from the dataset • Repeat many times • The new parameter estimate is the mean of all of the obtained estimates (usually B’s) • The standard error comes from the spread of the distribution of B’s (see the sketch below)
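
A minimal delete-one jackknife sketch in Python, assuming NumPy; the data, including a planted outlier, are simulated for illustration. Note that the conventional jackknife standard error scales the sum of squared deviations of the leave-one-out estimates by (n − 1)/n:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: y depends linearly on x, with one influential outlier
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(scale=0.5, size=50)
y[0] += 10  # planted outlier

n = len(x)
slopes = np.empty(n)
for i in range(n):
    keep = np.delete(np.arange(n), i)  # leave out case i
    slopes[i] = np.polyfit(x[keep], y[keep], deg=1)[0]

# Jackknife estimate: mean of the leave-one-out slopes;
# conventional SE: sum of squared deviations scaled by (n - 1) / n
slope_jack = slopes.mean()
se_jack = np.sqrt((n - 1) / n * ((slopes - slope_jack) ** 2).sum())
print(slope_jack, se_jack)
```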

  16. Bootstrap • This is generally agreed to be the best of the sampling-based solutions • The idea is incredibly simple (usually far easier than actually computing standard errors) • Computationally intensive (by 1980s standards). With modern computing power you barely notice the added time.

  17. Bootstrap Procedure • Sample cases from your dataset randomly with replacement to obtain a new sample (with duplicates) that matches the N of your original • Calculate parameter estimates (don’t worry about standard errors) • Repeat steps 1 and 2 200–1000 times • Every parameter will now have 200–1000 estimates • The mean of this distribution is your main parameter estimate • The middle 95% of this distribution is your 95% CI for the parameter (see the sketch below)
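
A minimal sketch of this procedure in Python, assuming NumPy and using the mean of a skewed (exponential) sample as the parameter, the kind of data that breaks parametric assumptions; the data are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical skewed data
data = rng.exponential(scale=2.0, size=80)

n_boot = 1000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    # Step 1: resample with replacement, matching the original N
    resample = rng.choice(data, size=len(data), replace=True)
    # Step 2: compute the parameter estimate on the resample
    boot_means[i] = resample.mean()

# Mean of the bootstrap distribution, and the percentile 95% CI
estimate = boot_means.mean()
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(estimate, ci_low, ci_high)
```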

  18. [Figure from Hesterberg, Moore, Monaghan, Clipson, and Epstein]

  19. Advantages • Allows for non-symmetric, non-parametric distributions for variables and parameters • You don’t even need to know what your distribution is

  20. Disadvantages • You are assuming that your sample accurately reflects the distribution of the population from which it was drawn. • This will be true on average, but individual samples can deviate substantially from the population distribution • Be careful using this with small samples (my guideline is N less than 50)
