A Short Look at Several Esoteric Methods • Bootstrap • Computational Models • Nonlinear Systems • Structural Equation Models (LISREL, AMOS) • Hierarchical Linear Modeling • There are others!
The Bootstrap • Bootstrapping is a relatively new statistical technique (Bradley Efron, 1979) • It asks the researcher a simple question: • Which do you have more confidence in: • The assumption that the Probability Density Function of the statistic is normally distributed, or • Your Data
How to Bootstrap • Bootstrapping is a technique that substitutes computational power for mathematical rigor. • When we bootstrap a statistic, we treat the sample as if it were the population and draw a random sample of the same size, with replacement, from it. • We then compute the statistic (or run the statistical test) on that resample and record the result. • We repeat the process many times, say 1000.
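The resampling loop above can be sketched in a few lines of Python. This is a minimal illustration under assumed choices, not the author's procedure. The helper name `bootstrap_replicates`, the seed, the sample values, and the use of the mean as the statistic are all hypothetical.

```python
import random
import statistics
from typing import Callable, Sequence


def bootstrap_replicates(
    data: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    reps: int = 1000,
    seed: int = 1,
) -> list[float]:
    """Return `reps` bootstrap estimates of `statistic` computed on `data`."""
    rng = random.Random(seed)  # seeded so the results are reproducible
    n = len(data)
    estimates = []
    for _ in range(reps):
        # Treat the sample as the population: draw n values with replacement.
        resample = rng.choices(data, k=n)
        estimates.append(statistic(resample))
    return estimates


# Hypothetical sample; any list of numbers works.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 7.5, 3.9, 4.4]
boot_means = bootstrap_replicates(sample, statistics.mean, reps=1000)
print(len(boot_means))                # 1000
print(statistics.stdev(boot_means))   # bootstrap standard error of the mean
```

Each entry of `boot_means` is one bootstrap estimate. Their standard deviation is the bootstrap estimate of the statistic's standard error.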
The Central Limit Theorem • By resampling, we are making the following claim: • My data set gives me a better estimate of the sampling distribution of the statistic in question than the assumption of normality does. • In large samples, the Central Limit Theorem says that the sampling distribution of the sample mean will approximate a normal distribution. • How large is large? • Statisticians often use n > 30 • That seems small to me.
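A quick simulation shows why n > 30 can be too small. The sketch below assumes an exponential (right-skewed) population, n = 30, and 20,000 simulated samples; all three are arbitrary illustrative choices. For this population the skewness of the sample mean is exactly 2/√n, about 0.37 at n = 30. A normal distribution has skewness 0, so at this "large" n the sampling distribution is still visibly lopsided.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed (population) SD."""
    m = statistics.fmean(values)
    sd = math.sqrt(statistics.fmean((v - m) ** 2 for v in values))
    return statistics.fmean((v - m) ** 3 for v in values) / sd ** 3


rng = random.Random(42)
n = 30            # the common "large enough" cutoff
sims = 20_000     # number of simulated samples (an arbitrary choice)

# Sampling distribution of the mean of n draws from a right-skewed (exponential) population.
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(sims)]

simulated = skewness(means)
theoretical = 2 / math.sqrt(n)   # exact skewness of the mean of n exponentials
print(round(simulated, 2), round(theoretical, 2))
```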
When to Bootstrap • When you have reason to doubt that the sampling distribution of your statistic is normal. • When there is no theoretical sampling distribution • e.g., the difference between two sample medians • When your statistic is inherently biased • e.g., the ratio of two sample means
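The difference between two sample medians is a statistic with no theoretical sampling distribution. It can be bootstrapped by resampling each group separately, with replacement, and recomputing the difference each time. The function name and the two groups below are hypothetical. Resampling the groups independently assumes they are independent samples.

```python
import random
import statistics


def bootstrap_median_difference(group_a, group_b, reps=1000, seed=7):
    """Bootstrap estimates of median(A) - median(B), resampling each group independently."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.median(resample_a) - statistics.median(resample_b))
    return diffs


group_a = [12, 15, 14, 10, 18, 21, 13, 16, 40, 11]  # hypothetical data
group_b = [9, 11, 8, 12, 10, 14, 7, 9, 13, 30]
observed = statistics.median(group_a) - statistics.median(group_b)
diffs = bootstrap_median_difference(group_a, group_b)
print(observed)  # 4.0
```

The spread of `diffs` estimates the sampling variability of the observed difference. The interval methods on the following slides turn that spread into a CI.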
The Standard Probability Density Function • The Central Limit Theorem says that, given an acceptably large sample, the probability density function of the statistic (its sampling distribution) will be approximately normal. • Thus we use our alpha level to determine where on that normal distribution the lower and upper confidence limits lie.
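For the sample mean, the normal-theory interval is x̄ ± z(1−α/2)·s/√n. The sketch below is minimal Python that uses the standard library's `statistics.NormalDist` for the critical value. Using z rather than a t critical value is an assumption that mirrors the large-sample CLT argument, and the data are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def normal_theory_ci(data, alpha=0.05):
    """CI for the mean, assuming its sampling distribution is normal (CLT)."""
    n = len(data)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    se = stdev(data) / sqrt(n)                # estimated standard error of the mean
    center = mean(data)
    return center - z * se, center + z * se


lower, upper = normal_theory_ci([4.0, 6.0, 5.0, 7.0, 3.0, 5.0])
print(lower, upper)
```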
The Empirical Density Function • Bootstrapping says that we don’t have to assume anything about the shape of the sampling distribution of the statistic. • Rather, we use the sample data to estimate the shape of the sampling distribution empirically. • Based on, appropriately enough, the sample!
So….. • So what do you trust… • a general assumption that we universally apply without real inspection? • or your data? • Myself, I go for the data… • Disclaimer: I do have a vested interest in this opinion. ($$$$)
How to Bootstrap • Obtain a sample of, say, n observations • Using the sample, draw a ‘resample’ of n observations from the n original values • (with replacement) • This means some values may appear more than once, • and other values may not be selected at all • Calculate the statistic you are interested in on the resample • Repeat a large number of times • I recommend 1000.
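The point about repeated and unselected values can be checked directly. The sketch below draws a single resample by index and counts how many original observations never appear. The helper name `resample_once`, the seed, and the data are hypothetical. On average the share of observations left out is (1 − 1/n)^n, which approaches 1/e ≈ 0.368 as n grows. So roughly a third of the original values are missing from any given resample.

```python
import math
import random


def resample_once(data, rng):
    """Draw n observations from the n originals with replacement; return indices and values."""
    indices = [rng.randrange(len(data)) for _ in range(len(data))]
    return indices, [data[i] for i in indices]


rng = random.Random(3)
data = list(range(100, 200))   # 100 hypothetical observations
indices, resample = resample_once(data, rng)

counts = {i: indices.count(i) for i in set(indices)}
omitted = len(data) - len(counts)                     # never selected
repeated = sum(1 for c in counts.values() if c > 1)   # selected more than once
print(len(resample), omitted, repeated)

# Expected share omitted from any one resample: (1 - 1/n)^n, which tends to 1/e.
n = len(data)
print(round((1 - 1 / n) ** n, 3), round(math.exp(-1), 3))
```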
Bootstrap Results • Using the bootstrapped estimates, construct a confidence interval (CI) around the statistic (θ) you are interested in. • We use confidence intervals rather than ‘test statistics’. • So if 0.0 (or some other value) does not fall within our 95% CI, we can conclude, at the 5% level, that the true θ differs from 0.0 (or that other value). • e.g., if we bootstrap a regression model and calculate a CI for each B coefficient, we can conclude that X is significant if 0.0 does not fall inside its 95% CI.
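For the regression example, this sketch assumes case resampling, one common scheme: draw whole (x, y) rows with replacement, refit the model, and record the slope B. The Python sketch uses a single predictor, 1000 resamples, a 95% percentile interval, and made-up data. The helper names are hypothetical.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slope_ci(xs, ys, reps=1000, seed=11):
    """95% percentile CI for the slope, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    slopes = []
    while len(slopes) < reps:
        bx, by = zip(*rng.choices(pairs, k=len(pairs)))
        if len(set(bx)) < 2:      # all x identical: slope undefined, so redraw
            continue
        slopes.append(ols_slope(bx, by))
    cuts = statistics.quantiles(slopes, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


# Hypothetical data: y rises with x, plus noise.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [2.3, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 8.7, 10.2, 11.1, 11.8, 13.2]
low, high = bootstrap_slope_ci(x, y)
print(low, high, "significant" if not (low <= 0.0 <= high) else "not significant")
```

For these data, 0.0 lies well outside the interval, so X would be judged significant at the 5% level.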
Types of Confidence Intervals • There are several CIs of interest • Normal approximation • Percentile • Percentile-t • Bias Corrected • Bias Corrected and Accelerated (BCa)
Normal Approximation • Assumes that the sampling distribution of the statistic is normal • Uses the standard deviation of the bootstrap estimates as the standard error
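In formula form, the normal-approximation interval is θ̂ ± z(1−α/2)·SE_boot. Here SE_boot is the standard deviation of the bootstrap estimates. The minimal Python sketch below uses the median as an assumed example statistic and hypothetical data.

```python
import random
import statistics
from statistics import NormalDist


def normal_approx_ci(data, statistic, reps=1000, alpha=0.05, seed=5):
    """theta_hat +/- z * (SD of the bootstrap estimates)."""
    rng = random.Random(seed)
    theta_hat = statistic(data)
    boots = [statistic(rng.choices(data, k=len(data))) for _ in range(reps)]
    se_boot = statistics.stdev(boots)          # bootstrap standard error
    z = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    return theta_hat - z * se_boot, theta_hat + z * se_boot


data = [3.1, 4.7, 2.2, 5.9, 4.1, 3.8, 6.3, 2.9, 4.4, 5.2, 3.5, 4.9]  # hypothetical
low, high = normal_approx_ci(data, statistics.median)
print(low, high)
```

The interval is symmetric around θ̂ by construction, which is exactly the normality assumption this method makes.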
When to Use the Normal Approximation CI • OK: if the EDF can be assumed to be normal, why bootstrap at all? • How about when there is no known sampling distribution of the statistic? • But why would you assume it is normal if you can’t derive it? • OK… how about don’t bother
Percentile • Uses the actual 2.5th and 97.5th percentiles of the empirical sampling distribution (for a 95% CI) • May perform poorly with small samples • Also assumes that the EDF is unbiased
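Here is a minimal Python sketch of the percentile interval: sort the bootstrap estimates and read off the points that cut α/2 from each tail. The order-statistic rule used for the cut points is one of several reasonable conventions. It is an assumption here, not necessarily what Stata does, and the data are hypothetical.

```python
import random
import statistics


def percentile_ci(data, statistic, reps=1000, alpha=0.05, seed=9):
    """Percentile CI: the alpha/2 and 1 - alpha/2 points of the sorted bootstrap estimates."""
    rng = random.Random(seed)
    boots = sorted(statistic(rng.choices(data, k=len(data))) for _ in range(reps))
    k = round(alpha / 2 * reps)          # 25 when reps = 1000 and alpha = 0.05
    return boots[k], boots[reps - 1 - k]


data = [3.1, 4.7, 2.2, 5.9, 4.1, 3.8, 6.3, 2.9, 4.4, 5.2, 3.5, 4.9]  # hypothetical
low, high = percentile_ci(data, statistics.fmean)
print(low, high)
```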
Bias Corrected • Uses the cumulative normal distribution to shift the percentile endpoints, correcting for bias in the EDF (measured by the proportion of bootstrap estimates that fall below the original estimate)
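One standard form of the bias correction is Efron's BC interval. It computes z0 = Φ⁻¹(proportion of bootstrap estimates below θ̂). It then takes the Φ(2·z0 + z(α/2)) and Φ(2·z0 + z(1−α/2)) percentiles of the bootstrap estimates instead of the α/2 and 1−α/2 percentiles. The Python sketch below applies it to the ratio of two means from the earlier slide. The paired data, the clamping of the proportion, and the helper names are assumptions. The BCa interval adds an acceleration term that this sketch omits.

```python
import random
import statistics
from statistics import NormalDist


def ratio_of_means(pairs):
    """Statistic of interest: mean(y) / mean(x) over (x, y) pairs (biased in small samples)."""
    return statistics.fmean(y for _, y in pairs) / statistics.fmean(x for x, _ in pairs)


def bias_corrected_ci(pairs, reps=1000, alpha=0.05, seed=13):
    """Bias-corrected (BC) percentile CI; the BCa acceleration term is not included."""
    rng = random.Random(seed)
    nd = NormalDist()
    theta_hat = ratio_of_means(pairs)
    boots = sorted(ratio_of_means(rng.choices(pairs, k=len(pairs))) for _ in range(reps))

    # z0 is 0 when exactly half of the bootstrap estimates fall below theta_hat.
    share_below = sum(b < theta_hat for b in boots) / reps
    share_below = min(max(share_below, 0.5 / reps), 1 - 0.5 / reps)  # keep inv_cdf finite
    z0 = nd.inv_cdf(share_below)

    # Shifted percentile levels; with z0 = 0 they are alpha/2 and 1 - alpha/2.
    a1 = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    a2 = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
    lo_idx = min(max(round(a1 * reps), 0), reps - 1)
    hi_idx = min(max(round(a2 * reps) - 1, 0), reps - 1)
    return theta_hat, z0, boots[lo_idx], boots[hi_idx]


x = [5, 7, 6, 9, 4, 8, 10, 6, 7, 5]            # hypothetical denominators
y = [12, 15, 11, 20, 9, 18, 23, 13, 16, 10]    # hypothetical numerators
theta_hat, z0, low, high = bias_corrected_ci(list(zip(x, y)))
print(theta_hat, z0, low, high)
```

When z0 is 0, the shifted levels reduce to the plain percentile interval.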
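Percentile-t • Standardizes each bootstrap estimate by its own standard error, adjusting each according to our confidence in it. • Requires a double bootstrap (a bootstrap within each resample to get that standard error) when the statistic has no standard-error formula!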
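The percentile-t (bootstrap-t) interval standardizes each estimate as t* = (θ̂* − θ̂)/SE*. It then uses the t* percentiles in place of normal quantiles: [θ̂ − t*(1−α/2)·SE, θ̂ − t*(α/2)·SE]. For the mean, SE* has a formula (s/√n), so the Python sketch below avoids the double bootstrap. For a statistic without such a formula, each SE* would come from an inner bootstrap. The data and helper names are hypothetical.

```python
import math
import random
import statistics


def mean_and_se(values):
    """Sample mean and its formula-based standard error s / sqrt(n)."""
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(len(values))


def percentile_t_ci_mean(data, reps=1000, alpha=0.05, seed=17):
    """Percentile-t (bootstrap-t) CI for the mean."""
    rng = random.Random(seed)
    theta_hat, se_hat = mean_and_se(data)
    t_stats = []
    while len(t_stats) < reps:
        est, se = mean_and_se(rng.choices(data, k=len(data)))
        if se == 0:              # resample of identical values: t undefined, so redraw
            continue
        t_stats.append((est - theta_hat) / se)  # standardize by this resample's own SE
    t_stats.sort()
    k = round(alpha / 2 * reps)
    t_low, t_high = t_stats[k], t_stats[reps - 1 - k]
    # Note the reversal: the upper t quantile sets the lower endpoint.
    return theta_hat - t_high * se_hat, theta_hat - t_low * se_hat


data = [1.2, 0.8, 3.5, 2.1, 0.4, 6.8, 1.9, 2.7, 0.9, 4.3, 1.5, 10.2]  # hypothetical, skewed
low, high = percentile_t_ci_mean(data)
print(low, high)
```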
Stata Commands for Bootstrap • regress murder06 unemrate05 povrate06 • regress murdernodc unemrate05 povrate06 • predict cook, c • predict lever, l • predict rst, rstu • list state murder06 cook lever rst • regress murder06 unemrate05 povrate06, vce(bootstrap) • regress murder06 unemrate05 povrate06, vce(bootstrap, reps(1000)) • bootstrap, bca reps(1000): regress murder06 unemrate05 povrate06 • estat bootstrap, bca