Learn how to calculate confidence intervals to measure the certainty in estimating unknown parameters like lead concentration or gas consumption using regression analysis. Discover how to construct confidence intervals and interpret their significance.
Confidence Intervals • Underlying model has an unknown parameter. • We know how to calculate point estimates. • E.g. regression analysis. • But different data would change our estimates. • So, we treat our estimates as random variables. • We want a measure of how confident we are in our estimate. • Calculate a “Confidence Interval”.
What is it? • If we know how the data were sampled, • we can construct a Confidence Interval for an unknown parameter, θ. • A 95% C.I. gives a range such that the true θ is in the interval 95% of the time. • A 100(1-α)% C.I. captures the true θ a fraction (1-α) of the time. • The smaller α is, the more sure we are that the true θ falls in the interval, but the wider the interval.
Example 1: Lead in Water • Lead in drinking water causes serious health problems. • To test for contamination, we require a control site. • Problem: • What is the lead concentration at the control site? • Estimate a 95% confidence interval.
Example 2: Gas Market • Recall the U.S. gas market question: • By how much does gas consumption decrease when price increases? • Our linear model: gas consumption = β0 + β1·PG + β2·Y + β3·PNC + β4·PUC + ε • Estimate of β1: -.04237. • How confident are we in this estimate? • Construct a 90% C.I. for this estimate.
If Data ~ N(μ, σ²) • Since we don’t know σ, use the t-distribution. • 95% C.I. for μ: $\bar{x} \pm t_{97.5}\, s$ • $s$ is the standard error of the mean. • $t_{97.5}$ is the critical value of the t-distribution. • Draw on board (Prob = 2.5% in each tail)
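The claim that a 95% interval captures the true mean about 95% of the time can be checked by simulation. The sketch below is only an illustration under stated assumptions: it draws Normal samples of size 8 from a made-up population (mean 50, standard deviation 15, both hypothetical), builds the interval mean ± t × (standard error of the mean) with the critical value 2.365 for 7 degrees of freedom, and counts how often the interval contains the true mean. The function names are invented for this example.

```python
import random
import statistics


def t_interval(sample, t_crit):
    """Return (lower, upper) for mean +/- t_crit * standard error of the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / n ** 0.5  # s / sqrt(n)
    return mean - t_crit * std_err, mean + t_crit * std_err


def coverage(true_mean, true_sd, n, t_crit, trials, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        lower, upper = t_interval(sample, t_crit)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / trials


# Hypothetical population; 2.365 is the 97.5% t critical value for 7 d.f.
rate = coverage(true_mean=50.0, true_sd=15.0, n=8, t_crit=2.365, trials=4000)
print(f"fraction of intervals containing the true mean: {rate:.3f}")
```

The printed fraction should land close to 0.95; it is a property of the procedure over repeated samples, not of any single interval.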
t-distribution • Similar to the Normal distribution. • Requires “degrees of freedom” (d.f.). • d.f. = (# data points) - (# estimated parameters). • E.g. mean of lead concentration, 8 samples, one parameter: d.f. = 7. • The higher the d.f., the closer t is to the Normal distribution.
If Distribution Unknown • Can use “Bootstrapping”. • Draw a large sample from the data with replacement. • Calculate its mean. • Repeat many times. • Draw a histogram of the sample means. • Calculate the empirical 95% C.I. • Requires no previous knowledge of the underlying process.
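The bootstrap steps above can be sketched as follows. This is a minimal illustration, not the lecture's own code: the eight measurements are made-up placeholder values (the original lead data are not given), each resample has the same size as the data (a common choice, assumed here), and the interval is read off the percentiles of the resampled means. The name `bootstrap_ci` is hypothetical.

```python
import random
import statistics


def bootstrap_ci(data, level=0.95, n_resamples=10_000, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        statistics.mean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - level
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Placeholder values for illustration only; NOT the lecture's measurements.
measurements = [38.2, 44.9, 61.3, 47.5, 70.1, 52.8, 35.6, 60.7]
lower, upper = bootstrap_ci(measurements)
print(f"bootstrap 95% C.I. for the mean: [{lower:.1f}, {upper:.1f}]")
```

Sorting the resampled means and taking the 2.5th and 97.5th percentiles is the numerical counterpart of reading the interval off the histogram.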
Lead Concentration • 8 lead measurements: • Mean = 51.39, s = 5.75, t97.5 = 2.365 (d.f. = 7) • Lower = 51.39 - (5.75)(2.365) • Upper = 51.39 + (5.75)(2.365) • C.I. = [37.8, 65.0] • Using bootstrapped samples: • C.I. = [40.8, 62.08]
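The arithmetic on this slide can be reproduced in a few lines. The helper name `t_confidence_interval` is made up for this sketch; the inputs are the slide's mean, standard error of the mean, and t critical value.

```python
def t_confidence_interval(estimate, std_err, t_crit):
    """Return (lower, upper) = estimate -/+ t_crit * std_err."""
    half_width = t_crit * std_err
    return estimate - half_width, estimate + half_width


# Values from the slide: mean 51.39, standard error 5.75, t97.5 = 2.365 (7 d.f.)
lower, upper = t_confidence_interval(51.39, 5.75, 2.365)
print(f"[{lower:.1f}, {upper:.1f}]")  # prints [37.8, 65.0]
```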
Gas Regression: S-Plus

    Coefficients:
                      Value  Std. Error     t value    Pr(>|t|)
    (Intercept)  -0.0898134   0.0507787  -1.7687217   0.0867802
             PG  -0.0423712   0.0098406  -4.3057672   0.0001551
              Y   0.0001587   0.0000068  23.4188561   0.0000000
            PNC  -0.1013809   0.0617077  -1.6429209   0.1105058
            PUC  -0.0432496   0.0241442  -1.7913093   0.0830122

    Residual standard error: 0.02680668 on 31 degrees of freedom
    Multiple R-Squared: 0.9678838
    F-statistic: 233.5615 on 4 and 31 degrees of freedom, the p-value is 0
Gas Price Response • β1 = -.04237, s = .00984 • 90% C.I.: t95 = 1.695 (d.f. = 36 - 5 = 31, matching the regression output) • C.I. = [-.0591, -.0257] • Using bootstrapped samples: • C.I. = [-.063, -.026] • Response is probably between 2.5 gallons and 6 gallons.
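Written out step by step, the 90% interval for the gas price coefficient follows the same estimate ± t × standard error pattern as the lead example. The worked equation below uses only the rounded numbers from the slide; the hat notation $\hat{\beta}_1$ for the estimate is added here for clarity.

```latex
\hat{\beta}_1 \pm t_{95}\, s
  = -0.04237 \pm (1.695)(0.00984)
  = -0.04237 \pm 0.01668
\quad\Longrightarrow\quad
\text{C.I.} \approx [-0.0591,\; -0.0257]
```

Because the whole interval lies below zero, the data are consistent with a negative price response at the 90% level.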
Interpretation & Other Facts • We are 95% confident that the true average lead concentration lies in this range: intervals built this way capture the true value in 95% of repeated samples. • Likewise, we are 90% confident that the true value of β1 lies in its range. • We can also calculate a “confidence region” for 2 or more variables.