
Presentation Transcript


  1. In this chapter we’ll learn about ‘confidence intervals.’ • A confidence interval is a range that captures the ‘true value’ of a statistic with a specified probability (i.e. ‘confidence’). • Let’s figure out what this means.

  2. To do so we need to continue exploring the principles of statistical inference: using samples to make estimates about a population. • See, e.g., King et al., Designing Social Inquiry, on the topic of inference.

  3. Remember that fundamental to statistical inference are probability principles that allow us to answer the question: what would happen if we repeated this random sample many times, independently and under the same conditions?

  4. According to the laws of probability, each independent, random sample of size-n from the same population yields the following: true value +/- random error

  5. The procedure, to repeat, must be a random sample or a randomized experiment (or, at very least, independent observations from a population) in order for probability to operate. • If not, the use of statistical inference is invalid.

  6. Remember also that sample means are unbiased estimates of the population mean; & that the standard deviation of sample means can be made narrower by (substantially) increasing the size of random samples-n. • Further: remember that means are less variable & more normally distributed than individual observations.

  7. If the underlying population distribution is normal, then the sampling distribution of the mean will also be normal. • There’s also the Law of Large Numbers.

  8. And last but perhaps most important, there’s the Central Limit Theorem: given a simple random sample from a population with any distribution of x, when n is large the sampling distribution of sample means is approximately normal.
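The Central Limit Theorem can be seen in a small simulation. This Python sketch (illustrative only; the chapter’s own examples use Stata) draws many random samples from a heavily skewed population and shows that the sample means still cluster normally around the population mean:

```python
import random
import statistics

random.seed(42)

# Population: a right-skewed exponential distribution with mean 1.0 and
# standard deviation 1.0. Draw many independent random samples of size n
# and record each sample mean.
n = 100
num_samples = 10_000
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

# Despite the skewed population, the sample means center on the population
# mean (1.0) with spread close to sigma / sqrt(n) = 1 / 10 = 0.1.
print(statistics.fmean(sample_means))   # close to 1.0
print(statistics.stdev(sample_means))   # close to 0.1
```

A histogram of `sample_means` would look approximately normal even though a histogram of the raw exponential draws would not.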

  9. That is, in large samples averages are approximately normally distributed.

  10. The Central Limit Theorem allows us to use normal probability calculations to answer questions about sample means from many observations even when the population distribution is not normal. • Of course, the sample size must be large enough to do so.

  11. N=30 is a common benchmark threshold for the Central Limit Theorem, but N=100 or more may be required, depending on the variability of the distribution. • Greater N is required with greater variability in the variable of interest (as well as to have sufficient observations to conduct hypothesis tests).

  12. The Point of Departure for Inferential Statistics Here, now, is the most basic problem in inferential statistics: you’ve drawn a random sample & estimated a sample mean. • How reliable is this estimate? After all, repeated random samples of the same sample size-n in the same population would be unlikely to give the same sample mean.

  13. How do you know, then, where the sample mean obtained would be located in the variable’s sampling distribution: i.e. on its histogram displaying the sample means for all possible random samples of the same size-n in the same population?

  14. Can’t we simply rely on the fact that the sample mean is an unbiased estimator of the population mean?

  15. No, we can’t: that only says that the sample mean of a random sample has no systematic tendency to undershoot or overshoot the population mean. • We still don’t know if, e.g., the sample mean we obtained is at the very low end or the very high end of the histogram of the sampling distribution, or is located somewhere around the center.

  16. In other words, a sample estimate without an indication of variability is of little value. • In fact, what’s the worst thing about a sample of just one observation?

  17. Answer • A sample of one observation doesn’t allow us to estimate the variability of the sample mean over repeated random samples of the same size in the same population. See Freedman et al., Statistics.

  18. To repeat, a sample estimate without an indication of variability is of little value. • What must we do?

  19. Introduction to Confidence Intervals • The solution has to do with a sample mean’s standard deviation, divided by the square root of the sample size-n. • Thus we compute the sample mean’s standard deviation & divide it by the square root of the sample size-n: this is called the standard error (see Moore/McCabe/Craig Chapter 7).
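The standard-error computation described above is a one-liner. A Python sketch (the scores below are made-up illustrative data, not from the chapter):

```python
import math
import statistics

# Hypothetical sample of test scores (illustrative data only).
scores = [52, 61, 48, 55, 70, 58, 49, 63, 57, 54]

n = len(scores)
sample_mean = statistics.fmean(scores)
sample_sd = statistics.stdev(scores)        # sample standard deviation (n - 1)
standard_error = sample_sd / math.sqrt(n)   # SE = s / sqrt(n)

print(sample_mean, standard_error)
```

Note that the standard error shrinks with the square root of n: quadrupling the sample size only halves it.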

  20. What does the result allow us to do? • It allows us to situate the sample mean’s variability within the sampling distribution of the sample mean: the distribution of sample means for all possible random samples of the same size from the same population. • It is the standard deviation of the sampling distribution of the sample mean (i.e. of the sample mean over repeated independent random samples of the same size & in the same population).

  21. And it allows us to situate the sample mean’s variability in terms of the 68 – 95 – 99.7 Rule.

  22. The probability is 68% that x-mean lies within +/- one standard deviation (of the sampling distribution, i.e. one standard error) of the population mean (i.e. the true value); 95% that x-mean lies within +/- two standard deviations of the population mean; & 99.7% that x-mean lies within +/- three standard deviations of the population mean.
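The 68 – 95 – 99.7 Rule can be verified directly from the standard normal distribution. A Python check (using the standard library, not part of the chapter):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability that a normal variable falls within k standard deviations
# of its mean: area between -k and +k. Close to 68%, 95%, 99.7%.
for k in (1, 2, 3):
    prob = Z.cdf(k) - Z.cdf(-k)
    print(k, prob)
```

The exact value for k = 2 is about 95.45%, which is why the benchmark z* for 95% is 1.96 rather than exactly 2.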

  23. A common practice in statistics is to use the benchmark of +/- two standard deviations: i.e. a range likely to capture 95% of sample means obtained by repeated random samples of the same size-n in the same population.

  24. We can therefore conclude: we’re 95% certain that this sample mean falls within +/- two standard deviations of the population mean—i.e. of the true population value.

  25. Unfortunately, it also means that we still have room for worry: 5% of such samples will not obtain a sample mean within this range—i.e. will not capture the true population value.
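That long-run 95%-hit / 5%-miss behavior can be simulated. A Python sketch (illustrative; population parameters are made up) that builds a 95% interval from each of many samples and counts how often it captures the true mean:

```python
import math
import random
import statistics

random.seed(1)

# Known population: normal with mean 50 and standard deviation 10.
mu, sigma, n = 50, 10, 40
num_samples = 2_000

covered = 0
for _ in range(num_samples):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    low, high = mean - 1.96 * se, mean + 1.96 * se
    if low <= mu <= high:       # did this interval capture the true value?
        covered += 1

# Roughly 95% of the intervals capture the true mean; the rest are "lemons".
print(covered / num_samples)
```

Each individual interval either captures mu or it doesn’t; only the long-run proportion is (approximately) 95%.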

  26. The interval either captures the parameter (i.e. population mean) or it doesn’t. • What’s worse: we never know whether the confidence interval captures the parameter or not.

  27. As Freedman et al. put it, a 95% confidence interval is “like buying a used car. About 5% turn out to be lemons.” • Recall that conclusions are always uncertain.

  28. In any event, we’ve used our understanding of how the laws of probability work in the long run—with repeated random samples of size-n from the same population—to express a specified degree of confidence in the results of this one sample.

  29. That is, the language of statistical inference uses the fact about what would happen in the long run to express our confidence in the results of any one random sample of independent observations.

  30. If things are done right, this is how we interpret a 95% confidence interval: “This number was calculated by a method that captures the true population value in 95% of all possible samples.” • Again, it’s a range that captures the ‘true value’ of a statistic with a specified probability (i.e. confidence).

  31. To repeat: the confidence interval either captures the parameter (i.e. the true population value) or it doesn’t—there’s no in between.

  32. Warning! • A confidence interval addresses sampling error, but not non-sampling error. • What are the sources of non-sampling error?

  33. Standard deviation vs. Standard error • Standard deviation: average deviation from the mean for a set of numbers. • Standard error: estimated average variation from the expected value of the sample mean for repeated, independent random samples of the same size & from the same population.

  34. More on Confidence Intervals • Confidence intervals take the following form: • Sample estimate +/- margin of error • Margin of error: how accurate we believe our estimate is, based on the variability of the sample mean in repeated independent random sampling of the same size & in the same population.

  35. The confidence interval is based on the sampling distribution of sample means: • It is also based on the Central Limit Theorem: the sampling distribution of sample means is approximately normal for large random samples whatever the underlying population distribution may be.

  36. That is, what really matters is that the sampling distribution of sample means is normally distributed—not how the particular sample of observations is distributed (or whether the population distribution is normally distributed). • If the sample size is less than 30 or the assumption of population normality doesn’t hold, see Moore/McCabe/Craig on bootstrapping and Stata ‘help bootstrap’.

  37. Besides the sampling distribution of sample means & the Central Limit Theorem, the computation of the confidence interval involves two other components: • C-level: i.e. the confidence level, which defines the probability that the confidence interval captures the parameter. • z-score: i.e. the standard score defined in terms of the C-level. It is the value on the standard normal curve with area C between –z* & +z*.

  38. The z-score anchors the Confidence Level to the standard normal distribution of the sample means. • Here’s how the z-scores & C-levels are related to each other:

C-level: 90%    95%    99%
z-score: 1.645  1.960  2.576
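These z* values follow from the standard normal distribution: z* for confidence level C is the point with area (1 + C) / 2 below it. A Python check (standard library only, not part of the chapter):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# z* leaves area (1 + C) / 2 below it, so that the central area is C.
for c in (0.90, 0.95, 0.99):
    z_star = Z.inv_cdf((1 + c) / 2)
    print(f"{c:.0%}: {z_star:.3f}")   # 1.645, 1.960, 2.576
```

This is also how to get z* for any other confidence level, e.g. 80% or 99.9%.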

  39. Any normal curve has probability C between the point z* standard deviations below the mean & point z* standard deviation above the mean. • E.g., probability .95 between z=1.96 & z= -1.96.

  40. Here’s what to do: • Choose a z-score that corresponds to the desired level of confidence (1.645 for 90%; 1.960 for 95%; 2.576 for 99%). • Then multiply the z-score times the standard error. • Result: doing so anchors the estimated values of the confidence interval to the probability continuum of the sampling distribution of sample means.
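The recipe above can be sketched in Python (the chapter’s own examples use Stata; the n, mean, & sd figures here are the ones from the cii example below):

```python
import math

# Summary statistics: a sample of n = 200 with mean 63.1 and
# standard deviation 7.8.
n, mean, sd = 200, 63.1, 7.8

standard_error = sd / math.sqrt(n)
z_star = 1.960                          # z* for a 95% confidence level
margin_of_error = z_star * standard_error

low = mean - margin_of_error
high = mean + margin_of_error
print(low, high)
```

This z-based interval comes out slightly narrower than Stata’s, because Stata uses the t-distribution rather than z (see the note on ci & cii below).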

  41. How to do it in Stata

. ci write

    Variable |   Obs      Mean   Std. Err.   [95% Conf. Interval]
-------------+---------------------------------------------------
       write |   200    52.775    .6702372     51.45332   54.09668

Note: Stata automatically translated the standard deviation into standard error. What is the computation for doing so?

  42. If the data aren’t in memory, e.g.: . cii 200 63.1 7.8 (obs mean sd) Variable | Obs Mean Std. Err. [95% Conf. Interval] -------------+------------------------------------------------------------- | 200 63.1 .5515433 62.01238 64.18762 Note: 7.8 is the standard deviation; Stata automatically computed the standard error.
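The cii output above can be reproduced by hand. A Python sketch: Stata bases its interval on the t-distribution, and since the standard library has no t quantile function, the critical value t*(df = 199, 95%) ≈ 1.972 is hard-coded here from a t-table:

```python
import math

# The cii inputs: n = 200 observations, mean 63.1, standard deviation 7.8.
n, mean, sd = 200, 63.1, 7.8

se = sd / math.sqrt(n)       # .5515433, matching Stata's Std. Err. column
t_star = 1.972               # t critical value for df = 199, 95% (from table)
low, high = mean - t_star * se, mean + t_star * se

print(se, low, high)         # matches Stata's [95% Conf. Interval]
```

This confirms the note: Stata computed the standard error as sd / sqrt(n) from the standard deviation supplied on the command line.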

  43. How to specify other confidence levels . ci math, level(90) . ci math, l(99)

  44. Note: Stata’s ci & cii commands • See ‘view help ci’ & the ‘ci’ entry in Stata Reference A-G. • Stata assumes that the data are drawn from a sample, so it computes confidence intervals via the commands ci & cii based on t-distributions, which account for the extra uncertainty of estimating the standard deviation & hence are wider than the z-distribution (which the Moore/McCabe/Craig book uses in this chapter). • We’ll address t-distributions in upcoming chapters, but keep in mind that they give wider CIs than does the z-distribution.

  45. Review: Confidence Intervals • Confidence intervals, & inferential statistics in general, are premised on random sampling or randomized assignment & the long-run laws of probability. • A confidence interval is a range that captures the ‘true value’ of a statistic with a specified probability over repeated random sampling of the same size in the same population.

  46. If there’s no random sample or randomized assignment (or at least independent observations, such as weighing oneself repeatedly over a period of time), the use of a confidence interval is invalid. • What if you have data for an entire population? Then there’s no need for a confidence interval: terrific!
