Statistical Inference

Statistical Inference Making decisions regarding the population based on a sample

Decision Types • Estimation • Deciding on the value of an unknown parameter • Hypothesis Testing • Deciding whether a statement regarding an unknown parameter is true or false • All decisions will be based on the values of statistics

Estimation • Definitions • An estimator of an unknown parameter is a sample statistic used to estimate that parameter • An estimate is the value of the estimator after the data are collected • The performance of an estimator is assessed by determining its sampling distribution and measuring its closeness to the parameter being estimated

Examples of Estimators

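The Sample Proportion Let p = population proportion of interest or binomial probability of success. Let $\hat{p} = X/n$ = sample proportion or proportion of successes, where X is the number of successes in a sample of size n. Then the sampling distribution of $\hat{p}$ is approximately (for large n) a normal distribution with mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$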

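The Sample Mean Let x1, x2, x3, …, xn denote a sample of size n from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Then the sampling distribution of $\bar{x}$ is a normal distribution with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$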
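The claim about the sample mean can be checked numerically. The following is a minimal simulation sketch, not part of the original slides; the population values (mean 10, standard deviation 2), the sample size 25 and the number of replications are hypothetical choices made only for illustration.

```python
import random
import statistics

# Hypothetical population parameters and sample size (not from the slides).
MU, SIGMA, N = 10.0, 2.0, 25
REPLICATIONS = 4000

rng = random.Random(1)  # fixed seed so the run is repeatable

# Draw many samples of size N from a normal population and keep each sample mean.
sample_means = [
    statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPLICATIONS)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sd = SIGMA / N ** 0.5  # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (theory: {MU})")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {theoretical_sd:.3f})")
```

The simulated mean and standard deviation of the sample means should land close to the population mean and to sigma divided by the square root of n, respectively.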

Confidence Intervals

Estimation by Confidence Intervals • Definition • A 100P% confidence interval for an unknown parameter is a pair of sample statistics (t1 and t2) having the following properties: 1. P[t1 < t2] = 1. That is, t1 is always smaller than t2. 2. P[the unknown parameter lies between t1 and t2] = P. • The statistics t1 and t2 are random variables • Property 2 states that the probability that the unknown parameter is bounded by the two statistics t1 and t2 is P.

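Critical values for a distribution • The $\alpha$ upper critical value for any distribution is the point $x_\alpha$ underneath the distribution such that $P[X > x_\alpha] = \alpha$, i.e. the area under the distribution to the right of $x_\alpha$ is $\alpha$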

Critical values for the standard Normal distribution $P[Z > z_\alpha] = \alpha$

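Upper critical values of the standard normal distribution can be computed rather than read from a table. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `upper_critical_value` is a hypothetical name introduced here for illustration.

```python
from statistics import NormalDist


def upper_critical_value(alpha: float) -> float:
    """Return z_alpha, the point with P[Z > z_alpha] = alpha for standard normal Z."""
    # P[Z > z] = alpha is the same as P[Z <= z] = 1 - alpha.
    return NormalDist().inv_cdf(1.0 - alpha)


for alpha in (0.10, 0.05, 0.025, 0.005):
    print(f"alpha = {alpha:<5}  z_alpha = {upper_critical_value(alpha):.3f}")
```

For alpha = 0.025 this gives the value 1.960 used for 95% confidence intervals.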

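Confidence Intervals for a proportion p Let $t_1 = \hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ and $t_2 = \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. Then t1 to t2 is a (1 – $\alpha$)100% = P100% confidence interval for p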

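Logic: $z = (\hat{p} - p)/\sigma_{\hat{p}}$, where $\sigma_{\hat{p}} = \sqrt{p(1-p)/n} \approx \sqrt{\hat{p}(1-\hat{p})/n}$, has approximately a Standard Normal distribution. Then $P[-z_{\alpha/2} \le z \le z_{\alpha/2}] = 1 - \alpha$ and hence $P[\hat{p} - z_{\alpha/2}\sigma_{\hat{p}} \le p \le \hat{p} + z_{\alpha/2}\sigma_{\hat{p}}] = 1 - \alpha$. Thus $t_1 = \hat{p} - z_{\alpha/2}\sigma_{\hat{p}}$ to $t_2 = \hat{p} + z_{\alpha/2}\sigma_{\hat{p}}$ is a (1 – $\alpha$)100% = P100% confidence interval for p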

Example • Suppose we are interested in determining the success rate of a new drug for reducing blood pressure • The new drug is given to n = 70 patients with abnormally high blood pressure • Of these patients, X = 63 were able to reduce their abnormally high blood pressure • The proportion of patients able to reduce their abnormally high blood pressure was $\hat{p} = X/n = 63/70 = 0.900$

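If P = 1 – $\alpha$ = 0.95 then $\alpha$/2 = 0.025 and $z_{\alpha/2}$ = 1.960. Then $t_1 = 0.900 - 1.960\sqrt{(0.900)(0.100)/70} = 0.900 - 0.0703 = 0.8297$ and $t_2 = 0.900 + 0.0703 = 0.9703$. Thus a 95% confidence interval for p is 0.8297 to 0.9703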
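The arithmetic of the drug example can be reproduced with a few lines of code. This is a minimal sketch using only the Python standard library; the function name `proportion_ci` is a hypothetical name chosen for illustration, and the interval is the normal-approximation interval described above.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Normal-approximation confidence interval (t1, t2) for a proportion."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)    # z_{alpha/2}
    p_hat = successes / n                          # sample proportion
    margin = z * sqrt(p_hat * (1.0 - p_hat) / n)   # error bound
    return p_hat - margin, p_hat + margin


# Drug example from the slides: X = 63 successes out of n = 70 patients.
t1, t2 = proportion_ci(63, 70)
print(f"95% confidence interval for p: {t1:.4f} to {t2:.4f}")  # 0.8297 to 0.9703
```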

Confidence Interval for a Proportion 100P% Confidence Interval for the population proportion: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ Interpretation: For about 100P% of all randomly selected samples from the population, the confidence interval computed in this manner captures the population proportion.
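The interpretation above can be illustrated by simulation: draw many samples from a population with a known proportion and count how often the computed interval captures it. The sketch below is an illustration only; the true proportion 0.5, the sample size 200 and the number of trials are hypothetical values, not taken from the slides.

```python
import random
from math import sqrt
from statistics import NormalDist

# Hypothetical settings chosen only for this illustration.
P_TRUE, N, TRIALS = 0.5, 200, 2000
Z = NormalDist().inv_cdf(0.975)  # z_{alpha/2} for 95% confidence

rng = random.Random(42)  # fixed seed so the run is repeatable

covered = 0
for _ in range(TRIALS):
    # One random sample of size N: count the successes.
    successes = sum(rng.random() < P_TRUE for _ in range(N))
    p_hat = successes / N
    margin = Z * sqrt(p_hat * (1 - p_hat) / N)
    # Does this sample's interval capture the true proportion?
    if p_hat - margin <= P_TRUE <= p_hat + margin:
        covered += 1

coverage = covered / TRIALS
print(f"Fraction of intervals capturing p: {coverage:.3f}")
```

The printed fraction should be close to 0.95, although the normal approximation makes the match approximate rather than exact.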

Error Bound For a (1 – $\alpha$)100% confidence level, the approximate margin of error in a sample proportion is $B = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$

Factors that Determine the Error Bound 1. The sample size, n. When sample size increases, margin of error decreases. 2. The sample proportion, $\hat{p}$. If the proportion is close to either 1 or 0, most individuals have the same trait or opinion, so there is little natural variability and the margin of error is smaller than if the proportion is near 0.5. 3. The “multiplier” $z_{\alpha/2}$. This is connected to the (1 – $\alpha$)100% level of confidence of the Error Bound. The value of $z_{\alpha/2}$ for a 95% level of confidence is 1.96. This value is changed to change the level of confidence.
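The three factors can be seen by evaluating the error bound for a few inputs. This is a small sketch; the function name `margin_of_error` and the specific sample sizes and proportions are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat: float, n: int, confidence: float = 0.95) -> float:
    """Error bound B = z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * sqrt(p_hat * (1.0 - p_hat) / n)


# 1. Larger sample size gives a smaller error bound.
print(margin_of_error(0.5, 100), margin_of_error(0.5, 400))

# 2. A proportion near 0 or 1 gives a smaller error bound than one near 0.5.
print(margin_of_error(0.9, 100), margin_of_error(0.5, 100))

# 3. A higher level of confidence gives a larger multiplier and error bound.
print(margin_of_error(0.5, 100, 0.95), margin_of_error(0.5, 100, 0.99))
```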

Determination of Sample Size In almost all research situations the researcher is interested in the question: How large should the sample be?

Answer: Depends on: • How accurate you want the answer. Accuracy is specified by: • Specifying the magnitude of the error bound • Level of confidence

Error Bound: $B = z_{\alpha/2}\sqrt{p^*(1-p^*)/n}$ • If we have specified the level of confidence then the value of $z_{\alpha/2}$ will be known. • If we have specified the magnitude of B, it will also be known. Solving for n we get: $n = z_{\alpha/2}^2\, p^*(1-p^*)/B^2$

Summarizing: The sample size that will estimate p with an Error Bound B and level of confidence P = 1 – $\alpha$ is: $n = z_{\alpha/2}^2\, p^*(1-p^*)/B^2$ • where: • B is the desired Error Bound • $z_{\alpha/2}$ is the $\alpha$/2 critical value for the standard normal distribution • p* is some preliminary estimate of p. • If you do not have a preliminary estimate of p, use p* = 0.50

Reason For p* = 0.50, n will take on the largest value. Thus, using p* = 0.50, n may be larger than required if p is not 0.50, but it will give the desired accuracy or better for all values of p.
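Why p* = 0.50 gives the largest n can be shown with one line of calculus, assuming the sample size formula $n = z_{\alpha/2}^2\, p^*(1-p^*)/B^2$: with $z_{\alpha/2}$ and B fixed, n is largest when the product p*(1 – p*) is largest.

```latex
\frac{d}{dp^*}\bigl[p^*(1-p^*)\bigr] = 1 - 2p^* = 0
\;\Longrightarrow\; p^* = \tfrac{1}{2},
\qquad\text{so}\qquad
p^*(1-p^*) \le \tfrac{1}{4}
\quad\text{and}\quad
n \le \frac{z_{\alpha/2}^2}{4B^2}.
```

The second derivative is –2, so the stationary point is a maximum.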

Example • Suppose that I want to conduct a survey and want to estimate p = proportion of voters who favour a downtown location for a casino: • I know that the approximate value of p is p* = 0.50. This is also a good choice for p* if one has no preliminary estimate of its value. • I want the survey to estimate p with an error bound B = 0.01 (1 percentage point) • I want the level of confidence to be 95% (i.e. $\alpha$ = 0.05 and $z_{\alpha/2}$ = $z_{0.025}$ = 1.960) • Then $n = z_{\alpha/2}^2\, p^*(1-p^*)/B^2 = (1.960)^2 (0.50)(0.50)/(0.01)^2 = 9604$
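The survey calculation can be sketched in code, assuming the formula n = z² p*(1 – p*)/B² rounded up to the next whole number. The function name `sample_size_for_proportion` is a hypothetical name introduced for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(error_bound: float,
                               confidence: float = 0.95,
                               p_star: float = 0.50) -> int:
    """Smallest whole n with z^2 * p*(1 - p*) / B^2 <= n."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # z_{alpha/2}
    return ceil(z ** 2 * p_star * (1.0 - p_star) / error_bound ** 2)


# Casino survey example: B = 0.01, 95% confidence, p* = 0.50.
n = sample_size_for_proportion(0.01)
print(n)  # 9604
```

A preliminary estimate away from 0.50, for example `p_star=0.9`, yields a smaller required sample size for the same error bound.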

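Confidence Intervals for the mean of a Normal Population, $\mu$ Let $t_1 = \bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n}$ and $t_2 = \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$ (with the sample standard deviation s used in place of $\sigma$ when $\sigma$ is unknown and n is large). Then t1 to t2 is a (1 – $\alpha$)100% = P100% confidence interval for $\mu$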

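Logic: $z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ has a Standard Normal distribution. Then $P[-z_{\alpha/2} \le z \le z_{\alpha/2}] = 1 - \alpha$ and hence $P[\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}] = 1 - \alpha$. Thus $t_1 = \bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n}$ to $t_2 = \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$ is a (1 – $\alpha$)100% = P100% confidence interval for $\mu$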

Example • Suppose we are interested in the average Bone Mass Density (BMD) for women aged 70-75 • A sample of n = 100 women aged 70-75 is selected and BMD is measured for each individual in the sample. • The average BMD for these individuals is: • The standard deviation (s) of BMD for these individuals is:

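If P = 1 – $\alpha$ = 0.95 then $\alpha$/2 = 0.025 and $z_{\alpha/2}$ = 1.960. Then $t_1 = \bar{x} - z_{\alpha/2}\, s/\sqrt{n}$ and $t_2 = \bar{x} + z_{\alpha/2}\, s/\sqrt{n}$ with n = 100. Thus a 95% confidence interval for $\mu$ is 24.10 to 27.16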
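The interval for a mean can be computed the same way as the interval for a proportion. The sketch below is a generic illustration: the function name `mean_ci` is hypothetical, and the inputs (sample mean 50, standard deviation 10, n = 100) are made-up values, not the BMD data.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(x_bar: float, sd: float, n: int, confidence: float = 0.95):
    """Interval x_bar +/- z_{alpha/2} * sd / sqrt(n) for a population mean."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # z_{alpha/2}
    margin = z * sd / sqrt(n)                                 # error bound
    return x_bar - margin, x_bar + margin


# Hypothetical inputs for illustration only (not the BMD data).
t1, t2 = mean_ci(x_bar=50.0, sd=10.0, n=100)
print(f"95% confidence interval for the mean: {t1:.2f} to {t2:.2f}")  # 48.04 to 51.96
```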

Determination of Sample Size Again a question to be asked: How large should the sample be?

Answer: Depends on: • How accurate you want the answer. Accuracy is specified by: • Specifying the magnitude of the error bound • Level of confidence

Error Bound: $B = z_{\alpha/2}\,\sigma/\sqrt{n}$ • If we have specified the level of confidence then the value of $z_{\alpha/2}$ will be known. • If we have specified the magnitude of B, it will also be known. Solving for n we get: $n = z_{\alpha/2}^2\,\sigma^2/B^2$

Summarizing: The sample size that will estimate $\mu$ with an Error Bound B and level of confidence P = 1 – $\alpha$ is: $n = z_{\alpha/2}^2\,(\sigma^*)^2/B^2$ • where: • B is the desired Error Bound • $z_{\alpha/2}$ is the $\alpha$/2 critical value for the standard normal distribution • $\sigma^*$ is some preliminary estimate of $\sigma$.

Notes: • n increases as B, the desired Error Bound, decreases • A larger sample size is required for a higher level of accuracy • n increases as the level of confidence, (1 – $\alpha$), increases • $z_{\alpha/2}$ increases as $\alpha$/2 becomes closer to zero. • A larger sample size is required for a higher level of confidence • n increases as the standard deviation, $\sigma$, of the population increases. • If the population is more variable then a larger sample size is required

Summary: • The sample size n depends on: • Desired level of accuracy • Desired level of confidence • Variability of the population

Example • Suppose that one is interested in estimating the average number of grams of fat ($\mu$) in one kilogram of lean beef hamburger: • This will be estimated by: • randomly selecting one-kilogram samples, then • measuring the fat content for each sample. • Preliminary estimates of $\mu$ and $\sigma$ indicate that they are approximately 220 and 40 respectively. • I want the study to estimate $\mu$ with an error bound of 5 and a level of confidence of 95% (i.e. $\alpha$ = 0.05 and $z_{\alpha/2}$ = $z_{0.025}$ = 1.960)

Solution $n = z_{\alpha/2}^2\,(\sigma^*)^2/B^2 = (1.960)^2 (40)^2/(5)^2 = 245.9$. Hence n = 246 one-kilogram samples are required to estimate $\mu$ within B = 5 g with a 95% level of confidence.
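The hamburger calculation can be checked with a short sketch, assuming the formula n = z² σ*²/B² rounded up to the next whole number. The function name `sample_size_for_mean` is a hypothetical name introduced for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(error_bound: float,
                         sd_estimate: float,
                         confidence: float = 0.95) -> int:
    """Smallest whole n with z^2 * sd^2 / B^2 <= n."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # z_{alpha/2}
    return ceil(z ** 2 * sd_estimate ** 2 / error_bound ** 2)


# Hamburger example: B = 5 g, preliminary standard deviation 40 g, 95% confidence.
n = sample_size_for_mean(error_bound=5.0, sd_estimate=40.0)
print(n)  # 246
```

Halving the error bound to 2.5 g roughly quadruples the required sample size, since B enters the formula squared.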

Confidence Intervals

Confidence Interval for a Proportion $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$

Determination of Sample Size The sample size that will estimate p with an Error Bound B and level of confidence P = 1 – $\alpha$ is: $n = z_{\alpha/2}^2\, p^*(1-p^*)/B^2$ • where: • B is the desired Error Bound • $z_{\alpha/2}$ is the $\alpha$/2 critical value for the standard normal distribution • p* is some preliminary estimate of p.

Confidence Intervals for the mean of a Normal Population, $\mu$: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$

Determination of Sample Size The sample size that will estimate $\mu$ with an Error Bound B and level of confidence P = 1 – $\alpha$ is: $n = z_{\alpha/2}^2\,(\sigma^*)^2/B^2$ • where: • B is the desired Error Bound • $z_{\alpha/2}$ is the $\alpha$/2 critical value for the standard normal distribution • $\sigma^*$ is some preliminary estimate of $\sigma$.

Hypothesis Testing An important area of statistical inference

Definition Hypothesis (H) • Statement about the parameters of the population • In hypothesis testing there are two hypotheses of interest. • The null hypothesis (H0) • The alternative hypothesis (HA)

Either • the null hypothesis (H0) is true or • the alternative hypothesis (HA) is true, but not both. We say that H0 and HA are mutually exclusive and exhaustive.

One has to make a decision • either to accept the null hypothesis (equivalent to rejecting HA) or • to reject the null hypothesis (equivalent to accepting HA)

There are two possible errors that can be made. • Rejecting the null hypothesis when it is true (type I error) • Accepting the null hypothesis when it is false (type II error)

An analogy – a jury trial The two possible decisions are • Declare the accused innocent. • Declare the accused guilty.

• The null hypothesis (H0) – the accused is innocent • The alternative hypothesis (HA) – the accused is guilty
