Statistical Inference: Hypothesis Testing and Confidence Intervals

Experimental Statistics - week 2 Review Continued • Sampling Distributions • Chi-square • F • Statistical Inference • Confidence Intervals • Hypothesis Tests

Chi-Square Distribution (distribution of the sample variance) • IF: • Data are Normally Distributed • Observations are Independent • Then: $(n-1)S^2/\sigma^2$ has a Chi-Square distribution with $n-1$ degrees of freedom
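The chi-square result can be checked informally by simulation. The sketch below draws repeated normal samples and computes $(n-1)S^2/\sigma^2$ for each; a chi-square variable with $n-1$ degrees of freedom has mean $n-1$ and variance $2(n-1)$. The sample size, population mean, population standard deviation, and number of repetitions are hypothetical values chosen only for illustration.

```python
import random
import statistics

random.seed(0)
n, sigma = 10, 2.0   # hypothetical sample size and population SD
reps = 5000          # hypothetical number of simulated samples

# For each simulated normal sample, compute (n-1) * S^2 / sigma^2
values = []
for _ in range(reps):
    sample = [random.gauss(50.0, sigma) for _ in range(n)]
    values.append((n - 1) * statistics.variance(sample) / sigma**2)

sim_mean = statistics.mean(values)
sim_var = statistics.variance(values)
# Theory: mean n-1 = 9 and variance 2(n-1) = 18
print(f"simulated mean {sim_mean:.2f}, simulated variance {sim_var:.2f}")
```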

Chi-square Distribution, Figure 7.10, page 357

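F-Distribution • IF: • $S_1^2$ and $S_2^2$ are sample variances from 2 samples • samples independent • populations are both normal • Then: $F = \dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$ has an F distribution with $n_1-1$ and $n_2-1$ degrees of freedom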

F-distribution, Figure 7.10, page 357

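$(1-\alpha)\times 100\%$ Confidence Intervals for $\mu$ • Setting: • Data are Normally Distributed • Observations are Independent • Case 1: $\sigma$ known: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ • Case 2: $\sigma$ unknown: $\bar{x} \pm t_{\alpha/2,\,n-1}\,s/\sqrt{n}$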

CI Example: An insurance company is concerned about the number and magnitude of hail damage claims it received this year. A random sample of 20 of the thousands of claims it received this year resulted in an average claim amount of $6,500 and a standard deviation of $1,500. What is a 95% confidence interval on the mean claim damage amount? Suppose that company actuaries believe the company does not need to increase insurance rates for hail damage if the mean claim damage amount is no greater than $7,000. Use the above information to make a recommendation regarding whether rates should be raised.
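A minimal sketch of the interval calculation for this example, assuming the claim amounts are roughly normal so that the t interval with $\sigma$ unknown applies. The critical value 2.093 is $t_{0.025,19}$ read from a standard t table; the variable names are illustrative.

```python
import math

n, xbar, s = 20, 6500.0, 1500.0
t_crit = 2.093  # t_{0.025, 19} from a standard t table

# Half-width of the interval: t * s / sqrt(n)
margin = t_crit * s / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(f"95% CI: ({lower:.0f}, {upper:.0f})")
print("7000 inside interval:", lower <= 7000 <= upper)
```

The interval runs from roughly $5,798 to $7,202. Because $7,000 lies inside it, this sample alone does not show that the mean claim exceeds $7,000.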

Interpretation of 95% Confidence Interval: if the sampling procedure is repeated many times, about 95% of the resulting confidence intervals should “cover” the true mean. Figure: 100 different 95% CIs plotted for the case in which the true mean is 80.
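The coverage interpretation can be illustrated with a small simulation. This sketch builds 100 intervals from samples whose true mean is 80 and counts how many cover it. The population standard deviation (10) and sample size (25) are hypothetical, and $\sigma$ is treated as known so the z interval is used.

```python
import math
import random
from statistics import NormalDist

random.seed(1)
true_mean, sigma, n = 80.0, 10.0, 25   # sigma and n are hypothetical
z = NormalDist().inv_cdf(0.975)        # about 1.96
half_width = z * sigma / math.sqrt(n)

covered = 0
for _ in range(100):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    # Does this interval contain the true mean?
    if xbar - half_width <= true_mean <= xbar + half_width:
        covered += 1

print(covered, "of 100 intervals cover the true mean")
```

The count varies from run to run but should typically land near 95.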

Concern has been mounting that SAT scores are falling. • 3 years ago -- National AVG = 955 • Random sample of 200 graduating high school students this year (sample average = 935; the standard deviation is about 100) Question: Have SAT scores dropped? Procedure: Determine how “extreme” or “rare” our sample AVG of 935 is if the population AVG really is 955.

We must decide: • The sample came from population with population AVG = 955 and just by chance the sample AVG is “small.” OR • We are not willing to believe that the pop. AVG this year is really 955. (Conclude SAT scores have fallen.)
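One way to quantify how rare a sample average of 935 would be is sketched below. It assumes the standard deviation of 100 can be treated as known and that, with n = 200, the sample mean is approximately normal, so a z statistic is used.

```python
import math
from statistics import NormalDist

mu0, xbar, sd, n = 955.0, 935.0, 100.0, 200

# Standardize the sample mean under the assumption that the true mean is 955
z = (xbar - mu0) / (sd / math.sqrt(n))

# Chance of a sample average this low or lower if the mean really is 955
p_lower = NormalDist().cdf(z)

print(f"z = {z:.2f}, P(sample AVG <= 935) = {p_lower:.4f}")
```

Under these assumptions z is about -2.83 and the probability is below 1%, so a sample average this low would be rare if the population average were still 955.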

Hypothesis Testing Terminology • Statistical Hypothesis - statement about the parameters of one or more populations • Null Hypothesis - hypothesis to be “tested” (standard, traditional, claimed, etc.) - hypothesis of no change, effect, or difference (usually what the investigator wants to disprove) • Alternative Hypothesis - null is not correct (usually the hypothesis the investigator suspects or wants to show)

Basic Hypothesis Testing Question: Do the Data provide sufficient evidence to refute the Null Hypothesis?

Hypothesis Testing (cont.) • Critical Region (Rejection Region) - region of test statistic values that leads to rejection of the null (e.g. t > c) • Critical Value - endpoint of the critical region • Significance Level ($\alpha$) - probability that the test statistic will be in the critical region if the null is true, i.e. the probability of rejecting the null when it is true

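Types of Hypotheses • One-Sided Tests: $H_0: \mu = \mu_0$ vs. $H_a: \mu > \mu_0$ (or $H_a: \mu < \mu_0$) • Two-Sided Tests: $H_0: \mu = \mu_0$ vs. $H_a: \mu \neq \mu_0$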

Rejection Regions for One- and Two-Sided Alternatives (figure: tail area $\alpha$ beyond the critical value $-t_\alpha$)

A Standard Hypothesis Test Write-up 1. State the null and alternative 2. Give significance level, test statistic, and the rejection region 3. Show calculations 4. State the conclusion - statistical decision - give conclusion in language of the problem

Hypothesis Testing Example 1: A solar cell requires a special crystal. If properly manufactured, the mean weight of these crystals is 0.4 g. Suppose that 25 crystals are selected at random from a batch of crystals and it is calculated that for these crystals, the average is 0.41 g with a standard deviation of 0.02 g. At the $\alpha = 0.01$ level of significance, can we conclude that the batch is bad?
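A sketch of the calculations for Example 1, assuming "bad" means the mean weight differs from 0.4 g in either direction, so the test is two-sided. The critical value 2.797 is $t_{0.005,24}$ from a standard t table.

```python
import math

n, xbar, s, mu0 = 25, 0.41, 0.02, 0.40

# One-sample t statistic: (xbar - mu0) / (s / sqrt(n))
t_stat = (xbar - mu0) / (s / math.sqrt(n))

t_crit = 2.797  # t_{0.005, 24}, two-sided test at alpha = 0.01 (assumed)
reject = abs(t_stat) > t_crit

print(f"t = {t_stat:.2f}, reject H0: {reject}")
```

Here t = 2.5 falls short of 2.797, so under this two-sided reading there is not enough evidence at the 0.01 level to conclude the batch is bad.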

Hypothesis Testing Example 2 A box of detergent is designed to weigh on the average 3.25 lbs per box. A random sample of 18 boxes taken from the production line on a single day has a sample average of 3.238 lbs and a standard deviation of 0.037 lbs. Test whether the boxes seem to be underfilled.
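A sketch of the calculations for Example 2. "Underfilled" points to a lower-tailed alternative ($H_a: \mu < 3.25$). The slide does not give a significance level, so $\alpha = 0.05$ is assumed here; 1.740 is $t_{0.05,17}$ from a standard t table.

```python
import math

n, xbar, s, mu0 = 18, 3.238, 0.037, 3.25

# One-sample t statistic: (xbar - mu0) / (s / sqrt(n))
t_stat = (xbar - mu0) / (s / math.sqrt(n))

t_crit = 1.740  # t_{0.05, 17}; alpha = 0.05 is an assumption
reject = t_stat < -t_crit  # lower-tailed rejection region

print(f"t = {t_stat:.3f}, reject H0: {reject}")
```

With t about -1.376, the statistic does not fall below -1.740, so at the assumed 0.05 level the data do not show that the boxes are underfilled.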

Errors in Hypothesis Testing

Conclusion          | Actual Situation: Null is True  | Actual Situation: Null is False
Do Not Reject $H_0$ | Correct Decision ($1 - \alpha$) | Type II Error ($\beta$)
Reject $H_0$        | Type I Error ($\alpha$)         | Correct Decision (Power) ($1 - \beta$)
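The error rates can be estimated by simulation. The sketch below runs a two-sided z test many times, once with the null true (the rejection rate estimates the Type I error rate $\alpha$) and once with the null false (the rejection rate estimates the power, $1 - \beta$). All numeric settings, including the alternative mean of 0.5, are hypothetical.

```python
import math
import random
from statistics import NormalDist

random.seed(2)
mu0, sigma, n, alpha = 0.0, 1.0, 30, 0.05   # all hypothetical
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

def reject_rate(true_mean, reps=4000):
    """Fraction of simulated samples in which H0: mu = mu0 is rejected."""
    rejections = 0
    for _ in range(reps):
        xbar = sum(random.gauss(true_mean, sigma) for _ in range(n)) / n
        z = (xbar - mu0) / (sigma / math.sqrt(n))
        rejections += abs(z) > z_crit
    return rejections / reps

type1_rate = reject_rate(mu0)   # null is true
power = reject_rate(0.5)        # null is false (true mean 0.5)
type2_rate = 1 - power

print(f"Type I rate {type1_rate:.3f}, power {power:.3f}, Type II rate {type2_rate:.3f}")
```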

p-Value: the probability of an observation as extreme or more extreme than the one observed when the null is true. Note: for the test above, “large negative values” of t make us believe the alternative is true. Suppose t = -2.39 is observed from the data; the p-value is then the area under the t curve to the left of -2.39 (the observed value of t).

Note: -- if the p-value is less than or equal to $\alpha$, then we reject the null at the $\alpha$ significance level -- the p-value is the smallest level of significance at which the null hypothesis would be rejected

Find the p-values for Examples 1 and 2
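A sketch of one way to compute these p-values using only the Python standard library. Tail areas of the t distribution are obtained by numerically integrating its density with Simpson's rule; `t_pdf` and `t_upper_tail` are illustrative helper names, not library functions. Example 1 is treated as two-sided and Example 2 as lower-tailed, which are readings of the problem statements rather than something the slides state.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > |t|), using Simpson's rule on [0, |t|] and symmetry about 0."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Example 1: two-sided, df = 24
t1 = (0.41 - 0.40) / (0.02 / math.sqrt(25))
p1 = 2 * t_upper_tail(t1, 24)

# Example 2: lower-tailed, df = 17; by symmetry P(T < t2) = P(T > |t2|)
t2 = (3.238 - 3.25) / (0.037 / math.sqrt(18))
p2 = t_upper_tail(t2, 17)

print(f"Example 1: t = {t1:.2f}, p = {p1:.4f}")
print(f"Example 2: t = {t2:.3f}, p = {p2:.4f}")
```

Under these readings the p-values come out near 0.02 and 0.09, so Example 1 is not rejected at the 0.01 level and Example 2 is not rejected at a 0.05 level.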

Two Independent Samples • Assumptions: Measurements from Each Population are • Mutually Independent • Independent within Each Sample • Independent Between Samples • Normally Distributed (or the Central Limit Theorem can be Invoked) • Analysis Differs Based on Whether the Two Populations Have the Same Standard Deviation

Two Types of Independent Samples • Population Standard Deviations Equal • Can Obtain a Better Estimate of the Common Standard Deviation by Combining or “Pooling” Individual Estimates • Population Standard Deviations Different • Must Estimate Each Standard Deviation • Very Good Approximate Tests are Available • If Unsure, Do Not Assume Equal Standard Deviations

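Equal Population Standard Deviations • Test Statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$, where $s_p^2 = \dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ and df $= n_1 + n_2 - 2$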
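A minimal sketch of the pooled two-sample t statistic, which divides the difference in sample means by $s_p\sqrt{1/n_1 + 1/n_2}$, where $s_p^2$ is the weighted average of the two sample variances. The two data lists are hypothetical and serve only to show the mechanics.

```python
import math
from statistics import mean, variance

sample_a = [10.0, 12.0, 14.0]        # hypothetical data
sample_b = [7.0, 9.0, 11.0, 13.0]    # hypothetical data
n1, n2 = len(sample_a), len(sample_b)

# Pooled variance: weight each sample variance by its degrees of freedom
df = n1 + n2 - 2
sp2 = ((n1 - 1) * variance(sample_a) + (n2 - 1) * variance(sample_b)) / df

# Pooled t statistic
t_pooled = (mean(sample_a) - mean(sample_b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

print(f"pooled variance = {sp2:.2f}, t = {t_pooled:.4f}, df = {df}")
```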

Behrens-Fisher Problem

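Satterthwaite’s Approximate t Statistic • $t' = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ (i.e. approximate t) • df $\approx \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}}$ (approximate t df)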
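A minimal sketch of Satterthwaite's approximate t statistic and its approximate degrees of freedom, using the standard unequal-variance formulas: each sample variance is divided by its own sample size rather than pooled. The two data lists are hypothetical.

```python
import math
from statistics import mean, variance

sample_a = [10.0, 12.0, 14.0]        # hypothetical data
sample_b = [7.0, 9.0, 11.0, 13.0]    # hypothetical data
n1, n2 = len(sample_a), len(sample_b)

# Per-sample variance of the mean: s_i^2 / n_i
v1 = variance(sample_a) / n1
v2 = variance(sample_b) / n2

# Approximate t statistic (no pooling)
t_approx = (mean(sample_a) - mean(sample_b)) / math.sqrt(v1 + v2)

# Satterthwaite's approximate degrees of freedom (generally not an integer)
df_approx = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"t' = {t_approx:.4f}, approximate df = {df_approx:.2f}")
```

In practice the approximate df is often rounded down before looking up a critical value in a t table.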

Often-Recommended Strategy for Tests on Means • Test whether $\sigma_1 = \sigma_2$ (F-test) • If the hypothesis is not rejected, use the 2-sample t statistic, assuming equal standard deviations • If the hypothesis is rejected, use Satterthwaite’s approximate t statistic • NOTE: This is Not a Wise Strategy • the F-test is highly susceptible to non-normality • Recommended Strategy: • If uncertain about whether the standard deviations are equal, use Satterthwaite’s approximate t statistic

Example 3: Comparing the Mean Breaking Strengths of 2 Plastics Question: Is there a difference between the 2 plastics in terms of mean breaking strength? Plastic A: Plastic B: • Assumptions: • Mutually independent measurements • Normal distributions for measurements from each type of plastic • Equal population standard deviations

New diet -- Is it effective? Design: 50 people: randomly assign 25 to go on diet and 25 to eat normally for next month. Assess results by comparing weights at end of 1 month. Diet: No Diet: Run 2-sample t-test using guidelines we have discussed. Is this a good design?

Better Design: Randomly select subjects and measure them before and after 1 month on the diet.

Subject | Before | After | Difference
1       | 150    | 147   | 3
2       | 210    | 195   | 15
:       | :      | :     | :
n       | 187    | 190   | -3

Procedure: Calculate differences, and analyze the differences using a 1-sample test (“Paired t-Test”)
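The paired procedure can be sketched as follows: compute each subject's difference, then apply the one-sample t statistic to the differences. Only the three rows visible in the table are used; the elided rows are not filled in, so this illustrates the mechanics and is not an analysis of the full data.

```python
import math
from statistics import mean, stdev

# Only the three subjects shown in the table (the rest are elided there)
before = [150, 210, 187]
after = [147, 195, 190]

# Differences (before - after); positive values indicate weight loss
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)

# One-sample t statistic on the differences, testing mean difference = 0
t_paired = d_bar / (s_d / math.sqrt(n))
df = n - 1

print(f"differences = {diffs}, t = {t_paired:.4f}, df = {df}")
```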

Example 4: International Gymnastics Judging Question: Do judges from a contestant’s country rate their own contestant higher than do foreign judges? Data:
