Lecture 3 • Miscellaneous details about hypothesis testing • Type II error • Practical significance vs. statistical significance • Chapter 12.2: Inference about a population mean when the standard deviation is unknown.
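Relation between p-value and rejection region methods • Compare the p-value to $\alpha$. Reject the null hypothesis only if p-value $< \alpha$. • The two methods always reach the same decision: the p-value is below $\alpha$ exactly when the test statistic falls in the rejection region. • Ex. 11.1: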
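The equivalence of the two methods can be checked numerically. The sketch below is our own illustration for an upper-tail z test, using only Python's standard-library `statistics.NormalDist`; the function name `decisions_upper_tail` and the trial values of z are hypothetical, not taken from the example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def decisions_upper_tail(z_obs, alpha):
    """Decide H0: mu = mu0 vs H1: mu > mu0 with both methods.

    Returns (reject_by_p_value, reject_by_rejection_region).
    """
    p_value = 1 - STD_NORMAL.cdf(z_obs)     # area to the right of the observed z
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)  # z_alpha, about 1.645 when alpha = .05
    return p_value < alpha, z_obs > z_crit


# Hypothetical observed z values on both sides of the cutoff.
for z in (0.5, 1.2, 1.6, 1.7, 2.5):
    by_p, by_region = decisions_upper_tail(z, alpha=0.05)
    print(f"z = {z}: p-value method rejects: {by_p}; rejection-region method rejects: {by_region}")
```

The two returned flags agree for every z, because p-value $< \alpha$ and $z > z_\alpha$ are the same inequality read on different scales.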
Null Hypothesis in One-Sided Test • We start by defining $H_1$ because this is the focus of our test. • Example 11.1: $H_1: \mu > 170$ • The null hypothesis $H_0: \mu \le 170$ is more logically satisfying than $H_0: \mu = 170$. • However, only the parameter value in $H_0$ that is closest to $H_1$ influences the form of the test. • We therefore take $H_0: \mu = 170$ for simplicity.
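Why the boundary value is enough can be seen by computing the chance of rejecting for means inside $H_0: \mu \le \mu_0$. This sketch (our own illustration; `rejection_probability` is a hypothetical helper) measures the true mean's distance from $\mu_0$ in standard-error units, so no value of $\sigma$ or $n$ has to be assumed.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rejection_probability(delta, alpha=0.05):
    """P(reject H0) for the upper-tail z test when the true mean is mu.

    delta = (mu - mu0) / (sigma / sqrt(n)) measures how far the true mean is
    from the boundary value mu0, in standard-error units. The test rejects
    when Z = (xbar - mu0) / (sigma / sqrt(n)) > z_alpha, and Z ~ N(delta, 1),
    so P(reject) = 1 - Phi(z_alpha - delta).
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_alpha - delta)


# Every mean allowed by H0: mu <= mu0 has delta <= 0.
for delta in (0.0, -0.5, -1.0, -2.0):
    print(f"delta = {delta:5.1f}: P(reject H0) = {rejection_probability(delta):.4f}")
```

The rejection probability equals $\alpha$ at $\mu = \mu_0$ and shrinks as $\mu$ moves deeper into $H_0$, so a test built at the boundary keeps the Type I error probability at most $\alpha$ for the whole composite null.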
Calculating the Probability of a Type II Error • To properly interpret the results of a test of hypothesis, we need to • specify an appropriate significance level or judge the p-value of a test; • understand the relationship between Type I and Type II errors. • How do we compute the probability of a Type II error?
Calculation of the Probability of a Type II Error • A Type II error occurs when a false $H_0$ is not rejected. • To calculate the probability of a Type II error, $\beta$, we need to • express the rejection region directly in terms of the sample mean $\bar{x}$ (not the standardized $z$); • specify the alternative value under $H_1$. • Let us revisit Example 11.1.
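Calculation of the Probability of a Type II Error (Example 11.1 revisited) • The rejection region was $\bar{x} > 170 + z_{.05}\,\sigma/\sqrt{n}$ with $\alpha = .05$. • Let the alternative value be $\mu = 180$ (rather than just $\mu > 170$). • Then $\beta = P\left(\bar{x} \le 170 + z_{.05}\,\sigma/\sqrt{n} \mid \mu = 180\right)$.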
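A minimal sketch of this calculation for the upper-tail test of Example 11.1. The population standard deviation $\sigma = 65$ is an assumption (it is not given on these slides); substitute the example's actual value if it differs. The function name `type_ii_error_upper_tail` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_upper_tail(mu0, mu1, sigma, n, alpha):
    """beta = P(fail to reject H0: mu = mu0 | true mean mu1), with H1: mu > mu0."""
    se = sigma / sqrt(n)                                     # standard error of xbar
    xbar_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # reject H0 when xbar > xbar_crit
    # When the true mean is mu1, xbar ~ N(mu1, se); a Type II error means xbar <= xbar_crit.
    return NormalDist(mu1, se).cdf(xbar_crit)


# ASSUMPTION: sigma = 65 (not stated on the slides); n = 400 and alpha = .05 as in Example 11.1.
beta = type_ii_error_upper_tail(mu0=170, mu1=180, sigma=65, n=400, alpha=0.05)
print(f"beta = {beta:.4f}")
```

Under these assumed inputs the cutoff is about $\bar{x} = 175.35$ and $\beta$ is roughly .076.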
Judging the Test • A hypothesis test is effectively defined by the significance level $\alpha$ and the sample size $n$. • One measure of effectiveness is the probability of a Type II error, $\beta$; typically we want to keep $\beta$ as small as possible. • If $\beta$ is judged to be too large, we can reduce it by • increasing $\alpha$ (at the cost of a larger probability of a Type I error), and/or • increasing the sample size.
Judging the Test • Increasing the sample size reduces $\beta$. • As $n$ increases, the standard deviation of the sampling distribution of the mean, $\sigma/\sqrt{n}$, decreases. Thus the cutoff $170 + z_{.05}\,\sigma/\sqrt{n}$ moves closer to 170, and $\beta$ decreases.
Judging the Test • In Example 11.1, suppose $n$ increases from 400 to 1000. • $\alpha$ remains 5%, but the probability of a Type II error drops dramatically.
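The effect of $\alpha$ and $n$ on $\beta$ can be tabulated with the same calculation. As before, $\sigma = 65$ is an assumed value for Example 11.1 and the helper name is hypothetical; $\mu_0 = 170$ and the alternative $\mu = 180$ come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_upper_tail(mu0, mu1, sigma, n, alpha):
    """beta for H0: mu = mu0 vs H1: mu > mu0 when the true mean is mu1."""
    se = sigma / sqrt(n)
    xbar_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    return NormalDist(mu1, se).cdf(xbar_crit)


# ASSUMPTION: sigma = 65; mu0 = 170 and mu1 = 180 as in Example 11.1.
betas = {}
for n in (400, 1000):
    for alpha in (0.01, 0.05, 0.10):
        betas[(n, alpha)] = type_ii_error_upper_tail(170, 180, 65, n, alpha)
        print(f"n = {n:4d}, alpha = {alpha:.2f}: beta = {betas[(n, alpha)]:.4f}")
```

Under these assumptions, raising $n$ from 400 to 1000 at $\alpha = .05$ shrinks $\beta$ from about .076 to below .001, because $\sigma/\sqrt{n}$ falls from 3.25 to about 2.06. Within each sample size, a larger $\alpha$ also gives a smaller $\beta$.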
Judging the Test • Another way of expressing how well a test performs is to report its power. • The power of a test is defined as $1 - \beta$. • It is the probability of rejecting the null hypothesis when it is false.
Planning Studies • Power calculations are important in planning studies. • Using a hypothesis test with low power makes it unlikely that you will reject $H_0$ even if the truth is far from the null hypothesis. • The operating characteristic (OC) curve is a plot of $\beta$ versus the alternative value of $\mu$ for a fixed sample size $n$ and a fixed significance level $\alpha$.
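The points of an OC curve, and the matching power values $1 - \beta$, can be computed directly. This sketch again assumes $\sigma = 65$ for Example 11.1 (not given on the slides) and holds $n = 400$ and $\alpha = .05$ fixed; the helper name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_upper_tail(mu0, mu1, sigma, n, alpha):
    """beta for H0: mu = mu0 vs H1: mu > mu0 when the true mean is mu1."""
    se = sigma / sqrt(n)
    xbar_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    return NormalDist(mu1, se).cdf(xbar_crit)


# ASSUMPTION: sigma = 65; n = 400 and alpha = .05 are held fixed, as an OC curve requires.
oc_curve = {mu: type_ii_error_upper_tail(170, mu, 65, 400, 0.05) for mu in range(170, 192, 2)}
print(" mu    beta   power")
for mu, beta in oc_curve.items():
    print(f"{mu:3d}  {beta:.4f}  {1 - beta:.4f}")
```

At $\mu = 170$ the curve starts at $\beta = 1 - \alpha = .95$ and then falls steadily; plotting the `beta` column against `mu` gives the OC curve, and the `power` column is its mirror image.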
Problem 11.54 Many Alpine ski centers base their projections of revenues and profits on the assumption that the average Alpine skier skis 4 times per year. To investigate the validity of this assumption, a random sample of 63 skiers is drawn and each is asked to report the number of times they skied the previous year. Assume that the population standard deviation is 2, and the sample mean is 4.84. Can we infer at the 10% level that the assumption is wrong?
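A sketch of the test, reading "the assumption is wrong" as the two-tailed alternative $H_1: \mu \ne 4$ against $H_0: \mu = 4$. All inputs ($n = 63$, $\sigma = 2$, $\bar{x} = 4.84$, $\alpha = .10$) come from the problem; `two_tailed_z_test` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_tailed_z_test(xbar, mu0, sigma, n, alpha):
    """z test of H0: mu = mu0 vs H1: mu != mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))   # both tails
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.645 for alpha = .10
    return z, p_value, abs(z) > z_crit


z, p_value, reject = two_tailed_z_test(xbar=4.84, mu0=4, sigma=2, n=63, alpha=0.10)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Here $z \approx 3.33$ and the p-value is below .001, so at the 10% level the data indicate that the average is not 4.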
Problem 11.54 follow-up • What is the probability of making a Type II error if the average Alpine skier skis 4.2 times per year?
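For the two-tailed test, $H_0$ is not rejected when $\bar{x}$ lands between the two cutoffs, so $\beta$ is the probability of that interval under $\mu = 4.2$. A minimal sketch, assuming the same two-tailed setup as above; `type_ii_error_two_tailed` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_two_tailed(mu0, mu1, sigma, n, alpha):
    """beta for H0: mu = mu0 vs H1: mu != mu0 when the true mean is mu1."""
    se = sigma / sqrt(n)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * se
    lower, upper = mu0 - half_width, mu0 + half_width   # H0 is NOT rejected for lower <= xbar <= upper
    under_h1 = NormalDist(mu1, se)                     # sampling distribution of xbar when mu = mu1
    return under_h1.cdf(upper) - under_h1.cdf(lower)


beta = type_ii_error_two_tailed(mu0=4, mu1=4.2, sigma=2, n=63, alpha=0.10)
print(f"beta = {beta:.4f}")
```

$\beta$ comes out close to .80: with $n = 63$, a true mean of 4.2 is only about 0.8 standard errors from 4, so the test would usually miss it.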
Problem: Effects of SAT Coaching • Suppose that SAT mathematics (SAT-M) scores in the absence of coaching have a normal distribution with mean 475 and standard deviation 100. Suppose further that coaching may change the mean but not the standard deviation. Calculate the p-value for the test of $H_0: \mu = 475$ versus $H_1: \mu > 475$ for each of the following three situations: (a) A coaching service coaches 100 students; their SAT-M scores average $\bar{x} = 478$. (b) By the next year, the coaching service has coached 1000 students; their SAT-M scores average $\bar{x} = 478$. (c) An advertising campaign brings the total number of students coached to 10,000; their average score is still $\bar{x} = 478$.
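A sketch of the three p-value calculations, assuming the one-sided alternative $H_1: \mu > 475$ and a sample mean of 478 in every case (478 is the center of each confidence interval on the next slide). The constant names are our own.

```python
from math import sqrt
from statistics import NormalDist

MU0, SIGMA, XBAR = 475, 100, 478   # XBAR = 478 is inferred from the intervals on the next slide

p_values = {}
for n in (100, 1000, 10_000):
    z = (XBAR - MU0) / (SIGMA / sqrt(n))
    p_values[n] = 1 - NormalDist().cdf(z)   # upper-tail p-value, assuming H1: mu > 475
    print(f"n = {n:6d}: z = {z:.2f}, p-value = {p_values[n]:.4f}")
```

The same 3-point gain moves from clearly non-significant (p ≈ .38) to highly significant (p ≈ .0013) purely because $n$ grows.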
Practical Significance vs. Statistical Significance • An increase in the average SAT-M score from 475 to 478 is of little importance in seeking admission to college, but a large enough sample size will declare even very small effects statistically significant. • A confidence interval shows the size of the effect and should always be reported. The two-sided 95% confidence intervals for the SAT coaching problem are $\bar{x} \pm 1.96 \cdot 100/\sqrt{n}$: (a) (458.4, 497.6); (b) (471.8, 484.2); (c) (476.04, 479.96). • For large samples, the CI says “Yes, the mean score is higher after coaching, but only by a small amount.”
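The three intervals follow from $\bar{x} \pm z_{.025}\,\sigma/\sqrt{n}$ with $\bar{x} = 478$ and $\sigma = 100$. A short sketch reproducing them (the constant names are our own):

```python
from math import sqrt
from statistics import NormalDist

XBAR, SIGMA = 478, 100
z_crit = NormalDist().inv_cdf(0.975)   # about 1.96 for 95% two-sided confidence

intervals = {}
for n in (100, 1000, 10_000):
    margin = z_crit * SIGMA / sqrt(n)
    intervals[n] = (XBAR - margin, XBAR + margin)
    print(f"n = {n:6d}: ({intervals[n][0]:.2f}, {intervals[n][1]:.2f})")
```

The margin of error shrinks like $1/\sqrt{n}$, which is why the largest sample pins the gain down to roughly 1 to 5 points.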
Chapter 12 • In this chapter we utilize the approach developed before to describe a population. • Identify the parameter to be estimated or tested. • Specify the parameter’s estimator and its sampling distribution. • Construct a confidence interval estimator or perform a hypothesis test.
12.2 Inference About a Population Mean When the Population Standard Deviation Is Unknown • Recall that when $\sigma$ is known we use the statistic $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ to estimate and test a population mean. • When $\sigma$ is unknown, we use its point estimator $s$, and the z-statistic is replaced by the t-statistic $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$.
t-Statistic • When the sampled population is normally distributed, the t-statistic is Student t distributed with $n-1$ degrees of freedom. • Confidence interval: $\bar{x} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}$, where $t_{\alpha/2,\,n-1}$ is the value with area $\alpha/2$ to its right under the Student t distribution with $n-1$ degrees of freedom.
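Python's standard library has no Student t distribution, so this sketch approximates the t CDF by Simpson's rule on the t density and inverts it by bisection. `t_cdf`, `t_quantile` and `t_interval` are our own hypothetical helpers (with SciPy installed, `scipy.stats.t.ppf` gives the same quantiles). The interval is applied, for illustration, to the Example 12.1 summary statistics that appear later in this lecture ($n = 50$, $\bar{x} = 460.38$, $s = 38.82$) at 95% confidence.

```python
import math


def t_cdf(x, df, steps=500):
    """CDF of Student's t with df degrees of freedom, via Simpson's rule (steps must be even)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                          # integral of the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area   # the density is symmetric about 0


def t_quantile(p, df):
    """q with P(T <= q) = p, found by bisection (assumes 0 < p < 1 and |q| < 100)."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(xbar, s, n, confidence=0.95):
    """xbar +/- t_{alpha/2, n-1} * s / sqrt(n)."""
    alpha = 1 - confidence
    t_crit = t_quantile(1 - alpha / 2, n - 1)
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


low, high = t_interval(xbar=460.38, s=38.82, n=50)
print(f"95% CI for mu: ({low:.2f}, {high:.2f})")
```

With 49 degrees of freedom the critical value is about 2.010, slightly larger than $z_{.025} = 1.96$, so the interval is a little wider than a z interval with $\sigma = 38.82$ would be.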
The t-Statistic • The “degrees of freedom” (a function of the sample size) determine how spread out the distribution is compared with the normal distribution. • The t distribution is mound-shaped and symmetrical around zero. [Figure: two t densities centered at 0, with d.f. = ν1 and d.f. = ν2, where ν1 < ν2.]
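[Figure: $t_A$ is the value with upper-tail area A to its right (shown for A = .05); the t table lists $t_A$ in the columns t.100, t.05, t.025, t.01, t.005.]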
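A rough sketch of the t table: for each upper-tail area A it computes $t_A$ at several degrees of freedom and compares it with the normal value $z_A$. The helpers `t_cdf` and `t_quantile` are our own Simpson's-rule and bisection approximations (not a library API), and the chosen degrees of freedom are arbitrary illustrations.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=500):
    """CDF of Student's t via Simpson's rule on the density (steps must be even)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """q with P(T <= q) = p, by bisection (assumes |q| < 100)."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


AREAS = (0.100, 0.05, 0.025, 0.01, 0.005)   # upper-tail area A for each column t_A

print("d.f. " + " ".join(f"{'t_' + str(A):>7}" for A in AREAS))
for df in (5, 10, 30, 100):
    print(f"{df:>4} " + " ".join(f"{t_quantile(1 - A, df):7.3f}" for A in AREAS))
print("   z " + " ".join(f"{NormalDist().inv_cdf(1 - A):7.3f}" for A in AREAS))
```

Each column shrinks toward the z row as the degrees of freedom grow, which is the numerical form of "fewer degrees of freedom, more spread".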
Testing $\mu$ when $\sigma$ is unknown • Example 12.1 • In order to determine the number of workers required to meet demand, the productivity of newly hired trainees is studied. • It is believed that trainees can process and distribute more than 450 packages per hour within one week of hiring. • Fifty trainees were observed for one hour. In this sample of 50 trainees, the mean number of packages processed is 460.38 and s = 38.82. • Can we conclude that the belief is correct, based on the productivity observation of 50 trainees?
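A sketch of the t test for this example, reading the belief as $H_1: \mu > 450$ against $H_0: \mu = 450$. The significance level $\alpha = .05$ is an assumption (the slide does not state one), and `t_cdf`, `t_quantile` and `one_sample_t_test_upper` are our own hypothetical helpers approximating the t distribution numerically.

```python
import math


def t_cdf(x, df, steps=500):
    """CDF of Student's t via Simpson's rule on the density (steps must be even)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """q with P(T <= q) = p, by bisection (assumes |q| < 100)."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t_test_upper(xbar, s, n, mu0, alpha):
    """t test of H0: mu = mu0 vs H1: mu > mu0 when sigma is unknown."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    t_crit = t_quantile(1 - alpha, n - 1)   # t_{alpha, n-1}
    p_value = 1 - t_cdf(t, n - 1)           # area to the right of the observed t
    return t, t_crit, p_value


# ASSUMPTION: alpha = .05 (the slide does not state a significance level).
t, t_crit, p_value = one_sample_t_test_upper(xbar=460.38, s=38.82, n=50, mu0=450, alpha=0.05)
print(f"t = {t:.3f}, t_crit = {t_crit:.3f}, p-value = {p_value:.4f}, reject H0: {t > t_crit}")
```

Under these assumptions $t \approx 1.89$ exceeds $t_{.05,49} \approx 1.68$ and the p-value is about .03, so the data support the belief at the 5% level.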
Checking the required conditions • In deriving the test and confidence interval, we have made two assumptions: (i) the sample is a random sample from the population; (ii) the distribution of the population is normal. • The t test is robust – the results are still approximately valid as long as the population is not extremely nonnormal. Also if the sample size is large, the results are approximately valid. • A rough graphical approach to examining normality is to look at the sample histogram.
JMP Example • Problem 12.45: Companies that sell groceries over the Internet are called e-grocers. Customers enter their orders, pay by credit card, and receive delivery by truck. A potential e-grocer analyzed the market and determined that to be profitable the average order would have to exceed $85. To determine whether an e-grocer would be profitable in one large city, she offered the service and recorded the size of the order for a random sample of customers. Can we infer from the data that e-grocery will be profitable in this city at significance level 0.05?
Practice Problems • 11.68,11.84,12.40,12.46