
Tests of Significance


nili

Presentation Transcript


  1. Tests of Significance: An Inference Procedure. We will study a procedure for learning about the unknown population mean of a quantitative variable.

  2. Background There are times we would like to know about the unknown mean in a population. But it is often too expensive and time consuming to investigate the whole population, so a sample is taken. The method of confidence intervals is based on the idea that a point estimate would, in theory, vary from sample to sample. From the one sample we do take, we build in that variability and are then a certain percent confident that our interval contains the unknown value. Hypothesis testing relies on some of the same ideas used in confidence intervals, but here there is at least a starting point for the unknown value. The starting point can come from past work or from a belief one has about a process.

  3. Example Consider an example about a company that puts cereal in boxes. The label of each box says there are 368 grams of cereal in the box. Does each box have exactly 368 grams? Probably not, because maybe a few extra flakes fall in one box and a few fewer in another. But in the grand scheme of things the process is filling the boxes to 368 grams on average. (Now if one box had 268 grams and the other had 468 grams for an average of 368 we would have a problem, but not of the kind we are talking about here.)

  4. Null hypothesis From the cereal example we would say the null hypothesis is that the mean amount of cereal put in the boxes is 368 grams, and in shorthand notation we would write Ho: μ = 368. The Ho stands for null hypothesis. Here this basically means that if the company believes it is putting 368 grams in each box, then we will not object to that assertion at face value. The mu, μ, signals that we are making a hypothesis about the population of all boxes. Of course we will only take a sample, but our hypothesis is about the population mean.

  5. Alternative Hypothesis In hypothesis testing there will always be an alternative hypothesis that is mutually exclusive with the null. In the cereal example the alternative hypothesis may be that the cereal boxes are not being filled to an average of 368 grams, and we would write this as H1: μ ≠ 368. The general process of hypothesis testing starts with the null and alternative hypotheses. Then a sample is collected and analyzed. The analysis will have one either continue to believe in the null hypothesis, and thus fail to reject the null, or reject the null and conclude the alternative is the one to go with. Note in the cereal example, if the null is rejected the firm had better find out why the machine is not filling the cereal boxes properly and get that situation fixed.

  6. Analogy Story about hypothesis tests. Not really stats, but an idea to consider. Say I have two decks of cards. One deck is a regular deck: spades, hearts, diamonds and clubs. The other deck is special: 4 sets of hearts. Now, I take out one of the decks, but you do not know which one. In the language of statistics the null hypothesis will be that I took out the regular deck. You will accept the null hypothesis unless an event occurs that has a really low probability. If a really low probability event occurs you will reject the null hypothesis and go with the alternative hypothesis. So, I take out a deck and deal you five cards: a royal flush in hearts! You would reject the null hypothesis of a regular deck and go with the alternative that the deck I pulled out is the special one, because a royal flush in hearts has a low probability in a regular deck.
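To put a number on "really low probability" in the card story, here is a minimal Python sketch. It assumes the special deck also has 52 cards (four identical sets of 13 hearts), which the slide does not spell out, and the variable names are only illustrative.

```python
from math import comb

# Number of distinct five-card hands from a 52-card deck.
hands = comb(52, 5)

# Regular deck: exactly one hand is the 10-J-Q-K-A of hearts.
p_regular = 1 / hands

# Assumed special deck: four copies of every heart, so each of the
# five required ranks can be supplied by any of its 4 copies.
p_special = 4**5 / hands

print(f"P(royal flush in hearts | regular deck) = {p_regular:.2e}")
print(f"P(royal flush in hearts | special deck) = {p_special:.2e}")
```

Under that assumption the hand is still unusual with the special deck, but it is about a thousand times more likely than with the regular deck, which is why the observed hand pushes us toward the alternative.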

  7. Sampling Distribution You may recall that when we have a quantitative variable and the population standard deviation of the variable is known, the distribution of the sample mean 1) is normal (exactly if the population is normal, approximately for large samples), 2) has the same mean as the mean of the variable in the population, and 3) has standard deviation equal to the standard deviation in the population divided by the square root of the sample size. When the population standard deviation is not known we rely on the sample standard deviation, and the standardized sample mean follows a t distribution. In what follows we assume the population standard deviation is known, but the ideas we bring up are also relevant later.
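A quick simulation can make points 2) and 3) concrete. This is only a sketch: it assumes a normal population and borrows the cereal numbers used elsewhere in these slides (mean 368, standard deviation 15, samples of 25). The seed and the number of repeated samples are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma, n = 368, 15, 25   # cereal values used in these slides
num_samples = 5000           # arbitrary number of repeated samples

# Draw many samples of size n and record each sample mean.
sample_means = [
    sum(random.gauss(mu, sigma) for _ in range(n)) / n
    for _ in range(num_samples)
]

center = mean(sample_means)        # should be close to mu
spread = stdev(sample_means)       # should be close to sigma / sqrt(n)
theoretical_se = sigma / sqrt(n)   # 15 / 5 = 3

print(f"mean of sample means: {center:.2f}")
print(f"std dev of sample means: {spread:.2f} (theory: {theoretical_se:.2f})")
```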

  8. Regions of Sampling Distribution [Figure: sampling distribution of X bar centered at μ.] Imagine that this slide has animation. Think about the arrows as both starting out in the center, and as the arrows move out they push the vertical lines with them. Using the cereal example, the center of the distribution is thought to be at 368. As we move in either direction from the center we have sample means that are possible when the population mean really is 368. But at some point as we move out we start to wonder whether our one sample mean really comes from a distribution with mean equal to 368.

  9. Regions of Sampling Distribution In the process of hypothesis testing the area of the sampling distribution is divided up into regions. The nonrejection region is the area in the middle of the distribution. These values are relatively close to the center. So if we get a sample value in this area we do not have enough evidence to reject the null hypothesis. The “tail” areas that I have on the previous screen are considered rejection regions. While sample mean values could occur in these regions when in fact the true mean is 368, the probability is low, and thus this raises suspicion about the null hypothesized value and leads us to reject the null. (Could I deal you a royal flush in hearts from a regular deck? Yes, but the chance is small, and it is much better under the alternative hypothesis.)

  10. Critical Values The values of x bar (sample means) that occur where the arrows are pushed out are called critical values of x bar. Note that the critical values are not determined from the sample. The null hypothesized value is also NOT determined from the sample. Remember the null hypothesis value is determined from past work or knowledge of some process. The critical values are picked based on some additional ideas I want to explore next.

  11. Type I Error A Type I error is a situation where you reject the null hypothesis, Ho, when it is true and should not be rejected. The probability of making a type I error is called alpha and is often referred to as the level of significance. In the cereal example if we reject the null hypothesis we will have to shut down production and investigate the production process to see why it is not putting in the “correct” amount of cereal. There is a consequence to rejecting a true null hypothesis. Depending on the nature of the consequence we pick the value of alpha. Traditional values of alpha are .01, .05 and .1. The choice of alpha will be part of determining the critical x bar values.
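One way to see that alpha really is the probability of a Type I error is to simulate a world where the null is true and count how often a test rejects it. The sketch below assumes a normal population with the cereal values used in these slides (mean 368, standard deviation 15, n = 25) and a two-tailed Z test at alpha = .05. The seed and the number of trials are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is repeatable

mu0, sigma, n, alpha = 368, 15, 25, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
se = sigma / sqrt(n)

trials = 10000
rejections = 0
for _ in range(trials):
    # The null is true here: every sample really comes from mean mu0.
    xbar = sum(random.gauss(mu0, sigma) for _ in range(n)) / n
    z_stat = (xbar - mu0) / se
    if abs(z_stat) > z_crit:
        rejections += 1  # a Type I error

type1_rate = rejections / trials
print(f"observed Type I error rate: {type1_rate:.3f} (alpha = {alpha})")
```

The observed rate should land near .05, which is the sense in which choosing alpha controls how often a true null gets rejected.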

  12. Type II Error A type II error is a situation where the null hypothesis is not rejected when it should be because the null is false. The probability of making a type II error is called beta, β. A type II error also has consequences. In the cereal example if we do not reject the null when we should we could either be giving more cereal than we say we are (and thus not charging for it – we certainly have costs in making it), or giving less than we say we are and thus cheating customers. In an introductory statistics class such as ours we typically focus on the type I error.

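  13. Critical Value Approach [Figure: sampling distribution of X bar centered at μ = μo. The do not reject region lies between the lower critical value and the upper critical value; each tail beyond a critical value is a reject region with area alpha/2.]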

  14. Critical Value Approach The null and alternative hypotheses can be stated in a generic way as Ho: μ = μo and H1: μ ≠ μo, where μo is a specific number. In our cereal example we would have Ho: μ = 368 and H1: μ ≠ 368. When the alternative has a not equal sign we have what is called a two tailed test, because if we are off in either direction we are concerned. In this case we split the alpha value in half so that the areas of the two rejection regions add up to alpha. If alpha = .05 we would have .025 in each tail of the distribution.

  15. Critical Value Approach Our context here is that we know the population standard deviation, so we use the Z table (the standard normal table). While my graph a few slides back is of X bar, we translate to Z values. With alpha = .05 and thus .025 in either tail, the lower critical Z = -1.96 and the upper critical Z = 1.96. We would reject the null if from our sample the Zstat is less than -1.96 or greater than 1.96. Now, let’s say we take a sample of 25 observations and we get a mean of 372.5 grams, and we know the population standard deviation is 15. The Zstat = (372.5 – 368)/(15/sqrt(25)) = 4.5/3 = 1.50. Since 1.50 is between -1.96 and 1.96, we cannot reject the null. The data are consistent with the filling process being OK!
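The arithmetic on this slide can be checked with a short Python sketch. It uses `statistics.NormalDist` from the standard library in place of the Z table. The variable names are mine, and the X bar cutoffs are an extra I computed to tie back to the earlier picture of critical values of x bar.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 368, 15, 25   # null value, known pop. std dev, sample size
xbar, alpha = 372.5, 0.05     # observed sample mean, level of significance

se = sigma / sqrt(n)                          # 15 / 5 = 3
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper critical Z, about 1.96
z_stat = (xbar - mu0) / se                    # 4.5 / 3 = 1.5

# The same cutoffs expressed on the X bar scale.
lower_xbar = mu0 - z_crit * se
upper_xbar = mu0 + z_crit * se

reject_null = abs(z_stat) > z_crit
print(f"Zstat = {z_stat:.2f}, critical Z = ±{z_crit:.2f}")
print(f"critical X bar values: {lower_xbar:.2f} and {upper_xbar:.2f}")
print("reject the null" if reject_null else "do not reject the null")
```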

  16. p-value approach The critical value approach had you set up rejection regions first and only at the end work with the sample. In the p-value approach you work with the sample almost as soon as you can. Remember we had a sample mean of 372.5 and the Zstat for this is 1.50. A Z of 1.50 has area .9332 to the left and .0668 to the right. The area to the right is the upper tail associated with the actual sample mean. In the critical value approach we had .025 in the upper tail. Since .0668 is bigger than .025, our sample mean is in the do not reject region. With a two tail test we look at the Zstat from the sample and the negative of the Zstat, here -1.50. The two tail areas add up to .1336, which is bigger than alpha = .05.

  17. p-value approach The p-value for a sample mean is the probability, given the null hypothesis is true, of getting a sample mean at least as extreme as the one observed. In other words it is the tail area beyond the Zstat. If we have a two tail test we just double the one tail value to get the p-value. Then if the p-value > alpha we do not reject the null, but if the p-value ≤ alpha we reject the null, because then the Zstat is at least as extreme as the critical values. If the p-value is low, then Ho must go. Note in our work a “low” p-value will be defined from problem to problem. The cutoff for “low” in a given problem is the level of significance, alpha.
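Here is the same p-value calculation as a small Python sketch, using `statistics.NormalDist` in place of the Z table. The numbers are the cereal values from these slides and the variable names are just for illustration.

```python
from statistics import NormalDist

z_stat, alpha = 1.50, 0.05

# Area to the right of |Zstat| under the standard normal curve.
upper_tail = 1 - NormalDist().cdf(abs(z_stat))

# Two tailed test: double the one tail area.
p_value = 2 * upper_tail

reject_null = p_value <= alpha
print(f"one tail area = {upper_tail:.4f}, p-value = {p_value:.4f}")
print("reject the null" if reject_null else "do not reject the null")
```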

  18. With a .01 level of significance we have .005 as the area in each tail. We would reject the null if 1) the Zstat is less than -2.576, or 2) the Zstat is greater than 2.576. [Figure: standard normal curve with area = .005 in each tail, beyond -2.576 and 2.576.]
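If a Z table is not handy, the two tailed critical values for the traditional alpha levels can be looked up with the inverse normal function. A minimal sketch (the dictionary name is my own):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

critical_z = {}
for alpha in (0.10, 0.05, 0.01):
    # Two tailed test: alpha/2 in each tail, so look up 1 - alpha/2.
    critical_z[alpha] = std_normal.inv_cdf(1 - alpha / 2)
    print(f"alpha = {alpha:.2f}: reject if |Zstat| > {critical_z[alpha]:.3f}")
```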

  19. One Tailed Tests Here we study the hypothesis test for the mean of a population when the alternative hypothesis is an inequality.

  20. Let’s follow an example and then highlight some main points about hypothesis testing. At its drive-up windows McDonald’s has had a mean service time of 163.9 seconds. McDonald’s would like to improve on this. The null hypothesis in this case would be that the mean of the population of all service times is equal to 163.9. We write the null as Ho: μ = 163.9. The alternative, then, would be that the mean of the population of all service times is less than 163.9. We write H1: μ < 163.9. In this case the alternative hypothesis contains the improvement the company is attempting to make. The null hypothesis maintains the status quo. In a hypothesis test we will either 1) accept (fail to reject) the null hypothesis or 2) reject the null and go with the alternative.

  21. [Figure: distribution of sample mean service times, centered at 163.9.] Here I show the distribution of sample means of service times. The center is at 163.9. Note some samples would have means less than 163.9 even when in reality the population has a mean of 163.9, and this would be due to the sampling variability we have talked about.

  22. Remember a Type I Error would have us reject the null when it is true. In this case a Type I Error would suggest the company is improving service time when it is in fact not doing so. We want to be careful about this, so, we will only reject the null when the probability of an observed event is really low. What do I mean by the phrase, “probability of an observed event?” In a study like this we do not look at every drive-up case. This would be expensive. So a sample is taken. From the sample we calculate a statistic and figure a probability of this value or a more extreme value. This is the event to which I refer. On the next slide I show the distribution of sample means, assuming the mean is exactly 163.9 seconds. Also say we know the population standard deviation is 20 seconds.

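  23. [Figure: distribution of sample means in seconds, centered at 163.9 (in terms of Z, Z = 0). A vertical line in the lower tail marks the critical value; the area to its left is the level of significance (reject null), and the area to its right is the accept null region.]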

  24. Now, in the graph on the previous slide I put the mean of the distribution at 163.9. You will notice on the left side of the distribution I put another vertical line and wrote next to it the phrase level of significance. I also labeled the value on the horizontal axis as “critical value.” This is the lowest value the sample mean could take and still be considered consistent with the null claim. Here is the basic logic of the hypothesis test. Say in our one sample we get a value farther away from the center than the critical value. Although it could happen when the null is true, the probability is small. This will lead us to reject the null.

  25. Now, let’s refer back to the graph on slide 23. The area I called the level of significance is the “low” probability I referred to before. If our sample value falls in this area past the critical value we will reject the null. But we pick the level of significance to be low, like .05 or .01, and thus we make our chance of making a Type I Error low. Next I want to develop some equivalent ways of conducting the hypothesis test. They are all similar and related.

  26. Critical Z method [Figure: Z distribution centered at Z = 0, with a reject null region in the lower tail and an accept null region to its right. Level of significance = ____. Critical value of Z = ____.]

  27. Say that alpha, or the level of significance we want, is .05. With all .05 in the lower tail we have a critical value of -1.645. We put all of the alpha = .05 in the lower tail because our alternative is a less than inequality. Will you please put the alpha and critical Z value in the graph on the previous slide? Up until this point I have not used a real sample mean value. I have talked in the hypothetical. The hypothetical allowed me to think of critical values. (In practical terms you would probably have a sample mean from data at this time, but we have not used it yet.)

  28. To complete the test we have to take the sample mean and actually calculate the Z value for it. This will be called the test statistic Zstat. Say in our sample of 25 we get an X bar = 152.7, and the population standard deviation is 20 seconds. Thus, Zstat = (sample mean minus hypothesis value)/(pop standard dev/sqrt(n)) = (152.7 – 163.9)/(20/sqrt(25)) = -2.80. Since the test statistic is farther from a Z of 0 than the critical value (-1.645) is, we reject the null and go with the alternative in this example. McDonald’s has done something to speed up its service time (reduce the time to serve customers).
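The lower tail critical Z test for the drive-up example can be sketched in Python as follows. `statistics.NormalDist` stands in for the Z table, and the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 163.9, 20, 25   # null value, known pop. std dev, sample size
xbar, alpha = 152.7, 0.05       # observed sample mean, level of significance

# Lower tailed test: all of alpha sits in the lower tail.
z_crit = NormalDist().inv_cdf(alpha)       # about -1.645
z_stat = (xbar - mu0) / (sigma / sqrt(n))  # -11.2 / 4 = -2.80

reject_null = z_stat < z_crit
print(f"Zstat = {z_stat:.2f}, critical Z = {z_crit:.3f}")
print("reject the null" if reject_null else "do not reject the null")
```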

  29. p value method [Figure: Z distribution centered at 0. The critical value of Z = -1.645 cuts off a lower tail with level of significance = .05 (reject null); the area to its right is the accept null region.]

  30. p-value What is a p-value? Let’s develop a story. When we said the level of significance was picked to be .05 we could find the critical value of -1.645 in terms of a Z value. Then we had a sample mean of 152.7 and a corresponding Zstat of -2.80. On the number line this Zstat is farther from the center than the critical value. And for this reason we rejected the null. Now, if you go to the graph on the previous slide, and if you put your right pinky, palm up, on the critical value, the area to the left of your pinky was chosen to be .05.

  31. Now, move your pinky to the left to the value of the sample mean, or its corresponding Z. The area under the curve to the left of your pinky is less than .05. This actual area is called the p-value. It is just the observed level of significance of the sample statistic we obtained. An equivalent way to test a hypothesis, then, is to reject the null if the p-value is less than the level of significance established for the test. In the Z table we see that Z = -2.80 has tail area .0026. This is the p-value from the sample and is certainly less than .05, and thus we reject the null and go with the alternative.
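The "area to the left of your pinky" is just the standard normal cumulative probability at the Zstat. A short sketch, again with `statistics.NormalDist` standing in for the Z table:

```python
from statistics import NormalDist

z_stat, alpha = -2.80, 0.05

# Lower tailed test: the p-value is the area to the left of Zstat.
p_value = NormalDist().cdf(z_stat)

reject_null = p_value < alpha
print(f"p-value = {p_value:.4f}")
print("reject the null" if reject_null else "do not reject the null")
```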

  32. Remember, all this is a one tailed test on the low side of the mean. What if we had a one tailed test on the high side of the mean? Same logic. Generic example: Sample size is 50. Alpha is picked to be .01. The critical Z is found in the table as 2.33 (this is the closest we can get). [Figure: sampling distribution with the do not reject null region to the left of the critical value 2.33 and the reject null region to its right.] Say in a sample the sample mean is 6.034 and the pop standard deviation is .02, with a null hypothesized mean of 6.03. So our Zstat = (6.034 – 6.03)/(.02/sqrt(50)) = 1.41, and since this is not as big as 2.33 we do not reject the null. The p-value is .0793, and since this is bigger than .01 we do not reject the null.
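The upper tail version can be sketched the same way. Note one small difference from the table work: the code keeps the unrounded Zstat (about 1.414), so its p-value comes out near .0786 rather than the table value of .0793 for Z = 1.41. The conclusion does not change.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 6.03, 0.02, 50   # null value, known pop. std dev, sample size
xbar, alpha = 6.034, 0.01        # observed sample mean, level of significance

# Upper tailed test: all of alpha sits in the upper tail.
z_crit = NormalDist().inv_cdf(1 - alpha)   # about 2.33
z_stat = (xbar - mu0) / (sigma / sqrt(n))  # about 1.41
p_value = 1 - NormalDist().cdf(z_stat)     # area to the right of Zstat

reject_null = z_stat > z_crit
print(f"Zstat = {z_stat:.2f}, critical Z = {z_crit:.2f}, p-value = {p_value:.4f}")
print("reject the null" if reject_null else "do not reject the null")
```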

  33. Let’s summarize some things. When an alternative hypothesis H1 has a not equal sign we have a two tailed test. We take alpha and divide by 2 to get critical values. Reject the null if our Zstat is more extreme than the critical values. If we go to the p-value approach we have to take our tail area and multiply by 2. Reject the null if the p-value for the Zstat is less than or equal to alpha. When an alternative hypothesis H1 has an inequality (either a less than or a greater than sign) we have a one tailed test. Alpha is all in one tail to get the critical value. Reject the null if our Zstat is more extreme than the critical value. If we go with the p-value approach the tail area beyond the Zstat is the p-value. Reject the null if the p-value for the Zstat is less than or equal to alpha.
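The summary rules can be collected into one small helper. This is a sketch, not a standard library routine: the function name `z_test` and the tail labels "two", "lower" and "upper" are my own choices. It assumes the population standard deviation is known and uses the p-value rule (reject when the p-value is less than or equal to alpha).

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alpha, tail):
    """One sample Z test with known population standard deviation.

    tail is "two" (H1: mu != mu0), "lower" (H1: mu < mu0),
    or "upper" (H1: mu > mu0). Returns (Zstat, p-value, reject?).
    """
    z_stat = (xbar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf
    if tail == "two":
        p_value = 2 * (1 - cdf(abs(z_stat)))  # double the one tail area
    elif tail == "lower":
        p_value = cdf(z_stat)                 # area to the left
    elif tail == "upper":
        p_value = 1 - cdf(z_stat)             # area to the right
    else:
        raise ValueError("tail must be 'two', 'lower', or 'upper'")
    return z_stat, p_value, p_value <= alpha


# The three examples worked in these slides.
examples = {
    "cereal": z_test(372.5, 368, 15, 25, 0.05, "two"),
    "drive-up": z_test(152.7, 163.9, 20, 25, 0.05, "lower"),
    "generic": z_test(6.034, 6.03, 0.02, 50, 0.01, "upper"),
}
for name, (z, p, reject) in examples.items():
    decision = "reject the null" if reject else "do not reject the null"
    print(f"{name}: Zstat = {z:.2f}, p-value = {p:.4f}, {decision}")
```

Only the drive-up example leads to rejecting the null, which matches the conclusions reached by hand.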
