
Chapter 6



Presentation Transcript


  1. Chapter 6 Introduction to Inference

  2. Introduction • We use statistical inference to draw conclusions from data • Our conclusions must account for the natural variability in the data • To account for the variability, formal inference relies on probability to describe chance variation • We can then correct our “eyeball” judgement by formal calculation

  3. Example • Scenario with the US draft. • Supposedly there should be no correlation between a draft number and birth date. • Imagine that a sample shows a correlation of r = -0.226. • If the correlation between draft number and birth date really is zero, how likely is it to get a correlation this far from zero just by chance? • Does the sample correlation r = -0.226 put the claim that the population correlation is 0 in doubt?
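The question of how often chance alone produces a correlation this far from zero can be explored by simulation. The sketch below is only an illustration, not the method used in the slides: it assumes 366 possible birth dates, assigns draft numbers by a fair random shuffle, and counts how often the resulting correlation is at least as extreme as -0.226. The function names, the number of simulations, and the seed are all hypothetical choices.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def chance_of_extreme_r(observed_r, n_days=366, n_sims=2000, seed=0):
    """Share of fair random lotteries whose |r| is at least |observed_r|.

    Assumption: a fair lottery is a uniform random shuffle of the
    draft numbers 1..n_days over the birth dates 1..n_days.
    """
    rng = random.Random(seed)
    birth_days = list(range(1, n_days + 1))
    draft_numbers = birth_days[:]
    extreme = 0
    for _ in range(n_sims):
        rng.shuffle(draft_numbers)
        if abs(pearson_r(birth_days, draft_numbers)) >= abs(observed_r):
            extreme += 1
    return extreme / n_sims

p_extreme = chance_of_extreme_r(-0.226)
print(p_extreme)
```

A very small proportion here would suggest that r = -0.226 is hard to explain by chance variation alone, which is the kind of reasoning the rest of the chapter formalizes.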

  4. TV Ads

  5. Cautions

  6. Cautions

  7. Statistical Inference • Chapter 6 introduces the reasoning of statistical inference • A major assumption is that data come from a random sample. • No statistical methodology can rescue bad data. • We temporarily make the generally unrealistic assumption that we know the standard deviation of the population ($\sigma$) • We will try to motivate inference based on the ideas we have developed in class. • Sampling distributions.

  8. Section 6.1 Estimating with Confidence

  9. Inference on Mean [Diagram: a sample is drawn from the population; the sample statistic is used to make an inference about the unknown population parameter.]

  10. Introduction • One way to characterize a collection of businesses is to determine the average of some measure of size • Total assets is one commonly used measure • Asset Turnover, Manufacturing Defect Rate, etc. • If the collection of businesses is large, we generally take a sample and use the information gathered to make an inference about the entire collection • We use the term population to refer to the entire collection of interest

  11. Case 6.1 • Community banks are banks with less than a billion dollars of assets. • There are approximately 7500 such banks in the US • The Community Bankers Council of the American Bankers Association (ABA) conducts an annual survey of community banks • For the n=110 banks that make up the sample in a recent survey, the mean assets are 220 million dollars

  12. Review • Recall • The mean of the sampling distribution of $\bar{x}$ is $\mu$ • $\bar{x}$ is an unbiased estimator of $\mu$ • This says that there is no systematic tendency to underestimate or overestimate the truth. • By the Law of Large Numbers (LLN), we know that as the sample size gets larger, the sample mean gets closer to the population mean

  13. Review • Since $\bar{x}$ is an unbiased estimator of $\mu$, and because of the Law of Large Numbers, the value $\bar{x} = 220$ appears to be a reasonable estimate of the mean assets $\mu$ for all community banks
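The Law of Large Numbers mentioned in this review can be seen in a small simulation. This is a sketch under loud assumptions: the population is taken to be normal with mean 220 and standard deviation 161, values borrowed from the bank example purely for illustration (the true population mean is unknown, and real bank assets are unlikely to be normal). The function name and seed are hypothetical.

```python
import random

def sample_means_by_size(mu, sigma, sizes, seed=2):
    """Return {n: sample mean of n simulated draws} for each n in sizes.

    Assumption: draws come from a normal population N(mu, sigma).
    """
    rng = random.Random(seed)
    means = {}
    for n in sizes:
        total = sum(rng.gauss(mu, sigma) for _ in range(n))
        means[n] = total / n
    return means

means = sample_means_by_size(mu=220.0, sigma=161.0, sizes=(10, 1000, 100000))
for n, x_bar in means.items():
    print(n, round(x_bar, 1))
```

The sample means for the larger sizes should sit closer to 220 than the mean of a sample of 10 typically does.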

  14. Statistical Inference • But how reliable is this estimate? • A second sample would surely not give a mean of 220 again • Unbiasedness only says that there is no systematic tendency to underestimate or overestimate the truth • Could we plausibly get a sample mean of 250 or 200 on repeated samples? Of course! • An estimate without an indication of its variability is of limited value!

  15. Statistical Inference • We can answer questions about variation by looking at the spread • Recall: • The variation of $\bar{x}$ is measured by its standard deviation, which is $\sigma/\sqrt{n}$ • And because of the Central Limit Theorem, $\bar{x}$ is approximately $N(\mu, \sigma/\sqrt{n})$ if the sample size is large enough

  16. Statistical Inference • In the majority of situations we will not know $\sigma$. So what do you do? • For now let's suppose that in our example the true standard deviation $\sigma$ is equal to the sample standard deviation s = 161 • This assumption is not realistic, although it will give reasonably accurate results for large samples (n = 110 is probably large enough) • In the next chapter we will learn how to proceed when $\sigma$ is not known

  17. Statistical Inference • Therefore, through the Central Limit Theorem and our large sample size of 110 banks, we can reasonably assume that $\bar{x}$ is approximately $N(\mu, 161/\sqrt{110}) \approx N(\mu, 15)$

  18. Statistical Inference • Recall the 68-95-99.7 rule: the probability that $\bar{x}$ is between $\mu - 2\sigma/\sqrt{n}$ and $\mu + 2\sigma/\sqrt{n}$ is about 0.95 • Thus about 95% of random samples will produce an $\bar{x}$ that lies within $2\sigma/\sqrt{n}$ of $\mu$

  19. Statistical Inference • To say that $\bar{x}$ lies within $2\sigma/\sqrt{n}$ of $\mu$ is the same as saying that $\mu$ is within $2\sigma/\sqrt{n}$ of $\bar{x}$. • In our example, saying that $\bar{x}$ lies within 30 million dollars of $\mu$ is the same as saying that $\mu$ is within 30 million dollars of $\bar{x}$ • We can say that in 95% of all samples the interval $\bar{x} \pm 30$ will capture the true $\mu$. • This is the same as saying, “We are 95% confident that $\mu$ is in the interval $\bar{x} \pm 30$” • We can express “confidence” in the results from ANY ONE sample.

  20. Statistical Confidence • If we repeat what we have done many, many times, then we will catch the true $\mu$ 95% of the time • Confidence describes what happens in the long run • It does not mean that the probability that the true mean, $\mu$, falls between 190 and 250 is 95% • $\mu$ is not random. Rather, $\mu$ is fixed and does not change. $\mu$ is either in the interval, or not.

  21. Statistical Inference • We cannot know whether our sample is one of the 95% for which the interval catches $\mu$ or one of the unlucky 5% • The statement “we are 95% confident that the unknown $\mu$ lies between 190 and 250” is shorthand for saying “we arrived at these numbers by a process that gives correct results 95% of the time”

  22. Confidence Intervals • The interval of values between $\bar{x} - 30$ and $\bar{x} + 30$ is called a 95% confidence interval for $\mu$ • In general a confidence interval has the following form: • estimate ± margin of error • Margin of error: • Is evaluated based on the variability of the estimate • Shows how precise our guess is • Estimate: • Is our guess for the value of the unknown parameter

  23. Confidence Intervals • A level C confidence interval for a parameter has two parts • An interval calculated from the data, of the form: estimate +/- margin of error • A confidence level C, which gives the probability that the interval will capture the true parameter value in repeated samples • You can choose the confidence level • Commonly, statisticians choose 95% • 90% and 99% are also popular depending on your needs

  24. Finish Example • So to finalize our example • We want to know the mean assets of the 7500 community banks • We take a random sample of 110 and find that $\bar{x} = 220$ • The margin of error is: $2\sigma/\sqrt{n} = 2(161/\sqrt{110}) \approx 2(15) = 30$ • Thus a 95% confidence interval is: • (220 - 30, 220 + 30) = (190, 250) • We say “We are 95% confident that the interval from 190 million dollars to 250 million dollars captures the mean assets of all 7500 community banks.”
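The arithmetic behind this interval can be checked with a few lines of code. This is a minimal sketch that assumes, as the slides do, that $\sigma$ is known and equal to 161 and that the 68-95-99.7 rule's multiplier of 2 is used; the function name is a hypothetical choice. Without rounding the standard deviation of the sample mean to 15, the margin comes out slightly above 30.

```python
import math

def two_sigma_interval(x_bar, sigma, n):
    """Approximate 95% interval: x_bar +/- 2 * sigma / sqrt(n)."""
    margin = 2 * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin, margin

low, high, margin = two_sigma_interval(220, 161, 110)
print(round(margin, 1))               # 30.7
print(round(low, 1), round(high, 1))  # 189.3 250.7
```

The slides' (190, 250) is the same interval after rounding the standard deviation of the sample mean to 15 million dollars.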

  25. Statistical Confidence • But what would happen if we took another random sample of 110 banks? • Most likely $\bar{x}$ will be different, which means that we would get a different confidence interval • In the long run, if we repeated the sampling process many times, 95% of the constructed confidence intervals would contain the population mean.
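The long-run interpretation can be illustrated by simulating many samples and counting how often the interval captures the mean. The sketch below rests on assumptions that are not in the slides: the population is modelled as normal with a made-up "true" mean of 220 and standard deviation 161, and the multiplier 2 from the 68-95-99.7 rule is used. Function name, number of repetitions, and seed are hypothetical.

```python
import math
import random

def coverage_rate(mu=220.0, sigma=161.0, n=110, multiplier=2.0,
                  n_samples=2000, seed=1):
    """Fraction of simulated samples whose interval contains mu.

    Assumption: each sample is n independent draws from N(mu, sigma),
    and the interval is x_bar +/- multiplier * sigma / sqrt(n).
    """
    rng = random.Random(seed)
    margin = multiplier * sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_samples):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / n_samples

rate = coverage_rate()
print(rate)
```

Each individual interval either contains 220 or it does not; it is the procedure, repeated over many samples, that succeeds about 95% of the time.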

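  26. Confidence Interval for a Population Mean • Now we will generalize the idea to get a confidence interval for any confidence level C. • We will use what we know about the sampling distribution of $\bar{x}$ • Recall: the mean of $\bar{x}$ is $\mu$ and its standard deviation is $\sigma/\sqrt{n}$ • and when n is large, $\bar{x}$ is approximately $N(\mu, \sigma/\sqrt{n})$ by the CLT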

  27. Confidence Interval for a Population Mean • The area between the critical values -z* and z* under the standard normal curve is C. [Figure: standard normal curve with central area C between -z* and +z*, and area (1 - C)/2 in each tail.]

  28. Confidence Interval for a Population Mean • We want to find an upper and a lower “critical value”, z* and -z*, so that the area between them is C. • We have already found these critical values for some areas. (review) • Here are a few common z*’s and corresponding C: C = 90%, z* = 1.645 • C = 95%, z* = 1.960 • C = 99%, z* = 2.576
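The critical value for any confidence level can be computed rather than looked up. Since the central area is C, each tail has area (1 - C)/2, so z* is the standard normal quantile at (1 + C)/2. A minimal sketch using Python's standard library follows; the function name `z_star` is a hypothetical choice.

```python
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* with central area `confidence` under N(0, 1)."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

for c in (0.90, 0.95, 0.99):
    print(c, round(z_star(c), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```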

  29. Confidence Interval for a Population Mean • Example: C = 90% • The area between the critical values -z* and z* under the standard normal curve is C. [Figure: standard normal curve with central area 0.90 between -z* and +z*, and area (1 - 0.90)/2 = 0.05 in each tail.]

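  30. Confidence Interval for a Population Mean • We choose some z* so that $P\left(-z^* \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z^*\right) = C$ • After some algebra we get $P\left(\bar{x} - z^*\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z^*\frac{\sigma}{\sqrt{n}}\right) = C$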
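The algebra referred to on this slide can be spelled out in three steps. This is a sketch under the slides' assumptions that $\sigma$ is known and that $\bar{x}$ is (approximately) normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$: multiply through by $\sigma/\sqrt{n}$, then subtract $\bar{x}$ and multiply by $-1$, which reverses the inequalities.

```latex
\begin{align*}
C &= P\left(-z^* \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z^*\right) \\
  &= P\left(-z^*\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le z^*\frac{\sigma}{\sqrt{n}}\right) \\
  &= P\left(\bar{x} - z^*\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z^*\frac{\sigma}{\sqrt{n}}\right)
\end{align*}
```

The random quantity in the last line is the interval, not $\mu$, which is why the statement is about how often the interval captures the fixed mean.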

  31. Confidence Interval for a Population Mean • Thus, if we choose an SRS of size n from a population having unknown mean $\mu$ and known standard deviation $\sigma$ • A level C confidence interval for $\mu$ is $\bar{x} \pm z^*\frac{\sigma}{\sqrt{n}}$ • The quantity $z^*\frac{\sigma}{\sqrt{n}}$ is the margin of error

  32. Example • The 110 banks in the ABA survey had mean assets of 220 million dollars. Assume that the standard deviation is 161. Give a 99% confidence interval for $\mu$, the mean assets for all community banks.

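  33. Answer • z* = 2.576 for C = 0.99, so the interval is $220 \pm 2.576(161/\sqrt{110}) = 220 \pm 39.54$ • We are 99% confident that the mean assets for all community banks are between 180.46 and 259.54 million dollars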
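The general level C interval for a mean with known standard deviation can be wrapped in a small function and checked against this answer. This is a sketch rather than a definitive implementation; the function name `z_interval` is hypothetical, and it assumes a known $\sigma$ and an SRS, as the slides do.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Level-C interval x_bar +/- z* * sigma / sqrt(n), sigma known."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = z_interval(220, 161, 110, 0.99)
print(round(low, 2), round(high, 2))  # 180.46 259.54
```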

  34. How Confidence Intervals Behave • Recall: the margin of error is $z^*\sigma/\sqrt{n}$ • Relation between the confidence level and margin of error • A higher confidence level -> larger z* -> larger margin of error. • But we would like to have a high level of confidence and a small margin of error. • Other ways to reduce margin of error • Reduce $\sigma$ • Increase the sample size (larger n)

  35. Reduce Level of Confidence • The common choices of confidence level are 99%, 95%, and 90% • The critical values z* for these levels are 2.576, 1.960, and 1.645 • Notice these decrease as the confidence level drops • If n and $\sigma$ are unchanged, settling for lower confidence will reduce the margin of error

  36. Reduce $\sigma$ • The standard deviation $\sigma$ measures variation in the population • Think of the variation among individuals in the population as noise that obscures the average value $\mu$ • Sometimes we can reduce $\sigma$ by carefully controlling the measurement process or by restricting our attention to only part of a large population

  37. Increase the Sample Size n • Suppose we want to cut the margin of error in half • The square root in the formula implies that we must have four times as many observations, not just twice as many • E.g., cutting the margin in half means dividing it by 2. The square root of 4 is 2. Hence, we must increase the sample size by a factor of 4 to cut the margin of error in half.
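The effect of the square root is easy to see numerically. The sketch below uses illustrative values only (z* = 1.96, a standard deviation of 1.2, and a starting sample size of 100 are assumptions chosen for the demonstration) and compares doubling the sample size with quadrupling it. The function name is hypothetical.

```python
import math

def margin_of_error(z_star, sigma, n):
    """Margin of error z* * sigma / sqrt(n) for a z interval."""
    return z_star * sigma / math.sqrt(n)

base = margin_of_error(1.96, 1.2, 100)
doubled = margin_of_error(1.96, 1.2, 200)
quadrupled = margin_of_error(1.96, 1.2, 400)

print(round(doubled / base, 3))     # 0.707
print(round(quadrupled / base, 3))  # 0.5
```

Doubling n only shrinks the margin to about 71% of its original size; it takes four times the observations to reach 50%.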

  38. Example • In an SRS of 100 ISU students, the average bus waiting time is 2.5 min. • Assume we know that $\sigma$ = 1.2 • Find a 95% confidence interval for the population mean $\mu$ (the average bus waiting time for all ISU students).

  39. Example Cont. • $2.5 \pm 1.96(1.2/\sqrt{100}) = 2.5 \pm 0.235$ • We are 95% confident that the mean bus waiting time for all ISU students is between about 2.3 and 2.7 minutes

  40. Example Cont. • Now find an 80% confidence interval • $2.5 \pm 1.282(1.2/\sqrt{100}) = 2.5 \pm 0.154$ • We are 80% confident that the mean bus waiting time for all ISU students is between 2.35 and 2.65 minutes

  41. Example Cont. • Now take a different SRS of 1000 ISU students. Assume that the average bus waiting time remains 2.5 min. • Find a 95% confidence interval • z* = 1.96 • Margin of error: $1.96(1.2/\sqrt{1000}) \approx 0.074$ • A 95% confidence interval: $2.5 \pm 0.074 = (2.426, 2.574)$ • We are 95% confident that the mean bus waiting time for all ISU students is between 2.43 and 2.57 minutes.
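The three bus waiting time intervals can be computed side by side to see how the confidence level and the sample size each change the width. This is a sketch assuming a known standard deviation of 1.2 and a sample mean of 2.5 in every scenario; the function and variable names are hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Level-C interval x_bar +/- z* * sigma / sqrt(n), sigma known."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# (sample size, confidence level) for the three scenarios in the example
scenarios = [(100, 0.95), (100, 0.80), (1000, 0.95)]
results = {}
for n, confidence in scenarios:
    results[(n, confidence)] = z_interval(2.5, 1.2, n, confidence)
    low, high = results[(n, confidence)]
    print(f"n={n}, C={confidence:.0%}: ({low:.3f}, {high:.3f})")
```

Lowering the confidence level from 95% to 80% and raising the sample size from 100 to 1000 both narrow the interval, but only the larger sample does so without giving up confidence.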

  42. Choosing the Sample Size • Planning ahead, we can choose a sample size to get a desired margin of error and confidence level • To obtain a desired margin of error m, just set the margin of error equal to m, substitute the critical value z* for your desired confidence level, and solve for the sample size n

  43. Sample Size for Desired Margin of Error • The confidence interval for a population mean will have a specified margin of error m when the sample size is: $n = \left(\frac{z^*\sigma}{m}\right)^2$

  44. Sample Size for Desired Margin of Error • In practice, observations cost time and money • The sample size you calculate from this formula may turn out to be too expensive • Always round your answer up to the next higher whole number • In practice we often calculate the margins of error corresponding to a range of values of n, and then decide what margin of error we can afford

  45. Example • You are planning a survey of starting salaries for recent business major graduates from your college. From a pilot study, you estimate that the standard deviation is about $8000. What sample size do you need to have a margin of error equal to $500 with 95% confidence?

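  46. Answer • $n = (z^*\sigma/m)^2 = (1.96 \times 8000 / 500)^2 = 31.36^2 \approx 983.4$, which rounds up to 984 • We would need to survey 984 business major graduates for our estimate to be within $500 of the true mean with 95% confidence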
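The sample size calculation, including the rule of always rounding up, can be sketched as a short function. The function name is hypothetical, and the pilot-study standard deviation of $8000 is treated as if it were the known population value, as in the example.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence):
    """Smallest n with z* * sigma / sqrt(n) <= margin (always rounds up)."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil((z_star * sigma / margin) ** 2)

n_needed = required_sample_size(sigma=8000, margin=500, confidence=0.95)
print(n_needed)  # 984
```

Rounding up rather than to the nearest integer matters: a sample of 983 would leave the margin of error slightly above $500.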

  47. Conclusion • Keeping the sample size fixed, if the confidence level increases, the margin of error will be larger. A larger margin of error produces a wider confidence interval. • Keeping the confidence level fixed, if the sample size gets larger, the margin of error will be smaller. A smaller margin of error gives a narrower confidence interval. • Hence we can achieve both a high confidence level and a narrow confidence interval by increasing the sample size.

  48. Some Cautions • We have already seen that small margins of error and high confidence can require large numbers of observations. • You should also be aware that any formula for inference is correct only in specific circumstances…. (next slide!)

  49. Some Cautions • The data must be an SRS from the population • We are completely safe if we actually did a randomization and drew an SRS • We are not in great danger if the data can plausibly be thought of as independent observations from a population • The formula is not correct for probability sampling designs more complex than an SRS • Correct methods for other designs are available • If you plan such samples, be sure that you (or your statistical consultant) know how to carry out the inference you desire

  50. Some Cautions • There is no correct method for inference from data haphazardly collected with bias of unknown size • Fancy formulas cannot rescue badly produced data • Because x-bar is not resistant, outliers can have a large effect on the confidence interval • You can search for outliers and try to correct them or justify their removal before computing the interval • If the outliers cannot be removed, ask your statistical consultant about procedures that are not sensitive to outliers
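The lack of resistance of x-bar can be seen with a tiny made-up data set. Everything in this sketch is hypothetical: the five waiting times, the single mistyped value (29.0 instead of 2.9), the assumed known standard deviation of 1.2, and the function name. It only shows how one outlier drags the centre of the interval.

```python
import math

def mean_and_interval(data, sigma, z_star=1.96):
    """Sample mean and the interval x_bar +/- z* * sigma / sqrt(n)."""
    n = len(data)
    x_bar = sum(data) / n
    margin = z_star * sigma / math.sqrt(n)
    return x_bar, (x_bar - margin, x_bar + margin)

waits = [2.1, 2.4, 2.5, 2.6, 2.9]                # hypothetical clean data
waits_with_outlier = [2.1, 2.4, 2.5, 2.6, 29.0]  # last value mistyped

clean_mean, clean_ci = mean_and_interval(waits, sigma=1.2)
outlier_mean, outlier_ci = mean_and_interval(waits_with_outlier, sigma=1.2)
print(round(clean_mean, 2), round(outlier_mean, 2))  # 2.5 7.72
```

Because the margin of error depends only on the known standard deviation and n here, the interval keeps the same width but is centred far from where the other four observations sit.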
