
STT 200 Statistical Methods

STT 200 Statistical Methods. Chapter 3: Inference for Categorical Data. Testing For Goodness of Fit Using Chi-square. Testing For Independence In Two-Way Tables. 3.3 Testing For Goodness of Fit Using Chi-square.

marycombs


Presentation Transcript


  1. STT 200 Statistical Methods Chapter 3 Inference for Categorical Data Testing For Goodness of Fit Using Chi-square Testing For Independence In Two-Way Tables

  2. 3.3 Testing For Goodness of Fit Using Chi-square So far in Chapter 3, we’ve looked at inferential procedures for binomial characteristics that can be summarized with a proportion or with a difference between two proportions. We now explore methods for assessing categorical data with more than two outcomes.

  3. Major Steps In The Goodness Of Fit Test Using The Chi-square Procedure The chi-square statistic can be used to test whether the observed counts in a frequency distribution or contingency table match the counts we would expect according to some model. A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a test of goodness-of-fit. In a chi-square goodness-of-fit test, the expected counts come from the predicting model. The test finds a P-value from a chi-square model with k − 1 degrees of freedom, where k is the number of categories in the categorical variable.

  4. Assumptions and Conditions • Counted Data Condition • The values in each cell are counts. • Doesn’t work with percents, proportions, or measurements. • Independence Assumption • The counts in each cell must be independent of each other. • For random samples, we can generalize to the entire population.

  5. Assumptions and Conditions Continued • Sample Size Assumption • Expected counts for each cell ≥ 5. • This is called the Expected Cell Frequency Condition. • If the assumptions and conditions are met, we can perform a Chi-Square Test for Goodness-of-Fit.

  6. Null and Alternative Hypotheses Null Hypothesis: The observed distribution of counts matches the distribution predicted by the model. The observed differences are due to chance. Alternative Hypothesis: The observed distribution of counts does not match the distribution predicted by the model. The observed differences are not due to chance.

  7. Chi-Square Calculations • Interested in difference between observed and expected: residuals. • Make positive by squaring them all. • Get relative sizes of the residuals by dividing them by the expected counts. • This is a Chi-Square Model with df = k – 1. • k is the number of categories, not the sample size.
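
The calculation described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `chi_square_statistic` and the counts are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Square each residual so it is positive, then scale by the expected count.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for k = 4 categories (not taken from the slides).
observed = [18, 22, 30, 30]
expected = [25, 25, 25, 25]

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1  # k - 1, where k is the number of categories
print(statistic, df)
```

Note that `df` depends only on the number of categories, not on the total number of observations.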

  8. A regional transit authority is concerned about the number of riders on one of its bus routes. In setting up the route, the assumption was that the number of riders is the same on every day from Monday through Friday. The transit authority is now concerned this assumption might not be safe to make and records the number of riders on the route for a randomly selected set of weekdays.

  9. Bus Ridership

  10. Bus Ridership We know that sample data varies, but it is unclear if this data provides convincing evidence that the ridership is not the same on every weekday. Because the sample is random, we would expect to see small differences due to chance. We need to test whether the differences are significant.

  11. Bus Ridership

  12. Bus Ridership How many riders would we expect to see on each day, if the transit authority’s assumption (the null hypothesis) was true?
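
If ridership really is the same on every weekday, the expected count for each day is simply the total number of riders divided by the number of days. The sketch below illustrates this; the rider counts are hypothetical placeholders, since the ridership table from the slides is not reproduced in this transcript.

```python
def expected_equal_counts(observed):
    """Expected counts when every category is assumed equally likely."""
    total = sum(observed)
    k = len(observed)
    return [total / k] * k


# Hypothetical rider counts for Monday through Friday (not the slide data).
observed = [100, 120, 110, 90, 80]
expected = expected_equal_counts(observed)
print(expected)
```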

  13. In previous hypothesis tests, we constructed a test statistic using the following form: test statistic = (observed sample statistic − null value) / (standard error of the sample statistic). This construction was based on (1) identifying the difference between an observed sample statistic and what the null hypothesis would expect us to observe, and (2) standardizing that difference using the standard error of the sample statistic.

  14. When we were comparing one proportion to a claimed parameter or evaluating the difference in two proportions, we used a z-statistic. Now that we are comparing more than two proportions, we need to use a different statistic with a different distribution. This distribution is called the chi-square distribution.

  15. These facts will serve as a useful frame of reference for making hypothesis test decisions.

  16. In previous hypothesis tests, we constructed a test statistic using the following form: test statistic = (observed sample statistic − null value) / (standard error of the sample statistic). What would this test statistic be for Mondays? Note: The null standard error in this situation is simply the square root of the expected value under the null hypothesis.

  17. What would this test statistic be for the other days?

  18. Summing all these test statistics gives a value that summarizes how far the actual counts are from what was expected. As it turns out, it is more common to add the squared values. Summing the squared test statistics has two consequences: 1. All standardized differences will be positive. 2. Unusual differences will become much larger.
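
Written out, the construction in the last few slides looks as follows. The symbols O_i (observed count), E_i (expected count under the null hypothesis), and Z_i (standardized difference for category i) are notation assumed here for illustration; the transcript does not show the slides' own symbols.

```latex
Z_i = \frac{O_i - E_i}{\sqrt{E_i}}, \qquad
\chi^2 = \sum_{i=1}^{k} Z_i^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

Here k is the number of categories, and the square root of E_i plays the role of the null standard error.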

  19. Degrees of Freedom

  20. As with other hypothesis tests, certain conditions must apply for the chi-square goodness of fit test to be valid. Condition #1: Each case that contributes a count to the table must be independent of all the other cases in the table. Condition #2: The expected count for each level of the categorical variable must be at least 5. Condition #3: There must be at least 3 levels of the categorical variable, corresponding to a chi-square distribution with at least 2 degrees of freedom.
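
Conditions #2 and #3 can be checked mechanically from the expected counts; condition #1 (independence) depends on how the data were collected and cannot be verified from the counts alone. The helper below is a hypothetical sketch, and its name and return format are assumptions made for illustration.

```python
def check_goodness_of_fit_conditions(expected, min_expected=5, min_levels=3):
    """Return a list of violated conditions (empty if none are found).

    Independence of cases is NOT checked here; it depends on the study design.
    """
    problems = []
    if len(expected) < min_levels:
        problems.append(f"only {len(expected)} levels; need at least {min_levels}")
    # Positions of categories whose expected count is too small.
    small = [i for i, e in enumerate(expected) if e < min_expected]
    if small:
        problems.append(f"expected count below {min_expected} at positions {small}")
    return problems


# Hypothetical expected counts (not taken from the slides).
print(check_goodness_of_fit_conditions([100, 100, 100, 100, 100]))
print(check_goodness_of_fit_conditions([3, 10]))
```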

  21. Bus Ridership

  22. Bus Ridership test statistic

  23. Bus Ridership P-value

  24. Bus Ridership P-value When evaluating a one-way table such as the one above, we use k − 1 degrees of freedom. Because there were 5 weekdays, we should calculate the p-value using a χ² distribution with 5 − 1 = 4 degrees of freedom. P-value = χ2cdf(25.9, 10^10, 4) = 0.0000331 What conclusion should we make at the chosen significance level? Because the p-value is far below any conventional significance level, reject the null hypothesis.
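
The calculator command χ2cdf(25.9, 10^10, 4) asks for the area under the chi-square curve with 4 degrees of freedom between 25.9 and a very large upper bound, that is, the upper-tail probability. As a rough cross-check, the sketch below uses the closed-form upper tail that holds when the degrees of freedom are even. The function name is hypothetical, and this shortcut is an assumption of the sketch: it does not cover odd degrees of freedom.

```python
import math


def chi_square_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-square variable with an EVEN number of degrees of freedom.

    Uses exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!, valid only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


p_value = chi_square_upper_tail_even_df(25.9, 4)
print(p_value)
```

The result agrees with the 0.0000331 shown on the slide to the displayed precision.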

  25. Example: Desired Vacation Place The AAA travel agency would like to assess if the distribution of desired vacation place has changed from the model of 3 years ago. A random sample of 928 adults was asked, “Name the one place you would want to go for vacation if you had the time and the money.”

  26. Example: Desired Vacation Place The AAA travel agency would like to assess if the distribution of desired vacation place has changed from the model of 3 years ago. a. Give the null hypothesis to test if there has been a significant change in the distribution of desired vacation place from 3 years ago.

  27. Example: Desired Vacation Place b. What test statistic value would we expect to see if the null hypothesis were, in fact, true?

  28. Example: Desired Vacation Place

  29. Example: Desired Vacation Place

  30. Section 3.4 Testing For Independence In Two-way Tables

  31. The Five Steps of the Chi-Squared Test of Independence 1. Assumptions: • Two categorical variables • Randomization • Expected counts ≥ 5 in all cells

  32. The Five Steps of the Chi-Squared Test of Independence 2. Hypotheses: • NULL HYPOTHESIS: The two variables are independent • ALTERNATIVE HYPOTHESIS: The two variables are dependent (associated)

  33. The Five Steps of the Chi-Squared Test of Independence 3. Test Statistic:
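
The formula on this slide did not survive in the transcript. The standard chi-square statistic for a two-way table, which is consistent with the goodness-of-fit calculation earlier in the chapter, is given here as a reference. The symbols O and E (observed and expected count in a cell) are assumed notation, and the slide itself may have used a different layout.

```latex
\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}, \qquad
E = \frac{(\text{row total}) \times (\text{column total})}{\text{table total}}
```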

  34. The Five Steps of the Chi-Squared Test of Independence

  35. Example: Testing For Independence In Two-way Tables NHANES In recent years, NHANES participants between the ages of 18 and 59 have been asked if they have ever tried marijuana. The responses to this question for 500 randomly selected respondents are summarized by age group in the table below.

  36. NHANES and marijuana use Are age and marijuana use independent? As before, we set up a null and alternative hypothesis: H0: Age and marijuana use are independent HA: Age and marijuana use are not independent Because we have two variables, one with more than two levels or categories, we will again use a χ² test, but the expected counts and degrees of freedom for the test will be computed differently than with the goodness of fit test.

  37. Computing Expected Counts For a Two-way Table
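
Under independence, the usual expected count for a cell is (row total × column total) / table total. The sketch below applies that rule to a small table. The function name and the counts are hypothetical; the NHANES table from the slides is not reproduced in this transcript.

```python
def expected_counts(table):
    """Expected cell counts under independence for a table of observed counts.

    `table` is a list of rows and must NOT include the total row or column.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 3 table of observed counts (not the NHANES data).
observed = [[20, 30, 50],
            [30, 20, 50]]
expected = expected_counts(observed)
print(expected)
```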

  38. NHANES And Marijuana Use Write the expected count next to the observed count in the table below for each age group and response.

  39. NHANES And Marijuana Use

  40. NHANES And Marijuana Use

  41. Degrees of Freedom (df) In order to calculate the p-value, we need to know the degrees of freedom for the χ² distribution. For a two-way table, this is calculated using the formula: df = (number of rows − 1) × (number of columns − 1) NOTE: The number of rows and number of columns in the formula do not include the total row or column.
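
Putting the pieces together, a two-way table yields both a test statistic and its degrees of freedom. This is a minimal sketch under the assumption that the table passed in excludes the total row and column; the function name and the counts are hypothetical.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table.

    `table` is a list of rows of observed counts, without totals.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence for cell (i, j).
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical 2 x 3 table (not the NHANES data).
statistic, df = chi_square_independence([[20, 30, 50],
                                         [30, 20, 50]])
print(statistic, df)
```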

  42. NHANES And Marijuana Use Calculate the degrees of freedom and the p-value for the test. Use α = 0.05 to reach a conclusion.

  43. Conditions For The χ² Independence Test In order for the χ² independence test to be valid, our data must come from a simple random sample and the expected count in each cell of the two-way table must be at least 5. Were the conditions met for the marijuana example?

  44. Example: Equal Opportunity Avengers (Self-read Example) Is being an Avenger equally risky for male and female superheroes? The Avengers is a long-running, popular comic book series (the first issue was published in 1963) and has introduced 173 superhero characters over the years. Many of these characters have died. (Some have died more than once, after returning from their earlier “death(s)” in true comic book fashion.) The table below summarizes the gender of the Avenger and whether or not they have “died” at least once.

  45. Example: Equal Opportunity Avengers (Self-read Example) Conduct a χ² test of independence to determine if “Gender” is independent of “Died”. Treat the 173 Avengers as a random sample. Hypotheses: H0: Gender and Died are independent HA: Gender and Died are not independent
