# Introducing χ² Inference for Distribution of Categorical Variables

1. A new test statistic: introducing χ² inference for the distribution of categorical variables
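For reference, the χ² statistic has the standard form below. The symbols $O_i$ (observed count in cell $i$) and $E_i$ (expected count in cell $i$ under the null hypothesis) are notation assumed here, not taken from the slides:

$$
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
$$

Each term is one cell's contribution, so a large term marks a cell whose observed count is far from what the null hypothesis predicts.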

2. Three tests: (1) goodness of fit, (2) homogeneity of populations, (3) association/independence. What are the conditions?

3. Test 1, goodness of fit: Are the hypothesized proportions true? Example: the proportion of colors in a bag of M&M’s.

4. Tech Toolbox, pp. 843 and 863. Conditions: all individual expected counts are at least 1, and no more than 20% of them are less than 5. (Some books require every cell to have an expected count of at least 5.) Ho: the actual population proportions are equal to the hypothesized proportions. Ha: at least two of the actual population proportions differ from their hypothesized proportions.
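The expected-count condition can be checked mechanically. The following is a minimal sketch; the function name `expected_counts_ok` and the example counts are hypothetical and chosen only for illustration.

```python
def expected_counts_ok(expected):
    """Check the condition: every expected count >= 1, and at most 20% of them < 5."""
    if not expected:
        raise ValueError("need at least one expected count")
    all_at_least_one = all(e >= 1 for e in expected)
    share_below_five = sum(1 for e in expected if e < 5) / len(expected)
    return all_at_least_one and share_below_five <= 0.20


# Hypothetical expected counts (not data from the slides)
comfortable = [7.8, 7.8, 8.4, 9.6, 12.0, 14.4]  # every count is at least 5
borderline = [4.0, 6.0, 7.0, 10.0, 12.0]        # exactly 20% of counts are below 5
too_small = [0.8, 6.0, 7.0, 10.0]               # one count is below 1

print(expected_counts_ok(comfortable))
print(expected_counts_ok(borderline))
print(expected_counts_ok(too_small))
```

Under the stricter rule some books use, the check would simply be `all(e >= 5 for e in expected)`.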

5. Put the data into lists as described on page 843. Let’s try our M&M data. Remember that Ho: p_red = 0.13, p_brown = 0.13, p_yellow = 0.14, p_green = 0.16, p_orange = 0.20, p_blue = 0.24. What is the largest component affecting our χ² statistic? Look at 14.2 on page 846 and let’s do some inference.
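As a rough illustration of the goodness-of-fit calculation, the sketch below computes each color's component of the χ² statistic using the hypothesized proportions from the slide. The observed counts are hypothetical (the class data is not reproduced here), and the helper name `gof_components` is an assumption of this example.

```python
# Hypothesized color proportions from Ho
hypothesized = {
    "red": 0.13, "brown": 0.13, "yellow": 0.14,
    "green": 0.16, "orange": 0.20, "blue": 0.24,
}

# Hypothetical observed counts for one bag (NOT the class data)
observed = {
    "red": 9, "brown": 6, "yellow": 7,
    "green": 10, "orange": 14, "blue": 14,
}


def gof_components(observed, hypothesized):
    """Return each category's (observed - expected)^2 / expected term."""
    n = sum(observed.values())
    components = {}
    for category, proportion in hypothesized.items():
        expected = n * proportion
        components[category] = (observed[category] - expected) ** 2 / expected
    return components


components = gof_components(observed, hypothesized)
chi_square = sum(components.values())
df = len(hypothesized) - 1  # number of categories minus 1
largest = max(components, key=components.get)

print(f"chi-square = {chi_square:.3f}, df = {df}")
print(f"largest component: {largest}")
```

Looking at the individual components, rather than only their sum, is what answers the slide's question about which color affects the statistic most.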

6. For goodness of fit: a single categorical variable that takes values on a single population. Should marijuana be legalized for medical purposes? (Activity 14B, p. 849.) For a test of homogeneity: either two categorical variables taking values on a single population, or a single categorical variable taking values on two or more populations. Objectives: explain what a two-way table is; compute row or column conditional distributions; identify the null hypothesis for a χ² test for homogeneity of populations.

7. Test 2, test for homogeneity: our example. What type of two-way table do we have?

8. What is the null hypothesis? Ho: The distribution of the response variable is the same in all c populations.

9. Expected cell counts: let’s find our expected cell counts. Remember, the alternative hypothesis is that the distribution of the response variable is not the same in all of the populations.
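For a two-way table, each expected cell count is (row total × column total) / table total. The sketch below applies that rule to a hypothetical table; the counts and the function name `expected_counts` are illustrative assumptions, not the slide's data.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 3 table: rows are populations, columns are response categories
observed = [
    [30, 20, 10],
    [20, 20, 20],
]

expected = expected_counts(observed)
for row in expected:
    print(row)
```

Because the two hypothetical populations have the same row total, their expected rows come out identical, which is exactly what "same distribution in every population" means.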

10. Any remaining questions about homogeneity of proportions? Remember: when we find a significant χ² statistic and reject the null hypothesis, we know only that not all of the population proportions are equal! The test does not tell us which ones differ.

11. Other ways to state the hypotheses. Ho: the two categorical variables are not related; Ha: the two categorical variables are related. Or, Ho: the two categorical variables are independent; Ha: the two categorical variables are not independent (that is, they are dependent).

12. Remember: for homogeneity, the data must come from a comparative randomized experiment or from independent random samples from the populations of interest. For association/independence, the data should be obtained by classifying a single sample according to both categorical variables. Example: health care, Canada vs. U.S., and life after being treated for a heart attack. Let’s find the conditional distributions of quality of life given the country where the patient was treated.
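Conditional distributions of quality of life given country are row proportions when each row of the two-way table is a country. The sketch below shows the arithmetic on hypothetical counts; the numbers, category labels, and function name are assumptions for illustration and are not the study's data.

```python
def row_conditional_distributions(table):
    """Divide each cell by its row total to get the distribution within each row."""
    return [[count / sum(row) for count in row] for row in table]


# Hypothetical counts (NOT the study data); columns: better, about the same, worse
countries = ["Canada", "United States"]
observed = [
    [30, 50, 20],
    [120, 180, 100],
]

conditional = row_conditional_distributions(observed)
for country, distribution in zip(countries, conditional):
    print(country, [round(p, 2) for p in distribution])
```

Comparing the rows side by side is the informal version of the question the χ² test answers formally.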

13. What are our hypotheses? Ho: There is no relationship between nationality and quality of life. Ha: There is some relationship between nationality and quality of life.

14. Conditions? Expected counts large enough: let’s calculate! SRS: the data came from independent random samples from the two populations (the U.S. and Canada, eh?).
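Once the conditions hold, the test itself can be sketched in a few lines. The example below computes the χ² statistic and its degrees of freedom, (rows − 1)(columns − 1), for a hypothetical two-way table. The counts and function names are assumptions of this example. The p-value helper uses a closed form that is valid only when the degrees of freedom are even; in general a χ² table or calculator would be used instead.

```python
import math


def chi_square_two_way(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected_count = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected_count) ** 2 / expected_count
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def p_value_even_df(statistic, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = statistic / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Hypothetical counts (NOT the study data): rows are countries, columns are responses
observed = [
    [30, 50, 20],
    [120, 180, 100],
]

statistic, df = chi_square_two_way(observed)
p_value = p_value_even_df(statistic, df)
print(f"chi-square = {statistic:.3f}, df = {df}, p-value = {p_value:.3f}")
```

A large p-value, as with these made-up counts, would mean the data give no convincing evidence against Ho.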

15. Medical care and age
    - The type of medical care that a patient receives sometimes varies with the age of the patient.
    - Here are data from a study that asked whether women received a mammogram and biopsy after noticing a suspicious lump in their breast.
    - In this study, a single sample was classified two ways: by age and by whether or not the tests were done.

16. Now you can work on: Exercises, Special Problem, Case Closed. We can also watch a video on inference for two-way tables.

