Chapter 11
The Chi-Square Test of Association/Independence
Target Goal: I can perform a chi-square test for association/independence to determine whether there is convincing evidence of an association between two categorical variables.
11.2b h.w: pg. 728: 49, 51, 53 - 58
The chi-square test can also be used to show evidence that there is a relationship between two categorical variables.
• Use this if you have independent SRSs from several populations, where one variable is categorical and the other identifies which sample the individual came from.
• Or, if you have a single SRS with each individual classified according to two categorical variables.
• Or, if you have an entire population with each individual classified according to two categorical variables.
Ex: Smoking and SES
An example that classifies observations from a single population in two ways: by smoking habits and SES.
• In a study of heart disease in male federal employees, researchers classified 356 volunteer subjects according to their socioeconomic status (SES) and their smoking status.
Observed Counts for Smoking and SES
                     SES
Smoking    High   Middle   Low   Total
Current      51       22    43     116
Former       92       21    28     141
Never        68        9    22      99
Total       211       52    93     356
• This is a 3x3 table with added margin totals.
• Even though this example is different from comparing several proportions, we can still apply the chi-square test. The null hypothesis is now that the row and column variables are not related to each other.
The Chi-Square Test of Association/Independence
Use the chi-square test of association/independence to test the null hypothesis Ho: there is no relationship between two categorical variables, when you have a two-way table from a single SRS with each individual classified according to both of two categorical variables.
SES cont.
• SES is the explanatory variable, so we compare the column percents, which give the conditional distribution of smoking within each SES category.
Calculate Column Percents:
• 51/211 = 0.242, so about 24.2% of the high-SES group are current smokers.
• Fill in the rest of the table.
Column Percents for Smoking and SES
                     SES
Smoking    High    Middle    Low
Current    24.2      42.3   46.2
Former     43.6      40.4   30.1
Never      32.2      17.3   23.7
Total     100.0     100.0  100.0
What do the column percents suggest?
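The column percents above can be reproduced with a few lines of Python. This is only a minimal sketch: the variable names (`observed`, `col_totals`, `col_percents`) are made up for illustration, and the counts are copied from the observed-counts table.

```python
# Observed counts from the smoking and SES table.
# Keys are smoking status; each list holds the High, Middle, Low SES counts.
observed = {
    "Current": [51, 22, 43],
    "Former": [92, 21, 28],
    "Never": [68, 9, 22],
}
ses_levels = ["High", "Middle", "Low"]

# Column totals: the number of subjects in each SES group.
col_totals = [
    sum(row[j] for row in observed.values()) for j in range(len(ses_levels))
]

# Column percent = cell count / column total * 100, rounded to one decimal.
col_percents = {
    status: [round(100 * count / total, 1) for count, total in zip(row, col_totals)]
    for status, row in observed.items()
}

for status, percents in col_percents.items():
    print(status, percents)
```

Each printed row should match the corresponding row of the column-percents table.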
There is a negative association between smoking and SES.
• The lower the SES, the more likely a subject is to be a current smoker.
Computing Expected Cell Counts
• Expected count = (row total × column total) / table total
• For current smokers with high SES: (116 × 211) / 356 = 68.75
Expected Counts for Smoking and SES
                     SES
Smoking     High   Middle     Low    Total
Current    68.75    16.94   30.30   115.99
Former     83.57    20.60   36.83   141.00
Never      58.68    14.46   25.86    99.00
Total     211       52      92.99   355.99
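As a check on the table above, here is a short Python sketch that applies the rule "row total times column total, divided by table total" to every cell. The variable names are illustrative only; the counts come from the observed-counts table.

```python
# Observed counts (rows: Current, Former, Never; columns: High, Middle, Low SES).
observed = [
    [51, 22, 43],
    [92, 21, 28],
    [68, 9, 22],
]

row_totals = [sum(row) for row in observed]        # one total per smoking status
col_totals = [sum(col) for col in zip(*observed)]  # one total per SES level
table_total = sum(row_totals)                      # all 356 subjects

# Expected count for each cell = row total * column total / table total.
expected = [
    [row_total * col_total / table_total for col_total in col_totals]
    for row_total in row_totals
]

for label, row in zip(["Current", "Former", "Never"], expected):
    print(label, [round(value, 2) for value in row])
```

Working from unrounded values avoids the small rounding differences that show up in the margin totals of a hand-computed table.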
Chi-square Test for Association/Independence
Step 1: State - We want to perform a test of
Ho: There is no association between smoking and SES.
Ha: There is an association between smoking and SES.
Step 2: Plan
If conditions are met, we should carry out a chi-square test of association/independence.
Random: The subjects were volunteers, not a random sample, so we may not be able to generalize our results.
Large Sample Size:
• To use chi-square we must check all expected counts.
• We did this: all expected counts are at least 1, and no more than 20% are less than 5. (In fact, the smallest expected count is 14.46.)
Independence:
• Because we are sampling without replacement, we need to check the 10% condition. It is safe to assume that the total number of male federal employees is at least 10(356) = 3560.
• Thus, knowing the values of both variables for one person gives us no meaningful information about the variables for another person. So, individual observations are independent.
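The two numerical checks in the Plan step can be sketched in Python as follows. The thresholds (every expected count at least 1, no more than 20% below 5, population at least 10 times the sample) are the ones stated above; the variable names are hypothetical.

```python
# Observed counts (rows: Current, Former, Never; columns: High, Middle, Low SES).
observed = [
    [51, 22, 43],
    [92, 21, 28],
    [68, 9, 22],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# All nine expected counts in one flat list.
expected_flat = [r * c / n for r in row_totals for c in col_totals]

# Large Sample Size: every expected count >= 1 and at most 20% of them < 5.
all_at_least_one = all(e >= 1 for e in expected_flat)
share_below_five = sum(e < 5 for e in expected_flat) / len(expected_flat)
large_sample_ok = all_at_least_one and share_below_five <= 0.20

# 10% condition: the population should be at least 10 times the sample size.
min_population_size = 10 * n

print("Smallest expected count:", round(min(expected_flat), 2))
print("Large sample condition met:", large_sample_ok)
print("Population must be at least:", min_population_size)
```

The Random condition cannot be checked by calculation; it depends on how the subjects were chosen.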
Step 3: Carry out the inference procedure.
• The test statistic: χ² = Σ (observed − expected)² / expected, summed over all nine cells.
• Calculate by hand with df = (r-1)(c-1) = (3-1)(3-1) = 4.
• Or with a calculator: enter the observed counts into matrix A.
• Note: the calculator will calculate the expected counts for you when you execute the χ² test.
Note: if doing the test by hand, you must compute the expected counts by hand or write a calculator program to do it.
• Enter observed values in matrix A.
• Then STAT:TESTS: χ²-Test
• The calculator stores the expected values in matrix B.
• P-value = .00098
Note: the association does not mean that SES causes smoking behavior.
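For anyone without the calculator at hand, the Step 3 computation can be sketched in plain Python. The helper `chi2_upper_tail_even_df` is a hypothetical name; it uses the closed-form upper-tail probability that holds only when df is even, which covers df = 4 here. It is not a general-purpose routine.

```python
import math

# Observed counts (rows: Current, Former, Never; columns: High, Middle, Low SES).
observed = [
    [51, 22, 43],
    [92, 21, 28],
    [68, 9, 22],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Test statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


p_value = chi2_upper_tail_even_df(chi_square, df)
print(f"chi-square = {chi_square:.2f}, df = {df}, P-value = {p_value:.5f}")
```

If SciPy is installed, `scipy.stats.chi2_contingency(observed)` should return the same statistic, P-value, degrees of freedom, and expected counts for a table of this size.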
Step 4: Conclude - Interpret the results in context.
• Because the P-value of .00098 is less than alpha = .01, we reject the null hypothesis and conclude that there is strong evidence of an association between smoking and SES in the population of male federal employees.
Follow-up Analysis: Inference for Relationships
Start by examining which cells in the two-way table show large deviations between the observed and expected counts. Then look at the individual components to see which terms contribute most to the chi-square statistic.
Minitab output for the wine and music study displays the individual components that contribute to the chi-square statistic.
Follow-up Analysis cont.
Looking at the output, we see that just two of the nine components that make up the chi-square statistic contribute about 14 (almost 77%) of the total χ² = 18.28. We are led to a specific conclusion: sales of Italian wine are strongly affected by Italian and French music.
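The counts for the wine and music study are not reproduced in these notes, so the sketch below applies the same follow-up idea to the smoking and SES table instead. It computes each cell's component of the chi-square statistic and ranks them; the names `components`, `ranked`, and `top_two_share` are illustrative only.

```python
# Observed counts for smoking and SES, with row and column labels.
smoking_levels = ["Current", "Former", "Never"]
ses_levels = ["High", "Middle", "Low"]
observed = [
    [51, 22, 43],
    [92, 21, 28],
    [68, 9, 22],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Component for each cell: (observed - expected)^2 / expected.
components = {}
for i, smoking in enumerate(smoking_levels):
    for j, ses in enumerate(ses_levels):
        expected = row_totals[i] * col_totals[j] / n
        components[(smoking, ses)] = (observed[i][j] - expected) ** 2 / expected

chi_square = sum(components.values())

# Rank the cells from largest to smallest contribution.
ranked = sorted(components.items(), key=lambda item: item[1], reverse=True)
top_two_share = sum(value for _, value in ranked[:2]) / chi_square

for cell, value in ranked:
    print(cell, round(value, 2))
print("Share of chi-square from the two largest components:", round(top_two_share, 3))
```

For this table the two largest components both come from the current-smoker row (low SES and high SES), which fits the pattern seen in the column percents.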