Chi-Square Heibatollah Baghi and Mastee Badii
Chi-Square (χ2) and Frequency Data • Up to this point, inference to the population has been concerned with “scores” on one or more variables, such as CAT scores, mathematics achievement, and hours spent on the computer. • We used these scores to make inferences about population means. To be sure, not all research questions involve score data. • Today the data that we analyze consist of frequencies; that is, the number of individuals falling into categories. In other words, the variables are measured on a nominal scale. • The test statistic for frequency data is the Pearson chi-square. Its magnitude reflects the amount of discrepancy between observed frequencies and expected frequencies.
Steps in Test of Hypothesis • Determine the appropriate test • Establish the level of significance: α • Formulate the statistical hypothesis • Calculate the test statistic • Determine the degrees of freedom • Compare computed test statistic against a tabled/critical value
1. Determine Appropriate Test • Chi Square is used when both variables are measured on a nominal scale. • It can be applied to interval or ratio data that have been categorized into a small number of groups. • It assumes that the observations are randomly sampled from the population. • All observations are independent (an individual can appear only once in a table and there are no overlapping categories). • It does not make any assumptions about the shape of the distribution nor about the homogeneity of variances.
2. Establish Level of Significance • α is a predetermined value • The conventional values are α = .05, α = .01, and α = .001
3. Determine The Hypothesis: Whether There is an Association or Not • Ho : The two variables are independent • Ha : The two variables are associated
4. Calculating Test Statistics • The test contrasts the observed frequencies in each cell of a contingency table with the expected frequencies. • The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e., the nominal variables are unrelated). • The expected frequency of a cell for two unrelated variables is the product of its row frequency and column frequency divided by the number of cases: $F_e = \frac{F_r F_c}{N}$
Continued 4. Calculating Test Statistics • $\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$, where $F_o$ is the observed frequency and $F_e$ is the expected frequency of a cell, and the sum runs over all cells of the table.
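A minimal Python sketch of this step follows. It computes each cell's expected frequency from the row and column frequencies, then sums (observed − expected)² / expected over all cells, which is the standard Pearson chi-square computation. The function names and the 2 by 2 counts are hypothetical and serve only as an illustration.

```python
def expected_frequencies(observed):
    """Expected count for each cell: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_frequencies(observed)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


# Hypothetical 2 by 2 table of counts (not taken from the slides).
toy_table = [[20, 30],
             [30, 20]]

print(expected_frequencies(toy_table))  # every cell is 25.0
print(pearson_chi_square(toy_table))    # 4.0
```

Each of the four cells differs from its expected count by 5, so each contributes 25 / 25 = 1 to the statistic.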
5. Determine Degrees of Freedom • df = (R-1)(C-1), where R is the number of levels in the row variable and C is the number of levels in the column variable
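As a small sketch, the degrees of freedom can be read directly off the shape of the contingency table. The function name is hypothetical, and the zero counts below are placeholders because only the number of rows and columns matters here.

```python
def degrees_of_freedom(table):
    """df = (R - 1)(C - 1) for a table with R rows and C columns."""
    n_rows = len(table)
    n_cols = len(table[0])
    # Every row must have the same number of columns.
    if any(len(row) != n_cols for row in table):
        raise ValueError("contingency table must be rectangular")
    return (n_rows - 1) * (n_cols - 1)


# Placeholder counts: only the 2 by 3 shape matters here.
print(degrees_of_freedom([[0, 0, 0], [0, 0, 0]]))  # 2
```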
6. Compare computed test statistic against a tabled/critical value • The computed value of the Pearson chi-square statistic is compared with the critical value to determine if the computed value is improbable • The critical tabled values are based on sampling distributions of the Pearson chi-square statistic • If the calculated χ2 is greater than the tabled χ2 value, reject Ho
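The decision rule can be sketched as a table lookup. The dictionary below holds the usual tabled critical values at α = .05 for the first few degrees of freedom; the function name and the test value 4.2 are hypothetical.

```python
# Tabled critical values of chi-square at alpha = .05 for small df.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def reject_null(chi_square, df, table=CRITICAL_05):
    """Return True when the computed statistic exceeds the tabled value."""
    if df not in table:
        raise KeyError(f"no tabled value stored for df = {df}")
    return chi_square > table[df]


print(reject_null(4.2, 1))  # True: 4.2 > 3.841
print(reject_null(4.2, 2))  # False: 4.2 < 5.991
```

The same computed value can lead to different decisions depending on df, which is why the degrees of freedom are determined before consulting the table.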
Example • Suppose a researcher is interested in voting preferences on gun control issues. • A questionnaire was developed and sent to a random sample of 90 voters. • The researcher also collects information about the political party membership of the sample of 90 respondents.
Bivariate Frequency Table or Contingency Table • [Table not reproduced: observed frequencies for party membership by voting preference, with row frequencies and column frequencies in the margins.]
1. Determine Appropriate Test • Party Membership (2 levels), nominal • Voting Preference (3 levels), nominal
2. Establish Level of Significance Alpha of .05
3. Determine The Hypothesis • Ho : There is no difference between Democrats and Republicans in their opinions on the gun control issue. • Ha : There is an association between responses to the gun control survey and party membership in the population.
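Continued 4. Calculating Test Statistics • Expected frequency for a cell with row frequency 50 and column frequency 25: 50*25/90 = 13.89 • Expected frequency for a cell with row frequency 40 and column frequency 25: 40*25/90 = 11.11 • Pearson chi-square, summed over all cells: χ2 = 11.03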
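The contingency table itself is not reproduced in this text, so the sketch below uses illustrative cell counts that are an assumption. They were chosen to agree with the figures that do appear (row frequencies 50 and 40, a column frequency of 25, N = 90) and should not be read as the original data. The response categories are likewise hypothetical.

```python
# Assumed cell counts (illustrative; the original table is not shown).
# Rows: the two parties. Columns: three hypothetical response categories.
observed = [[10, 10, 30],   # row frequency 50
            [15, 15, 10]]   # row frequency 40

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n   # e.g. 50 * 25 / 90
        chi_square += (fo - fe) ** 2 / fe
        print(f"cell ({i}, {j}): observed {fo}, expected {fe:.2f}")

print(f"chi-square = {chi_square:.3f}")
```

With these assumed counts the statistic works out to 11.025, which agrees with the reported 11.03 up to rounding.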
5. Determine Degrees of Freedom df = (R-1)(C-1) =(2-1)(3-1) = 2
6. Compare computed test statistic against a tabled/critical value • α = 0.05 • df = 2 • Critical tabled value = 5.991 • Test statistic, 11.03, exceeds critical value • Null hypothesis is rejected • Democrats & Republicans differ significantly in their opinions on gun control issues
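For df = 2 specifically, the chi-square distribution has a simple closed form: the upper-tail probability of a value x is exp(−x / 2), so the critical value at level α is −2 ln α. The sketch below uses that fact to check the tabled value and to attach an approximate p-value to the computed statistic. It applies only when df = 2, and the function names are hypothetical.

```python
import math


def p_value_df2(chi_square):
    """Upper-tail probability of a chi-square value when df = 2."""
    return math.exp(-chi_square / 2)


def critical_value_df2(alpha):
    """Value whose upper-tail probability is alpha when df = 2."""
    return -2 * math.log(alpha)


print(round(critical_value_df2(0.05), 3))  # 5.991
print(p_value_df2(11.03) < 0.05)           # True, so Ho is rejected
```

The p-value for 11.03 is roughly .004, so the same conclusion would hold at α = .01.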
Additional Information in SPSS Output • Exceptions that might distort χ2 assumptions: associations in some but not all categories; low expected frequency per cell • Extent of association is not the same as statistical significance • Demonstrated through an example
Another Example • Heparin Lock Placement Time: 1 = 72 hrs, 2 = 96 hrs • From Polit text, Table 8-1
Continued Hypotheses in Heparin Lock Placement • Ho: There is no association between complication incidence and length of heparin lock placement. (The variables are independent). • Ha: There is an association between complication incidence and length of heparin lock placement. (The variables are related).
Continued More of SPSS Output
Pearson Chi-Square • Pearson Chi-Square = .250, p = .617. Since p > .05, we fail to reject the null hypothesis that the complication rate is unrelated to heparin lock placement time. • Continuity correction is used in situations in which the expected frequency for any cell in a 2 by 2 table is less than 10.
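For a 2 by 2 table df = 1, and the upper-tail probability of a chi-square value x equals erfc(√(x / 2)), because a chi-square variable with one degree of freedom is the square of a standard normal variable. As a hedged check of the reported p-value using only Python's standard library (the function name is hypothetical):

```python
import math


def p_value_df1(chi_square):
    """Upper-tail probability of a chi-square value when df = 1."""
    return math.erfc(math.sqrt(chi_square / 2))


print(round(p_value_df1(0.250), 3))  # 0.617
```

This reproduces the uncorrected Pearson p-value only; it does not apply the continuity correction.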
Continued More SPSS Output
Phi Coefficient • Pearson Chi-Square provides information about the existence of a relationship between 2 nominal variables, but not about the magnitude of the relationship • The phi coefficient is a measure of the strength of the association
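For a 2 by 2 table, the phi coefficient can be obtained from the chi-square statistic as the square root of χ2 / N. The sketch below illustrates that relationship. The sample size of 100 is a hypothetical value, since the heparin lock table is not reproduced here, and the function name is also an assumption.

```python
import math


def phi_coefficient(chi_square, n_cases):
    """Phi for a 2 by 2 table: sqrt(chi-square / N)."""
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    return math.sqrt(chi_square / n_cases)


# chi-square = .250 is the reported value; N = 100 is assumed for illustration.
print(round(phi_coefficient(0.250, 100), 3))  # 0.05
```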
Cramer’s V • When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is Cramer’s V. • If Cramer’s V is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable. • $V = \sqrt{\frac{\chi^2}{N(k-1)}}$, where N is the number of cases and k is the smaller of the number of rows and the number of columns.
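Cramer's V divides the chi-square statistic by the number of cases times one less than the smaller of the number of rows and columns, then takes the square root. A minimal sketch, applied to the figures reported for the gun control example (χ2 = 11.03, N = 90, a 2 by 3 table); the function name is hypothetical.

```python
import math


def cramers_v(chi_square, n_cases, n_rows, n_cols):
    """Cramer's V = sqrt(chi-square / (N * (k - 1))), k = min(rows, cols)."""
    k = min(n_rows, n_cols)
    if k < 2:
        raise ValueError("table needs at least 2 rows and 2 columns")
    return math.sqrt(chi_square / (n_cases * (k - 1)))


# Gun control example: chi-square = 11.03, N = 90, 2 by 3 table.
print(round(cramers_v(11.03, 90, 2, 3), 2))  # 0.35
```

For a 2 by 2 table k − 1 = 1, so the same function returns the phi coefficient.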
Take Home Lesson How to Test Association between Frequency of Two Nominal Variables