Learn about testing independence between variables using Chi-square test and contingency tables in statistical analysis. Explore expected frequencies and how to conduct Chi-square tests in Excel. Examples and hypothesis testing included.
Testing for Independence QSCI 381 – Lecture 41 (Larson and Farber, Sect 10.2)
Independence • Two variables are independent if the occurrence of one variable does not affect the probability of the other. • We often wish to examine whether two variables are independent: • Age and having a “high” heavy metal concentration. • Concerns regarding the most important factors influencing a fishery and occupation.
Contingency Tables • An r x c contingency table shows the observed frequencies for two variables. The observed frequencies are arranged in r rows and c columns. The intersection of a row and a column is called a cell.
Example-A-1 We wish to examine whether having a high concentration of heavy metals is independent of age.
Expected Frequencies • The expected frequency for a cell Er,c in a contingency table is: $E_{r,c} = \dfrac{(\text{sum of row } r) \times (\text{sum of column } c)}{\text{sample size}}$
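The expected-frequency rule (row total times column total, divided by the sample size) can be sketched in a few lines of Python. This is only an illustration: the function name `expected_frequencies` and the 2 x 3 table of counts are hypothetical and are not the data from the lecture examples.

```python
def expected_frequencies(observed):
    """Expected counts for an r x c table, assuming the two variables are independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # overall sample size
    # Each cell: (sum of its row) * (sum of its column) / sample size
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical observed counts (2 rows, 3 columns), for illustration only
observed = [
    [10, 20, 30],
    [20, 20, 20],
]
expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

Both rows here have the same total, so they receive the same expected counts; the expected table always keeps the row and column totals of the observed table.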
The Chi-square Test for Independence-I • A chi-square independence test is used to test the independence of two variables. The conditions for use of this test are: • the observed frequencies must be obtained from a random sample; and • each expected frequency must be greater than or equal to 5. • The null hypothesis for the test is that the variables are independent and the alternative hypothesis is that they are dependent.
The Chi-square Test for Independence-II • The test compares the observed frequencies with the expected frequencies (these expected frequencies are calculated assuming that the two variables are independent). • If the value of the test statistic is high enough to fall in the rejection region for the chosen level of significance, we reject the null hypothesis of independence.
The Chi-square Test for Independence-III • The test statistic for the chi-square independence test is: $\chi^2 = \sum_{i,j} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$ where Oij represents the observed frequencies and Eij represents the expected frequencies. • The sampling distribution for the test statistic is a chi-square distribution with (r-1)(c-1) degrees of freedom.
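As a rough sketch of how the statistic is assembled, the snippet below sums (O - E)^2 / E over all cells and returns it with the (r-1)(c-1) degrees of freedom. The function name `chi_square_statistic` and the counts are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from the lecture examples.

```python
def chi_square_statistic(observed):
    """Return (test statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Hypothetical observed counts (2 rows, 3 columns), for illustration only
observed = [
    [10, 20, 30],
    [20, 20, 20],
]
stat, df = chi_square_statistic(observed)
print(f"chi-square = {stat:.4f}, df = {df}")
```

For these made-up counts the statistic is 16/3 (about 5.33) with 2 degrees of freedom, which falls just short of the 0.05 critical value of about 5.99 for 2 degrees of freedom, so independence would not be rejected at that level.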
Example-A-2 The value of the test statistic is in the rejection region for α = 0.05 but not for α = 0.01.
Using EXCEL to conduct Chi-square Tests • EXCEL includes a function CHITEST which can be used to test for independence. • CHITEST(observed range, expected range) • CHITEST returns the probability associated with the test statistic, i.e. it returns CHIDIST(χ², (r-1)(c-1)). • The result of applying CHITEST to the data for the example is 0.011922, i.e. a probability less than 0.05 and greater than 0.01.
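Outside of EXCEL, the same upper-tail probability can be approximated with the Python standard library. The sketch below is not how EXCEL computes it; it relies on the standard recurrence for the regularized upper incomplete gamma function, which gives a closed form for integer degrees of freedom. The helper name `chi2_sf` is hypothetical. The usage line applies it to the statistic reported for Example B (0.1256), assuming that example is a 2 x 2 table (sex by maturity state) and therefore has 1 degree of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Base cases: Q(1/2, y) = erfc(sqrt(y)) for odd df, Q(1, y) = exp(-y) for even df
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Statistic from Example B; df = 1 is an assumption (2 x 2 table)
p_value = chi2_sf(0.1256, 1)
print(f"p-value = {p_value:.4f}")  # far above 0.01, so independence is not rejected
```

A p-value below the chosen α corresponds to a test statistic in the rejection region, which is the comparison made with the CHITEST result above.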
Example-B-1 • We sample 150 animals and assess the fraction in each of four categories to be: • Test the null hypothesis that sex and maturity state are independent (α = 0.01).
Example-B-2 χ² = 0.1256. We cannot reject the null hypothesis of independence. We did reject the null hypothesis that these data are consistent with a “healthy” marine mammal population.
Homogeneity of Proportions • The chi-square test can be used to test the null hypothesis that proportions in various categories are equal among several populations. • The alternative hypothesis for this test is that at least one proportion differs among populations.