
Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses

Presentation Transcript


  1. Testing for Marginal Independence Between Two Categorical Variables with Multiple Responses Robert Jeutong

  2. Outline • Introduction • Kansas Farmer Data • Notation • Modified Pearson Based Statistic • Nonparametric Bootstrap • Bootstrap p-Value Methods • Simulation Study • Conclusion

  3. Introduction • Multiple-response categorical variables, also called “pick any” (or pick any/c) variables, let a subject select more than one response • Survey data arising from multiple-response questions present a unique challenge for analysis because of the dependence among the responses provided by an individual subject • Testing for independence between two categorical variables is often of interest • When at least one of the categorical variables can have multiple responses, traditional Pearson chi-square tests for independence should not be used because of the within-subject dependence among responses

  4. Intro cont’d • A special kind of independence, called marginal independence, becomes of interest in the presence of multiple-response categorical variables • The purpose of this article is to develop new approaches to testing marginal independence between two multiple-response categorical variables • Agresti and Liu (1999) call this a test for simultaneous pairwise marginal independence (SPMI) • The proposed tests are extensions of the traditional Pearson chi-square tests for independence between single-response categorical variables

  5. Kansas Farmer Data • Comes from Loughin (1998) and Agresti and Liu (1999) • Conducted by the Department of Animal Sciences at Kansas State University • Two questions in the survey asked Kansas farmers about their sources of veterinary information and their swine waste storage methods • Farmers were permitted to select as many responses as applied from a list of items

  6. Data cont’d • Interest lies in determining whether sources of veterinary information are independent of waste storage methods in a similar manner as would be done in a traditional Pearson chi-square test applied to a contingency table with single-response categorical variables • A test for SPMI can be performed to determine whether each source of veterinary information is simultaneously independent of each swine waste storage method

  7. Data cont’d • 4 × 5 = 20 different 2 × 2 tables can be formed to marginally summarize all possible responses to item pairs • Independence is tested in each of the 20 2 × 2 tables simultaneously for a test of SPMI

  8. Data cont’d • The test is marginal because responses are summed over the other item choices for each of the multiple-response categorical variables • If SPMI is rejected, examination of the individual 2 × 2 tables can follow to determine why the rejection occurs
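
The marginal summary described above can be sketched in a few lines of Python. This is only an illustration: the function name `pairwise_tables`, the list-of-lists data layout, and the toy responses are assumptions, not part of the presentation or the Kansas farmer data.

```python
def pairwise_tables(W, Y):
    """Build one 2 x 2 table per (W item, Y item) pair.

    W and Y are lists of per-subject 0/1 item-response lists
    (an assumed layout). Each table is [[n11, n10], [n01, n00]],
    where n10 counts subjects with W_i = 1 and Y_j = 0, and so on.
    """
    r, c = len(W[0]), len(Y[0])
    tables = {}
    for i in range(r):
        for j in range(c):
            table = [[0, 0], [0, 0]]
            for w, y in zip(W, Y):
                # Row 0 holds W_i = 1, column 0 holds Y_j = 1.
                table[1 - w[i]][1 - y[j]] += 1
            tables[(i, j)] = table
    return tables


# Hypothetical toy data: 4 subjects, r = 2 items for W, c = 3 items for Y.
W = [[1, 0], [1, 1], [0, 0], [0, 1]]
Y = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
tables = pairwise_tables(W, Y)
print(len(tables))     # number of tables, r * c
print(tables[(0, 0)])  # table for the first W item and first Y item
```

Every subject contributes to all r × c tables, which is why the tables are not independent of one another and an ordinary chi-square reference distribution for their sum is not appropriate.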

  9. Notation • Let W and Y denote the multiple-response categorical variables for the rows and columns of an r × c table, respectively • Sources of veterinary information are denoted by Y and waste storage methods are denoted by W • The categories for each multiple-response categorical variable are called items (Agresti and Liu, 1999). For example, lagoon is one of the items for waste storage method • Suppose W has r items and Y has c items. Also, suppose n subjects are sampled at random

  10. Notation cont’d • Let $W_{si} = 1$ if a positive response is given for item $i$ by subject $s$, for $i = 1, \ldots, r$ and $s = 1, \ldots, n$; $W_{si} = 0$ for a negative response • Let $Y_{sj}$, for $j = 1, \ldots, c$ and $s = 1, \ldots, n$, be similarly defined • The abbreviated notation $W_i$ and $Y_j$ refers generally to the binary response random variables for items $i$ and $j$, respectively • The sets of correlated binary item responses for subject $s$ are $Y_s = (Y_{s1}, Y_{s2}, \ldots, Y_{sc})$ and $W_s = (W_{s1}, W_{s2}, \ldots, W_{sr})$

  11. Notation cont’d • Cell counts in the joint table are denoted by $n_{gh}$ for the $g$th possible $(W_1, \ldots, W_r)$ and $h$th possible $(Y_1, \ldots, Y_c)$ • The corresponding probability is denoted by $\tau_{gh}$. Multinomial sampling is assumed to occur within the entire joint table; thus, $\sum_{g,h} \tau_{gh} = 1$ • Let $m_{ij}$ denote the number of observed positive responses to both $W_i$ and $Y_j$ • The marginal probability of a positive response to both $W_i$ and $Y_j$ is denoted by $\pi_{ij}$, and its maximum likelihood estimate (MLE) is $m_{ij}/n$
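
As a small illustration of the last two bullets, the pairwise positive counts and their MLEs can be computed directly from the item-response vectors. The function names and the toy data below are assumptions made for this sketch.

```python
def marginal_counts(W, Y):
    """m[i][j] = number of subjects with a positive response to both W_i and Y_j."""
    r, c = len(W[0]), len(Y[0])
    return [[sum(w[i] * y[j] for w, y in zip(W, Y)) for j in range(c)]
            for i in range(r)]


def pi_hat(W, Y):
    """MLE of each pairwise positive-response probability: m_ij / n."""
    n = len(W)
    return [[m_ij / n for m_ij in row] for row in marginal_counts(W, Y)]


# Hypothetical toy data: n = 4 subjects, r = 2, c = 3.
W = [[1, 0], [1, 1], [0, 0], [0, 1]]
Y = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
m = marginal_counts(W, Y)
estimates = pi_hat(W, Y)
print(m)
print(estimates)
```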

  12. Joint Table

  13. SPMI Defined in Hypothesis • $H_0$: $\pi_{ij} = \pi_{i\bullet}\pi_{\bullet j}$ for $i = 1, \ldots, r$ and $j = 1, \ldots, c$ • $H_a$: At least one equality does not hold • where $\pi_{ij} = P(W_i = 1, Y_j = 1)$, $\pi_{i\bullet} = P(W_i = 1)$, and $\pi_{\bullet j} = P(Y_j = 1)$. This specifies marginal independence between each $W_i$ and $Y_j$ pair • $P(W_i = 1, Y_j = 1) = \pi_{ij}$ • $P(W_i = 1, Y_j = 0) = \pi_{i\bullet} - \pi_{ij}$ • $P(W_i = 0, Y_j = 1) = \pi_{\bullet j} - \pi_{ij}$ • $P(W_i = 0, Y_j = 0) = 1 - \pi_{i\bullet} - \pi_{\bullet j} + \pi_{ij}$

  14. Hypothesis • SPMI can be written as $OR_{WY,ij} = 1$ for $i = 1, \ldots, r$ and $j = 1, \ldots, c$, where OR is the abbreviation for odds ratio and • $OR_{WY,ij} = \pi_{ij}(1 - \pi_{i\bullet} - \pi_{\bullet j} + \pi_{ij}) / [(\pi_{i\bullet} - \pi_{ij})(\pi_{\bullet j} - \pi_{ij})]$ • Therefore, SPMI represents simultaneous independence in the $rc$ 2 × 2 pairwise item response tables formed for each $W_i$ and $Y_j$ pair • Joint independence implies SPMI, but the reverse is not true
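
The equivalence between the product form of the null hypothesis and an odds ratio of 1 can be checked numerically. Expanding the numerator minus the denominator of the odds ratio leaves the pairwise probability minus the product of the two marginal probabilities, so the ratio equals 1 exactly when the equality in the null holds. The function names and probability values below are illustrative assumptions.

```python
def cell_probs(pi_ij, pi_i, pi_j):
    """Four cell probabilities of the 2 x 2 table for one (W_i, Y_j) pair."""
    return {
        (1, 1): pi_ij,
        (1, 0): pi_i - pi_ij,
        (0, 1): pi_j - pi_ij,
        (0, 0): 1 - pi_i - pi_j + pi_ij,
    }


def odds_ratio(pi_ij, pi_i, pi_j):
    """Odds ratio of the 2 x 2 table implied by the three probabilities."""
    p = cell_probs(pi_ij, pi_i, pi_j)
    return p[(1, 1)] * p[(0, 0)] / (p[(1, 0)] * p[(0, 1)])


# Hypothetical values: P(W_i = 1) = 0.3 and P(Y_j = 1) = 0.4.
print(odds_ratio(0.3 * 0.4, 0.3, 0.4))  # marginal independence
print(odds_ratio(0.2, 0.3, 0.4))        # positive association
```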

  15. Modified Pearson Statistic • Under the Null • (1,1), (1,0), (0,1), (0,0)

  16. The Statistic
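
The slide's formula is not present in this transcript. Going by the later description of the statistic as the sum of rc Pearson chi-square statistics, one per pair of items, a minimal sketch might look like the following. The function names, the data layout, and the choice to let a table with an empty margin contribute 0 are assumptions of this sketch.

```python
def pearson_2x2(w, y):
    """Pearson chi-square statistic for the 2 x 2 table of two 0/1 vectors."""
    n = len(w)
    n11 = sum(a * b for a, b in zip(w, y))
    r1, c1 = sum(w), sum(y)
    observed = [n11, r1 - n11, c1 - n11, n - r1 - c1 + n11]
    expected = [r1 * c1 / n, r1 * (n - c1) / n,
                (n - r1) * c1 / n, (n - r1) * (n - c1) / n]
    if min(expected) == 0:
        return 0.0  # assumption: a table with an empty margin contributes 0
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def modified_pearson(W, Y):
    """Sum of the pairwise statistics over all r * c item pairs."""
    r, c = len(W[0]), len(Y[0])
    return sum(pearson_2x2([w[i] for w in W], [y[j] for y in Y])
               for i in range(r) for j in range(c))


# Hypothetical toy data: n = 4 subjects, r = 2, c = 3.
W = [[1, 0], [1, 1], [0, 0], [0, 1]]
Y = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
print(modified_pearson(W, Y))
```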

  17. Nonparametric Bootstrap • To resample under independence of W and Y, $W_s$ and $Y_s$ are independently resampled with replacement from the data set • The test statistic calculated for the $b$th resample of size $n$ is denoted by $X^{2*}_{S,b}$ • The p-value is calculated as $B^{-1} \sum_b I(X^{2*}_{S,b} \ge X^2_S)$, where $B$ is the number of resamples taken and $I()$ is the indicator function
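
The resampling scheme above can be sketched as follows. The whole W vector and the whole Y vector of a subject are each drawn as a unit, which keeps the within-subject dependence among items while breaking any link between W and Y. The function names, the seed handling, and the treatment of tables with an empty margin are assumptions of this sketch, not details taken from the presentation.

```python
import random


def pearson_2x2(w, y):
    """Pearson chi-square statistic for the 2 x 2 table of two 0/1 vectors."""
    n = len(w)
    n11 = sum(a * b for a, b in zip(w, y))
    r1, c1 = sum(w), sum(y)
    observed = [n11, r1 - n11, c1 - n11, n - r1 - c1 + n11]
    expected = [r1 * c1 / n, r1 * (n - c1) / n,
                (n - r1) * c1 / n, (n - r1) * (n - c1) / n]
    if min(expected) == 0:
        return 0.0  # assumption: a table with an empty margin contributes 0
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def x2_s(W, Y):
    """Sum of the pairwise Pearson statistics over all item pairs."""
    return sum(pearson_2x2([w[i] for w in W], [y[j] for y in Y])
               for i in range(len(W[0])) for j in range(len(Y[0])))


def bootstrap_p_value(W, Y, B=1000, seed=0):
    """Proportion of resampled statistics at least as large as the observed one."""
    rng = random.Random(seed)
    n = len(W)
    observed = x2_s(W, Y)
    hits = 0
    for _ in range(B):
        # Independent draws of whole W_s and Y_s vectors, with replacement.
        W_star = [W[rng.randrange(n)] for _ in range(n)]
        Y_star = [Y[rng.randrange(n)] for _ in range(n)]
        if x2_s(W_star, Y_star) >= observed:
            hits += 1
    return hits / B


# Hypothetical data: 40 subjects, one item each, perfectly associated.
W = [[s % 2] for s in range(40)]
Y = [[s % 2] for s in range(40)]
print(bootstrap_p_value(W, Y, B=200))
```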

  18. Bootstrap p-Value Combination Methods • Each $X^2_{S,i,j}$ gives a test for independence between each $W_i$ and $Y_j$ pair for $i = 1, \ldots, r$ and $j = 1, \ldots, c$. The p-values from each of these tests (using a $\chi^2_1$ approximation) can be combined to form a new statistic $\tilde{p}$ • The product of the $r \times c$ p-values or the minimum of the $r \times c$ p-values could be used as $\tilde{p}$ • The p-value is calculated as $B^{-1} \sum_b I(\tilde{p}^*_b \le \tilde{p})$
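
A sketch of the combination approach, under the same assumed data layout as a list of per-subject 0/1 vectors. The chi-square tail probability with 1 degree of freedom is computed from the standard normal tail as `erfc(sqrt(x / 2))`. The function names and the `combine` argument are illustrative choices; `min` or `math.prod` can be passed to get the two combinations named above.

```python
import math
import random


def pearson_2x2(w, y):
    """Pearson chi-square statistic for the 2 x 2 table of two 0/1 vectors."""
    n = len(w)
    n11 = sum(a * b for a, b in zip(w, y))
    r1, c1 = sum(w), sum(y)
    observed = [n11, r1 - n11, c1 - n11, n - r1 - c1 + n11]
    expected = [r1 * c1 / n, r1 * (n - c1) / n,
                (n - r1) * c1 / n, (n - r1) * (n - c1) / n]
    if min(expected) == 0:
        return 0.0  # assumption: a table with an empty margin contributes 0
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_1_sf(x):
    """P(chi-square with 1 df >= x), using the normal tail."""
    return math.erfc(math.sqrt(x / 2.0))


def p_tilde(W, Y, combine=min):
    """Combine the r * c pairwise p-values with `combine` (min or math.prod)."""
    pvals = [chi2_1_sf(pearson_2x2([w[i] for w in W], [y[j] for y in Y]))
             for i in range(len(W[0])) for j in range(len(Y[0]))]
    return combine(pvals)


def bootstrap_combined_p(W, Y, combine=min, B=1000, seed=0):
    """Proportion of resampled combined p-values no larger than the observed one."""
    rng = random.Random(seed)
    n = len(W)
    observed = p_tilde(W, Y, combine)
    hits = 0
    for _ in range(B):
        W_star = [W[rng.randrange(n)] for _ in range(n)]
        Y_star = [Y[rng.randrange(n)] for _ in range(n)]
        if p_tilde(W_star, Y_star, combine) <= observed:
            hits += 1
    return hits / B


# Hypothetical data: 40 subjects, one item each, perfectly associated.
W = [[s % 2] for s in range(40)]
Y = [[s % 2] for s in range(40)]
print(bootstrap_combined_p(W, Y, combine=min, B=200))
print(bootstrap_combined_p(W, Y, combine=math.prod, B=200))
```

Small combined p-values count as extreme here, which is why the indicator uses "less than or equal to" rather than the "greater than or equal to" used for the summed statistic.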

  19. Results from the Farmer Data

  20. Interpretation and Follow-Up • The p-values show strong evidence against SPMI • Since $X^2_S$ is the sum of $rc$ different Pearson chi-square test statistics, each $X^2_{S,i,j}$ can be used to measure why SPMI is rejected • The individual tests can be done using an asymptotic $\chi^2_1$ approximation or the estimated sampling distribution of the individual statistics calculated in the proposed bootstrap procedures • When this is done, the significant combinations are (Lagoon, pro consultant), (Lagoon, Veterinarian), (Pit, Veterinarian), (Pit, Feed companies & representatives), (Natural drainage, pro consultant), (Natural drainage, Magazines)
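
The follow-up with the asymptotic approximation could be sketched as below. The sketch flags item pairs whose individual p-value falls under a chosen level and applies no multiplicity adjustment. The function names, the 0.05 default, and the toy data are assumptions; the farmer data are not reproduced here.

```python
import math


def pearson_2x2(w, y):
    """Pearson chi-square statistic for the 2 x 2 table of two 0/1 vectors."""
    n = len(w)
    n11 = sum(a * b for a, b in zip(w, y))
    r1, c1 = sum(w), sum(y)
    observed = [n11, r1 - n11, c1 - n11, n - r1 - c1 + n11]
    expected = [r1 * c1 / n, r1 * (n - c1) / n,
                (n - r1) * c1 / n, (n - r1) * (n - c1) / n]
    if min(expected) == 0:
        return 0.0  # assumption: a table with an empty margin contributes 0
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def follow_up(W, Y, alpha=0.05):
    """Return (i, j, statistic, p-value) for item pairs with p-value below alpha."""
    flagged = []
    for i in range(len(W[0])):
        for j in range(len(Y[0])):
            x2 = pearson_2x2([w[i] for w in W], [y[j] for y in Y])
            p = math.erfc(math.sqrt(x2 / 2.0))  # chi-square tail, 1 df
            if p < alpha:
                flagged.append((i, j, x2, p))
    return flagged


# Hypothetical data: the W item matches Y item 0 and is unrelated to Y item 1.
W = [[s % 2] for s in range(40)]
Y = [[s % 2, (s // 2) % 2] for s in range(40)]
for i, j, x2, p in follow_up(W, Y):
    print(i, j, round(x2, 2), p)
```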

  21. Simulation Study • Goal: determine which testing procedures hold the correct size under a range of different situations and have power to detect various alternative hypotheses • 500 data sets are generated for each simulation setting investigated • The SPMI testing methods are applied (B = 1000), and for each method the proportion of data sets for which SPMI is rejected at the 0.05 nominal level is recorded
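
A rough sketch of how the size of the bootstrap test could be estimated by simulation. It is heavily simplified: every item is generated as an independent Bernoulli variable with an assumed probability of 0.5, so the within-subject dependence of real multiple-response data is absent, and the rule "reject when the p-value is at most alpha" is also an assumption. The defaults mirror the 500 data sets and B = 1000 mentioned above, but a pure-Python run at those values is slow, so the usage line uses much smaller numbers.

```python
import random


def pearson_2x2(w, y):
    """Pearson chi-square statistic for the 2 x 2 table of two 0/1 vectors."""
    n = len(w)
    n11 = sum(a * b for a, b in zip(w, y))
    r1, c1 = sum(w), sum(y)
    observed = [n11, r1 - n11, c1 - n11, n - r1 - c1 + n11]
    expected = [r1 * c1 / n, r1 * (n - c1) / n,
                (n - r1) * c1 / n, (n - r1) * (n - c1) / n]
    if min(expected) == 0:
        return 0.0  # assumption: a table with an empty margin contributes 0
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def x2_s(W, Y):
    """Sum of the pairwise Pearson statistics over all item pairs."""
    return sum(pearson_2x2([w[i] for w in W], [y[j] for y in Y])
               for i in range(len(W[0])) for j in range(len(Y[0])))


def bootstrap_p_value(W, Y, B, rng):
    """Bootstrap p-value with W and Y vectors resampled independently."""
    n = len(W)
    observed = x2_s(W, Y)
    hits = 0
    for _ in range(B):
        W_star = [W[rng.randrange(n)] for _ in range(n)]
        Y_star = [Y[rng.randrange(n)] for _ in range(n)]
        if x2_s(W_star, Y_star) >= observed:
            hits += 1
    return hits / B


def estimated_size(n_datasets=500, n=100, r=2, c=2, prob=0.5,
                   B=1000, alpha=0.05, seed=0):
    """Proportion of null data sets for which the bootstrap test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_datasets):
        # Simplified null model: all items independent Bernoulli(prob).
        W = [[int(rng.random() < prob) for _ in range(r)] for _ in range(n)]
        Y = [[int(rng.random() < prob) for _ in range(c)] for _ in range(n)]
        if bootstrap_p_value(W, Y, B, rng) <= alpha:
            rejections += 1
    return rejections / n_datasets


# Reduced settings so the example finishes quickly.
print(estimated_size(n_datasets=20, n=30, r=1, c=1, B=50))
```

An estimated rejection proportion close to the nominal 0.05 under the null would indicate that a method holds its size. Estimating power would require generating data under a specified alternative instead.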

  22. My Results • n=100 • 2×2 marginal table • OR = 25

  23. Conclusion • The bootstrap methods generally hold the correct size
