Analysis of count data

##### Presentation Transcript

1. Analysis of count data: Introduction to log-linear models

2. Log-linear analysis
   - Contingency-table analysis
   - Categorical data analysis
   - Discrete multivariate analysis (Bishop, Fienberg and Holland, 1975)
   - Analysis of cross-classified data
   - Multivariate analysis of qualitative data (Goodman, 1978)
   - Count data analysis

3. Log-linear model: fit a model to a table of counts / frequencies. Two data sets:
   - Survey: political attitudes of British electors
   - Survey: leaving parental home in the Netherlands

4. Survey: political attitudes of British electors. Source: Payne, C. (1977) The log-linear model for contingency tables. In: C.A. O'Muircheartaigh and C. Payne (eds.) The analysis of survey data. Vol. 2: Model fitting, Wiley, New York, pp. 105-144 [data p. 106] (from Butler and Stokes, ‘Political change in Britain’, Macmillan, 2nd edition, 1974)

5. Survey: leaving parental home in the Netherlands

6. The Poisson probability model. Let $N$ be a random variable representing the number of events during a unit interval and let $n$ be a realisation of $N$ (the count). $N$ is a Poisson random variable following a Poisson distribution with parameter $\lambda$: $\Pr(N = n) = \dfrac{\lambda^{n} e^{-\lambda}}{n!}$. The parameter $\lambda$ is the expected number of events per unit time interval: $\lambda = E[N]$
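
As a quick numerical illustration of the Poisson model, the following sketch evaluates the probability mass function and checks that the mean and the variance both equal the rate. The rate value 2.0 and the helper name `poisson_pmf` are illustrative assumptions, not taken from the slides.

```python
import math

def poisson_pmf(n: int, lam: float) -> float:
    """P(N = n) for a Poisson random variable with rate lam."""
    return lam ** n * math.exp(-lam) / math.factorial(n)

lam = 2.0  # illustrative rate (assumption)

# Truncate the support at 60; the remaining tail mass is negligible for lam = 2.
probs = [poisson_pmf(n, lam) for n in range(60)]
mean = sum(n * p for n, p in enumerate(probs))
var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))

print(f"total probability = {sum(probs):.6f}")
print(f"mean = {mean:.6f}, variance = {var:.6f}")
```

Both moments come out equal to the rate, which is the property used later to obtain standard errors for cell counts.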

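7. Likelihood function. Probability mass function: $\Pr(N = n) = \dfrac{\lambda^{n} e^{-\lambda}}{n!}$. Log-likelihood function: $\ln L(\lambda) = n \ln \lambda - \lambda - \ln n!$. The likelihood equations determine the ‘best’ value of $\lambda$.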

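8. Likelihood equations: $\dfrac{\partial \ln L}{\partial \lambda} = \dfrac{n}{\lambda} - 1 = 0$. Hence: $\hat{\lambda} = n$. Hence: $\mathrm{Var}(N) = \lambda$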
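
For a sample of independent counts, the likelihood equation is solved by the sample mean. The sketch below checks this numerically; the data values and the helper name `poisson_loglik` are hypothetical and serve only as an illustration.

```python
import math

def poisson_loglik(lam, counts):
    """Poisson log-likelihood of rate lam for independent counts."""
    return sum(n * math.log(lam) - lam - math.lgamma(n + 1) for n in counts)

counts = [3, 1, 4, 2, 0, 5]          # hypothetical data, for illustration only
lam_hat = sum(counts) / len(counts)  # solves d lnL / d lam = 0

# The log-likelihood is lower on either side of the sample mean.
for lam in (lam_hat - 0.5, lam_hat, lam_hat + 0.5):
    print(f"lam = {lam:.2f}  lnL = {poisson_loglik(lam, counts):.4f}")
```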

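9. Let $i$ represent an individual with characteristics $x_i$. The probability of observing $n_i$ events during a unit interval is: $\Pr(N_i = n_i) = \dfrac{\lambda_i^{n_i} e^{-\lambda_i}}{n_i!}$ with $\lambda_i = \exp(x_i'\beta)$ or $\ln \lambda_i = x_i'\beta$ (log-linear model)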

10. The log-linear model The objective of log-linear analysis is to determine if the distribution of counts among the cells of a table can be explained by a simpler, underlying structure. Log-linear models specify different structures in terms of the cross-classified variables (rows, columns and layers of the table).

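11. Log-linear models for two-way tables. Saturated log-linear model: $\ln \lambda_{ij} = u + u^{A}_{i} + u^{B}_{j} + u^{AB}_{ij}$
   - $u$: overall effect (level)
   - $u^{A}_{i}$, $u^{B}_{j}$: main effects (marginal frequencies)
   - $u^{AB}_{ij}$: interaction effect

   In the case of a 2 x 2 table there are 4 observations but 9 parameters (1 + 2 + 2 + 4), so normalisation constraints are needed.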

12. Survey: leaving parental home in the Netherlands

13. Leaving home: descriptive statistics
   - Counts
   - Percentages
   - Odds of leaving home early rather than late (relative to a reference category)

14. Leaving home. Log-linear models for two-way tables: 4 models. Model 1: null model or overall effect model. All categories are equiprobable (an observation is equally likely to fall into any cell): $\ln \lambda_{ij} = u$ for all $i$ and $j$. Estimate: $u = 4.887$ (s.e. 0.0434), so $\exp(4.887) = 132.5 = 530/4$. Here $\lambda_{ij}$ is the expected count (frequency) in cell $(i,j)$: category $i$ of variable A (row) and category $j$ of variable B (column)
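
The null-model figures can be reproduced directly from the four observed counts (135, 74, 143, 178, as listed in the table of expected frequencies). The sketch assumes the standard asymptotic result that, in an intercept-only Poisson model, the variance of the overall effect is the reciprocal of the total count; variable names are illustrative.

```python
import math

counts = [135, 74, 143, 178]    # observed cell counts, leaving-home table
total = sum(counts)             # 530
expected = total / len(counts)  # same expected count in every cell
u = math.log(expected)          # overall effect on the log scale
se_u = 1 / math.sqrt(total)     # asymptotic s.e. (assumes intercept-only Poisson model)

print(round(expected, 1), round(u, 3), round(se_u, 4))
```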

15. Leaving home: where $\lambda_{ij}$ is a cell frequency generated by a Poisson process, and $\mathrm{Var}[aX] = a^{2}\,\mathrm{Var}[X]$, where $a$ is a constant (e.g. Fingleton, 1984, p. 29)

16. Leaving home. Log-linear models for two-way tables. Model 2: B null model. Categories of variable B (sex) are equiprobable within levels of variable A (time): $\ln \lambda_{ij} = u + u^{A}_{i}$ for all $j$. GLIM estimates:

   | Estimate | s.e. | Parameter | Exp(parameter) |
   |---|---|---|---|
   | 4.649 | 0.06914 | Overall effect | 104.5 |
   | 0.0000 | | TIME(1) | |
   | 0.4291 | 0.08886 | TIME(2) | 1.536 |

17. Leaving home. Log-linear models for two-way tables. Model 2: B null model. Categories of variable B (sex) are equiprobable within levels of variable A (time): $\ln \lambda_{ij} = u + u^{A}_{i}$ for all $j$. SPSS estimates:

   | Estimate | s.e. | Parameter | Exp(parameter) |
   |---|---|---|---|
   | 5.773 | 0.0558 | Overall effect | 321.5 |
   | -0.4283 | 0.0888 | TIME(1) | 0.6516 |
   | 0.0000 | | TIME(2) | |

18. Leaving home. Log-linear models for two-way tables. Model 3: independence model (unsaturated model). Categories of variable B (sex) are not equiprobable, but the probability is independent of the levels of variable A (time): $\ln \lambda_{ij} = u + u^{A}_{i} + u^{B}_{j}$. GLIM estimates:

   | Estimate | s.e. | Parameter | Exp(parameter) |
   |---|---|---|---|
   | 4.697 | 0.0806 | Overall effect | 109.62 |
   | 0.4291 | 0.0889 | TIME(2) | 1.536 |
   | -0.09819 | 0.0870 | SEX(2) | 0.906 |

19. Leaving home. Log-linear model: predictions
   - Females leaving home early: 109.62
   - Females leaving home late: 109.62 * 1.536 = 168.37
   - Males leaving home early: 109.62 * 0.906 = 99.37
   - Males leaving home late: 109.62 * 1.536 * 0.906 = 152.63
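
Under the independence model the expected counts equal row total times column total divided by the grand total, so the predictions and the GLIM-style parameters can be checked from the margins alone. A minimal sketch with illustrative variable names; the observed counts are those of the leaving-home table.

```python
import math

# Observed counts: rows = time (early, late), columns = sex (female, male)
observed = [[135, 74], [143, 178]]
row_tot = [sum(r) for r in observed]        # [209, 321]
col_tot = [sum(c) for c in zip(*observed)]  # [278, 252]
total = sum(row_tot)                        # 530

# Expected counts under independence
expected = [[r * c / total for c in col_tot] for r in row_tot]

# Parameters with the first category of each variable as reference
overall = math.log(expected[0][0])
time2 = math.log(row_tot[1] / row_tot[0])
sex2 = math.log(col_tot[1] / col_tot[0])

for row in expected:
    print([round(x, 2) for x in row])
print(round(overall, 3), round(time2, 4), round(sex2, 4))
```

Working from the unrounded parameters avoids the small rounding differences that appear when the published three-digit multipliers are multiplied by hand.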

20. Leaving home. SPSS estimates:

   | No. | Estimate | SE | Parameter |
   |---|---|---|---|
   | 1 | 5.0280 | .0721 | Overall effect |
   | 2 | -.4291 | .0889 | Time(1) |
   | 3 | .0000 | . | Time(2) |
   | 4 | .0982 | .0870 | Sex(1) |
   | 5 | .0000 | . | Sex(2) |

21. Leaving home. Log-linear models for two-way tables. Model 4: saturated model. The values of categories of variable B (sex) depend on the levels of variable A (time). GLIM estimates:

   | Estimate | s.e. | Parameter |
   |---|---|---|
   | 4.905 | 0.08607 | Overall effect |
   | 0.05757 | 0.1200 | TIME(2) |
   | -0.6012 | 0.1446 | SEX(2) |
   | 0.8201 | 0.1831 | TIME(2).SEX(2) |
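
With the first category of each variable as reference, the saturated-model parameters are logs of counts, count ratios, and the odds ratio, so they can be computed in closed form. The sketch below assumes the usual asymptotic standard errors (square root of the sum of reciprocal counts involved); names are illustrative.

```python
import math

# Observed counts indexed by (time, sex): (1,1), (1,2), (2,1), (2,2)
n11, n12, n21, n22 = 135, 74, 143, 178

estimates = {
    "Overall effect": math.log(n11),
    "TIME(2)": math.log(n21 / n11),
    "SEX(2)": math.log(n12 / n11),
    "TIME(2).SEX(2)": math.log(n22 * n11 / (n21 * n12)),  # log odds ratio
}
std_errors = {
    "Overall effect": math.sqrt(1 / n11),
    "TIME(2)": math.sqrt(1 / n11 + 1 / n21),
    "SEX(2)": math.sqrt(1 / n11 + 1 / n12),
    "TIME(2).SEX(2)": math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22),
}

for name in estimates:
    print(f"{name:16s} {estimates[name]:8.4f} {std_errors[name]:8.4f}")
```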

22. Leaving home. SPSS estimates:

   | No. | Estimate | SE | Parameter |
   |---|---|---|---|
   | 1 | 5.1846 | .0748 | Overall effect |
   | 2 | -.8738 | .1379 | Time(1) |
   | 3 | .0000 | . | Time(2) |
   | 4 | -.2183 | .1121 | Sex(1) |
   | 5 | .0000 | . | Sex(2) |
   | 6 | .8164 | .1827 | Time(1) * Sex(1) |
   | 7 | .0000 | . | Time(1) * Sex(2) |
   | 8 | .0000 | . | Time(2) * Sex(1) |
   | 9 | .0000 | . | Time(2) * Sex(2) |

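23. Leaving home. Log-linear model: predictions (expected frequencies). In this table Model 2 is the time-only model, Model 3 the sex-only model, Model 4 the independence model and Model 5 the saturated model.

   | Category | Cell | Observed | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 |
   |---|---|---|---|---|---|---|---|
   | Fem_<20 | F11 | 135 | 132.50 | 104.50 | 139.00 | 109.63 | 135.00 |
   | Mal_<20 | F12 | 74 | 132.50 | 104.50 | 126.00 | 99.37 | 74.00 |
   | Fem_>20 | F21 | 143 | 132.50 | 160.50 | 139.00 | 168.37 | 143.00 |
   | Mal_>20 | F22 | 178 | 132.50 | 160.50 | 126.00 | 152.63 | 178.00 |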

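24. Relation between the log-linear model and the Poisson regression model: $\ln \lambda_{ij} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$, where $x_1$ and $x_2$ are dummy variables (0 if $i$ or $j$ is equal to 1, and 1 if $i$ or $j$ is equal to 2) and the interaction variable is $x_3 = x_1 x_2$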
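
To make the link with Poisson regression concrete, the sketch below fits the independence model to the leaving-home counts as a Poisson regression on two dummy variables, using iteratively reweighted least squares written out with NumPy. The use of NumPy, the starting values and the fixed number of iterations are assumptions of this illustration, not the procedure used by GLIM or SPSS.

```python
import numpy as np

# Cells in the order (time, sex) = (1,1), (1,2), (2,1), (2,2)
y = np.array([135.0, 74.0, 143.0, 178.0])

# Design matrix: intercept, TIME(2) dummy, SEX(2) dummy
X = np.array([
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
], dtype=float)

# Start from the null model (all cells equal to the mean count)
beta = np.array([np.log(y.mean()), 0.0, 0.0])

for _ in range(25):                      # fixed iteration count (assumption)
    eta = X @ beta                       # linear predictor
    mu = np.exp(eta)                     # fitted counts (log link)
    z = eta + (y - mu) / mu              # working response
    XtW = X.T * mu                       # X'W with W = diag(mu)
    beta = np.linalg.solve(XtW @ X, XtW @ z)

mu = np.exp(X @ beta)
se = np.sqrt(np.diag(np.linalg.inv((X.T * mu) @ X)))

print(np.round(beta, 4))
print(np.round(se, 4))
print(np.round(mu, 2))
```

Adding a fourth column equal to the product of the two dummies would give the saturated model, whose fitted counts reproduce the observed table exactly.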

25. Log-linear model: fit a model to a table of frequencies. Data: survey of political attitudes of British electors. Source: Payne, C. (1977) The log-linear model for contingency tables. In: C.A. O'Muircheartaigh and C. Payne (eds.) The analysis of survey data. Vol. 2: Model fitting, Wiley, New York, pp. 105-144 [data p. 106] (from Butler and Stokes, ‘Political change in Britain’, Macmillan, 2nd edition, 1974)

26. The classical approach: geometric means (Birch, 1963). Effect coding (the mean is the reference category). Birch, M.W. (1963) ‘Maximum likelihood in three-way contingency tables’, J. Royal Stat. Soc. (B), 25:220-233

27. Political attitudes: the basic model
   - Overall effect: 22.98/4 = 5.7456
   - Effect of party:
     - Conservative: 11.49/2 - 5.7456 = 0.0018
     - Labour: 11.49/2 - 5.7456 = -0.0018
   - Effect of gender:
     - Male: 11.44/2 - 5.7456 = -0.0229
     - Female: 11.54/2 - 5.7456 = 0.0229
   - Interaction effects (gender-party interaction):
     - Male conservative: 5.6312 - 5.7456 - 0.0018 + 0.0229 = -0.0933
     - Female conservative: 5.8636 - 5.7456 - 0.0018 - 0.0229 = 0.0933
     - Male labour: 5.8141 - 5.7456 + 0.0018 + 0.0229 = 0.0933
     - Female labour: 5.6733 - 5.7456 + 0.0018 - 0.0229 = -0.0933
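
The effect-coded parameters are averages of log counts and deviations from those averages. The sketch below reproduces the arithmetic from the four observed counts (279, 352, 335, 291); the dictionary layout and names are illustrative choices.

```python
import math

# Observed counts keyed by (party, gender)
counts = {
    ("conservative", "male"): 279,
    ("conservative", "female"): 352,
    ("labour", "male"): 335,
    ("labour", "female"): 291,
}
parties = ["conservative", "labour"]
genders = ["male", "female"]
logs = {cell: math.log(n) for cell, n in counts.items()}

# Overall effect: mean of the log counts (log of the geometric mean)
overall = sum(logs.values()) / len(logs)

# Main effects: marginal mean of the logs minus the overall mean
party = {p: sum(logs[p, g] for g in genders) / 2 - overall for p in parties}
gender = {g: sum(logs[p, g] for p in parties) / 2 - overall for g in genders}

# Interaction: what remains after overall and main effects
interaction = {
    (p, g): logs[p, g] - overall - party[p] - gender[g]
    for p in parties for g in genders
}

print(round(overall, 4))
print({k: round(v, 4) for k, v in party.items()})
print({k: round(v, 4) for k, v in gender.items()})
print({k: round(v, 4) for k, v in interaction.items()})
```

By construction each set of effects sums to zero over its categories, which is the normalisation used with effect coding.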

28. Political attitudes: the basic model. Birch, M.W. (1963) ‘Maximum likelihood in three-way contingency tables’, J. Royal Stat. Soc. (B), 25:220-233. Coding: effect coding. Parameters are subject to constraints (normalisation constraints). Only first-order contrasts can be estimated:

29. Political attitudes The basic model (GLIM) Estimate S.E.

30. Political attitudes The basic model (SPSS)

31. Political attitudes: the basic model (1)
   - $\ln \lambda_{11}$ = 5.7456 + 0.0018 - 0.0229 - 0.0933 = 5.6312
   - $\ln \lambda_{12}$ = 5.7456 + 0.0018 + 0.0229 + 0.0933 = 5.8636
   - $\ln \lambda_{21}$ = 5.7456 - 0.0018 - 0.0229 + 0.0933 = 5.8142
   - $\ln \lambda_{22}$ = 5.7456 - 0.0018 + 0.0229 - 0.0933 = 5.6734

32. The design-matrix approach

33. Design matrix: unsaturated log-linear model
   - The number of parameters exceeds the number of equations, so additional equations are needed.
   - $X'X$ is singular (so $(X'X)^{-1}$ does not exist): identify the linear dependencies.
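
The singularity can be seen numerically. The sketch below builds the over-parameterised design matrix of the unsaturated model for a 2 x 2 table (one column per parameter) and compares its rank with that of an effect-coded version. The use of NumPy and the column ordering are assumptions of this illustration.

```python
import numpy as np

# Cells in the order (i, j) = (1,1), (1,2), (2,1), (2,2)
# Columns: overall, A1, A2, B1, B2 (one indicator per parameter)
X_full = np.array([
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
], dtype=float)

# Effect coding after imposing the sum-to-zero constraints
# Columns: overall, A1, B1 (category 2 is coded -1)
X_effect = np.array([
    [1,  1,  1],
    [1,  1, -1],
    [1, -1,  1],
    [1, -1, -1],
], dtype=float)

rank_full = np.linalg.matrix_rank(X_full)
rank_xtx = np.linalg.matrix_rank(X_full.T @ X_full)
rank_effect = np.linalg.matrix_rank(X_effect)

print("parameters:", X_full.shape[1], "rank:", rank_full, "rank of X'X:", rank_xtx)
print("effect-coded parameters:", X_effect.shape[1], "rank:", rank_effect)
```

Five parameters but rank three means two linear dependencies (the overall column equals A1 + A2 and also B1 + B2); the constraints remove exactly those two.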

34. Design matrix: unsaturated log-linear model (additional eq.) Coding!

35. 3 unknowns → 3 equations, where $\hat{\lambda}_{ij}$ is the frequency predicted by the model

36. Political attitudes

37. Political attitudes
   - 314.17 * 1.0040 * 0.9772 = 308.23
   - 314.17 * [1/1.0040] * 0.9772 = 305.78
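
These products are the expected counts of the independence model under effect coding. A sketch that derives the three multipliers from the margins of the observed table (279, 352, 335, 291) follows; the variable names are illustrative.

```python
import math

# Observed counts keyed by (party, gender)
observed = {
    ("conservative", "male"): 279,
    ("conservative", "female"): 352,
    ("labour", "male"): 335,
    ("labour", "female"): 291,
}
total = sum(observed.values())
party_tot = {p: sum(n for (pp, _), n in observed.items() if pp == p)
             for p in ("conservative", "labour")}
gender_tot = {g: sum(n for (_, gg), n in observed.items() if gg == g)
              for g in ("male", "female")}

# Expected counts under independence
expected = {(p, g): party_tot[p] * gender_tot[g] / total
            for p in party_tot for g in gender_tot}

# Effect-coded multipliers: geometric mean and square roots of margin ratios
overall = math.exp(sum(math.log(e) for e in expected.values()) / 4)
conservative = math.sqrt(party_tot["conservative"] / party_tot["labour"])
male = math.sqrt(gender_tot["male"] / gender_tot["female"])

print(round(overall, 2), round(conservative, 4), round(male, 4))
print(round(overall * conservative * male, 2))   # male conservative
print(round(overall / conservative * male, 2))   # male labour
```

Small differences in the last digit relative to the slide come from the slide multiplying already rounded factors.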

38. Design matrix: saturated log-linear model

39. Political attitudes
   - exp[5.7456 + 0.0018 - 0.0229 - 0.0933] = exp[5.6312] = 279
   - exp[5.7456 - 0.0018 - 0.0229 + 0.0933] = 335

40. Political attitudes

41. Design matrix: other restrictions on parameters, saturated log-linear model (SPSS)

42. Political attitudes

43. Political attitudes

44. Political attitudes

45. Political attitudes

46. Political attitudes. Prediction of counts or frequencies:
   - A. Effect coding
     - 279 = 312.80 * 0.97736 * 1.00185 * 0.91092
     - 352 = 312.80 * 1.02316 * 1.00185 * 1.09779
     - 335 = 312.80 * 0.97736 * 0.99815 * 1.09779
     - 291 = 312.80 * 1.02316 * 0.99815 * 0.91092
   - B. Contrast coding: GLIM
     - 291 = 279 * 1.2616 * 1.2007 * 0.6885 (females voting labour)
     - 279 = 279 * 1 * 1 * 1 (males voting conservative = ref. cat.)
     - 352 = 279 * 1.2616 * 1 * 1 (females voting conservative)
     - 335 = 279 * 1 * 1.2007 * 1 (males voting labour)
   - C. Contrast coding: SPSS (SPSS adds 0.5 to observed values)
     - 279.5 = 291.5 * 1.15096 * 1.20925 * 0.68894
     - 352.5 = 291.5 * 1 * 1.20925 * 1
     - 291.5 = 291.5 * 1 * 1 * 1 (females voting labour = ref. cat.)
     - 335.5 = 291.5 * 1.15096 * 1 * 1
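
The contrast-coded multipliers are ratios of observed counts to the reference cell, so a saturated model reproduces every count exactly. A short sketch for the GLIM-style coding (males voting conservative as reference), with illustrative names:

```python
# Observed counts keyed by (party, gender)
counts = {
    ("conservative", "male"): 279.0,
    ("conservative", "female"): 352.0,
    ("labour", "male"): 335.0,
    ("labour", "female"): 291.0,
}

ref = counts["conservative", "male"]               # reference cell
female = counts["conservative", "female"] / ref    # gender multiplier
labour = counts["labour", "male"] / ref            # party multiplier
# Interaction multiplier: the odds ratio of the 2 x 2 table
inter = (counts["labour", "female"] * ref) / (
    counts["conservative", "female"] * counts["labour", "male"]
)

pred_female_labour = ref * female * labour * inter

print(round(female, 4), round(labour, 4), round(inter, 4))
print(round(pred_female_labour, 2))
```

The same construction with 0.5 added to each count and females voting labour as the reference cell gives the SPSS-style multipliers.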

47. The Poisson regression model

48. Political attitudes The Poisson probability model with