## Analysis of count data

**Analysis of count data**
Introduction to log-linear models

**Log-linear analysis**
Also known as:
- Contingency-table analysis
- Categorical data analysis
- Discrete multivariate analysis (Bishop, Fienberg and Holland, 1975)
- Analysis of cross-classified data
- Multivariate analysis of qualitative data (Goodman, 1978)
- Count data analysis

**Log-linear model: fit a model to a table of counts / frequencies**
Two data sets:
- Survey: political attitudes of British electors
- Survey: leaving parental home in the Netherlands

**Survey: political attitudes of British electors**
Source: Payne, C. (1977) 'The log-linear model for contingency tables'. In: C. O'Muircheartaigh and C. Payne (eds.), The analysis of survey data. Vol. 2: Model fitting, Wiley, New York, pp. 105-144 [data p. 106] (from Butler and Stokes, 'Political change in Britain', Macmillan, 2nd edition, 1974).

**Counts are generated by a Poisson process: the Poisson distribution**

**The Poisson probability model**
Let $N$ be a random variable representing the number of events during a unit interval and let $n$ be a realisation of $N$ (the count). $N$ is a Poisson random variable following a Poisson distribution with parameter $\lambda$:

$$\Pr\{N = n\} = \frac{\lambda^{n} e^{-\lambda}}{n!}$$

The parameter $\lambda$ is the expected number of events per unit time interval: $\lambda = E[N]$.

**Likelihood function**
Probability mass function:

$$\Pr\{N = n\} = \frac{\lambda^{n} e^{-\lambda}}{n!}$$

Log-likelihood function:

$$\ln L(\lambda) = n \ln \lambda - \lambda - \ln n!$$

The likelihood equations determine the 'best' value of $\lambda$.

**Likelihood equations**

$$\frac{\partial \ln L}{\partial \lambda} = \frac{n}{\lambda} - 1 = 0$$

Hence $\hat{\lambda} = n$, and $\mathrm{Var}(N) = \lambda$.

**Let $i$ represent an individual with characteristics $x_i$**
The probability of observing $n_i$ events during a unit interval is:

$$\Pr\{N_i = n_i\} = \frac{\lambda_i^{n_i} e^{-\lambda_i}}{n_i!}$$

with $\lambda_i = \exp(x_i'\beta)$, or $\ln \lambda_i = x_i'\beta$, which is the log-linear model.

**The log-linear model**
The objective of log-linear analysis is to determine whether the distribution of counts among the cells of a table can be explained by a simpler, underlying structure. Log-linear models specify different structures in terms of the cross-classified variables (rows, columns and layers of the table).

**Log-linear models for two-way tables**
Saturated log-linear model:

$$\ln \lambda_{ij} = u + u^{A}_{i} + u^{B}_{j} + u^{AB}_{ij}$$

- $u$: overall effect (level)
- $u^{A}_{i}$, $u^{B}_{j}$: main effects (marginal frequencies)
- $u^{AB}_{ij}$: interaction effect

In the case of a 2 x 2 table there are 4 observations and 9 parameters, so normalisation constraints are needed.

**Leaving home: descriptive statistics**
- Counts
- Percentages
- Odds of leaving home early rather than late
- Reference category

**Leaving home: log-linear models for two-way tables (4 models)**
Model 1: null model or overall effect model. All categories are equiprobable (an observation is equally likely to fall into any cell):

$$\ln \lambda_{ij} = u \quad \text{for all } i \text{ and } j$$

The estimate is $u = 4.887$ (s.e. 0.0434), so $\exp(4.887) = 132.5 = 530/4$. Here $\lambda_{ij}$ is the expected count (frequency) in cell $(i,j)$: category $i$ of variable A (row) and category $j$ of variable B (column). $\lambda_{ij}$ is a cell frequency generated by a Poisson process, and $\mathrm{Var}[aX] = a^{2}\,\mathrm{Var}[X]$, where $a$ is a constant (e.g. Fingleton, 1984, p. 29).

**Leaving home: Model 2, B null model (GLIM)**
Categories of variable B (sex) are equiprobable within levels of variable A (time):

$$\ln \lambda_{ij} = u + u^{A}_{i} \quad \text{for all } j$$

| Parameter | Estimate | s.e. | Exp(parameter) |
|---|---|---|---|
| Overall effect | 4.649 | 0.06914 | 104.5 |
| TIME(1) | 0.0000 | | |
| TIME(2) | 0.4291 | 0.08886 | 1.536 |

**Leaving home: Model 2, B null model (SPSS)**
Categories of variable B (sex) are equiprobable within levels of variable A (time), for all $j$.

| Parameter | Estimate | s.e. | Exp(parameter) |
|---|---|---|---|
| Overall effect | 5.773 | 0.0558 | 321.5 |
| TIME(1) | -0.4283 | 0.0888 | 0.6516 |
| TIME(2) | 0.0000 | | |

**Leaving home: Model 3, independence model (unsaturated model, GLIM)**
Categories of variable B (sex) are not equiprobable, but the probability is independent of the levels of variable A (time):

$$\ln \lambda_{ij} = u + u^{A}_{i} + u^{B}_{j}$$

| Parameter | Estimate | s.e. | Exp(parameter) |
|---|---|---|---|
| Overall effect | 4.697 | 0.0806 | 109.62 |
| TIME(2) | 0.4291 | 0.0889 | 1.536 |
| SEX(2) | -0.09819 | 0.0870 | 0.906 |

**Leaving home: log-linear model predictions (independence model)**
- Females leaving home early: 109.62
- Females leaving home late: 109.62 * 1.536 = 168.37
- Males leaving home early: 109.62 * 0.906 = 99.37
- Males leaving home late: 109.62 * 1.536 * 0.906 = 152.63

**Leaving home: Model 3, independence model (SPSS)**

| No. | Parameter | Estimate | SE |
|---|---|---|---|
| 1 | Overall effect | 5.0280 | .0721 |
| 2 | Time(1) | -.4291 | .0889 |
| 3 | Time(2) | .0000 | . |
| 4 | Sex(1) | .0982 | .0870 |
| 5 | Sex(2) | .0000 | . |

**Leaving home: Model 4, saturated model (GLIM)**
The values of the categories of variable B (sex) depend on the levels of variable A (time).

| Parameter | Estimate | s.e. |
|---|---|---|
| Overall effect | 4.905 | 0.08607 |
| TIME(2) | 0.05757 | 0.1200 |
| SEX(2) | -0.6012 | 0.1446 |
| TIME(2).SEX(2) | 0.8201 | 0.1831 |

**Leaving home: Model 4, saturated model (SPSS)**

| No. | Parameter | Estimate | SE |
|---|---|---|---|
| 1 | Overall effect | 5.1846 | .0748 |
| 2 | Time(1) | -.8738 | .1379 |
| 3 | Time(2) | .0000 | . |
| 4 | Sex(1) | -.2183 | .1121 |
| 5 | Sex(2) | .0000 | . |
| 6 | Time(1) * Sex(1) | .8164 | .1827 |
| 7 | Time(1) * Sex(2) | .0000 | . |
| 8 | Time(2) * Sex(1) | .0000 | . |
| 9 | Time(2) * Sex(2) | .0000 | . |

**Leaving home: log-linear model predictions (expected frequencies)**

| Cell | Label | Observed | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 |
|---|---|---|---|---|---|---|---|
| Fem_<20 | F11 | 135 | 132.50 | 104.50 | 139.00 | 109.63 | 135.00 |
| Mal_<20 | F12 | 74 | 132.50 | 104.50 | 126.00 | 99.37 | 74.00 |
| Fem_>20 | F21 | 143 | 132.50 | 160.50 | 139.00 | 168.37 | 143.00 |
| Mal_>20 | F22 | 178 | 132.50 | 160.50 | 126.00 | 152.63 | 178.00 |

In this table Models 1 and 2 are as above, Model 3 contains only the sex effect (139.00 = 278/2), Model 4 is the independence model and Model 5 is the saturated model.

**Relation between the log-linear model and the Poisson regression model**

$$\ln \lambda_{ij} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$

where $x_1$ and $x_2$ are dummy variables (0 if $i$ or $j$ equals 1, and 1 if $i$ or $j$ equals 2) and the interaction variable is $x_1 x_2$.

**Log-linear model: fit a model to a table of frequencies**
Data: survey of political attitudes of British electors (Payne, 1977; data p. 106).

**The classical approach**
- Geometric means (Birch, 1963)
- Effect coding (the mean is the reference category)

Birch, M.W. (1963) 'Maximum likelihood in three-way contingency tables', J. Royal Stat. Soc. (B), 25:220-233.

**Political attitudes: the basic model**
Overall effect: 22.98/4 = 5.7456

Effect of party:
- Conservative: 11.49/2 - 5.7456 = 0.0018
- Labour: 11.49/2 - 5.7456 = -0.0018

Effect of gender:
- Male: 11.44/2 - 5.7456 = -0.0229
- Female: 11.54/2 - 5.7456 = 0.0229

Gender-party interaction effects:
- Male conservative: 5.6312 - 5.7456 - 0.0018 + 0.0229 = -0.0933
- Female conservative: 5.8636 - 5.7456 - 0.0018 - 0.0229 = 0.0933
- Male labour: 5.8141 - 5.7456 + 0.0018 + 0.0229 = 0.0933
- Female labour: 5.6733 - 5.7456 + 0.0018 - 0.0229 = -0.0933

**Political attitudes: the basic model (coding)**
Coding: effect coding (Birch, 1963). The parameters are subject to normalisation constraints. Only first-order contrasts can be estimated:

**Political attitudes: the basic model (GLIM)**
Estimate, S.E.

**Political attitudes: the basic model (SPSS)**

**Political attitudes: the basic model (1)**

$$\ln \lambda_{11} = 5.7456 + 0.0018 - 0.0229 - 0.0933 = 5.6312$$
$$\ln \lambda_{12} = 5.7456 + 0.0018 + 0.0229 + 0.0933 = 5.8636$$
$$\ln \lambda_{21} = 5.7456 - 0.0018 - 0.0229 + 0.0933 = 5.8142$$
$$\ln \lambda_{22} = 5.7456 - 0.0018 + 0.0229 - 0.0933 = 5.6734$$

**Design matrix: unsaturated log-linear model**
- The number of parameters exceeds the number of equations, so additional equations are needed.
- $X'X$ is singular (its inverse does not exist), so the linear dependencies must be identified.

**Design matrix: unsaturated log-linear model (additional equations)**
Coding!

**3 unknowns, 3 equations**
Here $\hat{\lambda}_{ij}$ denotes the frequency predicted by the model.

**Political attitudes**
- 314.17 * 1.0040 * 0.9772 = 308.23
- 314.17 * [1/1.0040] * 0.9772 = 305.78

- exp[5.7456 + 0.0018 - 0.0229 - 0.0933] = exp[5.6312] = 279
- exp[5.7456 - 0.0018 - 0.0229 + 0.0933] = 335

**Design matrix: other restrictions on parameters, saturated log-linear model (SPSS)**

**Political attitudes: prediction of counts or frequencies**
A. Effect coding
- 279 = 312.80 * 0.97736 * 1.00185 * 0.91092
- 352 = 312.80 * 1.02316 * 1.00185 * 1.09779
- 335 = 312.80 * 0.97736 * 0.99815 * 1.09779
- 291 = 312.80 * 1.02316 * 0.99815 * 0.91092

B. Contrast coding: GLIM
- 291 = 279 * 1.2616 * 1.2007 * 0.6885 (females voting labour)
- 279 = 279 * 1 * 1 * 1 (males voting conservative = reference category)
- 352 = 279 * 1.2616 * 1 * 1 (females voting conservative)
- 335 = 279 * 1 * 1.2007 * 1 (males voting labour)

C. Contrast coding: SPSS (SPSS adds 0.5 to the observed values)
- 279.5 = 291.5 * 1.15096 * 1.20925 * 0.68894
- 352.5 = 291.5 * 1 * 1.20925 * 1
- 291.5 = 291.5 * 1 * 1 * 1 (females voting labour = reference category)
- 335.5 = 291.5 * 1.15096 * 1 * 1

**Political attitudes**
The Poisson probability model with
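
The two hand calculations in these slides can be checked with a few lines of Python. This is a minimal sketch using only the standard library, not the GLIM or SPSS procedure. The function names `effect_coding` and `independence_fit` are illustrative, and the cell counts are the ones quoted in the slides. The first function applies the classical effect-coding approach (means of log counts, i.e. logs of geometric means) to a two-way table. The second uses the standard closed form for the independence model, row total times column total divided by the grand total.

```python
import math


def effect_coding(table):
    """Effect-coded parameters of the saturated log-linear model for a two-way table.

    Returns the overall effect, row effects, column effects and interaction
    effects, all computed from means of the log counts.
    """
    logs = [[math.log(n) for n in row] for row in table]
    n_rows, n_cols = len(logs), len(logs[0])
    overall = sum(sum(row) for row in logs) / (n_rows * n_cols)
    row_eff = [sum(row) / n_cols - overall for row in logs]
    col_eff = [
        sum(logs[i][j] for i in range(n_rows)) / n_rows - overall
        for j in range(n_cols)
    ]
    inter = [
        [logs[i][j] - overall - row_eff[i] - col_eff[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return overall, row_eff, col_eff, inter


def independence_fit(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    return [[r * c / total for c in col_tot] for r in row_tot]


# Political attitudes: rows = party (Conservative, Labour),
# columns = gender (male, female), as implied by the slide calculations.
politics = [[279, 352], [335, 291]]
u, u_party, u_gender, u_inter = effect_coding(politics)

# Leaving home: rows = time (<20, >20), columns = sex (female, male).
leaving = [[135, 74], [143, 178]]
expected = independence_fit(leaving)

print(round(u, 4), [round(v, 4) for v in u_party], [round(v, 4) for v in u_gender])
print([[round(v, 2) for v in row] for row in expected])
```

The effect-coded values should agree with the slide figures (overall effect about 5.7456, interaction about ±0.0933) up to rounding, and the independence fit should agree with the expected frequencies listed for the leaving-home independence model.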