Analysis of time-stratified case-crossover studies in environmental epidemiology using Stata
Aurelio Tobías, Spanish Council for Scientific Research (CSIC), Barcelona, Spain
Ben Armstrong and Antonio Gasparrini, London School of Hygiene and Tropical Medicine, UK
2014 UK Stata Users Group meeting, London, 12th September 2014
Background
• The time-stratified case-crossover design is a popular alternative to conventional time-series regression for analysing associations between time series of environmental exposures (air pollution, weather) and counts of health outcomes
The case-crossover design
• Proposed by Maclure (1991) to study transient effects on the risk of acute health events
[Figure: timeline ending at the health event, with a control period followed by a risk period. Exposure ("Exposed?") is assessed in each period, e.g. "Have you done any unusual activity … compared to your usual routine?"]
• Analysis as in a matched case-control study

                          Control period
                          Exp.     No
    Risk period   Exp.     a        b
                  No       c        d

    OR = b/c
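As a small worked example of the matched-pair odds ratio, the sketch below uses made-up counts (they are not from the talk) and assumes the table layout above, where b and c are the discordant cells. Only the discordant pairs enter the estimate.

```stata
* Hypothetical counts of subjects by exposure in each period (illustration only)
scalar both_exp  = 20   // a: exposed in risk and control period
scalar risk_only = 30   // b: exposed in risk period only
scalar ctrl_only = 12   // c: exposed in control period only
scalar neither   = 38   // d: exposed in neither period

* Matched-pair odds ratio uses only the discordant cells
scalar or_hat = risk_only/ctrl_only
display "OR = b/c = " or_hat

* Stata's immediate matched case-control command takes the cells as a b c d
mcci 20 30 12 38
```

With these counts the ratio is 30/12 = 2.5; mcci should report the same odds ratio together with a confidence interval.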
Application to environmental studies
• First adapted to air pollution studies
  • Philadelphia (Neas et al. 1999) and Barcelona (Sunyer et al. 2000)
  • Comparing exposure levels on the day (t) when the health event occurs vs. levels before (t-7) and after (t+7) the health event
• Controls for time trend and seasonality by design, since exposure levels are compared between the same weekdays within each month of each year
Time-stratified case-crossover (Figueiras et al. 2010)
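To make the stratification concrete, here is a minimal sketch that builds the time strata (same weekday within the same month and year) for a bare calendar of 1096 days starting on 1 January 2003. The variable names stratum and n_days are illustrative assumptions, and no health or exposure data are involved.

```stata
* Calendar only: 1 Jan 2003 to 31 Dec 2005 (1096 days)
clear
set obs 1096
generate date = mdy(1, 1, 2003) + _n - 1
format date %td

* Components of the time stratum
generate yy  = year(date)
generate mm  = month(date)
generate dow = dow(date)            // 0 = Sunday, ..., 6 = Saturday

* One stratum per year x month x weekday combination
egen stratum = group(yy mm dow)
bysort stratum: generate n_days = _N

* Mondays of January 2003 share a stratum: each is a referent for the others
list date stratum n_days if yy == 2003 & mm == 1 & dow == 1, noobs clean

* Strata always hold 4 or 5 days
tabulate n_days
```

For any case day, the other days in its stratum serve as control days, so each case has 3 or 4 referents.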
Option 1: conditional logistic regression
• The dataset needs to be reshaped from time-series format to individual matched case-control format by using the new command,
  . mkcco vary, date(vardate)
• This also creates three new variables,
  • case: case days (=1)
  • wn: weight, the number of events on the case day
  • ccset: set number matching case and control days
• Then, use Stata's clogit command
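For readers without mkcco, the reshape can be approximated by hand. The sketch below is not the mkcco implementation and will not reproduce its exact row counts or set numbering. It runs on simulated data (not madrid.dta), uses casedate as a hypothetical set identifier, and should be run from a do-file because it relies on a tempfile.

```stata
* Simulated daily series standing in for the real data (assumption)
clear
set seed 2014
set obs 1096
generate date = mdy(1, 1, 2003) + _n - 1
format date %td
generate pm10 = 5 + 75*runiform()
generate y = rpoisson(exp(4 + 0.001*pm10))

* Time strata: same weekday within the same month and year
generate yy  = year(date)
generate mm  = month(date)
generate dow = dow(date)
egen stratum = group(yy mm dow)

* One record per case day, carrying its date and event count
preserve
keep stratum date y
rename date casedate
rename y wn
tempfile cases
save `cases'
restore

* Pair every day with every case day in the same stratum
joinby stratum using `cases'
generate byte case = (date == casedate)

* Each case day defines one matched set, weighted by its event count
clogit case pm10 [fweight = wn], group(casedate)
```

Each matched set here contains the case day plus the other 3 or 4 days of its stratum, and the frequency weight stands in for expanding the data to one record per event.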
. use madrid, clear
. list date y pm10 temp in 1/31, noobs clean

         date    y   pm10   temp
    01jan2003   64     14    9.6
    02jan2003   56     12     11
    03jan2003   73     16    9.8
    04jan2003   69     14    8.8
    05jan2003   71     17    4.3
    06jan2003   68      9      7
    07jan2003   63     19      3
    08jan2003   85     13    6.8
    09jan2003   67     13    4.6
    10jan2003   80     22    2.7
    11jan2003   65     23      1
    12jan2003   95     17    .85
    13jan2003   60     45     .5
    14jan2003   76     60    .45
    15jan2003   77     72    1.3
    16jan2003   75     54    2.6
    17jan2003   76     65      2
    18jan2003   75     43     .8
    19jan2003   74     12    6.3
    20jan2003   79     19    5.8
    21jan2003   73     16      8
    22jan2003   72     21    6.7
    23jan2003   72     25      7
    24jan2003   74     46    6.2
    25jan2003   59     42    8.1
    26jan2003   77     26     10
    27jan2003   64     22     15
    28jan2003   65     41    9.7
    29jan2003   74     18    5.7
    30jan2003   77     17    4.8
    31jan2003   55     17    2.3
. generate dow = dow(date)
. list date y pm10 temp in 1/31 if dow==1, noobs clean

         date    y   pm10   temp
    06jan2003   68      9      7
    13jan2003   60     45     .5
    20jan2003   79     19    5.8
    27jan2003   64     22     15

. mkcco y, date(date)
Dataset reshaped from 1096 to 5480 obs. (1096 case days and 4384 control days).
New variables case, wn and ccset have been added to the reshaped dataset.
. list date y pm10 temp case ccset wn in 1/36 if dow==1, noobs clean

         date    y   pm10   temp   case     ccset   wn
    06jan2003   68      9      7      1   2003021   68
    13jan2003   60     45     .5      0   2003021   68
    20jan2003   79     19    5.8      0   2003021   68
    27jan2003   64     22   14.8      0   2003021   68
    ...
    06jan2003   68      9      7      0   2003022   60
    13jan2003   60     45     .5      1   2003022   60
    20jan2003   79     19    5.8      0   2003022   60
    27jan2003   64     22   14.8      0   2003022   60
    ...
    06jan2003   68      9      7      0   2003023   79
    13jan2003   60     45     .5      0   2003023   79
    20jan2003   79     19    5.8      1   2003023   79
    27jan2003   64     22   14.8      0   2003023   79
    ...
    06jan2003   68      9      7      0   2003024   64
    13jan2003   60     45     .5      0   2003024   64
    20jan2003   79     19    5.8      0   2003024   64
    27jan2003   64     22   14.8      1   2003024   64
. clogit case pm10 [w=wn], group(ccset)
(frequency weights assumed)
...
Conditional (fixed-effects) logistic regression   Number of obs   =     295025
                                                  LR chi2(1)      =       8.39
                                                  Prob > chi2     =     0.0038
Log likelihood = -98913.41                        Pseudo R2       =     0.0000

------------------------------------------------------------------------------
        case |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pm10 |   .0007406   .0002552     2.90   0.004     .0002404    .0012408
------------------------------------------------------------------------------
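The coefficient is a log odds ratio per unit of pm10, so it is often easier to read as a percentage change in risk for a larger increment. The sketch below converts the estimate and confidence limits printed above to a 10-unit increase. The increment of 10 and the scalar names are illustrative assumptions, and the units of pm10 are not stated in the output.

```stata
* Values copied from the clogit output above
scalar b  = .0007406
scalar lo = .0002404
scalar hi = .0012408

* Percentage increase in risk for a 10-unit rise in pm10 (increment is an assumption)
scalar pct    = 100*(exp(10*b)  - 1)
scalar pct_lo = 100*(exp(10*lo) - 1)
scalar pct_hi = 100*(exp(10*hi) - 1)

display "Increase per 10 units: " %5.2f pct "% (95% CI " %5.2f pct_lo "% to " %5.2f pct_hi "%)"
```

This gives an increase of about 0.74% per 10 units of pm10. With the fitted model still in memory, `lincom 10*pm10, or` should give the equivalent odds ratio directly.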
Option 1: conditional logistic regression
• Advantage
  • Same approach as a matched case-control study, so familiar to epidemiologists
• Limitations
  • Unfriendly data management
  • Large (huge) datasets at individual level
  • Slow convergence when fitting interaction terms for individual factors (sex, age)
  • Cannot account for overdispersion or residual autocorrelation
Option 2: Poisson regression
• Lu and Zeger (2007)
  • "… the time-stratified case-crossover design leads to the same estimate as obtained from a Poisson regression with dummy variables indicating the strata"
• Strata defined by the 3-way interaction between year, month and day of week
• Either of Stata's poisson or glm commands (the latter with the scale option) can be used
. use madrid, clear
. generate yy = year(date)
. generate mm = month(date)
. generate dow = dow(date)
. poisson y pm10 i.yy##i.mm#i.dow
...
Poisson regression                                Number of obs   =       1096
                                                  LR chi2(252)    =    1374.12
                                                  Prob > chi2     =     0.0000
Log likelihood = -3787.589                        Pseudo R2       =     0.1535

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pm10 |   .0007406   .0002552     2.90   0.004     .0002404    .0012408
             |
         ... |   252 parameters!
             |
       _cons |    4.35927   .0563535    77.36   0.000     4.248819    4.469721
------------------------------------------------------------------------------
. glm y pm10 i.yy##i.mm#i.dow, family(poisson) link(log) scale(x2)
...
Generalized linear models                         No. of obs      =       1096
Optimization     : ML                             Residual df     =        843
                                                  Scale parameter =          1
Deviance         =   1069.73306                   (1/df) Deviance =    1.26896
Pearson          =  1065.616046                   (1/df) Pearson  =   1.264076

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   7.373338
Log likelihood   = -3787.588997                   BIC             =   -4830.78

------------------------------------------------------------------------------
             |                 OIM
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pm10 |   .0007406   .0002869     2.58   0.010     .0001782     .001303
             |
         ... |
             |
       _cons |    4.35927   .0633589    68.80   0.000     4.235088    4.483451
------------------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion.)
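The scaled standard error can be checked by hand from the figures printed above: the Pearson statistic divided by the residual degrees of freedom gives the dispersion, and the unscaled standard error is multiplied by its square root. The scalar names below are illustrative.

```stata
* Figures copied from the poisson and glm output above
scalar pearson = 1065.616046
scalar df      = 843
scalar se_raw  = .0002552      // unscaled standard error of pm10
scalar b       = .0007406

* Pearson-based dispersion and the rescaled standard error
scalar phi       = pearson/df
scalar se_scaled = se_raw*sqrt(phi)
scalar z_scaled  = b/se_scaled

display "dispersion = " %8.6f phi
display "scaled SE  = " %9.7f se_scaled
display "z          = " %5.2f z_scaled
```

The point estimate is unchanged by the scaling. Only the standard error, and with it the z statistic and confidence interval, is affected.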
Option 2: Poisson regression
• Advantages
  • Avoids reshaping the dataset, keeping the original time-series format
  • Can account for overdispersion and residual autocorrelation
• Limitations
  • Large number of interaction parameters, which slows convergence (and can even make it difficult when there are missing data)
  • Even slower when fitting interaction terms for individual factors (sex, age)
Option 3: Conditional Poisson regression
• Armstrong and Gasparrini (ISEE 2011)
  • "… Poisson models with large numbers of indicator variables can alternatively be fit with conditional Poisson models, conditioning on numbers of events in the time stratum"
• First proposed by Farrington (1995) for the self-controlled case-series design for vaccine safety studies
• Strata defined by grouping the same weekday within each month of each year, then use Stata's xtpoisson command with the fe option
• But overdispersion needs to be accounted for with the new xtpois_ovdp command
. egen set = group(yy mm dow)
. xtset set date
...
. xtpoisson y pm10, fe
...
Conditional fixed-effects Poisson regression    Number of obs      =      1096
Group variable: set                             Number of groups   =       252

                                                Obs per group: min =         4
                                                               avg =       4.3
                                                               max =         5

                                                Wald chi2(1)       =      8.42
Log likelihood  = -2854.4979                    Prob > chi2        =    0.0037

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pm10 |   .0007406   .0002552     2.90   0.004     .0002404    .0012408
------------------------------------------------------------------------------

. xtpois_ovdp
Estimate and standard errors corrected for over-dispersion
df: 843 ; pearson x2: 1065.6 ; dispersion: 1.26
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pm10 |   .0007406   .0002869     2.58   0.010     .0001782     .001303
------------------------------------------------------------------------------
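The agreement between the Poisson model with stratum indicators and the conditional Poisson model can be checked on simulated data. The sketch below is an illustration under assumed values (the exposure x, the seed, and the true coefficient are all made up) and does not use madrid.dta or the user-written xtpois_ovdp command.

```stata
* Simulated daily series, 1 Jan 2003 to 31 Dec 2005 (assumption)
clear
set seed 12345
set obs 1096
generate date = mdy(1, 1, 2003) + _n - 1
format date %td
generate x = rnormal(30, 10)                 // made-up exposure
generate y = rpoisson(exp(4 + 0.001*x))      // made-up daily counts

* Time strata: same weekday within each month of each year
generate yy  = year(date)
generate mm  = month(date)
generate dow = dow(date)
egen set = group(yy mm dow)

* Option 2: unconditional Poisson with one indicator per stratum
quietly poisson y x i.set
scalar b_pois  = _b[x]
scalar se_pois = _se[x]

* Option 3: conditional Poisson, strata conditioned out
xtset set date
quietly xtpoisson y x, fe
scalar b_cond  = _b[x]
scalar se_cond = _se[x]

display "poisson   : " %10.7f b_pois "  (SE " %10.7f se_pois ")"
display "xtpoisson : " %10.7f b_cond "  (SE " %10.7f se_cond ")"
```

The two fits should agree up to numerical tolerance, while xtpoisson estimates a single parameter instead of one per stratum.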
CPU time
• My example (3 years of data, 60 events/day)
  • clogit      0.181 sec.
  • poisson     2.111 sec.
  • xtpoisson   0.066 sec.
Option 3: Conditional Poisson regression
• Advantages
  • Avoids reshaping the dataset, keeping the original time-series format
  • Can account for overdispersion and residual autocorrelation
  • Easy fitting of interaction terms for individual factors (sex, age), with faster convergence
• Limitations
  • ?
Summary
• Case-crossover studies can easily be analysed using Stata
• Conditional logistic regression requires unfriendly data management and large (huge) datasets at individual level
• Poisson regression with the 3-way interaction requires fitting a large number of interaction parameters, which makes convergence difficult
• Conditional Poisson regression seems to solve both problems