This analysis provides detailed statistical methods and results for a study on endometrial cancer, contrasting correct and incorrect analyses. Significant findings and methodological considerations are outlined thoroughly.
 
                
Matched designs need matched analysis
Incorrect unmatched analysis

    . cc cc exp,exact
                                                             Proportion
                     |   Exposed   Unexposed  |      Total     Exposed
    -----------------+------------------------+----------------------
               Cases |       106         515  |        621      0.1707
            Controls |        95         526  |        621      0.1530
    -----------------+------------------------+----------------------
               Total |       201        1041  |       1242      0.1618
                     |                        |
                     |      Point estimate    |  [95% Conf. Interval]
                     |------------------------+----------------------
          Odds ratio |         1.139622       |  .8327338   1.560771  (exact)
     Attr. frac. ex. |          .122516       |  -.200864    .359291  (exact)
     Attr. frac. pop |         .0209125       |
                     +-----------------------------------------------
                                    1-sided Fisher's exact P = 0.2205
                                    2-sided Fisher's exact P = 0.4411

• This analysis ignores that a matching control was found for each case.
• The ‘sample size’ appears to be 1242, yet there is no evidence of a disease-exposure relationship.
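The unmatched table can also be re-entered from its four cell counts, which allows the slide to be checked without the data set. This is a minimal sketch, assuming Stata's immediate command `cci` with the counts given in the order exposed cases, unexposed cases, exposed controls, unexposed controls; the hand calculation of the cross-product odds ratio is added only for illustration.

```stata
* Unmatched (incorrect) analysis re-entered from the cell counts of the table
cci 106 515 95 526, exact

* Cross-product odds ratio by hand: (exposed cases * unexposed controls) /
* (unexposed cases * exposed controls)
display (106*526)/(515*95)
```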
Correct classical analysis

    . reshape wide cc exp,i(pair) j(ct)
    (note: j = 1 2)

    Data                               long   ->   wide
    -----------------------------------------------------------------------------
    Number of obs.                     1242   ->     621
    Number of variables                   4   ->       5
    j variable (2 values)                ct   ->   (dropped)
    xij variables:
                                         cc   ->   cc1 cc2
                                        exp   ->   exp1 exp2
    -----------------------------------------------------------------------------

    . mcc exp2 exp1

                     | Controls               |
    Cases            |   Exposed   Unexposed  |      Total
    -----------------+------------------------+----------
             Exposed |        90          16  |        106
           Unexposed |         5         510  |        515
    -----------------+------------------------+----------
               Total |        95         526  |        621

    McNemar's chi2(1) =      5.76    Prob > chi2 = 0.0164
    Exact McNemar significance probability       = 0.0266

    odds ratio       3.2      1.120172   11.16902   (exact)

• The effective ‘sample size’ is only 21, the number of discordant pairs (16 + 5)!
• Yet the p-value is below 5%, and the estimated odds ratio (16/5 = 3.2) is very different from that of the incorrect analysis.
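The matched-pairs results depend only on the two discordant cells, 16 and 5. As a hedged sketch, the immediate command `mcci` (assumed argument order: both exposed, case only exposed, control only exposed, neither exposed) should rebuild the table, and two `display` lines redo McNemar's statistic and the odds ratio by hand.

```stata
* Matched-pairs table re-entered from the four pair counts
mcci 90 16 5 510

* McNemar's chi-squared uses only the discordant pairs: (b - c)^2 / (b + c)
display (16 - 5)^2/(16 + 5)

* Matched odds ratio is the ratio of the discordant counts
display 16/5
```

The 600 concordant pairs (90 + 510) do not enter either calculation.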
Exact p-value is just the binomial

    . bitesti 21 16 0.5

            N   Observed k   Expected k   Assumed p   Observed p
    ------------------------------------------------------------
           21         16         10.5       0.50000      0.76190

      Pr(k >= 16)            = 0.013302  (one-sided test)
      Pr(k <= 16)            = 0.996401  (one-sided test)
      Pr(k <= 5 or k >= 16)  = 0.026604  (two-sided test)
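The tail probabilities reported by `bitesti` can be reproduced with Stata's `binomialtail()` function, which returns the probability of k or more successes in n trials. This is a small sketch for checking the numbers, not part of the original analysis.

```stata
* Upper tail: 16 or more of the 21 discordant pairs with the case exposed, p = 0.5
display binomialtail(21, 16, 0.5)

* With p = 0.5 the distribution is symmetric, so the two-sided value is twice that
display 2*binomialtail(21, 16, 0.5)
```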
Conditional logistic regression version of the correct classical analysis

    . clogit exp cc,group(pair)
    note: multiple positive outcomes within groups encountered.
    note: 600 groups (1200 obs) dropped due to all positive or
          all negative outcomes.

    Conditional (fixed-effects) logistic regression   Number of obs   =         42
                                                      LR chi2(1)      =       6.06
    ------------------------------------------------------------------------------
             exp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              cc |   1.163151   .5123475     2.27   0.023     .1589681    2.167334
    ------------------------------------------------------------------------------

    . clogit exp cc,group(pair) or
    note: multiple positive outcomes within groups encountered.
    note: 600 groups (1200 obs) dropped due to all positive or
          all negative outcomes.

    Conditional (fixed-effects) logistic regression   Number of obs   =         42
                                                      LR chi2(1)      =       6.06
    ------------------------------------------------------------------------------
             exp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              cc |        3.2   1.639512     2.27   0.023     1.172301    8.734961
    ------------------------------------------------------------------------------

• P-values / CIs are based on the normal approximation to the binomial.
• 600 concordant pairs are correctly ‘dropped’
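The two `clogit` tables are the same fit on different scales: the `or` option exponentiates the coefficient and its confidence limits. The sketch below checks this with `display`, using values copied from the output above; the closed-form standard error `sqrt(1/16 + 1/5)` is the usual large-sample result for 1:1 matched pairs and is added here as an illustration.

```stata
* Odds ratio and confidence limits are the exponentiated coefficient and limits
display exp(1.163151)
display exp(.1589681)
display exp(2.167334)

* For matched pairs the standard error of the log odds ratio depends
* only on the discordant counts 16 and 5
display sqrt(1/16 + 1/5)

* Delta-method standard error on the odds-ratio scale: OR * se(log OR)
display 3.2*.5123475
```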
4 matching controls per case
• LA Study of endometrial cancer

    . use "C:\Mdsc643.02\la.dta", clear
    (LA Study of Endometrial Cancer)

    . desc

    Contains data from C:\Mdsc643.02\la.dta
      obs:           315                          LA Study of Endometrial Cancer
     vars:            11                          24 Nov 1997 15:43
     size:        15,120 (98.6% of memory free)   (_dta has notes)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    row             float  %9.0g
    age             float  %9.0g                  Age (yr)
    gbd             float  %9.0g       yn         Gall Bladder Disease
    hyp             float  %9.0g       yn         Hyertension
    obe             float  %9.0g       yn         Obesity
    est             float  %9.0g       yn         Estrogen (Any) Use
    conj            float  %9.0g       cl         Conjugated Dose
    dur             float  %9.0g                  Estrogen Duration (mo)
    ned             float  %9.0g       yn         Non Estrogen Drug
    cc              float  %9.0g       ccl        Case/Control
    quint           float  %9.0g                  4 Controls: 1 Case
    -------------------------------------------------------------------------------
    Sorted by:  quint
Incorrect analysis

    . cc cc est,exact
                                                             Proportion
                     |   Exposed   Unexposed  |      Total     Exposed
    -----------------+------------------------+----------------------
               Cases |        56           7  |         63      0.8889
            Controls |       127         125  |        252      0.5040
    -----------------+------------------------+----------------------
               Total |       183         132  |        315      0.5810
                     |                        |
                     |      Point estimate    |  [95% Conf. Interval]
                     |------------------------+----------------------
          Odds ratio |         7.874016       |   3.38601   21.15736  (exact)
     Attr. frac. ex. |             .873       |  .7046671   .9527351  (exact)
     Attr. frac. pop |             .776       |
                     +-----------------------------------------------
                                    1-sided Fisher's exact P = 0.0000
                                    2-sided Fisher's exact P = 0.0000
Classical analysis

    . drop row
    . sort quint cc
    . by quint: gen otf=_n
    . reshape wide cc age gbd hyp obe est conj dur ned , i(quint) j(otf)
    (note: j = 1 2 3 4 5)

    Data                               long   ->   wide
    -----------------------------------------------------------------------------
    Number of obs.                      315   ->      63
    Number of variables                  11   ->      46
    j variable (5 values)               otf   ->   (dropped)
    xij variables:
                                         cc   ->   cc1 cc2 ... cc5
                                        age   ->   age1 age2 ... age5
                                        gbd   ->   gbd1 gbd2 ... gbd5
                                        hyp   ->   hyp1 hyp2 ... hyp5
                                        obe   ->   obe1 obe2 ... obe5
                                        est   ->   est1 est2 ... est5
                                       conj   ->   conj1 conj2 ... conj5
                                        dur   ->   dur1 dur2 ... dur5
                                        ned   ->   ned1 ned2 ... ned5
    -----------------------------------------------------------------------------
A new table

    . gen sumcon=est1+est2+est3+est4
    . gen sumcas=est5
    . table sumcas sumcon

    ----------------------------------------
              |           sumcon
       sumcas |    0     1     2     3     4
    ----------+-----------------------------
            0 |          4     1     1     1
            1 |    3    17    16    15     5
    ----------------------------------------

• There are 5 concordant matched sets (the case and all 4 controls exposed); they carry no information.
• Exact p-values are based on binomial tests with p = 1/5, 2/5, 3/5 and 4/5
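The binomial probabilities 1/5 to 4/5 follow from a short conditional argument, sketched here as a reading of the slide rather than as the author's own derivation. Let m be the number of exposed members in a matched set of 5 (1 case plus 4 controls). Under the null hypothesis of no association, given m, the case is equally likely to be any of the 5 members, so:

```latex
\[
  \Pr(\text{case exposed} \mid m \text{ of the 5 members exposed}) = \frac{m}{5},
  \qquad m = 1, 2, 3, 4 .
\]
% Sets with m = 0 or m = 5 are concordant: the case's exposure is then fixed,
% so they contribute nothing to the test.
% For each m, N is the number of sets with that m and k is the number of those
% sets in which the case is exposed, e.g. N = 7, k = 3 for m = 1.
```

Each value of m therefore gives its own binomial test, which is why four separate `bitesti` calls appear, one per stratum.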
Components to the p-value

    . bitesti 7 3 .2

            N   Observed k   Expected k   Assumed p   Observed p
    ------------------------------------------------------------
            7          3          1.4       0.20000      0.42857

      Pr(k >= 3)  = 0.148032  (one-sided test)
      Pr(k <= 3)  = 0.966656  (one-sided test)
      Pr(k >= 3)  = 0.148032  (two-sided test)

      note: lower tail of two-sided p-value is empty

    . bitesti 18 17 .4

            N   Observed k   Expected k   Assumed p   Observed p
    ------------------------------------------------------------
           18         17          7.2       0.40000      0.94444

      Pr(k >= 17) = 0.000002  (one-sided test)
      Pr(k <= 17) = 1.000000  (one-sided test)
      Pr(k >= 17) = 0.000002  (two-sided test)

      note: lower tail of two-sided p-value is empty

    . return list

    scalars:
                  r(p) =  1.92414534861e-06
Next 2 p-values

    . bitesti 17 16 .6

            N   Observed k   Expected k   Assumed p   Observed p
    ------------------------------------------------------------
           17         16         10.2       0.60000      0.94118

      Pr(k >= 16)            = 0.002088  (one-sided test)
      Pr(k <= 16)            = 0.999831  (one-sided test)
      Pr(k <= 3 or k >= 16)  = 0.002539  (two-sided test)

    . bitesti 16 15 .8

            N   Observed k   Expected k   Assumed p   Observed p
    ------------------------------------------------------------
           16         15         12.8       0.80000      0.93750

      Pr(k >= 15)            = 0.140737  (one-sided test)
      Pr(k <= 15)            = 0.971853  (one-sided test)
      Pr(k <= 10 or k >= 15) = 0.222425  (two-sided test)
Correct p-value

    TITLE
          STB-49 sbe28.  Meta-analysis of p values.

    DESCRIPTION/AUTHOR(S)
          STB insert by Aurelio Tobias, Statistical Consultant, Madrid, Spain.
          Support:  bledatobias@ctv.es
          After installation, see help metap.

    INSTALLATION FILES                      (click here to install)
          sbe28/metap.ado
          sbe28/metap.hlp

    ANCILLARY FILES                         (click here to get)
          sbe28/fleiss.dta
Using the STB ado file

    . input pvar

              pvar
      1. 0.148032
      2. 1.92414534861e-06
      3. 0.002539
      4. 0.222425
      5. end

    . metap pvar

    Meta-analysis of p_values
    ------------------------------------------------------------
    Method              |   chi2         p_value     studies
    --------------------+---------------------------------------
    Fisher              |  45.101012     3.521e-07       4
    ------------------------------------------------------------
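Fisher's method, as reported by `metap`, combines k p-values into the statistic -2 times the sum of their logs and refers it to a chi-squared distribution with 2k degrees of freedom. The lines below are a hand check of the table above using only built-in functions (`ln()` and `chi2tail()`); they are a sketch for verification and do not replace the `metap` command.

```stata
* Fisher's combined statistic: -2 * sum of log p-values (four strata)
display -2*(ln(0.148032) + ln(1.92414534861e-06) + ln(0.002539) + ln(0.222425))

* Upper-tail probability on 2*4 = 8 degrees of freedom,
* using the chi2 value reported by metap
display chi2tail(8, 45.101012)
```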
Conditional logistic version

    . clogit est cc,group(quint)
    note: multiple positive outcomes within groups encountered.
    note: 5 groups (25 obs) dropped due to all positive or
          all negative outcomes.

    Conditional (fixed-effects) logistic regression   Number of obs   =        290
                                                      LR chi2(1)      =      35.35
    ------------------------------------------------------------------------------
             est |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              cc |    2.07376   .4208244     4.93   0.000     1.248959    2.898561
    ------------------------------------------------------------------------------

    . clogit est cc,group(quint) or
    note: multiple positive outcomes within groups encountered.
    note: 5 groups (25 obs) dropped due to all positive or
          all negative outcomes.

    Conditional (fixed-effects) logistic regression   Number of obs   =        290
                                                      LR chi2(1)      =      35.35
                                                      Prob > chi2     =     0.0000
    Log likelihood = -99.934552                       Pseudo R2       =     0.1503
    ------------------------------------------------------------------------------
             est | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              cc |   7.954675   3.347522     4.93   0.000     3.486712      18.148
    ------------------------------------------------------------------------------
Reversing Case/Control and Exposure

    . clogit cc est,group(quint)

    Conditional (fixed-effects) logistic regression   Number of obs   =        315
                                                      LR chi2(1)      =      35.35
                                                      Prob > chi2     =     0.0000
    Log likelihood = -83.72159                        Pseudo R2       =     0.1743
    ------------------------------------------------------------------------------
              cc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             est |   2.073761   .4208245     4.93   0.000      1.24896    2.898562
    ------------------------------------------------------------------------------

    . clogit cc est,group(quint) or

    Conditional (fixed-effects) logistic regression   Number of obs   =        315
                                                      LR chi2(1)      =      35.35
                                                      Prob > chi2     =     0.0000
    Log likelihood = -83.72159                        Pseudo R2       =     0.1743
    ------------------------------------------------------------------------------
              cc | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             est |   7.954681   3.347525     4.93   0.000     3.486714    18.14802
    ------------------------------------------------------------------------------
Assessment of potential confounder

    . clogit est hyp cc,group(quint) or
    note: multiple positive outcomes within groups encountered.
    note: 5 groups (25 obs) dropped due to all positive or
          all negative outcomes.

    Iteration 0:   log likelihood = -93.816541
    Iteration 1:   log likelihood = -93.775297
    Iteration 2:   log likelihood = -93.775233
    Iteration 3:   log likelihood = -93.775233

    Conditional (fixed-effects) logistic regression   Number of obs   =        290
                                                      LR chi2(2)      =      47.66
                                                      Prob > chi2     =     0.0000
    Log likelihood = -93.775233                       Pseudo R2       =     0.2026
    ------------------------------------------------------------------------------
             est | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             hyp |   3.175014   1.099597     3.34   0.001     1.610462    6.259518
              cc |   7.423919   3.149142     4.73   0.000     3.232684    17.04917
    ------------------------------------------------------------------------------
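One informal way to read this output is to compare the odds ratio for `cc` with and without `hyp` in the model. The relative-change calculation below is an assumption about how the assessment might be summarised; the slides themselves do not state a criterion. Both odds ratios are copied from the `clogit` output.

```stata
* Percentage change in the cc odds ratio after adding hyp to the model
* (7.954675 without hyp, 7.423919 with hyp)
display 100*(7.954675 - 7.423919)/7.954675
```

The change is under 7%, so adjusting for hypertension moves the estimated association only modestly.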
Assessment of age as a potential modifier (even though age was a part of the matching criteria)

    . gen ac=age*cc
    . clogit est hyp cc ac,group(quint) or
    note: multiple positive outcomes within groups encountered.
    note: 5 groups (25 obs) dropped due to all positive or
          all negative outcomes.

    Iteration 0:   log likelihood =   -94.0779
    Iteration 1:   log likelihood = -93.779779
    Iteration 2:   log likelihood = -93.774339
    Iteration 3:   log likelihood = -93.774338

    Conditional (fixed-effects) logistic regression   Number of obs   =        290
                                                      LR chi2(3)      =      47.67
                                                      Prob > chi2     =     0.0000
    Log likelihood = -93.774338                       Pseudo R2       =     0.2027
    ------------------------------------------------------------------------------
             est | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             hyp |   3.173201   1.100016     3.33   0.001     1.608502    6.259989
              cc |   6.085438   28.69545     0.38   0.702     .0005895    62816.24
              ac |   1.002775   .0656825     0.04   0.966     .8819609    1.140139
    ------------------------------------------------------------------------------
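The interaction term can also be judged by a likelihood-ratio test comparing this model with the one that omits `ac`. The sketch below does the arithmetic by hand from the two reported log likelihoods (-93.775233 without `ac`, -93.774338 with it); it is an illustrative check rather than output from the original analysis.

```stata
* Likelihood-ratio statistic: twice the gain in log likelihood from adding ac
display 2*(-93.774338 - (-93.775233))

* Refer it to a chi-squared distribution with 1 degree of freedom
display chi2tail(1, 2*(-93.774338 - (-93.775233)))
```

The statistic is close to zero, so the resulting p-value should sit near the Wald p-value of 0.966 shown for `ac`.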
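Notice…
• …that age*cc is included in the model even though age itself is not.
• This is a special case where we CAN interpret a model with an interaction term even though one of the constituents of the interaction is not in the model: age was a matching criterion, so it is essentially constant within each matched set and its main effect is absorbed by conditioning on the set.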