9.7 Poisson regressions for rates. In Section 4.3 we introduced Poisson regression for modeling counts. When outcomes occur over time, space, or some other index of size, it is more relevant to model their rate of occurrence than their raw number.
• To model a rate, we use a GLM with a Poisson distribution, a log link, and log(index) as an offset.
9.7.1 Analyzing Rates Using Loglinear Models with Offsets • When a response count ni has index equal to ti, the sample rate is ni/ti. Its expected value is µi/ti. • With an explanatory variable x, a loglinear model for the expected rate has the form log(µi/ti) = α + βxi. • This model has the equivalent representation log µi - log ti = α + βxi. • The adjustment term, -log ti, to the log link of the mean is called an offset. The fit corresponds to using log ti as a predictor on the right-hand side and forcing its coefficient to equal 1.0.
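As a quick numeric check of the offset representation, the Python sketch below (the slides themselves use SAS) builds the log mean as the linear predictor plus log t with its coefficient fixed at 1, and then recovers the log rate. The values of alpha, beta, x, and t are made up for illustration and do not come from the slides.

```python
import math

# Hypothetical values chosen only for illustration (not from the slides).
alpha, beta = -6.0, 0.5
x, t = 1.0, 1200.0

def log_mean(alpha, beta, x, t):
    """Log link with offset: log(mu) = alpha + beta*x + 1.0*log(t)."""
    return alpha + beta * x + 1.0 * math.log(t)

mu = math.exp(log_mean(alpha, beta, x, t))  # expected count
log_rate = math.log(mu / t)                 # log of the expected rate mu/t
print(mu, log_rate)
```

The printed log rate should agree with alpha + beta*x up to rounding, which is the sense in which the offset turns a model for counts into a model for rates.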
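The expected count is then µi = ti exp(α + βxi), so the mean is proportional to the index, with proportionality constant depending on the value of x. • An alternative is the identity link, µi/ti = α + βxi. It is less useful because the fitting process may fail when it produces negative fitted values. • The log link keeps fitted rates positive but does not bound them above, so when the rate is a proportion the fitted value can exceed 1.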
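The contrast between the two links can be seen with a small Python sketch. The coefficients below are hypothetical and were picked so that the identity link gives a negative fitted value; they are not estimates from any data in these slides.

```python
import math

# Hypothetical coefficients, for illustration only.
def mean_log_link(t, x, alpha=-6.0, beta=0.5):
    """Expected count under the log link with offset log(t)."""
    return t * math.exp(alpha + beta * x)

def mean_identity_link(t, x, alpha=0.002, beta=-0.003):
    """Expected count when the rate itself is linear in x (identity link)."""
    return t * (alpha + beta * x)

# Doubling the index doubles the expected count under the log link.
doubled = mean_log_link(2000.0, 1.0) / mean_log_link(1000.0, 1.0)

# The identity link can yield a negative expected count.
negative = mean_identity_link(1000.0, 1.0)
print(doubled, negative)
```

Under the log link the fitted mean is always positive, whereas nothing stops the linear form from dropping below zero, which is why fitting with the identity link can fail.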
9.7.2 Modeling Death Rates for Heart Valve Operations • Laird and Olivier (1981) analyzed patient survival after heart valve replacement operations. • A sample of 109 patients was classified by type of heart valve (aortic, mitral) and by age (<55, 55+). • Follow-up observations occurred until the patient died or the study ended. • Operations occurred throughout the study period, and follow-up observations covered lengths of time varying from 3 to 97 months. • Response: death, together with the corresponding follow-up time
The time at risk for a subject is their follow-up time of observation. • For a given age and valve type, the total time at risk is the sum of the times at risk for all subjects in that cell (those who died and those censored).
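The aggregation described here can be sketched in Python as follows. The subject-level records are hypothetical (the slides give only cell totals); the point is that both deaths and censored subjects contribute follow-up time to their cell.

```python
from collections import defaultdict

# Hypothetical subject-level records:
# (age group, valve type, months of follow-up, died during follow-up?)
subjects = [
    ("<55", "aortic", 40, False),
    ("<55", "aortic", 12, True),
    ("55+", "mitral", 97, False),
    ("55+", "mitral", 3, True),
    ("55+", "mitral", 25, True),
]

time_at_risk = defaultdict(int)
deaths = defaultdict(int)
for age, valve, months, died in subjects:
    cell = (age, valve)
    time_at_risk[cell] += months   # deaths and censored subjects both add time
    deaths[cell] += int(died)      # only deaths add to the count

for cell in time_at_risk:
    print(cell, deaths[cell], time_at_risk[cell])
```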
We now model effects of age and valve type on the rate: log(µij/tij) = α + β1 ai + β2 vj, where a – age, v – type of valve. • Or, with the identity link, µij/tij = α + β1 ai + β2 vj.
SAS code
data table9_11;
  input age $ vtype $ death totaltime;
  logtime = log(totaltime);
  cards;
<55 aortic 4 1259
<55 mitral 1 2082
55+ aortic 7 1417
55+ mitral 9 1647
;
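Before fitting any model, it can help to look at the sample death rates (deaths divided by total time at risk) in each cell. The Python sketch below uses the four rows of the data step above; the per-1000-months scaling is a presentation choice made here, not something from the slides.

```python
# Cell totals from the data step above: (deaths, total months at risk)
table = {
    ("<55", "aortic"): (4, 1259),
    ("<55", "mitral"): (1, 2082),
    ("55+", "aortic"): (7, 1417),
    ("55+", "mitral"): (9, 1647),
}

# Sample rate n/t for each age-by-valve cell
rates = {cell: d / t for cell, (d, t) in table.items()}

for cell, rate in rates.items():
    print(cell, round(1000 * rate, 2), "deaths per 1000 months")

highest = max(rates, key=rates.get)
lowest = min(rates, key=rates.get)
```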
Model fit
/* age and valve type, log link */
proc genmod data=table9_11;
  class age vtype;
  model death = age vtype / dist = poi link = log offset = logtime lrci type3 obstats;
/* age only */
proc genmod data=table9_11;
  class age vtype;
  model death = age / dist = poi link = log offset = logtime lrci type3 obstats;
/* valve type only */
proc genmod data=table9_11;
  class age vtype;
  model death = vtype / dist = poi link = log offset = logtime lrci type3 obstats;
/*identity link*/
proc genmod data=table9_11;
  class age vtype;
  model death/totaltime = age vtype / dist = poi link = identity lrci type3 obstats;
  ods output obstats=obstats Modelfit=Modelfit;
run;
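The main-effects log-link fit can be cross-checked outside SAS. The following is a minimal Python sketch, not the GENMOD algorithm itself: it runs Newton-Raphson on the Poisson log-likelihood with log(total time) as the offset. The 0/1 indicator coding (1 for age 55+, 1 for mitral) is an assumption made here; SAS's CLASS parameterization uses a different reference level, so the signs of its coefficients may be reversed.

```python
import math

# (age 55+ indicator, mitral indicator, deaths, total months at risk)
cells = [
    (0, 0, 4, 1259.0),   # <55, aortic
    (0, 1, 1, 2082.0),   # <55, mitral
    (1, 0, 7, 1417.0),   # 55+, aortic
    (1, 1, 9, 1647.0),   # 55+, mitral
]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fitted_means(beta, X, offset):
    """mu = exp(offset + X beta): the offset enters with coefficient 1."""
    return [math.exp(o + sum(b * v for b, v in zip(beta, row)))
            for row, o in zip(X, offset)]

def fit_poisson_with_offset(cells, iterations=25):
    X = [[1.0, float(a), float(v)] for a, v, _, _ in cells]
    y = [float(d) for _, _, d, _ in cells]
    offset = [math.log(t) for _, _, _, t in cells]
    p = len(X[0])
    # Start from the common-rate model: intercept = log(total deaths / total time).
    beta = [math.log(sum(y) / sum(c[3] for c in cells))] + [0.0] * (p - 1)
    for _ in range(iterations):
        mu = fitted_means(beta, X, offset)
        # Score vector X'(y - mu) and information matrix X' diag(mu) X
        score = [sum(r[j] * (yi - mi) for r, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        info = [[sum(r[j] * r[k] * mi for r, mi in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta, fitted_means(beta, X, offset)

beta, mu = fit_poisson_with_offset(cells)
print("coefficients (intercept, age 55+, mitral):", [round(b, 3) for b in beta])
print("rate ratio, 55+ vs <55:", round(math.exp(beta[1]), 2))
```

Under this coding, exp(beta[1]) is the estimated ratio of death rates for the older versus the younger age group at a fixed valve type, and the fitted deaths reproduce the observed totals overall, by age group, and by valve type.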
With the identity link, the age coefficient is an estimated difference in death rates between the older and younger age groups for each valve type.
Another example • 2004 birth vital statistics merged with death data in Florida • The predictors: smoking, drinking, education, marital status, Medicaid. • The response: infant death • Purpose: to identify the maternal characteristics of Medicaid beneficiaries that are significantly associated with infant death, so that health care and related services can be focused on the risk factors that contribute to this adverse outcome
/*raw table*/
proc sql;
  create table rawtable as
  select 'smoking' as varlabel, smoking as varlevel, sum(total) as totalsumple, sum(infdth) as totalinfdth
    from birth2004 group by smoking
  union
  select 'drk' as varlabel, drk as varlevel, sum(total) as totalsumple, sum(infdth) as totalinfdth
    from birth2004 group by drk
  union
  select 'edu ' as varlabel, edu as varlevel, sum(total) as totalsumple, sum(infdth) as totalinfdth
    from birth2004 group by edu
  union
  select 'ms ' as varlabel, ms as varlevel, sum(total) as totalsumple, sum(infdth) as totalinfdth
    from birth2004 group by ms
  union
  select 'med ' as varlabel, med as varlevel, sum(total) as totalsumple, sum(infdth) as totalinfdth
    from birth2004 group by med;
data rawtable;
  set rawtable;
  percentage = totalinfdth / totalsumple * 100;
proc print;
run;
/*backward model selection starting from main+2fis*/
proc genmod data=birth2004;
  class smoking drk edu ms med;
  model infdth = smoking drk edu ms med
                 smoking*drk smoking*edu smoking*ms smoking*med
                 drk*edu drk*ms drk*med
                 edu*ms edu*med
                 ms*med
        / dist = poi link = log offset = logtotal lrci type3;
  ods output type3=type3;
run;
proc sort data=type3; by ProbChiSq; run;
proc print data=type3; run;
Main effects + two-factor interactions • The model shows no lack of fit • The model might be too complicated
Backward model selection • Sort the Type 3 table by p-value and delete the least significant term, drk*ms
Continue the backward procedure, but keep a main effect, even if it is not significant, as long as it is included in an interaction that remains in the model • Terms deleted, in order: • smoking*med • drk*med • smoking*edu • edu*med • edu*ms • smoking*drk • smoking*ms • drk*edu • drk
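The rule for choosing the next term to delete can be sketched in Python. The p-values below are hypothetical, not taken from the GENMOD output, and the 0.05 cutoff is an assumption (the slides do not state the threshold used). The helper names `removable` and `next_to_drop` are illustrative.

```python
def removable(term, terms):
    """A main effect may be dropped only if no remaining interaction contains it."""
    if "*" in term:
        return True
    return not any("*" in other and term in other.split("*") for other in terms)

def next_to_drop(pvalues, alpha=0.05):
    """Return the removable term with the largest p-value above alpha, or None."""
    candidates = [t for t, p in pvalues.items()
                  if p > alpha and removable(t, pvalues)]
    return max(candidates, key=pvalues.get) if candidates else None

# Hypothetical Type 3 p-values, for illustration only.
pvalues = {"smoking": 0.01, "drk": 0.80, "edu": 0.02, "ms": 0.30,
           "med": 0.04, "drk*edu": 0.20, "ms*med": 0.01}

first = next_to_drop(pvalues)       # drk is protected by drk*edu at this step
after_first = {t: p for t, p in pvalues.items() if t != first}
second = next_to_drop(after_first)  # in practice, refit before reading p-values again
after_second = {t: p for t, p in after_first.items() if t != second}
third = next_to_drop(after_second)  # ms stays because ms*med is still in the model
print(first, second, third)
```

In a real analysis the model is refitted after every deletion, so the p-values change at each step; the sketch reuses one set of p-values only to show how the hierarchy rule restricts the candidates.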
Final model
proc genmod data=birth2004;
  class smoking drk edu ms med;
  model infdth = smoking edu ms med ms*med
        / dist = poi link = log offset = logtotal lrci type3;
  ods output type3=type3;
run;
proc sort data=type3; by ProbChiSq; run;
proc print data=type3; run;