Data: Crab mating patterns

Data: Challenger (Binomial with random effects) Data: Crab mating patterns (Poisson Regression, ZIP model, Negative Binomial) Flu Data: (Binomial with random effects) Data: Typists (Poisson with random effects)

D.A.D. Mixed (not generalized) Models: Fixed Effects and Random Effects

D.A.D. SAS Global Forum 2010 “Generalized”  non normal distribution Binary for probabilities: Y=0 or 1 Mean p Pr{Y=j}= pj(1-p)(1-j) Link: L=ln(p/(1-p)) = “Logit” Range (over all L): 0<p<1 Poisson for counts: Y in {0,1,2,3,4, ….} Mean count l Pr{Y=j} = exp(- l )(lj)/(j!) Link: L = log(l) Range (over all L): l >0

D.A.D. SAS Global Forum 2010 Generalized (not mixed) linear models. Use link L = g(E{Y}), e.g. ln(p/(1-p)) =ln(E{Y}/(1-E{Y}) Assume L is linear model in the inputs with fixed effects. Estimate model for L, e.g. L=g(E{Y})=bo + b1 X Use maximum likelihood Example: L = -1 + .18*dose Dose = 10, L=0.8, p=exp(0.8)/(1+exp(0.8))= inverse link = 0.86

D.A.D. SAS Global Forum 2010 Challenger was mission 24 From 23 previous launches we have: 6 O-rings per mission Y=0 no damage, Y=1 erosion or blowby p = Pr {Y=1} = f{mission, launch temperature) Features: Random mission effects Logistic link for p procglimmix data=O_ring; class mission; model fail = temp/dist=binomial s; random mission; run; • Generalized • Mixed

Demo O_rings.sas

D.A.D. SAS Global Forum 2010 Estimated G matrix is not positive definite. Covariance Parameter Estimates Cov Standard Parm Estimate Error mission 2.25E-18 . Solutions for Fixed Effects Effect Estimate Error DF t Value Pr > |t| Intercept 5.0850 3.0525 21 1.67 0.1106 temp -0.1156 0.04702 115 -2.46 0.0154

Just logistic regression – no mission variance component

D.A.D. SAS Global Forum 2010 Flu Data CDC Active Flu Virus Weekly Data % positive data FLU; input fluseasn year t week pos specimens; pct_pos=100*pos/specimens; logit=log(pct_pos/100/(1+(pct_pos/100))); label pos = "# positive specimens"; label pct_pos="% positive specimens"; label t = "Week into flu season (first = week 40)"; label week = "Actual week of year"; label fluseasn = "Year flu season started"; logit pct. pos.

Demo Get_Flu.sas

“Sinusoids” S(j) = sin(2pjt/52) C(j)=cos(2pjt/52) PROCGLM DATA=FLU; class fluseasn; model logit = s1 c1 fluseasn*s1 fluseasn*c1 fluseasn*s2 fluseasn*c2 fluseasn*s3 fluseasn*c3 fluseasn*s4 fluseasn*c4; output out=out1 p=p; data out1; set out1; P_hat = exp(p)/(1+exp(p)); label P_hat = "Pr{pos. sample} (est.)"; run; (1) GLM all effects fixed (harmonic main effects insignificant) Logit scale

Demo Flu_GLM.sas

PROCMIXED DATA=FLU method=ml; ** reduced model; class fluseasn; model logit = s1 c1 /outp=outp outpm=outpm ddfm=kr; random intercept/subject=fluseasn; random s1 c1/subject=fluseasn type=toep(1); random s2 c2/subject=fluseasn type=toep(1); random s3 c3/subject=fluseasn type=toep(1); random s4 c4/subject=fluseasn type=toep(1); run; (2) MIXED analysis on logits Random harmonics. Normality assumed Logit scale Probability scale

Demo Flu_MIXED.sas

PROCGLIMMIX DATA=FLU; title2 "GLIMMIX Analysis"; class fluseasn; model pos/specimens = s1 c1 ; * s2 c2 s3 c3 s4 c4; random intercept/subject=fluseasn; random s1 c1/subject=fluseasn type=toep(1); random s2 c2/subject=fluseasn; ** Toep(1) - no converge; random s3 c3/subject=fluseasn type=toep(1); random s4 c4/subject=fluseasn type=toep(1); random _residual_; output out=out2 pred(ilink blup)=pblup pred(ilink noblup) overallpearson = p_resid; run; (3) GLIMMIX analysis Random harmonics. Binomial assumed (overdispersed – lab effects?) Mean – no BLUPs

Demo Flu_GLIMMIX.sas

D.A.D. SAS Global Forum 2010 Flu data Binomial random _residual_ does not affect the fit (just standard errors) Could try Beta distribution instead: PROCGLIMMIX DATA=FLU; title2 "GLIMMIX Analysis"; class fluseasn; model f = s1 c1 /dist=beta link=logit s; random intercept/subject=fluseasn; random s1 c1/subject=fluseasn type=toep(1); random s2 c2/subject=fluseasn type=toep(1); random s3 c3/subject=fluseasn type=toep(1); random s4 c4/subject=fluseasn type=toep(1); output out=out3 pred(ilink blup)=pblup pred(ilink noblup)=overall pearson=p_residbeta; run;

D.A.D. SAS Global Forum 2010 Horseshoe Crab study (reference: SAS GLIMMIX course notes): Female nests have “satellite” males Count data – Poisson?Generalized Linear Features (predictors): Carapace Width, Weight, Color, Spine condition Random Effect: SiteMixed Model

Demo Get_Horseshoe.sas

procglimmix data=crab; class site; model satellites = weight width / dist=poi solution ddfm=kr; random int / subject=site; output out=overdisp pearson=pearson; run; procmeans data=overdisp n mean var; var pearson; run; procunivariate data=crab normal plot; var satellites; run; N Mean Variance --------------------------- 173 -0.0258264 2.6737114 --------------------------- Histogram # Boxplot 15.5+* 1 0 .* 1 0 . 12.5+* 1 | .* 1 | .** 3 | 9.5+** 3 | .*** 6 | .** 4 | 6.5+******* 13 | .******** 15 +-----+ .********** 19 | | 3.5+********** 19 | | .***** 9 *--+--* .******** 16 | | 0.5+******************************* 62 +-----+ ----+----+----+----+----+----+- * may represent up to 2 counts Fit Statistics Gener. Chi-Square / DF 2.77 Cov Parm Subject Estimate Intercept site 0.1625 Effect Estimate Pr > |t| Intercept -1.1019 0.2527 weight 0.5042 0.0035 width 0.0318 0.5229 Zero Inflated ?

Demo Crabs_OVERDISP.sas

D.A.D. SAS Global Forum 2010 Zero Inflated Poisson (ZIP)

D.A.D. SAS Global Forum 2010 Zero Inflated Poisson (ZIP) procnlmixed data=crab; parms b0=0 bwidth=0 bweight=0 c0=-2 c1=0s2u1=1 s2u2=1; x=c0+c1*width+u1; p0 = exp(x)/(1+exp(x)); eta= b0+bwidth*width +bweight*weight +u2; lambda=exp(eta); if satellites=0 then loglike = log(p0 +(1-p0)*exp(-lambda)); else loglike = log(1-p0)+satellites*log(lambda)-lambda-lgamma(satellites+1); expected=(1-p0)*lambda; id p0 expected lambda; model satellites~general(loglike); Random U1 U2~N([0,0],[s2u1,0,s2u2]) subject=site; predict p0+(1-p0)*exp(-lambda) out=out1; run;

D.A.D. SAS Global Forum 2010 Zero Inflated Poisson (ZIP) Parameter Estimates Parameter Estimate t Pr>|t| Lower Upper b0 2.7897 2.55 0.0268 0.3853 5.1942 bwidth -0.0944 -1.65 0.1267-0.2202 0.0314 bweight 0.4649 2.38 0.0366 0.0347 0.8952 c0 13.3739 4.42 0.0010 6.7078 20.0401 c1 -0.5447 -4.61 0.0008 -0.8049 -0.2844 s2u1 0.5114 1.12 0.2852 -0.4905 1.5133 s2u2 0.1054 1.67 0.1239 -0.0339 0.2447

Demo Crabs_ZIP.sas

D.A.D. SAS Global Forum 2010 From fixed part of model, compute Pr{count=j} and plot (3D) versus Weight, Carapace width

D.A.D. SAS Global Forum 2010 Another possibility: Negative binomial Number of failures until kth success ( p=Prob{success} )

D.A.D. SAS Global Forum 2010 3 trials before first success Negative Binomial Crab beer Crab beer Satellites

D.A.D. Negative binomial: In SAS, k is our 1/k SAS Global Forum 2010 procglimmix data=crab; class site; model satellites = weight width / dist=nb solution ddfm=kr; random int / subject=site; run; Fit Statistics -2 Res Log Pseudo-Likelihood 539.06 Generalized Chi-Square 174.83 Gener. Chi-Square / DF 1.03 Covariance Parameter Estimates Cov Parm Subject Estimate Std. Error Intercept site 0.09527 0.07979 Scale 0.7659 0.1349 Standard Effect Estimate Error DF t Value Pr > |t| Intercept -1.2022 1.6883 168.5 -0.71 0.4774 weight 0.6759 0.3239 156.6 2.09 0.0386 width 0.01905 0.08943 166.2 0.21 0.8316 Num Den Effect DF DF F Value Pr > F weight 1 156.6 4.35 0.0386 width 1 166.2 0.05 0.8316

Demo Crabs_NEGBIN.sas

Population average model vs. Individual Specific Model 8 typists Y=Error counts (Poisson distributed) ln(li)= ln(mean of Poisson) = m+Ui for typist i. conditionally (individual specific) Distributions for Y, U~N(0,1) and m=1 l=em=e1=2.7183 = mean for “typical” typist

D.A.D. SAS Global Forum 2010 Population average model Expectation ||||| | | of individual distributions averaged across population of all typists. Run same simulation for 8000 typists, compute mean of conditional population means, exp(m+U). The MEANS Procedure Variable N Mean Std Dev Std Error ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ lambda 8000 4.4280478 6.0083831 0.067175 ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Z=(4.428-2.7183)/0.06718 = 25.46 !! Population mean is not em Conditional means, m+U, are lognormal. Log(Y)~N(1,1)  E{Y}=exp(m+0.5s2) = e1.5 = 4.4817

Demo Typists.sas

D.A.D. SAS Global Forum 2010 • Main points: • Generalized linear models with random effects are subject specific models. • Subject specific models have fixed effects that represent an individual with random effects 0 (individual at the random effect distributional means). • Subject specific models when averaged over the subjects do not give the model fixed effects. • Models with only fixed effects do give the fixed effect part of the model when averaged over subjects and are thus called population average models.

Data: Crab mating patterns