Analysis of Horse Kick Fatalities in the Prussian Army: A Poisson Regression Study
This study investigates the trend in the number of deaths resulting from horse kicks across 16 corps in the Prussian army from 1875 to 1894. We utilize Poisson regression to assess whether there was a significant trend in fatalities over the years for the guard corps. The Generalized Linear Model is employed, and the analysis of deviance (ANODEV) is conducted to evaluate the fit of our model. Results indicate no significant trend in deaths (ΔG=0.611, p=0.4343), leading to the conclusion that the risk of death from horse kicks remained stable across the two decades studied.
Presentation Transcript
Chapter 17.1 Poisson Regression
Classic Poisson Example
• Number of deaths by horse kick, for each of 16 corps in the Prussian army, from 1875 to 1894
• Did the risk of death show a trend across years for the Guard corps?
1. Construct Model - Formal
Write General Linear Model:
General linear model inappropriate for count data:
• Variance likely increases with mean
• Fitted values may be negative
• Errors tend not to be normal
• Zeros are difficult to handle with transformations
1. Construct Model - Formal
Write General Linear Model:
Write Generalized Linear Model:
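One way to write the two models for this example, consistent with the later call glm(deaths~year, family=poisson(link=log)). The symbols β0, μ, ε and σ² are assumed notation; only βyear appears in the slides.

```latex
% General linear model: normal errors, identity link
\text{deaths} = \beta_0 + \beta_{year}\,\text{year} + \varepsilon,
\qquad \varepsilon \sim \mathrm{Normal}(0, \sigma^2)

% Generalized linear model: Poisson errors, log link
\text{deaths} \sim \mathrm{Poisson}(\mu),
\qquad \log(\mu) = \beta_0 + \beta_{year}\,\text{year}
\;\Longleftrightarrow\;
\mu = e^{\beta_0 + \beta_{year}\,\text{year}}
```

Under the log link the fitted mean μ is always positive and the Poisson variance equals μ, which addresses the first two objections to the general linear model for count data.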
2. Execute analysis & 3. Evaluate model
glm1 <- glm(deaths~year, family=poisson(link=log), data=horsekick)
• State population and whether sample is representative.
• Decide on mode of inference. Is hypothesis testing appropriate?
• State HA / H0 pair, tolerance for Type I error
Statistic: ΔG (change in deviance)
Distribution: chi-square
7. ANODEV. Calculate change in fit (ΔG) due to explanatory variables.
• The F-statistic is not used for models with non-normal errors
• We will assess improvement in fit (ANODEV)
7. ANODEV. Calculate change in fit (ΔG) due to explanatory variables.
> anova(glm1, test="Chisq")
Analysis of Deviance Table
Model: poisson, link: log
Response: deaths
Terms added sequentially (first to last)
     Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                    19     22.050
year  1  0.61137        18     21.439   0.4343
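The p-value in the table can be recovered by hand from the two residual deviances. The following is a minimal sketch that uses only the numbers reported in the table; the variable names are illustrative and not part of the original analysis.

```r
# Deviances as reported in the analysis of deviance table
null_dev  <- 22.050   # intercept-only model, 19 residual df
resid_dev <- 21.439   # model with year, 18 residual df

# Change in deviance attributable to year, and its degrees of freedom
delta_G  <- null_dev - resid_dev
delta_df <- 19 - 18

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(delta_G, df = delta_df, lower.tail = FALSE)

print(round(delta_G, 3))
print(round(p_value, 2))
```

Because the table's deviances are rounded, the result agrees with the reported p=0.4343 only to about three decimal places.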
Assess table in view of evaluation of residuals.
• Residuals acceptable
• Do not reject H0: there was no apparent trend in deaths by horse kick over two decades (ΔG=0.611, p=0.4343)
Analysis of parameters of biological interest.
• βyear was not significant; report mean deaths/yr
• 16 deaths / 20 years = 0.8 deaths/year
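The reported mean corresponds to the intercept-only Poisson model: with a log link, the maximum-likelihood intercept is the log of the sample mean, so back-transforming it returns 0.8 deaths/year. A small sketch of that arithmetic, with illustrative variable names and only the totals given above:

```r
total_deaths <- 16
n_years      <- 20

# Sample mean: deaths per year
mean_rate <- total_deaths / n_years

# Intercept of an intercept-only Poisson GLM on the log scale
b0 <- log(mean_rate)

# Back-transform to the response scale
fitted_rate <- exp(b0)

print(mean_rate)
print(fitted_rate)
```

The intercept itself is negative because the mean is below one death per year; only the back-transformed value is on the scale of counts.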
library(pscl)   # provides the prussian data set
library(Hmisc)  # provides Lag()
prussian
horsekick <- subset(prussian, corp=="G")  # Guard corps only
names(horsekick) <- c("deaths","year","corps")
glm0 <- glm(deaths ~ 1, family = poisson(link = log), data = horsekick)  # intercept only
glm1 <- glm(deaths ~ year, family = poisson(link = log), data = horsekick)
# residuals vs fitted values
plot(glm1, which=1, add.smooth=FALSE, pch=16)
# residuals vs lagged residuals (check for independence)
plot(glm1$residuals, Lag(glm1$residuals), xlab="Residuals", ylab="Lagged residuals", pch=16)
# data with both fitted models overlaid
plot(deaths~year, data=horsekick, pch=16, axes=FALSE, xlab="Year", ylab="Deaths (Guard corps)")
axis(1, at=75:94, labels=1875:1894)
axis(2, at=0:3)
box()
lines(horsekick$year, glm1$fitted)         # with regression term
lines(horsekick$year, glm0$fitted, lty=2)  # intercept
anova(glm1, test="Chisq")