
Introduction to Generalized Linear Model (GLM)

Man Li, Research Fellow, International Food Policy Research Institute. Technical Training for Modeling Scenarios for Low Emission Development Strategies, September 9th–20th, 2013.


Presentation Transcript


  1. Introduction to Generalized Linear Model (GLM). Man Li, Research Fellow, International Food Policy Research Institute. Technical Training for Modeling Scenarios for Low Emission Development Strategies, September 9th–20th, 2013

  2. What is GLM?
     • In statistics, the GLM is a flexible generalization of ordinary linear (OL) regression that allows the response variable (Y) to follow a distribution other than the normal.
     • The GLM generalizes linear regression by relating the linear model to Y via a LINK FUNCTION, i.e., $E(Y) = \mu = g^{-1}(X\beta)$, where $g$ is the link function such that $g(\mu) = X\beta$.

  3. Common distributions with typical uses and canonical link functions
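
As a rough companion to this slide, R's built-in family objects can be asked which link they use by default. For the five families listed in this sketch, the default coincides with the canonical link. The choice of families is an assumption made for illustration and may not match the slide's own table.

```r
# Default (canonical) link reported by each standard family object in base R.
families <- list(
  gaussian         = gaussian(),
  binomial         = binomial(),
  poisson          = poisson(),
  Gamma            = Gamma(),
  inverse.gaussian = inverse.gaussian()
)

# Extract the name of the link function stored in each family object.
links <- sapply(families, function(f) f$link)
print(links)
```

Passing one of these family objects to `glm()` through its `family` argument selects both the response distribution and the link.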

  4. Logit Regression for Binary Responses
     • Example: Survival and gender in the Donner party, an observational study
     In 1846 the Donner families left Springfield, Illinois for California by covered wagon. When they reached Fort Bridger, Wyoming in July, the Donner party decided to attempt a new and untested route to the Sacramento Valley. Having reached its full size of 87 people and 20 wagons, the party was delayed in the difficult crossing of the Wasatch Range and again in the crossing of the desert west of the Great Salt Lake. The group became stranded in the eastern Sierra Nevada mountains when hit by heavy snows in late October. By the time the last survivor was rescued on 21 April 1847, 40 of the 87 members had died from famine and exposure to extreme cold.

  5. Example: Donner Party Deaths
     Ages and sexes of the adults (over 15 years) in the party
     • These data were used to study the theory that females are better able to withstand harsh conditions than males are.

  6. Example: Donner Party Deaths
     • Question: For a given age, were women more likely to survive than men?
     • If a linear model is used:
       • $E(Y_i \mid X_i) = X_i\beta$, with i.i.d. errors
       • Y = 1 if survived, = 0 if died
       • X = (age, sex)

  7. Ordinary Linear Regression
     • Fitted model: Y = 0.747 – 0.013*age + 0.319*I[sex=female]

  8. Ordinary Linear Regression with Interaction Term
     • Fitted model: Y = 0.535 – 0.006*age + 1.091*I[sex=female] – 0.025*age*I[sex=female]
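
One reason a straight-line fit is awkward for a 0/1 response is that its fitted values are not confined to [0, 1]. The sketch below plugs two ages into the interaction model reported on slide 8. The ages 15 and 65 are hypothetical inputs chosen for illustration, not observations from the data, and `lpm_interaction` is a helper name invented here.

```r
# Fitted linear model with interaction, coefficients copied from slide 8.
# female = 1 for a woman, 0 for a man.
lpm_interaction <- function(age, female) {
  0.535 - 0.006 * age + 1.091 * female - 0.025 * age * female
}

# Hypothetical inputs (assumed for illustration only).
young_woman <- lpm_interaction(age = 15, female = 1)
older_woman <- lpm_interaction(age = 65, female = 1)

# A "probability" above 1 or below 0 signals that the linear form is strained.
print(c(young_woman = young_woman, older_woman = older_woman))
print(c(young_woman > 1, older_woman < 0))
```

Fitted values outside the unit interval cannot be read as survival probabilities, which is part of the motivation for moving to a logit link.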

  9. Logit Regression
     • Model:
       • $Y_i \mid X_i \sim \mathrm{Bin}(1, \pi_i)$ (independent)
       • $g(\pi_i) = \log\big(\pi_i / (1 - \pi_i)\big) = X_i\beta$
       • Y = 1 if survived, = 0 if died
       • X = (age, sex)
     • Null model: log odds of survival = $\beta_0 + \beta_1\,\text{age} + \beta_2\,I[\text{sex} = \text{female}]$
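
The logit link and its inverse can be written in a couple of lines. This is a minimal sketch: the names `logit` and `inv_logit` are chosen here, and base R's `qlogis()` and `plogis()` compute the same two maps.

```r
# Logit link g(pi) = log(pi / (1 - pi)) and its inverse g^{-1}(eta).
logit <- function(p) log(p / (1 - p))
inv_logit <- function(eta) 1 / (1 + exp(-eta))

p <- c(0.1, 0.5, 0.9)
eta <- logit(p)

# The hand-written versions agree with base R's qlogis() and plogis().
stopifnot(isTRUE(all.equal(eta, qlogis(p))))
stopifnot(isTRUE(all.equal(inv_logit(eta), plogis(eta))))

# Any real-valued linear predictor is mapped back into (0, 1).
print(inv_logit(c(-5, 0, 5)))
```

Because the inverse link always returns a value strictly between 0 and 1, the fitted probabilities of a logit model stay in range whatever the value of the linear predictor.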

  10. Possible problems
     • The logit may not be a straight-line function of age
       • Test a quadratic age term separately for males and females (Wald test), X = (age, agesq)
     • Slopes may not be the same for males and females
       • Test for the significance of the interaction term (Wald test), X = (age, sex, age*sex)
     • Alternative to Wald: likelihood ratio test
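
The two kinds of test named above can be sketched with `glm()`. The data here are simulated purely for illustration: the sample size, the age range, and the "true" coefficients are all assumptions, and nothing below uses the Donner data.

```r
set.seed(1)

# Simulated stand-in data (assumed values, NOT the Donner data).
n <- 200
age <- sample(15:65, n, replace = TRUE)
female <- rbinom(n, 1, 0.5)
eta <- 1.5 - 0.07 * age + 1.2 * female       # assumed log odds for the simulation
survived <- rbinom(n, 1, plogis(eta))
sim <- data.frame(survived, age, female)

# Null model and model with an age-by-sex interaction.
fit0 <- glm(survived ~ age + female, family = binomial, data = sim)
fit1 <- glm(survived ~ age * female, family = binomial, data = sim)

# Wald test: z statistic and p-value of the interaction coefficient.
wald <- summary(fit1)$coefficients["age:female", c("z value", "Pr(>|z|)")]
print(wald)

# Likelihood ratio test comparing the nested models.
lrt <- anova(fit0, fit1, test = "Chisq")
print(lrt)

# The same LR statistic computed by hand: 2 * (LL1 - LL0), 1 degree of freedom.
lr_stat <- as.numeric(2 * (logLik(fit1) - logLik(fit0)))
lr_p <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

# Quadratic age term tested within one sex group (Wald test on I(age^2)).
fit_q <- glm(survived ~ age + I(age^2), family = binomial,
             data = subset(sim, female == 1))
print(summary(fit_q)$coefficients)
```

The Wald test reads a single coefficient's z statistic from the larger model, while the likelihood ratio test refits both models and compares their log-likelihoods. In small samples the two can disagree.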

  11. Exercise
     • Open the R program code located at ftp://ftp.cgiar.org/ifpri/leds2013sep/GLM/GLM_code.R
     • Load the data set named “donner”
     • Define the indicator variables “survival” and “sex”
     • Draw a scatterplot: survival vs. age by gender

  12. Exercise
     • Estimate the null model; examine the sign and the p-value of the age and sex variables
     • Test for the quadratic term of age by gender group
     • Test for the interaction of sex and age
     • Draw two fitted plots: the null model and the model with the interaction term
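
The first steps of the exercise might look roughly like the sketch below. The data frame `toy`, its column names (`age`, `gender`, `status`), and its rows are hypothetical placeholders. The actual layout of the "donner" data is not shown in the slides, so the names would need adjusting.

```r
# Hypothetical placeholder rows; NOT the donner data from the course files.
toy <- data.frame(
  age    = c(20, 25, 30, 35, 40, 45, 50, 55),
  gender = c("Female", "Male", "Female", "Male",
             "Female", "Male", "Female", "Male"),
  status = c("Survived", "Survived", "Survived", "Died",
             "Survived", "Died", "Died", "Died")
)

# Indicator variables: 1 = survived / female, 0 = died / male.
toy$survival <- as.integer(toy$status == "Survived")
toy$sex <- as.integer(toy$gender == "Female")

# Scatterplot of survival against age, with a different symbol per gender.
plot(toy$age, toy$survival,
     pch = ifelse(toy$sex == 1, 1, 16),
     xlab = "Age", ylab = "Survival (1 = survived, 0 = died)", yaxt = "n")
axis(2, at = c(0, 1))
legend("right", legend = c("Female", "Male"), pch = c(1, 16))
```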

  13. What Do the Results Look Like?
     • H0 model: log odds of survival = 1.633 – 0.078*age + 1.597*I[sex=female]
     • H1 model: log odds of survival = 0.318 – 0.032*age + 6.928*I[sex=female] – 0.025*age*I[sex=female]
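
The reported coefficients can be turned into odds ratios and fitted survival curves without refitting anything. In this sketch the coefficients are copied from the slide as printed, while the 15 to 65 plotting range and the helper names `p_h0` and `p_h1` are assumptions made here.

```r
# Coefficients copied from the slide as printed.
b_h0 <- c(intercept = 1.633, age = -0.078, female = 1.597)
b_h1 <- c(intercept = 0.318, age = -0.032, female = 6.928, age_female = -0.025)

# Fitted survival probability = inverse logit of the log odds.
p_h0 <- function(age, female) {
  plogis(b_h0[["intercept"]] + b_h0[["age"]] * age + b_h0[["female"]] * female)
}
p_h1 <- function(age, female) {
  plogis(b_h1[["intercept"]] + b_h1[["age"]] * age + b_h1[["female"]] * female +
           b_h1[["age_female"]] * age * female)
}

# Under H0, exp(coefficient) is an odds ratio holding the other variable fixed.
or_female <- exp(b_h0[["female"]])      # women vs. men of the same age
or_10yr   <- exp(10 * b_h0[["age"]])    # ten years older, same sex
print(c(or_female = or_female, or_10yr = or_10yr))

# Fitted curves over an assumed age range, one panel per model.
ages <- 15:65
op <- par(mfrow = c(1, 2))
plot(ages, p_h0(ages, 1), type = "l", lty = 1, ylim = c(0, 1),
     xlab = "Age", ylab = "Estimated Pr(survival)", main = "H0 model")
lines(ages, p_h0(ages, 0), lty = 2)
legend("topright", legend = c("Female", "Male"), lty = c(1, 2))
plot(ages, p_h1(ages, 1), type = "l", lty = 1, ylim = c(0, 1),
     xlab = "Age", ylab = "Estimated Pr(survival)", main = "H1 model")
lines(ages, p_h1(ages, 0), lty = 2)
par(op)
```

Under the H0 model the two curves are the same logistic shape shifted horizontally. The interaction term in H1 lets the curves for women and men have different steepness.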

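  14. Logit Regression for Multiple Responses
     • $Y_i \mid X_i \sim \mathrm{Mult}(m_i, \pi_{1i}, \pi_{2i}, \ldots, \pi_{Ki})$, $\sum_k \pi_{ki} = 1$, $Y = 1, 2, \ldots, K$ (K-category response)
     • There are K−1 logit models:
         $\log(\pi_{1i} / \pi_{Ki}) = X_i\beta_1$
         $\log(\pi_{2i} / \pi_{Ki}) = X_i\beta_2$
         …
         $\log(\pi_{K-1,i} / \pi_{Ki}) = X_i\beta_{K-1}$
       Note: $\beta_K$ is normalized to be 0
     • Rewrite the probabilities:
         $\Pr(Y_i = 1) = \exp(X_i\beta_1) / \sum_k \exp(X_i\beta_k)$
         $\Pr(Y_i = 2) = \exp(X_i\beta_2) / \sum_k \exp(X_i\beta_k)$
         …
         $\Pr(Y_i = K-1) = \exp(X_i\beta_{K-1}) / \sum_k \exp(X_i\beta_k)$
         $\Pr(Y_i = K) = \exp(X_i\beta_K) / \sum_k \exp(X_i\beta_k)$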
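
The probability formulas on this slide amount to a normalized exponential with the last category as baseline. The sketch below computes them for a single observation. The function name `mnl_probs`, the covariate vector, and the coefficient values are hypothetical.

```r
# Category probabilities for one observation under the multinomial logit,
# with the last category K as baseline (beta_K = 0).
mnl_probs <- function(x, B) {
  # x: covariate vector of length p (including the intercept term)
  # B: p x (K-1) matrix whose columns are beta_1, ..., beta_{K-1}
  eta <- c(drop(x %*% B), 0)   # linear predictors; 0 for the baseline category
  eta <- eta - max(eta)        # stabilizes exp() without changing the ratios
  exp(eta) / sum(exp(eta))
}

# Hypothetical inputs for K = 3: an intercept and one covariate.
x <- c(1, 2.0)
B <- cbind(c(0.5, -0.2), c(-1.0, 0.3))
p <- mnl_probs(x, B)
print(p)

# The K-1 log odds against the baseline recover X_i beta_k.
print(log(p[1:2] / p[3]))
print(drop(x %*% B))
```

Subtracting the largest linear predictor before exponentiating is a numerical safeguard only. It cancels in the ratio and leaves the probabilities unchanged.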

  15. Logit Regression for Multiple Responses
     • Maximum likelihood estimation: choose β to maximize the log-likelihood
         $LL(\beta) = \sum_i \sum_k I[Y_i = k] \log \Pr(Y_i = k) = \sum_i \sum_k I[Y_i = k] \log\left[\exp(X_i\beta_k) / \sum_j \exp(X_i\beta_j)\right]$
     • Goodness of fit: likelihood ratio index $\rho = 1 - LL(\hat{\beta}) / LL(0)$
     • The coefficients $\beta_k$ are difficult to interpret; marginal effects are generally used to get an economic interpretation
     • Marginal effects: given a one-unit change in $X_i$, by how much does the probability of each outcome of $Y_i$ change?
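
For a continuous covariate, differentiating the probabilities of the multinomial logit gives the standard marginal effect π_k (β_k − Σ_j π_j β_j). The sketch below evaluates it at one covariate vector and checks it against a finite difference. The function names and all numeric inputs are hypothetical.

```r
# Multinomial logit probabilities with baseline category K (beta_K = 0).
mnl_probs <- function(x, B) {
  eta <- c(drop(x %*% B), 0)
  eta <- eta - max(eta)
  exp(eta) / sum(exp(eta))
}

# Marginal effects d Pr(Y = k) / d x_m, returned as a p x K matrix.
mnl_marginal_effects <- function(x, B) {
  Bfull <- cbind(B, 0)                  # append the normalized beta_K = 0
  p <- mnl_probs(x, B)
  bbar <- drop(Bfull %*% p)             # probability-weighted mean coefficient
  # Entry [m, k] = pi_k * (beta_mk - bbar_m).
  sweep(Bfull - bbar, 2, p, `*`)
}

# Hypothetical inputs for K = 3: an intercept and one covariate.
x <- c(1, 2.0)
B <- cbind(c(0.5, -0.2), c(-1.0, 0.3))
me <- mnl_marginal_effects(x, B)

# Row 2 holds the effects of the covariate; row 1 (intercept) is not meaningful.
print(me[2, ])

# Finite-difference check on the covariate.
h <- 1e-6
fd <- (mnl_probs(x + c(0, h), B) - mnl_probs(x, B)) / h
print(fd)
```

The effects in each row sum to zero across the K categories, since the probabilities must still add up to one after the change. The sign of a marginal effect can also differ from the sign of the corresponding coefficient.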

  16. R Code
     • multinom() function
         library(nnet)
         count.matrix <- cbind(Y1, Y2, …, YK)
         fit <- multinom(count.matrix ~ X1 + X2 + …, data = , Hess = TRUE)
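
A runnable variant of the call above might look as follows. It is only a sketch: the data are simulated with assumed coefficients, the response is passed as a factor rather than as a count matrix, and LL(0) is taken as the log-likelihood with all coefficients set to zero (equal probabilities for the K categories). Note that `multinom()` uses the first level as the baseline, whereas the slides normalize the last category.

```r
library(nnet)  # recommended package shipped with standard R distributions

set.seed(42)

# Simulated data with K = 3 categories (assumed coefficients, illustration only).
n <- 300
x1 <- rnorm(n)
eta <- cbind(0, 0.5 + 1.0 * x1, -0.5 - 1.0 * x1)   # category 1 is the baseline
prob <- exp(eta) / rowSums(exp(eta))
y <- factor(apply(prob, 1, function(p) sample(1:3, 1, prob = p)))
sim <- data.frame(y, x1)

# Fit the multinomial logit; Hess = TRUE keeps the Hessian for standard errors.
fit <- multinom(y ~ x1, data = sim, Hess = TRUE, trace = FALSE)
print(coef(fit))               # (K-1) x p matrix, rows relative to level "1"

# Likelihood ratio index: rho = 1 - LL(beta_hat) / LL(0).
ll_hat <- as.numeric(logLik(fit))
ll_zero <- n * log(1 / 3)      # all coefficients zero => equal probabilities
rho <- 1 - ll_hat / ll_zero
print(rho)

# Fitted category probabilities for the first few observations.
print(head(predict(fit, type = "probs")))
```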

  17. Some Extensions
     • Conditional logit
       • $X_{ik}$ is specific to the alternative, but β does not vary across alternatives, i.e., $X_{ik}\beta$
     • Nested logit
       • Can be decomposed into two standard logit models
     • Mixed logit
       • Integrals of standard logit probabilities over a density of the parameters β
     • See Train (2003), Discrete Choice Methods with Simulation, for more discussion
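
For reference, the choice probabilities behind two of these extensions can be written in the notation of the earlier slides. This is the standard textbook form rather than something shown in the deck; the density symbol $f(\beta)$ is introduced here.

```latex
% Conditional logit: alternative-specific covariates X_{ik}, common beta
\Pr(Y_i = k) = \frac{\exp(X_{ik}\beta)}{\sum_{j} \exp(X_{ij}\beta)}

% Mixed logit: the logit probability averaged over a density f(beta)
\Pr(Y_i = k) = \int \frac{\exp(X_{ik}\beta)}{\sum_{j} \exp(X_{ij}\beta)} \, f(\beta) \, d\beta
```

The mixed logit integral generally has no closed form, so it is typically approximated by simulation.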
