
Chapter 3: Generalized Linear Models


shiloh

Presentation Transcript


  1. Chapter 3: Generalized Linear Models

  2. Chapter 3: Generalized Linear Models

  3. Objectives • Review the linear model. • Generalize the linear model. • Describe several common generalized linear models.

  4. Review Linear Models • A model is linear in the parameters when there is only one parameter per term and it is a multiplicative constant. • It is not a matter of a linear response. • The response is modeled as a linear combination of terms. • This model is linear: • This model is not linear:
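
The two example equations on this slide did not survive in the transcript. As an illustration only (these are assumed examples, not the slide's own equations), the first model below is linear in the parameters even though it is curved in x, while the second is not, because two parameters share one term and one of them sits inside the exponent:

```latex
% Linear in the parameters: each term is one parameter times a known function of x
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon

% Not linear in the parameters: \beta_1 appears inside the exponential
y = \beta_0 \, e^{\beta_1 x} + \varepsilon
```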

  5. Linear Model Error • The linear model is for the expected value or the mean of the response. • The linear model includes the response errors as normally distributed deviations with a mean of 0 and a constant variance. • The variance does not depend on any explanatory variable. • The errors are added to the expectation.
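
A minimal sketch of the error structure described on this slide, assuming a single predictor: the response is the linear expectation plus a normal deviation with mean 0 and a constant standard deviation. The function names, the coefficient values, and the simulated data are hypothetical.

```python
import random

def simulate_linear(n, beta0, beta1, sigma, seed=0):
    """Draw y = beta0 + beta1*x + e, with e ~ Normal(0, sigma) independent of x."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys

def fit_ols(xs, ys):
    """Closed-form ordinary least squares for one predictor plus an intercept."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

xs, ys = simulate_linear(2000, beta0=1.0, beta1=2.0, sigma=0.5)
b0, b1 = fit_ols(xs, ys)
print(b0, b1)  # estimates should land near the assumed values 1.0 and 2.0
```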

  6. Generalized Linear Model • The linear model can be generalized to cases with nonnormal responses that are functions of the mean. • A random component uses any distribution in the natural exponential family. • A systematic component relates the predictors to the response. • A link function relates the mean response to the systematic component.

  7. Random Component • The random component uses any distribution in the natural exponential family. The PMF or PDF is in this form: f(yi; θi) = a(θi) b(yi) exp[yi Q(θi)] • a(θi) is a function of the distribution parameter. • b(yi) is a function of the response. • Q(θi) is the natural parameter.
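
As a quick check of the natural exponential family form, assuming it is written f(y; θ) = a(θ) b(y) exp[y Q(θ)], the sketch below factors the Poisson PMF with a(μ) = exp(−μ), b(y) = 1/y!, and Q(μ) = log μ, and compares it with the usual formula. The function names are hypothetical.

```python
import math

def poisson_pmf(y, mu):
    """Textbook Poisson PMF."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def poisson_expfam(y, mu):
    """Same PMF written as a(theta) * b(y) * exp(y * Q(theta))."""
    a = math.exp(-mu)              # a(theta): depends only on the parameter
    b = 1.0 / math.factorial(y)    # b(y): depends only on the response
    q = math.log(mu)               # Q(theta): the natural parameter
    return a * b * math.exp(y * q)

pairs = [(poisson_pmf(y, 3.2), poisson_expfam(y, 3.2)) for y in range(10)]
```

Because Q(μ) = log μ, the log is the canonical link for the Poisson distribution.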

  8. Random Component in JMP • The following distributions are available to serve as the random component of a GLM in JMP: • Normal • Binomial • Poisson • Exponential

  9. Systematic Component • The systematic component uses a linear model.

  10. Systematic Component in JMP • Use Fit Model to specify the systematic component, as you would for ordinary least squares regression. • Create linear combinations of effects by adding terms made from data columns.

  11. Link Component • The link function g relates the random component and the systematic component: g(μ) = η. • The link is a monotonic and differentiable function. • It is the canonical link function if it transforms the mean to the natural parameter Q(θ).

  12. Link Component in JMP • The following functions are available to serve as the link component in JMP: • Identity • Log • Logit • Reciprocal • Probit • Power: μ^λ • Complementary log-log: log(−log(1 − μ))
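
The listed links can be sketched as pairs of a link g(μ) and its inverse, using the standard mathematical definitions. This is plain Python rather than JMP code; the `LINKS` dictionary, the `make_power` helper, and the exponent 0.5 are hypothetical.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal, used for the probit link

def make_power(lam):
    """Power link mu**lam and its inverse, for an assumed nonzero exponent lam."""
    return (lambda mu: mu ** lam, lambda eta: eta ** (1.0 / lam))

# name -> (link g, inverse link g^-1)
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "reciprocal": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "probit": (_std_normal.inv_cdf, _std_normal.cdf),
    "power(0.5)": make_power(0.5),
    "cloglog": (lambda mu: math.log(-math.log(1.0 - mu)),
                lambda eta: 1.0 - math.exp(-math.exp(eta))),
}

# Round trip: applying the inverse link to g(mu) should return mu.
round_trip = {name: inv(g(0.3)) for name, (g, inv) in LINKS.items()}
```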

  13. 3.01 Quiz • Match the component of a GLM on the top with its representation or an example on the bottom. • Random component • Systematic component • Link component

  14. 3.01 Quiz – Correct Answer • Match the component of a GLM on the top with its representation or an example on the bottom. • Random component • Systematic component • Link component The correct answer is A-3, B-2, and C-1.

  15. Binary Logistic Regression • A binary response can also be modeled with a GLM. • The canonical link function is the logit.
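
The slide's equation is not in the transcript. A short derivation, assuming the exponential family form f(y; θ) = a(θ) b(y) exp[y Q(θ)], shows why the logit is the canonical link for a binary response with success probability π:

```latex
% Bernoulli PMF rewritten in the form a(\pi)\, b(y)\, \exp[y\, Q(\pi)]
f(y; \pi) = \pi^{y} (1 - \pi)^{1 - y}
          = (1 - \pi) \exp\!\left[ y \log\frac{\pi}{1 - \pi} \right]
% so a(\pi) = 1 - \pi, \quad b(y) = 1, \quad Q(\pi) = \log\frac{\pi}{1 - \pi}

% Binary logistic regression as a GLM with the canonical (logit) link:
\log\frac{\pi_i}{1 - \pi_i} = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}
```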

  16. Poisson Regression • A simple model of counts is the Poisson distribution. • The canonical link function is the log.

  17. Poisson Loglinear Model with Offset • The opportunity for the counts might not be constant for all observations. • The opportunity N might be a period of time, a length, an area, or a volume. • Log(Ni) is the offset.
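
The slide's equation is not in the transcript. Under the usual formulation (assumed here), the model is linear for the log of the rate μi/Ni, and moving log(Ni) to the right-hand side gives a term whose coefficient is fixed at 1:

```latex
\log\frac{\mu_i}{N_i} = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}
\;\Longleftrightarrow\;
\log \mu_i = \log N_i + \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}
```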

  18. Common Generalized Linear Models

  19. Deviance • The deviance is a measure of goodness of fit. • The deviance assesses the difference between the observed and the predicted response. • Differences should be random (chi-square). • The deviance assesses the value of explanatory variables in the model. • Deviance aids model selection. • The deviance is twice the difference in log-likelihood between the saturated model and the full model.
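
As a sketch of the last bullet for a Poisson response, the code below computes the deviance two ways: as twice the log-likelihood difference between the saturated model (fitted mean equal to each observation) and the fitted model, and with the usual closed form 2 Σ [y log(y/μ) − (y − μ)]. The counts and fitted means are hypothetical.

```python
import math

def poisson_logpmf(y, mu):
    """Poisson log-probability; y*log(mu) is taken as 0 when y == 0."""
    ylogmu = y * math.log(mu) if y > 0 else 0.0
    return ylogmu - mu - math.lgamma(y + 1)

def deviance_from_loglik(ys, mus):
    """2 * (log-likelihood of saturated model - log-likelihood of fitted model)."""
    ll_saturated = sum(poisson_logpmf(y, y) for y in ys)
    ll_fitted = sum(poisson_logpmf(y, mu) for y, mu in zip(ys, mus))
    return 2.0 * (ll_saturated - ll_fitted)

def deviance_closed_form(ys, mus):
    """2 * sum(y*log(y/mu) - (y - mu)), with the log term 0 when y == 0."""
    total = 0.0
    for y, mu in zip(ys, mus):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total

ys = [0, 2, 3, 7]             # hypothetical observed counts
mus = [0.5, 2.5, 3.0, 6.0]    # hypothetical fitted means
d1 = deviance_from_loglik(ys, mus)
d2 = deviance_closed_form(ys, mus)
```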

  20. Over-Dispersion • Lack of fit can result from more variance than expected from the model distribution. • An over-dispersion parameter can be used to account for the excess in the case of a binomial or Poisson distribution. • The parameter equals 1 when there is no over-dispersion.
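
One common estimator of the over-dispersion parameter, shown here for a Poisson response, is the Pearson chi-square divided by its residual degrees of freedom. This is a sketch of that general estimator, not a statement of how any particular software computes it; the data and the parameter count are hypothetical.

```python
def poisson_overdispersion(ys, mus, n_params):
    """Pearson chi-square / (n - p); values near 1 suggest no over-dispersion."""
    pearson_chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(ys, mus))
    return pearson_chi2 / (len(ys) - n_params)

ys = [1, 4, 2, 9]            # hypothetical observed counts
mus = [2.0, 3.0, 3.0, 6.0]   # hypothetical fitted means
phi = poisson_overdispersion(ys, mus, n_params=2)
```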

  21. 3.02 Multiple Answer Poll • Which of the following statements are true of the deviance associated with a GLM? • The deviance is the difference between the predicted response and the observed response. • The deviance is twice the difference in log-likelihood between the saturated model and the full model. • The deviance is a measure of goodness of fit. • The deviance measures the variance of the response.

  22. 3.02 Multiple Answer Poll – Correct Answer • Which of the following statements are true of the deviance associated with a GLM? • The deviance is the difference between the predicted response and the observed response. • The deviance is twice the difference in log-likelihood between the saturated model and the full model. • The deviance is a measure of goodness of fit. • The deviance measures the variance of the response. The correct answers are the second and third statements.

  23. Chapter 3: Generalized Linear Models

  24. Objectives • Review binary logistic regression models. • Model binary responses with a GLM.

  25. Binary Logistic Regression Model

  26. Advantage of Using Logistic Regression • JMP provides the following when you use logistic regression: • Likelihood ratio test for lack of fit • Many measures of goodness of fit • Profiles of probability for all levels of the predictor • Odds ratios • ROC curve • Lift curve • Confusion matrix

  27. Advantage of Using Binary GLM • JMP provides the following when you use a GLM for a binary response: • Deviance for lack of fit • Over-dispersion model parameter • Likelihood ratio test for over-dispersion • Four residual plots • Prediction profiler for probability of target level

  28. GLM for a Binary Response • A binary response can also be modeled with a GLM. • The canonical link function is the logit.
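
A minimal sketch of fitting a binary GLM with the logit link by Newton's method (equivalent to iteratively reweighted least squares for a canonical link), assuming one predictor and an intercept. The function name and the data are hypothetical, and this is not how JMP is invoked.

```python
import math

def fit_logistic(xs, ys, iters=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson on the binomial log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Accumulate the 2x2 information matrix X'WX and the score X'(y - p).
        s00 = s01 = s11 = g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)          # binomial variance function
            s00 += w
            s01 += w * x
            s11 += w * x * x
            g0 += y - p
            g1 += (y - p) * x
        det = s00 * s11 - s01 * s01
        # Newton step: beta += (X'WX)^-1 X'(y - p), using the 2x2 inverse.
        b0 += (s11 * g0 - s01 * g1) / det
        b1 += (s00 * g1 - s01 * g0) / det
    return b0, b1

xs = [0, 1, 2, 3, 4, 5, 6, 7]   # hypothetical predictor values
ys = [0, 0, 1, 0, 1, 0, 1, 1]   # hypothetical binary outcomes (not separated)
b0, b1 = fit_logistic(xs, ys)
fitted = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
```

At the maximum likelihood estimate the score equations hold, so the fitted probabilities sum to the number of observed successes.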

  29. Separation Problem • It might happen in any given sample that the binary outcomes are completely separated by the explanatory variable. • This separation causes a problem with estimating the logistic regression or GLM parameters. • Firth’s penalized maximum likelihood estimation method can avoid this problem and reduce bias in the parameter estimates in the case of rare outcomes.
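
The estimation problem can be illustrated with a small, hypothetical data set in which every x below 3.5 has outcome 0 and every x above it has outcome 1. With the decision boundary held at 3.5 (an assumption made for the sketch), the log-likelihood keeps rising toward 0 as the slope grows, so no finite maximum likelihood estimate exists. Firth's penalty is not implemented here.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def loglik(slope, xs, ys, cut=3.5):
    """Bernoulli log-likelihood for logit(p) = slope * (x - cut)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = slope * (x - cut)
        total += log_sigmoid(z) if y == 1 else log_sigmoid(-z)
    return total

xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]   # completely separated at x = 3.5
lls = [loglik(s, xs, ys) for s in (1.0, 10.0, 100.0)]
```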

  30. Pearson Residuals • Pearson chi-square for goodness of fit is the sum of the squared Pearson residuals.
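
For a binary response with fitted probability p, the Pearson residual is (y − p) / sqrt(p(1 − p)). A minimal sketch with hypothetical outcomes and fitted probabilities:

```python
import math

def pearson_residuals(ys, ps):
    """Pearson residuals for binary outcomes y with fitted probabilities p."""
    return [(y - p) / math.sqrt(p * (1.0 - p)) for y, p in zip(ys, ps)]

ys = [0, 1, 1, 0]
ps = [0.2, 0.5, 0.8, 0.5]   # hypothetical fitted probabilities
residuals = pearson_residuals(ys, ps)

# Pearson chi-square statistic: sum of squared Pearson residuals.
pearson_chi2 = sum(r * r for r in residuals)
```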

  31. Deviance Residuals • The deviance chi-square for goodness of fit is the sum of the squared deviance residuals. • Studentized residuals provide a common scale for inspection.
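
For a binary response, each deviance residual is the signed square root of that observation's contribution to the deviance, which is −2 times its log-likelihood because the saturated model has log-likelihood 0. A minimal sketch with hypothetical data; studentizing the residuals is not shown.

```python
import math

def deviance_residuals(ys, ps):
    """Deviance residuals for binary outcomes y with fitted probabilities p."""
    out = []
    for y, p in zip(ys, ps):
        loglik_i = math.log(p) if y == 1 else math.log(1.0 - p)
        sign = 1.0 if y == 1 else -1.0   # sign of (y - p) for a binary y
        out.append(sign * math.sqrt(-2.0 * loglik_i))
    return out

ys = [0, 1, 1, 0]
ps = [0.2, 0.5, 0.8, 0.5]   # hypothetical fitted probabilities
dres = deviance_residuals(ys, ps)

# Deviance chi-square statistic: sum of squared deviance residuals.
deviance = sum(r * r for r in dres)
```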

  32. 3.03 Quiz • What are the three GLM components for a binary response?

  33. 3.03 Quiz – Correct Answer • What are the three GLM components for a binary response? • Random component is the binomial distribution. • Systematic component is a polynomial function. • Link component is the logit function.

  34. GLM for Binary Response Example • Use a GLM with the Titanic Passengers data set to relate Survived to Siblings and Spouses, Parents and Children, and Fare.

  35. GLM for a Binary Response This demonstration illustrates the concepts discussed previously.

  36. Exercise This exercise reinforces the concepts discussed previously.

  37. Chapter 3: Generalized Linear Models

  38. Objectives • Identify categorical response of counts. • Use a GLM that is also known as Poisson loglinear regression.

  39. Response Is Counts • The response can be simply the count of a particular event in many cases. • Occurrence of a disease • Road accidents • Mold colonies • Number of non-conforming items

  40. Response Is Counts, Constant Opportunity • The response can be the count of a particular event in the same span of time, linear dimension, area, or volume. • Occurrence of a disease per annum • Road accidents each month on the same highway • Mold colonies in a standard Petri dish • Number of non-conforming items in a standard lot size

  41. Poisson Regression • A simple model of counts is the Poisson distribution. • The canonical link function is the log.
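
A minimal sketch of Poisson regression with the log link, fitted by Newton's method for one predictor and an intercept. The function name and the counts are hypothetical, and the starting value (log of the mean count) is a choice made for this sketch.

```python
import math

def fit_poisson(xs, ys, iters=25):
    """Fit log(mu) = b0 + b1*x by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0   # start at the intercept-only fit
    for _ in range(iters):
        # Accumulate the 2x2 information matrix X'WX and the score X'(y - mu).
        s00 = s01 = s11 = g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)   # inverse of the log link
            s00 += mu                    # Poisson variance equals the mean
            s01 += mu * x
            s11 += mu * x * x
            g0 += y - mu
            g1 += (y - mu) * x
        det = s00 * s11 - s01 * s01
        # Newton step: beta += (X'WX)^-1 X'(y - mu), using the 2x2 inverse.
        b0 += (s11 * g0 - s01 * g1) / det
        b1 += (s00 * g1 - s01 * g0) / det
    return b0, b1

xs = [0, 1, 2, 3, 4, 5]     # hypothetical predictor values
ys = [1, 1, 3, 4, 8, 12]    # hypothetical counts
b0, b1 = fit_poisson(xs, ys)
fitted = [math.exp(b0 + b1 * x) for x in xs]
```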

  42. Response Is Counts, Opportunity Varies • The response can be the count of a particular event over different spans of time, linear dimensions, areas, or volumes. • Occurrence of a disease in different hospitals • Road accidents on different highways • Mold colonies in nonstandard field cases • Number of non-conforming items in lots of different sizes • Requires the use of an offset in the model. • The offset acts like an intercept in the linear model, except that its value is known for each observation rather than estimated.

  43. Poisson Loglinear Model with Offset • The opportunity for the counts might not be constant for all observations. • The opportunity N might be a period of time, a length, an area, or a volume. • Log(Ni) is the offset.
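
For the simplest case, an intercept-only Poisson model with offset log(Ni), the maximum likelihood estimate has a closed form: the fitted rate is the total count divided by the total opportunity. A sketch with hypothetical counts and exposures:

```python
import math

counts = [4, 10, 3, 18]                  # hypothetical event counts y_i
exposure = [100.0, 250.0, 80.0, 400.0]   # hypothetical opportunity N_i

# Solving sum(y_i - N_i * exp(b0)) = 0 gives exp(b0) = sum(y) / sum(N).
rate = sum(counts) / sum(exposure)
b0 = math.log(rate)

# Fitted means: log(mu_i) = log(N_i) + b0, with the offset coefficient fixed at 1.
fitted = [math.exp(math.log(n) + b0) for n in exposure]
```

The fitted means scale with the opportunity, so larger exposures are expected to produce proportionally more events at the same underlying rate.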

  44. 3.04 Multiple Answer Poll • How is the logarithm always used in Poisson regression with GLM? • Transform the response variable • Transform the explanatory variable • Transform the offset variable • Link the systematic and random components • Increase the over-dispersion
