
Generalized Linear Models CAS - Boston

Monday, November 11, 2002. Keith D. Holler, Ph.D., FCAS, ASA, ARM, MAAA.



Presentation Transcript


  1. Generalized Linear Models, CAS - Boston
     Monday, November 11, 2002
     Keith D. Holler, Ph.D., FCAS, ASA, ARM, MAAA

  2. High Level
     Given Characteristics: e.g. Eye Color, Age, Weight, Coffee Size
     Predict Response:
     • e.g. Probability someone takes Friday off, given it's sunny and 70°+
     • e.g. Expected amount spent on lunch

  3. Example – Personal Auto
     Log(Loss Cost) = Intercept + Driver Age Factor i + Car Size Factor j
     Parameters: Intercept, Driver Age Factor i, Car Size Factor j
     e.g. Young Driver, Large Car: Loss Cost = exp(6.50 + .75 + 0) = $1,408
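
The arithmetic on this slide can be checked with a minimal Python sketch. Only the three coefficients quoted on the slide are used; the dictionary names and the `loss_cost` helper are illustrative assumptions, not part of the presentation.

```python
import math

# Log-scale coefficients quoted on the slide.
INTERCEPT = 6.50
DRIVER_AGE_FACTOR = {"young": 0.75}  # only the level shown on the slide
CAR_SIZE_FACTOR = {"large": 0.0}     # only the level shown on the slide


def loss_cost(driver_age: str, car_size: str) -> float:
    """Invert the log link: loss cost = exp(linear predictor)."""
    eta = INTERCEPT + DRIVER_AGE_FACTOR[driver_age] + CAR_SIZE_FACTOR[car_size]
    return math.exp(eta)


young_large = loss_cost("young", "large")
print(f"${young_large:,.0f}")
```

Because the model is linear on the log scale, each factor acts multiplicatively on the loss cost: exp(.75) is the young-driver relativity applied to exp(6.50).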

  4. Technical Bits
     • Exponential families – gamma, Poisson, normal, binomial
     • Fit parameters via maximum likelihood
     • Solve MLE by IRLS or Newton-Raphson
     • Link Function (e.g. Log Loss Cost)
       • 1-1 function
       • Range of Predicted Variable → (−∞, ∞)
       • LN → multiplicative model, id → additive model, logit → binomial model (yes/no)
     • g(E[Y]) = Xβ + ξ (ξ: offset term)
     • Different means, same scale
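
To make the IRLS bullet concrete, here is a minimal pure-Python sketch of iteratively reweighted least squares for a Poisson model with log link and a single covariate. The data are invented for illustration, and the function name `irls_poisson` is an assumption; production software handles many covariates, offsets, and convergence checks.

```python
import math


def irls_poisson(x, y, iterations=25):
    """Fit log(E[y]) = b0 + b1*x by iteratively reweighted least squares."""
    # Start from the intercept-only fit: b0 = log(mean(y)), b1 = 0.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        # Accumulate the weighted normal equations X'WX b = X'Wz.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = math.exp(eta)
            w = mu                    # Poisson with log link: weight = mu
            z = eta + (yi - mu) / mu  # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 system by Cramer's rule.
        det = s_w * s_wxx - s_wx * s_wx
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1


# Invented toy data: x is a 0/1 indicator, y is a claim count.
x = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
y = [2, 3, 1, 5, 4, 6]
b0, b1 = irls_poisson(x, y)
print(math.exp(b0), math.exp(b0 + b1))
```

With a single 0/1 covariate the maximum likelihood fit reproduces the two group means (2 and 5 here), which gives a simple check on the iteration.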

  5. Why GLMs?
     • Multivariate – adjusts for presence of other variables. No overlap.
     • For non-normal data, GLMs are better than OLS.
     • Preprogrammed – easy to run, flexible model structures.
     • Maximum likelihood allows testing importance of variables.
     • Linear structure allows balance between amount of data and number of variables.
     • Condense data – mean estimate unchanged, scale estimate changes.

  6. Example – Personal Auto Property Damage Frequency
     Model
     N – Random number of claims
     λ – Average or Expected Value of N
     Model N ~ Poisson(mean = λ)
     Log(λ) = Intercept + Age + Gender + Marital + Gender * Marital + Credit + SM + Year + β * Accidents + log(exposure)
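
The log(exposure) term enters with a fixed coefficient of 1 (an offset), so the expected claim count scales in proportion to exposure. A minimal Python sketch, assuming a hypothetical linear predictor equal to log(0.05), i.e. 0.05 expected claims per unit of exposure:

```python
import math


def expected_claims(linear_predictor: float, exposure: float) -> float:
    """E[N] = exp(linear predictor + log(exposure)) = exposure * exp(linear predictor)."""
    return math.exp(linear_predictor + math.log(exposure))


# Hypothetical value, not taken from the slides.
eta = math.log(0.05)

one_year = expected_claims(eta, exposure=1.0)
half_year = expected_claims(eta, exposure=0.5)
print(one_year, half_year)
```

Halving the exposure halves the expected number of claims while leaving the fitted classification factors untouched.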

  7. Model Output

  8. Example – Personal Auto Property Damage Frequency
     How to Use?
     • Have N ~ Poisson(λ), where λ depends on classification variables.
     • Really want relative difference to a base class.
     • Example Base Class: 40-59, UM, NOHIT, S, 0 accidents
       • All factors are 0
       • Don't care about intercept, policy year, or exposure
     • Base rate set for base class, e.g. $100
     • To rate anyone else: factor × base rate
     • E.g. 30, U, F, E06, S, 2 accidents
       • Factor = exp(.09 + 0 − .03 + .14 + 0 + 2 × .28) = 2.14
       • Rate = 2.14 × 100 = $214
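
The rating example can be reproduced with a few lines of Python. The six log-scale terms are copied from the slide in the order shown there; the variable names are illustrative assumptions.

```python
import math

BASE_RATE = 100.0  # dollars, base-class rate from the slide

# Log-scale terms for the example insured, in the order listed on the slide.
terms = [0.09, 0.0, -0.03, 0.14, 0.0, 2 * 0.28]

factor = math.exp(sum(terms))         # relativity to the base class
rate = round(factor, 2) * BASE_RATE   # the slide rounds the factor first
print(f"factor = {factor:.2f}, rate = ${rate:.0f}")
```

The intercept, policy year, and exposure terms are left out because they are common to the base class and cancel in the relativity.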

  9. Diagnostics
     • Actual vs Modeled on Training and Test data
     • P-values and confidence intervals
     • Actual vs Modeled on variables NOT used in model.
     • Graphs – Standardized deviance residuals vs linear predictor OR Q-Q Plot.
     • Leverage and influential points.
     • Likelihood ratio tests for entire variables.
     • 50/50 modeling.
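
As an illustration of the residual plots mentioned above, the following Python sketch computes (unstandardized) Poisson deviance residuals from actual counts and modeled means. The data are hypothetical. Standardizing would further divide each residual by sqrt(φ(1 − h)), where h is the observation's leverage and φ the scale parameter, which this sketch does not compute.

```python
import math


def poisson_deviance_residual(actual: float, modeled: float) -> float:
    """Signed square root of one observation's contribution to the Poisson deviance."""
    # y*log(y/mu) is taken as 0 when y == 0 (its limiting value).
    log_term = actual * math.log(actual / modeled) if actual > 0 else 0.0
    unit_deviance = 2.0 * (log_term - (actual - modeled))
    sign = 1.0 if actual >= modeled else -1.0
    return sign * math.sqrt(max(unit_deviance, 0.0))


# Hypothetical actual claim counts and modeled means.
actual = [0, 2, 5, 1]
modeled = [0.8, 2.0, 3.1, 1.6]
residuals = [poisson_deviance_residual(y, mu) for y, mu in zip(actual, modeled)]
print([round(r, 3) for r in residuals])
```

Plotting these residuals against the linear predictor, or on a Q-Q plot, is one way to spot systematic misfit.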

  10. Personal Auto Class Plan Issues:
     • Territories or other many level variables
     • Deductibles and Limits
     • Loss Development
     • Trend
     • Frequency, Severity or Pure Premium
     • Exposure
     • Model Selection – penalized likelihood an option
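
The slide does not say which penalized likelihood criterion is meant. AIC and BIC are two common choices, and the sketch below uses them as an assumption; the log-likelihoods, parameter counts, and model names are invented for illustration.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2*logL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k*log(n) - 2*logL (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: (maximized log-likelihood, number of parameters).
candidates = {"small": (-1210.4, 6), "large": (-1208.9, 11)}
n_obs = 1000

best_by_aic = min(candidates, key=lambda name: aic(*candidates[name]))
best_by_bic = min(candidates, key=lambda name: bic(*candidates[name], n_obs))
print(best_by_aic, best_by_bic)
```

In this made-up comparison the five extra parameters improve the log-likelihood by only 1.5, so both criteria prefer the smaller model.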

  11. Software and References
     Software: SAS, GLIM, SPLUS, EMBLEM, Pretium, GENSTAT, MATLAB, STATA, SPSS
     References:
     • Part 9 paper bibliography
     • Greg Taylor (Melbourne 1997)
     • Stephen Mildenhall (1999)
     • Hosmer and Lemeshow (2000)
     • Farrokh Guiahi (June 2000)
     • Karl P. Murphy (Winter 2000)
     Other:
     • R, http://www.r-project.org/
     • Venables and Ripley (SPLUS)

  12. R Code Example
      # Use treatment (base-level) contrasts for factors.
      > options(contrasts = c("contr.treatment", "contr.treatment"))
      > pd.data <- read.table("c:\\kdh\\temp\\tree1000.dat", header=FALSE)
      > pd.data[1:3,]
            V1 V2  V3 V4 V5  V6 V7   V8 V9
        553.67 19 A39  F  M E02  M 1995  0
         61.86  3 A39  F  M E02  M 1995  1
          7.35  0 A39  F  M E02  M 1995  2
      # Poisson GLM with log link: V2 is the response, log(V1) enters as an offset.
      > model1 <- glm(V2~V3+V4+V5+V4*V5+V6+V7+
      +     as.factor(V8)+V9+offset(log(V1)),
      +     family=poisson(link="log"), data=pd.data)
      > summary(model1)
      Keith.Holler2@thehartford.com
