Generalized Linear Models
CAS - Boston, Monday, November 11, 2002
Keith D. Holler, Ph.D., FCAS, ASA, ARM, MAAA
High Level
Given Characteristics: e.g. Eye Color, Age, Weight, Coffee Size
Predict Response:
• e.g. Probability someone takes Friday off, given it's sunny and 70°+
• e.g. Expected amount spent on lunch
Example – Personal Auto
Log(Loss Cost) = Intercept + Driver Age Factor i + Car Size Factor j
Parameters, e.g. Young Driver, Large Car:
Loss Cost = exp(6.50 + 0.75 + 0) = $1,408
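The arithmetic in this example can be checked in a few lines of R. This is only a sketch: the variable names are illustrative, and assigning 6.50, 0.75 and 0 to the intercept, the young-driver factor and the large-car factor is an assumption read from the order of the terms on the slide.

```r
# Values quoted in the example; names and term assignment are assumptions.
intercept    <- 6.50
young_driver <- 0.75  # driver age factor (assumed from term order)
large_car    <- 0     # car size factor (assumed to be the base level)

# Log link: add on the log scale, then exponentiate.
log_loss_cost <- intercept + young_driver + large_car
loss_cost <- exp(log_loss_cost)
round(loss_cost)

# Equivalent multiplicative view: each term becomes a factor on the dollar scale.
loss_cost_mult <- exp(intercept) * exp(young_driver) * exp(large_car)
```

The second calculation shows why a log link gives a multiplicative rating structure: each additive term on the log scale is a multiplier on the loss cost.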
Technical Bits
• Exponential families – gamma, Poisson, normal, binomial
• Fit parameters via maximum likelihood
• Solve MLE by IRLS or Newton-Raphson
• Link function (e.g. Log Loss Cost)
  • 1-1 function
  • Maps the range of the predicted variable to (−∞, ∞)
  • Log: multiplicative model; identity: additive model; logit: binomial model (yes/no)
• g(E[Y]) = Xβ + offset
• Different means, same scale
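To make the IRLS bullet concrete, here is a minimal sketch of iteratively reweighted least squares for a Poisson model with a log link, compared against R's built-in glm(). The simulated data, the coefficient values 0.5 and 0.3, and all variable names are assumptions made for illustration; they do not come from the presentation.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.3 * x))  # simulated counts (illustrative)
X <- cbind(1, x)                            # design matrix with intercept

beta <- c(log(mean(y)), 0)                  # starting values
for (iter in 1:25) {
  eta <- drop(X %*% beta)                   # linear predictor
  mu  <- exp(eta)                           # inverse of the log link
  z   <- eta + (y - mu) / mu                # working response
  w   <- mu                                 # working weights (Poisson, log link)
  # Weighted least squares step: solve (X'WX) beta = X'Wz
  beta_new <- drop(solve(t(X) %*% (w * X), t(X) %*% (w * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

fit <- glm(y ~ x, family = poisson(link = "log"))
rbind(irls = unname(beta), glm = unname(coef(fit)))
```

For a canonical link such as the log link with a Poisson response, this IRLS iteration coincides with Newton-Raphson, which is why the slide lists the two together.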
Why GLMs?
• Multivariate – adjusts for presence of other variables. No overlap.
• For non-normal data, GLMs are better than OLS.
• Preprogrammed – easy to run, flexible model structures.
• Maximum likelihood allows testing importance of variables.
• Linear structure allows balance between amount of data and number of variables.
• Condense data – mean estimate unchanged, scale estimate changes.
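The last bullet can be illustrated with a small experiment, under one reading of "condense": summing claims and exposure within each rating class before fitting. The sketch below uses simulated data with hypothetical class names and claim rates; it is not the author's data or code.

```r
set.seed(2)
n <- 2000
d <- data.frame(
  age      = factor(sample(c("young", "mid", "old"), n, replace = TRUE)),
  exposure = runif(n, 0.5, 1)
)
true_rate <- c(young = 0.20, mid = 0.10, old = 0.12)  # hypothetical frequencies
d$claims <- rpois(n, d$exposure * true_rate[as.character(d$age)])

# Fit on the detailed records
fit_detail <- glm(claims ~ age + offset(log(exposure)),
                  family = poisson, data = d)

# Condense to one row per class, then refit
agg <- aggregate(cbind(claims, exposure) ~ age, data = d, FUN = sum)
fit_agg <- glm(claims ~ age + offset(log(exposure)),
               family = poisson, data = agg)

# Same coefficient estimates, different residual deviance and degrees of freedom
rbind(detail = coef(fit_detail), condensed = coef(fit_agg))
c(detail = deviance(fit_detail), condensed = deviance(fit_agg))
```

The fitted means agree because the Poisson likelihood depends on the data only through class totals of claims and exposure, while anything estimated from the residuals changes with the level of aggregation.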
Example – Personal Auto Property Damage Frequency
Model
N – random number of claims
μ – average or expected value of N
Model: N ~ Poisson(mean = μ)
Log(μ) = Intercept + Age + Gender + Marital + Gender * Marital + Credit + SM + Year + β * Accidents + log(exposure)
Example – Personal Auto Property Damage Frequency
How to use it?
• Have N ~ Poisson(μ), where μ depends on the classification variables.
• Really want the relative difference to a base class.
• Example base class: 40-59, UM, NOHIT, S, 0 accidents
  • All factors are 0
  • Don't care about intercept, policy year, or exposure
• Base rate is set for the base class, e.g. $100
• To rate anyone else: factor x base rate
• E.g. 30, U, F, E06, S, 2 accidents
  • Factor = exp(.09 + 0 - .03 + .14 + 0 + 2 x .28) = 2.14
  • Rate = 2.14 x 100 = $214
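The rating calculation above can be reproduced directly. In this sketch the coefficient values are copied from the slide in the order they appear; the variable names are illustrative, and no attempt is made to say which rating variable each value belongs to.

```r
# Terms of the linear predictor for the example risk, relative to the base class.
coef_terms <- c(0.09, 0, -0.03, 0.14, 0, 2 * 0.28)  # last term: 2 accidents x 0.28

factor_vs_base <- exp(sum(coef_terms))  # relativity to the base class
base_rate <- 100                        # base rate from the slide, in dollars
rate <- factor_vs_base * base_rate

round(factor_vs_base, 2)
round(rate)
```

Because the intercept, policy year and exposure are common to both risks, they cancel in the ratio, which is why only the classification terms enter the factor.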
Diagnostics
• Actual vs modeled on training and test data.
• P-values and confidence intervals.
• Actual vs modeled on variables NOT used in the model.
• Graphs – standardized deviance residuals vs linear predictor, OR Q-Q plot.
• Leverage and influential points.
• Likelihood ratio tests for entire variables.
• 50/50 modeling.
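Several of these checks take only a few lines in R. The sketch below runs on simulated data with hypothetical variable names (age, noise) and an even training/test split; that split is one possible reading of "50/50 modeling", not something the slides confirm.

```r
set.seed(3)
n <- 1000
d <- data.frame(
  age      = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  noise    = factor(sample(c("x", "y"), n, replace = TRUE)),  # unrelated variable
  exposure = runif(n, 0.5, 1)
)
d$claims <- rpois(n, d$exposure * c(A = 0.1, B = 0.2, C = 0.3)[as.character(d$age)])

train <- d[1:500, ]
test  <- d[501:1000, ]

fit_full    <- glm(claims ~ age + noise + offset(log(exposure)),
                   family = poisson, data = train)
fit_reduced <- glm(claims ~ noise + offset(log(exposure)),
                   family = poisson, data = train)

# Likelihood ratio test for the entire age variable (all levels at once)
lrt   <- anova(fit_reduced, fit_full, test = "Chisq")
p_age <- lrt[2, "Pr(>Chi)"]

# Actual vs modeled claims on the test data, summarized by age
test$modeled <- predict(fit_full, newdata = test, type = "response")
avm <- aggregate(cbind(claims, modeled) ~ age, data = test, FUN = sum)

# Standardized deviance residuals vs linear predictor
res <- rstandard(fit_full, type = "deviance")
plot(predict(fit_full, type = "link"), res,
     xlab = "Linear predictor", ylab = "Standardized deviance residual")
```

Testing a whole variable with a likelihood ratio test avoids judging a multi-level factor by the p-values of its individual levels.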
Personal Auto Class Plan Issues:
• Territories or other many-level variables
• Deductibles and limits
• Loss development
• Trend
• Frequency, severity or pure premium
• Exposure
• Model selection – penalized likelihood is an option
Software and References
Software: SAS, GLIM, SPLUS, EMBLEM, Pretium, GENSTAT, MATLAB, STATA, SPSS
References:
• Part 9 paper bibliography
• Greg Taylor (Melbourne 1997)
• Stephen Mildenhall (1999)
• Hosmer and Lemeshow (2000)
• Farrokh Guiahi (June 2000)
• Karl P. Murphy (Winter 2000)
Other:
• R, http://www.r-project.org/
• Venables and Ripley (SPLUS)
R Code Example
> # Treatment contrasts: each factor level is measured against a base level
> options(contrasts = c("contr.treatment", "contr.treatment"))
> pd.data <- read.table("c:\\kdh\\temp\\tree1000.dat", header = FALSE)
> pd.data[1:3, ]
      V1 V2  V3 V4 V5  V6 V7   V8 V9
  553.67 19 A39  F  M E02  M 1995  0
   61.86  3 A39  F  M E02  M 1995  1
    7.35  0 A39  F  M E02  M 1995  2
> # V2 is the response; log(V1) enters as an offset
> model1 <- glm(V2 ~ V3 + V4 + V5 + V4*V5 + V6 + V7 +
+               as.factor(V8) + V9 + offset(log(V1)),
+               family = poisson(link = "log"), data = pd.data)
> summary(model1)

Keith.Holler2@thehartford.com