Applications of Using GLM Offset Option
Jun Yan - Deloitte Consulting LLP
Matthew Flynn - ISO
Applications of Using GLM Offset Option
• Introduction
• Constrained, multiple factors analysis
• Sequential Modeling
• Run Time Dynamic Offset
Introduction
Intuitively, an offset is a way to fit a model to the residual that remains after a set of given factors has been accounted for.
Mathematically:
  Y - g(z) = f(x)
where Y is the target variable, z are the offset variables, and x are the explanatory variables. The quantity "Y - g(z)" is the residual generated by the offset.
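For the log-link Poisson models used in the rest of the deck, the residual idea can be written out on the link scale. This is a sketch in standard GLM notation; the symbols \mu_i, \boldsymbol{\beta} and the subscript i do not appear in the slides and are introduced here for illustration only.

```latex
\begin{aligned}
\log \mu_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \mathrm{offset}_i,
  \qquad \mu_i = E[Y_i] \\
\Longrightarrow \quad \log \mu_i - \mathrm{offset}_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta} \\
\Longrightarrow \quad \frac{\mu_i}{\exp(\mathrm{offset}_i)} &= \exp\!\left(\mathbf{x}_i^{\top}\boldsymbol{\beta}\right)
\end{aligned}
```

Read this way, the offset plays the role of g(z): it is a term in the linear predictor whose coefficient is fixed at 1 instead of being estimated, so with a log link the "residual" left for the explanatory variables is a ratio, not a difference.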
Constrained, multiple factors analysis
• Business reasoning
• An Example
  • A frequency model with two predictive variables:
    1. Driver_age_group (1, 2, 3 and 4)
    2. Type (‘S’ (Single) and ‘M’ (Multi))
  • Required constraints:
    1. The frequency relativities of driver_age_group for ‘3’ and ‘4’ are 1.05 and 1.25, respectively.
    2. The other relativities are estimated using GLM.
Constrained, multiple factors analysis
An Example - SAS Code

data freq_data;
  set input;
  claim_freq = claim_count / exposure;

  * Define offset factors;
  offset_factor = 1;
  if driver_age_group = 3 then offset_factor = 1.05;
  if driver_age_group = 4 then offset_factor = 1.25;
  logoffset = log(offset_factor);

  * Put the constraint as the base of the new variable;
  if driver_age_group in (1,2) then driver_age_group_new = driver_age_group;
  else driver_age_group_new = 9;
run;

proc genmod data=freq_data;
  class driver_age_group_new type;
  model claim_freq = driver_age_group_new type / dist=poisson link=log offset=logoffset;
run;
Constrained, multiple factors analysis An Example – Parameter Estimates and Final Result
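The parameter table for this slide did not survive in the transcript. As a rough sketch of how the fitted estimates could be turned into relativities, the GENMOD output can be captured with ODS and exponentiated. The dataset names param_est and final_rel are hypothetical and not from the original deck; freq_data is the dataset built in the SAS code above.

```sas
/* Sketch only: capture the GENMOD estimates and convert them to relativities. */
/* param_est and final_rel are hypothetical dataset names.                      */
ods output ParameterEstimates=param_est;

proc genmod data=freq_data;
  class driver_age_group_new type;
  model claim_freq = driver_age_group_new type
        / dist=poisson link=log offset=logoffset;
run;

data final_rel;
  set param_est;
  /* Keep only the class-variable rows. */
  where Parameter not in ('Intercept', 'Scale');
  /* With a log link, exp(estimate) is the relativity against the base level. */
  relativity = exp(Estimate);
  keep Parameter Level1 Estimate relativity;
run;

proc print data=final_rel noobs;
run;
```

In this setup level 9 of driver_age_group_new sorts last, so GENMOD uses it as the reference level with an estimate of 0. Age groups 3 and 4 do not get fitted relativities; they carry the imposed values 1.05 and 1.25 through the offset.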
Sequential Modeling
• Why do we need sequential modeling?
  • To break a complicated modeling process down into a sequence of simple modeling steps
  • To better identify the significance of adding new variables
  • To generate reasonable model estimates
  • Regulatory considerations (e.g. California Proposition 103):
    “personal automobile insurance rates must be determined using the following factors in decreasing order of importance– insured's driving safety record, number of miles driven annually by the insured, and number of years of driving experience the insured has had.”
Sequential Modeling
• A simple example for sequential modeling
  • Use 3 variables (type, driver_age_group, vehicle_use) to generate frequency tables for PD coverage.
  • Assume driver_age_group and vehicle_use interact.
• Two-step sequential modeling
  1. Model marginal effects
  2. Model the interaction of driver_age_group and vehicle_use
Sequential Modeling Example
Step 1 - Model Marginal Effects
Model Setup
  Input Dataset = The PD Modeling Dataset;
  Target Variable = pd_freq (pd_claim_count/pd_exposure);
  Three Categorical Predictive Variables =
    type (value: ‘M’ and ‘S’)
    driver_age_group (value: 1, 2, 3, 4)
    vehicle_use (value: ‘PL’, ‘WK’);
  model pd_freq = type driver_age_group vehicle_use;
  Distribution = poisson;
  Link Function = Log;
  weight = pd_exposure;
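The setup above might translate into PROC GENMOD roughly as follows. This is a sketch under assumptions: the dataset names pd_data, pd_model and step1_est are hypothetical, and pd_data is assumed to already hold pd_claim_count, pd_exposure and the three class variables.

```sas
/* Sketch of Step 1. pd_data, pd_model and step1_est are hypothetical names. */
data pd_model;
  set pd_data;
  /* Drop records with no exposure to avoid dividing by zero. */
  if pd_exposure > 0;
  pd_freq = pd_claim_count / pd_exposure;
run;

/* Save the estimates so they can be reused as an offset in Step 2. */
ods output ParameterEstimates=step1_est;

proc genmod data=pd_model;
  class type driver_age_group vehicle_use;
  model pd_freq = type driver_age_group vehicle_use
        / dist=poisson link=log;
  weight pd_exposure;
run;
```

Because the target is a frequency and not a count, the exposure enters through the WEIGHT statement, matching the "weight = pd_exposure" line of the setup.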
Sequential Modeling Example Step 1 - Model Marginal Effect Model Output: 2 Parameter Estimates and Frequency Relativities
Sequential Modeling Example
Step 2 - Model Interactions
Create an offset factor using the estimates of the Step 1 model output

offset_mod1 = 0;
if Type = 'M' then offset_mod1 = offset_mod1 + -0.26;
if Type = 'S' then offset_mod1 = offset_mod1 + 0.00;
if Driver_age_group = 1 then offset_mod1 = offset_mod1 + 0.37;
if Driver_age_group = 2 then offset_mod1 = offset_mod1 + 0.04;
if Driver_age_group = 3 then offset_mod1 = offset_mod1 + -0.63;
if Driver_age_group = 4 then offset_mod1 = offset_mod1 + 0.00;
if Vehicle_Use = 'PL' then offset_mod1 = offset_mod1 + -0.36;
if Vehicle_Use = 'WK' then offset_mod1 = offset_mod1 + 0.0000;
Sequential Modeling Example
Step 2 - Model Interactions
Model Setup
  Input Dataset = The PD Modeling Dataset;
  Target Variable = pd_freq (pd_claim_count/pd_exposure);
  Two Categorical Predictive Variables =
    driver_age_group (value: 1, 2, 3, 4)
    vehicle_use (value: ‘PL’, ‘WK’);
  model pd_freq = Driver_age_group*Vehicle_Use;
  Distribution = Poisson;
  Link Function = Log;
  Offset = offset_mod1;
  weight = pd_exposure;
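A possible GENMOD rendering of this Step 2 setup is sketched below. The dataset name pd_model_step2 is hypothetical; it is assumed to be the PD modeling dataset with pd_freq, pd_exposure and the offset_mod1 variable (built from the Step 1 estimates) already attached.

```sas
/* Sketch of Step 2. pd_model_step2 and step2_est are hypothetical names.   */
/* pd_model_step2 is assumed to contain pd_freq, pd_exposure, offset_mod1. */
ods output ParameterEstimates=step2_est;

proc genmod data=pd_model_step2;
  class driver_age_group vehicle_use;
  /* Only the interaction is estimated; the marginal effects are held */
  /* fixed at their Step 1 values through the offset.                 */
  model pd_freq = driver_age_group*vehicle_use
        / dist=poisson link=log offset=offset_mod1;
  weight pd_exposure;
run;
```

The offset is already on the log scale because it is a sum of Step 1 estimates, so no further log transform is applied before passing it to OFFSET=.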
Sequential Modeling Example Step 2 - Model Interactions Model Output
Sequential Modeling Final Result from combining two steps
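The combined table for this slide is not preserved in the transcript, but the combination rule follows from the log link. In the sketch below, \beta denotes the Step 1 estimates (the values added into offset_mod1) and \gamma the Step 2 interaction estimate; both symbols are introduced here for illustration. Intercepts are left out because relativities are expressed against the base cell.

```latex
\mathrm{rel}(t, a, u)
  = \exp\!\bigl(\beta_t + \beta_a + \beta_u + \gamma_{a,u}\bigr)
  = e^{\beta_t}\, e^{\beta_a}\, e^{\beta_u}\, e^{\gamma_{a,u}}
```

For example, for Type 'M', Driver_age_group 1 and Vehicle_Use 'PL', the Step 1 part is exp(-0.26 + 0.37 - 0.36) = exp(-0.25), or about 0.78, and this is then multiplied by the exponentiated Step 2 estimate for the (1, 'PL') interaction cell.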
Run Time Dynamic Offset
• What is “Run Time Dynamic Offset”?
  • Select the offset variables
  • Insert those offset variables into the regression as predictive variables and run the regression
  • Zero out the parameter estimates of the offset variables in the post-modeling score program
• When do we need a “Run Time Dynamic Offset”?
  1. Offset factors are not given
  2. Offset factors change dynamically
Run Time Dynamic Offset
• A simple example for run time dynamic offset
  • Create PD frequency relativities for class plan variables
• A business initiative:
  • Create frequency relativity tables for PD
  • Need to remove the territory bias from the frequency model output
• Modeling issues:
  • The territory frequency relativity table is not available
  • The territory impact on the model output is dynamic
Run Time Dynamic Offset
Frequency Model Setup
  Input Dataset = The PD Modeling Dataset;
  Target Variable = pd_freq (pd_claim/pd_exp);
  4 Categorical Predictive Variables =
    driver_age_group (value: 1, 2, 3, 4)
    vehicle_use (value: ‘PL’, ‘WK’)
    type (‘S’, ‘M’)
    territory (1, 2, 3, 4, 5);
  model pd_freq = Driver_age_group Vehicle_Use Type Territory;
  Distribution = poisson;
  Link Function = Log;
  weight = pd_exp;
Run Time Dynamic Offset Frequency Model Output
Run Time Dynamic Offset Frequency Model Output – After Offsetting Territory Impact
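One way the "zero out" step could look in practice is sketched below: fit the model with territory included, then set the territory estimates to 0 before building the relativity table. The dataset names pd_model, rtdo_est and rtdo_rel are hypothetical; pd_model is assumed to hold pd_freq, pd_exp and the four class variables.

```sas
/* Sketch only. pd_model, rtdo_est and rtdo_rel are hypothetical names. */
ods output ParameterEstimates=rtdo_est;

proc genmod data=pd_model;
  class driver_age_group vehicle_use type territory;
  model pd_freq = driver_age_group vehicle_use type territory
        / dist=poisson link=log;
  weight pd_exp;
run;

data rtdo_rel;
  set rtdo_est;
  where Parameter ne 'Scale';
  /* Zero out territory so it no longer contributes to the score. */
  if upcase(Parameter) = 'TERRITORY' then Estimate = 0;
  /* exp(0) = 1, so every territory ends up with a relativity of 1. */
  relativity = exp(Estimate);
  keep Parameter Level1 Estimate relativity;
run;
```

The estimates for the remaining class plan variables were fitted with territory in the model, so they stay adjusted for territory even though territory itself is neutral in the resulting table. The Intercept row is kept; its exponentiated value is the fitted base frequency, not a relativity.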
Run Time Dynamic Offset (RTDO)
• RTDO is equivalent to a two-step regression process
  • Insert the offset variables into the first regression as predictive variables and run the regression
  • Set up an offset parameter using the estimates of the offset variables from the prior regression, then run the second regression
Run Time Dynamic Offset (RTDO)
• RTDO is equivalent to a process with two step regressions
• Step 1: Create Territory Offset

terr_offset = 0;
if territory = 'T1' then terr_offset = -0.55;
if territory = 'T2' then terr_offset = -0.40;
if territory = 'T3' then terr_offset = -0.25;
if territory = 'T4' then terr_offset = -0.18;
Run Time Dynamic Offset (RTDO)
• RTDO is equivalent to a process with two step regressions
• Step 2: Model Setup
  Input Dataset = The PD Modeling Dataset;
  Target Variable = pd_freq (pd_claim/pd_exp);
  3 Categorical Predictive Variables =
    driver_age_group (value: 1, 2, 3, 4)
    vehicle_use (value: ‘PL’, ‘WK’)
    type (‘S’, ‘M’);
  model pd_freq = Driver_age_group Vehicle_Use Type;
  Distribution = poisson;
  Link Function = Log;
  Offset = terr_offset;
  weight = pd_exp;
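The equivalence claimed on these slides can be motivated with a short likelihood argument. In this sketch \ell is the log-likelihood, \beta collects the class plan parameters and \tau the territory parameters; the notation is introduced here and is not from the original deck.

```latex
(\hat{\beta}, \hat{\tau}) = \arg\max_{\beta,\,\tau}\; \ell(\beta, \tau)
\quad \Longrightarrow \quad
\hat{\beta} = \arg\max_{\beta}\; \ell(\beta, \hat{\tau})
```

Fixing the territory effects at their fitted values through the offset and re-estimating the remaining parameters therefore returns the same point estimates, up to the rounding applied when the offset values are typed in. The standard errors from the second regression can differ, because the offset is treated as known and not as estimated.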
Summary of the Presentation
• Why offset?
  • To eliminate the impact of variables that will not be included in the model but will influence the modeling results
• Specific applications
  • Constrained modeling
  • Sequential modeling
  • Run Time Dynamic Offset