Data Mining Applications in P&C Insurance

Data Mining Applications in P&C Insurance. CASE Spring Meeting April 12, 2005 Lijia Guo, PhD, ASA, MAAA University of Central Florida. Agenda. Introductions to data mining modeling Understanding the data mining process Data mining (DM) techniques Applications in P&C Insurance Case Study.

Presentation Transcript


  1. Data Mining Applications in P&C Insurance • CASE Spring Meeting, April 12, 2005 • Lijia Guo, PhD, ASA, MAAA, University of Central Florida

  2. Agenda • Introduction to data mining modeling • Understanding the data mining process • Data mining (DM) techniques • Applications in P&C Insurance • Case Study Guo

  3. Introduction – What is Data Mining? • The process of exploring and analyzing large quantities of data in order to discover meaningful patterns and rules • Uses a variety of data analysis tools to discover relationships that may be used to make valid predictions • It is not a magic wand: you must know your business, understand your data, and understand the analytical methods

  4. Introduction - DM Modeling • An information discovery process • Knowing your goals • Understanding your data • Choosing the right methods • Understanding the limitations • Validation and testing • Making crucial business decisions

  5. Introduction – DM Process • Define the goal • Understand the economics • Identify data sources • Prepare data • Transform data • Apply DM models • Validate DM models • Implement

  6. Introduction – DM Goals • Identifying responsive potential customers • Identifying existing customers who are more likely to terminate • Identifying high-risk purchasers • Identifying the factors that cause large claims • Identifying interactions among risk factors

  7. Introduction – DM Process

  8. DM Techniques • Decision Trees • Logistic Regression • Neural Networks • Fuzzy Logic • Genetic Algorithms • Clustering • Association Discovery • Sequence Discovery • Bayesian Analysis • Visualization • Hybrid Algorithms

  9. DM Techniques -- Decision Trees • What are decision trees? • Classify observations based on the values of nominal, binary, or ordinal targets • Predict outcomes for interval targets • Predict the appropriate decision when you specify decision alternatives

  10. DM Techniques -- Decision Trees Example

  11. DM Techniques -- Decision Trees • Strengths: give insight into the decision-making process; efficient, and thus suitable for large data sets • Weaknesses: relatively unstable; have difficulty detecting linear or quadratic relationships
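
As a rough illustration of how a classification tree chooses a split, the sketch below scans candidate thresholds on a single numeric input and keeps the one with the lowest weighted Gini impurity. The Gini criterion, the function names, and the driver-age data are assumptions made for illustration; the slides do not say which splitting rule or data were used.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best split 'x <= threshold'.

    Returns (None, gini(ys)) when no split improves on the unsplit node.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(ys))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2.0, score)
    return best


# Hypothetical data: driver age and whether a claim occurred (1 = claim).
ages = [19, 22, 24, 31, 38, 45, 52, 60]
claims = [1, 1, 1, 0, 0, 0, 1, 0]
threshold, impurity = best_split(ages, claims)
print(threshold, impurity)
```

A full tree repeats this search over every input and then recurses into each child node, which is also why small changes in the data can change the chosen splits (the instability noted on the slide).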

  12. DM Techniques -- Logistic regression • What is logistic regression? • How logistic regression works • Odds ratios • Each independent variable affects the logit linearly

  13. DM Techniques - Logistic Regression • Strengths and weaknesses • Maximum Likelihood Curve Fitting • Multiple Logistic Regression Model • Interaction-effect modifier • Multinomial Logistic Regression Model
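
The link between a linear logit and odds ratios can be checked numerically. The sketch below is a minimal illustration with made-up coefficients (the intercept, the two predictors, and their values are all hypothetical, not taken from the presentation): raising one predictor by one unit multiplies the odds by exp(coefficient), whatever the values of the other predictors.

```python
import math


def predict_probability(intercept, coefs, x):
    """Logistic model: the logit is linear in the inputs."""
    logit = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-logit))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Hypothetical fitted model: [prior-claim indicator, years licensed].
intercept = -2.0
coefs = [0.7, -0.03]

p_without = predict_probability(intercept, coefs, [0, 10])
p_with = predict_probability(intercept, coefs, [1, 10])

# The odds ratio for the prior-claim indicator equals exp(0.7).
odds_ratio = odds(p_with) / odds(p_without)
print(odds_ratio, math.exp(coefs[0]))
```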

  14. DM Techniques -- Neural Networks • What are neural networks? • Input layer: a unit for each input variable • Output layer: the target • Hidden layer: hidden units (neurons) • Figure: network architecture with two hidden layers and output y

  15. DM Techniques – Neural Networks • Output activation function • Activation functions: nonlinear transformations • Weights • Bias

  16. DM Techniques – Neural Networks • How neural networks work • Processing elements • Training • Predicting • Activation functions: logistic function, hyperbolic tangent

  17. DM Techniques -- Neural Networks • Strengths and weaknesses • Accurate prediction for complex problems • Black-box prediction engine • Overtraining • Training speed
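
The pieces named on the neural network slides (weights, biases, hidden-unit activations, output activation) combine in a forward pass. The sketch below is a simplified illustration and not the architecture from the presentation: it assumes a single hidden layer, hyperbolic tangent hidden units, and a logistic output, and all function and parameter names are hypothetical.

```python
import math


def logistic(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer network.

    hidden_weights holds one list of input weights per hidden unit.
    """
    hidden = [
        math.tanh(b + sum(w * xi for w, xi in zip(ws, x)))
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    z = output_bias + sum(w * h for w, h in zip(output_weights, hidden))
    return logistic(z)


# Two inputs, two hidden units, one output; the weights are arbitrary.
y = forward(
    x=[0.5, -1.0],
    hidden_weights=[[0.4, -0.2], [0.1, 0.3]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.5, -0.7],
    output_bias=-0.2,
)
print(y)
```

Training adjusts the weights and biases to reduce prediction error on the training data; stopping too late leads to the overtraining mentioned on the slide.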

  18. DM Techniques -- Hybrid Algorithms • Problems with standard algorithms • Advanced algorithms • Discovery-driven approaches • Mixture of algorithms

  19. DM Applications in P&C Insurance • Data Warehouse • Underwriting • Pricing/Rate Making • Claim Scoring • Risk Management • Policy Level Analysis • Variable Selection

  20. Data Warehousing Example • Diagram stages: Primary Selection: WHO? (unique patient list) • Secondary Selection: WHAT DATA? • Tertiary Selection: WHAT DOES THE TRANSACTION DATA TELL US? (group by patient) • Summary: WHAT DO WE KNOW ABOUT THIS PATIENT? • Diagram labels: transactions, surveys, demographics, pharmacy claims, Rx, physician claims, hospital claims, med claims, operational data store, service level table, derived variables/flags, service level variables, summary level variables, summary level table

  21. DM in Insurance Underwriting • Improving profit margin • Gaining a competitive edge • Risk evaluation process: lots of variables, lots of interactions • Easy-to-follow procedure: a decision tree can be used

  22. DM in Insurance Underwriting - Auto Driver’s Claim Information

  23. DM in Insurance Underwriting - Decision Tree Diagram

  24. DM in Pricing/Rate Making • Data: Auto Driver’s Claim Information • Decision tree analysis to identify risk factors that predict profits, claims and losses • Logistic regression applied to model claim frequency and the effect of each risk factor

  25. DM in Pricing/Rate Making - Effect T-scores from the logistic regression

  26. DM in Pricing/Rate Making - Assessment • Cross-model comparisons of expected to actual profits/losses • Independent of all other factors (sample size, ...) • Lift charts • Compare the % claim-occurrence value to a random baseline model • Performance quality is shown by the degree to which the lift chart curve pushes upward and to the left

  27. DM in Pricing/Rate Making - Lift Chart for Logistic Regression • Logistic regression: • Captured 30% of the drivers in the 10th percentile • Better predictive power from about the 20th to the 80th percentiles
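
The numbers behind a lift chart can be computed by ranking policies by model score and counting how many actual claims fall in the top bins. The sketch below is one common way to do this and is not taken from the presentation; the function names and the ten scored policies are hypothetical, and it assumes at least one claim in the data.

```python
def cumulative_gains(scores, outcomes, n_bins=10):
    """Share of all positive outcomes captured in the top k bins, k = 1..n_bins.

    Records are ranked from highest to lowest score. Assumes at least one
    positive outcome.
    """
    ranked = [y for _, y in sorted(zip(scores, outcomes), key=lambda t: -t[0])]
    total = sum(ranked)
    n = len(ranked)
    gains = []
    for k in range(1, n_bins + 1):
        cutoff = (k * n) // n_bins  # number of records in the top k bins
        gains.append(sum(ranked[:cutoff]) / total)
    return gains


def lift(gains):
    """Lift over a random baseline, which captures k / n_bins by bin k."""
    n_bins = len(gains)
    return [g / ((k + 1) / n_bins) for k, g in enumerate(gains)]


# Hypothetical scored policies: predicted claim probability and actual claim.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

gains = cumulative_gains(scores, outcomes)
lifts = lift(gains)
print(gains)
print(lifts)
```

A model with no predictive power has lift near 1 in every bin; the further the early bins sit above 1, the more the curve pushes upward and to the left.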

  28. DM in Risk Management • Reinsurance: structure more effectively by segmentation • Hedging • Targeting retention and building loyalty

  29. DM in Policy Level Analysis • Retention analysis • Profitability analysis • Policyholder behavior • DM methods used: neural networks, decision trees, logistic regression

  30. Applications – Variable Selection • Problem -- Given {Y,X} where • Find F, such that • Find and F*, such that • Improving model accuracy and efficiency • Making crucial business decisions

  31. Case Study - Group Insurance • Identify ways to build upon the current manual rating structure, using existing rating variables to develop a practical tool to guide underwriting in rate adjustments • Identify any new rating variables with significant predictive power • Data currently gathered but not utilized • Transformations of existing variables • Introduction of new rating variables (e.g. external financial data)

  32. Case Study – Group Insurance • Profit margin over an x-year period • 128 input variables • Principal Components Analysis applied • 42 variables remain • How to improve business profit?

  33. Case Study - Goals • Developing a practical underwriting tool • Detecting deviations • Identifying key drivers • Improving model predictive power • Risk selection

  34. Function Approximation • F_0 is the initial guess • Stagewise approximation • Each stage is added to reduce the errors • Each stage is a weak learner: a small tree • Sequential adjustment

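  35. Regression Tree Example • $\text{Profit} = 6.5\% + \begin{cases} +1.2\%, & \text{if male younger than 30} \\ -1.1\%, & \text{otherwise} \end{cases} + \begin{cases} +0.8\%, & \text{if AS} > 421 \\ -0.5\%, & \text{otherwise} \end{cases}$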
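
The regression tree example reads as a base value plus the contributions of two small trees, and can be written as a short function. This is only a sketch of one reading of the flattened slide: it assumes the -1.1% "otherwise" belongs with the age/sex rule and the -0.5% "otherwise" with the AS rule. The slide does not define "AS", so `as_value` is simply a placeholder name, as are the function and other parameter names.

```python
def predicted_profit(is_male, age, as_value):
    """Predicted profit in percent: base value plus two two-leaf trees.

    Assumed pairing: (+1.2 / -1.1) for the age/sex rule and
    (+0.8 / -0.5) for the AS rule.
    """
    profit = 6.5                                           # initial guess
    profit += 1.2 if (is_male and age < 30) else -1.1      # first small tree
    profit += 0.8 if as_value > 421 else -0.5              # second small tree
    return profit


print(predicted_profit(True, 25, 500))   # both "if" branches apply
print(predicted_profit(False, 40, 400))  # both "otherwise" branches apply
```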

  36. Function Approximation • GIVEN • Y: Output and X: Inputs or Predictors • L(Y, F): Loss Function • ESTIMATE

  37. Classical Function Approximation • Solve from

  38. Nonparametric Function Approximation • Compute • Initial guess • Take a step in the steepest descent direction
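
The formulas on the function approximation slides did not survive in this transcript. For orientation, the block below gives the usual textbook formulation of function estimation by steepest descent in function space, using the slide's symbols Y, X, F and L(Y, F). The symbols $g_m$, $\rho_m$ and $M$ are assumptions introduced here, and the equations may differ from what the slides actually showed.

```latex
\begin{align*}
F^{*} &= \arg\min_{F} \; \mathbb{E}_{Y,X}\, L\bigl(Y, F(X)\bigr) \\
g_m(X) &= \left[ \frac{\partial L\bigl(Y, F(X)\bigr)}{\partial F(X)} \right]_{F = F_{m-1}} \\
\rho_m &= \arg\min_{\rho} \; \mathbb{E}_{Y,X}\, L\bigl(Y, F_{m-1}(X) - \rho\, g_m(X)\bigr) \\
F_m(X) &= F_{m-1}(X) - \rho_m\, g_m(X), \qquad m = 1, \dots, M
\end{align*}
```

For squared-error loss $L(Y, F) = \tfrac{1}{2}(Y - F)^2$, the negative gradient $-g_m(X)$ is the ordinary residual $Y - F_{m-1}(X)$, which is why each stage can be fitted to the current residuals.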

  39. Gradient Boosting • Initial guess • FOR m = 1 TO M • Fit an L-node regression tree to the current residuals • For each given node, calculate node average residual • Update: • END
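
The loop on the gradient boosting slide can be sketched in a few lines for squared-error loss. This is an illustrative simplification and not the model used in the case study: it assumes one numeric input with at least two distinct values, two-leaf trees (stumps) in place of general L-node trees, and a shrinkage factor `learning_rate` that the slide does not mention. All names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a two-leaf regression tree: split at x <= t, predict each leaf's mean residual."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)      # node average residual
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1], best[2], best[3]


def gradient_boost(xs, ys, n_stages=50, learning_rate=0.1):
    """Boost stumps on the residuals, starting from the mean of the target."""
    f0 = sum(ys) / len(ys)                     # initial guess
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_stages):                  # FOR m = 1 TO M
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((t, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= t else right_mean)
                 for x, p in zip(xs, preds)]   # update
    return f0, learning_rate, stumps


def predict(model, x):
    """Sum the initial guess and every stage's shrunken contribution."""
    f0, learning_rate, stumps = model
    return f0 + sum(learning_rate * (l if x <= t else r) for t, l, r in stumps)


# Hypothetical data: one rating variable and an observed profit margin (%).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5.0, 5.5, 5.2, 6.8, 7.1, 7.0, 9.0, 9.3]
model = gradient_boost(xs, ys)
print([round(predict(model, x), 2) for x in xs])
```

Each stage makes only a small correction, so the fit improves gradually; the number of stages and the shrinkage factor together control how closely the model follows the training data.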

  40. Case Study

  41. Case Study

  42. Case Study - Single Stats and Variable Importance

      | Input      | Additive | Multiplicative | Importance |
      |------------|----------|----------------|------------|
      | Variable 1 | 0.2679   | 0.2690         | 100.00     |
      | Variable 2 | 0.2779   | 0.3203         | 75.23      |
      | Variable 3 | 0.1456   | 0.1771         | 54.65      |
      | Variable 4 | 0.2263   | 0.2469         | 47.41      |
      | Variable 5 | 0.1059   | 0.1425         | 42.81      |
      | Variable 6 | 0.2741   | 0.2847         | 34.81      |
      | Variable 7 | 0.1289   | 0.1306         | 34.27      |
      | Variable 8 | 0.0797   | 0.0864         | 25.35      |
      | Variable 9 | 0.1129   | 0.1148         | 23.37      |

  43. Case Study - Pair Stats and Variable Importance

      | Variables               | Additive | Multiplicative |
      |-------------------------|----------|----------------|
      | Variable 1 & Variable 2 | 0.3714   | 0.3847         |
      | Variable 2 & Variable 3 | 0.3704   | 0.4066         |
      | Variable 2 & Variable 4 | 0.3686   | 0.4010         |
      | Variable 2 & Variable 7 | 0.3401   | 0.3856         |
      | Variable 3 & Variable 4 | 0.2795   | 0.3137         |
      | Variable 3 & Variable 6 | 0.2895   | 0.3082         |
      | Variable 4 & Variable 7 | 0.2417   | 0.2592         |
      | Variable 5 & Variable 6 | 0.2622   | 0.2766         |
      | Variable 6 & Variable 7 | 0.2904   | 0.3066         |

  44. Predictive Modeling • Predicts deviations from expected profitability (used 9 variables) • Practical guide for underwriters to use for rate adjustments • New variables identified to have strong predictive power • Improves business profit (20% profit margin)

  45. Importance of Multiple Techniques • Robust model with high predictive accuracy • Practical constraints • Algorithm complexity • Ease of understanding of results

  46. Is Data Mining for you? • Defining the goals • Understanding your data • Using multiple techniques • Improving your decision-making process • Gaining a competitive edge! • Thank you!
