Create Presentation
Download Presentation

Download Presentation
## Data mining in Health Insurance

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Introduction**• Rob Konijn, rob.konijn@achmea.nl • VU University Amsterdam • Leiden Institute of Advanced Computer Science (LIACS) • Achmea Health Insurance • Currently working here • Delivering leads for other departments to follow up • Fraud, abuse • Research topic keywords: data mining/ unsupervised learning / fraud detection**Outline**• Intro Application • Health Insurance • Fraud detection • Part 1: Subgroup discovery • Part 2: Anomaly detection (slides partly by Z. Slavik, VU)**Intro Application**• Health Insurance Data • Health Insurance in NL • Obligatory • Only private insurance companies • About 100 euro/month(everyone)+170 euro (income) • Premium increase of 5-12% each year Achmea: about 6 million customers**Funding of Health Insurance Costs in the Netherlands**vereveningsfonds vereveningsfonds vereveningsfonds vereveningsfonds vereveningsfonds vereveningsfonds vereveningsfonds vereveningsfonds rijksbijdrage verzekerden 18- 2 mld vereveningsbijdrage inkomensafh. bijdrage werkgevers 17 mld 18 mld zorgverzekeraar verzekerde zorgverzekeraar nominale premie 18+: - rekenpremie (~€ 947/vrz): 12 mld - opslag (~€ 150/vrz) : 2 mld 30 mld zorguitgaven**Verevenings-model**Mannen Vrouwen 0 - 4 jr 1,400 1,210 • By population characteristics • Age • Gender • Income, social class • Type of work • Calculation afterwards • High costs compensation (>15.000 euro) 5 - 9 jr 1,026 936 10 - 14 jr 907 918 15 - 17 jr 964 1,062 18 - 24 jr 892 1,214 25 - 29 jr 870 1,768 30 - 34 jr 905 1,876 35 - 39 jr 980 1,476 40 - 44 jr 1,044 1,232 45 - 49 jr 1,183 1,366 50 - 54 jr 1,354 1,532 55 - 59 jr 1,639 1,713 60 - 64 jr 1,885 1,905 65 - 69 jr 2,394 2,201 70 - 74 jr 2,826 2,560 75 - 79 jr 3,244 2,886 80 - 84 jr 3,349 3,018 85 - 89 jr 3,424 3,034 90 jr e.o. 3,464 3,014**Introduction Application:The Data**• Transactional data • Records of an event • Visit to a medical practitioner • Charged directly by medical practioner • Patient is not involved • Risk of fraud**Transactional Data**• Transactions: Facts • Achmea: About 200 mln transactions per year • Info of customers and practitioners: dimensions**Different levels of hierarchy**• Records represent events • However, for example for fraud detection, we are interested in customers, or medical practitoners • See examples next pages • Groups of records: Subgroup Discovery • Individual patients/practioners: outlier detection**Different types of fraud hierarchy**• On a patient level, or on a hospital level:**Handling different hierarchy**• Creating profiles from transactional data • Aggregating costs over a time period • Each record: patient • Each attribute i =1 to n: cost spent on treatment i • Feature construction, for example • The ratio of long/short consults (G.P.) • The ratio of 3-way and 2 way fillings (Dentist) • Usually used for one-way analysis**Different types of fraud detection**• Supervised • A labeled fraud set • A labeled non-fraud set • Credit cards, debit cards • Unsupervised • No labels • Health Insurance, Cargo, telecom, tax etc.**Unsupervised learning in Health Insurance Data**• Anomaly Detection (outlier detection) • Finding individual deviating points • Subgroup Discovery • Finding (descriptions of) deviating groups • Focus on differences and uncommon behavior • In contrast to other unsupervised learning methods • Clustering • Frequent Pattern mining**Subgroup Discovery**• Goal: Find differences in claim behavior of medical practitioners • To detect inefficient claim behavior • Actions: • A visit from the account manager • To include in contract negotiations • In the extreme case: fraud • Investigation by the fraud detection department • By describing deviations of a practitioner from its peers • Subgroups**Patient-level, Subgroup Discovery**• Subgroup (orange): group of patients • Target (red) • Indicates whether a patient visited a practitioner (1), or not (0)**Subgroup Discovery: Quality Measures**• Target Dentist: 1672 patiënten • Compare with peer group, 100.000 patients in total • Subgroup V11 > 42 euro : 10347 patients • V11: one sided filling • Crosstable**The cross table**• Cross table in data • Cross table expected: • Assuming independence**Calculating Wracc and Lift**• Size subgroup = P(S) = 0.10347, size target dentist = P(T) = 0.01672 • Weighted Relative ACCuracy (WRAcc) = P(ST) – P(S)P(T) = (871 – 173)/100000 = 689/100000 • Lift = P(ST)/P(S)P(T) = 871/173 = 5.03**Making SD more useful: adding prior knowledge**• Adding prior knowledge • Background variables patient (age, gender, etc.) • Specialism practitioner • For dentistry: choice of insurance • Adding already known differences • Already detected by domain experts themselves • Already detected during a previous data mining run**The idea: create an expected cross table using prior**knowledge**Quality Measures**• Ratio (Lift) • Difference (WRAcc) • Squared sum (Chi-square statistic)**Example, iterative approach**• Idea: add subgroup to prior knowledge iteratively • Target = single pharmacy • Patients that visited the hospital in last 3 years removed from data • Compare with peer group (400,000 patients), 2929 patiënts of target pharmacy • Top subgroup : “B03XA01 (Erythropoietin)>0 euro” 1 ‘target’ pharmacy rest subgroup B03XA01 > 0 rest**Next iteration**• Add “B03XA01 (EPO) >0 euro” to prior knowledge • Next best subgroup: “N05AX08 (Risperdal)>= 500 euro”**Figure describing subgroup:N05AX08 > 500**Left: target pharmacy, right: other pharmacies**Addition: adding costs to quality measure**• M55: dental cleaning • V11: 1-way filling • V21: polishing • Cost of treatments in subgroup 370 euro (average) • 791 more patients than expected • Total quality 791*370 = 292,469 euro**Iterative approach, top 3 subgroups**• V12: 2-sided filling • V21: polishing • V60: indirect pulpa covering • V21 and V60 are not allowed on the same day • Claim back (from all dentists): 1.3 million euro**Other target types: double binary target**• Target 1: year: 2009 or 2008 • Target 2: target practitioner • Pattern: • M59: extensive (expensive) dental cleaning • C12: second consult in one year • Crosstable:**Other target types: Multiclass target**• Subgroup (orange): group of patients • Target (red), now is a multi-value column, one value per dentist**Anemaly Detection**The exampleabovecontains a contextualanomaly...**Outline Anomaly Detection**• Anomalies • Definition • Types • Technique categories • Examples • Lecture based on • Chandola et al. (2009). Anomaly Detection: A Survey • Paper in BB 38**Definition**• “Anomaly detection refers to the problem of finding patternsin data that do not conform to expected behavior” • Anomalies, aka. • Outliers • Discordant observations • Exceptions • Aberrations • Surprises • Peculiarities • Contaminants**Anomaly types**Point anomalies • A data point is anomalous with respect to the rest of the data**Not covered today**• Other types of anomalies: • Collective anomalies • Contextual anomalies • Other detection approaches: • Supervised learning • Semi supervised • Assume training data is from normal class • Use to detect anomalies in the future**We focus on outlier scores**• Scores • You get a ranked list of anomalies • “We investigate the top 10” • “An anomaly has a score of at least 134” • Leads followed by fraud investigators • Labels ANOMALY**Detectionmethodcategorisation**• Model based • Depth based • Distance Based • Information theory related (not covered) • Spectral theory related (not covered)**Model based**• Build a (statistical) model of the data • Data instances occur in high probability regions of a stochastic model, while anomalies occur in low probability regions • Or: data instances have a high distance to the model are outliers • Or: data instances have a high influence on the model are outliers**Example: one way outlier detection**• Pharmacy records • Records represent patients • One attribute at a time: • This example: attribute describing the costs spent on fertility medication (gonodatropin) in a year • We could use such one way detection for each attribute in the data**Example, model = non-parametric distribution**• Left: kernel density estimate • Right: boxplot**Other models possible**• Probabilistic • Bayesian networks • Regression models • Regression trees/ random forests • Neural networks • Outlier score = prediction error (residual)**Depth based methods**• Applied on 1-4 dimensional datasets • Or 1-4 attributes at a time • Objects that have a high distance to the “center of the data” are considered outliers • Example Pharmacy: • Records represent patients • 2 attributes: • Costs spent on diabetes medication • Costs spent on diabetes testing material