Data Mining for Credit Card Fraud: A Comparative Study
Xxxxxxxx
DSCI 5240 | Dr. Nick Evangelopoulos
Graduate Presentation
Overview
• Credit Card Fraud
• Data Mining Techniques
• Data
• Experimental Setup
• Results
Graduate Presentation | DSCI 5240 | Xxxxxxx
Credit Card Fraud
• Two types:
  • Application fraud
    • Obtaining new cards using false information
  • Behavioral fraud
    • Mail theft
    • Stolen or lost card
    • Counterfeit card
Credit Card Fraud
• Online revenue loss due to fraud (source: cybersource.com)
Data Mining Techniques
• Logistic Regression
  • Used to predict the outcome of a categorical dependent variable
  • The fraud variable is binary (fraud or not fraud)
• Support Vector Machines
• Random Forest
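To make the logistic regression bullet concrete, here is a minimal sketch of how a fitted logistic model turns transaction attributes into a fraud probability and then a binary label. The weights, bias, threshold, and feature values are hypothetical placeholders, not values from the study.

```python
import math

def predict_fraud_probability(features, weights, bias):
    """Logistic model: P(fraud) = 1 / (1 + exp(-(w . x + b)))."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def classify(features, weights, bias, threshold=0.5):
    """Map the probability to the binary fraud variable (1 = fraud, 0 = legitimate)."""
    p = predict_fraud_probability(features, weights, bias)
    return 1 if p >= threshold else 0

# Hypothetical coefficients and one hypothetical transaction (assumptions for illustration).
weights = [0.8, -0.3]
bias = -1.0
transaction = [2.0, 1.0]

probability = predict_fraud_probability(transaction, weights, bias)
label = classify(transaction, weights, bias)
```

In practice the threshold would be tuned rather than fixed at 0.5, since fraud is rare and the costs of the two error types differ.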
Support Vector Machines (SVM)
• Supervised learning models with associated learning algorithms that analyze data and recognize patterns
• Linear classifiers that operate in a high-dimensional feature space, which is a non-linear mapping of the input space
• Two properties of SVM
  • Kernel representation
  • Margin optimization
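As an illustration of the kernel representation, the sketch below implements the Gaussian radial basis function, the kernel named on the Experimental Setup slide. A kernel returns the inner product of two points in the high-dimensional feature space without computing the mapping explicitly. The `gamma` width parameter and the sample points are assumptions; the slides do not report kernel parameters.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Identical points have similarity 1; similarity decays toward 0 with distance.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
near = rbf_kernel([0.0, 0.0], [1.0, 0.0])
far = rbf_kernel([0.0, 0.0], [3.0, 4.0])
```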
Random Forest (RF)
• Ensemble of classification trees
• Performs well when individual members are dissimilar
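The ensemble idea can be sketched as a majority vote over individual trees. The three one-rule "trees" below are hypothetical stand-ins that each look at a different attribute, which is one simple way members can be dissimilar. A real random forest grows each tree on a bootstrap sample with a random subset of attributes considered at each node; the attribute names and cut-offs here are invented for illustration.

```python
from collections import Counter

def forest_predict(trees, transaction):
    """Return the class that receives the most votes across all trees."""
    votes = [tree(transaction) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical one-rule trees (assumed attributes and thresholds).
trees = [
    lambda t: 1 if t["amount"] > 500 else 0,
    lambda t: 1 if t["txns_last_day"] > 10 else 0,
    lambda t: 1 if t["foreign"] else 0,
]

suspicious = {"amount": 900, "txns_last_day": 2, "foreign": True}
ordinary = {"amount": 40, "txns_last_day": 1, "foreign": False}

suspicious_label = forest_predict(trees, suspicious)
ordinary_label = forest_predict(trees, ordinary)
```

An odd number of trees avoids tied votes in the binary case.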
Data: Datasets
• 13 months of data (Jan 2006 – Jan 2007)
• 50 million credit card transactions on 1 million credit cards
• 2,420 known fraudulent transactions involving 506 credit cards
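A quick calculation shows how imbalanced these figures are. This sketch takes the slide's rounded counts at face value, so the resulting rates are approximate.

```python
# Counts as stated on the slide (rounded, so the rates are approximate).
transactions = 50_000_000
cards = 1_000_000
fraud_transactions = 2_420
fraud_cards = 506

fraud_transaction_rate = fraud_transactions / transactions  # about 0.005% of transactions
fraud_card_rate = fraud_cards / cards                       # about 0.05% of cards
legit_per_fraud = (transactions - fraud_transactions) / fraud_transactions
```

With roughly twenty thousand legitimate transactions per known fraudulent one, a classifier trained on the raw data could reach very high accuracy by never predicting fraud, which is why the class distribution has to be rebalanced before training.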
Percentage of Transactions by Transaction Type
Data Selection
Primary attributes in Dataset
Derived Attributes
Experimental Setup
• For SVM, the Gaussian radial basis function was used as the kernel function
• For Random Forest, the number of attributes considered at each node and the number of trees were set
• Data were sampled at different rates using random undersampling of the majority class
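Random undersampling of the majority class can be sketched as follows: keep every fraud case and randomly discard legitimate cases until fraud makes up a chosen share of the training data. The function name, the 10% target rate, and the synthetic records are assumptions for illustration; the slides do not list the sampling rates used.

```python
import random

def undersample_majority(legit, fraud, target_fraud_rate, seed=0):
    """Keep all fraud cases and a random subset of legitimate cases so that
    fraud makes up roughly target_fraud_rate of the returned sample."""
    if not 0 < target_fraud_rate <= 1:
        raise ValueError("target_fraud_rate must be in (0, 1]")
    # Number of legitimate cases needed for the requested fraud share.
    n_legit = round(len(fraud) * (1 - target_fraud_rate) / target_fraud_rate)
    n_legit = min(n_legit, len(legit))
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    sample = rng.sample(legit, n_legit) + list(fraud)
    rng.shuffle(sample)
    return sample

# Synthetic records: (label, id) pairs standing in for transactions.
legit = [("legit", i) for i in range(10_000)]
fraud = [("fraud", i) for i in range(100)]

training = undersample_majority(legit, fraud, target_fraud_rate=0.10)
fraud_share = sum(1 for label, _ in training if label == "fraud") / len(training)
```

Only the training data would be resampled this way; evaluating on data with the original fraud rate keeps the reported performance realistic.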
Training and testing data
Results
Proportion of fraud captured at different depths
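The metric named on this slide can be computed as sketched below, assuming "depth" means the top fraction of cases when ranked by predicted fraud score, highest first. The function name, scores, and labels are invented for illustration and are not results from the study.

```python
def fraud_capture_at_depth(scores, labels, depth):
    """Fraction of all fraud cases found within the top `depth` fraction
    of cases, ranked by predicted fraud score (highest first)."""
    if not 0 < depth <= 1:
        raise ValueError("depth must be in (0, 1]")
    total_fraud = sum(labels)
    if total_fraud == 0:
        raise ValueError("labels contain no fraud cases")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * depth))
    captured = sum(label for _, label in ranked[:cutoff])
    return captured / total_fraud

# Hypothetical model scores and true labels (1 = fraud) for ten transactions.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]

capture_top_20 = fraud_capture_at_depth(scores, labels, depth=0.2)
capture_top_30 = fraud_capture_at_depth(scores, labels, depth=0.3)
```

A model that ranks well captures a large share of the fraud at shallow depths, which matters when only a small fraction of flagged transactions can be investigated.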
Fraud Capture Rate with Different Fraud Rates in Training Data
Conclusion
• Examined the performance of two data mining techniques, SVM and RF, together with logistic regression
• Used a real-life data set covering Jan 2006 – Jan 2007
• Used random undersampling of the majority class to sample the data
• Random forest showed much higher performance at upper file depths
• SVM performance at the upper file depths tended to increase with a lower proportion of fraud in the training data
• Random forest demonstrated better overall performance
Questions