Advanced Data Mining Techniques for Business Applications
Presentation Transcript
Data Mining Research • David L. Olson • University of Nebraska
Data Mining Research • Business Applications • Credit scoring • Customer classification • Fraud detection • Human resource management • Algorithms • Database related • Data warehouse products claim internal data mining • Text mining • Data Mining Process
Personal (with others) • Business Applications • Introduction to Business Data Mining with Yong Shi [2006] • Qing Cao - RFM • Algorithms • Advanced Data Mining Techniques with Dursun Delen [2008] • Moshkovich & Mechitov – Ordinal scales in trees • Data set balancing • Database related • Encyclopedia • Text mining • Web log ethics • Data Mining Process • Ton Stam, Dursun Delen
RFM with Qing Cao, Ching Gu, Donhee Lee • Recency • Time since customer made last purchase • Frequency • Number of purchases this customer made over time frame • Monetary • Average purchase amount (or total)
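The three RFM measures above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `rfm`, the tuple layout of an order, and the `as_of` reference date are all assumptions, and Monetary is taken as the average purchase amount (the slide also allows the total).

```python
from collections import defaultdict
from datetime import date


def rfm(orders, as_of):
    """Return {customer_id: (recency_days, frequency, monetary_avg)}.

    orders: iterable of (customer_id, order_date, amount) tuples.
    as_of:  reference date from which recency is measured (an assumption).
    """
    by_customer = defaultdict(list)
    for customer_id, order_date, amount in orders:
        by_customer[customer_id].append((order_date, amount))

    result = {}
    for customer_id, rows in by_customer.items():
        last_purchase = max(d for d, _ in rows)
        recency = (as_of - last_purchase).days      # time since last purchase
        frequency = len(rows)                       # number of purchases in the window
        monetary = sum(a for _, a in rows) / frequency  # average purchase amount
        result[customer_id] = (recency, frequency, monetary)
    return result


# Hypothetical orders, for illustration only.
orders = [
    ("A", date(2002, 1, 10), 50.0),
    ("A", date(2002, 6, 1), 70.0),
    ("B", date(2001, 3, 15), 200.0),
]
scores = rfm(orders, as_of=date(2002, 12, 31))
```

Swapping the average for `sum(a for _, a in rows)` gives the "total" variant of Monetary.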
Variants • F & M highly correlated • Bult & Wansbeek [1995] Marketing Science • Value = M/R • Yang (2004) Journal of Targeting, Measurement and Analysis for Marketing
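The Value = M/R variant drops F and ranks customers by monetary amount per unit of recency. A minimal sketch follows; the name `value_score`, the sample customers, and the floor of one day on R (to avoid dividing by zero) are assumptions, not part of Yang's definition as given on the slide.

```python
def value_score(monetary, recency_days):
    """Value = M / R. Recency is floored at 1 day (an assumption) to avoid division by zero."""
    return monetary / max(recency_days, 1)


# Hypothetical customers: id -> (recency_days, monetary)
customers = {"A": (213, 60.0), "B": (656, 200.0), "C": (30, 45.0)}

# Highest value first: recent and/or high-spending customers rise to the top.
ranked = sorted(
    customers,
    key=lambda c: value_score(customers[c][1], customers[c][0]),
    reverse=True,
)
```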
Limitations • Other attributes may be important • Product variation • Customer age • Customer income • Customer lifestyle • Still, RFM widely used • Works well if response rate is high
Data • Meat retailer in Nebraska • 64,180 purchase orders (mail) • 10,000 individual customers • Oct 11, 1998 to Oct 3, 2003 • ORDER DATE • ORDER AMOUNT • PRESENCE OF PROMOTION
Data • Nebraska food products firm • 64,180 individual purchase orders (by mail) • 10,000 individual customers • 11 Oct 1998 to 3 Oct 2003 • Data: • Order date • Order amount (price) • Whether or not promotion involved
Treatment • Used 5,000 observations to build model • To the end of 2002 • Used another 5,000 for testing • 2003
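One reading of this slide is a split by time, with records through the end of 2002 used to build the model and 2003 records held out for testing. The sketch below illustrates only that reading; the cutoff constant, the function name, and the tuple layout are assumptions, and the slide does not say exactly how the 5,000 observations in each set were drawn.

```python
from datetime import date

CUTOFF = date(2002, 12, 31)  # assumed boundary: "to the end of 2002"


def split_by_date(records, cutoff=CUTOFF):
    """Split (customer_id, order_date, amount) records into build and test sets by date."""
    build = [r for r in records if r[1] <= cutoff]
    test = [r for r in records if r[1] > cutoff]
    return build, test


# Hypothetical records, for illustration only.
records = [
    ("A", date(2001, 5, 2), 40.0),
    ("B", date(2002, 12, 31), 75.0),
    ("A", date(2003, 1, 1), 55.0),
    ("C", date(2003, 9, 20), 120.0),
]
build_set, test_set = split_by_date(records)
```

Holding out the later period keeps the test closer to how the model would be used: fitted on past behavior, applied to future orders.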
Correlations • * - 0.01 significance • ** - 0.05 significance • *** - 0.001 significance
BALANCE CELLS • Adjusted boundaries of 5 x 5 x 5 matrix • Can’t get all to equal average of 8 • Lumpy (due to ties) • Ranged from 4 to 11
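The lumpiness from ties can be seen in a small sketch of quintile coding. This is an illustration under assumptions, not the authors' procedure: `quintile_codes` and `cell_counts` are hypothetical names, boundaries are taken at equal-count positions in the sorted data, and tied values always share a code.

```python
from collections import Counter


def quintile_codes(values, bins=5):
    """Code each value 1..bins using equal-count boundaries; tied values share a code.

    Assumes len(values) >= bins.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Upper boundary of each of the first bins-1 groups.
    cuts = [ordered[(n * k) // bins - 1] for k in range(1, bins)]
    return [1 + sum(v > c for c in cuts) for v in values]


def cell_counts(recency, frequency, monetary):
    """Count customers in each cell of the 5 x 5 x 5 matrix."""
    cells = zip(quintile_codes(recency), quintile_codes(frequency), quintile_codes(monetary))
    return Counter(cells)


# Distinct values split evenly; heavily tied values (typical of purchase counts) do not.
spread = quintile_codes(list(range(1, 11)))
tied = quintile_codes([1, 1, 1, 1, 1, 1, 2, 2, 3, 5])
```

In the tied example six of ten customers land in code 1 and codes 2 and 3 stay empty, which is why boundaries have to be adjusted by hand and cell sizes still vary.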
Alternatives • LIFT • Sort groups by best response • Apply your marketing budget to the most profitable (until you run out of budget) • LIFT is the gain obtained above par (random) • VALUE FUNCTION • (Yang, 2004) • Throw out F (correlated with M) • Use ratio of M/R • Logistic Regression • Decision Tree • Neural Network
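The LIFT idea (sort groups by response, then measure the gain above random) can be sketched as follows. The function name and the sample groups are hypothetical; gain is computed here as cumulative share of responders minus cumulative share of customers contacted, which is one common way to express "above par".

```python
def cumulative_lift(groups):
    """groups: list of (n_customers, n_responders), each with n_customers > 0.

    Returns a list of (share_of_customers, share_of_responders, gain_over_random),
    with groups taken in order of best response rate first.
    """
    ranked = sorted(groups, key=lambda g: g[1] / g[0], reverse=True)
    total_n = sum(n for n, _ in ranked)
    total_r = sum(r for _, r in ranked)

    curve = []
    cum_n = cum_r = 0
    for n, r in ranked:
        cum_n += n
        cum_r += r
        share_n = cum_n / total_n   # fraction of customers contacted so far
        share_r = cum_r / total_r   # fraction of all responders reached so far
        curve.append((share_n, share_r, share_r - share_n))
    return curve


# Hypothetical groups (for example RFM cells): (customers, responders)
groups = [(100, 10), (100, 40), (100, 20), (100, 30)]
curve = cumulative_lift(groups)
```

Reading down the curve, a marketer would keep contacting groups until the budget runs out; the gain shrinks to zero once everyone has been contacted.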
Models
• Regression: -0.4775 + 0.00853 R + 0.1675 F + 0.00213 M
  Test data: Correct 0.8230
• Decision Tree
  IF R ≤ 82
    IF R ≤ 32: YES (1567 right, 198 wrong)
    ELSE (R > 32)
      IF F ≤ 3
        IF M ≤ 296: NO (285 right, 91 wrong)
        ELSE (M > 296): YES (28 right, 9 wrong)
      ELSE (F > 3): YES (729 right, 110 wrong)
  ELSE (R > 82): YES (2391 right, 3 wrong)
  Test data: Correct 0.8678
• Neural Network
  Test data: Correct 0.8674
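The regression formula and decision tree on this slide can be written as small scoring functions. This is a sketch of one reading of the slide: the function names are hypothetical, the nesting of the tree branches is inferred from the order of the rules, and no cutoff is applied to the regression score because the slide does not state one.

```python
def regression_score(r, f, m):
    """Linear score with the coefficients shown on the slide."""
    return -0.4775 + 0.00853 * r + 0.1675 * f + 0.00213 * m


def tree_predict(r, f, m):
    """Decision tree rules from the slide; True means YES (predicted responder)."""
    if r > 82:
        return True
    if r <= 32:
        return True
    # Here 32 < R <= 82
    if f > 3:
        return True
    # Here F <= 3: only larger average purchases are classed YES
    return m > 296


# Hypothetical customer: R = 50, F = 2, M = 100
example_tree = tree_predict(50, 2, 100)
example_score = regression_score(50, 2, 100)
```

In this reading the tree's only NO leaf is the mid-recency, low-frequency, low-spend customer.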