Data Mining Quantitative Values
By Noah Clemons and Andrew Seidel
Association Rule Mining
• Data is in market-basket format: each basket is a list of the items (integers) present.
• The tool returns rules relating those items.
• Rules are useful for discovering trends.
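A minimal sketch of how support and confidence are conventionally defined for rules over market-basket data. The baskets and integer item IDs are made up and the function names are hypothetical; this is not the mining tool used in the presentation. A rule A ==> B is normally reported only when both values meet the chosen minimum support and minimum confidence.

```python
# Hypothetical market-basket data: each basket is a set of integer item IDs.
baskets = [
    {1, 2, 3},
    {1, 2},
    {1, 3},
    {2, 3},
    {1, 2, 3},
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


def rule_stats(antecedent, consequent, baskets):
    """Return (support, confidence) for the rule antecedent ==> consequent.

    support    = fraction of baskets containing both antecedent and consequent
    confidence = that count divided by the number of baskets containing the antecedent
    """
    both = support_count(antecedent | consequent, baskets)
    ante = support_count(antecedent, baskets)
    support = both / len(baskets)
    confidence = both / ante if ante else 0.0
    return support, confidence


support, confidence = rule_stats({1, 2}, {3}, baskets)
print(f"support={support:.2f} confidence={confidence:.2f}")  # support=0.40 confidence=0.67
```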
Problem
• The data is not in market-basket format.
• How do we fit the data to the required format?
• Convert the data.
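One way to convert quantitative values into items is a conversion table that splits each attribute's range into buckets and treats each bucket as an item. The naming scheme here (STAT_low_high, or STAT_value for a one-value bucket) is inferred from the sample rules later in the slides; the table contents and function names are hypothetical.

```python
import bisect

# Hypothetical conversion table: for each attribute, sorted, non-overlapping,
# inclusive (low, high) buckets. A real table would cover every value in the data.
CONVERSION_TABLE = {
    "AB": [(0, 542), (543, 550), (551, 558), (559, 700)],
    "HR": [(0, 36), (37, 51), (52, 80)],
    "3B": [(0, 1), (2, 2), (3, 30)],
}


def item_name(attr, low, high):
    """Name a bucket like AB_551_558, or 3B_2 when the bucket holds a single value."""
    return f"{attr}_{low}" if low == high else f"{attr}_{low}_{high}"


def to_items(record, table=CONVERSION_TABLE):
    """Convert one record of quantitative values into a list of bucket item names."""
    items = []
    for attr, value in record.items():
        ranges = table.get(attr)
        if ranges is None:
            continue  # attribute not in the conversion table: skip it
        lows = [low for low, _ in ranges]
        # Find the last bucket whose lower bound is <= value, then check its upper bound.
        i = bisect.bisect_right(lows, value) - 1
        if i >= 0 and value <= ranges[i][1]:
            items.append(item_name(attr, *ranges[i]))
    return items


print(to_items({"AB": 553, "HR": 40, "3B": 2}))  # ['AB_551_558', 'HR_37_51', '3B_2']
```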
Approaches to Conversion
• Static approach:
  • Convert the data before running the association rule mining tool.
  • Good when doing many runs on one dataset with one conversion table.
  • Speed depends on the tools used for the conversion.
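A sketch of the static approach: the whole dataset is converted once, and the resulting integer baskets are written out for the mining tool to read. The output format (one basket per line, space-separated integer IDs), the conversion table, and all names are assumptions for illustration; an in-memory buffer stands in for the file.

```python
import io

# Hypothetical conversion table: attribute -> list of inclusive (low, high) buckets.
TABLE = {
    "AB": [(0, 499), (500, 599), (600, 800)],
    "HR": [(0, 19), (20, 39), (40, 80)],
}


def bucket_item(attr, value):
    """Return the bucket item name for one value, or None if no bucket covers it."""
    for low, high in TABLE.get(attr, []):
        if low <= value <= high:
            return f"{attr}_{low}_{high}"
    return None


def convert_static(records, out):
    """Static approach: convert every record once and write integer baskets to `out`.

    Assumed format: one basket per line, space-separated integer item IDs.
    Returns the item-name -> integer ID mapping needed to decode mined rules.
    """
    ids = {}
    for record in records:
        basket = []
        for attr, value in record.items():
            name = bucket_item(attr, value)
            if name is not None:
                # setdefault assigns the next unused integer the first time a name appears.
                basket.append(ids.setdefault(name, len(ids)))
        out.write(" ".join(map(str, sorted(basket))) + "\n")
    return ids


records = [{"AB": 553, "HR": 40}, {"AB": 612, "HR": 25}, {"AB": 530, "HR": 41}]
buffer = io.StringIO()  # stands in for the file the mining tool would read
ids = convert_static(records, buffer)
print(buffer.getvalue(), end="")
print(ids)
```

Every later run reuses the written file, which is why this approach suits many runs over one dataset and one table; changing either one means converting again.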
Approaches to Conversion
• Dynamic approach:
  • Convert the data as the association rule mining tool reads it.
  • Can be much faster than the static approach.
  • Good for changing datasets or conversion tables.
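A sketch of the dynamic approach, assuming the mining tool can consume baskets from an iterator: each record is converted only when the tool asks for the next basket, so no converted copy of the dataset is stored and the conversion table can be swapped between runs. The tables, the function names, and the item-counting stand-in for the mining tool are hypothetical.

```python
from collections import Counter

# Hypothetical conversion table: attribute -> list of inclusive (low, high) buckets.
TABLE_A = {
    "AB": [(0, 499), (500, 599), (600, 800)],
    "HR": [(0, 19), (20, 39), (40, 80)],
}


def bucket_item(table, attr, value):
    """Return the bucket item name for one value, or None if no bucket covers it."""
    for low, high in table.get(attr, []):
        if low <= value <= high:
            return f"{attr}_{low}_{high}"
    return None


def dynamic_baskets(records, table):
    """Dynamic approach: convert each record only when the consumer asks for it."""
    for record in records:
        items = (bucket_item(table, attr, value) for attr, value in record.items())
        yield frozenset(item for item in items if item is not None)


def count_items(baskets):
    """Stand-in for a mining tool's first pass: count the baskets containing each item."""
    counts = Counter()
    for basket in baskets:
        counts.update(basket)
    return counts


records = [{"AB": 553, "HR": 40}, {"AB": 612, "HR": 25}, {"AB": 530, "HR": 41}]
counts = count_items(dynamic_baskets(records, TABLE_A))
print(counts["AB_500_599"], counts["HR_40_80"], counts["AB_600_800"])

# Switching conversion tables needs no regenerated file: pass a different table.
TABLE_B = {"AB": [(0, 549), (550, 800)]}
print(count_items(dynamic_baskets(records, TABLE_B)))
```

This only illustrates the data flow; the speedups reported in the slides come from the authors' own tools.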
Static vs. Dynamic
• Time for 16 static runs: 769.05 seconds
• Time for 16 dynamic runs: 27.53 seconds
• Static is about 27.9 times slower.
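The reported totals can be checked, and broken down per run, with simple division. The figures are taken directly from the slide; nothing else is assumed.

```python
static_total = 769.05   # seconds for 16 static runs (from the slide)
dynamic_total = 27.53   # seconds for 16 dynamic runs (from the slide)
runs = 16

print(f"static per run:  {static_total / runs:.2f} s")    # 48.07 s
print(f"dynamic per run: {dynamic_total / runs:.2f} s")   # 1.72 s
print(f"ratio: {static_total / dynamic_total:.1f}x")      # 27.9x
```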
Rules
• Run with 20 buckets, 0.1% support, 80% confidence
• 646 rules
• Sample rules:
  • AB_551_558 RBI_116_147 ==> HR_37_51 (0.866667, 13)
  • BB_35_37 H_193_226 ==> AB_637_689 (0.846154, 11)
  • IBB_18_31 SO_136_180 R_112_137 ==> RBI_116_147 (0.833333, 5)
  • GIDP_5 AB_543_550 ==> 3B_2 (0.833333, 5)
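The slides do not label the two numbers after each rule. They appear to be (confidence, support count): in every sample, the count divided by the confidence is very nearly a whole number (13 / 0.866667 ≈ 15), which is what you would expect if the count is the number of baskets containing the whole rule. The sketch below parses rule lines under that assumption; the regular expression and function name are hypothetical.

```python
import re

# Hypothetical pattern for lines like "A B ==> C (confidence, count)".
RULE_RE = re.compile(
    r"^(?P<lhs>.+?) ==> (?P<rhs>\S+) \((?P<conf>[\d.]+), (?P<count>\d+)\)$"
)


def parse_rule(line):
    """Split a rule line into (antecedent items, consequent item, confidence, count)."""
    m = RULE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized rule line: {line!r}")
    return m["lhs"].split(), m["rhs"], float(m["conf"]), int(m["count"])


lines = [
    "AB_551_558 RBI_116_147 ==> HR_37_51 (0.866667, 13)",
    "BB_35_37 H_193_226 ==> AB_637_689 (0.846154, 11)",
    "GIDP_5 AB_543_550 ==> 3B_2 (0.833333, 5)",
]
for line in lines:
    lhs, rhs, conf, count = parse_rule(line)
    # Assumption: count = baskets containing the whole rule, so count / conf
    # estimates how many baskets contain the antecedent alone.
    print(" ".join(lhs), "->", rhs, "antecedent count ~", round(count / conf))
```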
Rules
• Run with 80 buckets, 0.1% support, 80% confidence
• 60 rules
• Sample rules:
  • H_112_114 2B_22 ==> 3B_2 (0.833333, 5)
  • AB_465_469 SH_3 ==> 3B_4 (0.833333, 5)
  • SB_25 HBP_4 ==> GIDP_8 (0.833333, 5)
  • H_200_205 IBB_4 ==> CS_4 (1, 5)
  • BB_57 SB_1 ==> CS_1 (0.833333, 5)
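The slides do not say how bucket boundaries are chosen. The uneven widths in the 20-bucket sample ranges (AB_543_550 spans 8 values, AB_637_689 spans 53) would be consistent with equal-frequency buckets, where each bucket holds roughly the same number of records, but that is a guess. A sketch of equal-frequency bucketing, with a hypothetical function name:

```python
def equal_frequency_buckets(values, k):
    """Split values into at most k buckets holding roughly equal counts.

    Returns sorted, non-overlapping, inclusive (low, high) ranges. Runs of
    identical values are kept in one bucket, so heavily repeated values can
    yield fewer than k buckets.
    """
    data = sorted(values)
    n = len(data)
    buckets = []
    start = 0
    for b in range(1, k + 1):
        end = round(b * n / k)  # index one past the last value of bucket b
        if end <= start:
            continue  # an earlier bucket already absorbed this one's values
        # Extend the bucket so a run of equal values is not split across buckets.
        while end < n and data[end] == data[end - 1]:
            end += 1
        buckets.append((data[start], data[end - 1]))
        start = end
        if start >= n:
            break
    return buckets


values = [1, 2, 3, 4, 10, 20, 30, 40]
print(equal_frequency_buckets(values, 2))  # [(1, 4), (10, 40)]
print(equal_frequency_buckets(values, 4))  # [(1, 2), (3, 4), (10, 20), (30, 40)]
```

With more buckets each item covers a narrower range and therefore fewer baskets, so fewer itemsets reach the support threshold. That would be one plausible reason the 80-bucket run produced far fewer rules (60) than the 20-bucket run (646), though the slides do not state the cause.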
Problems Encountered
• Hard to pick good values for support, confidence, and the conversion table.
• Many values are related, which leads to large rules.
  • e.g., at bats, games, etc.
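One way to spot related attributes, such as at bats and games, before mining is to compute pairwise Pearson correlation on the raw values and flag strongly correlated pairs. The slides do not say the authors did this; the data, the 0.9 threshold, and the function names are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def related_attributes(columns, threshold=0.9):
    """Return attribute pairs whose absolute correlation is at least `threshold`."""
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Made-up per-player season totals, for illustration only.
columns = {
    "G": [150, 120, 90, 160, 60],
    "AB": [560, 450, 320, 610, 200],
    "SB": [5, 30, 2, 12, 20],
}
print(related_attributes(columns))
```

Flagged pairs could then be merged or dropped before conversion so their buckets do not inflate the rule set.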
Future Work
• Use correlation mining to find items.
• Create a tool to find optimal values for support, confidence, and the conversion table.