
Treatment Learning: Implementation and Application




Presentation Transcript


  1. Treatment Learning: Implementation and Application
  Ying Hu, Electrical & Computer Engineering, University of British Columbia

  2. Outline
  • An example
  • Background review
  • TAR2 treatment learner (TARZAN: Tim Menzies; TAR2: Ying Hu & Tim Menzies)
  • TAR3: improved TAR2 (TAR3: Ying Hu)
  • Evaluation of treatment learning
  • Application of treatment learning
  • Conclusion
  Ying Hu http://www.ece.ubc.ca/~yingh

  3. First Impression
  • Boston Housing Dataset (506 examples, 4 classes)
  • C4.5's decision tree: (tree figure omitted)
  • Treatment learner:
  high: 6.7 <= rooms < 9.8 and 12.6 <= parent teacher ratio < 15.9
  low: 0.6 <= nitric oxide < 1.9 and 17.16 <= living standard < 39

  4. Review: Background
  • What is KDD? KDD = Knowledge Discovery in Databases [fayyad96]
  • Data mining: one step in the KDD process
  • Machine learning: learning algorithms
  • Common data mining tasks
  • Classification: decision tree induction (C4.5) [quinlan86], nearest neighbors [cover67], neural networks [rosenblatt62], naive Bayes classifier [duda73]
  • Association rule mining: APRIORI algorithm [agrawal93] and variants of APRIORI

  5. Treatment Learning: Definition
  • Input: classified dataset
  • Assume: classes are ordered
  • Output: Rx = conjunction of attribute-value pairs
  • Size of Rx = # of pairs in the Rx
  • confidence(Rx w.r.t Class) = P(Class|Rx)
  • Goal: to find Rx that have different levels of confidence across classes
  • Evaluate Rx: lift
  • Visualization form of output
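The confidence and lift measures above can be sketched in Python. This is a minimal illustration, not TAR2's actual code: the helper names and the numeric class-score weighting used for lift are my assumptions.

```python
# Sketch of treatment-learning measures. Assumes each example is a dict
# with a "class" key, a treatment Rx is a dict of attribute-value pairs,
# and classes carry numeric scores (higher = better) for computing lift.

def matches(example, rx):
    """True if the example satisfies every attribute-value pair in Rx."""
    return all(example.get(attr) == val for attr, val in rx.items())

def confidence(dataset, rx, cls):
    """confidence(Rx w.r.t Class) = P(Class | Rx)."""
    selected = [ex for ex in dataset if matches(ex, rx)]
    if not selected:
        return 0.0
    return sum(ex["class"] == cls for ex in selected) / len(selected)

def lift(dataset, rx, scores):
    """Mean class score of Rx-selected examples over the baseline mean."""
    selected = [ex for ex in dataset if matches(ex, rx)]
    if not selected:
        return 0.0
    baseline = sum(scores[ex["class"]] for ex in dataset) / len(dataset)
    treated = sum(scores[ex["class"]] for ex in selected) / len(selected)
    return treated / baseline
```

A treatment with lift above 1 selects a subset whose class distribution is better than the baseline, which is exactly the "different level of confidence across classes" goal.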

  6. Motivation: Narrow Funnel Effect
  • When is enough learning enough?
  • Attributes: < 50%; accuracy: decreases 3-5% [shavlik91]
  • A 1-level decision tree is comparable to C4 [Holte93]
  • Data engineering: ignoring 81% of features results in a 2% increase in accuracy [kohavi97]
  • Scheduling: random sampling outperforms complete (depth-first) search [crawford94]
  • Narrow funnel effect: control variables vs. derived variables
  • Treatment learning: finding funnel variables

  7. TAR2: The Algorithm
  • Search + attribute utility estimation
  • Estimation heuristic: confidence1
  • Search: depth-first search
  • Search space: confidence1 > threshold
  • Discretization: equal width interval binning
  • Reporting Rx: lift(Rx) > threshold
  • Software package and online distribution
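The equal-width binning step above can be sketched as follows; this is a generic textbook implementation, not the TAR2 package's own code.

```python
# Equal-width interval binning: split [min, max] into n_bins equal-width
# intervals and map each numeric value to its bin index.

def equal_width_bins(values, n_bins):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal values
    # clamp so the maximum value falls in the last bin, not one past it
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

Binning turns each continuous attribute into a small set of discrete attribute-value pairs, which is what the confidence1 heuristic then scores.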

  8. The Pilot Case Study
  • Requirement optimization
  • Goal: an optimal set of mitigations in a cost-effective manner
  • (diagram omitted: requirements relate to risks and incur costs; mitigations reduce risks and achieve benefits)
  • Iterative learning cycle

  9. The Pilot Study (continued)
  • Compared to simulated annealing
  • Cost-benefit distribution (30/99 mitigations)

  10. Problem of TAR2
  • Runtime vs. Rx size
  • To generate Rx of size r: (formula omitted)
  • To generate Rx of sizes 1..N: (formula omitted)
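The omitted formulas presumably count candidate conjunctions: with A attribute-value pairs, exhaustive search must consider C(A, r) treatments of size r, hence the sum of C(A, r) over r = 1..N across all sizes. A sketch of that blow-up, under that reconstruction (the function names are illustrative):

```python
# Why TAR2's exhaustive enumeration explodes: with num_pairs candidate
# attribute-value pairs there are C(num_pairs, r) treatments of size r,
# and sum_{r=1..N} C(num_pairs, r) across sizes 1..N.
from math import comb

def candidates_of_size(num_pairs, r):
    return comb(num_pairs, r)

def candidates_up_to(num_pairs, max_size):
    return sum(comb(num_pairs, r) for r in range(1, max_size + 1))
```

Even modest inputs grow quickly, which motivates the sampling approach of the next slide.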

  11. TAR3: The Improvement
  • Random sampling
  • Key idea: treat the confidence1 distribution as a probability distribution and sample Rx from it
  • Steps:
  • Place items (ai) in increasing order of confidence1 value
  • Compute the CDF of each ai
  • Sample a uniform value u in [0..1]
  • The sample is the least ai whose CDF > u
  • Repeat until we get an Rx of the given size
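The sampling steps above can be sketched like so; this is an illustrative reimplementation, not TAR3's actual code, and it assumes confidence1 scores are positive.

```python
# Sample attribute-value pairs in proportion to their confidence1 scores
# by inverting the cumulative distribution function (CDF).
import random
from bisect import bisect_right

def sample_rx(pairs, score, size, rng=random):
    order = sorted(pairs, key=lambda p: score[p])       # increasing confidence1
    total = sum(score[p] for p in order)
    cdf, acc = [], 0.0
    for p in order:                                     # running CDF over items
        acc += score[p] / total
        cdf.append(acc)
    rx = set()
    while len(rx) < size:                               # repeat until Rx is full
        u = rng.random()                                # uniform in [0, 1)
        i = min(bisect_right(cdf, u), len(order) - 1)   # least ai with CDF > u
        rx.add(order[i])
    return rx
```

High-confidence1 pairs occupy wider CDF intervals, so they are drawn more often, while low-scoring pairs still get occasional chances.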

  12. Comparison of Efficiency
  • Runtime vs. data size
  • Runtime vs. TAR2
  • Runtime vs. Rx size

  13. Comparison of Results
  • pilot2 dataset (58 * 30k)
  • Mean and STD in each round
  • 10 UCI domains, identical best Rx
  • Final Rx: TAR2=19, TAR3=20

  14. External Evaluation
  • FSS framework (10 UCI datasets): all attributes -> feature subset selector (TAR2less) -> some attributes -> learning (C4.5, Naive Bayes) -> compare accuracy

  15. The Results
  • Number of attributes
  • Accuracy using Naïve Bayes (avg increase = 0.8%)
  • Accuracy using C4.5 (avg decrease = 0.9%)

  16. Compared to Other FSS Methods
  • # of attributes selected (Naive Bayes)
  • # of attributes selected (C4.5)
  • 17/20, fewest attributes selected
  • More evidence for funnels

  17. Applications of Treatment Learning
  • Download site: http://www.ece.ubc.ca/~yingh/
  • Collaborators: JPL, WV, Portland, Miami
  • Application examples:
  • pair programming vs. conventional programming
  • identify software metrics that are superior error indicators
  • identify attributes that make FSMs easy to test
  • find the best software inspection policy for a particular software development organization
  • Other applications: 1 journal, 4 conference, 6 workshop papers

  18. Main Contributions
  • New learning approach
  • A novel mining algorithm
  • Algorithm optimization
  • Complete package and online distribution
  • Narrow funnel effect
  • Treatment learner as FSS
  • Application to various research domains

  19. ======================
  • Some notes follow

  20. Definition Example
  • Input example: classified dataset
  • Output example: Rx = conjunction of attribute-value pairs
  • confidence(Rx w.r.t C) = P(C|Rx)

  21. TAR2 in Practice
  • Domains containing narrow funnels
  • A tail in the confidence1 distribution
  • A small number of variables that have disproportionately large confidence1 values
  • Satisfactory Rx of small size (<6)

  22. Background: Classification
  • 2-step procedure: the learning phase and the testing phase
  • Strategies employed:
  • Eager learning: decision tree induction (e.g. C4.5), neural networks (e.g. backpropagation)
  • Lazy learning: nearest neighbor classifiers (e.g. k-nearest neighbor)

  23. Background: Association Rules
  • Possible rule: B => C,E [support = 2%, confidence = 80%]
  • support(X -> Y) = P(X ∪ Y); confidence(X -> Y) = P(Y|X)
  • Representative algorithms:
  • APRIORI: Apriori property of large itemsets
  • Max-Miner: more concise representation of the discovered rules; different pruning strategies

  24. Background: Extensions
  • CBA classifier (CBA = Classification Based on Association)
  • X => Y, where Y = class label
  • More accurate than C4.5 (16/26)
  • JEP classifier (JEP = Jumping Emerging Patterns)
  • support(X w.r.t D1) = 0, support(X w.r.t D2) > 0
  • Model: a collection of JEPs; classify by maximum collective impact
  • More accurate than both C4.5 & CBA (15/25)
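The JEP condition above (zero support in D1, nonzero support in D2) can be sketched as a simple membership test; the function name is illustrative.

```python
# Jumping Emerging Pattern test: itemset X "jumps" from dataset D1 to D2
# when it occurs in no transaction of D1 but in at least one of D2.

def is_jep(x, d1, d2):
    x = set(x)
    return not any(x <= t for t in d1) and any(x <= t for t in d2)
```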

  25. Background: Standard FSS Methods
  • Information gain attribute ranking
  • Relief
  • Principal Component Analysis (PCA)
  • Correlation-based feature selection
  • Consistency-based subset evaluation
  • Wrapper subset evaluation

  26. Comparison
  • Relation to classification: class boundary / class density; class weighting
  • Relation to association rule mining: multiple classes / no class; confidence-based pruning
  • Relation to change-detection algorithms:
  • support: |P(X|y=c1) - P(X|y=c2)|
  • confidence: |P(y=c1|X) - P(y=c2|X)|
  • related by Bayes' rule
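The two change-detection measures above, linked by Bayes' rule P(c|X) = P(X|c)P(c)/P(X), can be sketched as follows; the helper names and data representation are illustrative assumptions.

```python
# Change-detection view of treatment learning: for a pattern X (given as
# a predicate `holds` over examples), compare its support difference and
# its confidence difference between two classes.

def support_diff(data, holds, c1, c2):
    """|P(X | y=c1) - P(X | y=c2)|."""
    def p_x_given(c):
        sub = [ex for ex in data if ex["class"] == c]
        return sum(holds(ex) for ex in sub) / len(sub)
    return abs(p_x_given(c1) - p_x_given(c2))

def confidence_diff(data, holds, c1, c2):
    """|P(y=c1 | X) - P(y=c2 | X)|."""
    sub = [ex for ex in data if holds(ex)]
    def p_c_given_x(c):
        return sum(ex["class"] == c for ex in sub) / len(sub)
    return abs(p_c_given_x(c1) - p_c_given_x(c2))
```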

  27. Confidence Property
  • Universal-existential upward closure:
  R1: Age.young -> Salary.low
  R2: Age.young, Gender.m -> Salary.low
  R3: Age.young, Gender.f -> Salary.low
  • Long rules tend to have high confidence
  • Large Rx tend to have high lift values

  28. TAR3: Usability
  • More user-friendly: intuitive default settings
