Decision Tree Developed by: Dr Eddie Ip Modified by: Dr Arif Ansari
Outline • Example • Concept
Decision Tree: Example 1 • Life insurance: whether preferred rate should be given? • ID low risk group & give preferred rate • Criteria: smoking? overweight?
Decision Tree: Example 2 • Database of loan applications • Variable of interest = Loan approved / not approved (binary) • Predictors = age, gender, income group, own a house, …. • Similar application in direct mail: to whom should I send mail?
Decision Tree: Concept • Classification of customers in DB known • Use historical data (“learn”) to guide your future decisions (classify)
Steps • Build a model by “learning” from past data (learning/training set) • Tune model by using data not seen by model (testing set) • Evaluate accuracy of decision tree model by yet another new data set (evaluation set) • Use tree to classify new customers
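The three data sets in the steps above can be sketched as a simple random partition of the historical records. This is a minimal illustration only: the function name `three_way_split`, the 60/20/20 proportions, and the fixed seed are assumptions, not part of the original slides.

```python
import random

def three_way_split(records, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle records and cut them into training, testing and evaluation sets.

    The fractions are illustrative assumptions; whatever is left after the
    training and testing sets becomes the evaluation set.
    """
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(records)       # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    evaluation = shuffled[n_train + n_test:]
    return train, test, evaluation

records = list(range(100))         # stand-in for 100 customer records
train, test, evaluation = three_way_split(records)
print(len(train), len(test), len(evaluation))
```

Each record lands in exactly one set, so the evaluation set is never seen while the tree is built or tuned.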
Decision Tree: Concept • Pattern recognition tool • Used in • recognizing handwriting • recognizing chemicals • recognizing ships at sea
Decision Tree: Example • Vermont Country Store – student presentation
Decision Tree: terminology • Variable of interest: response/ target (Y) • other variables : predictors (X) • loan example: X & Y • training set = records from DB
Decision Tree: terminology • node (root & leaf) • child (left & right)
Decision Tree: Concept • tree creates a set of bins into which records are tossed • start with root node (all records) • get best split so as to produce 2 homogeneous groups
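The search for a best split at one node can be sketched as follows. This is an illustration under assumptions: it uses a single numeric predictor, a binary 0/1 target, and the Gini index as the homogeneity measure (one common choice); the names `best_split`, `income` and `approved` and the data are made up for the example.

```python
def gini(labels):
    """Gini diversity of a group of 0/1 labels: 2 * p * (1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n            # proportion of class 1 in the group
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Try every threshold t and keep the split (x <= t vs. x > t)
    whose two child groups have the lowest weighted Gini diversity."""
    n = len(values)
    best = (None, gini(labels))    # no split: diversity of the parent node
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical loan data: income (in thousands) and approval (1 = approved)
income = [20, 25, 30, 60, 70, 80]
approved = [0, 0, 0, 1, 1, 1]
threshold, score = best_split(income, approved)
print(threshold, score)            # 30 0.0
```

Here the split "income <= 30" produces two perfectly homogeneous groups, so the weighted diversity drops to zero. Real data rarely splits this cleanly.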
Decision Tree: Concept • Go down till a tree is formed • Stopping criteria: statistical test or grow-full-tree & prune
Decision Tree: Technical issues • Example: loan application • measure of homogeneity/diversity • e.g. Gini index, proportional to p(1-p) for a binary target, where p = proportion of one class in the node
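As a small worked example, the Gini index can be computed in its usual general form, 1 minus the sum of squared class proportions. For a binary target this equals 2p(1-p), which is the p(1-p) measure up to a constant factor. The function name `gini` and the label values are illustrative assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini index of a node: 1 - sum of squared class proportions.

    0.0 means the node is pure (one class only); larger values mean
    a more diverse (less homogeneous) node.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

pure = ["approved"] * 4
mixed = ["approved", "approved", "rejected", "rejected"]
print(gini(pure))    # 0.0
print(gini(mixed))   # 0.5
```

A 50/50 node is the most diverse a binary node can be, which is why it scores the maximum of 0.5.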
Decision Tree: concept • Grow tree = continue splitting • till no further split reduces diversity • Two philosophies: stopping rule or grow full tree & prune • Testing set may be required to stop growing or prune tree • In final decision tree, each terminal node is given a class
Decision Tree: concept • NOTE: terminal (leaf) nodes are usually not pure • Misclassification (error) rate = % of records incorrectly classified by tree
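The misclassification rate can be illustrated with a tree that has two impure leaves. The sketch assumes each leaf is given its majority class, which is a common convention rather than something stated in the slides; the function names and leaf contents are made up.

```python
from collections import Counter

def leaf_class(labels):
    """Class given to a leaf: assumed here to be the majority class."""
    return Counter(labels).most_common(1)[0][0]

def misclassification_rate(leaves):
    """Share of all records whose true label differs from their leaf's class."""
    total = sum(len(leaf) for leaf in leaves)
    wrong = 0
    for leaf in leaves:
        predicted = leaf_class(leaf)
        wrong += sum(1 for y in leaf if y != predicted)
    return wrong / total

# Two hypothetical leaves, each listing the true labels of its records
leaves = [
    ["approved"] * 8 + ["rejected"] * 2,   # leaf classified "approved"
    ["rejected"] * 9 + ["approved"] * 1,   # leaf classified "rejected"
]
print(misclassification_rate(leaves))      # 0.15
```

Three of the twenty records sit in a leaf whose class differs from their own label, giving an error rate of 15%.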
Decision Tree: concept • Misclassification rate (sample) • Bigger tree ==> lower misclassification rate on training set • Big tree ==> overfit = “getting too close to data” • Misclassification rate on training set is overoptimistic • More objective: misclassification rate on evaluation set
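A toy comparison makes the overfitting point concrete. The data below are invented: the underlying rule is "class 1 if x > 50", with two noisy training labels. The "big tree" is a stand-in, not a real tree-growing algorithm: it mimics a fully grown tree on one predictor by returning the label of the nearest training record.

```python
# Hypothetical (x, label) records; labels at x=30 and x=70 are noise
train = [(10, 0), (20, 0), (30, 1), (40, 0), (60, 1), (70, 0), (80, 1), (90, 1)]
evaluation = [(12, 0), (28, 0), (33, 0), (45, 0), (55, 1), (68, 1), (72, 1), (88, 1)]

def small_tree(x):
    """One split only: a tree with two leaves."""
    return 1 if x > 50 else 0

def big_tree(x):
    """Stand-in for a fully grown tree: one leaf per training record,
    so every training record (noise included) is reproduced exactly."""
    nearest = min(train, key=lambda rec: abs(rec[0] - x))
    return nearest[1]

def error_rate(model, data):
    """Misclassification rate of a model on a list of (x, label) records."""
    return sum(1 for x, y in data if model(x) != y) / len(data)

print(error_rate(small_tree, train), error_rate(small_tree, evaluation))  # 0.25 0.0
print(error_rate(big_tree, train), error_rate(big_tree, evaluation))      # 0.0 0.5
```

The big tree looks perfect on the training set but does worse than the small tree on the evaluation set, because it has fitted the noise.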
Decision Tree: products • CART = classification & regression tree (statistics) • C4.5/C5.0 (machine learning) • CHAID (statistics)
Decision Tree : summary • method for classification & prediction • iterative splitting • training/testing to obtain an optimal tree • concept of overfit