This tutorial introduces the Decision Tree Classification algorithm, a fundamental method for predictive modeling. Decision trees are valued for their simplicity and interpretability, since the resulting model can be expressed as a series of decision rules. The tutorial walks through model creation by calculating entropy and gain from the attributes of the training data, leading to predictions for the target variable. It highlights the role of the class label field in categorizing data. Finally, a practical example using hair length, weight, and age demonstrates the steps involved in building and interpreting a decision tree.
Classification • A Tree Classification algorithm is used to compute a decision tree. Decision trees are easy to understand and modify, and the model developed can be expressed as a set of decision rules.
Classification • Classifying larger data sets lets you improve the accuracy of the classification model. In classification, the input is a set of example records, called a training set, where each record consists of several fields or attributes. Attributes are either numerical (coming from an ordered domain) or categorical (coming from an unordered domain). One of the attributes, called the class label field (target field), indicates the class to which each example belongs.
Classification • A Decision Tree model contains rules to predict the target variable. • This tutorial uses the Tree Classification algorithm ID3.
ID3 Algorithm • First: Calculate Entropy(S) for the whole training set. • Second: Try each attribute and calculate the Gain of splitting on it. • Third: Build the tree, splitting first on the attribute with the maximum Gain (a short sketch of these calculations follows).
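The two quantities ID3 relies on can be written as a minimal Python sketch; the helper names entropy and information_gain and the list-of-class-counts representation are illustrative choices, not part of the original tutorial.

```python
import math

def entropy(counts):
    """Entropy of a class distribution, given as a list of class counts."""
    total = sum(counts)
    ent = 0.0
    for c in counts:
        if c > 0:                      # 0 * log2(0) is treated as 0
            p = c / total
            ent -= p * math.log2(p)
    return ent

def information_gain(parent_counts, child_counts_list):
    """Gain = parent entropy minus the size-weighted entropy of the child nodes."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child)
                   for child in child_counts_list)
    return entropy(parent_counts) - weighted
```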
Training data attributes: Hair Length, Weight, Age (class label: Male/Female).
Let us try splitting on Hair Length.
9 Persons: Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Hair Length < 4? yes: 3 Males; no: 4 Females, 2 Males
Entropy(0F,3M) = -(0/3)log2(0/3) - (3/3)log2(3/3) = 0
Entropy(4F,2M) = -(4/6)log2(4/6) - (2/6)log2(2/6) = 0.92
Gain(Hair Length < 4) = 0.9911 - (3/9 * 0 + 6/9 * 0.92) = 0.3789
Let us try splitting on Weight.
9 Persons: Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Weight < 170? yes: 4 Females, 1 Male; no: 4 Males
Entropy(4F,1M) = -(4/5)log2(4/5) - (1/5)log2(1/5) = 0.7219
Entropy(0F,4M) = -(0/4)log2(0/4) - (4/4)log2(4/4) = 0
Gain(Weight < 170) = 0.9911 - (5/9 * 0.7219 + 4/9 * 0) = 0.5900
Let us try splitting on Age.
9 Persons: Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Age <= 40? yes: 3 Females, 3 Males; no: 1 Female, 2 Males
Entropy(3F,3M) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1
Entropy(1F,2M) = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.9183
Gain(Age <= 40) = 0.9911 - (6/9 * 1 + 3/9 * 0.9183) = 0.0183
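As a sanity check, the three gains can be recomputed with the helpers sketched earlier; the [Females, Males] counts per branch are taken directly from the slides.

```python
# Reproducing the three candidate splits, using entropy() and information_gain() above.
parent = [4, 5]                                           # 4 Females, 5 Males

gain_hair   = information_gain(parent, [[0, 3], [4, 2]])  # Hair Length < 4: yes / no
gain_weight = information_gain(parent, [[4, 1], [0, 4]])  # Weight < 170:    yes / no
gain_age    = information_gain(parent, [[3, 3], [1, 2]])  # Age <= 40:       yes / no

print(round(gain_hair, 4))    # ~0.3789
print(round(gain_weight, 4))  # ~0.5900
print(round(gain_age, 4))     # ~0.0183
```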
Decision Tree: Weight < 170 gives the maximum Gain (0.5900), so it becomes the root split.
9 Persons → Weight < 170?
no: 4 Males (leaf)
yes: 4 Females, 1 Male → Hair Length < 4?
  yes: 1 Male (leaf)
  no: 4 Females (leaf)
Convert Decision Trees to rules…
Rules to classify Males/Females:
• If Weight is greater than or equal to 170, classify as Male
• Else if Hair Length is less than 4, classify as Male
• Else classify as Female
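For illustration, the same rule set can be written as a small Python function; the parameter names weight and hair_length are stand-ins for the slide's attributes.

```python
def classify(weight, hair_length):
    """Apply the three rules derived from the decision tree."""
    if weight >= 170:
        return "Male"
    elif hair_length < 4:
        return "Male"
    else:
        return "Female"

print(classify(weight=180, hair_length=6))   # Male   (first rule fires)
print(classify(weight=150, hair_length=2))   # Male   (second rule fires)
print(classify(weight=150, hair_length=10))  # Female
```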
Try Weka • Load the same data (file test.csv) into Weka and check that it produces the same tree.
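If you also want to cross-check the tree outside Weka, here is a hedged scikit-learn sketch. It assumes test.csv has columns named hair_length, weight, age, and gender (those names are guesses; adjust to your file), and note that scikit-learn builds CART-style binary trees with the entropy criterion rather than ID3, so the tree should agree in substance but may be printed differently.

```python
# Optional cross-check with scikit-learn (CART with entropy criterion, not ID3).
# Column names below are assumptions about test.csv; rename to match your file.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.read_csv("test.csv")
X = data[["hair_length", "weight", "age"]]
y = data["gender"]

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```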
References: • Quinlan, J. R. (1986). "Induction of Decision Trees". Machine Learning, 1, 81–106. • http://dms.irb.hr/tutorial/tut_dtrees.php • http://www.dcs.napier.ac.uk/~peter/vldb/dm/node11.html • http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/4_dtrees2.html • Professor Sin-Min Lee, SJSU. http://cs.sjsu.edu/~lee/cs157b/cs157b.html