
Decision Tree

A decision tree is a classification algorithm in ML (machine learning). It is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that contains only conditional control statements.

To understand the decision tree algorithm, we first need to understand classification.

https://www.learnbay.co/data-science-course/blog-post/decision-tree/


Presentation Transcript


  1. Decision Tree Data Science Program Bangalore

  2. Introduction 2 Given an event, predict its category. Examples: Who won a given ball game? How should we file a given email? What word sense was intended for a given occurrence of a word? Event = a list of features. Examples: Ball game: Which players were on offense? Email: Who sent the email? Disambiguation: What was the preceding word?

  3. Introduction 3 Use a decision tree to predict categories for new events. Use training data to build the decision tree. (Diagram: training events and categories build the decision tree; new events are passed through the decision tree to obtain a category.)
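A minimal sketch of this train-then-predict workflow, assuming scikit-learn (the slides name no library); the toy feature rows and labels below are hypothetical:

```python
# Build a decision tree from training events, then classify new events.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training events: each row is a feature list, each label a category.
X_train = [[1, 0], [1, 1], [0, 1], [0, 0]]
y_train = ["yes", "yes", "no", "no"]

tree = DecisionTreeClassifier(criterion="entropy")  # entropy-based splits, as in ID3
tree.fit(X_train, y_train)        # use training data to build the decision tree

X_new = [[1, 1], [0, 1]]          # new events
print(tree.predict(X_new))        # predicted categories, e.g. ['yes' 'no']
```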

  4. Decision Tree for PlayTennis 4 (Diagram: Outlook at the root with branches Sunny, Overcast, Rain; the Sunny branch leads to a Humidity node with High → No and Normal → Yes.) Each internal node tests an attribute. Each branch corresponds to an attribute value. Each leaf node assigns a classification.
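As an illustration, the full PlayTennis tree (as it reappears on slides 12, 25, and 30) can be written as plain nested conditionals; a decision tree is exactly this kind of conditional program:

```python
def play_tennis(outlook: str, humidity: str, wind: str) -> str:
    if outlook == "Sunny":            # internal node: tests the attribute Outlook
        if humidity == "High":        # internal node: tests Humidity
            return "No"               # leaf node: assigns a classification
        return "Yes"                  # Humidity == "Normal"
    if outlook == "Overcast":
        return "Yes"
    # Outlook == "Rain"
    if wind == "Strong":
        return "No"
    return "Yes"                      # Wind == "Weak"

print(play_tennis("Sunny", "Normal", "Weak"))   # -> Yes
```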

  5. Word Sense Disambiguation 5 Given an occurrence of a word, decide which sense, or meaning, was intended. Example: "run" • run1: move swiftly (I ran to the store.) • run2: operate (I run a store.) • run3: flow (Water runs from the spring.) • run4: length of torn stitches (Her stockings had a run.) • etc.

  6. Word Sense Disambiguation 6 Categories: use word sense labels (run1, run2, etc.) to name the possible categories. Features: features describe the context of the word we want to disambiguate. Possible features include: near(w): is the given word near an occurrence of word w? pos: the word's part of speech. left(w): is the word immediately preceded by the word w? etc.

  7. Word Sense Disambiguation 7 Example decision tree: (Diagram: the root tests pos (noun vs. verb); lower nodes test context features such as near(race), near(river), and near(stocking); leaves are labelled with senses such as run1, run3, run4.) (Note: decision trees for WSD tend to be quite large.)

  8. WSD: Sample Training Data 8

  9. Decision Tree for Conjunction 9 Target: Outlook=Sunny ∧ Wind=Weak. (Diagram: Outlook at the root; Sunny leads to a Wind node with Strong → No and Weak → Yes; Overcast → No; Rain → No.)

  10. Decision Tree for Disjunction 10 Target: Outlook=Sunny ∨ Wind=Weak. (Diagram: Outlook at the root; Sunny → Yes; Overcast leads to a Wind node with Strong → No and Weak → Yes; Rain leads to a Wind node with Strong → No and Weak → Yes.)

  11. Decision Tree for XOR 11 Target: Outlook=Sunny XOR Wind=Weak. (Diagram: Outlook at the root; every branch leads to a Wind node: Sunny → (Strong → Yes, Weak → No); Overcast → (Strong → No, Weak → Yes); Rain → (Strong → No, Weak → Yes).)

  12. Decision Tree 12 • Decision trees represent disjunctions of conjunctions. (Diagram: Outlook at the root; Sunny → Humidity (High → No, Normal → Yes); Overcast → Yes; Rain → Wind (Strong → No, Weak → Yes).) The tree corresponds to: (Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak)
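The same tree written as the boolean expression named on the slide; each root-to-Yes-leaf path contributes one conjunction (a sketch only):

```python
def play_tennis(outlook: str, humidity: str, wind: str) -> bool:
    # (Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak)
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))
```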

  13. When to Consider Decision Trees 13 • Instances describable by attribute-value pairs • Target function is discrete valued • Disjunctive hypothesis may be required • Possibly noisy training data • Missing attribute values. Examples: medical diagnosis, credit risk analysis, object classification for robot manipulator (Tan 1993).

  14. Top-Down Induction of Decision Trees: ID3 14 • A ← the "best" decision attribute for the next node. • Assign A as the decision attribute for the node. • For each value of A, create a new descendant. • Sort training examples to leaf nodes according to the attribute value of the branch. • If all training examples are perfectly classified (same value of the target attribute), stop; else iterate over the new leaf nodes.
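A compact, self-contained sketch of this top-down induction loop (my own sketch, not the authors' code); examples are assumed to be dicts mapping attribute names to values, with the class stored under the key "label":

```python
from collections import Counter
from math import log2

def entropy(labels):
    # impurity of a collection of class labels
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(examples, attribute):
    # expected reduction in entropy from sorting the examples on `attribute`
    labels = [e["label"] for e in examples]
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        branch = [e["label"] for e in examples if e[attribute] == value]
        remainder += len(branch) / len(examples) * entropy(branch)
    return entropy(labels) - remainder

def id3(examples, attributes):
    labels = [e["label"] for e in examples]
    if len(set(labels)) == 1:            # perfectly classified: stop
        return labels[0]
    if not attributes:                   # nothing left to test: majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, a))  # "best" attribute
    node = {"attribute": best, "branches": {}}
    for value in {e[best] for e in examples}:        # one descendant per value of A
        subset = [e for e in examples if e[best] == value]
        node["branches"][value] = id3(subset, [a for a in attributes if a != best])
    return node
```

Calling id3(training_examples, attribute_names) returns a nested dict representing the induced tree.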

  15. Which attribute is best? 15 (Diagram: the same sample [29+,35-] split two ways: A1=? yields branches with [21+,5-] and [8+,30-]; A2=? yields branches with [18+,33-] and [11+,2-].)

  16. Entropy 16 S is a sample of training examples. p+ is the proportion of positive examples, p- is the proportion of negative examples. Entropy measures the impurity of S: Entropy(S) = -p+ log2 p+ - p- log2 p-
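A quick numerical check of this formula on the [29+,35-] sample used on the next slides (my own sketch):

```python
from math import log2

def entropy(p_pos, p_neg):
    # Entropy(S) = -p+ log2 p+ - p- log2 p-   (a proportion of 0 contributes 0)
    return -sum(p * log2(p) for p in (p_pos, p_neg) if p > 0)

print(round(entropy(29/64, 35/64), 2))   # -> 0.99, matching slide 18
```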

  17. Entropy 17 Entropy(S) = expected number of bits needed to encode the class (+ or -) of a randomly drawn member of S (under the optimal, shortest-length code). Why? Information theory: the optimal-length code assigns -log2 p bits to a message having probability p. So the expected number of bits to encode (+ or -) of a random member of S is: -p+ log2 p+ - p- log2 p-

  18. Information Gain (S=E) 18 Gain(S,A): expected reduction in entropy due to sorting S on attribute A. Entropy([29+,35-]) = -29/64 log2 29/64 - 35/64 log2 35/64 = 0.99. (Diagram: [29+,35-] split by A1=? into [21+,5-] and [8+,30-], and by A2=? into [18+,33-] and [11+,2-].)

  19. Information Gain 19 Entropy([21+,5-]) = 0.71, Entropy([8+,30-]) = 0.74. Gain(S,A1) = Entropy(S) - 26/64*Entropy([21+,5-]) - 38/64*Entropy([8+,30-]) = 0.27. Entropy([18+,33-]) = 0.94, Entropy([11+,2-]) = 0.62. Gain(S,A2) = Entropy(S) - 51/64*Entropy([18+,33-]) - 13/64*Entropy([11+,2-]) = 0.12.
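These gains can be verified directly from the counts on the slide; a small sketch:

```python
from math import log2

def entropy(pos, neg):
    # binary entropy computed from example counts
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c > 0)

s = entropy(29, 35)                                             # Entropy([29+,35-]) ~ 0.99
gain_a1 = s - 26/64 * entropy(21, 5) - 38/64 * entropy(8, 30)   # split by A1
gain_a2 = s - 51/64 * entropy(18, 33) - 13/64 * entropy(11, 2)  # split by A2
print(round(gain_a1, 2), round(gain_a2, 2))                     # -> 0.27 0.12
```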

  20. Training Examples 20

  21. Selecting the Next Attribute 21 S = [9+,5-], E = 0.940. Humidity: High → [3+,4-] (E=0.985), Normal → [6+,1-] (E=0.592); Gain(S, Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151. Wind: Weak → [6+,2-] (E=0.811), Strong → [3+,3-] (E=1.0); Gain(S, Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048. Humidity provides greater information gain than Wind, w.r.t. the target classification.

  22. Selecting the Next Attribute 22 S = [9+,5-], E = 0.940. Outlook: Sunny → [2+,3-] (E=0.971), Overcast → [4+,0-] (E=0.0), Rain → [3+,2-] (E=0.971). Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247.

  23. Selecting the NextAttribute 23 • The information gain values for the 4 attributes are: • Gain(S,Outlook)=0.247 • Gain(S,Humidity)=0.151 • Gain(S,Wind)=0.048 • Gain(S,Temperature)=0.029 • where S denotes the collection of training examples

  24. ID3 Algorithm 24 (Diagram: [D1,D2,…,D14] with [9+,5-], split on Outlook: Sunny → Ssunny = [D1,D2,D8,D9,D11] with [2+,3-] (still to split); Overcast → [D3,D7,D12,D13] with [4+,0-] → Yes; Rain → [D4,D5,D6,D10,D14] with [3+,2-] (still to split).) Gain(Ssunny, Humidity) = 0.970 - (3/5)*0.0 - (2/5)*0.0 = 0.970. Gain(Ssunny, Temp.) = 0.970 - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0 = 0.570. Gain(Ssunny, Wind) = 0.970 - (2/5)*1.0 - (3/5)*0.918 = 0.019.

  25. ID3 Algorithm 25 (Final tree: Outlook at the root; Sunny → Humidity (High → No [D1,D2], Normal → Yes [D8,D9,D11] [mistake]); Overcast → Yes [D3,D7,D12,D13]; Rain → Wind (Strong → No [D6,D14], Weak → Yes [D4,D5,D10]).)

  26. Occam's Razor 26 "If two theories explain the facts equally well, then the simpler theory is to be preferred." Arguments in favor: there are fewer short hypotheses than long hypotheses; a short hypothesis that fits the data is unlikely to be a coincidence; a long hypothesis that fits the data might be a coincidence. Arguments opposed: there are many ways to define small sets of hypotheses.

  27. Overfitting 27 One of the biggest problems with decision trees is overfitting.

  28. Avoid Overfitting 28 • Stop growing when a split is not statistically significant. • Grow the full tree, then post-prune. Select the "best" tree: measure performance over training data; measure performance over a separate validation data set; min(|tree| + |misclassifications(tree)|).
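One way to realize the "grow full tree, then post-prune, select the best tree on a separate validation set" recipe is cost-complexity pruning; a sketch assuming scikit-learn and one of its bundled datasets (neither is prescribed by the slides):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Candidate pruning strengths, from the full tree down to a single node.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_tree, best_score = None, -1.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = pruned.score(X_val, y_val)   # measure performance on validation data
    if score > best_score:               # select the "best" tree
        best_tree, best_score = pruned, score
print(best_score)
```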

  29. Effect of Reduced-Error Pruning 29

  30. Converting a Tree to Rules 30 (Tree: Outlook at the root; Sunny → Humidity (High → No, Normal → Yes); Overcast → Yes; Rain → Wind (Strong → No, Weak → Yes).) R1: If (Outlook=Sunny) ∧ (Humidity=High) Then PlayTennis=No. R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then PlayTennis=Yes. R3: If (Outlook=Overcast) Then PlayTennis=Yes. R4: If (Outlook=Rain) ∧ (Wind=Strong) Then PlayTennis=No. R5: If (Outlook=Rain) ∧ (Wind=Weak) Then PlayTennis=Yes.
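scikit-learn's export_text prints the same kind of root-to-leaf rules for a fitted tree; a toy sketch (the binary feature encoding below is hypothetical):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: PlayTennis = Yes if Outlook=Sunny or Wind=Weak (the disjunction from slide 10).
X = [[1, 1], [1, 0], [0, 1], [0, 0]]      # columns: Outlook=Sunny?, Wind=Weak?
y = ["Yes", "Yes", "Yes", "No"]

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=["Outlook=Sunny", "Wind=Weak"]))
# Each printed path from root to leaf reads as one "If ... Then ..." rule.
```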

  31. Continuous-Valued Attributes 31 Create a discrete attribute to test the continuous one: Temperature = 24.5°C → (Temperature > 20.0°C) = {true, false}. Where to set the threshold?
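A common answer (my own sketch, not from the slides) is to sort the values and evaluate the midpoints between adjacent examples, keeping the threshold with the highest information gain:

```python
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(labels.count(c) / n * log2(labels.count(c) / n) for c in set(labels))

def best_threshold(values, labels):
    # Try the midpoint between each pair of adjacent sorted values.
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_t, best_gain = None, -1.0
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        t = (v1 + v2) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = base - len(left) / len(labels) * entropy(left) \
                    - len(right) / len(labels) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Hypothetical temperatures (°C) with PlayTennis labels:
print(best_threshold([18.0, 20.0, 22.5, 24.5, 27.0], ["No", "No", "Yes", "Yes", "No"]))
```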

  32. Random Forest Classifier • A random forest classifier is an extension to bagging which uses de-correlated trees. • To say it in simple words: a random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction.

  33. Random Forest Classifier (Diagram: training data with N examples and M features.)

  34. Random Forest Classifier Create bootstrap samples from the training data (each with N examples and M features).

  35. Random Forest Classifier Construct a decision tree.

  36. Random Forest Classifier At each node, when choosing the split feature, choose only among m < M features.

  37. Random Forest Classifier Create a decision tree from each bootstrap sample.

  38. Random Forest Classifier Take the majority vote of the trees.
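The procedure in slides 32-38 corresponds closely to scikit-learn's RandomForestClassifier; a minimal sketch on one of its bundled datasets (the dataset and parameters are illustrative, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_estimators trees, each grown on a bootstrap sample; at every split only
# max_features of the M features are considered (the m < M of slide 36).
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)
print(forest.predict(X[:3]))   # prediction aggregated (voted) across the trees
```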

  39. Unknown Attribute Values 39 What if some examples have missing values of A? Use the training example anyway, and sort it through the tree. If node n tests A: • assign the most common value of A among the other examples sorted to node n; • or assign the most common value of A among the other examples with the same target value; • or assign probability pi to each possible value vi of A and assign a fraction pi of the example to each descendant in the tree. Classify new examples in the same fashion.
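A small sketch of the first strategy (fill a missing value of A with the most common value among the other examples at that node); the dict-based representation is my own:

```python
from collections import Counter

def fill_missing(examples, attribute):
    # `examples` are the training examples sorted to node n; None marks a missing value.
    known = [e[attribute] for e in examples if e[attribute] is not None]
    most_common = Counter(known).most_common(1)[0][0]
    for e in examples:
        if e[attribute] is None:
            e[attribute] = most_common
    return examples

node_examples = [{"Wind": "Weak"}, {"Wind": "Weak"}, {"Wind": "Strong"}, {"Wind": None}]
print(fill_missing(node_examples, "Wind"))   # the missing Wind value becomes "Weak"
```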

  40. Cross-Validation 40 • Estimate the accuracy of a hypothesis induced by a supervised learning algorithm. • Predict the accuracy of a hypothesis over future unseen instances. • Select the optimal hypothesis from a given set of alternative hypotheses: pruning decision trees, model selection, feature selection. • Combining multiple classifiers (boosting).
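A minimal sketch of estimating a decision tree's accuracy with k-fold cross-validation, assuming scikit-learn and one of its bundled datasets:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores.mean())   # estimated accuracy over held-out folds
```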
