
Induction of Decision Trees


Presentation Transcript


  1. Induction of Decision Trees

  2. An Example Data Set and Decision Tree [figure: example data table and a decision tree with root test "outlook" (sunny / rainy), internal tests "company" (no / med / big) and "sailboat" (small / big), and yes/no leaves]

  3. Classification [figure: the same tree used to classify a new example by following the outlook / company / sailboat tests down to a yes/no leaf]

  4. Breast Cancer Recurrence [figure: a tree induced by Assistant Professional, with tests on Degree of Malignancy (< 3 / >= 3), Tumor Size (< 15 / >= 15), Involved Nodes (< 3 / >= 3) and Age; each leaf gives counts of no-recurrence vs. recurrence cases, e.g. no rec 125 / recurr 39] Interesting: the accuracy of this tree compared to medical specialists

  5. Another Example

  6. Simple Tree [figure: a small tree with Outlook at the root (sunny / overcast / rainy), a Humidity test (high / normal), a Windy test (yes / no), and P / N class leaves]

  7. Complicated Tree [figure: a much larger tree for the same data, with Temperature at the root (hot / moderate / cold), repeated Outlook, Windy and Humidity tests, P / N leaves and a null leaf]

  8. Attribute Selection Criteria • Main principle: select the attribute which partitions the learning set into subsets that are as “pure” as possible • Various measures of purity • Information-theoretic • Gini index • χ2 • ReliefF • ...

  9. Information-Theoretic Approach • To classify an object, a certain amount of information is needed • I, information • After we have learned the value of attribute A, we only need some remaining amount of information to classify the object • Ires, residual information • Gain • Gain(A) = I – Ires(A) • The most ‘informative’ attribute is the one that minimizes Ires, i.e., maximizes Gain

  10. Entropy • The average amount of information I needed to classify an object is given by the entropy measure: I = –Σc p(c) log2 p(c) • For a two-class problem: I = –p(c1) log2 p(c1) – (1 – p(c1)) log2 (1 – p(c1)) [plot: entropy as a function of p(c1), zero at p(c1) = 0 or 1 and maximal at p(c1) = 0.5]
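
A minimal sketch of this measure in Python (the function name entropy, the use of log base 2, and the sample probabilities are mine, not from the slide):

```python
import math

def entropy(class_probs):
    # I = -sum over classes c of p(c) * log2 p(c); terms with p(c) = 0 contribute 0
    return -sum(p * math.log2(p) for p in class_probs if p > 0)

# Two-class problem: entropy as a function of p(c1)
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(p, round(entropy([p, 1 - p]), 3))
# The curve is 0 at p(c1) = 0 or 1 and maximal (1 bit) at p(c1) = 0.5
```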

  11. Triangles and Squares

  12. Triangles and Squares Data Set: a set of classified objects [figure: 14 objects, triangles and squares, differing in color, outline and dot]

  13. Entropy • 5 triangles • 9 squares • class probabilities: p(triangle) = 5/14, p(square) = 9/14 • entropy: I = –(5/14) log2(5/14) – (9/14) log2(9/14) ≈ 0.940 bits
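
The same arithmetic as a quick check (the ≈0.940 figure is computed here from the counts on the slide, not quoted from it):

```python
import math

n_triangle, n_square = 5, 9
n = n_triangle + n_square
probs = [n_triangle / n, n_square / n]   # p(triangle) = 5/14, p(square) = 9/14
I = -sum(p * math.log2(p) for p in probs)
print(round(I, 3))                       # about 0.940 bits for the whole data set
```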

  14. Entropy reduction by data set partitioning [figure: the data set split by Color? into red, green and yellow subsets]
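
A sketch of the residual information after such a partition, i.e. the entropy of each subset weighted by its relative size (the helper names entropy and residual_information are mine; the slide shows the idea pictorially):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def residual_information(labels, attribute_values):
    # I_res(A) = sum over values v of A of p(v) * entropy(subset with A = v)
    n = len(labels)
    total = 0.0
    for v in set(attribute_values):
        subset = [lab for lab, av in zip(labels, attribute_values) if av == v]
        total += (len(subset) / n) * entropy(subset)
    return total
```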

  15. Entropy of the attribute's values [figure: the entropy of each of the red, green and yellow subsets produced by the Color? split]

  16. Information Gain [figure: the information gain of the Color? split, computed from the red, green and yellow subsets]

  17. Information Gain of the Attribute • Attributes • Gain(Color) = 0.246 • Gain(Outline) = 0.151 • Gain(Dot) = 0.048 • Heuristic: the attribute with the highest gain is chosen • This heuristic is local (local minimization of impurity)
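
Putting the pieces together, Gain(A) = I – Ires(A) is computed for every attribute and the attribute with the highest gain is chosen. A minimal sketch, assuming objects are stored as dictionaries of attribute values (the toy records below are placeholders, so they will not reproduce the 0.246 / 0.151 / 0.048 figures):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(records, labels, attribute):
    # Gain(A) = I - I_res(A)
    n = len(labels)
    i_res = 0.0
    for v in {r[attribute] for r in records}:
        subset = [lab for r, lab in zip(records, labels) if r[attribute] == v]
        i_res += (len(subset) / n) * entropy(subset)
    return entropy(labels) - i_res

# Hypothetical objects described by Color, Outline and Dot
records = [
    {"Color": "red",    "Outline": "solid",  "Dot": "yes"},
    {"Color": "red",    "Outline": "dashed", "Dot": "no"},
    {"Color": "green",  "Outline": "solid",  "Dot": "no"},
    {"Color": "yellow", "Outline": "dashed", "Dot": "yes"},
]
labels = ["triangle", "square", "square", "triangle"]

best = max(("Color", "Outline", "Dot"), key=lambda a: gain(records, labels, a))
print(best)   # the locally most informative attribute becomes the next test
```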

  18. [figure: the data set partitioned by Color? into red, green and yellow subsets] Gain(Outline) = 0.971 – 0 = 0.971 bits, Gain(Dot) = 0.971 – 0.951 = 0.020 bits

  19. [figure: the partition by Color?, with an Outline? test (solid / dashed) added to the tree] Gain(Outline) = 0.971 – 0.951 = 0.020 bits, Gain(Dot) = 0.971 – 0 = 0.971 bits

  20. [figure: the growing tree, with Color? at the root, a Dot? test (yes / no) and an Outline? test (solid / dashed) in the subtrees]

  21. Decision Tree [figure: the final tree, with Color at the root; the red branch tests Dot (yes / no), the green branch is a square leaf, the yellow branch tests Outline (dashed / solid); the remaining leaves are triangle or square]

  22. Gini Index • Another sensible measure of impurity: Gini = Σ(i≠j) p(i) p(j) = 1 – Σi p(i)^2 (i and j are classes) • After applying attribute A, the resulting Gini index is the Gini of each subset with A = v, weighted by p(v) • Gini can be interpreted as expected error rate
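
A sketch of both quantities (the function names are mine; the 0.459 value is computed from the 5-triangle / 9-square counts, not quoted from the slides):

```python
from collections import Counter

def gini(labels):
    # Gini = sum over i != j of p(i) p(j) = 1 - sum over i of p(i)^2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_after_split(labels, attribute_values):
    # Gini(A): Gini of each subset, weighted by the subset's relative size
    n = len(labels)
    total = 0.0
    for v in set(attribute_values):
        subset = [lab for lab, av in zip(labels, attribute_values) if av == v]
        total += (len(subset) / n) * gini(subset)
    return total

labels = ["triangle"] * 5 + ["square"] * 9
print(round(gini(labels), 3))   # about 0.459 for the triangles-and-squares set
```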

  23. Gini Index [figure: the Gini index computed for the triangles-and-squares data set]

  24. Gini Index for Color [figure: the Gini index of each of the red, green and yellow subsets of the Color? split]

  25. Gain of Gini Index

  26. Three Impurity Measures • These impurity measures assess the effect of a single attribute • The “most informative” criterion that they define is local (and “myopic”) • It does not reliably predict the effect of several attributes applied jointly
