Data Mining CSCI 307, Spring 2019 Lecture 16

Presentation Transcript


  1. Data Mining CSCI 307, Spring 2019, Lecture 16: Constructing Trees; Covering Algorithms

  2. Computing the Gain Ratio
The gain ratio corrects the information gain by taking the intrinsic information of a split into account.
Example: intrinsic information for the ID code attribute: info([1,1,...,1]) = 14 x (-1/14 x log(1/14)) = 3.807 bits
The value of an attribute decreases as its intrinsic information gets larger.
Definition of the gain ratio: gain_ratio(attribute) = gain(attribute) / intrinsic_info(attribute)
Example: gain_ratio(ID code) = 0.940 bits / 3.807 bits = 0.246
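A minimal Python sketch of the two quantities above; the function names intrinsic_info and gain_ratio are illustrative, not from the lecture.

```python
from math import log2

def intrinsic_info(branch_sizes):
    """info([n1, n2, ...]): entropy of the split itself, in bits."""
    total = sum(branch_sizes)
    return -sum((n / total) * log2(n / total) for n in branch_sizes if n > 0)

def gain_ratio(gain, branch_sizes):
    """Information gain divided by the intrinsic information of the split."""
    return gain / intrinsic_info(branch_sizes)

# ID code splits the 14 weather instances into 14 branches of one instance each.
print(intrinsic_info([1] * 14))      # 3.807 bits
print(gain_ratio(0.940, [1] * 14))   # ~0.247 with these rounded inputs (slide: 0.246)
```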

  3. Gain Ratios for Outlook
Outlook: Info: 0.693  Gain: 0.940 - 0.693 = 0.247
To calculate the gain ratio, we first need to calculate the split information, i.e. info([5,4,5]) = 1.577.
Now calculate the ratio, gain / info([5,4,5]): 0.247 / 1.577 = 0.157

  4. Gain Ratios for Weather Data
Outlook:     Info: 0.693  Gain: 0.940 - 0.693 = 0.247  Split info: info([5,4,5]) = 1.577  Gain ratio: 0.247/1.577 = 0.157
Temperature: Info: 0.911  Gain: 0.940 - 0.911 = 0.029  Split info: info([4,6,4]) = 1.557  Gain ratio: 0.029/1.557 = 0.019
Humidity:    Info: 0.788  Gain: 0.940 - 0.788 = 0.152  Split info: info([7,7]) = 1.000   Gain ratio: 0.152/1.000 = 0.152
Windy:       Info: 0.892  Gain: 0.940 - 0.892 = 0.048  Split info: info([8,6]) = 0.985   Gain ratio: 0.048/0.985 = 0.049
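A sketch that reproduces the table above from the slide's per-attribute info values and value counts; the baseline 0.940 bits is info([9,5]) for the full weather data. Variable names and layout are assumptions, not lecture code.

```python
from math import log2

def info(counts):
    """Entropy of a list of counts, in bits (the info() used on the slides)."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

BASELINE = 0.940  # info([9,5]) for the 14 weather instances

# attribute -> (info after splitting on it, number of instances per value)
attributes = {
    "outlook":     (0.693, [5, 4, 5]),
    "temperature": (0.911, [4, 6, 4]),
    "humidity":    (0.788, [7, 7]),
    "windy":       (0.892, [8, 6]),
}

for name, (info_after, counts) in attributes.items():
    gain = BASELINE - info_after
    split = info(counts)
    print(f"{name:12s} gain={gain:.3f} split_info={split:.3f} gain_ratio={gain / split:.3f}")
```

Running this reproduces the gain ratios in the table (0.157, 0.019, 0.152, 0.049).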

  5. More on the Gain Ratio
• "Outlook" still comes out on top among the real attributes
• However, "ID code" has a greater gain ratio
• Standard fix: an ad hoc test to prevent splitting on that type of attribute
• Problem with the gain ratio: it may overcompensate
• It may choose an attribute just because its intrinsic information is very low
• Standard fix: only consider attributes with greater than average information gain

  6. Discussion
• This top-down induction of decision trees is essentially the same as the ID3 algorithm developed by Ross Quinlan
• The gain ratio is just one modification of this basic algorithm
• It eventually evolved into C4.5, which deals with numeric attributes, missing values, and noisy data
• A similar approach: the CART (classification and regression tree) algorithm
• There are many other attribute selection criteria (but they make little difference in the accuracy of the result)

  7. Covering Algorithm
• One option is to convert a decision tree into a rule set: straightforward, but the resulting rule set is overly complex, and more effective conversions are not trivial
• Instead, we can generate a rule set directly: for each class in turn, find a rule set that covers all instances in it (excluding instances not in the class)
• This is called a covering approach: at each stage a rule is identified that "covers" some of the instances

  8. Example: Generating a Rule

  9. Rules versus Trees
• A corresponding decision tree can be constructed; it produces exactly the same predictions
• But: rule sets can be more perspicuous when decision trees suffer from replicated subtrees
• Also: in multiclass situations, a covering algorithm concentrates on one class at a time, whereas a decision tree learner takes all classes into account

  10. Simple Covering Algorithm
• Generates a rule by adding tests that maximize the rule's accuracy
• Similar to the situation in decision trees: the problem of selecting an attribute to split on
• But: a decision tree inducer maximizes overall purity
• Each new test reduces the rule's coverage

  11. Selecting a Test
• Goal: maximize accuracy
• t = total number of instances covered by the rule
• p = positive examples of the class covered by the rule
• t - p = number of errors made by the rule
• Select the test that maximizes the ratio p/t
• We are finished when p/t = 1 or the set of instances cannot be split any further
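A hedged sketch of the rule-growing loop these two slides describe, assuming each instance is a dict mapping attribute names to values, with the class label stored under class_attr; names such as grow_rule are illustrative, not from the lecture. Ties in p/t (which a full covering algorithm such as PRISM breaks by preferring the test with greater coverage) are ignored here.

```python
def grow_rule(instances, target_class, class_attr, attrs):
    """Greedily add attribute = value tests that maximize p/t for target_class."""
    rule, covered = [], list(instances)
    while covered:
        p = sum(1 for x in covered if x[class_attr] == target_class)
        if p == len(covered) or p == 0:   # p/t == 1 (done) or no positives left
            break
        used = {a for a, _ in rule}
        best, best_ratio = None, -1.0
        for attr in attrs:
            if attr in used or attr == class_attr:
                continue
            for val in {x[attr] for x in covered}:
                subset = [x for x in covered if x[attr] == val]
                ratio = sum(1 for x in subset if x[class_attr] == target_class) / len(subset)
                if ratio > best_ratio:
                    best, best_ratio = (attr, val), ratio
        if best is None:
            break
        rule.append(best)
        covered = [x for x in covered if x[best[0]] == best[1]]
    return rule   # list of (attribute, value) tests
```

A full covering algorithm would repeat this step: remove the instances the finished rule covers and grow further rules until every instance of the class is covered.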

  12. The Contact Lenses Data

  13. Example: Contact Lens Data
Rule we seek: If ? then recommendation = hard
Possible tests:
Age = Young
Age = Pre-presbyopic
Age = Presbyopic
Spectacle prescription = Myope
Spectacle prescription = Hypermetrope
Astigmatism = no
Astigmatism = yes
Tear production rate = Reduced
Tear production rate = Normal
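As a hypothetical usage sketch, assuming the contact lens table (shown on the earlier slide) has been loaded as a list of dicts keyed by the attribute names below (the keys and the class column name are assumptions, not from the slides), each candidate test can be scored by its p/t ratio for the "hard" class:

```python
# Assumed instance format, e.g.:
# {"age": "young", "spectacle prescription": "myope", "astigmatism": "yes",
#  "tear production rate": "normal", "recommendation": "hard"}
ATTRS = ["age", "spectacle prescription", "astigmatism", "tear production rate"]

def score_tests(instances, target="hard"):
    """Print p/t for every candidate test of the form attribute = value."""
    for attr in ATTRS:
        for val in sorted({x[attr] for x in instances}):
            covered = [x for x in instances if x[attr] == val]
            p = sum(1 for x in covered if x["recommendation"] == target)
            print(f"{attr} = {val}: p/t = {p}/{len(covered)}")
```

The test with the highest p/t would replace the "?" in the rule being sought.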

  14. Modified Rule and Resulting Data
The rule with the best test added:
Instances covered by the modified rule:

  15. Further Refinement
Current state of the rule:
Possible tests:
Age = Young
Age = Pre-presbyopic
Age = Presbyopic
Spectacle prescription = Myope
Spectacle prescription = Hypermetrope
Tear production rate = Reduced
Tear production rate = Normal

  16. Modified Rule and Resulting Data
The rule with the best test added:
Instances covered by the modified rule:

  17. Further Refinement
Current state of the rule:
Possible tests:
Age = Young
Age = Pre-presbyopic
Age = Presbyopic
Spectacle prescription = Myope
Spectacle prescription = Hypermetrope

  18. The Result
Final rule:
This rule does not cover all of the "hard lens" instances, so further rules would be needed for that class.
