Data Mining Using Genetic Programming: A PhD Talk

Jeroen Eggermont

Presentation Transcript


  1. Data Mining Using Genetic Programming: A PhD Talk Jeroen Eggermont

  2. "Knowledge is Power!" Sir Francis Bacon (1561-1626)

  3. Information Age? 2002:
     • 5 × 10^18 bytes produced
     • 92% on hard disk
     • More than twice the amount of 1999

  4. Information vs Knowledge
     "Information is not Knowledge" (Albert Einstein)
     "Where is the Knowledge we have lost in Information?" (T.S. Eliot)

  5. Knowledge Discovery
     Knowledge Discovery in Databases: "the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data"
     Data Mining phase: identifies, searches for, or constructs patterns

  6. Classification
     Construct or find a model in order to predict the category of some data value.
     Question: Do we want to have a BBQ?

  7. Decision Trees
     [Figure: a decision tree with internal nodes "Wind < 3" and "Rain = yes", branches labelled true/false, and leaves "BBQ:=yes", "BBQ:=yes" and "BBQ:=no".]
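A decision tree of this kind reads as nested tests. The sketch below is a minimal Python rendering; the transcript does not preserve which branch leads to which leaf, so the branch layout (low wind means BBQ, otherwise rain decides) and the function name `bbq_decision` are assumptions made for illustration.

```python
def bbq_decision(wind: float, rain: bool) -> str:
    """Classify a day as 'yes' or 'no' for a BBQ.

    ASSUMPTION: the branch layout is not recoverable from the slide;
    this version uses the same two tests and three leaves.
    """
    if wind < 3:        # internal node: Wind < 3
        return "yes"    # leaf: BBQ := yes
    if rain:            # internal node: Rain = yes
        return "no"     # leaf: BBQ := no
    return "yes"        # leaf: BBQ := yes
```

Each internal node tests one attribute and each leaf assigns the class, which is the structure the later slides evolve with genetic programming.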

  8. Evolutionary Computation
     • is based on biological metaphors
     • has great practical potential
     • is (getting) popular in many fields
     • yields powerful, diverse applications
     • gives high performance at low cost
     • AND IT'S FUN!

  9. The Metaphor
     EVOLUTION        PROBLEM SOLVING
     Individual   ↔   Candidate Solution
     Fitness      ↔   Quality
     Environment  ↔   Problem

  10. Classification using EC?
      EVOLUTION        CLASSIFICATION
      Individual   ↔   Decision Tree
      Fitness      ↔   Accuracy
      Environment  ↔   Data Set
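The mapping "individual = decision tree, fitness = accuracy, environment = data set" can be sketched in a few lines of Python. The tuple-based tree encoding, the function names and the four weather records are invented for illustration; they are not taken from the talk.

```python
# A tree is either a leaf (a class label string) or a tuple
# (test, subtree_if_true, subtree_if_false), where test is a function of a record.

def classify(tree, record):
    """Walk from the root to a leaf and return the predicted class."""
    while not isinstance(tree, str):
        test, if_true, if_false = tree
        tree = if_true if test(record) else if_false
    return tree

def accuracy(tree, data, target):
    """Fitness of an individual: fraction of records classified correctly."""
    correct = sum(classify(tree, row) == row[target] for row in data)
    return correct / len(data)

# Hypothetical data set (the "environment").
data = [
    {"wind": 1, "rain": False, "bbq": "yes"},
    {"wind": 2, "rain": True, "bbq": "yes"},
    {"wind": 5, "rain": False, "bbq": "yes"},
    {"wind": 6, "rain": True, "bbq": "no"},
]

# Two individuals: a two-test tree (assumed branch layout) and a single leaf.
tree = (lambda r: r["wind"] < 3, "yes", (lambda r: r["rain"], "no", "yes"))
always_yes = "yes"
```

On these records the two-test tree classifies everything correctly while the single-leaf tree gets three of four, so selection would favour the former.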

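  11. Genetic Programming: Evolutionary Computation using Trees
      [Figure: the decision tree from slide 7, with internal nodes "Wind < 3" and "Rain = yes", branches labelled true/false, and leaves "BBQ:=yes", "BBQ:=yes" and "BBQ:=no".]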

  12. Genetic Programming
      [Figure: the evolutionary cycle. Labels: Population, parent selection, Parents, crossover (X), mutation, Offspring, evaluation & selection.]
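The cycle on this slide (population, parent selection, crossover, mutation, offspring, evaluation and selection) can be sketched as a small Python loop. This is a toy sketch under stated assumptions, not the speaker's system: the tournament selection, the 0.2 mutation rate, the elitist survivor selection and all names are hypothetical choices, and the four (X, Z) records are the small example table shown later in the talk.

```python
import random

THRESHOLDS = [1, 2, 3, 4]   # internal atoms: X < threshold
CLASSES = ["yes", "no"]     # leaf atoms: Z := class
DATA = [(1, "yes"), (2, "yes"), (3, "no"), (4, "no")]  # (X, Z) records
MAX_NODES = 63              # size limit used in the talk

def random_tree(rng, depth):
    """A leaf is a class label; an internal node is (threshold, if_true, if_false)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(CLASSES)
    return (rng.choice(THRESHOLDS), random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def size(tree):
    return 1 if isinstance(tree, str) else 1 + size(tree[1]) + size(tree[2])

def classify(tree, x):
    while not isinstance(tree, str):
        threshold, if_true, if_false = tree
        tree = if_true if x < threshold else if_false
    return tree

def fitness(tree):
    """Accuracy on DATA."""
    return sum(classify(tree, x) == z for x, z in DATA) / len(DATA)

def paths(tree, path=()):
    """Yield the path (sequence of child indices) to every node."""
    yield path
    if not isinstance(tree, str):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(node)

def crossover(rng, a, b):
    """Subtree crossover: a random subtree of a is replaced by a random subtree of b."""
    return replace(a, rng.choice(list(paths(a))), get(b, rng.choice(list(paths(b)))))

def mutate(rng, tree):
    """Subtree mutation: a random subtree is replaced by a freshly grown one."""
    return replace(tree, rng.choice(list(paths(tree))), random_tree(rng, 2))

def tournament(rng, population, k=3):
    return max(rng.sample(population, k), key=fitness)

def evolve(seed=0, pop_size=20, generations=10):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            mother, father = tournament(rng, population), tournament(rng, population)
            child = crossover(rng, mother, father)
            if rng.random() < 0.2:
                child = mutate(rng, child)
            # Oversized children are discarded in favour of a parent.
            offspring.append(child if size(child) <= MAX_NODES else mother)
        # Evaluation & selection: keep the best of parents and offspring.
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
    return population
```

Because survivors are drawn from parents and offspring together, the best accuracy in the population never decreases from one generation to the next.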

  13. Classification using GP?
      WHY:
      • ML: Local Search
      • EC: Global Search
      • EC copes well with attribute interactions
      • Easy to adapt for different types of decision trees
      • FUN!!!

  14. Simple Representation
      • Binary Trees
      • Each node contains an atom:
        • Internal Nodes: < or =
        • Leaf Nodes: assignment
      • Atoms can occur more than once
      • Maximum of 63 nodes
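The representation on this slide can be sketched as a small Python data structure. The class name `Node`, the string encoding of atoms and the `is_valid` check are assumptions for illustration; only the binary shape, the atom kinds and the 63-node limit come from the slide.

```python
from dataclasses import dataclass
from typing import Optional

MAX_NODES = 63  # limit stated on the slide

@dataclass
class Node:
    atom: str                       # e.g. "X < 3", "Y = a", or "Z := yes"
    left: Optional["Node"] = None   # followed when the test is true (assumed convention)
    right: Optional["Node"] = None  # followed when the test is false

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

    def size(self) -> int:
        """Number of nodes in the subtree rooted here."""
        return 1 + sum(c.size() for c in (self.left, self.right) if c is not None)

def is_valid(tree: Node) -> bool:
    """Leaves hold assignments, internal nodes hold tests and have two children,
    and the whole tree has at most MAX_NODES nodes."""
    def well_formed(n: Node) -> bool:
        if n.is_leaf():
            return ":=" in n.atom
        return (":=" not in n.atom
                and n.left is not None and n.right is not None
                and well_formed(n.left) and well_formed(n.right))
    return well_formed(tree) and tree.size() <= MAX_NODES

# The same atom may occur more than once, here "Z := yes".
example = Node("X < 3", Node("Z := yes"),
               Node("Y = a", Node("Z := yes"), Node("Z := no")))
```

A complete binary tree with six levels has exactly 63 nodes, so the limit bounds the depth of a full tree at five tests along any path.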

  15. Simple Representation
      X  Y  Z
      1  a  yes
      2  b  yes
      3  a  no
      4  b  no
      Internal atoms: X < 1, X < 2, X < 3, X < 4, Y = a, Y = b
      Leaf atoms: Z := yes, Z := no
      • Attribute operator value combinations
      • Six internal nodes
      • Two leaf nodes
      • Maximum of 63 nodes
      • Yields 2 10103 trees
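The atoms on this slide follow mechanically from the data table: one "attribute < value" test per value of a numerical attribute, one "attribute = value" test per value of a non-numerical attribute, and one assignment per class. A minimal Python sketch, where the function name `build_atoms` and the string encoding are assumptions:

```python
def build_atoms(rows, numeric, nominal, target):
    """Enumerate internal and leaf atoms from the values present in the data."""
    internal = []
    for attr in numeric:
        for value in sorted({r[attr] for r in rows}):
            internal.append(f"{attr} < {value}")
    for attr in nominal:
        for value in sorted({r[attr] for r in rows}):
            internal.append(f"{attr} = {value}")
    classes = sorted({r[target] for r in rows}, reverse=True)
    leaves = [f"{target} := {c}" for c in classes]
    return internal, leaves

# The table from the slide.
rows = [
    {"X": 1, "Y": "a", "Z": "yes"},
    {"X": 2, "Y": "b", "Z": "yes"},
    {"X": 3, "Y": "a", "Z": "no"},
    {"X": 4, "Y": "b", "Z": "no"},
]
internal, leaves = build_atoms(rows, numeric=["X"], nominal=["Y"], target="Z")
```

For this table the sketch produces the six internal atoms and two leaf atoms listed on the slide; the number of atoms grows with the number of distinct values, which is what the following slides try to reduce.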

  16. Refining the Search Space
      How can we reduce the search space?
      • reduce number of classes
      • reduce atoms for non-numerical attributes
      • reduce atoms for numerical attributes

  17. Refining the Search Space
      Split the domain of a numerical attribute
      Heuristics:
      • gain
      • gain_ratio
      • K-means clustering
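As an illustration of the gain heuristic, the Python sketch below picks the single threshold with the highest information gain, which splits a numerical domain into two partitions. The slide does not say how the heuristics are applied in detail, so this is one plausible reading; the function names are assumptions, and the data are the X and Z columns of the small example table in the talk.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(values, labels, threshold):
    """Information gain of splitting on the test: value < threshold."""
    left = [l for v, l in zip(values, labels) if v < threshold]
    right = [l for v, l in zip(values, labels) if v >= threshold]
    if not left or not right:
        return 0.0  # the test separates nothing
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))

def best_threshold(values, labels):
    """Threshold among the observed values with the highest gain."""
    return max(sorted(set(values)), key=lambda t: gain(values, labels, t))

x = [1, 2, 3, 4]
z = ["yes", "yes", "no", "no"]
```

On this data the test X < 3 separates the classes perfectly, so a single X atom can stand in for the four original ones.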

  18. Refined Representation (k = 2)
      X  Y  Z
      1  a  yes
      2  b  yes
      3  a  no
      4  b  no
      Atoms on the slide: Y = a, Y = b; Z := yes, Z := no; X < 1, X < 2, X < 3, X < 4
      • Three internal atoms
      • Two leaf nodes
      • Maximum of 63 nodes
      • Smaller Search Space

  19. Fuzzy Decision Trees
      [Figure: a decision tree with internal nodes "Storm" and "Showers", branches labelled true/false, and leaves "BBQ:=no", "BBQ:=yes" and "BBQ:=no".]

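  20. Introns
      [Figure: a tree with internal nodes "X > 3" and "X < 5" and leaves B, B and A, annotated with the domain of X at each node: D(X) = [0,10], D(X) = (3,10], D(X) = [0,3], D(X) = (3,10], D(X) = Ø.]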
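The domain annotations suggest how such introns can be detected: propagate the possible values of X down the tree and drop any test with a branch whose domain is empty. The Python sketch below illustrates that idea for a single attribute; the `Interval` type, the tuple encoding of trees and the example tree are assumptions and do not reproduce the exact layout of the figure.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    lo: float
    hi: float
    lo_open: bool = False
    hi_open: bool = False

    def is_empty(self) -> bool:
        return self.lo > self.hi or (self.lo == self.hi and (self.lo_open or self.hi_open))

def clip_upper(d, value, open_):
    """Keep x < value (open_) or x <= value (not open_)."""
    if value < d.hi or (value == d.hi and open_):
        return d._replace(hi=value, hi_open=open_)
    return d

def clip_lower(d, value, open_):
    """Keep x > value (open_) or x >= value (not open_)."""
    if value > d.lo or (value == d.lo and open_):
        return d._replace(lo=value, lo_open=open_)
    return d

def split(domain, op, value):
    """Domains of the true and false branches of the test 'X op value'."""
    if op == ">":
        return clip_lower(domain, value, True), clip_upper(domain, value, False)
    if op == "<":
        return clip_upper(domain, value, True), clip_lower(domain, value, False)
    raise ValueError(f"unsupported operator: {op}")

def prune(tree, domain):
    """Remove tests that have an unreachable branch.

    A tree is a class label (str) or a tuple (op, value, if_true, if_false).
    """
    if isinstance(tree, str):
        return tree
    op, value, if_true, if_false = tree
    d_true, d_false = split(domain, op, value)
    if d_false.is_empty():
        return prune(if_true, d_true)
    if d_true.is_empty():
        return prune(if_false, d_false)
    return (op, value, prune(if_true, d_true), prune(if_false, d_false))

# Hypothetical example: below X > 3 (false branch) the test X < 5 is always true.
tree = (">", 3, "B", ("<", 5, "A", "B"))
pruned = prune(tree, Interval(0, 10))
```

Here the false branch of X > 3 has domain [0,3], so the false branch of X < 5 has an empty domain and the test collapses to its true branch.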

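  21. Introns
      [Figure: a tree with internal nodes "X > 3" and "Y < 2" and leaves B, A and A, annotated with the set of classes reachable at each node: {A, B}, {B}, {A}, {A}, {A}.]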
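The class-set annotations point to a second kind of intron: a subtree whose leaves all assign the same class can be replaced by a single leaf. A minimal Python sketch of that idea follows; the tuple encoding and the placement of the subtrees in the example are assumptions, since the figure layout is not preserved.

```python
def class_set(tree):
    """Set of classes that can be predicted by the subtree.

    A tree is a class label (str) or a tuple (test, if_true, if_false).
    """
    if isinstance(tree, str):
        return {tree}
    _test, if_true, if_false = tree
    return class_set(if_true) | class_set(if_false)

def collapse(tree):
    """Replace every subtree that predicts a single class by one leaf."""
    if isinstance(tree, str):
        return tree
    labels = class_set(tree)
    if len(labels) == 1:
        return next(iter(labels))
    test, if_true, if_false = tree
    return (test, collapse(if_true), collapse(if_false))

# Hypothetical arrangement of the nodes named on the slide.
tree = ("X > 3", "B", ("Y < 2", "A", "A"))
simplified = collapse(tree)
```

The test Y < 2 cannot change the outcome, so removing it leaves the classifier's predictions unchanged while making each evaluation cheaper.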

  22. Conclusions
      • Refining the search space can greatly improve performance
      • Fuzzy decision trees: more robust?
      • Removing introns increases speed
      • Nothing always works
      Free C++ Library for Evolutionary Computation: http://eodev.sourceforge.net
