Breeding Decision Trees Using Evolutionary Techniques Papagelis Athanasios - Kalles Dimitrios, Computer Technology Institute & AHEAD RM
Introduction • We use GAs to evolve simple and accurate binary decision trees • Simple genetic operators over tree structures • Experiments with UCI datasets • very good tree sizes • competitive accuracy results • Experiments with synthetic datasets • superior accuracy results
Current tree induction algorithms… • … use greedy heuristics • To guide the search during tree building • To prune the resulting trees • Fast implementations • Accurate results on widely used benchmark datasets (like the UCI datasets) • Optimal results? No • Good for real-world problems? • There are not many real-world datasets available for research.
More on greedy heuristics • They can quickly guide us to desired solutions • On the other hand, they can deviate substantially from the optimum • WHY? • They are very strict • Which means they are VERY GOOD for only a limited problem space
Why should GAs work? • GAs are not • Hill climbers • Blind on complex search spaces • Exhaustive searchers • Extremely expensive • They are… • Beam searchers • They balance the time needed against the space searched • Applicable to a bigger problem space • Good results for many more problems • No need to tune or derive new algorithms
Another way to see it… • Biases • Preference bias • Characteristics of the output • Something we should choose • e.g. small trees • Procedural bias • How will we search? • Something we should not have to choose • Unfortunately we have to: • Greedy heuristics make strong hypotheses about the search space • GAs make weak hypotheses about the search space
The real-world question… • Are there datasets where hill-climbing techniques are really inadequate? • e.g. unnecessarily big or misleading output • Yes, there are… • Conditionally dependent attributes • e.g. XOR • Irrelevant attributes • Many solutions use GAs as a preprocessor to select adequate attributes • Direct genetic search can prove more efficient for those datasets
The proposed solution • Select the desired decision tree characteristics (e.g. small size) • Adopt a decision tree representation with appropriate genetic operators • Create an appropriate fitness function • Produce a representative initial population • Evolve for as long as you wish!
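A minimal sketch of this recipe as a generic loop (not the authors' implementation): the tree-specific initializer, payoff, crossover and mutation are passed in as callables, the selection/elitism scheme is an assumption, and the 0.93 / 0.005 rates are the ones reported later for the second-layer GA.

```python
import random

def evolve(init_fn, fitness_fn, crossover_fn, mutate_fn,
           pop_size=200, generations=300, p_cross=0.93, p_mut=0.005):
    """Generic evolutionary loop; the tree-specific pieces are supplied
    by the caller as callables."""
    population = [init_fn() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness_fn, reverse=True)
        parents = ranked[:pop_size // 2]        # simple truncation selection (assumed)
        next_gen = parents[:]                   # keep the better half (elitism, assumed)
        while len(next_gen) < pop_size:
            a, b = random.sample(parents, 2)
            child = crossover_fn(a, b) if random.random() < p_cross else a
            if random.random() < p_mut:
                child = mutate_fn(child)
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness_fn)
```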
Initialization procedure • Population of minimal decision trees • Simple and fast • Choose a random value as the test value • Choose two random classes as leaves • [Figure: a minimal tree with root test A=2 and leaves Class=2, Class=1]
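A sketch of what such a minimal individual could look like; the `Node` class and the attribute/value bookkeeping are illustrative assumptions, not the paper's data structure.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    attribute: Optional[str] = None   # internal node: tests attribute == value
    value: object = None
    left: Optional["Node"] = None     # branch taken when the test succeeds
    right: Optional["Node"] = None    # branch taken when the test fails
    label: object = None              # leaf: predicted class

def random_minimal_tree(attributes, values, classes):
    """One random test node plus two random class leaves, as on the slide
    (e.g. root 'A = 2' with leaves Class=2 and Class=1)."""
    attr = random.choice(attributes)
    test = random.choice(values[attr])
    c1, c2 = random.sample(classes, 2)   # two distinct classes, as in the figure
    return Node(attribute=attr, value=test,
                left=Node(label=c1), right=Node(label=c2))

# Hypothetical dataset with attributes A and B over values 1-3 and classes 1-2
tree = random_minimal_tree(["A", "B"], {"A": [1, 2, 3], "B": [1, 2, 3]}, [1, 2])
```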
Payoff function • Balance between accuracy and size • Set x depending on the desired output characteristics • Small trees? x near 1 • Emphasis on accuracy? x grows big
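The slide's formula is not reproduced above, so the sketch below assumes one plausible payoff of the form correct² · x / (size² + x), chosen because it behaves exactly as described: x near 1 penalizes size heavily, while a large x lets accuracy dominate.

```python
def payoff(correct, size, x):
    """Assumed accuracy/size trade-off: squared number of correctly
    classified instances, scaled by a size factor x / (size**2 + x)."""
    return correct ** 2 * x / (size ** 2 + x)

# A tree with 5 nodes that classifies 90 of 100 training instances correctly
print(payoff(90, 5, x=1))        # x near 1 -> strong pressure towards small trees
print(payoff(90, 5, x=10_000))   # large x  -> essentially pure accuracy
```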
Advanced System Characteristics • Scaled payoff function (Goldberg, 1989) • Alternative crossovers • Evolution towards fit subtrees • Accurate subtrees had less chance of being used for crossover or mutation • Limited Error Fitness (LEF) (Gathercole & Ross, 1997) • Significant CPU time savings with insignificant accuracy losses
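A minimal sketch of the LEF idea only (the published scheme is more elaborate): evaluation of an individual stops as soon as it exceeds an error budget, which is where the CPU savings come from.

```python
def limited_error_score(predict, rows, labels, error_limit):
    """Stop scoring an individual once it has made more than `error_limit`
    mistakes; clearly bad trees never see the whole training set."""
    correct = errors = 0
    for row, label in zip(rows, labels):
        if predict(row) == label:
            correct += 1
        else:
            errors += 1
            if errors > error_limit:
                break               # abandon this individual early
    return correct
```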
Second Layer GA • Tests the effectiveness of all those components • Encodes the mutation/crossover rates, the different heuristics, and a number of other optimization parameters • Most recurring results: • mutation rate 0.005 • crossover rate 0.93 • use a crowding-avoidance technique • Alternative crossover/mutation techniques did not produce better results than the basic ones
Search space / Induction costs • 10 leaves, 6 values, 2 classes • Search space > 50,173,704,142,848 (HUGE!) • Greedy feature selection • O(ak), a = attributes, k = instances (Quinlan, 1986) • O(a²k²) for one level of lookahead (Murthy and Salzberg, 1995) • O(aᵈkᵈ) for d−1 levels of lookahead • Proposed heuristic • O(gen·k²·a) • Extended heuristic • O(gen·k·a)
How it works? An example (a) • An artificial dataset with eight rules (26 possible values, three classes) • The first two activation rules are shown below: • (15.0%) c1: A=(a or b or t) & B=(a or h or q or x) • (14.0%) c1: B=(f or l or s or w) & C=(c or e or f or k) • Huge search space!!!
Illustration of the greedy-heuristics problem • An example dataset (XOR over A1 & A2)
[Figure: the ideal XOR tree — root test A1=t with A2 tests below and leaves t, f, f, t — next to the C4.5 result tree, which adds a spurious root test on the irrelevant attribute A3 before the same A1/A2 structure. Totally unacceptable!!!]
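A small illustration (mine, not from the slides) of why a one-step greedy criterion goes wrong here: on XOR every single-attribute split, including the irrelevant A3, has zero information gain, so the first split is effectively arbitrary.

```python
from math import log2
from itertools import product

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on a single attribute."""
    split = 0.0
    for v in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == v]
        split += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - split

# XOR over A1 and A2, plus an irrelevant attribute A3
rows = [dict(A1=a, A2=b, A3=c) for a, b, c in product([0, 1], repeat=3)]
labels = [r["A1"] ^ r["A2"] for r in rows]

for attr in ("A1", "A2", "A3"):
    print(attr, info_gain(rows, labels, attr))   # prints 0.0 for all three
```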
C4.5 / OneR deficiencies • Similar preference biases • Accurate, small decision trees • This is acceptable • Non-optimized procedural biases • Emphasis on accuracy (C4.5) • Tree size is not optimized • Emphasis on size (OneR) • Trivial search policy • Pruning, as a greedy heuristic, has similar disadvantages
[Figure: average needed re-classification] Future work • Minimize evolution time • Crossover/mutation operators change the tree from a node downwards • We can re-classify only the instances that reach the changed node's subtree • But we need to maintain more node statistics
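A hedged sketch of that optimization (the cached statistics and helper below are hypothetical, not the system's actual bookkeeping): each node remembers which training rows reach it and how many of them it classifies correctly, so after a change only those rows are re-run.

```python
from dataclasses import dataclass, field

@dataclass
class CachedStats:
    reaching: list = field(default_factory=list)  # indices of rows routed to this subtree
    correct: int = 0                              # cached correct count among those rows

def refit_after_change(total_correct, node_stats, rows, labels, predict_subtree):
    """Patch the tree-level correct count after crossover/mutation replaced
    the subtree whose cached statistics are `node_stats`."""
    new_correct = sum(predict_subtree(rows[i]) == labels[i]
                      for i in node_stats.reaching)
    total_correct += new_correct - node_stats.correct
    node_stats.correct = new_correct
    return total_correct
```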
Future work (2) • Choose the output class by a majority vote over the produced tree forest (experts voting) • Pruning is a greedy heuristic • GA-based pruning?
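A minimal sketch of the experts-voting idea (names are illustrative): each evolved tree predicts and the majority class wins.

```python
from collections import Counter

def forest_vote(trees, predict, row):
    """Majority vote over the evolved tree forest for a single instance."""
    votes = Counter(predict(tree, row) for tree in trees)
    return votes.most_common(1)[0][0]
```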