Appendix D: Application of Genetic Algorithm in Classification

Duong Tuan Anh, 5/2014

Classification with decision trees
Training data: 14 examples. Class column: No, No, Yes, Yes, Yes, No, Yes, No, Yes, Yes, Yes, Yes, Yes, No

Decision tree
Algorithms such as ID3 and C4.5 build a decision tree from the training set.

Classification rules from decision tree
• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf
• The attribute-value pairs along a path are ANDed together to form the rule antecedent
• The leaf node holds the class prediction
• Rules are easier for humans to understand
Example:
IF age = “<=30” AND student = “no” THEN buys_computer = “no”
IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
IF age = “31…40” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no”
IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”

GA for classification rule discovery
• Individual representation
  • Each individual encodes a single classification rule
  • Each rule is represented as a bit string
• Example: Instances in the training set are described by two Boolean attributes, A1 and A2, and belong to one of two classes, C1 and C2. The two leftmost bits represent A1 and A2, and the rightmost bit represents the class (1 for C1, 0 for C2).
  • Rule: IF A1 AND NOT A2 THEN C2, bit string “100”
  • Rule: IF NOT A1 AND NOT A2 THEN C1, bit string “001”
• If an attribute has k values, k > 2, then k bits are used to encode the attribute's values. Classes can be encoded in a similar fashion.
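A minimal sketch of this encoding in Python follows. The function names are hypothetical, and the k-bit encoding shown (one bit per value, set to 1 when the rule allows that value) is one plausible reading of the slide rather than a confirmed detail of the method.

```python
def encode_rule(a1, a2, predicted_class):
    """Encode IF [NOT] A1 AND [NOT] A2 THEN Ck as a 3-bit string.

    The first two bits are the truth values required of A1 and A2;
    the last bit is 1 for class C1 and 0 for class C2.
    """
    class_bit = {"C1": "1", "C2": "0"}[predicted_class]
    return f"{int(a1)}{int(a2)}{class_bit}"


def decode_rule(bits):
    """Turn a 3-bit string back into a readable rule."""
    def literal(name, bit):
        return name if bit == "1" else f"NOT {name}"

    a1, a2, c = bits
    predicted = "C1" if c == "1" else "C2"
    return f"IF {literal('A1', a1)} AND {literal('A2', a2)} THEN {predicted}"


def encode_nominal(allowed_values, domain):
    """Assumed k-bit encoding: one bit per value in the attribute's domain."""
    return "".join("1" if v in allowed_values else "0" for v in domain)


print(encode_rule(True, False, "C2"))   # → 100
print(encode_rule(False, False, "C1"))  # → 001
print(decode_rule("100"))               # → IF A1 AND NOT A2 THEN C2
print(encode_nominal({">40"}, ["<=30", "31…40", ">40"]))  # → 001
```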

Genetic operators for rule discovery
• Generalizing/specializing crossover
• Overfitting: a rule covers only one training example → the rule needs generalization
• Underfitting: a rule covers too many training examples → the rule needs specialization
• The generalizing and specializing crossover operators can be implemented as the logical OR and AND, respectively, applied to the bits between the crossover points.
• Example (two crossover points, marked by |):

Parents        Children (generalizing, OR)   Children (specializing, AND)
0 | 1 0 | 1    0 | 1 1 | 1                   0 | 0 0 | 1
1 | 0 1 | 0    1 | 1 1 | 0                   1 | 0 0 | 0
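The two operators can be sketched as follows, assuming rules are stored as Python strings of "0" and "1" and that the crossover points are given as a half-open index range. All function names are hypothetical.

```python
def segment_crossover(parent1, parent2, start, end, combine):
    """Two-point crossover on bit strings.

    Bits in positions [start, end) of both children are replaced by
    `combine` applied bitwise to the parents' segments; bits outside
    the crossover points are copied from the respective parent.
    """
    merged = "".join(
        combine(a, b) for a, b in zip(parent1[start:end], parent2[start:end])
    )
    child1 = parent1[:start] + merged + parent1[end:]
    child2 = parent2[:start] + merged + parent2[end:]
    return child1, child2


def bit_or(a, b):
    return "1" if a == "1" or b == "1" else "0"


def bit_and(a, b):
    return "1" if a == "1" and b == "1" else "0"


def generalizing_crossover(parent1, parent2, start, end):
    """Logical OR between the crossover points."""
    return segment_crossover(parent1, parent2, start, end, bit_or)


def specializing_crossover(parent1, parent2, start, end):
    """Logical AND between the crossover points."""
    return segment_crossover(parent1, parent2, start, end, bit_and)


# Parents 0|10|1 and 1|01|0 with crossover points after bits 1 and 3.
print(generalizing_crossover("0101", "1010", 1, 3))  # → ('0111', '1110')
print(specializing_crossover("0101", "1010", 1, 3))  # → ('0001', '1000')
```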

Fitness function
• Let a rule be of the form IF A THEN C, where A is the antecedent and C is the predicted class. The predictive accuracy of a rule, called its confidence factor (CF), is defined as:
CF = |A & C| / |A|
where |A| is the number of examples satisfying all the conditions in the antecedent A, and |A & C| is the number of examples that both satisfy A and have the class predicted by the consequent C.
Example: A rule covers 10 examples (|A| = 10), of which 8 have the class predicted by the rule (|A & C| = 8), so CF = 8/10 = 80%.
• The performance of a rule can be summarized by a matrix called a confusion matrix.

Confusion matrix
TP = True positives = Number of examples satisfying A and C
FP = False positives = Number of examples satisfying A but not C
FN = False negatives = Number of examples not satisfying A but satisfying C
TN = True negatives = Number of examples satisfying neither A nor C
In this notation, the CF measure is: CF = TP/(TP + FP).
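As a rough illustration, the four counts can be tallied for a single rule as below. The six training examples are made up for this sketch, and representing the antecedent as a Python predicate is an assumption, not part of the original slides.

```python
def confusion_counts(examples, antecedent, predicted_class):
    """Count TP, FP, FN, TN for the rule IF antecedent THEN predicted_class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for attributes, actual_class in examples:
        covered = antecedent(attributes)               # example satisfies A
        has_class = actual_class == predicted_class    # example satisfies C
        if covered and has_class:
            counts["TP"] += 1
        elif covered:
            counts["FP"] += 1
        elif has_class:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def confidence_factor(counts):
    """CF = TP / (TP + FP); defined as 0 when the rule covers nothing."""
    covered = counts["TP"] + counts["FP"]
    return counts["TP"] / covered if covered else 0.0


# Made-up examples: (attribute values, actual class).
examples = [
    ({"A1": True, "A2": False}, "C2"),
    ({"A1": True, "A2": False}, "C2"),
    ({"A1": True, "A2": False}, "C1"),
    ({"A1": False, "A2": False}, "C2"),
    ({"A1": True, "A2": True}, "C1"),
    ({"A1": False, "A2": True}, "C1"),
]

# Rule: IF A1 AND NOT A2 THEN C2
counts = confusion_counts(examples, lambda x: x["A1"] and not x["A2"], "C2")
cf = confidence_factor(counts)
print(counts)  # → {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```

Returning 0 for a rule that covers no examples is a design choice of this sketch; the slides do not say how that case is handled.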

Fitness function (cont.)
• We can now measure the predictive quality of a rule by taking into account not only its CF but also how “complete” the rule is.
• Completeness of the rule: the proportion of examples having the predicted class C that are actually covered by the rule antecedent.
• The rule completeness measure: Comp = TP/(TP + FN)
• The fitness function combines the CF and Comp measures: Fitness = CF × Comp.
• An initial population is created consisting of randomly generated rules.
• New populations are generated from prior populations of rules until a population P evolves in which every rule satisfies a prespecified fitness threshold.
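The fitness computation and the stopping criterion can be sketched as below. The `evolve` skeleton, its callback parameters, and the `max_generations` safety cap are assumptions added for illustration; the slides specify only the threshold-based stopping rule.

```python
def fitness(tp, fp, fn):
    """Fitness = CF * Comp, with CF = TP/(TP+FP) and Comp = TP/(TP+FN)."""
    cf = tp / (tp + fp) if tp + fp else 0.0
    comp = tp / (tp + fn) if tp + fn else 0.0
    return cf * comp


def evolve(population, fitness_of, next_generation, threshold,
           max_generations=1000):
    """Generate new populations until every rule meets the threshold.

    `fitness_of` scores one rule and `next_generation` maps a population
    to its successor (selection, crossover, mutation); both are supplied
    by the caller. `max_generations` is a cap added in this sketch so the
    loop always terminates.
    """
    for _ in range(max_generations):
        if all(fitness_of(rule) >= threshold for rule in population):
            break
        population = next_generation(population)
    return population


# The CF example from the text (TP = 8, FP = 2) with an assumed FN = 8:
# CF = 0.8, Comp = 0.5.
print(fitness(8, 2, 8))  # → 0.4
```

Multiplying the two measures means a rule scores well only when it is both accurate and complete: a rule covering a single example correctly has CF = 1 but a very low Comp, and so a low fitness.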

Reference
• A. A. Freitas, A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery, in: Advances in Evolutionary Computing, Springer, 2003.
