
Genetic-Algorithm-Based Instance and Feature Selection

Instance Selection and Construction for Data Mining, Ch. 6. H. Ishibuchi, T. Nakashima, and M. Nii. Abstract: a GA-based approach for selecting a small number of instances from a given data set in a pattern classification problem.





Presentation Transcript


  1. Genetic-Algorithm-Based Instance and Feature Selection • Instance Selection and Construction for Data Mining, Ch. 6 • H. Ishibuchi, T. Nakashima, and M. Nii

  2. Abstract • A GA-based approach for selecting a small number of instances from a given data set in a pattern classification problem. • The goal is to improve the classification ability of our nearest neighbor classifier by searching for an appropriate reference set.

  3. Genetic Algorithm • Coding • Binary string of length n + m (n features, m instances) • ai : inclusion or exclusion of the i-th feature • sp : inclusion or exclusion of the p-th instance • Fitness function • Minimize |F|, minimize |P|, and maximize g(S) • |F| : number of selected features • |P| : number of selected instances • g(S) : classification performance of the reference set S
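
The coding on this slide can be illustrated with a small decoding routine. This is only a sketch: the function name `decode` and the choice to put the n feature bits before the m instance bits are assumptions for illustration, since the transcript gives the string length but not the bit order.

```python
from typing import List, Tuple


def decode(bits: List[int], n_features: int, n_instances: int) -> Tuple[List[int], List[int]]:
    """Split a chromosome of length n + m into selected feature and instance indices.

    ASSUMPTION: feature bits a_1..a_n come first, instance bits s_1..s_m second.
    """
    if len(bits) != n_features + n_instances:
        raise ValueError("chromosome length must equal n + m")
    feature_bits = bits[:n_features]    # a_i = 1 means feature i is included
    instance_bits = bits[n_features:]   # s_p = 1 means instance p is included
    selected_features = [i for i, a in enumerate(feature_bits) if a == 1]
    selected_instances = [p for p, s in enumerate(instance_bits) if s == 1]
    return selected_features, selected_instances


# Example with n = 3 features and m = 4 instances (0-based indices).
F, P = decode([1, 0, 1, 0, 1, 1, 0], n_features=3, n_instances=4)
print(F, P)  # [0, 2] [1, 2]
```

Here |F| is `len(F)` and |P| is `len(P)`, the two quantities the fitness function tries to minimize.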

  4. Genetic Algorithm • Performance measure (first one) : gA(S) • The number of correctly classified instances • Minimize |P| subject to gA(S) = m • Performance measure (second one) : gB(S) • When an instance xq is included in the reference set, xq is not selected as its own nearest neighbor. • Fitness
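
The two performance measures can be sketched with a plain 1-nearest-neighbor classifier. The helper names below are hypothetical, and squared Euclidean distance is an assumption. The fitness formula itself did not survive in the transcript, so the weighted-sum form in `fitness` is also an assumption, chosen to be consistent with the three objectives and with the weights Wg, WF, and WP listed under Parameter Specifications.

```python
def sq_dist(x, y, features):
    """Squared Euclidean distance restricted to the selected features."""
    return sum((x[i] - y[i]) ** 2 for i in features)


def count_correct(X, y, features, instances, exclude_self):
    """Count instances of X correctly classified by 1-NN on the reference set."""
    correct = 0
    for q in range(len(X)):
        # For g_B, an instance may not act as its own nearest neighbor.
        candidates = [p for p in instances if not (exclude_self and p == q)]
        if not candidates:
            continue  # no reference instance available: counted as an error
        nearest = min(candidates, key=lambda p: sq_dist(X[q], X[p], features))
        if y[nearest] == y[q]:
            correct += 1
    return correct


def g_A(X, y, F, P):
    return count_correct(X, y, F, P, exclude_self=False)


def g_B(X, y, F, P):
    return count_correct(X, y, F, P, exclude_self=True)


def fitness(g, n_selected_features, n_selected_instances, w_g=5.0, w_f=1.0, w_p=1.0):
    # ASSUMPTION: weighted sum that rewards g(S) and penalizes |F| and |P|.
    return w_g * g - w_f * n_selected_features - w_p * n_selected_instances


X = [[0.0], [0.1], [1.0], [1.1]]
y = [0, 0, 1, 1]
F, P = [0], [0, 2]
print(g_A(X, y, F, P))  # 4
print(g_B(X, y, F, P))  # 2
print(fitness(g_A(X, y, F, P), len(F), len(P)))  # 17.0
```

In this toy example gA counts the two reference instances as correct because each is its own nearest neighbor, while gB does not, which is why gB is the stricter measure.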

  5. Genetic Algorithm • Initialization • Genetic Operation: Iterate the following procedure Npop/2 times to generate Npop strings • Randomly select a pair of strings • Apply a uniform crossover • Apply a mutation operator • Generation Update: Select the Npop best strings from the 2Npop strings (current population plus offspring) • Termination test
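
The loop on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_ga` and its arguments are hypothetical names, the termination test is a fixed generation count, a single unbiased mutation rate is used for every bit, and crossover is always applied.

```python
import random


def uniform_crossover(a, b, rng):
    """Swap each position between the two parents with probability 0.5."""
    child1, child2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2


def mutate(bits, pm, rng):
    """Flip each bit independently with probability pm."""
    return [1 - b if rng.random() < pm else b for b in bits]


def run_ga(fitness, length, n_pop=50, generations=500, pm=0.01, seed=0):
    rng = random.Random(seed)
    # Initialization: n_pop random binary strings.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n_pop)]
    for _ in range(generations):  # termination test: fixed generation count
        offspring = []
        for _ in range(n_pop // 2):
            p1, p2 = rng.sample(pop, 2)  # randomly select a pair of strings
            c1, c2 = uniform_crossover(p1, p2, rng)
            offspring.append(mutate(c1, pm, rng))
            offspring.append(mutate(c2, pm, rng))
        # Generation update: keep the n_pop best of parents plus offspring.
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:n_pop]
    return max(pop, key=fitness)


# Toy objective (not from the slides): prefer strings with few ones.
best = run_ga(lambda bits: -sum(bits), length=10, n_pop=10, generations=50)
print(best)
```

Because the update step keeps the best strings from parents and offspring together, the best fitness in the population never decreases from one generation to the next.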

  6. Numerical Example

  7. Biased Mutation • An effective way to decrease the number of selected instances is to bias the mutation probability. • In the biased mutation, a much larger probability is assigned to the mutation from sp = 1 to sp = 0 than to the mutation from sp = 0 to sp = 1.
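
A minimal sketch of biased mutation on the instance part of the string is shown below. The function name is hypothetical, and the default probabilities are taken from the Parameter Specifications slide (0.1 for 1 to 0, 0.01 for 0 to 1).

```python
import random


def biased_mutation(instance_bits, p_one_to_zero=0.1, p_zero_to_one=0.01, rng=None):
    """Flip instance bits with a direction-dependent probability."""
    if rng is None:
        rng = random.Random()
    mutated = []
    for s in instance_bits:
        # Dropping a selected instance (1 -> 0) is far more likely than adding one.
        p = p_one_to_zero if s == 1 else p_zero_to_one
        mutated.append(1 - s if rng.random() < p else s)
    return mutated


print(biased_mutation([1, 1, 0, 1, 0, 0, 1, 1], rng=random.Random(0)))
```

Over many generations this asymmetry pushes |P| downward, while selection on the fitness keeps the instances that classification performance depends on.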

  8. Data sets • 2 artificial + 4 real • Normal distribution with small overlap • Normal distribution with large overlap • Iris data • Appendicitis Data • Cancer Data • Wine Data

  9. Parameter Specifications • Population size : 50 • Crossover probability : 1.0 • Mutation probability • Pm = 0.01 for feature selection • Pm(1 → 0) = 0.1 for instance selection • Pm(0 → 1) = 0.01 for instance selection • Stopping condition : 500 generations • Weight values : Wg = 5; WF = 1; WP = 1 • Performance measure : gA(S) or gB(S) • 30 trials for each data set

  10. Performance on Training Data

  11. Performance on Test Data • Leave-one-out procedure (iris & appendicitis) • 10-fold cross-validation (cancer & wine)
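
The two evaluation procedures differ only in how the data are split. The sketch below generates the index splits; the function names are hypothetical, and the random, non-stratified fold assignment is an assumption, since the transcript does not say how the folds were formed. It is also assumed that instance and feature selection would be run on the training part of each split only.

```python
import random


def leave_one_out_splits(m):
    """Yield (train, test) index lists where each instance is held out once."""
    for q in range(m):
        yield [p for p in range(m) if p != q], [q]


def k_fold_splits(m, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(m))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for i in range(k):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, folds[i]


for train, test in k_fold_splits(20, k=10):
    print(len(train), len(test))  # 18 2 on every line
```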

  12. Effect of Feature Selection

  13. Effect on NN

  14. Some Variants
