
Analysis of Genetic Programming Runs

Vic Ciesielski and Xiang Li, {vc, xiali}@cs.rmit.edu.au, School of Computer Science and Information Technology, RMIT University, Australia.


Presentation Transcript


  1. Analysis of Genetic Programming Runs. Vic Ciesielski and Xiang Li, {vc, xiali}@cs.rmit.edu.au, School of Computer Science and Information Technology, RMIT University, Australia

  2. Overview
     • Introduction
       • Brief explanation of why we analyze runs
       • Our research questions
       • Problems and their backgrounds
     • Methodology
       • What information we looked into in detail
       • How we measured it
     • Results
       • Duplicate evaluations, tree shapes, and changes in program size and depth over the generations
     • Conclusion
       • What we have learned from the runs, what the implications are, and what we can do to improve the genetic programming process

  3. Introduction
     • Analyzing GP runs is important
       • It helps to understand the problem in depth, find important patterns, and improve the search techniques
       • E.g. MAX by Gathercole and Ross, 1996; Langdon and Poli, 1997
       • E.g. Fitness Landscape by Kinnear, Jr., 1994; Ant by Langdon and Poli, 1998
     • Previous work has limitations
       • Only one or two problems
       • Only limited aspects of the problems, e.g. convergence, bloat, fitness landscape, tree shapes (limited)
     • Research questions
       • How many duplicate individuals are evaluated in a run?
       • What percentage of the possible tree shapes is evaluated?
       • Are there any patterns in the size and depth of the trees examined?
       • Are there any differences between the toy and the real world problems?
       • Do the discovered patterns suggest any improvements to the GP process?

  4. Problems & Backgrounds & Genetic Variable Settings
     • ‘Toy’ problems (9)
       • Santa Fe Ant (Langdon and Poli, 1998), Pop. Size: 500, Max. Depth: 5
       • Modified Santa Fe Ant (Ciesielski and Li, 2004), Pop. Size: 100, Max. Depth: 5
       • Lawnmower (Koza, 1992), Pop. Size: 100, Max. Depth: 7
       • MAX (Gathercole and Ross, 1996; Langdon and Poli, 1997), Pop. Size: 100, Max. Depth: 5
       • Symbolic Regression (Koza, 1992), Pop. Size: 100, Max. Depth: 7
       • 5 Even Parity (Koza, 1992), Pop. Size: 500, Max. Depth: 8
       • Binary Image Classification 1 (Li and Ciesielski, 2004)
       • Binary Image Classification 2, Pop. Size: 100, Max. Depth: 7
       • Soccer Goal Scoring (Bajurnow and Ciesielski, 2004), Pop. Size: 100, Max. Depth: 5
     • Real world problems (3)
       • Cephalometric Landmarks (Ciesielski et al., 2003), Pop. Size: 100, Max. Depth: 9
       • Texture Classification (Song and Ciesielski, 2004), Pop. Size: 500, Max. Depth: 14, Elitism: 10%, Crossover: 85%, Mutation: 5%, Max. Gen.: 50
       • Evolution of Texture Features (Lam and Ciesielski, 2004), Pop. Size: 100, Max. Depth: 7
     • Common genetic variable settings, unless otherwise specified
       • Elitism Rate: 2%, Crossover Rate: 70%, Mutation Rate: 28%, Max. Gen.: 100

  5. Methodology
     • GP environment
       • RMIT-GP (http://www.cs.rmit.edu.au/~vc)
       • Strongly typed
       • Ramped half-and-half method for initialization
     • Duplicates are found, but they are hard to determine, because two programs can be
       • Exactly identical
       • Logically equivalent, e.g. (+ A 1) vs. (+ A (/ B B))
       • Commutatively equivalent, e.g. (+ A B) vs. (+ B A)
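
The commutative case above can be illustrated with a minimal sketch. It assumes programs are available as prefix strings and that only `+` and `*` commute; the helper names (`parse`, `canonical`, `canonical_string`) are hypothetical and are not taken from RMIT-GP. Logical equivalence such as (+ A 1) vs. (+ A (/ B B)) is not detected by this approach.

```python
# Sketch only: helper names are hypothetical, not part of RMIT-GP.
COMMUTATIVE = {"+", "*"}  # assumption: only these operators commute


def tokenize(text):
    """Split a prefix expression into parentheses and symbols."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one prefix S-expression into nested lists."""
    token = tokens.pop(0)
    if token != "(":
        return token  # terminal
    node = []
    while tokens[0] != ")":
        node.append(parse(tokens))
    tokens.pop(0)  # discard the closing ")"
    return node


def to_string(tree):
    """Turn a nested-list tree back into a prefix string."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(to_string(t) for t in tree) + ")"


def canonical(tree):
    """Sort the arguments of commutative operators into a fixed order."""
    if isinstance(tree, str):
        return tree
    op, args = tree[0], [canonical(a) for a in tree[1:]]
    if op in COMMUTATIVE:
        args.sort(key=to_string)
    return [op] + args


def canonical_string(program):
    return to_string(canonical(parse(tokenize(program))))


population = ["(+ A B)", "(+ B A)", "(+ A 1)", "(+ A (/ B B))"]
string_distinct = len(set(population))
commutative_distinct = len({canonical_string(p) for p in population})
print(string_distinct, commutative_distinct)
```

On this small made-up population, (+ A B) and (+ B A) collapse into one canonical string, while the logically equivalent pair stays distinct.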

  6. Measures
     [Figure: tree shape diagrams (a) to (d), labelled "Possible Tree Shapes" and "Invalid Tree Shapes"]
     • Number of fitness evaluations
     • Number of tree shapes
       • Converting programs into shapes
         • (* (+ (/ 24 X) (+ Y 81)) (+ (* X 34) (+ 86 44))) is translated into (((##)(##))((##)(##)))
       • Determining the total number of possible trees
         • Most of the GP problems examined use binary trees
         • Depth = {1, 2, 3, 4, 5, 6, 7, …}
         • No. of possible shapes = {1, 3, 21, 651, 457653, 2.10E+11, 4.4E+22, …}
         • There are more shapes for ternary trees, and the number grows exponentially with tree depth
     • Number of commutative-distinct individuals evaluated
     • Number of string-distinct individuals evaluated
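
The program-to-shape conversion can be sketched as follows, using the slide's example expression written with balanced parentheses. This is one plausible reading of the translation (every terminal becomes `#`, every function node keeps only its parentheses); the function names are hypothetical and the authors' actual implementation is not shown in the slides.

```python
def tokenize(text):
    """Split a prefix expression into parentheses and symbols."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one prefix S-expression into nested lists."""
    token = tokens.pop(0)
    if token != "(":
        return token  # terminal
    node = []
    while tokens[0] != ")":
        node.append(parse(tokens))
    tokens.pop(0)  # discard the closing ")"
    return node


def shape(tree):
    """Terminal -> '#'; function node -> its children's shapes in parentheses."""
    if isinstance(tree, str):
        return "#"
    # tree[0] is the function name, which the shape deliberately drops
    return "(" + "".join(shape(child) for child in tree[1:]) + ")"


def program_shape(program):
    return shape(parse(tokenize(program)))


program = "(* (+ (/ 24 X) (+ Y 81)) (+ (* X 34) (+ 86 44)))"
print(program_shape(program))  # (((##)(##))((##)(##)))
```

Counting distinct shapes in a run then reduces to collecting `program_shape(p)` for every evaluated program in a set and taking its size.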

  7. Results - 1 Run

  8. Results - 5 Runs
     Comments: We recognize that 5 runs are still not enough, but they repeatedly show the same pattern: duplicates exist, and not in small numbers.

  9. Tree Shapes - 5 Runs
     Comments: Even over 5 runs, the number of distinct shapes evaluated is still trivial compared with the number of possible shapes, just as it was for 1 run.
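
To give a sense of how large the space of possible shapes is, the sketch below counts full binary tree shapes (every internal node has exactly two children) of an exact height. The recurrence is an assumption about how the figures on the Measures slide were obtained, with depth read as the number of edges from the root to the deepest leaf; under that reading it reproduces the listed values 1, 3, 21, 651, 457653, and so on. The function names are hypothetical.

```python
def shapes_up_to_height(h):
    """Number of full binary tree shapes with height <= h."""
    total = 1  # height 0: a single leaf
    for _ in range(h):
        # either a single leaf, or a root with two subtrees of smaller height
        total = 1 + total * total
    return total


def shapes_of_exact_height(h):
    """Number of full binary tree shapes whose height is exactly h."""
    if h == 0:
        return 1
    return shapes_up_to_height(h) - shapes_up_to_height(h - 1)


for depth in range(1, 8):
    print(depth, shapes_of_exact_height(depth))
```

Python integers have arbitrary precision, so the counts stay exact even where the slide switches to scientific notation.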

  10. Visualizations: A “Toy” Problem - MAX

  11. A Real World Problem - Texture Features

  12. Conclusion
     • Number of duplicate individuals
       • There are many more than expected. Some are understandable, as in the MAX problem; others need more analysis
     • Percentage of possible tree shapes
       • Only a very tiny percentage of the possible shapes were examined
     • Patterns in size and depth
       • Fitness vs. generation vs. size follows a roughly triangular pattern
     • Toy vs. real world problems
       • There are no clear differences. The real world problems also had many duplicates, and many single-node trees were evaluated
     • Suggested improvements
       • Cache the string representation of the programs and reuse the fitness values when evaluations are expensive
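
The suggested caching of string representations could look like the sketch below. It assumes programs are available as strings and that fitness evaluation is deterministic, so a stored value can safely be reused; `FitnessCache` and `toy_fitness` are hypothetical names used only for illustration, and `toy_fitness` is a stand-in for an expensive evaluation.

```python
class FitnessCache:
    """Memoise fitness values keyed by a program's string representation."""

    def __init__(self, evaluate):
        self._evaluate = evaluate  # the (expensive) fitness function
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def fitness(self, program):
        key = " ".join(program.split())  # normalise whitespace only
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[key] = self._evaluate(key)
        return self._cache[key]


def toy_fitness(program):
    # Placeholder for an expensive evaluation: counts function nodes.
    return program.count("(")


cache = FitnessCache(toy_fitness)
population = ["(+ A B)", "(+ A B)", "(* A A)", "(+  A B)"]
scores = [cache.fitness(p) for p in population]
print(scores, cache.hits, cache.misses)
```

Keying on a canonical form instead of the raw string would also catch commutatively equivalent programs, at the cost of extra work per lookup.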

  13. Questions & Suggestions
     Acknowledgement: This work was partially supported by grant EPPNRM054 from the Victorian Partnership for Advanced Computing. Thanks to the people in the ECML group (Andy, Andrew, Brian, Teja) for providing some runs.
