High Energy Physics Event Selection with Gene Expression Programming. Liliana Teodorescu. Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India.


  1. Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India. High Energy Physics Event Selection with Gene Expression Programming. Liliana Teodorescu

  2. Outline
     • Introduction to evolutionary computation
     • Gene Expression Programming (GEP)
     • Application of GEP for Event Selection
     • Conclusion
     Liliana Teodorescu, Brunel University

  3. Evolutionary Computation
     • Evolutionary computation simulates natural evolution on a computer.
     • Natural evolution: the process leading to maintenance or increase of a population's ability to survive and reproduce in a specific environment. This ability is quantitatively measured by evolutionary fitness.
     • Goal of natural evolution: to generate a population of individuals with increasing fitness.
     • Goal of evolutionary computation: to generate a set of solutions (to a problem) of increasing quality.

  4. Terminology
     • Individual: candidate solution to a problem
     • Chromosome: representation of the candidate solution (an individual is encoded into a chromosome; a chromosome is decoded back into an individual)
     • Gene: constituent entity of the chromosome
     • Population: set of individuals/chromosomes
     • Fitness function: representation of how good a candidate solution is
     • Genetic operators: operators applied to chromosomes in order to create genetic variation (other chromosomes)

  5. Evolutionary Algorithms
     Simulation of natural evolution is the core of evolutionary algorithms.
     Steps:
     • Problem definition
     • Encoding of the candidate solution
     • Fitness definition
     • Run
     • Decoding the best-fitted chromosome = solution
     Basic evolutionary algorithm (the run):
     • Start: initial population creation (randomly)
     • Fitness evaluation (of each chromosome)
     • Terminate? If yes, stop. If no, create a new generation:
       - Selection of individuals (proportional to fitness)
       - Reproduction (genetic operators)
       - Replacement of the current population with the new one, then back to fitness evaluation
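The loop on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bit-string encoding, the `count_ones` toy fitness, the one-point recombination, the parameter values, and the extra bookkeeping of the best chromosome seen so far are all hypothetical choices, not taken from the talk.

```python
import random

def evolve(fitness, chrom_len, pop_size=30, generations=50, p_mut=0.05, seed=0):
    """Basic evolutionary loop over bit-string chromosomes (toy encoding)."""
    rng = random.Random(seed)
    # Initial population creation (randomly)
    pop = [[rng.randint(0, 1) for _ in range(chrom_len)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):  # termination: fixed generation budget
        # Fitness evaluation of each chromosome; the small offset avoids all-zero weights
        weights = [fitness(c) + 1e-9 for c in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            # Selection of individuals, proportional to fitness (roulette wheel)
            mum, dad = rng.choices(pop, weights=weights, k=2)
            # Reproduction: one-point recombination followed by bit-flip mutation
            cut = rng.randrange(1, chrom_len)
            child = mum[:cut] + dad[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop  # replacement of the current population with the new one
        best = max(pop + [best], key=fitness)  # remember the best chromosome seen
    return best

def count_ones(chrom):
    """Toy fitness: number of 1 bits in the chromosome."""
    return sum(chrom)

best = evolve(count_ones, chrom_len=20)
print(count_ones(best), best)
```

The fixed seed makes the run repeatable; a real application would replace `count_ones` with a problem-specific fitness and the bit string with the problem's own encoding.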

  6. Evolutionary Algorithms
     • Genetic Algorithms (GA) (J. H. Holland, 1975)
     • Genetic Programming (GP) (J. R. Koza, 1992)
     • Gene Expression Programming (GEP) (C. Ferreira, 2001)
     Main differences:
     • Encoding method
     • Reproduction method

  7. Gene Expression Programming
     • Searches for the computer program that solves the problem (as GP does)
     • Works with two entities: chromosomes and expression trees
     Encoding
     • A candidate solution is represented by an expression tree (ET)
     • The ET is encoded in a chromosome by reading the ET left to right and top to bottom
     • Example: chromosome Q*-+abcd, where Q means sqrt; the ET has Q at the root, * on the second line, the two binary operators on the third line, and the terminals a, b, c, d on the last line
     Decoding the chromosome (translates the chromosome into an ET)
     • First line of the ET (root): the first element of the chromosome
     • Each next line of the ET: as many elements as the arguments needed by the elements in the previous line
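The decoding rule can be made concrete with a short Python sketch that rebuilds the ET line by line from the chromosome string Q*-+abcd as printed on the slide. The function names (`decode`, `to_infix`, `evaluate`), the nested-list tree representation, and the sample variable values are assumptions made for illustration.

```python
import math

# Operations per function symbol; symbols not listed here are terminals.
OPS = {
    "Q": math.sqrt,
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": lambda x, y: x / y,
}
ARITY = {"Q": 1, "+": 2, "-": 2, "*": 2, "/": 2}

def decode(chromosome):
    """Build the ET line by line; return (tree, number of symbols used).
    A node is a list [symbol, child, child, ...]."""
    root = [chromosome[0]]
    level, pos = [root], 1
    while level:
        next_level = []
        for node in level:
            # Each node takes as many following symbols as it needs arguments
            for _ in range(ARITY.get(node[0], 0)):
                child = [chromosome[pos]]
                pos += 1
                node.append(child)
                next_level.append(child)
        level = next_level
    return root, pos

def to_infix(node):
    """Readable form of the tree (Q is the only unary function here)."""
    sym, args = node[0], node[1:]
    if not args:
        return sym
    if len(args) == 1:
        return f"sqrt({to_infix(args[0])})"
    return f"({to_infix(args[0])} {sym} {to_infix(args[1])})"

def evaluate(node, values):
    """Evaluate the tree for a dict of terminal values."""
    args = [evaluate(child, values) for child in node[1:]]
    return OPS[node[0]](*args) if args else values[node[0]]

tree, used = decode("Q*-+abcd")
print(to_infix(tree))                                     # sqrt(((a - b) * (c + d)))
print(evaluate(tree, {"a": 5, "b": 1, "c": 2, "d": 2}))   # 4.0
```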

  8. Gene Expression Programming: Reproduction
     Genetic operators are applied to chromosomes, not to ETs => they always produce syntactically correct structures!
     • Recombination: exchanges parts of two chromosomes
     • Mutation: changes the value of a node
     • Transposition: moves a part of the chromosome to another location in the same chromosome
     Example, mutation (Q replaced with *):
     - before: *b+a-aQab+//+b+babbabbbababbaaa (ET nodes, line by line: * | b + | a - | a Q | a)
     - after:  *b+a-a*ab+//+b+babbabbbababbaaa (ET nodes, line by line: * | b + | a - | a * | a b)
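A minimal Python sketch of the mutation example follows. It assumes the head/tail convention of GEP genes (head length 15 for this gene, tail positions restricted to terminals); `expressed_length`, `point_mutation`, and `random_mutation` are hypothetical helper names. The sketch checks that the gene still decodes to a complete ET after mutation.

```python
import random

ARITY = {"Q": 1, "*": 2, "/": 2, "-": 2, "+": 2}
TERMINALS = "ab"
HEAD = 15  # head length assumed for the example gene

def expressed_length(gene):
    """Number of leading symbols that form the ET (the rest is unused)."""
    needed, pos = 1, 0  # the root is always needed
    while needed > 0:
        # Each symbol fills one open slot and opens as many as its arity
        needed += ARITY.get(gene[pos], 0) - 1
        pos += 1
    return pos

def point_mutation(gene, pos, symbol):
    """Replace one symbol; tail positions may only hold terminals."""
    if pos >= HEAD and symbol not in TERMINALS:
        raise ValueError("tail positions may only hold terminals")
    return gene[:pos] + symbol + gene[pos + 1:]

def random_mutation(gene, rng):
    """Mutate a random position with a symbol that is legal there."""
    pos = rng.randrange(len(gene))
    pool = TERMINALS if pos >= HEAD else TERMINALS + "".join(ARITY)
    return point_mutation(gene, pos, rng.choice(pool))

parent = "*b+a-aQab+//+b+babbabbbababbaaa"
child = point_mutation(parent, 6, "*")  # Q replaced with *
print(child)
print(expressed_length(parent), expressed_length(child))  # 8 9
```

The ET grows from 8 to 9 nodes because * needs one more argument than Q; the extra argument is simply the next symbol already present in the gene.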

  9. Gene Expression Programming
     • Chromosome: has one or more genes of equal length
     • Gene:
       - head: contains both functions and terminals (length h)
       - tail: contains only terminals (length t)
     • t = h(n-1) + 1, where n is the number of arguments of the function with the highest number of arguments
     Example:
     • set of functions: Q, *, /, -, +
     • set of terminals: a, b
     • n = 2; h = 15 (chosen) => t = 16 => length of gene = 15 + 16 = 31
     • gene: *b+a-aQab+//+b+babbabbbababbaaa
     • ET nodes, line by line: * | b + | a - | a Q | a
     • The ET ends before the end of the gene!
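The tail-length formula can be checked numerically. The sketch below, with hypothetical helper names, computes t for the slide's values and shows why the formula is sufficient: even a head made entirely of two-argument functions finds enough terminals in the tail.

```python
ARITY = {"Q": 1, "*": 2, "/": 2, "-": 2, "+": 2}

def tail_length(head, max_arity):
    """t = h(n-1) + 1"""
    return head * (max_arity - 1) + 1

def expressed_length(gene):
    """Number of leading symbols that form the ET."""
    needed, pos = 1, 0
    while needed > 0:
        needed += ARITY.get(gene[pos], 0) - 1
        pos += 1
    return pos

h, n = 15, max(ARITY.values())
t = tail_length(h, n)
print(t, h + t)  # 16 31

# Worst case: every head position holds a two-argument function
worst = "+" * h + "a" * t
print(expressed_length(worst))  # 31, the whole gene is used and no more

# Slide example: the ET ends well before the end of the gene
print(expressed_length("*b+a-aQab+//+b+babbabbbababbaaa"))  # 8
```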

  10. GEP in HEP
      • GEP for event selection
        - finding cuts/selection criteria
        - classification problem (signal/background classification)
        - statistical learning approach
      • Data samples:
        - Monte-Carlo simulation from the BaBar experiment
        - Ks production in e+e- (~10 GeV)
        - 5000 training events (for classification rule extraction)
        - 5000 test events (different from the training events)
        - limitations imposed by APS 3.0
        - S/N = 0.25, 1, 5
      • Software resources
        - APS 3.0 (Automatic Problem Solver)
        - commercial package (Windows based)
        - www.gepsoft.com
        - function finding, classification, time series analysis

  11. Input Parameters
      Functions and constants to be used in the classification rules (cut-type rules)
      • 10 functions:
        - AND1 (x<0 and y<0 => 1 else 0), AND2 (x>=0 and y>=0 => 1 else 0)
        - OR1 (x<0 or y<0 => 1 else 0), OR2 (x>=0 or y>=0 => 1 else 0)
        - <, >, <=, >=, =, !=
      • 36 functions: the previous 10 functions + common mathematical functions
      • Constants: floating-point constants in (-10, 10)
      Data: variables usually used in a cut-based analysis for selection
      - doca (distance of closest approach)
      - RXY, |RZ| (region around the interaction point)
      - |cos(hel)|
      - SFL (Signed Flight Length)
      - Fsig (Flight Significance)
      - Pchi (χ² probability of the vertex)
      - Mass (KS reconstructed mass)
      GEP parameters
      - fitness function: number of hits (events correctly classified)
      - gene length (head = 1-20)
      - no. of chromosomes per generation: 100
      - no. of generations per run: 1000-20000
      - genetic operator rates: mutation 0.044, inversion 0.1, transposition 0.3, recombination 0.1
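The logical functions and the "number of hits" fitness can be sketched in Python as follows. The `>=` in AND2 and OR2 is an assumption, since the comparison symbol is not legible in the transcript. The example rule, its thresholds, and the five toy events are entirely hypothetical and serve only to show how hits are counted.

```python
def and1(x, y):
    return 1 if x < 0 and y < 0 else 0

def and2(x, y):
    return 1 if x >= 0 and y >= 0 else 0  # ">=" is an assumption

def or1(x, y):
    return 1 if x < 0 or y < 0 else 0

def or2(x, y):
    return 1 if x >= 0 or y >= 0 else 0   # ">=" is an assumption

def hits(rule, events, labels):
    """Fitness: number of events the rule classifies correctly."""
    return sum(1 for event, label in zip(events, labels) if rule(event) == label)

# Hypothetical rule: signal (1) if Fsig is at least 5.0 and Rxy is at most 0.2
def rule(event):
    return and2(event["Fsig"] - 5.0, 0.2 - event["Rxy"])

# Toy events (made up); label 1 = signal, 0 = background
events = [
    {"Fsig": 8.0, "Rxy": 0.10},
    {"Fsig": 1.0, "Rxy": 0.10},
    {"Fsig": 9.0, "Rxy": 0.50},
    {"Fsig": 6.0, "Rxy": 0.05},
    {"Fsig": 5.5, "Rxy": 0.19},
]
labels = [1, 0, 0, 1, 0]

print(hits(rule, events, labels))  # 4 (the last event is misclassified)
```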

  12. Analysis 1
      • Data sample: S/N = 0.25; 10 functions
      • No. of genes = 1, head length = 10
      • Fsig  5.26, Rxy < 0.19, doca < 1, Pchi > 0
      • Classification accuracy = 95.36%
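As a rough illustration, the rule on this slide can be written as a predicate and scored for classification accuracy. Two points are assumptions: the comparison operator on Fsig is not legible in the source and is taken to be `>=` here, and the four conditions are assumed to be combined with a logical AND. The toy events are made up.

```python
def passes(event):
    """Cut-type rule from the slide (Fsig operator and AND-combination assumed)."""
    return (event["Fsig"] >= 5.26   # operator assumed, not legible in the source
            and event["Rxy"] < 0.19
            and event["doca"] < 1
            and event["Pchi"] > 0)

def accuracy(rule, events, labels):
    """Percentage of events whose predicted class matches the true class."""
    correct = sum(1 for e, is_signal in zip(events, labels) if rule(e) == is_signal)
    return 100.0 * correct / len(events)

# Toy events (made up); True = signal, False = background
events = [
    {"Fsig": 7.0, "Rxy": 0.05, "doca": 0.2, "Pchi": 0.4},
    {"Fsig": 2.0, "Rxy": 0.10, "doca": 0.3, "Pchi": 0.2},
    {"Fsig": 6.0, "Rxy": 0.30, "doca": 0.1, "Pchi": 0.5},
    {"Fsig": 9.0, "Rxy": 0.10, "doca": 2.0, "Pchi": 0.1},
]
labels = [True, False, True, False]

print(accuracy(passes, events, labels))  # 75.0
```

For scale: if the quoted 95.36% refers to a 5000-event sample, it corresponds to 4768 correctly classified events.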

  13. Complex chromosomes
      • Data samples: S/N = 0.25, 1, 5; 10 functions
      • [Plots, one per sample, with the labels: S/N = 0.25: 95% and 5%; S/N = 5: 92% and 8%; S/N = 1: 92% and 8%]

  14. Classification rules
      • Data sample: S/N = 0.25; 10 functions

  15. Classification rules
      Column headings: GEP; Cut-based analysis
      • Fsig  4.1, Rxy  0.2 cm, SFL  0.2 cm, Pchi > 0; No cut. Reduction: S 15%, B 98%
      • Fsig  4.0, Rxy  0.2 cm, SFL  0 cm, Pchi > 0.001, doca  0.4 cm, |Rz|  2.8 cm
      • Previous cuts + doca  0.4, |Rz|  2.8 cm
      • Previous cuts + doca > 0, Rxy  Mass. Reduction: S 16%, B 98.3%

  16. Classification rules (GEP)
      • Fsig  3.63, |Rz|  2.65 cm. Reduction: S 7.6%, B 87.8%
      • Fsig  3.63, |Rz|  2.65 cm, Rxy < Pchi. Reduction: S 16%, B 97.8%

  17. Test of Classification rules
      • Data samples: S/N = 0.25, 1, 5; 10 functions
      • [Plots: one panel each for S/N = 0.25, S/N = 5, and S/N = 1]

  18. Analysis 2
      • Data sample: S/N = 0.25; 36 mathematical functions
      • No. of genes = 1, head length = 10
      • Classification accuracy = 95.00%

  19. Complex chromosomes
      • Data sample: S/N = 0.25; 36 functions
      • Classification Accuracy  95%

  20. Conclusions
      GEP allows
      • fast identification of powerful cuts
      • signal/background separation with 92-95% accuracy for samples with S/N = 0.25, 1, 5
      • potential for discovering new correlations between variables
      A large number of selection functions does not improve the classification accuracy.
      GEP
      • is still in the R&D phase
      • needs software development -> underway
