This presentation introduces statistical learning methods applicable in high energy physics (HEP) and astrophysics. It covers Occam's razor, decision trees, local density estimators, and methods based on linear separation, showing how these methods can assist in background suppression and purification of events. The roles of training and test sets, overtraining, and regularization are emphasized. Practical examples from HEP triggers and astrophysics illustrate the methods, and the talk closes with key takeaways for researchers in HEP and astrophysics.
Statistical Learning Methods in HEAP
Jens Zimmermann, Christian Kiesling
Max-Planck-Institut für Physik, München; MPI für extraterrestrische Physik, München; Forschungszentrum Jülich GmbH

Outline:
• Statistical learning: introduction with a simple example
• Occam's razor
• Decision trees
• Local density estimators
• Methods based on linear separation
• Examples: triggers in HEP and astrophysics
• Conclusion

C. Kiesling, MPI for Physics, Munich - ACAT03 Workshop, KEK, Japan, Dec. 2003
Statistical Learning
• Does not use prior knowledge: „No theory required“
• Learns only from examples: „Trial and error“, „Learning by reinforcement“
• Two classes of statistical learning:
  discrete output 0/1: „classification“
  continuous output: „regression“
• Applications in high energy physics and astrophysics:
  background suppression, purification of events
  estimation of parameters not directly measured
A Simple Example: Preparing a Talk
[Scatter plot: # formulas vs. # slides (both axes 0 to 60), experimentalists and theorists]
Data base established by Jens during the Young Scientists Meeting at MPI
Discriminating Theorists from Experimentalists: A First Analysis
[Scatter plots of # formulas vs. # slides, experimentalists and theorists: first talks handed in, and talks a week before the meeting]
First Problems
New talk by Ludger: 28 formulas on 31 slides
• Simple „model“, but no complete separation
• Completely separable, but only via a complicated boundary
At this point we cannot know which feature is „real“! Use train/test or cross-validation!
See Overtraining - Want Generalization - Need Regularization
[Plot: error E vs. training epochs for the training set and the test set; the region where the test error departs from the training error is labelled "overtraining"]
Want to tune the parameters of the learning algorithm depending on the overtraining seen!
See Overtraining - Want Generalization - Need Regularization
[Plot: error E vs. training epochs for the training set and the test set]
Regularization will ensure adequate performance (e.g. VC dimensions): limit the complexity of the model.
"Factor 10" rule ("Uncle Bernie's Rule #2"): use roughly ten times more training examples than free parameters.
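The train/test machinery on this slide can be sketched in a few lines. The following is a minimal illustration with hypothetical data and a simple logistic model (not the authors' code): the training and test errors are tracked over the epochs, and training stops once the test error no longer improves.

```python
# Minimal sketch: monitor train/test error during gradient descent and stop
# when the test error stops improving (overtraining). Data and model are toy choices.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x_th = rng.normal([45, 25], [10, 8], size=(n, 2))   # theorists: many formulas, few slides
x_ex = rng.normal([15, 45], [10, 8], size=(n, 2))   # experimentalists: few formulas, many slides
X = np.vstack([x_th, x_ex]) / 60.0                  # crude scaling to O(1)
y = np.hstack([np.ones(n), np.zeros(n)])

idx = rng.permutation(2 * n)                        # random train/test split
train, test = idx[:n], idx[n:]

w, b, lr = np.zeros(2), 0.0, 0.5                    # logistic model, gradient descent

def err(w, b, X, y):                                # misclassification rate
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return np.mean((p > 0.5) != y)

best_test, patience = 1.0, 0
for epoch in range(500):
    p = 1.0 / (1.0 + np.exp(-(X[train] @ w + b)))
    w -= lr * (X[train].T @ (p - y[train])) / len(train)
    b -= lr * np.mean(p - y[train])
    e_tr, e_te = err(w, b, X[train], y[train]), err(w, b, X[test], y[test])
    # early stopping: once the test error stalls while training continues,
    # further epochs only fit the noise of the training set
    if e_te < best_test:
        best_test, patience = e_te, 0
    else:
        patience += 1
    if patience > 20:
        break

print(f"stopped at epoch {epoch}: train error {e_tr:.3f}, test error {e_te:.3f}")
```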
Philosophy: Occam's Razor (14th century)
• Pluralitas non est ponenda sine necessitate.
• Do not make assumptions unless they are really necessary.
• From theories which describe the same phenomenon equally well, choose the one which contains the least number of assumptions.
First razor: Given two models with the same generalization error, the simpler one should be preferred because simplicity is desirable in itself. Yes! But not of much use.
Second razor: Given two models with the same training-set error, the simpler one should be preferred because it is likely to have lower generalization error. No! „No free lunch“ theorem (Wolpert 1996)
Decision Trees
Classify Ringaile: 31 formulas on 32 slides
• all events: #formulas < 20 → exp, #formulas > 60 → th, rest → subset
• subset (20 < #formulas < 60): #slides > 40 → exp, #slides < 40 → th
Regularization: pruning
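The tree on this slide can be written down directly as nested cuts. The sketch below hard-codes the cut values quoted on the slide; the function name and structure are illustrative only.

```python
# Minimal sketch of the decision tree from the slide (cut values taken from the slide).
def classify(n_formulas: int, n_slides: int) -> str:
    """Classify a speaker as experimentalist ('exp') or theorist ('th')."""
    if n_formulas < 20:              # root node: first cut on #formulas
        return "exp"
    if n_formulas > 60:
        return "th"
    # subset 20 < #formulas < 60: decide on #slides
    return "exp" if n_slides > 40 else "th"

# Ringaile: 31 formulas on 32 slides -> falls into the 20 < #formulas < 60 subset,
# then 32 slides < 40 -> classified as theorist
print(classify(31, 32))   # th
```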
Local Density Estimators
Search for similar events already classified within a specified region, count the members of the two classes in that region.
[Scatter plots of # formulas vs. # slides illustrating local regions around the point to be classified]
Maximum Likelihood
[Projections of the training sample onto the # formulas and # slides axes; the query event (31 formulas, 32 slides) is evaluated on each 1D distribution and the results are combined into a single output]
Regularization: binning
Correlation gets lost completely by projection!
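A minimal sketch of such a projected-likelihood classifier, assuming per-class 1D histograms whose values are multiplied together (the binning is the regularization knob); the data and bin choices below are hypothetical, not the authors' implementation.

```python
# Projected ("maximum likelihood") classifier: one normalized histogram per variable
# and per class, combined into a likelihood ratio. Correlations between variables are lost.
import numpy as np

rng = np.random.default_rng(1)
n = 500
X_th = rng.normal([45, 25], [10, 8], size=(n, 2))   # theorists: (formulas, slides)
X_ex = rng.normal([15, 45], [10, 8], size=(n, 2))   # experimentalists

bins = np.linspace(0, 60, 13)                        # regularization: bin width

def pdfs(X):
    # one normalized 1D histogram per input variable
    return [np.histogram(X[:, j], bins=bins, density=True)[0] + 1e-9 for j in range(2)]

pdf_th, pdf_ex = pdfs(X_th), pdfs(X_ex)

def likelihood_ratio(x):
    j = np.clip(np.digitize(x, bins) - 1, 0, len(bins) - 2)
    p_th = pdf_th[0][j[0]] * pdf_th[1][j[1]]
    p_ex = pdf_ex[0][j[0]] * pdf_ex[1][j[1]]
    return p_th / (p_th + p_ex)                      # output in [0, 1]

print(likelihood_ratio(np.array([31, 32])))          # event with 31 formulas, 32 slides
```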
k-Nearest-Neighbour
[Scatter plot of # formulas vs. # slides: the query event is classified by its k nearest training events, illustrated for k = 1, 2, 3, 4, 5 with the corresponding outputs]
Regularization: parameter k
For every evaluation position the distances to each training position need to be determined!
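A brute-force k-NN output can be sketched as follows; as the slide notes, every evaluation requires the distance to every training event. The data and helper name below are hypothetical.

```python
# Minimal brute-force k-nearest-neighbour sketch.
import numpy as np

def knn_output(x, X_train, y_train, k=3):
    d = np.linalg.norm(X_train - x, axis=1)     # distance to each training event
    nearest = np.argsort(d)[:k]                  # indices of the k nearest events
    return y_train[nearest].mean()               # fraction of class-1 neighbours

# toy usage: labels 1 = theorist, 0 = experimentalist
X_train = np.array([[45., 25.], [50., 20.], [15., 45.], [12., 50.]])
y_train = np.array([1, 1, 0, 0])
print(knn_output(np.array([31., 32.]), X_train, y_train, k=3))
```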
Range Search
[Figure: the training events are stored in a tree over the (x, y) plane; a small search box around the query point requires checking only events 1, 2, 4, 9, while a large box requires checking all events]
Regularization: box size
Tree needs to be traversed only partially if the box size is small enough!
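A sketch of the range-search output for an axis-aligned box; here the box is checked by brute force, whereas the slide's point is that a tree over the training events lets one skip most of them when the box is small. Names and data are hypothetical.

```python
# Minimal range-search sketch: count class members inside a fixed box around the query point.
import numpy as np

def range_search_output(x, X_train, y_train, half_width=10.0):
    inside = np.all(np.abs(X_train - x) <= half_width, axis=1)   # events inside the box
    if not inside.any():
        return 0.5                                               # no events, no information
    return y_train[inside].mean()                                # class-1 fraction in the box

X_train = np.array([[45., 25.], [50., 20.], [15., 45.], [12., 50.], [35., 30.]])
y_train = np.array([1, 1, 0, 0, 1])
print(range_search_output(np.array([31., 32.]), X_train, y_train, half_width=10.0))
```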
Methods Based on Linear Separation
Divide the input space into regions separated by one or more hyperplanes.
Extrapolation is done!
LDA (Fisher discriminant)
[Scatter plots of # formulas vs. # slides with the two classes separated by a straight line]
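A minimal Fisher discriminant (LDA) sketch under the usual assumptions (shared within-class scatter, cut midway between the projected means); the data below are made up for illustration.

```python
# Fisher discriminant: project onto w = Sw^-1 (m1 - m0) and cut between the class means.
import numpy as np

def fisher_direction(X0, X1):
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) + np.cov(X1, rowvar=False) * (len(X1) - 1)
    w = np.linalg.solve(Sw, m1 - m0)              # direction maximizing class separation
    threshold = w @ (m0 + m1) / 2.0               # cut halfway between projected means
    return w, threshold

rng = np.random.default_rng(2)
X_ex = rng.normal([15, 45], [10, 8], size=(300, 2))   # experimentalists
X_th = rng.normal([45, 25], [10, 8], size=(300, 2))   # theorists
w, c = fisher_direction(X_ex, X_th)
x = np.array([31., 32.])
print("theorist" if w @ x > c else "experimentalist")
```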
Neural Networks
Network with two hidden neurons, trained by gradient descent; generalizes to arbitrary numbers of inputs and hidden neurons.
[Figure: 2-input network with two hidden neurons and one output neuron, weights annotated; each hidden neuron defines one boundary in the (# formulas, # slides) plane]
Regularization: number of hidden neurons, weight decay
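A sketch of such a two-hidden-neuron network; the weights below are placeholders rather than the values shown in the figure, and the weight-decay term illustrates the regularization mentioned on the slide.

```python
# Two sigmoid hidden neurons, one sigmoid output neuron (placeholder weights).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = np.array([[ 0.1, -0.1],       # hidden layer: 2 inputs -> 2 neurons
               [-0.1,  0.1]])
b1 = np.array([0.0, 0.0])
w2 = np.array([1.0, -1.0])         # output layer: 2 -> 1
b2 = 0.0

def net(x):
    h = sigmoid(W1 @ x + b1)       # each hidden neuron implements one hyperplane
    return sigmoid(w2 @ h + b2)    # output combines the two half-spaces

# weight decay adds lambda * sum(weights^2) to the training error,
# pulling unused weights towards zero (regularization)
def loss(X, y, lam=1e-3):
    out = np.array([net(x) for x in X])
    return np.mean((out - y) ** 2) + lam * (np.sum(W1 ** 2) + np.sum(w2 ** 2))

print(net(np.array([0.5, 0.5])))
```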
Support Vector Machines
Separating hyperplane with maximum distance to each data point: maximum margin classifier.
Found by setting up the condition for correct classification, y_i (w·x_i + b) ≥ 1, and minimizing |w|²/2, which leads to the Lagrangian
  L = |w|²/2 - Σ_i α_i [ y_i (w·x_i + b) - 1 ].
A necessary condition for a minimum is w = Σ_i α_i y_i x_i (with Σ_i α_i y_i = 0), so the output becomes
  out(x) = sign( Σ_i α_i y_i (x_i·x) + b ).
Only linear separation? No! Replace the dot products: the mapping to feature space is hidden in a kernel K(x_i, x).
Non-separable case: allow margin violations via slack variables ξ_i and minimize |w|²/2 + C Σ_i ξ_i.
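The dual form of the output lends itself to a short sketch; the support vectors, alphas, and kernel parameter below are placeholders, since in practice they come from solving the quadratic programme.

```python
# SVM decision function in its dual (kernel) form with placeholder alphas.
import numpy as np

def rbf_kernel(x1, x2, gamma=0.1):
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

def svm_output(x, X_sv, y_sv, alpha, b, kernel=rbf_kernel):
    # out(x) = sign( sum_i alpha_i * y_i * K(x_i, x) + b )
    s = sum(a * y * kernel(xi, x) for a, y, xi in zip(alpha, y_sv, X_sv))
    return np.sign(s + b)

# toy support vectors with hypothetical alphas (y in {-1, +1})
X_sv = np.array([[45., 25.], [15., 45.]])
y_sv = np.array([+1, -1])
alpha = np.array([0.3, 0.3])
print(svm_output(np.array([31., 32.]), X_sv, y_sv, alpha, b=0.0))
```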
Physics Applications: Neural Network Trigger at HERA (H1)
Keep physics, reject background.
Trigger for J/ψ Events (H1)
Efficiency at 95% background rejection:
  NN 99.6%, SVM 98.3%, k-NN 97.7%, RS 97.5%, C4.5 97.5%, ML 91.2%, LDA 82%
Triggering Charged Current Events (signal vs. background)
Efficiency at 80% background rejection:
  NN 74%, SVM 73%, C4.5 72%, RS 72%, k-NN 71%, LDA 68%, ML 65%
Astrophysics: MAGIC - Gamma/Hadron Separation
[Images: photon shower vs. hadron shower]
Training with data and MC, evaluation with data.
s = signal (photon) enhancement factor:
  Random Forest: s = 93.3
  Neural Net: s = 96.5
Future Experiment XEUS: Position of X-ray Photons
(Application of statistical learning in regression problems)
[Figure: detector pixel structure with transfer direction, ~10 µm and ~300 µm scales, electron potential]
σ of reconstruction in µm:
  NN 3.6, SVM 3.6, k-NN 3.7, RS 3.7, ETA 3.9, CCOM 4.0
Conclusion
• Statistical learning theory is full of subtle details (models, statistics)
• Widely used statistical learning methods studied:
  Decision trees
  LDE: ML, k-NN, RS
  Linear separation: LDA, neural nets, SVMs
• Neural networks found superior in the HEP and astrophysics applications (classification, regression) studied so far
• Further applications (trigger, offline analyses) under study
From Classification to Regression
[1D example: the target function reconstructed with k-NN, RS, a fit (Gauss) and a NN]
Network: a = s(-2.1x - 1), b = s(+2.1x - 1), out = s(-12.7a - 12.7b + 9.4)
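The small network quoted above can be evaluated directly; the only assumption is that s() denotes the logistic sigmoid.

```python
# Worked evaluation of the regression network from the slide (assuming s = logistic sigmoid).
import numpy as np

def s(z):
    return 1.0 / (1.0 + np.exp(-z))

def net(x):
    a = s(-2.1 * x - 1.0)
    b = s(+2.1 * x - 1.0)
    return s(-12.7 * a - 12.7 * b + 9.4)

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, net(x))      # high near x = 0, falling off for large |x|
```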