The Roller Coaster of Organizing a HEP Challenge on Kaggle and Codalab

This presentation by David Rousseau provides an overview of the organization and development of the TrackML challenge in high energy physics (HEP), which aims to optimize software and tackle parallelism in tracking algorithms. The challenge involves accurately grouping 3D points into tracks and evaluating participant solutions based on score and throughput. The timeline, dataset, and scoring metrics are discussed, highlighting the complexity of real-life scenarios compared to the simplified challenge environment.

Presentation Transcript


  1. TrackML: the roller coaster of organizing a HEP challenge on Kaggle and Codalab. David Rousseau, LAL-Orsay, rousseau@lal.in2p3.fr, @dhpmrou. EPS-HEP Conference, 10-17 Jul 2019, Ghent, Belgium

  2. Who and How: 5-6 FTE-years. Organisation: Jean-Roch Vlimant (Caltech); Vincenzo Innocente, Andreas Salzburger (CERN); Sabrina Amrouche, Tobias Golling, Moritz Kiehn (Geneva University); David Rousseau, Yetkin Yilmaz (LAL-Orsay); Paolo Calafiura, Steven Farrell, Heather Gray (LBNL); Vladimir Vava Gligorov (LPNHE-Paris); Laurent Basara, Cécile Germain, Isabelle Guyon, Victor Estrade (LRI-Orsay); Edward Moyse (University of Massachusetts); Mikhail Hushchyn, Andrey Ustyuzhanin (Yandex, HSE). TrackML, David Rousseau, EPS-HEP outreach, Jul 2019, Ghent

  3. Sponsors

  4. Tracking crisis (similar plots from CMS)
     • Tracking (in particular pattern recognition) dominates reconstruction CPU time at the LHC
     • HL-LHC (phase 2) perspective: increased pileup. Run 1 (2012): <μ> ~ 20, Run 2 (2015): <μ> ~ 50, Phase 2 (2025): <μ> ~ 200
     • CPU time: quadratic/exponential extrapolation
     • Ongoing large effort within HEP to optimise software and tackle micro and macro parallelism
     • More than 20 years of LHC tracking development. Has everything been tried?
     • Maybe yes, but algorithms that are slower at low luminosity yet scale better may have been dismissed
     • Maybe no: brand new ideas from ML
     • Hence a challenge!
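
One way to see why the extrapolation is worse than linear is to count hit pairs. The sketch below is a toy illustration, not an experiment's CPU model. It assumes that the number of hits grows linearly with pileup (about 500 hits per collision, which matches the ~100k points per event quoted in this talk for ~200 pileup) and that a naive pattern recognition step inspects every pair of hits. The names `HITS_PER_COLLISION` and `hit_pairs` are made up for the example.

```python
from math import comb

# Mean pileup values quoted on the slide.
PILEUP = {"Run 1 (2012)": 20, "Run 2 (2015)": 50, "Phase 2 (2025)": 200}

# Assumption: hits per event scale linearly with pileup.
# 500 hits per collision reproduces ~100k points at pileup 200.
HITS_PER_COLLISION = 500


def hit_pairs(mean_pileup: int) -> int:
    """Number of unordered hit pairs a naive seeding step would inspect."""
    n_hits = mean_pileup * HITS_PER_COLLISION
    return comb(n_hits, 2)


# Express every era relative to Run 2.
baseline = hit_pairs(PILEUP["Run 2 (2015)"])
relative_cost = {era: hit_pairs(mu) / baseline for era, mu in PILEUP.items()}

for era, cost in relative_cost.items():
    print(f"{era}: {cost:.2f}x the Run 2 pair count")
```

Under these assumptions, going from <μ> ~ 50 to <μ> ~ 200 multiplies the pair count by about 16, the square of the pileup ratio.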

  5. Current situation: 10-100 billion events/year; 100k points and 10k tracks per event; point precision ~5 μm to 3 mm. (Figure: detector drawing with 6 m and 2 m dimension labels.)

  6. Tracking outside HEP… is very different

  7. TrackML in a nutshell
     • Accurate simulation engine (ACTS, https://gitlab.cern.ch/acts/acts-core) to produce realistic events
     • Ttbar events with 200 pileup
     • Silicon detector with barrels and disks (simplified HL-LHC ATLAS or CMS Si detector)
     • One file with the list of 3D points
     • Ground truth: one file with the point-to-particle association
     • Ground truth auxiliary: true particle parameters (origin, direction, curvature)
     • Typical events with ~200 parasitic collisions (~10,000 tracks/event)
     • Large training sample: 10k events, 0.1 billion tracks, 1 billion points, ~100 GByte
     • Accuracy phase (May to August 2018) on Kaggle
       • Participants are given the test sample (with the usual split for public and private leaderboard) and run the evaluation themselves to find the tracks
       • They should upload the tracks they have found
       • A track is a list of 3D points
       • Score: fraction of points correctly grouped together
       • Evaluation on the test sample with per-mille precision on 100 events
     • Throughput phase (Sep 2018 to Mar 2019) on Codalab
       • Participants submit their code to solve the same problem
       • Strong CPU incentive
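
The sample sizes quoted above can be cross-checked with simple arithmetic. The sketch below assumes ~100k points per event (the figure quoted for the current situation earlier in the talk) and takes 1 GByte as 10^9 bytes. The variable names are illustrative only.

```python
# Figures taken from the slides (all approximate).
N_EVENTS = 10_000           # training events
TRACKS_PER_EVENT = 10_000   # ~10,000 tracks per event
POINTS_PER_EVENT = 100_000  # ~100k points per event
DATASET_BYTES = 100e9       # ~100 GByte, assuming 1 GByte = 1e9 bytes

total_tracks = N_EVENTS * TRACKS_PER_EVENT
total_points = N_EVENTS * POINTS_PER_EVENT
points_per_track = POINTS_PER_EVENT / TRACKS_PER_EVENT
bytes_per_point = DATASET_BYTES / total_points

print(f"tracks: {total_tracks:.1e}, points: {total_points:.1e}")
print(f"~{points_per_track:.0f} points per track, ~{bytes_per_point:.0f} bytes per point")
```

The totals reproduce the 0.1 billion tracks and 1 billion points on the slide, and imply roughly 10 points per track and on the order of 100 bytes stored per point.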

  8. From domain to challenge and back
     • Domain (e.g. HEP): domain experts solve the domain problem
     • Challenge organisation: simplify the domain problem into a challenge problem (~years)
     • Challenge: the crowd solves the challenge problem (~months)
     • Reimport: bring the challenge solution back into a domain solution (~years)

  9. TrackML timeline (figure)
     • Labels on the figure: Slow maturation of the problem; Building of the team; Decide to focus on pattern recognition; Define the score; Codalab implementation; Kaggle implementation, documentation etc…; 2D Hackathon; Orsay HSF workshop; Codalab Throughput challenge; ACTS simulation; Dataset definition; Kaggle Accuracy challenge; Seattle challenge; CTD; CERN Grand Finale; Spin-offs; Kick off
     • Dates on the time axis: Mar 2015, Mar 2017, Mar 2018, May, Aug 2018, Oct 2018, Mar 2019, Jul 2019

  10. Dataset: 3D points

  11. Dataset 3D points tracks TrackML, David Rousseau, EPS-HEP outreach , Jul 2019, Gand

  12. Score
      • We need one number to specify how good an algorithm is (plus CPU time)!
      • Big decision: the score is ~ "the weighted fraction of hits correctly associated". Include all tracks above 150 MeV
      • In contrast: the 2017 CMS tracker Technical Design Report, Chapter 6 "Expected performance", has 31 pages and 58 figures; the ATLAS Si strip Technical Design Report, Chapter 4 "ITk Performance and Physics Benchmark Studies", has 54 pages and 80 figures
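
A "weighted fraction of hits correctly associated" can be sketched as follows. This is a minimal illustration, not the official scoring code: the matching rule (a reconstructed track counts only if more than half of its hits come from one particle and it contains more than half of that particle's hits) and the per-hit weights are assumptions made for the example, and `weighted_score` is a hypothetical name.

```python
from collections import Counter, defaultdict


def weighted_score(truth, submission):
    """truth: hit_id -> (particle_id, weight); submission: hit_id -> track_id."""
    # Number of hits each true particle left in the detector.
    particle_size = Counter(pid for pid, _ in truth.values())

    # Group submitted hits by reconstructed track.
    tracks = defaultdict(list)
    for hit_id, track_id in submission.items():
        tracks[track_id].append(hit_id)

    total_weight = sum(weight for _, weight in truth.values())
    good_weight = 0.0
    for hit_ids in tracks.values():
        counts = Counter(truth[h][0] for h in hit_ids)
        particle_id, n_shared = counts.most_common(1)[0]
        # Assumed matching rule: majority of the track AND of the particle.
        majority_of_track = 2 * n_shared > len(hit_ids)
        majority_of_particle = 2 * n_shared > particle_size[particle_id]
        if majority_of_track and majority_of_particle:
            good_weight += sum(
                truth[h][1] for h in hit_ids if truth[h][0] == particle_id
            )
    return good_weight / total_weight


# Toy event: particle A has 4 hits, particle B has 3, all with weight 1.
truth = {
    1: ("A", 1.0), 2: ("A", 1.0), 3: ("A", 1.0), 4: ("A", 1.0),
    5: ("B", 1.0), 6: ("B", 1.0), 7: ("B", 1.0),
}
# The submission swaps one hit between the two tracks.
submission = {1: 10, 2: 10, 3: 10, 5: 10, 4: 11, 6: 11, 7: 11}
score = weighted_score(truth, submission)
print(f"score = {score:.3f}")
```

In the toy event, five of the seven equally weighted hits are counted, so the score is 5/7. With this matching rule a submission that puts every hit in its own track scores zero, which is why a plain per-track majority would not be enough. A cut such as "tracks above 150 MeV" could be expressed by giving zero weight to hits from softer particles, again as an assumption.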

  13. Real life vs challenge
      • Real life: wide range of physics event types. Challenge: one event type (ttbar)
      • Real life: full detailed Geant 4 / data. Challenge: ACTS (multiple scattering, energy loss, hadronic interaction, solenoidal magnetic field, inefficiency)
      • Real life: detailed dead matter description. Challenge: cylinders and slabs
      • Real life: complex geometry (tilted modules, double layers, misalignments…). Challenge: simple, ideal geometry (cylinders and disks)
      • Real life: hit merging. Challenge: no hit merging
      • Real life: shared hits allowed. Challenge: shared hits disallowed
      • Real life: output is hit clustering, track parameters and covariance matrix. Challenge: output is hit clustering
      • Real life: multiple metrics (see TDRs). Challenge: single-number metric
      • The challenge is simpler, but not too simple!

  14. Evolution of leaderboard

  15. Final leaderboard (figure: two entries are labelled HEP, and an annotation marks the only rank change between the public and private leaderboards)

  16. Experience with first phase
      • 630 participants
      • Some only downloaded provided solutions, but >100 did provide original code (or tuning of existing code)
      • Lots of exchange on the forum
      • People googling courses on HEP tracking…
      • Exchanging ideas, and even code…
      • …up to a certain point (score <= 50%)
      • A variety of algorithms with various roles for ML

  17. e.g. Participant Data Analysis: we provided a data visualisation notebook, but participants did much better within two days (see link)

  18. Efficiency, all: optimising the score == optimising physics quantities. See talk by Laurent Basara, this conference

  19. Throughput phase: now participants submit their software… and are evaluated on accuracy AND speed! Launched 6th Sep 2018, ran until 12th March 2019, on Codalab

  20. Throughput platform
      • Kaggle initially told us they would also provide the speed estimate…
      • …but they suddenly declined.
      • …so we did it ourselves on Codalab, with U Paris-Sud resources.
      • Specific difficulties:
        • Speed measurement reproducibility no better than 3% (even on a dedicated machine)
        • Many hacks anticipated (e.g. dumping the data in the log file…)
        • More hacks for sure…
        • Decision: remeasure the speed at the end of the competition, many times, on a dedicated machine
        • It worked
      • Providing a competition with accurate online time measurement is an open problem (Kaggle is working on it, given the demand; see e.g. "the Airbus Ship Detection challenge")
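
The "remeasure many times" decision can be illustrated with a small timing harness. This is only a sketch of the general idea, not the organisers' actual procedure: `measure_runtime` and `toy_workload` are hypothetical names, and using the median with the relative standard deviation as the spread is an assumption.

```python
import statistics
import time


def measure_runtime(func, repeats=10):
    """Run func several times; return (median seconds, relative spread)."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    median = statistics.median(durations)
    # Relative spread: standard deviation divided by the mean duration.
    spread = statistics.pstdev(durations) / statistics.mean(durations)
    return median, spread


def toy_workload():
    """Stand-in for a participant's reconstruction code."""
    return sum(i * i for i in range(200_000))


median_s, relative_spread = measure_runtime(toy_workload, repeats=5)
print(f"median {median_s * 1e3:.1f} ms, spread {relative_spread:.1%}")
```

Taking the median of repeated runs makes the result less sensitive to occasional slow runs, and the printed spread gives a direct feel for the kind of percent-level irreproducibility mentioned on the slide.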

  21. Throughput phase LB: private leaderboard (figure). Values shown per entry, in leaderboard order:
      • 0.943, 0.60, 1.16 (HEP)
      • 0.944, 1.00, 1.12 (HEP)
      • 0.927, 7.41, 0.897 (~HEP)
      • 0.895, 13.7, 0.770

  22. Where did ML people go?
      • 100 participants registered on Codalab but only 10 submitted non-trivial code. Why? Our guesses:
        • Kaggle visibility vs Codalab visibility
        • On Kaggle, people win points across competitions, can reach "Grand master" status, etc., which is very valuable on their CV
        • "Professional" Kagglers move from one challenge to the next, with no interest in long-term involvement
        • (Still, we had some praise, like "most interesting challenge I had ever done")
        • Codalab is a research platform
        • No GPU (while ML code "naturally" runs on GPU)
        • C++ vs Python: Python was allowed, but people realised they had to write in C++ for speed. Many ML people do not know C++
        • Not completely trivial effort to properly wrap code for submission

  23. HEP wins at the end
      • The podium is made up of HEP experts. Was it worth it?
      • Definitely: the best solutions run in <1 s, to be compared with >10 s for ATLAS or CMS (order-of-magnitude comparison)
      • HEP people liked the gamification of the problem
      • Also, one is from ALICE, one from ATLAS, and one from computing centre management
      • The dataset will be released on the CERN Open Data portal for future development
      • Already used in research papers, e.g. tracking with quantum computing (see talk in CERN Grand Finale workshop)
      • Ongoing work to integrate the best ideas (of both phases) into future algorithms for ATLAS and CMS

  24. Visualisation spin-off: visit at CERN by Tobias Isenberg, visualisation scientist at LRI-Orsay, with PhD student Xiyao Wang. They are using the TrackML dataset to experiment with visualisation/interaction with the Microsoft HoloLens (see talk in CERN Grand Finale workshop)

  25. TrackML conference talks
      • Connecting The Dots 2015, Seattle
      • Connecting The Dots 2016, Vienna
      • CHEP 2016, Okinawa
      • Connecting The Dots / Intelligent Trackers 2017, Orsay
      • NeurIPS 2017, Los Angeles, CiML workshop
      • Connecting The Dots 2018, Seattle
      • CHEP 2018, Sofia
      • WCCI 2018, Rio de Janeiro
      • ICHEP 2018, Seoul
      • IEEE NSS/MIC 2018, Sydney
      • IEEE eScience 2018, Amsterdam
      • NeurIPS 2018, Montreal, Competition workshop
      • ACAT 2019, Saas-Fee
      • Connecting The Dots 2019, Valencia
      • EPS 2019, Ghent
      • CHEP 2019, Adelaide
      • …and many more workshops and seminars…

  26. Useful links
      • See also Laurent Basara's talk in the Detector and Data Handling session, Friday 12:45, presenting the algorithms
      • Contact: trackml.contact@gmail.com
      • Website: https://sites.google.com/site/trackmlparticle
      • Twitter: @trackmllhc
      • Accuracy phase @ Kaggle: https://www.kaggle.com/c/trackml-particle-identification
        • Chapter in the NeurIPS 2018 Competition book, arXiv:1904.06778 (final version just released)
      • Throughput phase @ Codalab: https://competitions.codalab.org/competitions/20112
        • Write-up being finalized
      • CERN Grand Finale workshop, 1-2 Jul 2019: https://indico.cern.ch/event/813759/
