
HPC computing at CERN  - use cases from the engineering and physics communities


Presentation Transcript


  1. HPC computing at CERN - use cases from the engineering and physics communities Michal HUSEJKO, Ioannis AGTZIDIS IT/PES/ES

  2. Agenda • Introduction – Where we are now • Applications used at CERN requiring HPC infrastructure • Use cases (Engineering) • Ansys Mechanical • Ansys Fluent • Physics HPC applications • Next steps • Q&A

  3. Agenda • Introduction – Where we are now • Applications used at CERN requiring HPC infrastructure • Use cases (Engineering) • Ansys Mechanical • Ansys Fluent • Physics HPC applications • Next steps • Q&A

  4. Introduction • Some 95% of our applications are served well by bread-and-butter machines • We (CERN IT) have invested heavily in the Agile Infrastructure (AI), including a layered approach to responsibilities, virtualization, and a private cloud • There are certain applications, traditionally called HPC applications, which have different requirements • Even though these applications sail under the common HPC name, they differ from one another and have different requirements • These applications need detailed requirements analysis

  5. Scope of talk • We have contacted our user community and started to gather user requirements on a continuous basis • We have started a detailed system analysis of our HPC applications to gain knowledge of their behavior • In this talk I would like to present the progress and the next steps • At a later stage, we will look at how the HPC requirements can fit into the IT infrastructure

  6. HPC applications • Engineering applications: • Used at CERN in different departments to model and design parts of the LHC machine • The IT-PES-ES section supports the user community of these tools • Tools used for: structural analysis, fluid dynamics, electromagnetics, multiphysics • Major commercial tools: Ansys, Fluent, HFSS, Comsol, CST • … but also open source: OpenFOAM (fluid dynamics) • Physics simulation applications: • PH-TH Lattice QCD simulations • BE LINAC4 plasma simulations • BE beam simulations (CLIC, LHC, etc.) • HEP simulation applications for theory and accelerator physics

  7. Agenda • Introduction – Where we are now • Applications used at CERN requiring HPC infrastructure • Use cases (Engineering) • Ansys Mechanical • Ansys Fluent • Physics HPC applications • Next steps • Q&A

  8. Use case 1: Ansys Mechanical • Where? • LINAC4 Beam Dump System • Who? • Ivo Vicente Leitao, Mechanical Engineer (EN/STI/TCD) • How? • Ansys Mechanical for design modeling and simulations (stress and thermal structural analysis)

  9. How does it work? • Ansys Mechanical • Structural analysis: stress and thermal, steady-state and transient • Finite Element Method: • We have a physical problem defined by differential equations • It is impossible to solve it analytically for a complicated structure (problem) • We divide the problem into subdomains (elements) • We solve the differential equations numerically for selected points (nodes) • Then, by means of approximation functions, we project the solution onto the global structure (see the sketch below) • The example has 6.0 million (6M0) mesh nodes • Compute intensive • Memory intensive Use case 1: Ansys Mechanical
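To make the finite-element recipe above concrete, here is a minimal 1D sketch (purely illustrative and unrelated to the Ansys solvers; the problem, mesh size, and variable names are my own choices) that assembles and solves -u'' = 1 on [0, 1] with linear elements:

```python
# Minimal 1D finite-element sketch: solve -u''(x) = 1 on [0, 1], u(0) = u(1) = 0,
# with linear elements. Purely illustrative; not related to the Ansys solvers.
import numpy as np

n_elem = 8                                   # number of elements (subdomains)
nodes = np.linspace(0.0, 1.0, n_elem + 1)    # selected points (nodes)
h = nodes[1] - nodes[0]                      # uniform element size

# Assemble the global stiffness matrix K and load vector f from element contributions
K = np.zeros((n_elem + 1, n_elem + 1))
f = np.zeros(n_elem + 1)
k_local = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])   # element stiffness
f_local = (h / 2.0) * np.array([1.0, 1.0])                   # element load for f = 1
for e in range(n_elem):
    idx = [e, e + 1]
    K[np.ix_(idx, idx)] += k_local
    f[idx] += f_local

# Apply the boundary conditions u(0) = u(1) = 0 and solve for the interior nodes
u = np.zeros(n_elem + 1)
u[1:-1] = np.linalg.solve(K[1:-1, 1:-1], f[1:-1])

# Compare against the exact solution u(x) = x(1 - x)/2
print(np.max(np.abs(u - nodes * (1.0 - nodes) / 2.0)))
```

In a real 3D structural or thermal analysis the same assemble-and-solve pattern holds, but with millions of nodes the matrix solve becomes the compute- and memory-intensive part mentioned above.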

  10. Use case 1: Ansys Mechanical

  11. Simulation results • Measurement hardware configuration: • 2x HP 580 G7 servers (4x E7-8837, 512 GB RAM, 32 cores), 10 Gb low-latency Ethernet link • Time to obtain a single-cycle 6M0 solution: • 8 cores -> 63 hours to finish the simulation, 60 GB RAM used during the simulation • 64 cores -> 17 hours to finish the simulation, 2x 200 GB RAM used during the simulation • The user is interested in 50 cycles: this would need 130 days on 8 cores, or 31 days on 64 cores (see the extrapolation below) • It is impossible to get simulation results for this case in a reasonable time on a standard engineering workstation Use case 1: Ansys Mechanical
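A quick back-of-the-envelope extrapolation from the single-cycle timings to the 50-cycle campaign (a sketch that assumes the cycles run back to back with no overlap or restart overhead, which is why it only roughly reproduces the day counts quoted on the slide):

```python
# Extrapolate single-cycle wall-clock time to a 50-cycle campaign.
# Assumes cycles run strictly back to back; purely a sanity check.
def campaign_days(hours_per_cycle: float, n_cycles: int = 50) -> float:
    return hours_per_cycle * n_cycles / 24.0

print(f"8 cores:  {campaign_days(63):.0f} days")   # ~131 days (slide quotes ~130)
print(f"64 cores: {campaign_days(17):.0f} days")   # ~35 days (slide quotes ~31)
```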

  12. Challenges • Why do we care? • Every day we face users asking us how to speed up some engineering application • Challenges: • Problem size and complexity are challenging user workstations in terms of computing power, memory size, and file I/O • This can be extrapolated to other engineering HPC applications • How to solve the problem? • Can we use the current infrastructure to provide a platform for these demanding applications? • … or do we need something completely new? • … and if something new, how could it fit into our IT infrastructure? • So, let's have a look at what is happening behind the scenes

  13. Analysis tools • Standard Linux performance monitoring tools used: • Memory usage: sar • Memory bandwidth: Intel PCM (Performance Counter Monitor, open source) • CPU usage: iostat, dstat • Disk I/O: dstat • Network traffic monitoring: netstat • Monitoring scripts are started from the same node where the simulation job is started; collection of the measurement results is done automatically by our tools (a sketch of such a wrapper is shown below).
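As an illustration only (this is not the authors' actual tooling; the command lines, sampling interval, and file names are my own assumptions), a minimal wrapper that starts dstat and sar next to a job and stops them when the job finishes could look like this:

```python
# Minimal monitoring-wrapper sketch: run a job while dstat and sar record
# CPU, disk, network and memory statistics to files. Hypothetical example.
import subprocess
import sys

def run_with_monitoring(job_cmd, out_prefix="hpcmon"):
    monitors = [
        # dstat: CPU, disk, net and memory every 5 s, written to a CSV file
        ["dstat", "--cpu", "--disk", "--net", "--mem",
         "--output", f"{out_prefix}_dstat.csv", "5"],
        # sar: memory usage every 5 s, recorded in binary form for later analysis
        ["sar", "-r", "-o", f"{out_prefix}_sar.bin", "5"],
    ]
    procs = [subprocess.Popen(m, stdout=subprocess.DEVNULL) for m in monitors]
    try:
        return subprocess.call(job_cmd)    # run the simulation job itself
    finally:
        for p in procs:                    # stop the monitors when the job ends
            p.terminate()

if __name__ == "__main__":
    sys.exit(run_with_monitoring(sys.argv[1:]))
```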

  14. Multi-core scalability • Measurement info: • LINAC4 beam dump system, single-cycle simulation • 64c@1TB: 2 nodes of (quad-socket Westmere, E7-8837, 512 GB), 10 Gb iWARP • Results: • The Ansys Mechanical simulation scales well beyond a single multi-core box • Greatly improved number of jobs/week, i.e. simulation cycles/week (see the speedup figures below) • Next steps: scale to more than two nodes and measure the impact of MPI • Conclusion: • Multi-core platforms are needed to finish the simulation in a reasonable time Use case 1: Ansys Mechanical
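For reference, the speedup and parallel efficiency implied by the single-cycle timings quoted earlier (63 h on 8 cores, 17 h on 64 cores) can be worked out as follows; taking the 8-core run as the baseline is my own choice:

```python
# Speedup and parallel efficiency relative to the 8-core run.
timings = {8: 63.0, 64: 17.0}              # cores -> wall-clock hours (from the slides)
base_cores, base_hours = 8, timings[8]

for cores, hours in sorted(timings.items()):
    speedup = base_hours / hours                    # relative to the 8-core baseline
    efficiency = speedup / (cores / base_cores)     # fraction of ideal scaling
    print(f"{cores:3d} cores: speedup x{speedup:.1f}, efficiency {efficiency:.0%}")
# 64 cores: speedup x3.7 over 8 cores, i.e. roughly 46% of the ideal 8x
```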

  15. Memory requirements • In-core/out-of-core simulations (avoiding costly file I/O): • In-core = most of the temporary data is kept in RAM (the solver can still write to disk during the simulation) • Out-of-core = files on the file system are used to store temporary data • The preferred mode is in-core, to avoid costly disk I/O accesses, but this requires more RAM and more memory bandwidth • Ansys Mechanical (and some other engineering applications) has limited scalability • It depends heavily on the solver and on the user problem • All commercial engineering applications use some licensing scheme, which can skew the choice of platform • Conclusion: • We are investigating whether we can spread the required memory over multiple dual-socket systems, or whether 4-socket systems are necessary for some HPC applications (a toy sizing example is given below) • Certain engineering simulations seem to be limited by memory bandwidth; this also has to be considered when choosing a platform Use case 1: Ansys Mechanical
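A toy sizing calculation of how a given in-core workspace could be spread over nodes of different memory sizes (the 400 GB workspace corresponds to the 2x 200 GB figure from the 64-core run above; the 15% margin for OS and MPI buffers is my own assumption):

```python
# Toy sizing: how many nodes of a given RAM size are needed to keep a solver
# workspace fully in-core? The 15% margin for OS/MPI buffers is an assumption.
import math

def nodes_needed(workspace_gb: float, node_ram_gb: float, margin: float = 0.15) -> int:
    usable = node_ram_gb * (1.0 - margin)
    return math.ceil(workspace_gb / usable)

# ~400 GB total workspace (2 x 200 GB, as in the 64-core run above)
for ram in (128, 256, 512):       # dual-socket 128/256 GB vs. quad-socket 512 GB
    print(f"{ram} GB nodes: {nodes_needed(400, ram)} node(s) to stay in-core")
```

Whether the solver tolerates being spread over that many nodes is exactly the scalability question raised above, so this only bounds the hardware side of the problem.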

  16. Disk I/O impact • Ansys Mechanical • BE CLIC test system • Two Supermicro servers (dual E5-2650, 128 GB), 10 Gb iWARP back to back • Disk I/O impact on speedup; two configurations compared • Measured with sar and iostat • The application spends a lot of time in iowait (see the sketch below) • Using SSDs instead of HDDs increases jobs/week by almost 100% • Conclusion: • We need to investigate more cases to see whether this is a marginal case or something more common Use case 1: Ansys Mechanical
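For completeness, iowait is the quantity that sar and iostat report; a minimal sketch of how it can be derived directly from /proc/stat over an interval (the 5 s interval is an arbitrary choice of mine):

```python
# Measure the system-wide iowait percentage over a short interval by sampling
# /proc/stat twice. This is the same quantity reported by sar and iostat.
import time

def cpu_times():
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:]]   # aggregate "cpu" line

def iowait_percent(interval_s: float = 5.0) -> float:
    before = cpu_times()
    time.sleep(interval_s)
    after = cpu_times()
    delta = [b - a for a, b in zip(before, after)]
    return 100.0 * delta[4] / sum(delta)    # 5th field of the cpu line is iowait

if __name__ == "__main__":
    print(f"iowait over 5 s: {iowait_percent():.1f}%")
```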

  17. Agenda • Introduction – Where we are now • Applications used at CERN requiring HPC infrastructure • Use cases (Engineering) • Ansys Mechanical • Ansys Fluent • Physics HPC applications • Next steps • Q&A

  18. Use case 2: Fluent CFD • Computational Fluid Dynamics (CFD) application, Fluent (now provided by Ansys) • Beam dump system at the PS Booster • Heat is generated inside the dump and it needs to be cooled to avoid melting or breaking due to mechanical stresses • Extensively parallelized MPI-based software • Performance characteristics similar to other MPI-based software: • Importance of low latency for short messages • Importance of bandwidth for medium and large messages (illustrated by the ping-pong sketch below)
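The latency-versus-bandwidth distinction can be made concrete with the classic MPI ping-pong micro-benchmark; here is a minimal mpi4py sketch (message sizes and repetition count are arbitrary choices of mine, and this is not the benchmark used for the measurements in this talk):

```python
# MPI ping-pong between rank 0 and rank 1: small messages expose latency,
# large messages expose bandwidth. Run with: mpirun -n 2 python pingpong.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
reps = 100

for size in (8, 1024, 1024 * 1024):             # message sizes in bytes
    buf = np.zeros(size, dtype=np.uint8)
    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(reps):
        if rank == 0:
            comm.Send(buf, dest=1)
            comm.Recv(buf, source=1)
        elif rank == 1:
            comm.Recv(buf, source=0)
            comm.Send(buf, dest=0)
    t1 = MPI.Wtime()
    if rank == 0:
        one_way = (t1 - t0) / (2 * reps)        # one-way time per message
        print(f"{size:>8} B: {one_way * 1e6:8.1f} us/msg, "
              f"{size / one_way / 1e6:8.1f} MB/s")
```

For 8-byte messages the reported time is essentially the interconnect latency; for megabyte messages the MB/s figure approaches the link bandwidth, which is the distinction made in the two bullets above.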

  19. Interconnect network latency impact • Ansys Fluent • CFD "heavy" test case from the CFD group (EN-CV-PJ) • 64c@1TB: 2 nodes of (quad-socket Westmere, E7-8837, 512 GB), 10 Gb iWARP • Speedup beyond a single node can be diminished by a high-latency interconnect • The graph shows good scalability beyond a single box with 10 Gb low latency, and dips in performance when switching to 1 Gb for node-to-node MPI • Next step: perform MPI statistical analysis (size and type of messages, computation vs. communication) Use case 2: Fluent

  20. Memory bandwidth impact • Ansys Fluent: • Measured with Intel PCM • Supermicro Sandy Bridge server (dual E5-2650), 102.5 GB/s peak memory bandwidth • Observed peaks of a "few" seconds demanding 57 GB/s, with a 5 s sampling period; this is very close to the numbers measured with the STREAM synthetic benchmark on this platform (a rough STREAM-style sketch is given below) • Memory bandwidth measured with Intel PCM at the memory-controller level • Next step: check the impact of memory speed on solution time Use case 2: Fluent
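For context, a rough STREAM-style bandwidth estimate can be obtained even from Python with large numpy arrays; below is a sketch of the STREAM "scale" kernel (the array size and the read-plus-write traffic counting are my choices, and a compiled, multi-threaded STREAM run will report higher numbers):

```python
# Rough STREAM "scale" kernel (a = scalar * b) with numpy. Counts 2 arrays of
# 8-byte doubles of traffic per element; a compiled, OpenMP-parallel STREAM
# binary will typically achieve more than this single-threaded estimate.
import time
import numpy as np

n = 50_000_000                       # ~400 MB per array, far beyond the caches
b = np.random.rand(n)
a = np.empty_like(b)
scalar = 3.0

best = float("inf")
for _ in range(5):                   # keep the best of a few repetitions
    t0 = time.perf_counter()
    np.multiply(b, scalar, out=a)    # scale: a = scalar * b
    best = min(best, time.perf_counter() - t0)

gbytes = 2 * n * 8 / 1e9             # read b, write a
print(f"scale bandwidth ~ {gbytes / best:.1f} GB/s")
```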

  21. Analysis done so far • We have invested our time in building a first generation of tools to monitor different system parameters: • Multi-core scalability (Ansys Mechanical) • Memory size requirements (Ansys Mechanical) • Memory bandwidth requirements (Fluent) • Interconnect network (Fluent) • File I/O (Ansys Mechanical) • Redo some parts: • Westmere 4 sockets -> Sandy Bridge 4 sockets • Next steps: • Start performing detailed interconnect monitoring using MPI tracing tools (Intel Trace Analyzer and Collector)

  22. Agenda • Introduction – Where we are now • Applications used at CERN requiring HPC infrastructure • Use cases (Engineering) • Ansys Mechanical • Ansys Fluent • Physics HPC applications • Next steps • Q&A

  23. Physics HPC applications • PH-TH: • Lattice QCD simulations • BE LINAC4 plasma simulations: • Plasma formation in the Linac4 ion source • BE CLIC simulations: • Preservation of the luminosity over time under the effects of dynamic imperfections, such as vibrations, ground motion, and failures of accelerator components

  24. Lattice QCD • MPI-based application with inline assembly in the most time-critical parts of the program • Main objective is to investigate: • The impact of memory bandwidth on performance • The impact of the interconnection network on performance (comparison of 10 Gb iWARP and InfiniBand QDR); a sketch of the typical nearest-neighbor communication pattern follows
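As an illustration of why the interconnect matters for such codes: lattice QCD is dominated by nearest-neighbor (halo) exchanges on a regular grid. A minimal 1D halo-exchange sketch in mpi4py is shown below; it is not taken from the PH-TH code, and the slab size and layout are my own choices:

```python
# Minimal 1D halo-exchange sketch: each rank owns a slab of the lattice and
# exchanges one boundary layer with its left/right neighbors per iteration.
# Illustrative only; run with e.g. mpirun -n 4 python halo.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size     # periodic neighbors

local_n = 1024
field = np.full(local_n + 2, float(rank))              # +2 ghost cells at the ends

# Send the rightmost real cell to the right neighbor, receive the left ghost cell
comm.Sendrecv(sendbuf=field[local_n:local_n + 1], dest=right,
              recvbuf=field[0:1], source=left)
# Send the leftmost real cell to the left neighbor, receive the right ghost cell
comm.Sendrecv(sendbuf=field[1:2], dest=left,
              recvbuf=field[local_n + 1:local_n + 2], source=right)

print(f"rank {rank}: ghost cells = ({field[0]:.0f}, {field[-1]:.0f})")
```

The many small boundary messages are latency-sensitive, while the volume of boundary data per step sets the bandwidth requirement, which is why the 10 Gb iWARP vs. InfiniBand QDR comparison is of interest.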

  25. BE LINAC4 plasma studies • MPI-based application • Users are requesting a system with 250 GB of RAM for 48 cores • Main objective is to investigate: • The scalability of the application beyond 48 cores, in order to spread the memory requirement over more cores than 48 (see the sketch below)
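A quick arithmetic sketch of what spreading the requested memory over more cores means per node, assuming dual-socket 16-core nodes like the Sandy Bridge machines on the next slide and a total footprint that stays roughly constant (both are assumptions of mine):

```python
# Per-node memory footprint if the ~250 GB total requirement is spread over
# dual-socket, 16-core nodes; assumes the total footprint stays roughly constant.
import math

total_gb, cores_per_node = 250.0, 16

for total_cores in (48, 64, 96, 128):
    nodes = math.ceil(total_cores / cores_per_node)
    print(f"{total_cores:3d} cores on {nodes} nodes -> "
          f"~{total_gb / nodes:.0f} GB per node, "
          f"~{total_gb / total_cores:.1f} GB per core")
```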

  26. Clusters • To better understand the requirements of CERN physics HPC applications, two clusters have been prepared • Investigate scalability • Investigate the importance of the interconnect, memory bandwidth, and file I/O • Test configuration: • 20x Sandy Bridge dual-socket nodes with a 10 Gb iWARP low-latency link • 16x Sandy Bridge dual-socket nodes with Quad Data Rate (40 Gb/s) InfiniBand

  27. Agenda • Introduction – Where we are now • Applications used at CERN requiring HPC infrastructure • Use cases (Engineering) • Ansys Mechanical • Ansys Fluent • Physics HPC applications • Next steps • Q&A

  28. Next steps • An activity has started to better understand the requirements of CERN HPC applications • The standard Linux performance monitoring tools give us a very detailed overview of system behavior for different applications • Next steps are to: • Refine our approach and our scripts to work at a larger scale (first target: 20 nodes) • Gain more knowledge about the impact of the interconnection network on MPI jobs

  29. Thank you Q&A
