
Integrated Performance Analysis in the Uintah Computational Framework

Steven G. Parker


Presentation Transcript


  1. Integrated Performance Analysis in the Uintah Computational Framework Steven G. Parker Allen Morris, Scott Bardenhagen, Biswajit Banerjee, James Bigler, Jared Campbell, Curtis Larsen, Dav De St. Germain, Dawen Li, Divya Ramachandran, David Walter, Jim Guilkey, Todd Harman, John Schmidt, Jesse Hall, Jun Wang, Kurt Zimmerman, John McCorquodale, Misha Ovchinnikov, Jason Morgan, Nick Benson, Phil Sutton, Rajesh Rawat, Scott Morris, Seshadri Kumar, Steven Parker, Jennifer Spinti, Honglai Tan, Wing Yee, Wayne Witzel, Xiaodong Chen, Runing Zhang

  2. The Beginning • C-SAFE funded in September 1997 • SCIRun PSE existed: • Shared memory only • Combustion code existed: • NOT parallel • Steady state, NOT transient • C-SAFE MPM code did not exist • C-SAFE ICE code did not exist

  3. ASCI

  4. Example situation

  5. C-SAFEGoal

  6. Now: Scalable Simulations • September 2001 • SCIRun Uintah: • Distributed memory, CCA-based component model • Shared-memory visualization tools • Arches: • Modular, parallel, transient • C-SAFE MPM: • Modular, parallel, transient • C-SAFE ICE: • Modular, parallel, transient • Coupled with MPM

  7. C-SAFE High Level Architecture [Architecture diagram: PSE components (Simulation Controller, Scheduler, Data Manager, Resource Management, Parallel Services, UCF Database, Checkpointing) implicitly connected to all components; simulation components (Fluid Model, MPM, Mixing Model, Subgrid Model, Numerical Solvers, Chemistry Database Controller, Chemistry Databases, Material Properties Database, High Energy Simulations) driven from the Problem Specification; non-PSE components (Visualization, Performance Analysis, Post Processing and Analysis, Blazer) connected by control / light data paths] How did we get here? • Designed and implemented a parallel component architecture (Uintah) • Designed and implemented the Uintah Computational Framework (UCF) on top of the component architecture

  8. Introduction to Components

  9. Introduction to Components • Good Fences make Good Neighbors • A component architecture is all about building (and sometimes enforcing) the fences • Popular in the software industry (Microsoft COM, CORBA, Enterprise Java Beans) • Commercial component architectures not suitable for Scientific Computing (CCA Forum organized to address this point) • Visual programming sometimes used to connect components together

  10. Parallel Components [Diagram: Simulation Controller, Fluid Model, MPM, Data Manager] • Two ways to split up work • Task based • Data based • (Or a combination) • Which is right? • Key point: components, by definition, make local decisions; however, scalable parallelism is a global decision

  11. Uintah Scalability Challenges Wide range of computational loads, due to: • AMR • Particles in subset of space • Cost of ODE solvers can vary spatially • Radiation models • Architectural communication limitations

  12. UCF Architecture Overview • Application programmers provide: • A description of the computation (tasks and variables) • Code to perform each task on a single Patch (subregion of space) • C++ or Fortran supported • UCF uses this information to create a scalable parallel simulation
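
To make that division of labor concrete, here is a minimal, self-contained C++ sketch of the task-description idea: the application names a task, declares the variables it reads and writes, and supplies a per-patch kernel; the framework decides where and when each (task, patch) pair runs. All names here (Task, Patch, the variable labels) are illustrative stand-ins, not the actual UCF API.

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical names throughout -- this is NOT the actual UCF API.
struct Patch { int id; };  // a Patch is a subregion of space

struct Task {
  std::string name;
  std::vector<std::string> reads;            // variables this task requires
  std::vector<std::string> writes;           // variables this task computes
  std::function<void(const Patch&)> kernel;  // per-patch code
};

int main() {
  // The application describes WHAT to do; the framework decides
  // WHERE and WHEN each (task, patch) pair runs.
  Task interpolate{
      "interpolateParticlesToGrid",
      {"particle.mass", "particle.velocity"},
      {"grid.velocity"},
      [](const Patch& p) {
        std::cout << "task body running on patch " << p.id << "\n";
      }};

  // A stand-in for the framework's scheduler: run the task on each
  // patch this process owns.
  for (const Patch& p : {Patch{0}, Patch{1}}) interpolate.kernel(p);
}
```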

  13. How Does It Work? [Diagram: the Problem Specification (XML) feeds the Simulation Controller, which configures a Simulation component (one of Arches, ICE, MPM, MPMICE, MPMArches, …); the Simulation and the Data Archiver describe their Tasks to the Scheduler via callbacks; the Load Balancer supplies assignments; MPI underlies the communication]

  14. How does the scheduler work? • Scheduler component uses description of computation to create a taskgraph • Taskgraph gets mapped to processing resources using the Load Balancer component

  15. What is a graph?

  16. CS Graphs [Diagram: vertices (nodes) A, B, C, D connected by edges] Taskgraph: a graph where the nodes are tasks (jobs) to be performed, and the edges are dependencies between those tasks
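
The executable consequence of this definition: a task may run as soon as every task it depends on has finished. A minimal C++ sketch using Kahn's topological-sort algorithm, with the edges A→B, A→C, B→D, C→D assumed purely for illustration:

```cpp
#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

int main() {
  // Edges point from a task to the tasks that depend on its output.
  std::map<std::string, std::vector<std::string>> out = {
      {"A", {"B", "C"}}, {"B", {"D"}}, {"C", {"D"}}, {"D", {}}};

  // Count unmet dependencies for every task.
  std::map<std::string, int> indegree;
  for (const auto& [task, deps] : out) {
    indegree[task];  // ensure every task has an entry
    for (const auto& d : deps) ++indegree[d];
  }

  // Kahn's algorithm: repeatedly execute any task whose inputs are ready.
  std::queue<std::string> ready;
  for (const auto& [task, n] : indegree)
    if (n == 0) ready.push(task);

  while (!ready.empty()) {
    std::string t = ready.front(); ready.pop();
    std::cout << "execute " << t << "\n";
    for (const auto& d : out[t])
      if (--indegree[d] == 0) ready.push(d);
  }
}
```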

  17. Example Taskgraphs

  18. Example Taskgraphs

  19. Example Taskgraphs

  20. Taskgraph advantages • Can accommodate flexible integration needs • Can accommodate a wide range of unforeseen work loads • Can accommodate a mix of static and dynamic load balance • Helps manage complexity of a mixed threads/MPI programming model • Allows pieces (including the scheduler) to evolve independently

  21. Looking forward to AMR • Entire UCF infrastructure is designed around complex meshes • Able to achieve scalability like a structured grid code • Some codes can currently handle irregular boundaries

  22. Achieving scalability • Parallel Taskgraph implementation • Use 125 (of 128) processors per box • Remaining 3 perform O/S functions • 125 processors organized into 5x5x5 cube • Multiple boxes by abutting cubes • Nirvana load balancer performs this mapping for regular grid problems
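
A sketch of the geometric mapping described above, assuming boxes are abutted along x and ranks are numbered consecutively within each 5x5x5 cube (an illustrative layout; the actual Nirvana mapping may differ):

```cpp
#include <iostream>

// Map a patch's integer grid coordinate to a worker rank, with 125
// workers per box arranged as a 5x5x5 cube and boxes abutted along x.
int rankForPatch(int x, int y, int z) {
  const int side = 5, perBox = side * side * side;  // 125 workers per box
  int box = x / side;                               // which box along x
  return box * perBox + (z % side) * side * side + (y % side) * side
         + (x % side);
}

int main() {
  std::cout << rankForPatch(0, 0, 0) << "\n";  // 0:   first worker, box 0
  std::cout << rankForPatch(4, 4, 4) << "\n";  // 124: last worker, box 0
  std::cout << rankForPatch(5, 0, 0) << "\n";  // 125: first worker, box 1
}
```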

  23. Performance Analysis Tools • Integrated Tools • TAU calls describe costs for each Task • Post-processing tools for: • Average/Standard Deviation Timings • Critical path/Near-critical path analysis • Performance regression testing • Load imbalance • TAU/VAMPIR Analysis

  24. Tuning and Analysis Utilities (TAU) • Integration of TAU from Oregon • Working with Allen Malony and friends to help with the integration • Identified bottlenecks that influenced the design of the new scalable scheduler • Identified numerous ways to collaborate in the future
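
For concreteness, a minimal sketch of what per-task instrumentation looks like with TAU's standard profiling macros; the task name is hypothetical, and a TAU installation is assumed (compile with TAU's compiler wrapper, e.g. tau_cxx.sh):

```cpp
#include <TAU.h>

void interpolateParticlesToGrid() {
  // One named timer per task; TAU then reports cost per Task, which
  // feeds the average/std-dev and critical-path tools listed above.
  TAU_PROFILE("interpolateParticlesToGrid", "void ()", TAU_USER);
  // ... task body ...
}

int main(int argc, char** argv) {
  TAU_PROFILE_INIT(argc, argv);
  TAU_PROFILE_SET_NODE(0);  // single-process example
  interpolateParticlesToGrid();
  return 0;
}
```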

  25. MPM Simulation 27 processors

  26. Arches Simulation 40 of 125 processors

  27. XPARE • Performance tuning is typically done only for final products • Or sometimes just once or twice during development • XPARE instead supports performance analysis throughout the development process • Retrospective analysis possible • Understanding impact of design decisions • More informed optimization later

  28. XPARE • Regression analyzer: alerts interested parties to violations of the thresholds • Comparison tool: used by the automation system to report violations; can also be run manually • Integrated into the weekly testing harness for Uintah / C-SAFE • Performance comparisons: • Compiler flags • O/S upgrades • Platforms
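
The comparison at the heart of the analyzer can be illustrated with a short sketch: compare each task's timing against a stored baseline and alert when a threshold is exceeded. The numbers and the 10% threshold are made up; the real XPARE logic is not reproduced here.

```cpp
#include <iostream>
#include <map>
#include <string>

int main() {
  // Baseline and current per-task timings in seconds (made-up numbers).
  std::map<std::string, double> baseline = {{"scheduler", 12.0}, {"mpm", 48.5}};
  std::map<std::string, double> current  = {{"scheduler", 15.1}, {"mpm", 48.9}};
  const double threshold = 0.10;  // alert on a >10% slowdown

  for (const auto& [task, base] : baseline) {
    double delta = (current[task] - base) / base;
    if (delta > threshold)
      std::cout << "ALERT: " << task << " slowed by "
                << 100.0 * delta << "%\n";
  }
}
```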

  29. XPARE • Alan Morris – Utah • Allen D. Malony - Oregon • Sameer S. Shende - Oregon • J. Davison de St. Germain - Utah • Steven G. Parker - Utah • XPARE - eXPeriment Alerting and REporting • http://www.acl.lanl.gov/tau/xpare

  30. Load balancing • Taskgraph provides a nice mechanism for flexible load-balancing algorithms • To date, simple static mechanisms have sufficed • But we are outgrowing them
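
As an example of the kind of simple static mechanism meant here: estimate a cost per patch, then greedily hand each patch to the least-loaded rank before the run starts. The costs and the greedy rule below are hypothetical, not the UCF's actual policy.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
  // Estimated cost per patch (hypothetical numbers) and two ranks.
  std::vector<double> patchCost = {4.0, 1.0, 3.0, 2.0, 2.0};
  std::vector<double> load(2, 0.0);

  // Greedy static assignment: each patch goes to the currently
  // least-loaded rank; computed once, before the run starts.
  for (std::size_t p = 0; p < patchCost.size(); ++p) {
    auto r = std::min_element(load.begin(), load.end()) - load.begin();
    load[r] += patchCost[p];
    std::cout << "patch " << p << " -> rank " << r << "\n";
  }
}
```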

  31. Real-world scalability • Parallel I/O • Parallel compiles • Production run obtained speedup of 1.95 going from 500 to 1000 processors

  32. New scalability - MPM

  33. Breakdown

  34. Mixed MPI/Thread scheduler • Most ASCI platforms have SMP nodes • Multi-threading and asynchronous MPI could give us a ~2x speed improvement • The SGI MPI implementation is supposedly thread-safe, but…
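
That caveat is why a mixed-mode scheduler must probe the MPI library at startup; a minimal sketch using the standard MPI-2 initialization call (the fallback message is illustrative):

```cpp
#include <cstdio>
#include <mpi.h>

int main(int argc, char** argv) {
  int provided = 0;
  // Ask for full multi-thread support and see what the library actually
  // grants -- the portable way to settle the "supposedly thread-safe"
  // question at run time.
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  if (provided < MPI_THREAD_MULTIPLE)
    std::printf("MPI provides level %d only; fall back to a "
                "single communicating thread\n", provided);
  MPI_Finalize();
  return 0;
}
```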

  35. Network traffic into Utah Visual Supercomputing Center 2 hour average 1 day average

  36. Volume Rendering

  37. Volume Rendering

  38. MPM Simulation - 500 processors 6.8 million particles, 22 timesteps interactively visualized using the real-time ray tracer (6-10 fps)

  39. RTRT with MPM Data

  40. Other SCIRun Applications

  41. Geo Sciences

  42. Conclusions • Holistic performance approach • Architecture • Tools • Scalability achieved, now we can keep it
