This document discusses the interplay between open source tools and parallel computing, addressing the philosophy behind promoting parallel software development. It examines the potential of MPI2, user needs, runtime tools, and debugging challenges. The importance of collaboration with academia and commercial partners is emphasized alongside the need for standard interfaces in parallel environments to facilitate development and support. While open source offers promising solutions, it also presents significant hurdles that need to be addressed for effective implementation in complex systems.
OS, MESSAGE PASSING & RUNTIME TOOLS
• Parallel software promotion philosophy
• OpenSource - How rosy is the promise?
• MPI2 - What features?
• RTS - How far parallel do we need to go?
• How might OpenSource accelerate new tools?
Panel Comments by Mary Zosel, ASCI PSE / ASDE, LLNL
For Fourth Workshop on Distributed Supercomputers
Philosophy - for promoting a parallel simulation development environment
• Standards - promote and encourage use
• Set high platform software expectations
• Software in procurements
• ISV support gives portability and 2nd source
• Keep academia involved
• Need their ideas & need their students
• Local prototypes where needed
• Preferably partnership with commercial partner
• Full local support only as last resort
• It’s fun when new - but costly burden later
So where in this picture does OpenSource fit? It facilitates academia and prototyping, but the support issue is a concern.
UCRL-VG-137868
OpenSource - Does it measure up to the promise?
Disclaimer: I haven’t been actively involved in this area, but at second look, it isn’t as promising as it first seems.
There is a lot of good and successful OpenSource software. But there are also red flags …
• One promising OpenSource tool we picked up relied so heavily on platform-specific “.h” files that we couldn’t make it build anywhere else.
• Another OpenSource promise for a key library we were counting on evaporated.
• The lawyers are still there - and source release isn’t easy.
• All the usual GNU software restriction issues …
• Software-police issues will be interesting …
MPI2 - What do the users need?
• MPI-I/O
• Thread safety … actually need more support than MPI2 gives us
• Dynamic process control - starting to get queries about this
• Language bindings
• They say they want one-sided
• Various “abstraction” features (info, error…)
Runtime tools for 1000s of CPUs
• Yes - the users are asking for debugger support.
• My code seems to be hung - what’s it doing?
• My code is growing after a couple of hours - why?
• Where is all my memory going and why?
• (Similar set of questions for performance issues.)
• Easy to provide? No …
• Tool infrastructure needs to be designed for scalability
• Obvious GUI and data presentation issues
• User debug time ties up resources - another challenge
• Access to resources for development - even “on-site”
• But there are solutions in the works … e.g.
• Variety of collapsing and filtering of data
• Macros together with good CLI look promising
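One way to picture "collapsing and filtering" for a hung job is to group tasks by where they are stopped and show the small groups first. The following is a minimal sketch of that idea, not the tool shown in these slides; the function names and the location labels in the usage note are hypothetical, and it assumes each task can report a single location string.

```python
from collections import defaultdict


def compress_ranks(ranks):
    """Render ranks as sorted ranges, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    ranks = sorted(ranks)
    parts = []
    start = prev = ranks[0]
    for r in ranks[1:]:
        if r == prev + 1:
            prev = r
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = r
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def summarize(task_locations, max_count=None):
    """Collapse {rank: location} into (count, location, ranks) rows.

    Rows are sorted with the smallest groups first, since the odd tasks
    out are usually the interesting ones. If max_count is given, groups
    larger than that are filtered out.
    """
    groups = defaultdict(list)
    for rank, location in task_locations.items():
        groups[location].append(rank)
    rows = [
        (len(ranks), location, compress_ranks(ranks))
        for location, ranks in groups.items()
        if max_count is None or len(ranks) <= max_count
    ]
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows
```

For example, with 1000 tasks of which 998 report "MPI_Allreduce" and two report other (made-up) routine names, `summarize(locations, max_count=10)` returns just the two outlier rows instead of 1000 lines of per-task state.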
[Screenshot] Original struct array
[Screenshot] Just the values of the “val1” struct member
[Screenshot] Sorted array values
[Screenshot] Checksum of same array
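The three reductions shown in these screenshots (one struct member, sorted values, a checksum) can be sketched as follows. This is an illustration only: the slides do not say which checksum the debugger uses, so CRC32 over little-endian doubles is an assumption, and the record layout and function names are hypothetical.

```python
import struct
import zlib


def member_values(records, name):
    """Pull one member out of an array of struct-like records."""
    return [record[name] for record in records]


def checksum(values):
    """CRC32 of the values packed as little-endian doubles (assumed scheme)."""
    packed = struct.pack(f"<{len(values)}d", *values)
    return zlib.crc32(packed)


# Hypothetical array of structs with members "val1" and "val2".
records = [
    {"val1": 3.0, "val2": 10},
    {"val1": 1.0, "val2": 20},
    {"val1": 2.0, "val2": 30},
]

val1 = member_values(records, "val1")   # just the "val1" member
val1_sorted = sorted(val1)              # sorted array values
val1_checksum = checksum(val1)          # one number standing in for the array
```

A single checksum per task is cheap to compare across thousands of tasks, which is one reason a reduction like this is attractive when the full arrays are too large to look at.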
[Screenshot] LCB view of task and thread state can be dumped any time the application is stopped. The color code tells how many processes are where.
[Screenshot] Root window collapsed
[Screenshot] Same Root window opened to show all tasks
[Screenshot] Can set any of the counters to any of its settings. Values are shown by thread.
Menu actions: Set Counter 1, Set Counter 2, Set Counter 3, Set Counter 4, Activate Counters, Stop Counters, Update Counters, Zero Counters, Close Window, Close All Similar Windows, Save Window to File..., Reexecute Last Save Window, Help
Counter settings: Nothing, MFLOPS, % branch mispredictions, L2 Data cache miss rate, CPU Cycles, Instructions Completed, Instruction Cache Misses, Integer Instructions Completed, Floating Instructions Completed, dtlb misses (not speculative), Branch Mispredictions, Time Base bit transition, Reservations requested
[Screenshot] Info about which task is using max and min memory.
[Screenshot] Memory info about all the tasks. Can watch how (and which) tasks grow.
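The two questions these views answer (which task uses the most and least memory, and which tasks are growing) reduce to small computations over per-task samples. A minimal sketch, assuming memory usage is available as plain `{rank: bytes}` snapshots; the function names are hypothetical and this is not the tool pictured.

```python
def memory_extremes(usage):
    """Return ((min_rank, min_bytes), (max_rank, max_bytes)) for {rank: bytes}."""
    lo = min(usage, key=usage.get)
    hi = max(usage, key=usage.get)
    return (lo, usage[lo]), (hi, usage[hi])


def growing_tasks(snapshots, threshold):
    """Ranks whose memory grew by more than threshold bytes.

    snapshots is a time-ordered list of {rank: bytes} dicts. Growth is
    measured between the first and last snapshot, and the result is
    ordered with the largest growth first.
    """
    first, last = snapshots[0], snapshots[-1]
    growth = {rank: last[rank] - first[rank] for rank in first if rank in last}
    flagged = [rank for rank, delta in growth.items() if delta > threshold]
    return sorted(flagged, key=lambda rank: growth[rank], reverse=True)
```

Given three snapshots in which only rank 1 climbs steadily, `growing_tasks(snapshots, threshold)` singles out rank 1, which is the "which tasks grow" question answered without scanning every task by hand.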
Will OpenSource help RTS tools, and how?
The biggest barrier to more tools, especially from academia, is the lack of a standard interface to the parallel runtime environment: there is no easy way to attach to and communicate with a parallel job.
If the parallel OpenSource community could come up with a (simple) scalable parallel control-daemon interface, that would be a big help in opening this area to development.
Several places are interested in a “kit” of parallel-tools infrastructure components, but this missing interface is the big drawback to portability.