This workshop, presented by Prof. Vladimir Voevodin, Deputy Director of the Research Computing Center at Moscow State University, highlights critical performance analysis tools for High Performance Computing (HPC). It addresses the pervasive issue of low supercomputer efficiency across a wide range of applications. Key sources of performance losses are discussed, including batch-system policies and load imbalance. The workshop proposes developing a unified software toolset that integrates existing and new tools for detecting and addressing efficiency issues in supercomputers, thereby improving user experience and resource utilization at scale.
Workshop "EU – Russia Joint Call in High Performance Computing"
Prof. Vladimir Voevodin, Deputy Director, Research Computing Center, Moscow State University; Corresponding Member of the Russian Academy of Sciences, voevodin@parallel.ru
25 March 2010, Brussels
Moscow State University (1755 – 2010)
30+ faculties, 350+ departments, 5 major research institutes. More than 40,000 students, 2,500 full doctors, 6,000 PhDs, 1,000+ full professors, 5,000 researchers.
Research Computing Center, MSU (1955 – 2010)
20 laboratories, 220+ researchers, 50 PhDs and 25 Doctors of Sciences. RCC MSU is the #1 supercomputing center in Russia.
Performance analysis tools for HPC (what problems should the project address?) • Most supercomputers show extremely low efficiency across a very wide range of applications. • Most users receive no information about their programs after submission to a batch queue. • "Low efficiency" cannot be explained by a single cause; it is a complex problem. • Sources of losses include: policies and quotas of batch systems, runtime systems, compilers, communication overheads, load imbalance, Amdahl's law, and the memory wall. • There is neither a unified approach nor a software tool that detects sources of losses both for users (for a particular task) and for system administrators (for the whole supercomputer).
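Amdahl's law, listed above as one source of losses, is easy to quantify: if a fraction of a program is inherently serial, that fraction caps the achievable speedup regardless of processor count. A minimal sketch (the 5% serial fraction below is an illustrative assumption, not a figure from the workshop):

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Maximum speedup under Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

# A program that is only 5% serial can never exceed 20x speedup,
# no matter how many processors the supercomputer provides:
print(round(amdahl_speedup(0.05, 1024), 1))   # ~19.6 on 1024 processors
print(round(amdahl_speedup(0.05, 10**9), 1))  # approaches the 1/0.05 = 20x ceiling
```

This is one reason adding nodes often fails to improve efficiency: the serial fraction, not the hardware, becomes the bottleneck.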
Performance analysis tools for HPC (some key points of the project) • Create an integrated environment that combines existing and new tools, from the level of batch systems down to hardware monitors. • Detect all sources of losses at the level of a particular task, a particular user, and the entire supercomputer. • Collect, analyze, filter, and display data in a scalable way, up to exascale-range systems. • Analyze both instrumented and non-instrumented tasks. • Target architecture: clusters of SMP nodes with multicore processors and GPUs.
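As an illustration of the kind of per-task analysis such an environment could automate, a simple load-imbalance indicator can be derived from per-process timings gathered by a monitor. The metric and the sample data below are assumptions for illustration, not the project's actual design:

```python
def imbalance_percent(task_times):
    """Load-imbalance metric: the share of the slowest process's runtime
    that the average process spends idle, as a percentage.
    0 means perfect balance; higher means more wasted core-hours."""
    worst = max(task_times)
    mean = sum(task_times) / len(task_times)
    return 100.0 * (worst - mean) / worst

# Hypothetical per-process compute times (seconds) for one parallel task;
# one straggler process dominates the run:
times = [10.2, 10.1, 10.3, 18.7]
print(round(imbalance_percent(times), 1))  # ~34%: a third of the time is waiting
```

Flagging tasks whose imbalance exceeds a threshold, across every job in the batch queue, is one way a unified toolset could surface losses to both users and administrators without requiring instrumented codes.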