This presentation describes software tools for dynamic resource management in computational environments used for scientific research in domains such as hydroaerodynamics and plasma physics. It identifies two current difficulties, the shortage of computational resources and unsatisfactory task management, and outlines the benefits of integrating the computational resources of several scientific centres. It reviews the shortcomings of existing task-distribution tools such as Codine, Sun Grid Engine, PBS and Condor. The project then aims to use computing resources more efficiently and to improve the quality of service for users through migration and checkpointing of parallel tasks, better monitoring and two-level resource management.
Software Tools for Dynamic Resource Management
Irina V. Shoshmina, Dmitry Yu. Malashonok, Sergay Yu. Romanov
Institute of High-Performance Computing and Information Systems
www.csa.ru
{irena,mal,serrom}@csa.ru
State of the art
Resources:
• CONVEX(es)
• Parsytec CC/16
• Parsytec CCid
• Parsytec Power Mouse System
• SPP1600
• SGI OCTANE workstations
• Sun Ultra 450
• Paritet (Intel cluster)
www.csa.ru/CSA
Scientific problems:
• hydroaerodynamics
• plasma
• nuclear physics
• medicine
• biology
• chemistry
• astronomy
Difficulties
• shortage of resources for solvable scientific problems
• unsatisfactory management of tasks (most tasks are parallel)
Shortage of resources
Solution: integrate the computational resources of several scientific centres.
Advantages of integration
• wider access to, and more intensive use of, computational resources
• closer integration of the scientific community
• a wider range of solvable scientific and technical problems
Management of tasks
Tools for optimising the distribution of tasks over computational nodes:
• Codine
• Sun Grid Engine
• PBS
• Condor
Disadvantages of these tools:
• weak support for migration of parallel tasks
• unsatisfactory load balancing
• dependence on particular versions of PVM and MPI
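At its simplest, optimising task distribution means placing each task on the node that will finish its work earliest. The sketch below assumes identical nodes and a known cost estimate per task, and uses the classic longest-processing-time-first (LPT) greedy heuristic. The function `assign_tasks` is hypothetical; it is not the interface or policy of Codine, Sun Grid Engine, PBS or Condor.

```python
import heapq


def assign_tasks(costs: dict[str, float], nodes: list[str]) -> dict[str, list[str]]:
    """Hypothetical LPT placement: largest tasks first, each onto the least-loaded node."""
    # Min-heap of (accumulated load, node name); ties are broken by node name.
    heap = [(0.0, node) for node in nodes]
    heapq.heapify(heap)
    placement: dict[str, list[str]] = {node: [] for node in nodes}
    # Visit tasks in order of decreasing cost (name breaks ties deterministically).
    for task in sorted(costs, key=lambda t: (-costs[t], t)):
        load, node = heapq.heappop(heap)
        placement[node].append(task)
        heapq.heappush(heap, (load + costs[task], node))
    return placement
```

With costs `{"cfd": 8.0, "plasma": 5.0, "md": 4.0, "fft": 3.0}` on nodes `["n1", "n2"]`, node `n1` receives `cfd` and `fft` (load 11) and `n2` receives `plasma` and `md` (load 9). LPT is a heuristic, so in general it does not guarantee an optimal schedule.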
Main goals of the project
• more efficient use of computing resources
• better quality of service for users
Main tasks
• migration of parallel tasks
• optimisation of distributed resource management
• integration of the resources of several scientific centres
Dynamite
Software developed by the University of Amsterdam within Esprit project 23499.
Dynamite advantages
• migration and checkpointing of PVM tasks
• automatic workload balancing of PVM tasks (on a cluster of workstations)
• migration of dynamically linked tasks
• migration of communication endpoints
• reallocation of tasks
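Automatic workload balancing reduces to two decisions: whether a process should migrate, and where it should go. The sketch below assumes that a host's load is the sum of its processes' loads and that at most one migration happens per balancing step. `plan_migration` and its `threshold` parameter are hypothetical and are not part of Dynamite's actual interface.

```python
def plan_migration(hosts: dict[str, dict[str, float]], threshold: float = 1.0):
    """Hypothetical balancer: return (process, source, target) or None.

    hosts maps a host name to {process name: process load}.
    """
    load = {h: sum(procs.values()) for h, procs in hosts.items()}
    src = max(load, key=load.get)  # most loaded host
    dst = min(load, key=load.get)  # least loaded host
    gap = load[src] - load[dst]
    if gap <= threshold:
        return None  # balanced enough; migration itself has a cost
    # Moving a process of load p turns the gap into |gap - 2p|; pick the best one.
    best = min(hosts[src], key=lambda p: abs(gap - 2 * hosts[src][p]))
    if abs(gap - 2 * hosts[src][best]) >= gap:
        return None  # no single move reduces the imbalance
    return best, src, dst
```

For hosts `{"ws1": {"p1": 3.0, "p2": 2.0}, "ws2": {"p3": 1.0}}` the gap is 4, and moving `p2` equalises both hosts at 3, so the function returns `("p2", "ws1", "ws2")`. A lone large process is never moved if the move would only swap the imbalance to the other host.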
Dynamite disadvantages
• dependence on the PVM version
• no migration of MPI tasks
• no satisfactory monitoring system
• no advanced scheduling system
• no modules for global task distribution
Main steps of the project
• migration of MPI and PVM tasks
• checkpointing of parallel tasks
• monitoring
• resource management
• support for additional architectures
Two-level system
[Figure: one global level above two local levels]
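In a two-level system, a global scheduler picks a scientific centre for a task and that centre's local scheduler picks the nodes. The sketch below assumes that each centre reports only the number of free processors per node, and that the global level chooses the centre with the most free capacity. The classes `LocalScheduler` and `GlobalScheduler` are hypothetical illustrations, not components named in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class LocalScheduler:
    """Hypothetical local level: allocates processors on the nodes of one centre."""
    name: str
    free: dict[str, int] = field(default_factory=dict)  # node -> free processors

    def capacity(self) -> int:
        return sum(self.free.values())

    def allocate(self, nprocs: int) -> list[str]:
        """Take processors from the nodes with the most free CPUs first."""
        if nprocs > self.capacity():
            raise ValueError(f"{self.name}: not enough free processors")
        granted: list[str] = []
        for node in sorted(self.free, key=lambda n: -self.free[n]):
            take = min(self.free[node], nprocs - len(granted))
            self.free[node] -= take
            granted.extend([node] * take)
            if len(granted) == nprocs:
                break
        return granted


@dataclass
class GlobalScheduler:
    """Hypothetical global level: chooses a centre, then delegates to it."""
    centres: list[LocalScheduler]

    def submit(self, nprocs: int) -> tuple[str, list[str]]:
        candidates = [c for c in self.centres if c.capacity() >= nprocs]
        if not candidates:
            raise RuntimeError("no centre can host the task")
        centre = max(candidates, key=LocalScheduler.capacity)
        return centre.name, centre.allocate(nprocs)
```

With centre `A` offering nodes `{"a1": 2, "a2": 2}` and centre `B` offering `{"b1": 8}`, a 5-processor task goes to `B`. A following 3-processor task then goes to `A`, which now has more free processors than `B`. Keeping node-level decisions inside each centre means the global level never needs per-node detail.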
Migration of PVM and MPI tasks
Main problems of migration:
• migration of PVM tasks
• migration of MPI tasks
• independence from the versions and implementations of PVM and MPI
• support for additional architectures
• files
• sockets
• kernel-supported threads, etc.
Checkpointing of parallel tasks
• trace the execution of parallel tasks
• migrate parallel tasks at two levels:
  • migrate a single process of a parallel task (local level)
  • migrate a whole parallel task (global level)
• handle exceptional situations
Checkpointing of parallel tasks
[Figure: checkpointing at the global level and at three local levels]
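Two-level checkpointing can be illustrated with in-memory snapshots. At the local level, the state of one process is saved so that process can move on its own. At the global level, the states of all processes of the task are saved together so the whole task can move. The sketch assumes each process state is a plain picklable dictionary and that the task is paused with no messages in flight during a global checkpoint. It does not attempt OS-level process-image checkpointing. The class `ParallelTask` and its methods are hypothetical.

```python
import pickle


class ParallelTask:
    """Hypothetical toy parallel task: each rank's state is a picklable dict."""

    def __init__(self, nranks: int):
        self.state = {rank: {"step": 0, "data": []} for rank in range(nranks)}

    def advance(self, rank: int, value: int) -> None:
        self.state[rank]["step"] += 1
        self.state[rank]["data"].append(value)

    # Local level: snapshot a single process so it can be migrated alone.
    def checkpoint_process(self, rank: int) -> bytes:
        return pickle.dumps(self.state[rank])

    def restore_process(self, rank: int, blob: bytes) -> None:
        self.state[rank] = pickle.loads(blob)

    # Global level: snapshot every rank at once so the task can be migrated whole.
    def checkpoint_task(self) -> bytes:
        # Assumes the task is paused and no messages are in flight, so the
        # per-rank states together form a consistent global state.
        return pickle.dumps(self.state)

    @classmethod
    def restore_task(cls, blob: bytes) -> "ParallelTask":
        task = cls(0)
        task.state = pickle.loads(blob)
        return task
```

In a real system, the hard part is the consistency assumption in `checkpoint_task`. Messages in transit between processes must be either drained or recorded, and this toy model sidesteps that problem.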
Monitoring
Monitored parameters of:
• computational resources (processor, memory and network load)
• tasks and queues
• users
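A monitoring system has to reduce raw samples of processor, memory and network load into a signal a scheduler can act on. The sketch below assumes that each node periodically reports CPU, memory and network utilisation as fractions between 0 and 1. It flags a node when any of the three averages over the sample window exceeds a limit. The `NodeSample` record, the `overloaded` function and the default limit of 0.9 are hypothetical choices.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class NodeSample:
    """Hypothetical monitoring record; all loads are fractions in [0, 1]."""
    node: str
    cpu: float  # processor load
    mem: float  # fraction of memory in use
    net: float  # network utilisation


def overloaded(samples: list[NodeSample], limit: float = 0.9) -> list[str]:
    """Names of nodes whose average CPU, memory or network load exceeds limit."""
    by_node: dict[str, list[NodeSample]] = {}
    for s in samples:
        by_node.setdefault(s.node, []).append(s)
    flagged = []
    for node, window in sorted(by_node.items()):
        averages = (mean(s.cpu for s in window),
                    mean(s.mem for s in window),
                    mean(s.net for s in window))
        if max(averages) > limit:
            flagged.append(node)
    return flagged
```

Averaging over a window keeps a single short CPU spike from triggering a migration. A node with CPU samples 0.5 and 1.0 averages 0.75 and is not flagged at the default limit.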
Resource management
• current distribution of tasks and queues
• long-term scheduling
• dynamic load balancing at the global and local levels
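Long-term scheduling plans start times in advance instead of only dispatching whatever fits right now. The sketch below assumes a single homogeneous pool of processors, known job runtimes and strict first-come-first-served order. The function `plan_fcfs` is hypothetical, and the slides do not say which scheduling policy the project uses.

```python
import heapq


def plan_fcfs(total_procs: int, queue: list[tuple[str, int, float]]) -> dict[str, float]:
    """Hypothetical planner: start time of each queued job (name, procs, runtime).

    Under strict FCFS, no job starts before the job queued ahead of it.
    """
    running: list[tuple[float, int]] = []  # min-heap of (end time, processors)
    free, now = total_procs, 0.0
    starts: dict[str, float] = {}
    for name, procs, runtime in queue:
        if procs > total_procs:
            raise ValueError(f"{name} needs more processors than exist")
        # Advance time to successive job completions until enough processors free up.
        while free < procs:
            end, p = heapq.heappop(running)
            now = max(now, end)
            free += p
        starts[name] = now
        free -= procs
        heapq.heappush(running, (now + runtime, procs))
    return starts
```

On 4 processors with the queue `[("a", 2, 10.0), ("b", 2, 5.0), ("c", 3, 4.0), ("d", 1, 1.0)]`, jobs `a` and `b` start at 0, and `c` and `d` both start at 10. Job `d` could have run in the idle gap from time 5 without delaying `c`. Backfilling is the common refinement that exploits such gaps, and this sketch does not implement it.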
Integration with Globus
[Figure: the Globus global environment above six local levels]