Christine MORIN PARIS project-team, IRISA/INRIA (Rennes, France)

Presentation Transcript


  1. Design and Implementation of a Single System Image Operating System for High Performance Computing on Clusters • Christine MORIN, PARIS project-team, IRISA/INRIA (Rennes, France)

  2. Motivation • Clusters as an alternative to multiprocessor machines for high performance computing • Workloads of scientific applications • Independent sequential processes • Compute-intensive, huge memory requirements • Parallel applications • Shared memory (multithreaded applications, OpenMP) • Message passing (MPI) • Hybrid applications

  3. Some Issues … • No obvious solution to support standard POSIX multithreaded applications on clusters • Memory distribution • Need for efficient placement and load-balancing strategies to take advantage of all cluster resources • Efficient process migration • The execution time of scientific applications may exceed the cluster MTBF • High availability and checkpointing
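The MTBF point can be made concrete with a little arithmetic. Assuming node failures are independent and exponentially distributed (an assumption, not a claim from the slides), the cluster MTBF is the node MTBF divided by the number of nodes, and the probability that a run of length T finishes without any failure is exp(-T / MTBF_cluster). The function names and the figures used below are hypothetical.

```python
import math


def cluster_mtbf(node_mtbf_hours: float, nodes: int) -> float:
    """Cluster MTBF when any single node failure interrupts the run,
    assuming independent, exponentially distributed node failures."""
    return node_mtbf_hours / nodes


def p_no_failure(run_hours: float, mtbf_hours: float) -> float:
    """Probability that a run of `run_hours` sees no failure (same assumption)."""
    return math.exp(-run_hours / mtbf_hours)


if __name__ == "__main__":
    # Hypothetical figures: 64 nodes, each with a 10,000-hour MTBF.
    mtbf = cluster_mtbf(10_000, 64)
    print(f"cluster MTBF: {mtbf} h")
    print(f"P(200 h run completes): {p_no_failure(200, mtbf):.2f}")
```

With these hypothetical figures the cluster MTBF is 156.25 hours, so a 200-hour run completes without a failure only about 28% of the time, which is why high availability and checkpointing appear in the list of issues.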

  4. Single System Image Operating System • Vision of a single machine (virtual SMP) • Same interface as a traditional OS for an SMP machine • Same vision for all applications • Efficiency • Properties of an SSI OS • Resource distribution transparency • Intra- and inter-application resource sharing • High availability • Scalability

  5. Kerrighed SSI OS • Combining high performance, high availability and ease of programming • Global resource management • Processor, memory, disk • Integrated resource management • Dynamic resource management • To deal with configuration changes • Extension of the standard OS running on each node • Small clusters • < 100 nodes

  6. Outline • Global process management • Global memory management • Conclusion and Perspectives

  7. Global Process Management • Global scheduling policy • Load balancing • Several policies • Configurable modular global scheduler • The policy can be changed without stopping the operating system or the applications • The local scheduler on each node is not modified
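A configurable, modular global scheduler can be pictured as a scheduler object whose placement policy is a replaceable function. The sketch below is a hypothetical Python model, not Kerrighed's in-kernel code: `GlobalScheduler`, `set_policy`, `least_loaded` and `round_robin_factory` are illustrative names. Swapping the policy is a single assignment under a lock, so neither running applications nor the per-node local schedulers (not modelled here) need to stop.

```python
import threading
from typing import Callable, Dict

# A placement policy maps per-node load to the node that receives new work.
Policy = Callable[[Dict[str, float]], str]


def least_loaded(loads: Dict[str, float]) -> str:
    return min(loads, key=loads.get)


def round_robin_factory() -> Policy:
    counter = {"next": 0}                      # state captured by the closure

    def round_robin(loads: Dict[str, float]) -> str:
        nodes = sorted(loads)
        node = nodes[counter["next"] % len(nodes)]
        counter["next"] += 1
        return node

    return round_robin


class GlobalScheduler:
    """Hypothetical global scheduler with a hot-swappable policy module."""

    def __init__(self, policy: Policy) -> None:
        self._policy = policy
        self._lock = threading.Lock()

    def set_policy(self, policy: Policy) -> None:
        # Replace the policy while the system keeps running.
        with self._lock:
            self._policy = policy

    def place(self, loads: Dict[str, float]) -> str:
        with self._lock:
            policy = self._policy              # decision uses the policy current now
        return policy(loads)
```

For example, with loads `{"node1": 0.9, "node2": 0.2, "node3": 0.5}` the scheduler first places work on `node2`; after `set_policy(round_robin_factory())` it cycles through `node1`, `node2`, `node3` without being recreated.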

  8. Architecture of the Global Scheduler • [Figure: on each node (Node 1, Node 2), a layered stack of Global scheduler, Local Analyzers and Monitors running on the Standard OS]
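The layering in this figure (monitors feeding local analyzers, which feed the global scheduler) can be sketched as a small pipeline. In this hypothetical model a monitor is just a stream of raw samples such as run-queue lengths, each `LocalAnalyzer` smooths them with an exponential moving average, and `global_decision` proposes one migration when the gap between the busiest and idlest nodes exceeds a threshold. The smoothing factor and the threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LocalAnalyzer:
    """Per-node analyzer: smooths raw monitor samples (e.g. run-queue
    length) into a load estimate with an exponential moving average."""
    node: str
    alpha: float = 0.5          # hypothetical smoothing factor
    load: float = 0.0

    def feed(self, sample: float) -> None:
        self.load = self.alpha * sample + (1 - self.alpha) * self.load


def global_decision(analyzers: List[LocalAnalyzer],
                    threshold: float = 1.0) -> Optional[Tuple[str, str]]:
    """Propose one (source, target) migration if the busiest and idlest
    nodes differ by more than `threshold`, else None."""
    busiest = max(analyzers, key=lambda a: a.load)
    idlest = min(analyzers, key=lambda a: a.load)
    if busiest.load - idlest.load > threshold:
        return busiest.node, idlest.node
    return None


if __name__ == "__main__":
    # Monitors are modelled as lists of raw samples per node.
    samples = {"node1": [4, 6], "node2": [0, 1]}
    analyzers = [LocalAnalyzer(n) for n in samples]
    for a in analyzers:
        for s in samples[a.node]:
            a.feed(s)
    print(global_decision(analyzers))
```

Smoothing keeps a short burst on one node from triggering a migration, while the threshold keeps the scheduler from moving processes between nodes whose loads are already close.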

  9. Process Management Mechanisms • [Figure, replicated on two nodes: Global scheduler (Application management); Process creation, Process checkpoint, Process migration; Process state extraction; Memory, Disk, Network]
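The figure suggests that process creation, checkpointing and migration all rest on one low-level mechanism, process state extraction. The hypothetical Python model below makes that reuse explicit: `extract_state` captures a detached copy of a toy `Process`, and the three higher-level operations differ only in what they do with that copy. All names and the process representation are illustrative; a real kernel would serialize registers, memory mappings and open files rather than copy a Python object.

```python
import copy
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class Process:
    """Toy process: just enough state to show the mechanisms."""
    pid: int
    registers: Dict[str, int]
    pages: Dict[int, bytes]                      # page number -> contents
    open_files: Dict[int, str] = field(default_factory=dict)


def extract_state(p: Process) -> dict:
    """The shared low-level mechanism: a detached copy of the process state.
    asdict() rebuilds nested dicts, so later changes to `p` do not leak in."""
    return asdict(p)


def rebuild(state: dict) -> Process:
    """Recreate a process from extracted state (on any node)."""
    return Process(**copy.deepcopy(state))


def checkpoint(p: Process, store: Dict[int, dict]) -> None:
    store[p.pid] = extract_state(p)              # keep state in stable storage


def restart(pid: int, store: Dict[int, dict]) -> Process:
    return rebuild(store[pid])


def migrate(p: Process, target: Dict[int, Process]) -> Process:
    target[p.pid] = rebuild(extract_state(p))    # same pid on the new node
    return target[p.pid]


def create_remote(parent: Process, new_pid: int,
                  target: Dict[int, Process]) -> Process:
    """Fork-like remote creation: the child starts from the parent's state."""
    state = extract_state(parent)
    state["pid"] = new_pid
    target[new_pid] = rebuild(state)
    return target[new_pid]
```

Keeping a single extraction path means any improvement to it (for example, capturing more kinds of kernel state) benefits checkpointing, migration and remote creation at once.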

  10. Checkpointing • Common mechanisms supporting checkpointing protocols for both shared-memory and message-passing applications • Efficient checkpoint creation • Several memory checkpoints between two disk checkpoints • Disk checkpoints stored on local disks • Incremental checkpoints • For shared-memory applications, data replication used both for efficiency and for high availability • Replicas that already exist because of data sharing are exploited to decrease the cost of checkpoint creation • Recovery data can be used by the computation until its first modification
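Incremental checkpoints with several memory checkpoints between two disk checkpoints can be modelled as follows. This is a hypothetical sketch: `IncrementalCheckpointer`, `disk_every` and `on_write` are illustrative names. The first checkpoint is full, each later one saves only pages written since the previous checkpoint, and every `disk_every`-th checkpoint merges the in-memory increments and flushes them to a list standing in for the node's local disk. The replication-based optimizations for shared-memory applications are not modelled.

```python
from typing import Dict, List, Set


class IncrementalCheckpointer:
    """Hypothetical model: memory checkpoints are incremental; every
    `disk_every`-th one is merged and flushed to the local disk."""

    def __init__(self, disk_every: int = 3) -> None:
        self.disk_every = disk_every
        self.count = 0
        self.dirty: Set[int] = set()
        self.memory_ckpts: List[Dict[int, bytes]] = []   # since last disk flush
        self.disk: List[Dict[int, bytes]] = []           # stands in for local disk

    def on_write(self, page: int) -> None:
        # A kernel would detect this via write-protection faults.
        self.dirty.add(page)

    def checkpoint(self, memory: Dict[int, bytes]) -> Dict[int, bytes]:
        # The first checkpoint is full; later ones hold only dirty pages.
        pages = memory.keys() if self.count == 0 else self.dirty
        delta = {pg: memory[pg] for pg in sorted(pages)}
        self.dirty.clear()
        self.memory_ckpts.append(delta)
        self.count += 1
        if self.count % self.disk_every == 0:
            merged: Dict[int, bytes] = {}
            for increment in self.memory_ckpts:
                merged.update(increment)
            self.disk.append(merged)
            self.memory_ckpts.clear()
        return delta

    def recover(self) -> Dict[int, bytes]:
        """Rebuild the latest image: disk increments, then memory ones."""
        image: Dict[int, bytes] = {}
        for increment in self.disk + self.memory_ckpts:
            image.update(increment)
        return image
```

Only dirty pages are copied at each checkpoint, and disk writes happen once per `disk_every` checkpoints, which is the cost trade-off the slide points to.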

  11. Process Migration • Communicating processes can migrate • Processes sharing memory • Processes communicating through data streams (sockets, pipes, …) • Efficiency of the process transfer • Address space transferred on demand (containers) • Efficiency of the process execution after migration • Efficient access to open files (containers) • Global management of data streams
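Transferring the address space on demand means that only the process's execution context moves at migration time; pages follow when the migrated process touches them. In the slides this is handled by containers; the hypothetical model below isolates just the on-demand part, with a `SourceNode` holding the pages and a `LazyAddressSpace` that fetches each page on its first access. Names are illustrative, and a real system would trigger the fetch from a page-fault handler.

```python
from typing import Dict


class SourceNode:
    """Node the process left; it keeps the pages until they are requested."""

    def __init__(self, pages: Dict[int, bytes]) -> None:
        self.pages = dict(pages)
        self.requests = 0

    def fetch(self, page: int) -> bytes:
        self.requests += 1
        return self.pages.pop(page)        # the page moves to the requester


class LazyAddressSpace:
    """Address space on the destination node, filled on demand."""

    def __init__(self, source: SourceNode) -> None:
        self.source = source
        self.local: Dict[int, bytes] = {}

    def read(self, page: int) -> bytes:
        if page not in self.local:         # 'page fault': fetch once, remotely
            self.local[page] = self.source.fetch(page)
        return self.local[page]
```

Migration cost then grows with the pages the process actually uses after moving rather than with its full address space, at the price of a remote fetch on each first access.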

  12. Global Memory Management • Different services • Shared virtual memory • Remote paging • Cooperative file cache • A single concept: the container • Software object to store and share data cluster-wide (COMA-like management) • Global management of physical memory • Segments of a process address space and files are associated with containers
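COMA-like management means that data has no fixed home node: copies live in the memory of whichever nodes use them. The hypothetical model below treats a container as a set of pages that are replicated on read and invalidated on write. The write-invalidate protocol is an assumption here rather than a detail from the slides, and `Container`, `read`, `write` and the 4-byte page size are illustrative.

```python
from typing import Dict, Iterable, Set

PAGE_SIZE = 4   # hypothetical, tiny to keep the example readable


class Container:
    """Hypothetical COMA-style container: pages have no fixed home node.
    Reads replicate a page to the reader; writes invalidate other copies."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.memory: Dict[str, Dict[int, bytes]] = {n: {} for n in nodes}
        self.copies: Dict[int, Set[str]] = {}    # page -> nodes holding a copy

    def read(self, node: str, page: int) -> bytes:
        holders = self.copies.setdefault(page, set())
        if node not in holders:
            if holders:                          # replicate from a current holder
                src = next(iter(holders))
                self.memory[node][page] = self.memory[src][page]
            else:                                # first touch: zero-filled page
                self.memory[node][page] = bytes(PAGE_SIZE)
            holders.add(node)
        return self.memory[node][page]

    def write(self, node: str, page: int, data: bytes) -> None:
        for other in self.copies.get(page, set()) - {node}:
            del self.memory[other][page]         # invalidate stale replicas
        self.copies[page] = {node}               # writer now holds the only copy
        self.memory[node][page] = data
```

Because any node's memory can hold a page, the same object can serve shared virtual memory (several readers), remote paging (a page kept in another node's memory) and a cooperative file cache.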

  13. Integration of Containers in a Standard OS • [Figure, two nodes: on each node, the File System and VM Manager of the Host Operating System connect through linkers to a Container spanning both nodes, which connects through linkers to each node's Disk Manager (Disk) and Memory Manager (Memory)]
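The figure places the container between the host OS services above it and the memory and disk managers below it, joined by linkers. A hypothetical reading of that design, sketched below, treats lower-side linkers as interchangeable backends (`MemoryLinker`, `DiskLinker`) and upper-side linkers as thin adapters (`vm_page_fault`, `file_read`) that translate OS requests into container accesses. All class and function names, and the 4-byte page size, are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict

PAGE_SIZE = 4   # hypothetical, tiny to keep the example readable


class IOLinker(ABC):
    """Lower-side linker: connects a container to where its pages live."""

    @abstractmethod
    def load(self, page: int) -> bytes: ...

    @abstractmethod
    def store(self, page: int, data: bytes) -> None: ...


class MemoryLinker(IOLinker):
    """Pages backed by physical memory; untouched pages read as zeros."""

    def __init__(self) -> None:
        self.frames: Dict[int, bytes] = {}

    def load(self, page: int) -> bytes:
        return self.frames.get(page, bytes(PAGE_SIZE))

    def store(self, page: int, data: bytes) -> None:
        self.frames[page] = data


class DiskLinker(IOLinker):
    """Pages backed by a file: page i covers bytes [i*PAGE_SIZE, (i+1)*PAGE_SIZE)."""

    def __init__(self, file_bytes: bytearray) -> None:
        self.file = file_bytes

    def load(self, page: int) -> bytes:
        start = page * PAGE_SIZE
        return bytes(self.file[start:start + PAGE_SIZE])

    def store(self, page: int, data: bytes) -> None:
        start = page * PAGE_SIZE
        self.file[start:start + PAGE_SIZE] = data


class Container:
    """Caches pages; knows nothing about what backs them."""

    def __init__(self, io: IOLinker) -> None:
        self.io = io
        self.cache: Dict[int, bytes] = {}

    def get(self, page: int) -> bytes:
        if page not in self.cache:
            self.cache[page] = self.io.load(page)
        return self.cache[page]

    def put(self, page: int, data: bytes) -> None:
        self.cache[page] = data

    def flush(self) -> None:
        for page, data in self.cache.items():
            self.io.store(page, data)


# Upper-side linkers: adapt host-OS requests to the container interface.
def vm_page_fault(container: Container, page: int) -> bytes:
    return container.get(page)


def file_read(container: Container, offset: int) -> bytes:
    return container.get(offset // PAGE_SIZE)   # page containing `offset`
```

Because the container only talks to the `IOLinker` interface, the same caching logic serves both anonymous memory and file data, in line with the slides' idea of one concept, the container, behind several memory services.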

  14. Conclusion & Perspectives • An SSI OS for clusters is still missing in 2003 • Kerrighed represents a promising approach • A first prototype based on Linux is available • Current work directions • High availability and checkpointing • OpenMP on Kerrighed • Experimentation with industrial applications • EDF, DGA • Grid-aware OS for a federation of clusters

  15. http://www.kerrighed.org • Kerrighed has been filed as a community trademark.
