DMTCP: A Powerful Checkpointing Mechanism for Linux Jobs

DMTCP (Distributed Multi-Threaded CheckPointing) is a Linux checkpointing tool developed at Northeastern U. and MIT. It supports sequential and multi-threaded computations and works transparently in user space, with no kernel modules and no recompiling. It removes restrictions found in Condor's Standard Universe, such as the lack of pthreads and mmap() support. DMTCP is freely available under the LGPL. Key features include on-the-fly checkpoint compression, a stateless synchronization server (the coordinator), and additional wrappers for process id and thread id virtualization. It works with applications such as MPICH-2, OpenMPI, and Python. Support for Bash and Matlab is planned. Integration with Condor is experimental, and its scalability and stability are still being determined.

vfoley

Presentation Transcript


  1. DMTCP: A New Linux Checkpointing Mechanism For Vanilla Universe Jobs

  2. Why DMTCP? • Why checkpoint at all? • Problems with Condor’s Standard Universe: single process; no pthreads; no mmap() support; forced re-link to form a static executable. • DMTCP removes these restrictions!

  3. What is DMTCP? • Distributed Multi-Threaded CheckPointing. • Works with Linux Kernel 2.6.9 and later. • Supports sequential and multi-threaded computations across single/multiple hosts. • Entirely in user space (no kernel modules or root privilege). • Transparent (no recompiling, no re-linking). • Written at Northeastern U. and MIT and under active development for 4+ years. • LGPL’d and freely available. • No Remote I/O.

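  4. Process Structure • [Diagram: a DMTCP coordinator connected by network sockets to Process 1 through Process N. Each process contains a checkpoint thread (CT) and user threads (T1, T2); a signal (USR2) is shown inside the process.] • CT = DMTCP checkpoint thread • T = User Thread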

  5. How Does It Work? • ./dmtcp_checkpoint a.out # starts coordinator too • ./dmtcp_command -c # talks to coordinator • ./dmtcp_restart ckpt_a.out-*.dmtcp • Coordinator is a stateless synchronization server for the distributed checkpointing algorithm. • Checkpoint/Restart performance related to size of memory, disk write speed, and synchronization.

  6. How Does It Work? • LD_PRELOAD: Transparently preloads the checkpoint libraries, which install libc wrappers and checkpointing code. • SIGUSR2: Used internally from checkpoint thread to user threads. • Wrappers: Only on less heavily used calls to libc: fork, exec, system, pipe, bind, listen, setsockopt, connect, accept, clone, close, ptsname, openlog, closelog, signal, sigaction, sigvec, sigblock, sigsetmask, sigprocmask, rt_sigprocmask, pthread_sigmask • Overhead is negligible.

  7. How Does It Work? • Additional wrappers when process id & thread id virtualization is enabled: getpid, getppid, gettid, tcgetpgrp, tcsetpgrp, getpgrp, setpgrp, getsid, setsid, kill, tkill, tgkill, wait, waitpid, waitid, wait3, wait4
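The idea behind process id virtualization can be pictured as a translation table: the application keeps seeing the pid it saw before the checkpoint, while the wrapped calls translate to whatever pid the kernel assigned after restart. The sketch below is a conceptual illustration only, not DMTCP's implementation; the `PidTable` class, its method names, and the choice that the virtual pid equals the original real pid are all assumptions made for the example.

```python
class PidTable:
    """Toy virtual-to-real pid translation table (hypothetical, illustration only)."""

    def __init__(self):
        self._virt_to_real = {}

    def register(self, real_pid):
        # Assumption: at first launch the virtual pid is simply the real pid.
        self._virt_to_real[real_pid] = real_pid
        return real_pid

    def rebind(self, virtual_pid, new_real_pid):
        # After a restart the kernel hands out a new real pid;
        # the virtual pid seen by the application stays the same.
        if virtual_pid not in self._virt_to_real:
            raise KeyError(virtual_pid)
        self._virt_to_real[virtual_pid] = new_real_pid

    def to_real(self, virtual_pid):
        # What a wrapped call such as kill() would pass to the kernel.
        return self._virt_to_real[virtual_pid]

    def to_virtual(self, real_pid):
        # What a wrapped call such as getpid() would return to the application.
        for virtual, real in self._virt_to_real.items():
            if real == real_pid:
                return virtual
        raise KeyError(real_pid)
```

With a table like this, a wrapper around a call that takes a pid translates virtual to real on the way in, and a wrapper around a call that returns a pid translates real to virtual on the way out.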

  8. How Does It Work? • Checkpoint image compression on-the-fly (default). • Currently only supports dynamically linking to libc.so. Support for static libc.a is feasible, but not implemented. • Stays close to POSIX API standards.
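On-the-fly compression means the image is compressed as it is written, rather than written uncompressed and compressed afterwards. The sketch below shows that streaming pattern with Python's `zlib` module. It is an illustration under assumptions: the slides do not say which compressor DMTCP uses, and the `compress_stream` and `decompress_stream` helpers are hypothetical names.

```python
import zlib


def compress_stream(chunks, level=6):
    """Compress an iterable of byte chunks incrementally, yielding output as it is produced."""
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:
            yield out
    # Emit whatever the compressor is still holding internally.
    yield compressor.flush()


def decompress_stream(pieces):
    """Inverse of compress_stream: decompress an iterable of compressed pieces."""
    decompressor = zlib.decompressobj()
    for piece in pieces:
        out = decompressor.decompress(piece)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail
```

Because each chunk is compressed as it arrives, the full uncompressed image never has to exist on disk, which matters when checkpoint time depends on disk write speed.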

  9. A Checkpoint Under DMTCP • dmtcphijack.so & mtcp.so present in executable’s memory. • Ask coordinator process for checkpoint via dmtcp_command. • Now what happens?

  10. A Checkpoint Under DMTCP • Suspend user threads with SIGUSR2. • Elect shared file descriptor leaders. • Drain kernel buffers and do network handshake with peers. • Write checkpoint to disk. • Refill kernel buffers. • Resume user threads.
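The "drain kernel buffers" and "refill kernel buffers" steps can be pictured with a toy example. The sketch below is an illustration under assumptions, not DMTCP's code: it uses a local socket pair, and the `DRAIN_TOKEN` marker and the `drain` and `refill` helpers are hypothetical. Data that was sent but not yet read is pulled out of the kernel buffer into user memory, where it can be saved with the rest of the process, and is sent again when the computation resumes.

```python
import socket

# Hypothetical marker telling the receiver that everything in flight has arrived.
DRAIN_TOKEN = b"<<drain-token>>"


def drain(sender, receiver):
    """Pull all in-flight bytes out of the kernel buffer and return them."""
    sender.sendall(DRAIN_TOKEN)
    buffered = b""
    while not buffered.endswith(DRAIN_TOKEN):
        chunk = receiver.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the socket during drain")
        buffered += chunk
    # Strip the marker; what remains is the application's unread data.
    return buffered[: -len(DRAIN_TOKEN)]


def refill(sender, saved):
    """Put the drained bytes back in flight so the receiver can read them later."""
    if saved:
        sender.sendall(saved)
```

A short usage example: create a pair with `a, b = socket.socketpair()`, send some bytes on `a`, call `drain(a, b)` before "writing the checkpoint", then call `refill(a, saved)` before resuming so that `b` reads the same bytes it would have read without a checkpoint.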

  11. Where Is the Checkpoint? • In the cwd of the application: a set of ckpt_<exec>_<id>.dmtcp files. • In the cwd of the coordinator: a dmtcp_restart_script.sh file. • The dmtcp_restart_script.sh may need tweaking depending upon circumstance.
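A small helper can collect the checkpoint images from the application's working directory and assemble the restart command line. This is a minimal sketch under assumptions: the helper names are hypothetical, the `ckpt_*.dmtcp` pattern is inferred from the file names on the slides, and the code only builds the argument list for `dmtcp_restart` without running it.

```python
import glob
import os


def find_checkpoint_images(app_cwd):
    """Return the checkpoint image files found in the application's cwd, sorted by name."""
    pattern = os.path.join(glob.escape(app_cwd), "ckpt_*.dmtcp")
    return sorted(glob.glob(pattern))


def build_restart_argv(images, restart_binary="dmtcp_restart"):
    """Build the argument list for restarting from the given images (does not execute it)."""
    if not images:
        raise FileNotFoundError("no checkpoint images found")
    return [restart_binary, *images]
```

The resulting list could be handed to `subprocess.run`, or compared against the generated dmtcp_restart_script.sh when that script needs tweaking.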

  12. A Restart Under DMTCP • Restart Process loads in memory. • Reopen files and recreate ptys. • Recreate and reconnect sockets. • Fork into user processes. • Rearrange file descriptors to initial layout. • Restore memory and threads. • Refill kernel buffers. • Resume user threads.
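The "rearrange file descriptors to initial layout" step can be illustrated with `dup2`. The sketch below is an assumption-laden illustration, not DMTCP's restart code; the `restore_fd_layout` helper is hypothetical. It stages every descriptor at a temporary number above all numbers involved first, so that two descriptors can even swap numbers without clobbering each other.

```python
import fcntl
import os


def restore_fd_layout(current_to_original):
    """Move each open descriptor to the number it had at checkpoint time.

    current_to_original maps the fd number a resource has now to the fd
    number the application expects it to have.
    """
    if not current_to_original:
        return
    # Temporary numbers start above every fd number involved, so staging
    # can never collide with a source or a target.
    base = max(max(current_to_original), max(current_to_original.values())) + 1
    staged = {}
    for current, original in current_to_original.items():
        temp = fcntl.fcntl(current, fcntl.F_DUPFD, base)
        staged[temp] = original
    for current in current_to_original:
        os.close(current)
    for temp, original in staged.items():
        os.dup2(temp, original)  # silently replaces anything already open at `original`
        os.close(temp)
```

For example, after `r, w = os.pipe()`, calling `restore_fd_layout({r: w, w: r})` swaps the two numbers: the read end is now reachable under the old write number and vice versa.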

  13. Supported OS Features • Threads, mutexes/semaphores, fork, exec • Shared memory (via mmap), TCP/IP sockets, UNIX domain sockets, pipes, ptys, terminal modes, ownership of controlling terminals, signal handlers, open and/or shared fds, I/O (including the readline library), parent-child process relationships, process id & thread id virtualization, session and process group ids, and more… • Trying to keep the implementation small!

  14. Supported Applications • MPICH-2, OpenMPI, SciPy/iPython, Python • cmsRun, Perl, Ruby, PHP, GHCi (Glasgow Haskell Compiler), Ocaml, Octave, Macaulay2, GNUPlot, slsh (S-Lang scripts), MZScheme, GST (Gnu Smalltalk virtual machine), tcsh, dash, csh, tclsh (tcl-based interpreter), SQLite. • And many others!

  15. Planned Application Support • Bash, gcl (GNU Common Lisp), maxima (based on gcl), and the Sun JVM. • These programs use sbrk() for their own memory management and induce a bug in DMTCP. • A fix is planned and will go in soon.

  16. Planned Application Support • Matlab • Directly calling the binary without graphics works, but Matlab uses bash, which needs the sbrk() fix.

  17. Condor/DMTCP Integration • Experimental at this time. • Determining scalability, stability, and extent of “weird edge cases” of DMTCP mixed with Condor. • Completely outside of Condor source code. • A vanilla job called “shim_dmtcp” that wraps the user’s job and stdfiles with DMTCP. • A submit description file which transfers needed dmtcp files over to the remote side and saves intermediate checkpoints. • No remote I/O!

  18. Shim Script Execution • [Diagram with boxes labeled condor_starter, shim_dmtcp, Job, and Coordinator.]

  19. Submit File Example
      universe = vanilla
      executable = shim_dmtcp
      arguments = logfile stdinf stdoutf stderrf a.out arg0 arg1…
      should_transfer_files = YES
      when_to_transfer_output = ON_EXIT_OR_EVICT
      transfer_input_files = <dmtcp libraries and programs>,\
                             a.out, stdinf, stdoutf, stderrf
      environment = DMTCP_TMPDIR=./;JALIB_STDERR_PATH=/dev/null
      kill_sig = 2
      output = shim.$(Cluster).$(Process).out
      error = shim.$(Cluster).$(Process).err
      log = shim.log
      queue

  20. Condor/DMTCP Integration • Early results: it works with our test case and thousands of jobs. • Problems: checkpointing between Physical Address Kernels and normal kernels is a challenge; DMTCP’s API needs some improvement; coordinator failure means job failure; shim script is clunky, e.g. no streaming I/O. • Next: Integration into our stduniv test suite for full regression testing.

  21. Future Condor Integration • Add WantCheckpoint = True and CheckpointMethod = DMTCP for a vanilla universe job. • Condor takes care of wrapping the job with DMTCP and transferring the needed DMTCP files, with no shim script voodoo. • Condor should honor CheckpointPlatform for vanilla universe jobs in case of pool segmentation. • Parallel universe support with a single coordinator. • Doug Thain’s Parrot for remote I/O.

  22. Challenges • C/C++ runtime library compatibility issues. • Recompile DMTCP on slot before job execution? • Dynamic library incompatibilities. • No Checkpoint Server. • Condor file transfer protocol enhancement? • Debugging methods and practices?

  23. Further Reading • “DMTCP: Transparent Checkpointing for Cluster Computation and the Desktop”: http://arxiv.org/abs/cs/0701037 • Source code: http://dmtcp.sourceforge.net

  24. Questions? • DMTCP • http://dmtcp.sourceforge.net • Gene Cooperman: gene@ccs.neu.edu • Condor/DMTCP Integration • Pete Keller: psilord@cs.wisc.edu • Ask me if you want to try the Alpha Version out!

  25. Thank you