Master-Worker Framework for Condor - Tutorial and Agenda Overview
This tutorial outlines the Master-Worker (M-W) framework for Condor, focusing on its practical applications, including when to use it and how to build a basic M-W application. It addresses key weaknesses in Condor such as handling short jobs and dynamic parallel workflows. The framework facilitates the management of multiple tasks by reducing overhead in job execution and improves efficiency through lightweight task multiplexing. Key components, coding practices, and debugging methods are discussed, along with example applications like matrix multiplication and knapsack problems.
Presentation Transcript
Agenda • What is M-W • When to use M-W • How to build a simple M-W application • Q & A
Why M-W? • M-W addresses a weakness in Condor: • Short jobs • Also useful for dynamic, parallel workflows
An easy solution: • Why not just wrap up smaller jobs into a bigger Condor job? • Partial failures? • Load balancing? • Dynamic creation of work?
Solution: Lightweight Tasks Multiplexed on Top of Jobs • Process : Thread :: Condor Job : MW Task • An MW task dispatches in milliseconds; dispatching a Condor job can take minutes
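To make the milliseconds-versus-minutes point concrete, here is a back-of-the-envelope sketch. The function names and all of the numbers (1000 tasks, 1 second each, 60 seconds of job startup, 5 ms of task dispatch) are assumptions chosen for illustration, not measurements from Condor or MW.

```cpp
#include <cassert>

// Wall-clock seconds on one slot when every task is its own Condor job:
// each task pays the full job startup cost.
double job_per_task_seconds(long n, double task_s, double job_startup_s) {
    assert(n >= 0);
    return n * (task_s + job_startup_s);
}

// Wall-clock seconds on one slot when one worker job runs all the tasks:
// the job startup is paid once, and each task pays only the dispatch cost.
double multiplexed_seconds(long n, double task_s, double job_startup_s,
                           double dispatch_s) {
    assert(n >= 0);
    return job_startup_s + n * (task_s + dispatch_s);
}
```

Under those assumed numbers, job_per_task_seconds(1000, 1.0, 60.0) comes to 61000 seconds, while multiplexed_seconds(1000, 1.0, 60.0, 0.005) comes to roughly 1065 seconds.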
MW is… • A C++ framework • That reuses Condor worker jobs • So that each job runs many tasks • Resulting in a highly parallel application
MW is not… • MPI • A general parallel programming scheme
MW in action • [Diagram: a master exe on the submit machine uses condor_submit to start workers and hands tasks (T) to them]
You Must Write 3 Classes • Subclasses of … • MWDriver • MWTask • MWWorker • [Diagram: the classes are built into a master exe and a worker exe]
Your_MWTask • Subclass MWTask • Data members for inputs • Data member for results • Serialization of inputs and results • Distinct instances on each side
The Four Task Methods • void MyTask::pack_work(void); • void MyTask::unpack_work(void); • void MyTask::pack_results(void); • void MyTask::unpack_results(void); • Also ctor/dtor!
RMComms • Abstraction for communication • (and some other stuff…) • RMC->pack(int *array, int length); • RMC->unpack(int *array, int length);
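The four task methods and the pack/unpack calls fit together as a round trip. The sketch below is a self-contained stand-in, not real MW code: FakeComm, SumTask, and round_trip_sum are hypothetical names, and FakeComm only mimics the pack(int *, int) / unpack(int *, int) shape shown on the slide by buffering values in memory. A real task would subclass MWTask and go through the RMC pointer instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the RMComm layer: a FIFO of ints in memory.
class FakeComm {
public:
    void pack(const int *array, int length) {
        assert(length >= 0);
        buf_.insert(buf_.end(), array, array + length);
    }
    void unpack(int *array, int length) {
        assert(length >= 0);
        for (int i = 0; i < length; ++i) array[i] = buf_.at(read_pos_++);
    }
private:
    std::vector<int> buf_;
    std::size_t read_pos_ = 0;
};

// Hypothetical task: sum a small fixed-size block of integers.
class SumTask {
public:
    static const int N = 4;
    int input[N] = {0, 0, 0, 0};  // data members for inputs
    int result = 0;               // data member for the result

    void pack_work(FakeComm &c)      { c.pack(input, N); }      // master sends inputs
    void unpack_work(FakeComm &c)    { c.unpack(input, N); }    // worker receives inputs
    void pack_results(FakeComm &c)   { c.pack(&result, 1); }    // worker sends result
    void unpack_results(FakeComm &c) { c.unpack(&result, 1); }  // master receives result
};

// Master packs the inputs, the worker computes, the master unpacks the result.
int round_trip_sum(const int (&values)[SumTask::N]) {
    SumTask master_side, worker_side;  // distinct instances on each side
    for (int i = 0; i < SumTask::N; ++i) master_side.input[i] = values[i];

    FakeComm to_worker, to_master;
    master_side.pack_work(to_worker);
    worker_side.unpack_work(to_worker);
    for (int v : worker_side.input) worker_side.result += v;
    worker_side.pack_results(to_master);
    master_side.unpack_results(to_master);
    return master_side.result;
}
```

The two SumTask objects never share memory. Everything the worker needs must pass through pack_work, and everything the master needs back must pass through pack_results.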
MWWorker • Just one method: • executeTask(MWTask *t) • Also ctor/dtor!
MWDriver • get_userinfo(int argc, char **argv) • RMC->add_executable(char *exe, char *requirements); • setup_initial_tasks(int num_tasks, MWTask ***init_tasks) • act_on_completed_task(MWTask *t) • RMC->add_task(MWTask *t) • Also ctor/dtor
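The division of labor between worker and driver can be sketched in a single process. This is a simplified illustration under stated assumptions: SquareTask, SquareWorker, and SquareDriver are hypothetical stand-ins that borrow the method names from the slides but not the real MW signatures, and the run loop replaces the dispatch that MW itself would perform.

```cpp
#include <cassert>
#include <deque>

// Hypothetical task: square one integer.
struct SquareTask {
    int input = 0;
    int result = 0;
};

// Plays the role of an MWWorker subclass: one method that does the work.
struct SquareWorker {
    void executeTask(SquareTask *t) { t->result = t->input * t->input; }
};

// Plays the role of an MWDriver subclass.
class SquareDriver {
public:
    // Create the first batch of work (inputs 1..num_tasks).
    void setup_initial_tasks(int num_tasks) {
        for (int i = 1; i <= num_tasks; ++i) add_task(i);
    }
    // Fold in a finished result and, for even inputs, create follow-up work.
    void act_on_completed_task(SquareTask *t) {
        total_ += t->result;
        ++completed_;
        if (t->input % 2 == 0) add_task(t->input / 2);  // dynamic creation of work
    }
    void add_task(int input) {
        assert(input > 0);
        SquareTask t;
        t.input = input;
        todo_.push_back(t);
    }
    // Stand-in for the framework's dispatch loop; returns the sum of results.
    int run(SquareWorker &w) {
        while (!todo_.empty()) {
            SquareTask t = todo_.front();
            todo_.pop_front();
            w.executeTask(&t);
            act_on_completed_task(&t);
        }
        return total_;
    }
    int completed() const { return completed_; }
private:
    std::deque<SquareTask> todo_;
    int total_ = 0;
    int completed_ = 0;
};
```

Starting from four initial tasks, the driver ends up completing seven, because act_on_completed_task adds new tasks while the run is in progress. This is the dynamic creation of work that a single wrapped-up Condor job cannot easily express.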
Putting it all together: new_skel • ./new_skel MY_PROJECT • Use configure --help for options • make
Debugging with Independent Mode • Special RMComm for debugging • Single process, can run under gdb
Running on the Grid… • Just launch the appropriate master • Run condor_q to see it in action
Advice for Large Runs • Use personal condor • Flock, glide-in, schedd-on-side, hobblein • Use checkpointing! • Set_worker_increment high
User-level Checkpointing • MWTask::write_chkpt_info(FILE *) • MWTask::read_chkpt_info(FILE *) • MWDriver::read_master_state(FILE *) • MWDriver::write_master_state(FILE *)
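A task-level checkpoint pair might look like the sketch below. CountTask and checkpoint_round_trip are hypothetical names, the text format is an arbitrary choice, and read_chkpt_info returns bool here for the sake of the demonstration, which may differ from the real MWTask signature. The only point illustrated is that the two methods must agree on a format.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical task state worth saving; a real task would subclass MWTask.
struct CountTask {
    int next_index = 0;    // where to resume
    long partial_sum = 0;  // work done so far

    // Write the state as one line of text.
    void write_chkpt_info(FILE *fp) const {
        assert(fp != nullptr);
        std::fprintf(fp, "%d %ld\n", next_index, partial_sum);
    }
    // Read the state back in the same order it was written.
    bool read_chkpt_info(FILE *fp) {
        assert(fp != nullptr);
        return std::fscanf(fp, "%d %ld", &next_index, &partial_sum) == 2;
    }
};

// Save a task to a temporary file, restore it into a fresh task, and compare.
bool checkpoint_round_trip(int index, long sum) {
    CountTask before;
    before.next_index = index;
    before.partial_sum = sum;

    FILE *fp = std::tmpfile();
    if (fp == nullptr) return false;
    before.write_chkpt_info(fp);
    std::rewind(fp);

    CountTask after;
    bool ok = after.read_chkpt_info(fp);
    std::fclose(fp);
    return ok && after.next_index == index && after.partial_sum == sum;
}
```

The master-state pair (read_master_state / write_master_state) would follow the same pattern for whatever the driver accumulates across tasks.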
Example codes with MW • Matmul • Blackbox • knapsack
MW Philosophy • Reuse either code or concept • Key idea: Late binding
Other resources • http://www.cs.wisc.edu/condor/mw • Online manual • MW-users mailing list
Thank You! Questions? MW Home page: http://www.cs.wisc.edu/condor/mw