Terry L. Wilmarth, Nilesh Choudhury, David Kunzman, Eric Bohm Parallel Programming Laboratory University of Illinois at Urbana-Champaign Creating Simulations with POSE
Inspiration • Parallel discrete event simulation (PDES) of fine-grained tasks is notoriously difficult to scale • Some classes of applications parallelize well with conservative synchronization, but most do not scale well • Optimistic synchronization coupled with Charm++'s virtualization, load balancing, communication optimization, etc. enables scalable general-purpose PDES (fine-grained or not)
POSE 0.5a Prerequisites • Parallel discrete event simulation • C++, Charm++ • Minimal understanding of parallel programming • Some understanding of the optimistic synchronization mechanism for PDES
Designing a POSE Simulation • Decompose system to be modeled into its concurrent entities: posers • Determine how the entities interact with discrete events: event methods • Determine initial placement of entities • Some parallel computing savvy required to maximize achievable parallelism
POSE to Charm++ • Posers: special Charm++ chares that communicate via timestamped messages executed in order • Event methods: Charm++ entry methods that receive timestamped event messages • Strategy: method for synchronizing posers in parallel
Optimistic Synchronization • Events received by posers are queued by timestamp in the poser's event queue • Events are executed in timestamp order according to the synchronization strategy • The poser's state is periodically checkpointed • When a straggler (an event with a timestamp earlier than events already executed) arrives, those events are rolled back, the events they spawned are cancelled, and the poser's state is recovered from a checkpoint
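The rollback cycle described on this slide can be sketched in plain C++. This is a toy model under stated assumptions, not POSE code: Event, ToyPoser, and their members are hypothetical names, a checkpoint is taken before every event rather than periodically, and cancellation of spawned events is left out. The state records execution order as decimal digits so that the effect of a rollback is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names are POSE API.
struct Event {
    int timestamp;  // virtual time at which the event should execute
    int id;         // single digit appended to the state when the event runs
};

struct ToyPoser {
    int state = 0;                 // execution order encoded as decimal digits
    std::vector<Event> executed;   // events executed so far, in timestamp order
    std::vector<int> checkpoints;  // state saved just before each executed event
    int rollbacks = 0;

    // Execute one event optimistically, checkpointing the state first.
    void execute(const Event& e) {
        checkpoints.push_back(state);
        state = state * 10 + e.id;
        executed.push_back(e);
    }

    // Deliver an event. A straggler (timestamp earlier than the last executed
    // event) forces later events to be undone and then re-executed after it.
    void receive(const Event& e) {
        std::vector<Event> redo;
        if (!executed.empty() && executed.back().timestamp > e.timestamp) {
            ++rollbacks;
        }
        while (!executed.empty() && executed.back().timestamp > e.timestamp) {
            redo.push_back(executed.back());
            state = checkpoints.back();  // recover the checkpointed state
            executed.pop_back();
            checkpoints.pop_back();
        }
        execute(e);
        // redo holds the undone events newest-first; replay them oldest-first.
        for (std::size_t i = redo.size(); i-- > 0;) {
            execute(redo[i]);
        }
    }
};
```

Delivering events with timestamps 10, 30, and then 20 makes the third one a straggler: the event at 30 is undone, the event at 20 runs, and the event at 30 is re-executed, so the final state matches a strictly in-order execution.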
Optimistic Synchronization in POSE • After handling a straggler, forward execution proceeds as before • Fossil collection: checkpoint memory is recovered when no longer needed, i.e. when the checkpoint is older than GVT • GVT (global virtual time): the minimum virtual time in the entire simulation
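As a rough illustration of GVT and fossil collection, the sketch below computes a toy GVT as a simple minimum and then frees checkpoints older than it. The names estimateGVT, Checkpoint, and fossilCollect are hypothetical, and the sketch assumes all local times and in-transit timestamps are available in one place; how POSE actually estimates GVT across processors is not shown on the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration; this is not the POSE GVT algorithm.

// Toy GVT: the minimum over every poser's local virtual time and the
// timestamps of events that are still in transit.
int estimateGVT(const std::vector<int>& localTimes,
                const std::vector<int>& inTransit) {
    assert(!localTimes.empty());
    int gvt = *std::min_element(localTimes.begin(), localTimes.end());
    for (int t : inTransit) {
        gvt = std::min(gvt, t);
    }
    return gvt;
}

struct Checkpoint {
    int time;   // timestamp of the event executed right after this checkpoint
    int state;  // saved poser state
};

// Fossil collection: drop checkpoints older than GVT, since no rollback can
// reach back that far. Returns the number of checkpoints freed.
std::size_t fossilCollect(std::vector<Checkpoint>& cps, int gvt) {
    const std::size_t before = cps.size();
    cps.erase(std::remove_if(cps.begin(), cps.end(),
                             [gvt](const Checkpoint& c) { return c.time < gvt; }),
              cps.end());
    return before - cps.size();
}
```

With local times 40, 25, 60 and in-transit timestamps 30 and 22, the toy GVT is 22, so checkpoints taken at times 10 and 20 can be freed while those at 22 and 35 must be kept.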
Using POSE • POSE code translated to Charm++ • Same program structure: .ci, .h, .C • Same program code can be used to run sequential or parallel simulations • Highly configurable
Code Sample: Poser & Event Messages, .ci file
    message WorkerData;
    message WorkMsg;

    poser worker : sim adapt4 chpt {
      entry worker(WorkerData *);
      // Event methods
      entry [event] void work(WorkMsg *);
    };
The .ci File • Declare event message types: message WorkMsg; • Declare posers with synchronization strategy and representation type: poser worker : sim adapt4 chpt {... • Declare event methods for posers, each taking an event message as its parameter: entry [event] void work(WorkMsg *);
Code Sample: Poser Declaration & Event Methods, .h file
    class worker {
      int someIntData;
    public:
      worker();
      worker(WorkerData *m);
      ~worker();
      worker& operator=(const worker& obj);
      void pup(PUP::er &p);
      // Event methods
      void work(WorkMsg *m);
      void work_anti(WorkMsg *m);
      void work_commit(WorkMsg *m);
    };
The .h File • Define posers and their state (a portion of the global state): class worker { int someIntData; ... • Declare any local helper methods required • Declare constructors and required methods: • Basic constructor, destructor, assignment operator, pup method
The .h File, cont'd
• Declare event methods and corresponding anti-methods and commit methods:
    void work(WorkMsg *m);
    void work_anti(WorkMsg *m);
    void work_commit(WorkMsg *m);
• Anti-methods provide an alternative mechanism to checkpointing that allows the user to undo the state changes of an event method
The .h File, cont'd • Commit methods are executed when fossil collection is about to free the memory of a checkpointed state and “commit” to the execution of an event • i.e. a committed event can't be rolled back • Useful for statistics collection, I/O, or any other activity that should only happen once
Code Sample: Poser Constructor & Event Method Invocation, .C file
    worker::worker(WorkerData *m) {
      someIntData = m->someData;
      delete m;  // constructor messages are deleted after use
      POSE_srand(myHandle);
      WorkMsg *wm;
      if (myHandle == 0) {  // only the poser with handle 0 sends the first event
        wm = new WorkMsg;
        wm->someIntData = someIntData;
        POSE_invoke(work(wm), worker, POSE_rand()%42, POSE_rand()%10);
      }
    }
The .C File
• Constructor receives message, uses data, deletes message (not true for event methods!):
    worker::worker(WorkerData *m) {
      someIntData = m->someData;
      delete m;
      ...
• Every poser has a handle, myHandle:
    if (myHandle == 0) { ...
The .C File, cont'd • The user decides what handle each poser has at construction time • Any poser can invoke events on another poser as long as it knows the destination poser's handle • Random number generation in POSE repeats the same sequence in case of rollback and re-execution
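One way to get a repeatable random sequence across rollback is to keep the generator inside the state that gets checkpointed. The sketch below shows that idea in standard C++; it is an assumption about the general technique, not a description of how POSE_rand is implemented, and ToyState and randomEvent are hypothetical names.

```cpp
#include <cassert>
#include <random>

// Hypothetical illustration; not how POSE_rand is necessarily implemented.
struct ToyState {
    std::minstd_rand rng;  // the generator is part of the checkpointed state
    int value = 0;
    explicit ToyState(unsigned seed) : rng(seed) {}
};

// A toy "event" that consumes one random number and stores it in the state.
int randomEvent(ToyState& s) {
    s.value = static_cast<int>(s.rng() % 42);
    return s.value;
}
```

Copying a ToyState before running events acts as a checkpoint. Assigning the copy back rolls the generator back as well, so re-executed events draw the same numbers as the first time.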
The .C File, cont'd
• Event method invocation:
    POSE_invoke(event_method(event_msg), poser_type, dest_handle, transit_time);
• event_msg is timestamped with OVT + transit_time when it arrives on poser dest_handle:
    wm = new WorkMsg;
    wm->someIntData = someIntData;
    POSE_invoke(work(wm), worker, POSE_rand()%42, POSE_rand()%10);
• Each poser has its own virtual time, OVT; posers' OVTs can be out of sync
Code Sample: Event Methods, .C file
    void worker::work(WorkMsg *m) {
      WorkMsg *wm = new WorkMsg;  // allocate the outgoing event message
      wm->someIntData = m->someIntData + someIntData;
      // fake computation
      POSE_busy_wait(1000);
      elapse(27);
      POSE_invoke(work(wm), worker, POSE_rand()%42, POSE_rand()%10);
      // m is not deleted here: the event may be rolled back and re-executed
    }
    void worker::work_anti(WorkMsg *m) { restore(this); }
    void worker::work_commit(WorkMsg *m) { }
Passing Virtual Time • We've seen how to make an event happen in the future via the transit_time parameter to POSE_invoke • Elapse time on a poser: elapse(27); increments the poser's OVT by 27 • Auto-elapse: when a poser receives an event at time t > OVT, the poser's OVT advances to t
Passing Virtual Time, cont'd • What if t < OVT? • If the received event is inserted in the event queue before other executed events, it causes a rollback • If not, the event is handled at time OVT, not at time t (events earlier than t kept the poser busy until time OVT)
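The virtual-time rules above, leaving out the rollback case, fit in a few lines. ToyClock and its members are hypothetical names that only mirror the behaviour described on these slides; they are not POSE's implementation.

```cpp
#include <cassert>

// Hypothetical model of a poser's virtual clock; not POSE API.
struct ToyClock {
    int ovt = 0;  // this poser's own virtual time

    // Handle an event stamped t that does not cause a rollback.
    // Returns the virtual time at which the event is actually handled.
    int handle(int t) {
        if (t > ovt) {
            ovt = t;  // auto-elapse: jump forward to the event's timestamp
        }
        return ovt;   // if t < ovt, the event is handled at ovt, not at t
    }

    // Explicitly pass virtual time, as an event method does with elapse(27).
    void elapse(int dt) { ovt += dt; }

    // Timestamp that an event sent now carries on arrival at its destination.
    int arrivalTimestamp(int transitTime) const { return ovt + transitTime; }
};
```

For example, handling an event stamped 10 moves the clock to 10, elapse(27) moves it to 37, and a later event stamped 20 that does not trigger a rollback is then handled at 37.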
Event Methods and Event Messages • When an event message arrives, it is queued on the destination poser as an event to be executed • The actual message is stored in the queue along with any additional information associated with the event • Because the event may be rolled back and re-executed, the message must not be deleted in the event method
Anti-methods
• Typically, anti-methods only restore the checkpointed state:
    void worker::work_anti(WorkMsg *m) { restore(this); }
• But they can be used instead of checkpointing to undo simple state changes:
    void myClass::toggleFlag_anti(eventMsg *m) {
      flag = flag ? 0 : 1;  // toggle the flag back
      restore(this);
    }
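The idea of undoing an event by applying its inverse, rather than by restoring a saved copy of the state, can be shown without any POSE machinery. In this sketch ToyCounter, bump, and bump_anti are hypothetical names, and the amount passed to the anti-method stands in for the event message that POSE hands back to an anti-method.

```cpp
#include <cassert>

// Hypothetical illustration of an event method paired with its inverse.
struct ToyCounter {
    int count = 0;
    int flag = 0;

    // Forward "event method": a simple, easily reversible state change.
    void bump(int amount) {
        count += amount;
        flag = flag ? 0 : 1;
    }

    // "Anti-method": undoes exactly what bump(amount) did, without needing
    // a stored checkpoint of the whole state.
    void bump_anti(int amount) {
        flag = flag ? 0 : 1;
        count -= amount;
    }
};
```

Anti-methods of this kind must be applied in the reverse order of the events they undo, and they only pay off when the inverse is cheap and exact. For state changes that are hard to invert, restoring a checkpoint is the simpler choice.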
Output • Printing information about progress, statistics or debugging data in PDES can be confusing in the face of rollbacks • CommitPrintf(...): buffers event execution output until the event is committed • CommitError(...): buffers error statements and aborts if an event results in an error that is committed
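Commit-time output can be pictured as a buffer keyed by event timestamp: text is held until the event that produced it can no longer be rolled back. The sketch below is a minimal model of that idea; CommitLog and its members are hypothetical names and do not reflect how CommitPrintf is implemented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of commit-buffered output; not the POSE implementation.
struct CommitLog {
    std::multimap<int, std::string> pending;  // event timestamp -> buffered text
    std::vector<std::string> committed;       // text that has really been emitted

    // Called while an event executes: buffer the text instead of printing it.
    void print(int timestamp, const std::string& text) {
        pending.emplace(timestamp, text);
    }

    // A straggler at time ts undoes later events, so their output is dropped.
    void rollback(int ts) {
        pending.erase(pending.upper_bound(ts), pending.end());
    }

    // Events older than gvt can no longer be rolled back: release their output.
    void commit(int gvt) {
        auto end = pending.lower_bound(gvt);
        for (auto it = pending.begin(); it != end; ++it) {
            committed.push_back(it->second);
        }
        pending.erase(pending.begin(), end);
    }
};
```

Output from an event that is later rolled back never reaches the committed list, so each line appears exactly once, and in timestamp order, no matter how many times the event was re-executed.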
A Main Program
• Programs that use posers are pure Charm++; they are not translated
• In main, just before creating posers, call POSE_init() to start the simulation
• Then inject posers into the system:
    WorkerData *wd;
    for (int i=0; i<42; i++) {
      wd = new WorkerData;
      wd->Timestamp(0);
      int dest = rand() % CkNumPes();
      (*(CProxy_worker *) &POSE_Objects)[i].insert(wd, dest);  // i is this poser's handle
    }
A Main Program, cont'd • The user must timestamp the constructor message and create the object with Charm++ syntax • Under the hood, this constructs a worker in a chare array at index i on processor dest
POSE_init()
• Initialization and simulation start-up
• Can optionally specify endTime, a virtual time at which to halt the simulation
• Can optionally specify whether to use inactivity detection, which terminates the simulation if no events are handled for some period of time
    void POSE_init();
    void POSE_init(int ET);
    void POSE_init(int IDflag, int ET);
Choosing a Synchronization Strategy • POSE offers a wide variety of synchronization strategies ranging from conservative to aggressively optimistic • Each type of poser can use the strategy best suited to its behavior, named in its declaration: poser worker : sim adapt4 chpt { ... • Strategy families: opt*, spec, adapt*
Choosing a Synchronization Strategy • opt*: basic optimistic synchronization, throttled and unthrottled • spec: throttled optimism, aggressive speculation • adapt*: optimism and speculation adapt to recent behavior of poser
Getting and Using POSE • Got Charm++? You've got POSE. • build pose ... • etrans.pl [-s] Worker • Translates Worker.* to Worker_sim.* • Parallel simulation: charmc ... -module pose -language charm++ • Sequential simulation: charmc ... -module seqpose -language charm++
Applications • VHDL Simulation: David Kunzman • Big Network Simulation: Nilesh Choudhury