Sending Commands and Managing Processes across the BaBar OPR Unix Farm through C++ and CORBA. Tom Glanzman (SLAC) on behalf of Gilbert Grosdidier (LAL-Orsay) (for the BaBar Prompt Reconstruction and Computing Groups). Paper #161 - CHEP 2000 - Padova.
Context of the project • Online Prompt Reconstruction (OPR) • Unix distributed farm • typically 100-200 nodes • Processing the BaBar raw data events • within a projected latency of about 8 hours • The purpose of this project was to build a tool able to launch, monitor and control remote processes inside this OPR farm • It was started at the beginning of October 1999 • It is called GFD (Global Farm Daemon)
Functional Design • [Diagram] Command node: GFD-Client • 2-way CORBA call between the GFD-Client and the GFD on each compute node • OPR farm: 200 compute nodes, each running one GFD and its processes, including detached processes
Internal Layout • [Diagram] Client stack: Perl wrapper, Gfd-Client (C++), CORBA (TAO), ACE, sockets, Unix OS • Server stack: GFD server (C++), CORBA (TAO), ACE, sockets, Unix OS • Driver (fork, exec) • Operations shown: update cmd list, shutdown, launch procs, run Unix cmds, kill procs
Design Requirements • This command layer was required to be: • Fast: broadcasting to the whole farm in a few seconds • Lightweight: the whole system must remain very simple • Flexible: one can build sophisticated macros • Robust: unreliable nodes do not interfere with others • Improving process control: • all processes on a compute node belong to one userid • Reasonably secure: • limited command library with aliases • ACLs specific to each command • Reliable: started and monitored by a cron job • Scalable: must scale with the number of compute nodes to reach
Design Components • Overall BaBar context was: • OO/C++, Distributed Computing, Unix • Given the OPR context, we chose: • CORBA for the message layer • ACE/TAO as a C++ CORBA API • Current versions built & running through: • Solaris 2.6 • using native C++ (4.2) compiler • ACE 5.04 & TAO 1.04 • using native Solaris threads
The Server components • A set of high level TAO wrappers • provided by BaBar (see other talk) • The Command Library • It is a human readable file, and contains: • alias name, together with complete command definition • options and parameters, in case of a macro • diverted log directory name (to store the results) • list of users allowed to access the command (ACL) • A special command reloads it after an update
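The slides do not show the Command Library file format, so the following is only a minimal sketch of how one entry could be parsed and checked against its ACL. The `alias | command | log directory | user1,user2` layout, the `#` comment marker, and the names `CommandEntry`, `parseLibraryLine` and `isAllowed` are all assumptions made for illustration, not the GFD format or API.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one Command Library entry.
struct CommandEntry {
    std::string alias;                      // short name sent by the client
    std::string command;                    // complete command definition
    std::string logDir;                     // directory receiving diverted output
    std::vector<std::string> allowedUsers;  // ACL for this command
};

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Split on a separator and trim every field.
static std::vector<std::string> split(const std::string& s, char sep) {
    std::vector<std::string> parts;
    std::string item;
    std::istringstream in(s);
    while (std::getline(in, item, sep)) parts.push_back(trim(item));
    return parts;
}

// Parse one line of the assumed format; blank lines, comments and
// malformed lines yield std::nullopt.
std::optional<CommandEntry> parseLibraryLine(const std::string& line) {
    const std::string t = trim(line);
    if (t.empty() || t[0] == '#') return std::nullopt;
    const auto fields = split(t, '|');
    if (fields.size() != 4 || fields[0].empty() || fields[1].empty())
        return std::nullopt;
    CommandEntry entry{fields[0], fields[1], fields[2], {}};
    for (const auto& user : split(fields[3], ','))
        if (!user.empty()) entry.allowedUsers.push_back(user);
    return entry;
}

// ACL check: only users listed for the command may run it.
bool isAllowed(const CommandEntry& entry, const std::string& user) {
    return std::find(entry.allowedUsers.begin(), entry.allowedUsers.end(),
                     user) != entry.allowedUsers.end();
}
```

With this assumed layout, a line such as `sysinfo | uname -a | /tmp/gfd-logs | alice, bob` would yield an entry that `alice` and `bob` may run and nobody else. A command definition containing `|` would need a different separator or an escaping rule, which the sketch does not attempt.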
The Server components (2) • Returning information to the Client • Commands are mainly run asynchronously • as a background process, not waiting for the output • the output is logged to a file, whose name is returned to the client • Other modes are available, for special use only • The Command Processor • Authentication layer • A few commands are caught and processed directly • Option parsing, allowing utility switches • Execution layer: • the command string is built and wrapped into a "system()"-like call
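The execution layer described above (build the command string, run it in the background, log its output to a file) can be sketched as follows. This is not the GFD code: `std::system` stands in for the "system()"-like call, the redirection and `&` assume a POSIX shell, and the names `shellQuote`, `buildShellCommand` and `runLogged` are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Quote a string for a POSIX shell: wrap it in single quotes and
// replace each embedded single quote with '\''.
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Build the shell line: run the command in a subshell, send stdout and
// stderr to the log file, and optionally detach it with '&'.
std::string buildShellCommand(const std::string& command,
                              const std::string& logFile,
                              bool background) {
    assert(!logFile.empty());
    std::string line =
        "( " + command + " ) > " + shellQuote(logFile) + " 2>&1";
    if (background) line += " &";
    return line;
}

// Hand the line to the shell. In background mode the call returns as soon
// as the shell has started the job, so the caller can reply to the client
// with the log file name instead of waiting for the output.
int runLogged(const std::string& command, const std::string& logFile,
              bool background) {
    return std::system(
        buildShellCommand(command, logFile, background).c_str());
}
```

For example, `buildShellCommand("uname -a", "/tmp/gfd.log", true)` produces `( uname -a ) > '/tmp/gfd.log' 2>&1 &`. The command text itself is passed through unquoted, which is only acceptable because it would come from a vetted command library rather than from the client.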
The Client • It is used for many very different purposes: • check the status of the servers • launch and monitor specific tasks on all farm nodes • stop or kill some remote processes • Two versions of the client coexist • both accessible through the same Perl wrapper • a single-threaded version targets a single node • a multi-threaded one tackles a string of nodes in one shot • the client uses a non-blocking loop to contact the GFDs • no delays if a misbehaving GFD is seen
The Client components • The functions achieved by the client program: • select the GFDs and collect their TAO IORs • send the command alias and options through a CORBA call to every GFD • receive the returned data from the same CORBA call • and process it, if any • The multi-threaded version: • saves resources: memory, CPU time, name server calls • but it requires subtle and thoughtful coding. Some traps: • TAO initialisation at run-time to be MT safe • Avoid use of special CORBA types outside message handling • Check every utility or tool for MT-safety, or move it out of the thread
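The multi-threaded fan-out can be illustrated without any CORBA code. In the sketch below, one standard C++ thread per node calls a caller-supplied function that stands in for the CORBA call to a GFD; each thread writes only to its own result slot, so no locking is needed, and a failure on one node is recorded without affecting the others. `NodeReply` and `broadcast` are hypothetical names, and the real client is built on ACE/TAO rather than `std::thread`.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Result collected for one node.
struct NodeReply {
    std::string node;
    bool ok = false;
    std::string text;  // returned data on success, error message on failure
};

// Contact every node in its own thread and gather the replies in node order.
// 'call' stands in for the remote call to the GFD on that node.
std::vector<NodeReply> broadcast(
    const std::vector<std::string>& nodes,
    const std::function<std::string(const std::string&)>& call) {
    assert(static_cast<bool>(call));
    std::vector<NodeReply> replies(nodes.size());
    std::vector<std::thread> workers;
    workers.reserve(nodes.size());
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        workers.emplace_back([&replies, &nodes, &call, i] {
            NodeReply& reply = replies[i];  // slot owned by this thread only
            reply.node = nodes[i];
            try {
                reply.text = call(nodes[i]);
                reply.ok = true;
            } catch (const std::exception& e) {
                reply.text = e.what();      // a bad node fails alone
            } catch (...) {
                reply.text = "unknown error";
            }
        });
    }
    for (auto& worker : workers) worker.join();
    return replies;
}
```

This sketch still waits for the slowest node in `join()`; avoiding delays from a misbehaving GFD, as the client slides describe, would additionally need a timeout on the remote call itself.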
Command execution mode
Driver execution mode (achieved on the GFD server) | Command output | Returned to the client through CORBA | Command completion acknowledge through CORBA
Synchronously | Logged | Log filename (optional: full command output) | Yes, by GFD in case of failure
Asynchronously | Logged | Nothing (optional: log filename) | Yes, by GFD in case of failure
Performance • Issuing "uname -a" to 100 nodes synchronously requires, in elapsed time: • 230 sec. when using "ssh -x" • 25 sec. when using the single-threaded client • the Perl wrapper performing the loop over the nodes • 7.6 sec. when running a multi-node client • the client performing the loop sequentially over the nodes • 4.7 sec. when running the multi-threaded client • This MT client is scalable, and was extensively tested with GFDs running over 250 nodes
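The timings quoted above translate into speedups of roughly 49x over "ssh -x", 5.3x over the single-threaded client and 1.6x over the sequential multi-node client, or about 0.047 sec. per node for the multi-threaded client against 2.3 sec. per node for ssh. The helper names below are hypothetical; the snippet only restates that arithmetic.

```cpp
#include <cassert>

// Ratio of a baseline elapsed time to a measured elapsed time.
double speedup(double baselineSeconds, double measuredSeconds) {
    assert(measuredSeconds > 0.0);
    return baselineSeconds / measuredSeconds;
}

// Average elapsed time spent per node for a full broadcast.
double perNodeSeconds(double elapsedSeconds, int nodes) {
    assert(nodes > 0);
    return elapsedSeconds / nodes;
}
```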
Conclusions • GFDs in production for 3 months now: robust & reliable • Most of the required functionalities implemented and running • We have demonstrated successful use of CORBA in farm management • However, carefully consider the use of CORBA (ACE+TAO) for a large project • Significant learning curve of weeks, not days • Documentation is weak, and not always reliable (but improving) • Fast response from ACE/TAO support team • The current project was rather limited and simple, and constituted an ideal case study for the setup of these tools
Future & Evolution • Extend the use of GFDs to all OPR servers, at least for monitoring • not only on compute nodes • Embed this system in a "Global Farm Manager", to drive/coordinate the entire farm • not only managing processes • Propose to use GFDs also in the DAQ system • not only Reconstruction farm • We are still struggling to sort out a few (?) rough edges.