BaBar MC production

The simple question: How can we run BaBar software on EDG grid sites? The slides cover the farm at VU (Amsterdam University), the EDG testbed (NIKHEF), jobs, the BaBar MC production software, a lot of computers, results, and the introduction of Parrot.


Presentation Transcript


  1. BaBar MC production
     The simple question: How can we run BaBar software on EDG grid sites?
     Diagram labels: Farm @ VU (Amsterdam University); EDG testbed (NIKHEF); Jobs; BaBar MC production software; A lot of computers; Results

  2. Introduction of Parrot
     We need transparent access to the Objectivity Database (requires local file access)
     Diagram labels: Farm @ VU (Amsterdam University); EDG testbed (NIKHEF); Jobs; BaBar MC production software; A lot of computers; Chirp; Parrot; Results

  3. Parrot functionality
     • Application: BaBar MC production (POSIX Interface, Ptrace trap)
     • The Parrot Virtual File System
     • Protocol drivers: HTTP, FTP, RFIO, NeST, Chirp, Condor Proxy
     • I/O styles: Whole File I/O (get/put); Partial File I/O (open, close, read, write, lseek); Secure Remote RPC; x509
     • Servers: HTTP Server, FTP Server, RFIO Server, NeST Server, Chirp Server, Condor Shadow
     • Service roles: Traditional I/O Services; Integration with Castor; Allocation and Mgmt; Full UNIX Semantics; Integration with Condor
     Other diagram labels: Optimize; Not yet; Local Cache
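
The virtual file system on this slide lets an unmodified program reach remote services through ordinary file paths. A minimal sketch of that dispatch idea, assuming a naming scheme in which the first path component names the protocol and the second names the host (for example `/chirp/<host>/<path>`). The function `split_virtual_path` and the host name in the usage note are hypothetical, and this is an illustration, not Parrot's code:

```python
from typing import NamedTuple, Optional

# Protocol drivers listed on the slide, lower-cased to serve as path prefixes.
PROTOCOLS = {"http", "ftp", "rfio", "nest", "chirp"}


class RemoteTarget(NamedTuple):
    protocol: str
    host: str
    path: str


def split_virtual_path(path: str) -> Optional[RemoteTarget]:
    """Return the remote target for a virtual path, or None for a local path."""
    parts = path.split("/")
    # "/chirp/host/a/b" splits into ["", "chirp", "host", "a", "b"].
    if len(parts) < 3 or parts[0] != "" or parts[1] not in PROTOCOLS or not parts[2]:
        return None
    remote_path = "/" + "/".join(parts[3:])
    return RemoteTarget(parts[1], parts[2], remote_path)
```

With this sketch, `split_virtual_path("/chirp/server.example.org/db/file.dat")` yields a Chirp target, while `split_virtual_path("/etc/passwd")` returns `None` and would be handled as a local file.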

  4. The introduction of GCB
     Diagram labels: Farm @ VU (Amsterdam University); EDG testbed (NIKHEF); BaBar MC production software; Condor-G; Jobs (×2); Results (×2); Chirp; Parrot; NFS; GCB; Relay; Private network; A lot of computers; Some computers

  5. GCB functionality
     Diagram labels: Central Manager; NAT; GCB Server; Private network; Relay; Persistent connection; nodes A, P, B

  6. The introduction of GlideIn
     • 72 hour jobs
     • Can’t wait for queues
     Diagram labels: Farm @ VU (Amsterdam University); EDG testbed (NIKHEF); PBS job manager; BaBar MC production software; Condor-G; Job Queue; GlideIn; Batch job; Jobs (×2); Results (×2); Chirp; Parrot; NFS; GCB; Relay (×3); Private network (×2); A lot of computers; Some computers

  7. GlideIn functionality

  8. Overview of complete setup
     • 72 hour jobs
     • Can’t wait for queues
     Diagram labels: Farm @ VU (Amsterdam University); EDG testbed (NIKHEF); PBS job manager; BaBar MC production software; Condor-G; Job Queue; GlideIn; Batch job; Jobs (×2); Results (×2); Chirp; Parrot; NFS; GCB; Relay (×3); Private network (×2); A lot of computers; Some computers

  9. Leave only the components
     Diagram labels: Farm @ VU (Amsterdam University); EDG testbed (NIKHEF); PBS job manager; BaBar MC production software; Queue; GlideIn; Chirp; NFS; Parrot; GCB; Private network (×2); A lot of computers; Some computers

  10. The interesting dependencies
      • Different MDS scheme
      • Objectivity database
        • LOCK server sockets
        • NFS problems
        • UID / hostname checks
      • NAT box
        • Dropping UDP packets
        • Timeout 2 minutes
          • Inactive sockets
          • Inactive File I/O
      Diagram labels: Farm @ VU (Amsterdam University); EDG testbed (NIKHEF); PBS job manager; BaBar MC production software; Queue; GlideIn; Chirp; NFS; NAT box; Parrot; GCB; Private network (×2); A lot of computers; Some computers
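
A NAT box that forgets connections after 2 minutes of inactivity will break any socket that stays quiet for longer. One common way to keep an idle TCP connection visible to such a box is kernel-level TCP keepalive. The sketch below is an assumption for illustration only: the slides do not say which mechanism was used, and the helper name `enable_keepalive` and the probe settings are hypothetical.

```python
import socket

NAT_TIMEOUT_S = 120  # the 2-minute timeout named on the slide


def enable_keepalive(sock: socket.socket, idle_s: int = NAT_TIMEOUT_S // 2) -> None:
    """Ask the kernel to send keepalive probes on an idle TCP connection."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The timing options are platform specific, so each one is set only
    # if this platform's socket module exposes it.
    for name, value in (("TCP_KEEPIDLE", idle_s), ("TCP_KEEPINTVL", 10), ("TCP_KEEPCNT", 3)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

Starting probes at half the timeout leaves room for a lost probe or two before the NAT entry would expire.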

  11. Consequences
      • Different MDS scheme
        • Implemented EDG scheme for GlideIn
      • Objectivity
        • A lot of debugging
        • Made Parrot mimic hostname and UID
        • Tricked Objectivity into using standard NFS libraries
      • Aggressive NAT box
        • Changed GCB to use TCP instead of UDP
        • Used Parrot to keep sockets alive
        • Parrot recovers File I/O when the TCP connection is lost
      • We are the first to run Objectivity cross-domain
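
Recovering file I/O after a lost TCP connection generally means reconnecting, seeking back to the last offset that was read successfully, and continuing from there. A minimal sketch of that pattern, assuming a caller-supplied `open_remote` callable that returns a fresh seekable handle; all names here are hypothetical and this does not reproduce Parrot's implementation:

```python
from typing import BinaryIO, Callable


def read_with_recovery(open_remote: Callable[[], BinaryIO], size: int,
                       max_reconnects: int = 3) -> bytes:
    """Read up to `size` bytes, reopening and seeking back after a lost connection."""
    data = bytearray()
    reconnects = 0
    handle = open_remote()
    try:
        while len(data) < size:
            try:
                chunk = handle.read(size - len(data))
            except ConnectionError:
                reconnects += 1
                if reconnects > max_reconnects:
                    raise
                handle.close()
                handle = open_remote()  # establish a new connection
                handle.seek(len(data))  # resume at the last good offset
                continue
            if not chunk:  # end of file
                break
            data.extend(chunk)
    finally:
        handle.close()
    return bytes(data)
```

The application only sees a slower read; the reconnect stays hidden inside the I/O layer, which is the property the slide attributes to Parrot.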

  12. Performance
      [Figure: time (minutes) versus number of events, with curves for production on the EDG testbed and production on a local machine]
      • Application initializes 10 times slower
      • Production 3 times slower
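
The two slowdown factors can be combined into a rough estimate of grid run time from local measurements. A small sketch, assuming the factors apply uniformly to the initialization time and to the per-event time; the function name and the input values in the usage note are hypothetical, not measurements from the slides:

```python
INIT_SLOWDOWN = 10       # "Application initializes 10 times slower"
PRODUCTION_SLOWDOWN = 3  # "Production 3 times slower"


def estimate_grid_minutes(local_init_min: float, local_min_per_event: float,
                          events: int) -> float:
    """Estimate total grid run time in minutes from local timings."""
    init = INIT_SLOWDOWN * local_init_min
    production = PRODUCTION_SLOWDOWN * local_min_per_event * events
    return init + production
```

For example, a hypothetical job with 2 minutes of local initialization and 0.5 minutes per event would be estimated at `estimate_grid_minutes(2.0, 0.5, 100)` minutes on the grid, against 52 minutes locally.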

  13. Possible improvements
      • Create more sophisticated tool to acquire resources
        • Resource planning, distribution, etc.
        • Maybe something fancy already exists?
      • Parrot: Caching
        • On per directory basis
        • Requires debugging
      Diagram labels: Farm @ VU (Amsterdam University); EDG testbed (NIKHEF); PBS job manager; BaBar MC production software; Queue; GlideIn; Chirp; NFS; Parrot; GCB; Private network (×2); A lot of computers; Some computers

  14. Move chirp servers to private nodes
      • Use Condor/GCB machinery for chirp server
        • Solves security issues
        • Allows chirp server to be on private nodes
        • Requires new chirp-condor implementation
      Diagram labels: Farm @ VU (Amsterdam University); EDG testbed (NIKHEF); PBS job manager; BaBar MC production software; Queue; GlideIn; NFS; Parrot; Chirp; GCB; Private network (×2); A lot of computers; Some computers

  15. Move GCB to head node
      • Move GCB to same machine as Central Manager
        • Solution required for port conflicts
        • Temporary solution: Move CM to a private node
      Diagram labels: Farm @ VU (Amsterdam University); EDG testbed (NIKHEF); PBS job manager; BaBar MC production software; Queue; GlideIn; GCB; NFS; Parrot; Chirp; Private network (×2); A lot of computers; Some computers

  16. Use EDG data storage
      • Write events to EDG data storage (gsiFTP)
        • Requires debugging
      Diagram labels: Farm @ VU (Amsterdam University); EDG testbed (NIKHEF); PBS job manager; BaBar MC production software; Queue; GlideIn; GCB; NFS; Parrot; Chirp; EDG data storage; Private network (×2); A lot of computers; Some computers

  17. Use more sites
      • Let GCB manage several private networks at the same time
        • Requires solution for conflicting private addresses
      Diagram labels: Farm @ VU (Amsterdam University); EDG testbed (NIKHEF); Other testbed; PBS job manager; BaBar MC production software; Queue; GlideIn; GCB; NFS; Parrot; Chirp; EDG data storage; Private network (×3); A lot of computers (×2); Some computers
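
Private address ranges are reused freely between sites, so one relay serving several private networks first has to know which sites collide. A minimal sketch of that check using Python's standard `ipaddress` module; the site names and address ranges in the usage note are made up for illustration, and this is not part of GCB:

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def conflicting_networks(sites: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return the pairs of sites whose private address ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in sites.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]
```

Calling it with hypothetical ranges such as `{"vu": "192.168.1.0/24", "nikhef": "10.0.0.0/16", "other": "192.168.0.0/16"}` reports the pair `("vu", "other")`, the kind of conflict the relay would need to resolve before routing to both sites.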

  18. Conclusions
      • It works
        • BaBar MC production runs successfully on the NIKHEF EDG testbed
        • All this experimental software actually works when used together
      • It looks easy
        • Our GRID setup is complicated, but…
        • Parrot hides problems related to local file access
        • GCB hides problems related to network configurations
        • GlideIn hides complications with resource gathering
        • The user can just submit his/her jobs to a local batch system
      • There is some work to do
        • Performance could be better
          • Initialization 10 times slower
          • Production 3 times slower
          • Caching and (semi-)local event storage should improve this
        • Usability could be improved
          • GlideIn should have a tool to acquire resources
          • Several improvements proposed for GCB/Parrot
      • The improvements are done at the level of the “grid” tools
        • The user benefits without rewriting code
