
Migration of ATLAS PanDA to CERN






Presentation Transcript


  1. Migration of ATLAS PanDA to CERN Graeme Stewart, Alexei Klimentov, Birger Koblitz, Massimo Lamanna, Tadashi Maeno, Pavel Nevski, Marcin Nowak, Pedro Salgado, Torre Wenaus, Mikhail Titov

  2. Outline • PanDA Review • PanDA History • PanDA Architecture • First steps of Migration to CERN • Infrastructure Setup • PanDA Monitor • Task Request Database • Second Phase Migration • PanDA Server and Bamboo • Database bombshells • Migration, Tuning and Tweaks • Conclusions

  3. PanDA Recent History • PanDA was developed by US ATLAS in 2005 • Became the executor of all ATLAS production in EGEE during 2008 • March 2009: executes production for ATLAS in NDGF as well, using the ARC Control Tower (aCT) • As PanDA had become central to ATLAS operations, it was decided in late 2008 to re-locate it to CERN • 35k simultaneous running jobs • 150k jobs finished per day

  4. PanDA Server Architecture • PanDA (Production and Distributed Analysis) is a pilot job system • Executes jobs from the ATLAS production system and from users • Brokers jobs to sites based on available compute resource and data • Can move and stage data if necessary • Triggers data movement back to Tier-1s for dataset aggregation • [Architecture diagram: ATLAS ProdDB, Bamboo, Panda Server and Panda Databases; Panda Monitor and Panda Client talk to the server; a Pilot Factory submits pilots to the Computing Site, and pilots get jobs from the server]
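To make the pilot-job model on slide 4 concrete, here is a minimal Python sketch of the pilot side of the exchange. The server URL, the getJob endpoint, its parameters and the JSON response format are illustrative assumptions for this sketch, not the actual PanDA server API; the requests library is used only for brevity.

    import time
    import requests

    # Assumed server URL and endpoint name, for illustration only
    PANDA_SERVER = "https://pandaserver.example.cern.ch:25443/server/panda"

    def get_job(site_name):
        # Ask the server for a job suited to this computing site
        resp = requests.post(PANDA_SERVER + "/getJob",
                             data={"siteName": site_name,
                                   "prodSourceLabel": "managed"},
                             timeout=60)
        resp.raise_for_status()
        return resp.json() if resp.content else None

    def run_payload(job):
        # Placeholder: a real pilot stages in data, runs the transformation,
        # stages out the results and reports the final job status back
        print("would run job %s" % job)

    def pilot_loop(site_name, idle_sleep=60):
        # Minimal pilot main loop: poll for work, run it, otherwise back off
        while True:
            job = get_job(site_name)
            if not job:
                time.sleep(idle_sleep)
                continue
            run_payload(job)

The point of the pilot model is that a site never needs to know about specific jobs in advance: pilots are submitted speculatively by the pilot factory and only ask the central server for work once they are running on a worker node.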

  5. PanDA Monitor • PanDA Monitor is the web interface to the panda system • Provides summaries of processing per cloud/site • Drill down to individual job logs • And directly view logfiles • Task status • Also provides a web interface to request actions from the system • Task requests • Dataset Subscriptions

  6. Task Request Database • Task request interface is hosted as part of the panda monitor • Allows physicists to define MC production tasks • Backend database exists separately from the rest of panda • Prime candidate for migration from MySQL at BNL to Oracle at CERN • [Diagram: before migration AKTR (MySQL), PandaDB (MySQL), ProdDB (Oracle); after migration AKTR (Oracle), PandaDB (MySQL), ProdDB (Oracle)]

  7. Migration – Phase 1 • Target was migration of task request database and panda monitor • First step was to prepare infrastructure for services: • 3 server-class machines to host panda monitors • Dual CPU, Quad Core Intel E5410 CPUs • 16GB RAM • 500GB HDD • Set up as far as possible as standard machines supported by CERN FIO • Quattor templates • Lemon monitoring • Alarms for host problems • Utilise CERN Arbitrating DNS to balance load across all machines • Picks the 2 ‘best’ machines of the 3 using a configurable metric • Also migrated to the ATLAS standard python environment • Python 2.5, 64 bit

  8. Parallel Monitors • Panda was always architected to have multiple stateless monitors • Each monitor queries the backend database to retrieve user requested information and display it • Thus setting up a parallel monitor infrastructure at CERN was relatively easy • Once external dependencies were sorted • ATLAS Distributed Data Management (DDM) • Grid User Interface tools • This was deployed at the beginning of December 2008

  9. Task Request Database • First real step was to migrate the TR DB from MySQL to Oracle • This is not quite as trivial as one first imagines • Each database supports some non-standard SQL features • And these are not entirely compatible • Optimising databases is quite specific to the database engine • First attempts ran into trouble • MySQL dump from BNL to CERN resulted in connections being dropped • Had to dump the data at BNL and scp it to CERN • Schema required some cleaning up • Dropped unused tables • Removing null constraints, CLOB->VARCHAR, resizing some text fields • However, after a couple of trial migrations we were confident that the data could be migrated in just a couple of hours
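One way such a table-by-table copy could be scripted (a rough sketch, not the procedure actually used; the connection details, table and column names are invented) is to read from a local MySQL instance loaded from the dump and bulk-insert into Oracle with cx_Oracle, remembering that Oracle binds are positional :1, :2, … rather than MySQL's %s placeholders:

    import MySQLdb
    import cx_Oracle

    mysql = MySQLdb.connect(host="localhost", db="taskreq",
                            user="reader", passwd="secret")
    oracle = cx_Oracle.connect("atlas_taskreq/secret@atlr")  # assumed DSN

    BATCH = 1000

    def copy_table(table, columns):
        src = mysql.cursor()
        dst = oracle.cursor()
        src.execute("SELECT %s FROM %s" % (", ".join(columns), table))
        # Oracle bind variables are :1, :2, ... rather than MySQL's %s
        insert = "INSERT INTO %s (%s) VALUES (%s)" % (
            table, ", ".join(columns),
            ", ".join(":%d" % (i + 1) for i in range(len(columns))))
        while True:
            rows = src.fetchmany(BATCH)
            if not rows:
                break
            dst.executemany(insert, [list(r) for r in rows])
        oracle.commit()

    # hypothetical table and columns standing in for the task request schema
    copy_table("TASK_REQUESTS", ["REQID", "TASKNAME", "STATUS", "PRIORITY"])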

  10. Migration • Migration occurred on Monday December 8th • Database data was migrated in a couple of hours • Two days were then used to iron out any glitches • In the Task Request interfaces • In the scripts which manage the Task Request to ProdDB interface • Could this all have been prepared in advance? • In theory yes, but we are migrating a live system • So there is only a limited amount of test data which can be inserted into the system • Real tasks trigger real jobs • System was live again and accepting task requests on Wednesday • Latency of tasks in the production system is usually several days, even for short tasks • Acceptable to the community

  11. A Tale of Two Infrastructures • New panda monitor setup required DB plugins to talk both to MySQL and to Oracle • The MySQLdb module is bog standard • The cx_Oracle module much less so • In addition Python 2.4 was the supported infrastructure at BNL as opposed to Python 2.5 at CERN • This meant that after the TR migration the BNL monitors had more limited functionality • This had definitely not been in the plan! • [Diagram: monitors talking to both a MySQL DB and an Oracle DB]
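In practice much of the difference between the two drivers is the bind-parameter style: MySQLdb uses %s / %(name)s placeholders while cx_Oracle uses :name bind variables. A rough sketch of the kind of thin wrapper that lets the same monitor query run against either backend follows; the class, method, account and table names are illustrative, not the real monitor plugin API.

    class DBPlugin(object):
        def __init__(self, backend, **conn_args):
            self.backend = backend
            if backend == "oracle":
                import cx_Oracle
                self.conn = cx_Oracle.connect(conn_args["dsn"])
            else:
                import MySQLdb
                self.conn = MySQLdb.connect(**conn_args)

        def query(self, sql, params=None):
            # Queries are written with Oracle-style :name placeholders;
            # rewrite them to %(name)s when talking to MySQLdb
            params = params or {}
            if self.backend != "oracle":
                for key in params:
                    sql = sql.replace(":%s" % key, "%%(%s)s" % key)
            cur = self.conn.cursor()
            cur.execute(sql, params)
            return cur.fetchall()

    # the same statement then works against either backend, e.g.
    # db = DBPlugin("oracle", dsn="atlas_pandamon/secret@atlr")
    # rows = db.query("SELECT PandaID, jobStatus FROM jobsActive4 "
    #                 "WHERE computingSite = :site", {"site": "CERN-PROD"})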

  12. PanDA Servers • Some preliminary work on the panda server had already been done in 2008 • However, much still remained to be done to migrate the full suite of panda server databases: • PandaDB – holds live job information and status (‘fast buffer’) • LogDB – holds pilot logfile extracts • MetaDB – holds panda scheduler information on sites and queues • ArchiveDB – ultimate resting place of any panda job (big!) • For most databases the data volume was minimal and the main work was in the schema details • Including the setup of Oracle triggers • For the infrastructure side we copied the BNL setup, with multiple panda servers running on the same machines as the monitors • We knew the load was low and the machines were capable • We also required one server component, bamboo, which interfaces between the panda servers and ProdDB • Same machine template worked fine

  13. ArchiveDB • In MySQL, because of constraints on table performance vs. size, an explicit partitioning had been adopted • One ArchiveDB table for every two months of jobs • Jan_Feb_2007 • Mar_Apr_2007 • … • Jan_Feb_2009 • In Oracle internal partitioning is supported: • CREATE TABLE jobs_archived (<list of columns>) PARTITION BY RANGE(MODIFICATIONTIME) ( PARTITION jobs_archived_jan_2006 VALUES LESS THAN (TO_DATE('01-FEB-2006','DD-MON-YYYY')), PARTITION jobs_archived_feb_2006 VALUES LESS THAN (TO_DATE('01-MAR-2006','DD-MON-YYYY')), PARTITION jobs_archived_mar_2006 VALUES LESS THAN (TO_DATE('01-APR-2006','DD-MON-YYYY')), … ) • This allows for considerable simplification of the client code in the panda monitor
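To illustrate the simplification: against the old MySQL layout the monitor first has to work out which two-month archive tables a time window spans and then query each of them, whereas against the partitioned Oracle table one query is enough and the partitions are pruned on the server side. The function, column and bind names below are illustrative.

    def query_archive_mysql(cursor, tables, start, end):
        # Old style: the caller has already computed the list of
        # Jan_Feb_2007-style tables covering [start, end]
        rows = []
        for table in tables:
            cursor.execute("SELECT PandaID, jobStatus FROM %s "
                           "WHERE MODIFICATIONTIME BETWEEN %%s AND %%s" % table,
                           (start, end))
            rows.extend(cursor.fetchall())
        return rows

    def query_archive_oracle(cursor, start, end):
        # New style: one statement, Oracle prunes to the relevant partitions
        cursor.execute("SELECT PandaID, jobStatus FROM jobs_archived "
                       "WHERE MODIFICATIONTIME BETWEEN :s AND :e",
                       {"s": start, "e": end})
        return cursor.fetchall()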

  14. Integrate, Integrate, … • By late February trial migrations of the databases had been made to integration databases hosted at CERN (the INTR database) • Trial jobs had been run through the panda server, proving basic functionality • Decision now had to be made on the final migration strategy • This could be ‘big bang’ (move the whole system at once) or ‘inflation’ (gradually migrate clouds one by one) • Big bang would be easier for, e.g., the panda monitor • But would carry greater risks – suddenly loading the system with 35k running jobs was unwise • If things went very wrong it might leave us with a big mess to recover from • External constraint was the ATLAS cosmics re-reprocessing campaign due to start 9th March • We decided to migrate piecemeal

  15. Final Preparations • In fact PanDA did have two heads already • IT and CERN clouds had been run from a parallel MySQL setup from early 2008 • This was an expensive infrastructure to maintain as it did not tap into CERN IT supported services • It was obvious that migrating these two clouds would be a natural first step • Plans were made to migrate to the ATLAS production database at CERN (aka ATLR) • Things seemed to be under control a few days before…

  16. DBAs • The Friday before we were due to migrate, the CERN DBAs asked us not to do so • They were worried that not enough testing of the Oracle setup in INTR had been done • This triggered a somewhat frantic weekend of work, resulting in several thousand jobs being run through the CERN and IT clouds using the INTR databases • From our side this testing looked to be successful • However, we reached a subsequent compromise that • We would migrate the CERN and IT clouds to panda running against the INTR • They would start backups on the INTR database, giving us the confidence to run production for ATLAS through this setup • Subsequent migration from INTR to ATLR could be achieved much more rapidly as the data would already be in the correct Oracle formats

  17. Tuning and Tweaking • Migration of PandaDB, LogDB, MetaDB was very quick • There was one unexpected piece of client code which hung during the migration process (polling of the CERN MySQL servers) • Migration and index building of ArchiveDB was far slower • However, we disabled access to ArchiveDB and could bring the system up live within half a day • Since then a number of small improvements in the panda code have been made to help optimise the use of Oracle • Connections are much more expensive in Oracle than in MySQL • Restructure code to use a connection pool • Create common reader and writer accounts for access to all database schemas from the one connection • Migration away from triggers to .nextval() syntax • Despite fears, migration of the panda server to Oracle has been relatively painless and was achieved without significant loss of capacity
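Two of these optimisations can be sketched in a few lines of cx_Oracle: drawing connections from a session pool instead of opening a new (expensive) connection per request, and taking IDs from a sequence with .nextval instead of an insert trigger. The account, sequence and table names are placeholders.

    import cx_Oracle

    # One pool per server process, shared writer account for all schemas
    pool = cx_Oracle.SessionPool(user="atlas_panda_writer", password="secret",
                                 dsn="atlr", min=2, max=10, increment=1)

    def insert_job(job_name):
        conn = pool.acquire()            # cheap: reuses an already open session
        try:
            cur = conn.cursor()
            # sequence.nextval replaces the old before-insert trigger
            cur.execute("INSERT INTO jobsActive4 (PandaID, jobName) "
                        "VALUES (PANDAID_SEQ.nextval, :name)",
                        {"name": job_name})
            conn.commit()
        finally:
            pool.release(conn)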

  18. Cloud Migration • Initial migration was for the CERN and IT clouds • We added NG, the new Nordugrid cloud, which was brought up from a standing start • We added DE after a major intervention in which the cloud was taken offline • Similarly TW will come up in the CERN Oracle instance • UK was the interesting case where we migrated a cloud live: • Switched the bamboo instance to send jobs to the CERN Oracle servers • Current jobs are left being handled by the old bamboo and servers • Start sending pilots to the UK asking for jobs from the CERN Oracle servers • Force the failure of jobs not yet started in the old instance • These return to prodDB and are then picked up again by panda using the new bamboo • Old running jobs are handled correctly by the ‘old’ system • There will be a subsequent re-merge into the CERN ArchiveDB

  19. Monitor Blues • A number of problems did arise in the new monitor setup required for the migrated clouds • Coincident with the migration there was a repository change from CVS to SVN • However, the MySQL monitor was deployed from CVS and the Oracle monitor from SVN • This led to a number of accidents and minor confusions which took a while to recover from • New security features caused some loss of functionality at times, as it was hard to check all the use cases • And the repository problems compounded this • However, these issues are now mostly resolved and ultimately the system will in fact become simpler

  20. Conclusions • Migration of the panda infrastructure from BNL to CERN has underlined how difficult the transition of a large scale, live, distributed computing system is • A very pragmatic approach was adopted in order to get the migration done in a reasonable time • Although it always takes longer than you think • (This is true even when you try to factor in knowledge of the above) • Much has been achieved • Monitor and task request database fully migrated • CERN Panda server infrastructure moved to Oracle • Now running 5 (6) of the 11 ATLAS clouds: CERN, DE, IT, NG, UK, (TW) • Remaining migration steps are now a matter of scaling and simplifying • We learned a lot • Love your DBAs, of course • If we have to do this again, now we know how • But there is still considerable work to do • Mainly in improving service stability, monitoring and support procedures
