PanDA in a Federated Environment

Kaushik De, Univ. of Texas at Arlington
CC-IN2P3, Lyon, September 13, 2012

Outline
• Overview
• Concrete plans
  • Federated data access/stageout for fault tolerance
  • Federated data transfer for managed production
  • Federated data access for distributed analysis
• Speculative ideas
  • Data caching
  • Event caching
  • Cache aware brokerage

PanDA FAX Status
• Last year, I talked about local federations
  • Direct access through local redirectors is in use by PanDA at SLAC and SouthWest Tier 2 – working well for many years
• This year, the emphasis has been on global federations
  • Global redirectors have been set up and tested in ATLAS
  • Changes were implemented in the PanDA pilot to enable these global redirectors in the default workflow
• But progress has been somewhat slow
  • PanDA is under continuous use in ATLAS
  • Development activities not related to LHC data have been minimal

FAX for Fault Tolerance
• Phase 1 goal
  • If an input file cannot be transferred/accessed from the local SE, the PanDA pilot currently fails the job after a few retries
  • We plan to use federated storage for these (rare) cases
  • Start with file staging/transfers using FAX
  • Implemented in a recent release of the pilot, works fine at two test sites
  • Next step – wider scale testing at production/DA sites
• Phase 2
  • Once file transfers work well, try FAX Direct Access
• Phase 3
  • Try FAX for transfer of output files, if default destination fails
• Next few slides from Tadashi/Paul
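The Phase 1 fallback can be sketched roughly as follows. This is only an illustration of the idea on the slide, not the pilot's actual code: the function `stage_in`, the two copy callables and the retry count of 3 are all hypothetical names and values chosen for the example.

```python
from typing import Callable, Optional


def stage_in(
    lfn: str,
    copy_from_local_se: Callable[[str], bool],
    copy_via_fax: Callable[[str], bool],
    max_local_attempts: int = 3,  # assumed value; the slide only says "a few retries"
) -> Optional[str]:
    """Return "local" or "fax" depending on which source delivered the file,
    or None if both failed (the job would then fail, as it does today)."""
    for _ in range(max_local_attempts):
        if copy_from_local_se(lfn):
            return "local"
    # Local SE retries are exhausted: try the federation once before giving up.
    if copy_via_fax(lfn):
        return "fax"
    return None


# Usage with stand-in copy functions (hypothetical, for illustration only).
def broken_local_se(lfn: str) -> bool:
    return False


def working_fax(lfn: str) -> bool:
    return True


print(stage_in("example.input.root", broken_local_se, working_fax))
```

Keeping the federation as a last resort matches the slide's point that these cases are rare: the normal path through the local SE is unchanged.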


FAX for Managed Production
• Managed production has well defined workflow
  • PanDA schedules all input/output file transfers through DQ2
  • DQ2 provides dataset level callback when transfers are completed
• FAX can provide alternate transport mechanism
  • Transfers handled by FAX
  • Dataset level callback provided by FAX
  • Dataset discovery/registration handled by DQ2
• File level callback
  • Recent development – use activeMQ for file level callbacks
  • On best effort basis for scalability – dataset callbacks still used
  • FAX can use same mechanism
  • Work in progress
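One way to read the relationship between the two callback levels is sketched below. It assumes that file level callbacks are hints that may be lost, while the dataset level callback remains authoritative. The class `TransferTracker` and its method names are hypothetical and do not correspond to any PanDA, DQ2 or activeMQ interface.

```python
class TransferTracker:
    """Tracks which files of one dataset are still awaiting transfer."""

    def __init__(self, files):
        self.pending = set(files)

    def on_file_callback(self, lfn):
        # Best effort: a message may never arrive, and an unknown file is ignored.
        self.pending.discard(lfn)

    def on_dataset_callback(self):
        # Authoritative: the whole dataset has arrived, whatever was missed above.
        self.pending.clear()

    @property
    def complete(self):
        return not self.pending


tracker = TransferTracker(["file1", "file2", "file3"])
tracker.on_file_callback("file1")  # arrives early, work on file1 could start
tracker.on_dataset_callback()      # file2/file3 messages were lost, still completes
print(tracker.complete)
```

Under this reading, file level messages only let work start earlier; correctness never depends on them.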

FAX for Distributed Analysis
• Most challenging and most rewarding
• Currently, DA jobs are brokered to sites which have input datasets
  • This may limit and slow the execution of DA jobs
• Use FAX to relax constraint on locality of data
• Use cost metric generated with HammerCloud tests
  • Provides ‘typical cost’ of data transfer between two sites
• Brokerage will use ‘nearby’ sites
  • Calculate weight based on usual brokerage criteria (availability of CPU…) plus transfer cost
  • Jobs will be dispatched to site with best weight – not necessarily the site with local data or the most available CPUs
• Cost metric already available (see Ilija/Rob talks)
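A toy version of cost-aware brokerage is sketched below. The weight formula (free CPU slots divided by one plus the transfer cost) is an assumption made for this example only; the slide does not give PanDA's actual formula, and the site names, slot counts and costs are invented.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    free_slots: int       # stand-in for the "usual brokerage criteria"
    transfer_cost: float  # 'typical cost' to read the input; 0.0 if data is local


def weight(site: Site) -> float:
    # Assumed formula: more free CPU raises the weight, higher cost lowers it.
    return site.free_slots / (1.0 + site.transfer_cost)


def choose_site(sites):
    return max(sites, key=weight)


sites = [
    Site("A", free_slots=10, transfer_cost=0.0),   # holds the data, but busy
    Site("B", free_slots=300, transfer_cost=0.5),  # nearby, plenty of CPU
    Site("C", free_slots=400, transfer_cost=4.0),  # most CPU, but far away
]
print(choose_site(sites).name)
```

With these invented numbers the job goes to B, which neither holds the data nor has the most free CPU, which is the behaviour the last brokerage bullet describes.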

Implementation Schedule
• FAX for fault tolerance
  • Phase 1 (FAX transfers) – done, test for a few months
  • Phase 2 (FAX Direct Access) – before year end
  • Phase 3 (FAX output) – before year end
• FAX for central production
  • Within 6 months
  • Maybe sooner – activeMQ is already under testing
• FAX in brokerage
  • Cost metric already available
  • Few months to set up and test in PanDA database
  • Next year – enable a few sites for high throughput tests

Data Caching
• Local data caching for WAN access
  • Maybe not for PanDA – can federation do it transparently?
  • Various alternatives were discussed in WAN meeting at CERN
• PanDA could keep site level cache
  • Not guaranteed file catalog – best effort list
  • Use FAX to fetch again if file is no longer available
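The "best effort list" idea could look something like the sketch below. It treats the site level list as a set of hints rather than a catalog: every entry is checked before use and the file is fetched again through the federation if it has gone. `SiteCache`, `fetch_via_fax` and the paths are hypothetical names for illustration only.

```python
import os


class SiteCache:
    """Best effort map of file name to local path; entries may be stale."""

    def __init__(self, fetch_via_fax, exists=os.path.exists):
        self._hints = {}
        self._fetch = fetch_via_fax  # callable: lfn -> local path of fresh copy
        self._exists = exists        # injectable so the sketch runs without real files

    def get(self, lfn):
        path = self._hints.get(lfn)
        if path is not None and self._exists(path):
            return path              # hint was still valid
        path = self._fetch(lfn)      # missing or stale: fetch again via FAX
        self._hints[lfn] = path
        return path


# Usage with an in-memory stand-in for the site's disk.
disk = set()


def fetch_via_fax(lfn):
    path = "/cache/" + lfn
    disk.add(path)
    return path


cache = SiteCache(fetch_via_fax, exists=lambda p: p in disk)
first = cache.get("example.root")   # fetched through the federation
disk.clear()                        # site cleans its cache behind PanDA's back
second = cache.get("example.root")  # stale hint detected, fetched again
print(first == second)
```

Because nothing is guaranteed, the list needs no consistency protocol with the site: a wrong entry costs one extra fetch.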

Event Cache
• Long term PanDA goal – event service
  • Granularity of data processing in PanDA – datasets and files
  • But events are really the atomic unit for HEP
  • PanDA event service will change current processing model
• Challenges of event service
  • Scalability – keeping track of 100’s of billions of events
  • Fault tolerance – processing all events without data loss
  • Chaining of data processing
  • Efficient use of WAN vs storage


Conclusion
• Wide array of FAX plans for PanDA
• Schedule depends on availability of effort during LHC run
• Do not foresee technical challenges for short/medium term
• Long term – many open ideas, some quite challenging
