Learn about the integration of PROOF essentials with experiment software and the ALICE experience at the CAF. Discover how PROOF provides a dynamic approach to end-user HEP analysis on distributed systems.
PROOF Gerardo GANIS CERN / LCG G. Ganis, 2nd LCG-France Colloquium
Outline
• PROOF essentials
• Integration with experiment software
• ALICE experience at the CAF
PROOF essentials
• Motivation: provide an alternative, dynamic approach to end-user HEP analysis on distributed systems
• Typical HEP analysis is a continuous refinement cycle: implement algorithm, run over data set, make improvements
• Data sets are collections of independent events
  • Large (e.g. ALICE ESD+AOD: ~350 TB / year)
  • Spread over many disks and mass storage systems
• Exploiting intrinsic parallelism is the only way to analyze the data in reasonable times
PROOF essentials: classic approach
[Diagram labels: catalog, files, storage, query, data file splitting, myAna.C, submit, batch farm queues, manager, jobs, outputs, merging, final analysis]
• Static use of resources
• Jobs frozen: 1 job / worker node
• Manual splitting, merging
• Monitoring requires instrumentation
PROOF essentials: an alternative approach
[Diagram labels: catalog, files, storage, PROOF farm, scheduler, MASTER, query; PROOF query: data file list, myAna.C; feedbacks (merged); final outputs (merged)]
• Farm perceived as extension of local PC
• Same syntax as in local session
• More dynamic use of resources
• Real-time feedback
• Automated splitting and merging
PROOF essentials: target
• Short analysis using local resources, e.g.
  • end-analysis calculations
  • visualization
• Medium-term jobs, e.g. analysis design and development, also using non-local resources
• Long analysis jobs with well-defined algorithms (e.g. production of personal trees)
• Optimize response for short / medium jobs
• Perceive medium as short
PROOF essentials: design goals
• Transparency
  • Minimal impact on ROOT user habits
• Scalability
  • Full exploitation of available resources
• Adaptability
  • Cope transparently with heterogeneous environments
• Real-time interaction and feedback
• Addresses the case of
  • Central or Departmental Analysis Facilities (Tier-2’s)
  • Multi-core, multi-disk desktops
PROOF essentials: what can be done?
• Ideally everything that can be split into independent tasks
• Currently available:
  • Processing of trees (see next slide)
  • Processing of independent objects in a file
• Tree processing and drawing functionality complete
Local session:
    // Create a chain of trees
    root[0] TChain *c = CreateMyChain.C;
    // MySelec is a TSelector
    root[1] c->Process("MySelec.C+");
PROOF session:
    // Create a chain of trees
    root[0] TChain *c = CreateMyChain.C;
    // Start PROOF and tell the chain to use it
    root[1] TProof::Open("masterURL");
    root[2] c->SetProof();
    // Process goes via PROOF
    root[3] c->Process("MySelec.C+");
The ROOT data model: Trees & Selectors
[Diagram: a chain of trees holding events 1, 2, …, n, …, last; each event is organized in branches and leaves, and only the needed parts are read; the selector loops over events]
• Selector
  • Begin(): create histos, …; define output list
  • Process(): preselection, then analysis if OK
  • Terminate(): final analysis (fitting, …)
• Output list
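The Begin / Process / Terminate cycle described on this slide can be sketched without ROOT. The block below is only an illustration under stated assumptions: `Event`, `CountingSelector` and `RunSelector` are hypothetical names, the histogram is a plain vector of counters, and the event loop that the framework would normally drive is written out by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event record; a real selector reads tree branches instead.
struct Event {
    double pt;        // illustrative quantity to histogram
    bool   triggered; // used for the preselection
};

// Hypothetical stand-in for a selector with the three phases of the slide.
class CountingSelector {
public:
    // Begin(): create the output objects (here a fixed-range histogram).
    void Begin(std::size_t nbins, double lo, double hi) {
        assert(nbins > 0 && hi > lo);
        bins_.assign(nbins, 0);
        lo_ = lo;
        hi_ = hi;
    }

    // Process(): called once per event; preselect, then analyse.
    void Process(const Event& ev) {
        if (!ev.triggered) return;               // preselection failed
        if (ev.pt < lo_ || ev.pt >= hi_) return; // outside histogram range
        std::size_t i = static_cast<std::size_t>(
            (ev.pt - lo_) / (hi_ - lo_) * bins_.size());
        if (i >= bins_.size()) i = bins_.size() - 1; // guard against rounding
        ++bins_[i];
    }

    // Terminate(): final analysis on the filled output (here: total entries).
    long Terminate() const {
        long total = 0;
        for (long c : bins_) total += c;
        return total;
    }

    const std::vector<long>& Bins() const { return bins_; }

private:
    std::vector<long> bins_;
    double lo_ = 0.0;
    double hi_ = 1.0;
};

// Hand-written event loop: 10 bins on [0, 10), one Process() call per event.
inline long RunSelector(CountingSelector& sel, const std::vector<Event>& chain) {
    sel.Begin(10, 0.0, 10.0);
    for (const Event& ev : chain) sel.Process(ev);
    return sel.Terminate();
}
```

Because each Process() call touches one event only, the same selector could be run on disjoint event ranges and the per-bin counters summed afterwards, which is the property that automated splitting and merging relies on.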
PROOF essentials: multi-tier architecture
• Structured master
  • Adapts to clusters of clusters
  • Improves scalability
• One sub-master per geographic domain
• Heterogeneous hardware / OS
• Node sessions started by xrootd
[Diagram label: xproofd]
PROOF essentials: connection layer
• Sets up the client session
  • Authentication, sandbox setup, start sessions on nodes
• Based on xrootd
  • Lightweight, industrial-strength networking and protocol handler
• New PROOF-related protocol plug-in, xpd
  • xpd launches and controls PROOF sessions (proofserv)
  • xrootd acts as a coordinator on the farm
• Client disconnection / reconnection handled naturally
• Can use the same daemon for data and PROOF serving
PROOF essentials: dynamic load balancing
• Pull architecture guarantees scalability
• Adapts to permanent / temporary variations in performance
[Diagram: Master exchanging work with Worker 1 … Worker N]
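A small simulation can make the pull idea concrete. This is a sketch under assumptions, not the PROOF packetizer: `SimulatePull` is a hypothetical name, every packet costs a fixed time on a given worker, and the master simply hands the next packet to whichever worker becomes idle first.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Simulate pull scheduling: a worker asks for the next packet as soon as it
// is idle. secondsPerPacket[i] is the assumed time worker i needs per packet.
// Returns how many packets each worker ended up processing.
inline std::vector<int> SimulatePull(const std::vector<double>& secondsPerPacket,
                                     int nPackets) {
    assert(!secondsPerPacket.empty() && nPackets >= 0);
    using Slot = std::pair<double, std::size_t>; // (time when idle, worker index)
    std::priority_queue<Slot, std::vector<Slot>, std::greater<Slot>> idle;
    for (std::size_t w = 0; w < secondsPerPacket.size(); ++w) idle.push({0.0, w});

    std::vector<int> processed(secondsPerPacket.size(), 0);
    for (int p = 0; p < nPackets; ++p) {
        Slot next = idle.top(); // the earliest idle worker pulls the next packet
        idle.pop();
        ++processed[next.second];
        idle.push({next.first + secondsPerPacket[next.second], next.second});
    }
    return processed;
}
```

With two workers where the second is twice as slow, `SimulatePull({1.0, 2.0}, 9)` gives the fast worker 6 packets and the slow one 3. No speed estimate is configured anywhere: the split follows from who asks for work, so a worker that slows down temporarily just pulls fewer packets.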
PROOF essentials: intrinsic scalability
• Strictly concurrent user jobs at CAF (100% CPU used)
  • In-memory data
  • Dual Xeon, 2.8 GHz
• CMS analysis (I. Gonzales, Cantabria)
  • 1 master, 80 workers
  • Dual Xeon, 3.2 GHz
  • Local data: 1.4 GB / node
  • Non-blocking Gigabit Ethernet
[Plot legend: 1 user, 2 users, 4 users, 8 users]
PROOF essentials: exploiting multi-cores
• ALICE search for π0’s
• 4 GB simulated data
• Instantaneous rates (evt/s, MB/s)
• Clear advantage of quad core
• Additional computing power fully exploited
PROOF essentials: additional remarks
• Intrinsic serial overhead is small
  • Requires a reasonable connection between a (sub-)master and its workers
• Hardware considerations
  • I/O-bound analysis (frequent in HEP) is often limited by hard drive access: N small disks are much better than 1 big one
  • Good amount of RAM for efficient data caching
• Data access is The Issue:
  • Optimize for data locality, when possible
  • Efficient access to mass storage (next slide)
PROOF essentials: data access issues
• Low latency in data access is essential for high performance
  • Not only a PROOF issue
• File opening overhead
  • Minimized using asynchronous open techniques
• Data retrieval
  • Caching, pre-fetching of data segments to be analyzed
  • Recently introduced in ROOT for TTree
• Techniques improving network performance (e.g. InfiniBand) or file access (e.g. memory-based file serving, PetaCache) should be evaluated
PROOF essentials: scheduling multi-users
• Fair resource sharing, enforce priority policies
• Priority-based worker-level load balancing
  • Simple and solid implementation, no central unit
  • Slows down lower-priority sessions
  • Group priorities defined in the configuration file
• Future: central scheduler for per-query decisions based on:
  • cluster load, resources needed by the query, user history and priorities
• Generic interface to external schedulers planned
  • MAUI, LSF, …
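The slide does not say how group priorities translate into a slowdown. As a purely illustrative assumption, the sketch below uses a proportional rule in which each group's target CPU fraction is its priority divided by the sum of the priorities of the active groups; `CpuShares` and the group names in the usage note are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumed proportional-share rule (not taken from the slides):
// target fraction of a group = its priority / sum of all active priorities.
inline std::map<std::string, double> CpuShares(
        const std::map<std::string, double>& groupPriority) {
    double total = 0.0;
    for (const auto& kv : groupPriority) {
        assert(kv.second > 0.0); // priorities must be positive
        total += kv.second;
    }
    std::map<std::string, double> share;
    for (const auto& kv : groupPriority) share[kv.first] = kv.second / total;
    return share;
}
```

For example, priorities `{"calib": 3, "users": 1}` yield target fractions 0.75 and 0.25. A worker-level implementation could then delay sessions of groups that are above their target, which needs only the configured priorities and no central unit.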
PROOF essentials: management tools
• Data sets
  • Optimized distribution of data files on the farm
    • By direct upload
    • By staging out from mass storage (e.g. CASTOR)
• Query results
  • Retrieve, archive
• Packages
  • Optimized upload of additional libraries needed by the analysis
PROOF essentials: monitoring
• Internal
  • File access rates, packet latencies, processing time, etc.
  • Basic set of histograms available at tunable frequency
  • Client temporary output objects can also be retrieved
  • Possibility of detailed tree for further analysis
• MonALISA-based
  • Each host reports CPU, memory, swap, network
  • Each worker reports CPU, memory, evt/s, IO vs. network rate
  • pcalimonitor.cern.ch:8889
[Plot: network traffic between nodes]
PROOF GUI controller
• Allows full on-click control
  • Define a new session
  • Submit a query, execute a command
  • Query editor
    • Create / pick up a TChain
    • Choose selectors
  • Online monitoring of feedback histograms
  • Browse folders with results of query
  • Retrieve, delete, archive functionality
Outline
• PROOF essentials
• Integration with experiment software
  • Main issues
  • PROOF packages
  • Examples of ALICE, Phobos, CMS
• ALICE experience at the CAF
Integration with experiment software: main issues
• Finding, using the experiment software
  • Environment settings, library loading
• Implementing the analysis algorithms
  • TSelector strengths
    • Automatic tree interaction
    • Structured analysis
  • TSelector weaknesses
    • Big macros
    • New analysis implies new selector
    • Change in the tree definition implies a new selector
  • Add a layer to improve flexibility and to hide irrelevant details
Integration with experiment software
• Experiment software framework available on nodes
• Working-group dedicated packages uploaded / enabled as PROOF packages (next slide)
  • Allows users to run their own modifications
• Minimal ROOT environment set by the daemons before starting proofserv
• Setting the experiment environment
  • Statically, before starting xrootd (inherited by proofserv)
  • Dynamically, by evaluating a user-defined script before launching proofserv
    • Allows selecting different versions at run time
PROOF package management
• Allows the client to add software to be used in the analysis
• Uploaded in the form of PAR files (Proof ARchive)
• Simple structure
  • package/
    • Source / binary files
  • package/PROOF-INF/BUILD.sh
    • How to build the package (makefile)
  • package/PROOF-INF/SETUP.C
    • How to enable the package (load, dependencies)
• Versioning support being added
• Possibility to modify library / include paths to use public external packages (experiment libraries)
Integration with experiment software: ALICE
• AliROOT analysis framework
  • Deployed on all nodes
  • Needs to be rebuilt for each new ROOT version
  • Versioning issue being solved
• One additional package (ESD) needed to read Event Summary Data
  • Uploaded as PAR file
• Working group software automatically converted to PROOF packages (‘make’ target added to Makefile)
• Generic AliSelector hiding details
  • User’s selector derives from AliSelector
  • Access to data by member fESD
[Class diagram: <UserSelector> derives from AliSelector, which derives from TSelector]
Integration with experiment software: ALICE
• Alternative solution:
  • Split the analysis into functional modules (tasks)
  • Each task corresponds to a well-defined action
  • Tasks are connected via input/output data containers, which define their inter-dependencies
  • The user creates tasks (derived from AliAnalysisTask) and registers them with a task manager provided by the framework
  • The task manager, which derives from TSelector, takes care of the proper execution, respecting the dependencies
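How a manager might derive an execution order from the containers can be sketched as follows. This is a minimal illustration, not the AliROOT implementation: `TaskSpec` and `ExecutionOrder` are hypothetical names, containers are identified by plain strings, and a task is considered ready once all of its input containers have been produced.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical task description: the containers a task reads and writes.
struct TaskSpec {
    std::string name;
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};

// Order tasks so that each runs only after all its input containers exist.
// `available` starts with the containers supplied from outside (e.g. the event).
// Tasks whose inputs can never be satisfied are left out of the result.
inline std::vector<std::string> ExecutionOrder(const std::vector<TaskSpec>& tasks,
                                               std::set<std::string> available) {
    std::vector<std::string> order;
    std::vector<bool> done(tasks.size(), false);
    bool progress = true;
    while (progress) {
        progress = false;
        for (std::size_t i = 0; i < tasks.size(); ++i) {
            if (done[i]) continue;
            bool ready = true;
            for (const auto& in : tasks[i].inputs) {
                if (available.count(in) == 0) { ready = false; break; }
            }
            if (!ready) continue;
            order.push_back(tasks[i].name);                  // run the task
            for (const auto& out : tasks[i].outputs) available.insert(out);
            done[i] = true;
            progress = true;
        }
    }
    assert(order.size() <= tasks.size()); // each task is scheduled at most once
    return order;
}
```

Registering the tasks in any order gives the same result as long as the containers connect them: a task reading "spectra" is scheduled only after the task writing "spectra", whatever the registration order was.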
Integration with experiment software: Phobos
• TAM: Tree Analysis Modules solution
• Modules structured like TSelector (Begin, Process, …), separating tree structure from analysis
• Organization:
  • TAModule (: public TTask), base class of all modules
    • ReqBranch(name, pointer): attach to a branch in Begin() or SlaveBegin()
    • LoadBranch(name): load the branch data in Process()
  • TAMSelector (: public TSelector)
    • Module running and management
    • Handles interaction with the tree
  • TAMOutput: stores module output objects
Integration with experiment software: Phobos
Example of user’s module
    class TMyMod : public TAModule {
    private:
       TPhAnTEventInfo* fEvtInfo;  // event info
       TH1F*            fEvtNumH;  // event num histogram
    protected:
       void SlaveBegin();
       void Process();
    };

    void TMyMod::SlaveBegin() {
       // Request the branch and book the output histogram
       ReqBranch("eventInfo", fEvtInfo);
       fEvtNumH = new TH1F("EvtNumH", "Event Num", 10, 0, 10);
    }

    void TMyMod::Process() {
       // Load only the requested branch, then fill
       LoadBranch("eventInfo");
       fEvtNumH->Fill(fEvtInfo->fEventNum);
    }
Integration with experiment software: Phobos
Example analysis
• Build module hierarchy
    TMyMod* myMod = new TMyMod;
    TMyOtherMod* subMod = new TMyOtherMod;
    myMod->Add(subMod);
• No PROOF
    TAMSelector* mySel = new TAMSelector;
    mySel->AddInput(myMod);
    tree->Process(mySel);
    TList* output = mySel->GetModOutput();
• PROOF
    dset->AddInput(myMod);
    dset->Process("TAMSelector");
    TList* output = gProof->GetOutputList();
Integration with experiment software: CMS
• Environment: CMS needs to run SCRAM before proofserv
• PROOF_INITCMD contains the path of a script
• The script initializes the CMS environment using SCRAM
    TProof::AddEnvVar("PROOF_INITCMD", "~maartenb/proj/cms/CMSSW_1_1_1/setup_proof.sh");
Script:
    #!/bin/sh
    # Export the architecture
    export SCRAM_ARCH=slc3_ia32_gcc323
    # Init CMS defaults
    cd ~maartenb/proj/cms/CMSSW_1_1_1
    . /app/cms/cmsset_default.sh
    # Init runtime environment
    scramv1 runtime -sh > /tmp/dummy
    cat /tmp/dummy
Integration with experiment software: CMS
• CMSSW: software framework provides EDAnalyzer technology for analysis purposes
• Write algorithms that can be used with both technologies (EDAnalyzer and TSelector)
• Possible if the interface is well defined:
    class MyAnalysisAlgorithm {
       void process( const edm::Event & );
       void postProcess( TList & );
       void terminate( TList & );
    };
• Used in a templated TSelector framework: TFWLiteSelector, instantiated as TFWLiteSelector<MyAnalysisAlgorithm>
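The pattern of one algorithm class plugged into a templated selector can be shown without CMSSW or ROOT. The following is a sketch under assumptions: `ToyEvent`, `ToyOutput`, `MyToyAlgorithm`, `ToySelector` and `RunToy` are hypothetical stand-ins for edm::Event, TList, the user algorithm and TFWLiteSelector, and only process() and postProcess() are modelled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the framework's event and output list types.
struct ToyEvent { int nTracks; };
struct ToyOutput { long selected = 0; long tracks = 0; };

// Algorithm written once against a small, well-defined interface.
class MyToyAlgorithm {
public:
    void process(const ToyEvent& ev) {
        assert(ev.nTracks >= 0);
        if (ev.nTracks > 0) { ++selected_; tracks_ += ev.nTracks; }
    }
    // Publish the accumulated results into the output object.
    void postProcess(ToyOutput& out) const {
        out.selected += selected_;
        out.tracks += tracks_;
    }
private:
    long selected_ = 0;
    long tracks_ = 0;
};

// Selector-style wrapper: any class offering process()/postProcess() fits.
template <typename Algo>
class ToySelector {
public:
    void Process(const ToyEvent& ev) { algo_.process(ev); }
    ToyOutput Terminate() const {
        ToyOutput out;
        algo_.postProcess(out);
        return out;
    }
private:
    Algo algo_;
};

// Drive the wrapper over a list of events, as a framework would.
inline ToyOutput RunToy(const std::vector<ToyEvent>& events) {
    ToySelector<MyToyAlgorithm> selector;
    for (const ToyEvent& ev : events) selector.Process(ev);
    return selector.Terminate();
}
```

The algorithm class never mentions the wrapper, so the same class could be driven by a second adapter for another framework; the template only requires that the named member functions exist.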
Integration with experiment software: CMS
• In PROOF, selector libraries are distributed as PAR files
    // Load framework library
    gSystem->Load("libFWCoreFWLite");
    AutoLibraryLoader::enable();
    // Load TSelector library
    gSystem->Load("libPhysicsToolsParallelAnalysis");
Outline
• PROOF essentials
• Integration with experiment software
• ALICE experience at the CAF
ALICE experience at the CAF
• CERN Analysis Facility used for short / medium tasks
  • p-p prompt analysis, Pb-Pb pilot analysis
  • Calibration & alignment
• Alternative to using the Grid
  • Massive execution of jobs vs. fast response time
• Available to the whole collaboration
  • Number of users will be limited for efficiency reasons
• Design goals
  • 500 CPUs, 200 TB of selected data locally available
ALICE at CAF: example of data distribution
Total: 200 TB
• 20%: last-day RAW events (3.2M PbPb or 40M pp)
• 20%: fixed RAW events (1.6M PbPb and 20M pp)
• 20%: fixed ESDs (8M PbPb and 500M pp)
• 40%: cache for files retrieved from AliEn Grid, Castor
Sizes of single events from Computing TDR
ALICE experience at the CAF: test setup
• Test setup since May 2006
  • 40 machines, 2 CPUs each (Xeon 2.8 GHz), ~200 GB disk
  • 5 as development partition, 35 as production partition
  • Machine pools are managed by xrootd
  • Fraction of the Physics Data Challenge ’06 data distributed (~1 M events)
• Tests performed
  • Usability tests
  • Speedup tests
  • Evaluation of the system when running a combination of query types
  • Integration with ALICE’s analysis framework (AliROOT)
ALICE experience at the CAF: realistic stress test
• A realistic stress test consists of different users submitting different types of queries
• 4 different query types
  • 20% very short queries (0.4 GB)
  • 40% short queries (8 GB)
  • 20% medium queries (60 GB)
  • 20% long queries (200 GB)
• User mix
• 33 nodes available for the test
• Maximum average speedup for 10 users = 6.6 (33 nodes = 66 CPUs, i.e. 6.6 CPUs per user)
ALICE experience at the CAF: query types
[Table of query types; footnote: * run in PROOF, 10 users, 10 workers each]
ALICE experience at the CAF: speed-up
[Plot: speed-up]
ALICE experience at the CAF: speed-up
• Theoretical batch limit reached and surpassed automatically
• Machine load was 80-90% during the test
• Adding workers may be inefficient
  • Tune the number of workers for optimal response
  • Depends on query type and internals
    • Number, type, size of output objects
• Shows the importance of active scheduling
Summary
• PROOF provides an alternative approach to HEP analysis on farms, trying to avoid under-usage automatically while preserving the benefits of interactivity
• The real issue is data access (everybody is affected!)
  • Pre-fetching and asynchronous techniques help
  • Alternative technologies (e.g. InfiniBand) or alternative ideas (PetaCache) are worth investigating
• ALICE is pioneering the system in the LHC environment using a test CAF at CERN
• CMS has expressed interest and test clusters are being set up
• A lot of useful feedback: PROOF is steadily improving
Credits
• PROOF team
  • M. Ballintijn, B. Bellenot, L. Franco, G. Ganis, J. Iwaszkiewicz, F. Rademakers
• J.F. Grosse-Oetringhaus, A. Peters (ALICE)
• I. Gonzales, L. Lista (CMS)
• A. Hanushevsky (SLAC)
• C. Reed (MIT, Phobos)
Questions?