
PROOF - Parallel ROOT Facility
Maarten Ballintijn
http://root.cern.ch
Bring the KB to the PB, not the PB to the KB
PROOF Intro
• Collaboration between the core ROOT group at CERN and the MIT Heavy Ion Group
  • Rene Brun
  • Fons Rademakers
  • Gunther Roland
  • Maarten Ballintijn
• Part of and based on the ROOT framework
  • ROOT since 1995, PROOF started in 2001
• A wealth of information at http://root.cern.ch
• In the ROOT CVS tree, beta tests ongoing
PROOF Intro
• A collection of servers processes the data
  • Parallel I/O and parallel CPU
• CPU allocation and data access strategies
  • Dynamic resource allocation
  • Local data first, also rootd, SAN/NAS
• Transparency
  • Single-source analysis code
  • Input objects copied from the client
  • Output objects merged and returned to the client
• Scalability and adaptability
  • Dynamic packet size
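The "dynamic packet size" idea above can be illustrated with a small simulation. The slides do not describe how PROOF actually sizes its packets, so the following is only a sketch of one plausible scheme: each idle slave pulls a packet sized to keep it busy for a fixed target time, based on its processing rate. All names here (`SlaveStats`, `nextPacketSize`, `distribute`) are hypothetical and are not part of the PROOF API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical bookkeeping for one slave; not a PROOF class.
struct SlaveStats {
    double rate;       // entries per second this slave can process
    double busyUntil;  // simulated time at which the slave is idle again
    long processed;    // entries handed to this slave so far

    explicit SlaveStats(double r) : rate(r), busyUntil(0.0), processed(0) {}
};

// Packet size: enough entries to keep the slave busy for about targetSeconds,
// but at least minPacket and never more than what is left.
long nextPacketSize(double rate, double targetSeconds, long minPacket, long remaining) {
    assert(rate > 0 && targetSeconds > 0 && minPacket > 0 && remaining >= 0);
    long wanted = static_cast<long>(rate * targetSeconds);
    return std::min(remaining, std::max(minPacket, wanted));
}

// Pull-style scheduling: whichever slave becomes idle first gets the next packet.
// Returns the simulated time at which the last packet finishes.
double distribute(std::vector<SlaveStats>& slaves, long totalEntries,
                  double targetSeconds, long minPacket) {
    assert(!slaves.empty());
    long remaining = totalEntries;
    while (remaining > 0) {
        std::vector<SlaveStats>::iterator idle = std::min_element(
            slaves.begin(), slaves.end(),
            [](const SlaveStats& a, const SlaveStats& b) { return a.busyUntil < b.busyUntil; });
        long n = nextPacketSize(idle->rate, targetSeconds, minPacket, remaining);
        idle->busyUntil += n / idle->rate;  // time needed to process this packet
        idle->processed += n;
        remaining -= n;
    }
    double finish = 0.0;
    for (const SlaveStats& s : slaves) finish = std::max(finish, s.busyUntil);
    return finish;
}
```

With one slave at 1000 entries/s and another at 500 entries/s, the faster slave ends up with twice as many entries and both finish at about the same time. That balance is the point of sizing packets per slave rather than splitting the data evenly up front.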
PROOF Intro
[Figure: architecture diagram. Labels: ROOT client session, Internet, master, slave, many slaves]
Phobos Event and AnT Tree
[Figure: class diagram of the contents of the single TTree. Labels: Event, Paddle, Track, Vertex, Hit; multiplicities shown: 0..n, 0..n, 0..n, 1..n]
PROOF Packages
• Provide a collection of files in the sandbox
• Binary or source packages
• PAR files: PROOF ARchive, like a Java jar
  • Tar file with a PROOF-INF directory
  • BUILD.sh
  • SETUP.C, per-slave settings
• API to manage and activate packages
AnT Package

    ant:
    PROOF-INF/
    Makefile
    LinkDef.h
    TPhAnTEventInfo.cxx
    TPhAnTEventInfo.h
    TPhAnTHit.cxx
    TPhAnTHit.h
    TPhAnTPdlInfo.cxx
    TPhAnTPdlInfo.h
    TPhAnTTrack.cxx
    TPhAnTTrack.h
    TPhAnTVertex.cxx
    TPhAnTVertex.h

    ant/PROOF-INF:
    BUILD.sh
    SETUP.C

    #!/bin/sh
    # BUILD.sh -- Build libant.so
    exec make

    // SETUP.C -- Load AnT library
    {
       gSystem->Load("libPhysics.so");
       gSystem->Load("libant.so");
    }
Analysis using TSelector
• Extend the framework by inheritance

    // Abbreviated version
    class TSelector : public TObject {
    protected:
       TList *fInput;
       TList *fOutput;
    public:
       void   Init(TTree *);
       void   Begin(TTree *);
       Bool_t Process(int entry);
       void   Terminate();
    };
Analysis using TSelector
• Create a class inheriting from TSelector
• Implement its member functions
  • Begin(): called once at the beginning of an analysis job, in each of the slave servers. Used e.g. to create histograms and initialize data
  • Process(): called for each entry to be processed (by that slave)
  • Terminate(): called once at the end of an analysis job, in each of the slave servers. Used e.g. for post-processing data and cleanup
  • Init(): called for each new file
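The Begin/Process/Terminate lifecycle and the merging of output objects can be mocked in plain C++ without ROOT. This is a sketch only: `Histogram`, `VertexSelector`, `runSlave`, and `mergeOutputs` are hypothetical stand-ins, the "tree" is just a vector of vertex x positions, and the binning (100 bins from -10 to 10) is borrowed from the example selector in this talk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a 1-D histogram with fixed binning (not ROOT's TH1F).
struct Histogram {
    double lo, hi;
    std::vector<long> bins;

    Histogram(std::size_t nbins, double lo_, double hi_)
        : lo(lo_), hi(hi_), bins(nbins, 0L) {
        assert(nbins > 0 && hi_ > lo_);
    }
    void fill(double x) {
        if (x < lo || x >= hi) return;  // out-of-range values are dropped
        std::size_t i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding at the upper edge
        ++bins[i];
    }
    void add(const Histogram& other) {  // bin-by-bin merge
        assert(bins.size() == other.bins.size());
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
    }
    long entries() const {
        long n = 0;
        for (std::size_t i = 0; i < bins.size(); ++i) n += bins[i];
        return n;
    }
};

// Mock selector: one instance runs on each slave.
struct VertexSelector {
    const std::vector<double>& tree;  // mock "tree": one vertex x position per entry
    Histogram vtxX;
    std::vector<Histogram> output;    // plays the role of the output list

    explicit VertexSelector(const std::vector<double>& t) : tree(t), vtxX(100, -10.0, 10.0) {}

    void Begin() { vtxX = Histogram(100, -10.0, 10.0); }  // once per job, per slave
    bool Process(std::size_t entry) { vtxX.fill(tree[entry]); return true; }  // once per entry
    void Terminate() { output.push_back(vtxX); }          // once per job, per slave
};

// What one slave does for its share of entries [first, first + num).
Histogram runSlave(const std::vector<double>& tree, std::size_t first, std::size_t num) {
    assert(first + num <= tree.size());
    VertexSelector sel(tree);
    sel.Begin();
    for (std::size_t entry = first; entry < first + num; ++entry) sel.Process(entry);
    sel.Terminate();
    return sel.output.front();
}

// What the master does with the per-slave outputs before returning them to the client.
Histogram mergeOutputs(const std::vector<Histogram>& parts) {
    assert(!parts.empty());
    Histogram total = parts.front();
    for (std::size_t i = 1; i < parts.size(); ++i) total.add(parts[i]);
    return total;
}
```

Because filling a histogram is additive, running two slaves on disjoint entry ranges and merging gives the same bins as one pass over all entries. This is why the same selector source can run unchanged locally and in parallel.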
Example Selector

    // antsel.C
    void Antsel::Begin(TTree *)
    {
       // Book the output histogram
       fVtx_x = new TH1F("vtxx", "Vertex X", 100, -10., 10.);
    }

    Bool_t Antsel::Process(int entry)
    {
       fChain->GetTree()->GetEntry(entry);
       // Skip events that fail the paddle cut
       if (evtInfo->fPdlInfo->fPdlMean < 1500) return kFALSE;
       TPhAnTVertex *v = evtInfo->fRMSSelvtx->GetObject();
       fVtx_x->Fill(v->fPos.X());
       return kTRUE;
    }

    void Antsel::Terminate()
    {
       // Hand the histogram to the output list
       fOutput->Add(fVtx_x);
    }
Running locally
• Develop and debug the selector locally on a small event sample.

    % root
    root [0] TFile *f = TFile::Open("ant_sample.root")
    root [1] TTree *t = (TTree*) f->Get("trkTree")
    root [3] t->Process("antsel.C", "", 2000)
    Real time 0:00:06, CP time 5.940
    root [4] vtxx->Draw()
    root [5] .! vi antsel.C

• About 8 MB of data (compression factor of about 5)
• Develop until ready for a large sample.
Running locally
• Ready to run on a large sample
TDSet – Specify the data
• Specify a collection of TTrees or TFiles with objects

    [] TDSet *d = new TDSet("TTree", "tracks", "/");
    [] TDSet *d = new TDSet("TEvent", "", "/objs");
    [] d->Add("root://rcrs4001/a.root", "tracks", "dir", first, num);
    …
    [] d->Print("a");

• To be returned by a DB or file catalog query, etc.
• Use logical filenames ("lfn:…")
Running with PROOF
• Ready to run on a large event sample

    % root
    root [0] gROOT->Proof("pgate.lns.mit.edu")
    … login details …
    root [1] TDSet *ds = make_dset()
    root [2] gProof->UploadPackage("ant.par")
    root [3] gProof->EnablePackage("ant")
    …
    root [4] gProof->Process("antsel.C", "", 60000)
    Real time 0:00:12, CP Time 0.050
    root [5] ((TH1F*)gProof->GetOutput("vtxx"))->Draw()

• Use the same session to look at other histograms, change cuts, etc.
The PROOF advantage
• Processed 240 MB (60000 events) in 12 s, compared with about 8 MB (2000 events) in 6 s locally.
PROOF Scalability
• Data set: 8.8 GB in 128 files, 9 million events
  • Each node has one copy of the data set (4 files, 277 MB in total)
• Timing
  • 1 node: 5:25 min
  • 32 nodes in parallel: 12 s
• Hardware: 32 nodes, each with dual 1 GHz Itanium II CPUs, 2 GB RAM, 2x75 GB 15K SCSI disks, 1 Fast Ethernet NIC, and 1 Gigabit Ethernet NIC (not used)
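The scaling figures can be turned into speedup, parallel efficiency, and aggregate throughput with a few lines of arithmetic. The sketch below makes two assumptions the slide does not state explicitly: that "5:25" means 5 minutes 25 seconds (325 s), and that the single-node and 32-node times refer to the same 8.8 GB workload. The struct and function names are hypothetical.

```cpp
#include <cassert>

// Derived scaling quantities; names are illustrative only.
struct ScalingResult {
    double speedup;         // serial time divided by parallel time
    double efficiency;      // speedup divided by number of nodes (1.0 = ideal)
    double throughputMBps;  // aggregate data rate of the parallel run
};

ScalingResult scaling(double serialSeconds, double parallelSeconds, int nodes, double dataMB) {
    assert(serialSeconds > 0 && parallelSeconds > 0 && nodes > 0 && dataMB > 0);
    ScalingResult r;
    r.speedup = serialSeconds / parallelSeconds;
    r.efficiency = r.speedup / nodes;
    r.throughputMBps = dataMB / parallelSeconds;
    return r;
}
```

Calling `scaling(325.0, 12.0, 32, 32 * 277.0)` with the numbers from this slide gives a speedup of roughly 27, an efficiency of about 85%, and an aggregate rate of about 740 MB/s. These values are only as reliable as the two assumptions above.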
Future Work
• Ongoing development
• Improvements and defect fixes
• Event lists
• Friend trees
• Multi-site PROOF sessions
• Continued development of a Grid-based PROOF cluster
Other PROOF Talks
• Fons Rademakers:
  • Distributed Parallel Analysis Framework with PROOF (15:00, session 2)
• Jinghua Liu:
  • Analysis of CMS Heavy Ion Simulation Data Using ROOT/PROOF/Grid