
Analysis framework – status and plans

Analysis framework – status and plans. Andrei Gheata, ALICE offline week, 19 March 2014. Outline: framework status; I/O and AOD: the path from giga to nano; flat AODs. Status: many improvements in the LEGO framework – see presentation from Jan-Fiete





Presentation Transcript


  1. Analysis framework – status and plans
     Andrei Gheata
     ALICE offline week, 19 March 2014

  2. Outline
     • Framework status
     • I/O and AOD: the path from giga to nano
     • Flat AODs

  3. Status
     • Many improvements in the LEGO framework – see presentation from Jan-Fiete
     • Support for MC loop – generation + analysis without persistent kinematics
     • Transient event pointer in tracks: needed for PID usage in cuts
        • Set automatically when using the handler
        • Otherwise the user needs to call ConnectTracks and Reset the event manually
     • Work on cleaning up the filtering cuts
     • Consistent merging of delta AODs
     • Several other small fixes
     • Automatic testing procedure for central filtering
        • Triggered by the build server
        • Fully implemented, to be deployed in the coming week
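Slide 3 mentions a transient event pointer in tracks that must be connected either by the handler or by hand via ConnectTracks and Reset. The sketch below illustrates the idea in plain C++ under loudly stated assumptions: `Track`, `Event`, `t0`, `tof_time` and `PassesTofCut` are hypothetical stand-ins, not the AliRoot classes, and the bodies of `ConnectTracks` and `Reset` are guesses that only mirror the names on the slide.

```cpp
#include <cassert>
#include <vector>

struct Event;  // forward declaration so Track can point back to its event

// Hypothetical, heavily simplified track (the real AliAODTrack is much richer).
struct Track {
  double tof_time = 0.0;         // hypothetical per-track measurement
  const Event* event = nullptr;  // transient back-pointer, never written to file
};

// Hypothetical event holding the tracks and one event-level quantity
// (a start time) that a PID-style cut needs.
struct Event {
  double t0 = 0.0;
  std::vector<Track> tracks;

  // Re-attach every track to this event after it has been read; this is
  // the job the slide assigns to ConnectTracks when no handler does it.
  void ConnectTracks() {
    for (Track& t : tracks) t.event = this;
  }
  // Drop the per-event content before the next event is loaded.
  void Reset() { tracks.clear(); }
};

// A cut that needs event-level information reaches it through the
// track's transient pointer; an unconnected track simply fails the cut.
bool PassesTofCut(const Track& t, double min_delta) {
  if (t.event == nullptr) return false;
  return (t.tof_time - t.event->t0) > min_delta;
}
```

Because the pointer is transient, a track from an event that was read but not connected only sees `nullptr`; the sketch makes such a track fail the cut instead of dereferencing it.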

  4. I/O and AOD
     • Large AOD container – I/O penalties
        • The idea was to read only the needed branches, but this is not happening in our trains
     • Small AOD container – fast I/O, but proliferation of datasets and an extra filtering step needed
        • Reuse a few times, then drop – model to be tested in practice

  5. Nano AOD
     • Quite appealing for analyses needing a small subset of the event data
        • Especially when the resulting AOD fits on a laptop
     • New (simple) interface to generate nano AODs
        • See Michele’s presentation
     • Not a silver bullet
        • Far fewer opportunities to make large trains sharing the input
        • CPU-bound analyses will still need to split
        • Extra overhead to access data members
        • Extra overhead for data management and filtering jobs – ATLAS moving slightly away from this model

  6. Problems with the current AOD approach
     • Very large I/O per event, which translates into deserialization time
        • Many branches made of custom types (e.g. PID, centrality), many of them not split
        • Several libraries need to be loaded besides the ROOT ones
        • Inefficient compression and overheads in accessing the information
     • Impossible to auto-detect the required branches per module
        • The built-in feature AliAnalysisTask::fBranchNames is practically unused
        • Declaring branches can increase speed by very large factors!
        • Even in the nano AOD approach the user has to declare the branches…
     • Tree complexity too high, with many StreamerInfos to be handled by ROOT -> CPU overhead
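Since the required branches cannot be auto-detected, each task has to declare them; the slide points to `AliAnalysisTask::fBranchNames` as the built-in place for this. The hypothetical helpers below (`SplitBranchList` and `BranchesToRead`, not AliRoot functions) sketch how a train could merge the comma-separated declarations of all its tasks into the single set of branches worth reading; in ROOT that set would then drive `TTree::SetBranchStatus`.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated declaration such as "tracks_fPx, tracks_fPy"
// into individual branch names, trimming blanks around each entry.
std::vector<std::string> SplitBranchList(const std::string& decl) {
  std::vector<std::string> out;
  std::stringstream ss(decl);
  std::string item;
  while (std::getline(ss, item, ',')) {
    const auto first = item.find_first_not_of(" \t");
    if (first == std::string::npos) continue;  // skip empty entries
    const auto last = item.find_last_not_of(" \t");
    out.push_back(item.substr(first, last - first + 1));
  }
  return out;
}

// A train only has to read the union of what its tasks declared;
// every other branch can stay disabled.
std::set<std::string> BranchesToRead(const std::vector<std::string>& per_task_decls) {
  std::set<std::string> needed;
  for (const std::string& decl : per_task_decls)
    for (const std::string& b : SplitBranchList(decl)) needed.insert(b);
  return needed;
}
```

A task that declares nothing contributes nothing, which is exactly why undeclared usage must be caught rather than silently tolerated.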

  7. A compromise
     • A simplified yet comprehensive container with a flatter (ntuple-like) structure
     • Fully split, with the ability to easily select the required input branches
        • Protect against selection errors (!) – any data request not matching the declared branches produces a Fatal
     • Faster access, compact data, possibility to vectorize loops
     • No need for custom AliRoot libraries to analyze
     • Contiguous data per type ensured by ROOT
        • Alignment not yet ensured – this is under investigation by I/O experts
        • Use of std::vector
     • Change of interface: event->GetTrack(i)->GetPt() becomes event->GetPt(i)
        • Old API to be preserved (with some cost) for a while, then deprecated and removed (discussed in next slides)
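The interface change on slide 7 amounts to replacing a collection of track objects by one contiguous column per quantity. A minimal sketch, assuming a hypothetical `FlatEvent` with only px, py and pz columns (the real flat AOD keeps all AOD content), shows the new-style `GetPt(i)` accessor and a loop over basic types that a compiler has a better chance of vectorizing.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flat (ntuple-like) event: one contiguous std::vector per
// track quantity instead of a collection of track objects.
class FlatEvent {
 public:
  std::size_t GetNumberOfTracks() const { return fPx.size(); }
  void AddTrack(double px, double py, double pz) {
    fPx.push_back(px);
    fPy.push_back(py);
    fPz.push_back(pz);
  }
  // New-style accessor: event->GetPt(i) instead of event->GetTrack(i)->GetPt().
  double GetPt(std::size_t i) const { return std::hypot(fPx[i], fPy[i]); }
  // Direct read access to whole columns.
  const std::vector<double>& Px() const { return fPx; }
  const std::vector<double>& Py() const { return fPy; }

 private:
  std::vector<double> fPx, fPy, fPz;
};

// Loop over plain arrays of doubles: no per-track objects, no virtual calls.
double SumPt(const FlatEvent& ev) {
  const std::vector<double>& px = ev.Px();
  const std::vector<double>& py = ev.Py();
  double sum = 0.0;
  for (std::size_t i = 0; i < px.size(); ++i)
    sum += std::sqrt(px[i] * px[i] + py[i] * py[i]);
  return sum;
}
```

Whether such a loop is actually vectorized depends on the compiler and its floating-point flags; the flat layout only removes the obstacles that a pointer-chasing object model puts in the way.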

  8. Error protection
     • Make sure that a task trying to use undeclared data crashes rather than produces incorrect results
        • Declared branches: “tracks_fPz, tracks_fPID,…”
        • UserExec { … event->GetPt(itrack); } (uses px, py)
        • AliFatal: used undeclared branch
     • Implementation not trivial if the possibility to vectorize is to be preserved
        • Mapping of API methods versus branch names
        • A “slow” implementation:
           Pt() { if (!fBranchMap[Mask(kPx | kPy)]) AliFatal(); else return …; }
        • A way to switch between slow and fast versions (unfortunately at compile time)
           • A solution using templates (to avoid virtuality) is also possible
           • E.g. LEGO test with the slow version, actual run with the fast one
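One way to read the template suggestion on slide 8: make the branch check a compile-time parameter, so the checked build pays for the test on every call while the unchecked build compiles it away, with no virtual calls in either. In this sketch the names `Branch`, `TrackColumns`, `CheckedTracks` and `FastTracks` are hypothetical, and throwing `std::logic_error` stands in for `AliFatal`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical branch bits, one per flattened track column.
enum Branch : std::uint32_t { kPx = 1u << 0, kPy = 1u << 1, kPz = 1u << 2, kPID = 1u << 3 };

// kChecked selects at compile time between the "slow" version, which
// verifies the declared branches on every call, and the "fast" one.
template <bool kChecked>
class TrackColumns {
 public:
  TrackColumns(std::uint32_t declared, std::vector<double> px, std::vector<double> py)
      : fDeclared(declared), fPx(std::move(px)), fPy(std::move(py)) {}

  double GetPt(std::size_t i) const {
    Require(kPx | kPy);  // pT needs both px and py
    return std::hypot(fPx[i], fPy[i]);
  }

 private:
  void Require(std::uint32_t mask) const {
    // With kChecked == false the condition is a constant false and the
    // compiler can drop the whole test.
    if (kChecked && (fDeclared & mask) != mask)
      throw std::logic_error("used undeclared branch");  // stand-in for AliFatal
  }
  std::uint32_t fDeclared;
  std::vector<double> fPx, fPy;
};

using CheckedTracks = TrackColumns<true>;  // e.g. for LEGO tests
using FastTracks = TrackColumns<false>;    // e.g. for the actual run
```

With the declaration from the slide (`kPz | kPID` only), `CheckedTracks::GetPt` refuses to run, while `FastTracks` silently returns a value: the difference that the LEGO-test versus production split is meant to exploit.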

  9. Exercise
     • Use AODtree->MakeClass() to generate a skeleton, then rework it
     • Keep all AOD info, but restructure the format
        Int_t fTracks.fDetPid.fTRDncls -> Int_t *fTracks_fDetPid_fTRDncls; //[ntracks_]
     • More complex cases to support I/O:
        typedef struct { Double32_t x[10]; } vec10dbl32;
        Double32_t fTracks.fPID[10] -> vec10dbl32 *fTracks_fPID; //[ntracks_] -> in future vector<Double32_t>
        Double32_t *fV0s.fPx //[fNprongs] -> TArrayF *fV0s.fPx; //[nv0s_]
     • Convert AliAODEvent -> FlatEvent
        • Try to keep the full content AND size
     • Write FlatEvent to file
     • Compare file size and read speed
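The conversion in the exercise turns nested members such as `fTracks.fDetPid.fTRDncls` into one array per leaf, indexed by track and sized by the `ntracks_` counter. The sketch below mimics that on invented input types (`DetPid`, `NestedTrack`, `Vec10`, `FlatTracks` and `Flatten` are hypothetical); it uses `std::vector` instead of ROOT's `//[ntracks_]` variable-length C arrays, and plain `double` for `Double32_t`, which is a `double` in memory and is only reduced in precision by ROOT when written.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical nested input mimicking a track object that owns
// sub-objects (detector PID info and a fixed-size PID array).
struct DetPid { int fTRDncls = 0; };
struct NestedTrack {
  DetPid fDetPid;
  std::array<double, 10> fPID{};  // double stands in for Double32_t
};

// Fixed-size element type playing the role of vec10dbl32 on the slide.
struct Vec10 { double x[10]; };

// Flattened output: one column per leaf, all indexed by track number;
// ntracks_ is the common length of every column.
struct FlatTracks {
  int ntracks_ = 0;
  std::vector<int> fTracks_fDetPid_fTRDncls;
  std::vector<Vec10> fTracks_fPID;
};

FlatTracks Flatten(const std::vector<NestedTrack>& tracks) {
  FlatTracks out;
  out.ntracks_ = static_cast<int>(tracks.size());
  out.fTracks_fDetPid_fTRDncls.reserve(tracks.size());
  out.fTracks_fPID.reserve(tracks.size());
  for (const NestedTrack& t : tracks) {
    out.fTracks_fDetPid_fTRDncls.push_back(t.fDetPid.fTRDncls);
    Vec10 v;  // every element is assigned below
    for (std::size_t k = 0; k < 10; ++k) v.x[k] = t.fPID[k];
    out.fTracks_fPID.push_back(v);
  }
  return out;
}
```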

  10. Results
     • Tested on AOD PbPb: AliAOD.root
       ******************************************************************************
       *Tree    :aodTree   : AliAOD tree                                            *
       *Entries :     2327 : Total =      2761263710 bytes  File  Size =  660491257 *
       *        :          : Tree compression factor =   4.18                       *
       ******************************************************************************
       ******************************************************************************
       *Tree    :AliAODFlat: Flattened AliAODEvent                                  *
       *Entries :     2327 : Total =      2248164303 bytes  File  Size =  385263726 *
       *        :          : Tree compression factor =   5.84                       *
       ******************************************************************************
     • Data smaller (no TObject overhead, TRef -> int)
     • 30% better compression
     • Reading speed
        • Old: CPU time = 103 s, Real time = 120 s
        • New: CPU time = 54 s, Real time = 64 s
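The figures on slide 10 can be cross-checked with a little arithmetic. The helpers below are hypothetical; they assume ROOT's printed compression factor is, to good approximation, the uncompressed total divided by the size on file, which reproduces the printed 4.18 and 5.84 from the byte counts.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Byte counts copied from the two tree summaries on this slide.
struct TreeSummary {
  std::int64_t total_bytes;  // "Total" (uncompressed)
  std::int64_t file_bytes;   // "File Size" (compressed)
};

// Approximate compression factor: uncompressed total / size on file.
double CompressionFactor(const TreeSummary& s) {
  return static_cast<double>(s.total_bytes) / static_cast<double>(s.file_bytes);
}

// Fractional reduction from "before" to "after" (0.25 means 25% less).
double Reduction(double before, double after) { return 1.0 - after / before; }

const TreeSummary kOldAod{2761263710LL, 660491257LL};
const TreeSummary kFlatAod{2248164303LL, 385263726LL};
```

With these inputs, `Reduction` gives a file about 42% smaller (660491257 to 385263726 bytes) and a CPU time about 48% lower (103 s to 54 s).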

  11. Benefits
     • Simpler user analysis, working mostly with basic types (besides the event)
     • Simplified access to data, greatly reducing the number of (virtual) calls, especially in deep loops
     • ROOT-only analysis
     • No problem with schema evolution, supported as before
     • Filtering from hierarchical to flat AODs is straightforward
     • Much more vectorizable track loops
     • Delta AODs would also have to be flattened

  12. Migration
     • To allow a smooth transition, the old code should run with the new format
        • Requires loading the new flat event and automatically producing an AliAODEvent matching it
        • Done smartly, this can use the AliAODFlat data from its original location (no copy)
        • AliAODInputHandler::SetSupportOldFormat
     • New code will immediately benefit by calling the new API
     • Gradually deprecate the old API
     • Convert the old AOD format to the new one
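Slide 12 wants old code to run on the new format, ideally reading the flat data in place. A minimal sketch of that idea, with hypothetical `FlatData`, `TrackView` and `LegacyEvent` classes (unrelated to how `AliAODInputHandler::SetSupportOldFormat` is actually implemented), exposes the old `GetTrack(i)->Pt()` style of interface through views that hold only a pointer and an index.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flat storage (the new format).
struct FlatData {
  std::vector<double> px, py;
};

// Old-style track interface implemented as a lightweight view: it keeps a
// pointer to the flat columns and an index, so no track data is copied.
class TrackView {
 public:
  TrackView(const FlatData* data, std::size_t index) : fData(data), fIndex(index) {}
  double Px() const { return fData->px[fIndex]; }
  double Py() const { return fData->py[fIndex]; }
  double Pt() const { return std::hypot(Px(), Py()); }

 private:
  const FlatData* fData;
  std::size_t fIndex;
};

// Old-style event facade: legacy code keeps calling GetTrack(i)->Pt().
// The FlatData must outlive the LegacyEvent built on top of it.
class LegacyEvent {
 public:
  explicit LegacyEvent(const FlatData& data) {
    fTracks.reserve(data.px.size());
    for (std::size_t i = 0; i < data.px.size(); ++i) fTracks.emplace_back(&data, i);
  }
  std::size_t GetNumberOfTracks() const { return fTracks.size(); }
  const TrackView* GetTrack(std::size_t i) const { return &fTracks[i]; }

 private:
  std::vector<TrackView> fTracks;  // views only; the momenta stay in FlatData
};
```

The price is one extra indirection per access, in line with the remark on slide 7 that the old API is kept "with some cost".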

  13. Conclusions
     • Framework becoming more flexible – usable with reconstruction, now also with generators
     • Focusing on the data structure – unification of the ESD/AOD API, AOD skimming, flat AOD
     • Flattening the event structure has benefits
        • Sizeable gains in size and speed (>50%)
        • Benefits for user code performance (faster access and vectorisation)
        • Conversion straightforward
     • An approach we will have to consider seriously for Run 2 and especially Run 3
