Physics Analysis inside the Oracle DB
Progress report, 10 October 2013
Performance with parallel scaling
• The original result showed that the DB version of the Higgs+Z benchmark was faster than serial execution of the root-ntuple analysis on the test3 setup
• Parallelism with root can be mimicked by running multiple simultaneous jobs, each running on a subset of the data
• The ntuple version improves more with parallel execution, as the DB is limited by I/O
• more data needs to be read by the DB than from the column storage in the ntuples
• The test3 setup has data files stored on NFS, resulting in a relatively slow I/O speed of ~250 MB/s
Ttbar cutflow analysis
• A cutflow analysis for the top-pair production cross-section measurement was implemented as a new benchmark.
• The original “RootCore” packages used by the ATLAS top physics group are compared to a modified set of packages that retrieve data from the DB via an SQL query. More realistic than the Higgs+Z benchmark:
• uses 262 variables (compared to 40 in Higgs+Z)
• also uses data from photon and primary-vertex objects (the photon table alone is 114 GB!)
• Another 3 external libraries were added, used to call functions for corrections on electron and photon objects
• Selection of electron and photon objects cannot be done as a single-table select: the corrections depend on the number of vertices in the event (pile-up), so an inner join with the vertex table is required (see the sketch after this slide)
• Only the “electron” channel is implemented so far, but the “muon” channel is very similar…
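To make the cross-table selection concrete, here is a minimal SQL sketch of an electron selection joined to a per-event vertex count for the pile-up-dependent correction. All table, column, and function names (electron, vertex, el_pt, n_vtx, corr_el_isolation) are illustrative assumptions, not the actual schema or the actual benchmark query.

```sql
-- Sketch only: names are hypothetical, not the real ntuple-to-DB schema.
SELECT e.RunNumber,
       e.EventNumber,
       e.el_index
FROM   electron e
INNER JOIN (
         -- count reconstructed primary vertices per event (pile-up estimate)
         SELECT RunNumber, EventNumber, COUNT(*) AS n_vtx
         FROM   vertex
         WHERE  vxp_nTracks >= 2
         GROUP  BY RunNumber, EventNumber
       ) v
       ON  v.RunNumber   = e.RunNumber
       AND v.EventNumber = e.EventNumber
WHERE  e.el_pt > 25000                -- pT cut in MeV
  -- pile-up-corrected isolation via an external library wrapped in PL/SQL
  -- (corr_el_isolation is a hypothetical wrapper function)
  AND  corr_el_isolation(e.el_Etcone20, v.n_vtx) < 4000;
```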
Performance with parallel scaling
• The ttbar cutflow analysis from root-ntuples is very slow in serial!
• But when running multiple simultaneous root jobs it again becomes faster than the DB version
• … this is on the test3 setup, where I/O is relatively slow
Mapred cluster
• The mapred cluster (described in the previous report) has better read I/O
• 5 nodes connected to 5 disk arrays with a total of 60 disks -> up to 2500 MB/s
• As shown last time, the root-ntuple analysis is faster on this cluster for the Higgs+Z benchmark:
40 root jobs: 71 seconds
SQL parallel 40: 135 seconds
Mapred cluster
• But the root-ntuple analysis is slower than the DB for the ttbar cutflow analysis:
SQL parallel 40: 372 seconds
40 root jobs: 588 seconds
• Optimistic conclusion: the Oracle DB beats the root-ntuple analysis for a realistic physics analysis, when given fast enough I/O reads!
ttbar cutflow: why so slow?
• But WHY is the ttbar cutflow analysis so much slower than Higgs+Z for the ntuple analysis on the “mapred” cluster?
• As a test I rewrote the ttbar-analysis packages to return the result only for a specific sub-selection, and similarly broke the SQL version down to produce the same result
• This allows comparing the timing results for DB vs. ntuple for different types of selections!
• The DB analysis was always faster, even though far fewer branches need to be loaded for a separate object selection than for the full ttbar cutflow (262 branches)
• This looked suspicious to me!
ttbar cutflow: why so slow?
• I wrote a single root macro to reproduce the same results as the single-electron selection from the ttbar packages
• This macro was much faster: 55 seconds instead of 214 seconds
• I think the difference in time is due to the ttbar packages disabling branches and calling Tree->GetEntry(), while my root macro calls branch->GetEntry(); this turns out to make a large difference in CPU:
good-electron, simple macro w. tree->GetEntry(), 40 root jobs: Time = 156 seconds
good-electron, simple macro w. load branches, 40 root jobs: Time = 156 seconds
Summary
• Real analysis code is not always optimized to run as fast as possible!
• The ttbar cutflow is an interesting case study, and it uses many different object selections that can be studied separately
• Experience from converting the ttbar cutflow to SQL:
• Using a materialized view for the good-run-list selection
• Adding external libraries (PL/SQL calling Java calling C++) is easy once you know how!
• It is more difficult when the selection requires a cross-table selection with an INNER JOIN (photon/electron selection requiring info from the vertex table for the pile-up correction), but not impossible
• I’m using lots of MATERIALIZE hints; I don’t trust the SQL optimizer, as it goes crazy with all those JOINs…
• I still have a single analysis query to run the entire cutflow; eventually it might be better/easier to write the code by explicitly creating tables to hold intermediate selections, using the NOLOGGING option and parallel DML hints to speed up table creation (see the sketch after this slide)
• To do:
• Finish the ttbar cutflow; some cuts requiring cross-table matching are still not implemented
• Add the muon channel; the ntuple analysis produces two cutflows, for the muon and electron channels, in a single pass, as both use the results after the good-object selection. I can do the same in SQL, but I will need to create intermediate tables to hold the temporary selections
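As a hedged illustration of the Oracle features mentioned above, the fragment below sketches a materialized view, a MATERIALIZE hint on a WITH subquery, and a NOLOGGING/parallel intermediate table. The object names (goodrun_mv, goodrunlist, electron, good_electrons, el_pt) are invented for the example and do not reflect the actual schema or queries.

```sql
-- Sketch only: all object names are hypothetical.

-- Materialized view holding the good-run-list selection, refreshed on demand.
CREATE MATERIALIZED VIEW goodrun_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS SELECT RunNumber, lbn FROM goodrunlist;

-- MATERIALIZE hint: force the optimizer to evaluate the WITH subquery once
-- into a temporary result instead of merging it into the surrounding joins.
WITH good_el AS (
  SELECT /*+ MATERIALIZE */ RunNumber, EventNumber
  FROM   electron
  WHERE  el_pt > 25000
)
SELECT COUNT(*) FROM good_el;

-- Intermediate selection written to a real table; NOLOGGING and parallelism
-- speed up the table creation at the cost of recoverability.
CREATE TABLE good_electrons NOLOGGING PARALLEL 8 AS
  SELECT /*+ PARALLEL(e, 8) */ e.*
  FROM   electron e
  WHERE  e.el_pt > 25000;
```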