Job Merging for Efficient Data Analysis

Learn about the next logical step after job splitting: job merging. Discover how to concatenate the results of subjobs and the benefits of merging. Follow the status of merging and automate the process.


Presentation Transcript


  1. AliEn Job Merging • Pablo Saiz • CAF and Grid User Forum

  2. Job Merging
     • Next logical step after job splitting
       • See http://indico.cern.ch/conferenceDisplay.py?confId=31167
     • Concatenates the results of all the subjobs of a given masterjob
     • New status for a masterjob that needs merging: INSERTED → SPLITTING → SPLIT → MERGING → DONE
     • The ‘merging’ is itself another job
       • It waits in the queue like any other job
     pablo.saiz@cern.ch

  3. 1 image is better than 1000 words
     [Figure: a user JDL is split into subjobs 1 to n, each producing histo.root and analysis.log. A "Merge Histo" job combines the histo.root files into AllHisto.root; a "Merge Logs" job producing Alllogs.txt is marked "ERROR!!". Time axis: INSERTED, SPLIT, MERGING, DONE.]

  4. How to specify Merging
     • In the JDL of the masterjob:
       • Merge={"<input>:<jdl>:<output>" (,"<input2>:<jdl2>:<output2>")* }
       • MergeOutputDir="/path/where/you/want/the/output";
         • Default: /proc/<user>/<masterid>/merge
     • AliEn will do:
       • submit <jdl> <masterJobId> <input> <output> <user> <procdir> <outputdir>
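
As an illustration of the Merge syntax on this slide, the following Python sketch splits one "<input>:<jdl>:<output>" entry and lays out the arguments in the order of the submit line above. It is not AliEn code: the function names, the user name, the masterjob ID and the procdir value are hypothetical placeholders, and only the entry format, the argument order and the default output directory come from the slide.

```python
def default_output_dir(user, master_job_id):
    # Default MergeOutputDir as given on the slide: /proc/<user>/<masterid>/merge
    return f"/proc/{user}/{master_job_id}/merge"


def expand_merge_entry(entry, master_job_id, user, procdir, outputdir):
    """Turn one Merge entry "<input>:<jdl>:<output>" into a submit argument list.

    Hypothetical helper: it only mirrors the argument order shown on the slide,
    submit <jdl> <masterJobId> <input> <output> <user> <procdir> <outputdir>.
    """
    parts = entry.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected '<input>:<jdl>:<output>', got {entry!r}")
    input_name, jdl, output_name = parts
    return ["submit", jdl, str(master_job_id), input_name, output_name,
            user, procdir, outputdir]


if __name__ == "__main__":
    # Placeholder values: user, masterjob ID and procdir are made up for the example.
    user, master_id = "someuser", 1234
    args = expand_merge_entry(
        "histo.root:/alice/jdl/mergerootfile.jdl:AllHisto.root",
        master_id, user, f"/proc/{user}/{master_id}",
        default_output_dir(user, master_id),
    )
    print(" ".join(args))
```

A Merge list with several entries would simply produce one such submit line per entry.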

  5. How to start the merging
     • Automatically:
       • When all the subjobs are in a final state, AliEn sends the merging
     • masterJob <id> merge
       • Forces the merging of the subjobs that have finished
     • By hand:
       • submit <jdl> <masterJobId> <input> <output> <user> <procdir> <outputdir>
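
The automatic trigger described on this slide can be sketched as a simple check over the subjob states. This is a minimal Python illustration, not AliEn's implementation: the final states are taken to be just DONE and ERROR, as listed in the conclusions slide, and reading "finished" in the forced case as "in a final state" is an assumption. All function names are hypothetical.

```python
# Final states as named in the conclusions slide (simplified, assumed complete).
FINAL_STATES = {"DONE", "ERROR"}


def ready_for_automatic_merge(subjob_states):
    """True when there is at least one subjob and every subjob is in a final state."""
    states = list(subjob_states.values())
    return bool(states) and all(state in FINAL_STATES for state in states)


def subjobs_for_forced_merge(subjob_states):
    """Subjob IDs that a forced merge would consider.

    Assumption: "finished" means "already in a final state".
    """
    return sorted(job_id for job_id, state in subjob_states.items()
                  if state in FINAL_STATES)


if __name__ == "__main__":
    # Made-up subjob IDs and states for illustration.
    states = {101: "DONE", 102: "RUNNING", 103: "ERROR"}
    print(ready_for_automatic_merge(states))   # one subjob is still running
    print(subjobs_for_forced_merge(states))    # subjobs already in a final state
```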

  6. Existing merging JDLs
     • /alice/jdl/mergerootfile.jdl
     • /alice/jdl/mergerootfile-sequential.jdl
     • No requirements. The merging can be executed anywhere!
     • User defined
       • Variations of the previous JDLs
     • There is no merging for text files:
       • Needed?

  7. ToDo
     • Given a masterjob ID, follow up the status of the merging
     • Automatically put requirements on the execution site for the merging
     • More documentation in the bible
     • ?

  8. Conclusions
     • Merging collects the output of the subjobs into a single file
     • Performed when all the subjobs are in a final state: ERROR or DONE
     • Can also be triggered manually
     • Documentation will be added to the bible