This document outlines the Abstract Job Description Language (AJDL) model presented at the PPDG Collaboration Meeting. It describes how users select datasets and transformations, how jobs are created, and how results are examined in a distributed analysis system. Key components include datasets, transformations, job definitions, and user preferences, all designed to define tasks abstractly so they can be executed at various computing sites. The model emphasizes extensibility to diverse dataset types and support for various job management systems.
AJDL: Abstract Job Description Language
PPDG Collaboration Meeting, Williams Bay
David Adams, BNL
June 29, 2004
Contents
• Model
• Components
• Implementation
Model
• Job-based model
  • User selects an input dataset
  • User selects/constructs a transformation (xform) to apply to this dataset
  • Distributed analysis system constructs a job to apply the xform to the dataset
  • Result is a new dataset
    • Partial results may be available during processing
  • User examines the result
• From this, identify the components of AJDL
  • Dataset
  • Transformation (e.g. application and task)
  • Job (xform, dataset, job preferences)
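The job-based model can be sketched in a few lines of Python. This is a minimal illustration under assumed names, not the AJDL schema: `Dataset`, `Transformation`, `Job` and `run_job` are hypothetical, and `run_job` only derives placeholder output names where a real system would run the application and register what it produced.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dataset:
    # Datasets are immutable, so processing always yields a new Dataset
    dataset_id: str
    lfns: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Transformation:
    # The xform: an application plus the task that configures it
    application: str
    task: str


@dataclass
class Job:
    xform: Transformation
    dataset: Dataset
    preferences: dict = field(default_factory=dict)
    result: Optional[Dataset] = None


def run_job(job: Job) -> Dataset:
    """Apply the xform to the input dataset and record the new result dataset.

    Output names are placeholders; a real system would run the application."""
    out_id = f"{job.dataset.dataset_id}.{job.xform.application}"
    out_lfns = tuple(f"{lfn}.out" for lfn in job.dataset.lfns)
    job.result = Dataset(out_id, out_lfns)
    return job.result


# Usage: the user picks a dataset and an xform; the system builds and runs the job
input_ds = Dataset("ds1", ("a.root", "b.root"))
job = Job(Transformation("myapp", "mytask"), input_ds)
result = run_job(job)
```

Because `Dataset` is frozen, the input is never modified; the job simply points at a new dataset as its result.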
Model (cont)
• Abstract means
  • User job definition should be suitable for invocation at any site using any WMS (workload management system)
  • Specify what to do, not how to do it
• Analysis service
  • Receives the abstract job request
  • Splits it into sub-jobs
    • Typically by splitting the input dataset
  • Maps the transformation to a local executable and runtime environment
  • Runs the executable on each sub-dataset
  • Gathers and merges the results from each sub-job
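The analysis-service steps (split, map, run, merge) can be sketched as follows, assuming a round-robin split by file and a site-local lookup table standing in for the mapping to a real executable and runtime environment. `split_dataset`, `LOCAL_EXECUTABLES` and `run_abstract_job` are hypothetical; the merge here just sums counts, whereas real results (e.g. histograms or event files) would need type-specific merging.

```python
from typing import Callable, Dict, List


def split_dataset(lfns: List[str], n_sub: int) -> List[List[str]]:
    """Split the input file list round-robin into at most n_sub sub-datasets."""
    n_sub = max(1, min(n_sub, len(lfns)))
    return [lfns[i::n_sub] for i in range(n_sub)]


# Hypothetical site-local table mapping an abstract application name to
# something this site can actually run (here just a Python callable).
LOCAL_EXECUTABLES: Dict[str, Callable[[List[str]], int]] = {
    "count_files": lambda files: len(files),
}


def run_abstract_job(application: str, lfns: List[str], n_sub: int = 4) -> int:
    executable = LOCAL_EXECUTABLES[application]          # map xform to local executable
    sub_datasets = split_dataset(lfns, n_sub)            # split into sub-jobs
    partial = [executable(sub) for sub in sub_datasets]  # run each sub-job
    return sum(partial)                                  # gather and merge


total = run_abstract_job("count_files", [f"f{i}" for i in range(10)], n_sub=3)
```

The same abstract request could be run at another site with a different `LOCAL_EXECUTABLES` table; only the mapping step is site-specific.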
Components
• Dataset
  • Identity
    • Dataset is immutable
  • Location
    • Typically a list of LFNs (logical file names)
    • May be absent (virtual dataset)
      • DRC then provides
  • Content
    • Which events
    • Type of data in each event (raw, tracks, jets, aod, …)
  • Compound structure
    • List of sub-datasets
    • Can be a tree structure
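The dataset attributes above map naturally onto an immutable record with an optional location and a tuple of sub-datasets. The sketch below is a hypothetical Python data structure, not the AJDL XML schema; in particular, treating a dataset as virtual when no node in its tree has a location is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)  # frozen: a dataset is immutable once defined
class Dataset:
    dataset_id: str
    lfns: Optional[Tuple[str, ...]] = None   # None: no location (virtual dataset)
    content: FrozenSet[str] = frozenset()    # e.g. {"aod", "jets"}
    subdatasets: Tuple["Dataset", ...] = ()  # compound datasets form a tree

    def is_virtual(self) -> bool:
        """True if neither this node nor any sub-dataset has a location."""
        return self.lfns is None and all(s.is_virtual() for s in self.subdatasets)

    def all_lfns(self) -> Tuple[str, ...]:
        """Collect LFNs from this node and, recursively, from every sub-dataset."""
        files = tuple(self.lfns or ())
        for sub in self.subdatasets:
            files += sub.all_lfns()
        return files


part_a = Dataset("part_a", ("a1.pool", "a2.pool"), frozenset({"aod"}))
part_b = Dataset("part_b", ("b1.pool",), frozenset({"aod"}))
combined = Dataset("combined", subdatasets=(part_a, part_b))
planned = Dataset("planned")  # no location yet
```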
Components (cont)
• Application
  • Script to process a dataset
    • Output is another dataset
  • List of software packages
    • Assume a package management service provides the location of a specified package
    • May have automatic installation
  • Application advertises the content it requires
    • Compared with the content of the input dataset to verify compatibility
  • Second script to build the task before processing
    • E.g. compile provided sources
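The compatibility check between an application's advertised content and the content of the input dataset amounts to a set difference. The `Application` fields, `missing_content` and `is_compatible` below are hypothetical names chosen for this sketch, and the package name `pkgA` is a placeholder.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Application:
    name: str
    run_script: str                                 # processes a dataset, producing another
    build_script: str = ""                          # e.g. compiles provided sources
    packages: Tuple[str, ...] = ()                  # resolved by a package management service
    required_content: FrozenSet[str] = frozenset()  # content the application says it needs


def missing_content(app: Application, dataset_content: FrozenSet[str]) -> FrozenSet[str]:
    """Content types the application requires but the input dataset lacks."""
    return app.required_content - dataset_content


def is_compatible(app: Application, dataset_content: FrozenSet[str]) -> bool:
    return not missing_content(app, dataset_content)


jet_app = Application("jet_analysis", "run.sh", build_script="build.sh",
                      packages=("pkgA",), required_content=frozenset({"aod", "jets"}))
```

Returning the missing content types, rather than only a boolean, lets a client tell the user why a dataset was rejected.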
Components (cont)
• Task
  • Carries the data used to configure the application
  • At present the task carries embedded text files
    • E.g. myalg.cxx
  • Named parameters may be added
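A task that carries embedded text files can be unpacked into a job's working directory before the build script runs. This is a minimal sketch with a hypothetical `Task` class; the `parameters` field reflects the possible addition of named parameters and its contents here are illustrative only.

```python
import os
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class Task:
    files: Dict[str, str] = field(default_factory=dict)       # file name -> embedded text
    parameters: Dict[str, str] = field(default_factory=dict)  # possible named parameters

    def write_files(self, directory: str) -> List[str]:
        """Unpack the embedded text files into a job's working directory."""
        paths = []
        for name, text in sorted(self.files.items()):
            path = Path(directory) / name
            path.write_text(text)
            paths.append(str(path))
        return paths


task = Task(files={"myalg.cxx": "// user algorithm source\n"},
            parameters={"nevents": "1000"})
with tempfile.TemporaryDirectory() as workdir:
    written = task.write_files(workdir)
    unpacked = [Path(p).read_text() for p in written]
    names = [os.path.basename(p) for p in written]
```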
Components (cont)
• Job preferences
  • Allow the user to provide hints for processing
    • Location for output data
    • User role
    • Desired response time
  • System may ignore or freely interpret these
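Because preferences are only hints, a service is free to interpret them. The sketch below shows one hypothetical interpretation of the desired response time: choose enough sub-jobs to meet it, capped by a site limit and by the number of input files. `JobPreferences`, `choose_subjob_count` and the per-file cost estimate are all assumptions for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobPreferences:
    output_location: Optional[str] = None
    user_role: Optional[str] = None
    response_time_s: Optional[float] = None  # desired response time in seconds


def choose_subjob_count(prefs: JobPreferences, n_files: int,
                        seconds_per_file: float, max_subjobs: int) -> int:
    """Split enough to meet the desired response time, if one was given.

    The preference is only a hint: without it we use a single sub-job, and we
    never exceed the site limit or the number of input files."""
    if prefs.response_time_s is None or prefs.response_time_s <= 0:
        return 1
    total_time = n_files * seconds_per_file
    needed = math.ceil(total_time / prefs.response_time_s)
    return max(1, min(needed, max_subjobs, n_files))
```

Another service could ignore `response_time_s` entirely and still honor the job definition, which is the point of treating preferences as hints.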
Components (cont)
• Job
  • ID
  • Current state (initializing, running, done, failed, …)
  • Start and stop times
  • List of sub-job IDs
  • Input application, task and dataset
  • Output dataset
    • Partial result if the job is not complete
  • Access to control the job
    • Suspend/resume
    • Kill
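The job state and control operations can be modeled as a small state machine. In this hypothetical sketch the `suspended` and `killed` states and the table of allowed transitions are assumptions implied by the suspend/resume and kill controls; they are not taken from the AJDL definition.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class JobState(Enum):
    INITIALIZING = "initializing"
    RUNNING = "running"
    SUSPENDED = "suspended"  # assumed state, implied by suspend/resume
    DONE = "done"
    FAILED = "failed"
    KILLED = "killed"        # assumed state, implied by kill


# States from which each control action is allowed (an assumption for this sketch)
ALLOWED_FROM = {
    "start": {JobState.INITIALIZING},
    "suspend": {JobState.RUNNING},
    "resume": {JobState.SUSPENDED},
    "kill": {JobState.INITIALIZING, JobState.RUNNING, JobState.SUSPENDED},
}


@dataclass
class Job:
    job_id: str
    subjob_ids: List[str] = field(default_factory=list)
    state: JobState = JobState.INITIALIZING
    start_time: Optional[float] = None
    stop_time: Optional[float] = None

    def can(self, action: str) -> bool:
        return self.state in ALLOWED_FROM[action]

    def _apply(self, action: str, new_state: JobState) -> None:
        if not self.can(action):
            raise ValueError(f"cannot {action} a job that is {self.state.value}")
        self.state = new_state

    def start(self) -> None:
        self._apply("start", JobState.RUNNING)
        self.start_time = time.time()

    def suspend(self) -> None:
        self._apply("suspend", JobState.SUSPENDED)

    def resume(self) -> None:
        self._apply("resume", JobState.RUNNING)

    def kill(self) -> None:
        self._apply("kill", JobState.KILLED)
        self.stop_time = time.time()


job = Job("job-1", subjob_ids=["job-1.0", "job-1.1"])
job.start()
job.suspend()
job.resume()
job.kill()
```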
Implementation
• Extensibility
  • Must be extensible to support different types of datasets and jobs
    • AtlasPoolEventDataset, RootHistogramDataset, …
    • ProcessJob, LsfJob, CondorJob, EgeeJob, …
  • Can we use the same schema for all types?
    • So far, yes for jobs
    • Probably for applications and tasks
    • Not clear for datasets
• Data representation
  • XML description for each type
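Using one schema for all job types means a single reader can handle an `LsfJob` and a `CondorJob` alike, with the type carried as data. The element and attribute names below (`job`, `type`, `state`, `subjobs`, `subjob`) are hypothetical, not the actual AJDL XML; the sketch uses only the standard-library `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def job_to_xml(job_type: str, job_id: str, state: str, subjob_ids: List[str]) -> str:
    """Write a job using one shared element layout; only the type attribute varies."""
    root = ET.Element("job", {"type": job_type, "id": job_id})
    ET.SubElement(root, "state").text = state
    subjobs = ET.SubElement(root, "subjobs")
    for sid in subjob_ids:
        ET.SubElement(subjobs, "subjob", {"id": sid})
    return ET.tostring(root, encoding="unicode")


def job_from_xml(text: str) -> Dict[str, object]:
    """Read any job type back with the same code, since the schema is shared."""
    root = ET.fromstring(text)
    return {
        "type": root.get("type"),
        "id": root.get("id"),
        "state": root.findtext("state"),
        "subjob_ids": [e.get("id") for e in root.find("subjobs")],
    }


lsf_xml = job_to_xml("LsfJob", "j1", "running", ["j1.0", "j1.1"])
condor_xml = job_to_xml("CondorJob", "j2", "done", [])
```

Dataset types are where this approach is least certain: type-specific content (event lists, histogram names) may not fit one shared layout.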
Implementation (cont)
• Classes
  • Provide class interfaces for each type
    • C++, Python and maybe Java
    • C++ from DIAL
    • Python binding to C++ using lcgdict (GANGA)
  • Convenience for implementing clients and services
  • Add operations to take action
    • E.g. fetch local replicas of files in a dataset
    • Update status or kill a job
  • May add functionality for subtypes
    • Extract histograms for a RootHistogramDataset
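The idea of a base class with common operations and subtypes that add their own can be sketched in pure Python. This is not the DIAL C++ classes or their lcgdict bindings: `Dataset`, `local_replicas`, `RootHistogramDataset` and `extract_histogram` are hypothetical, the dictionary stands in for a replica catalog, and `local_replicas` only locates local copies rather than fetching them. The histogram bins are held in memory for illustration, where a real implementation would read them from the dataset's ROOT files.

```python
from typing import Dict, List


class Dataset:
    """Hypothetical base class offering operations common to all dataset types."""

    def __init__(self, dataset_id: str, lfns: List[str]):
        self.dataset_id = dataset_id
        self.lfns = list(lfns)

    def local_replicas(self, catalog: Dict[str, List[str]], site_prefix: str) -> Dict[str, str]:
        """Pick, for each LFN, a replica whose physical name starts with site_prefix.

        `catalog` stands in for a replica catalog: LFN -> list of physical names.
        LFNs with no local replica are left out of the result."""
        found: Dict[str, str] = {}
        for lfn in self.lfns:
            for pfn in catalog.get(lfn, []):
                if pfn.startswith(site_prefix):
                    found[lfn] = pfn
                    break
        return found


class RootHistogramDataset(Dataset):
    """Subtype adding type-specific functionality (bins kept in memory here)."""

    def __init__(self, dataset_id: str, lfns: List[str], histograms: Dict[str, List[float]]):
        super().__init__(dataset_id, lfns)
        self._histograms = {name: list(bins) for name, bins in histograms.items()}

    def extract_histogram(self, name: str) -> List[float]:
        return list(self._histograms[name])


catalog = {
    "hist1.root": ["srm://remote.example/hist1.root", "file:///data/hist1.root"],
    "hist2.root": ["srm://remote.example/hist2.root"],
}
hists = RootHistogramDataset("h1", ["hist1.root", "hist2.root"], {"jet_pt": [3.0, 5.0, 2.0]})
```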