This workshop provides an overview of AliEn and aliensh, the File Catalogue structure, path/file name definitions, and additional run and file level metadata. It also includes a working example and an outlook for future developments.
ALICE metadata
Markus Oldenburg
GridPP Metadata Workshop, July 4–7 2006, Oxford University
Overview
• AliEn and aliensh
• File Catalogue
  • structure
  • path/file name definition
• Additional Run and File Level Metadata
• Event Level Metadata
• ‘Working’ Example
• Summary and Outlook
AliEn and aliensh
• AliEn (ALICE Environment)
  • distributed computing environment for ALICE
  • core services in Perl
  • provides
    • Database interface (MySQL)
    • File Catalogue
    • Metadata Catalogue
    • other services…
• aliensh
  • provides commands to access AliEn GRID computing resources and the AliEn virtual file system
  • bash-like behaviour
  • interactive, single-command, or script execution
  • informative + convenience commands (whoami, less, …)
  • virtual file catalogue + data management commands (cp, rm, find, …)
  • TaskQueue/job management commands (submit, ps, kill, …)
Structure of the File Catalogue
• File Catalogue
  • acts as and looks like a ‘File System’
  • doesn’t own the files, just associates logical file names (LFN) with physical locations/physical file names (PFN)
• MySQL database
  • each virtual directory is represented by one table
  • subdirectories are connected to directories by sub-table entries
  • LFNs (base names) are represented as entries in directory tables
  • These entries hold the name (LFN) and the PFN.
• PFN contains
  • protocol used to access the data
  • host where to find the data
  • access port
  • directory entry (= file name)
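The four PFN components listed above map naturally onto the parts of a URL. The following is a minimal sketch, assuming a PFN is written as a URL-like string; the `split_pfn` helper and the example value are hypothetical and not taken from AliEn.

```python
from urllib.parse import urlsplit


def split_pfn(pfn):
    """Split a URL-style PFN into the four components named on the slide."""
    parts = urlsplit(pfn)
    return {
        "protocol": parts.scheme,   # how to access the data
        "host": parts.hostname,     # where to find the data
        "port": parts.port,         # access port
        "entry": parts.path,        # directory entry (= file name)
    }


# Hypothetical PFN, for illustration only (not a real AliEn storage element).
example = split_pfn("root://storage.example.org:1094/data/01/12345/file.root")
print(example)
```

The File Catalogue itself would then only need to store the pair (LFN, PFN string) per entry; the split is done on access.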
Pathname Definitions
• for real data: /alice/data/‹Year›/‹AcceleratorPeriod›/‹RunNumber›/
• for simulated data: /alice/sim/‹Year›/‹ProductionType›/‹RunNumber›/
• subdirectories:
  • for raw data: raw/
  • for links to calibration and condition files: reco/‹PassX›/cond/
  • for ESD and corresponding tag files: reco/‹PassX›/ESD
  • for AOD files: reco/‹PassX›/AOD
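The path templates above can be filled in mechanically. The sketch below is illustrative only: the helper names and the run number 12345 are hypothetical, and the other values are just sample inputs for the placeholders.

```python
def run_directory(kind, year, period_or_production, run_number):
    """Build /alice/<data|sim>/<Year>/<AcceleratorPeriod|ProductionType>/<RunNumber>/."""
    if kind not in ("data", "sim"):
        raise ValueError("kind must be 'data' (real) or 'sim' (simulated)")
    return f"/alice/{kind}/{year}/{period_or_production}/{run_number}/"


def reco_subdirectory(run_dir, pass_name, content):
    """Append reco/<PassX>/<cond|ESD|AOD> to a run directory."""
    if content not in ("cond", "ESD", "AOD"):
        raise ValueError("content must be 'cond', 'ESD' or 'AOD'")
    return f"{run_dir}reco/{pass_name}/{content}"


# Sample placeholder values; the run number is made up for this example.
run_dir = run_directory("data", 2007, "LHC07a", 12345)
esd_dir = reco_subdirectory(run_dir, "Pass3", "ESD")
print(esd_dir)  # /alice/data/2007/LHC07a/12345/reco/Pass3/ESD
```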
Filename Definition
• for ESD files: ‹xxxx›.AliESD.root
• similar for all other files (except for condition files)
• a tool will be provided to generate ‘meaningful’ file names if somebody wants to make a local copy
• files to be registered in the file catalogue
  • raw data files
  • AliESD.root files
  • AliESDfriends.root files
  • ESDtags.root files
Run and File Level Metadata
• Metadata Catalogue
  • additional metadata tables can be attached to each ‘directory’/table of the MySQL database
• directory structure (grouping of ‘similar’ files) allows for
  • reduction of (additional) metadata for a given directory
  • enhancement of search performance
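The idea of attaching a metadata table to a directory table can be sketched as follows. This is an illustration under loud assumptions: it uses Python's built-in SQLite instead of MySQL, and every table name, column name, and value is invented for the example and does not reflect the actual AliEn schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One table per virtual directory (hypothetical name and columns).
cur.execute("CREATE TABLE dir_esd (lfn TEXT PRIMARY KEY, pfn TEXT)")
# An additional metadata table attached to that directory table.
cur.execute(
    "CREATE TABLE dir_esd_meta ("
    "lfn TEXT PRIMARY KEY REFERENCES dir_esd(lfn), file_sanity TEXT)"
)

# Made-up catalogue entries.
cur.executemany(
    "INSERT INTO dir_esd VALUES (?, ?)",
    [
        ("0001.AliESD.root", "root://se.example.org:1094/a/0001"),
        ("0002.AliESD.root", "root://se.example.org:1094/a/0002"),
    ],
)
cur.executemany(
    "INSERT INTO dir_esd_meta VALUES (?, ?)",
    [
        ("0001.AliESD.root", "available"),
        ("0002.AliESD.root", "not available"),
    ],
)

# A metadata search only touches the tables of the directory of interest.
cur.execute(
    "SELECT d.lfn FROM dir_esd AS d "
    "JOIN dir_esd_meta AS m ON m.lfn = d.lfn "
    "WHERE m.file_sanity = ? ORDER BY d.lfn",
    ("available",),
)
available = [row[0] for row in cur.fetchall()]
print(available)
```

Because similar files share a directory, a query restricted to that directory scans one small pair of tables rather than the whole catalogue.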
MetaData Overview I
Run
• run comment
• run type: physics, laser, pulser, pedestal, simulation
• run start time
• run stop time
• run stop reason: Normal, beam loss, detector failure, …
• magnetic field setting: FullField, ReversedField, ZeroField, HalfField
• collision system: PbPb, pp, pPb, …
• collision energy
• trigger setup name
• detectors present in this run
• # of events in this run
• run sanity
File
• file sanity flag (“online/offline”, “available/not available”)
Event
• event id
• centrality
• multiplicity
  • an array for different detectors?
• luminosity
• magnetic field value
• trigger condition
• detectors with data in this event
• mean pT
• max pT
• # of protons
• # of pions
• # of strange particles
• # of pos. charges
• # of neg. charges
• # of
• …
• event sanity
MetaData Overview II (Run, Event, File)
• for produced events
  • production tag
  • production software library version
• for simulation
  • generator
  • generator version
  • generator comments
  • generator parameters
  • detector geometry
  • detector configuration
  • simulation comments
All this is additional information to what is stored in the path name!
Event Level Metadata
• raw data is processed right after data taking
• some physical quantities will be extracted right away
  • multiplicity
  • vertex position
  • …
• each file containing physics events gets an additional file containing this event level metadata ‘attached’
  • ESDtags file
  • root file
  • stored in the same directory as the physics data file
  • content can be extended later (or each user can even create his/her own tag files)
Event Level Metadata Creation/Selection
[Diagram (P. Christakoglou). Box labels: INDEX BUILDER, RECONSTRUCTION, POST PROCESS, BITMAP INDICES, QUERY, PROOF/AliEn QUERY, ANALYSIS CODE, LIST OF EVENTS GROUPED BY GUID]
Working Example
• user wants to analyse
  • AliESDs
  • pp collisions
  • taken on 19 and 20 September 2007, before 10:20:33 h on the 20th
  • …
• run-level selection (Run):
  $ find -x pp /alice/data/2007/LHC07a/*/reco/Pass3/*AliESDs.root Run:collision_system="pp" and Run:stop<"2007-09-20 10:20:33" and Run:start>"2007-09-19" > pp.xml
• the events should meet the following additional specifications (Event)
  • properly reconstructed vertex
  • vertex z position within ±1 cm
  • …
• Loop over the list of events grouped by GUID/file for the file collection specified by ‘pp.xml’.
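The two-stage selection in this example (run-level conditions first, then event-level cuts, with the result grouped by GUID) can be mimicked on toy data. The sketch below is not the AliEn or tag-system API: all record layouts, field names, GUID strings, and values are invented, and only the selection conditions come from the slide.

```python
from collections import defaultdict
from datetime import datetime

# Made-up run-level metadata records.
runs = [
    {"run": 1, "collision_system": "pp",
     "start": datetime(2007, 9, 19, 8, 0, 0), "stop": datetime(2007, 9, 19, 12, 0, 0)},
    {"run": 2, "collision_system": "PbPb",
     "start": datetime(2007, 9, 19, 13, 0, 0), "stop": datetime(2007, 9, 19, 15, 0, 0)},
    {"run": 3, "collision_system": "pp",
     "start": datetime(2007, 9, 20, 9, 0, 0), "stop": datetime(2007, 9, 20, 11, 0, 0)},
]


def run_selected(run):
    """Run-level conditions from the find query on the slide."""
    return (run["collision_system"] == "pp"
            and run["stop"] < datetime(2007, 9, 20, 10, 20, 33)
            and run["start"] > datetime(2007, 9, 19))


selected_runs = [r["run"] for r in runs if run_selected(r)]

# Made-up event-level tags; each event belongs to a file identified by a GUID.
events = [
    {"run": 1, "guid": "guid-a", "event": 0, "vertex_ok": True, "vertex_z_cm": 0.4},
    {"run": 1, "guid": "guid-a", "event": 1, "vertex_ok": True, "vertex_z_cm": 2.3},
    {"run": 1, "guid": "guid-b", "event": 0, "vertex_ok": False, "vertex_z_cm": 0.1},
    {"run": 1, "guid": "guid-b", "event": 1, "vertex_ok": True, "vertex_z_cm": -0.9},
    {"run": 3, "guid": "guid-c", "event": 0, "vertex_ok": True, "vertex_z_cm": 0.0},
]

# Event-level cuts: properly reconstructed vertex, |z| within 1 cm.
grouped = defaultdict(list)
for ev in events:
    if ev["run"] in selected_runs and ev["vertex_ok"] and abs(ev["vertex_z_cm"]) < 1.0:
        grouped[ev["guid"]].append(ev["event"])

print(selected_runs)
print(dict(grouped))
```

Run 3 is rejected because it stops after 10:20:33 on 20 September, so its event never reaches the event-level cut; the analysis code would then loop over the remaining events file by file.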
Summary and Outlook
• System is fully set up and functional:
  • File Catalogue (with defined directory structure) exists and works
  • run and file level Metadata Catalogue (data fields) is defined and exists
  • event level metadata is defined, index builder is functional
  • all stages were tested and work properly
• But…
  • no large scale tests yet
  • many tables/catalogues not filled yet (at least not automatically)
  • not enough simulation data to effectively stress test the system
• Currently
  • large test production running
  • we are starting to add output files automatically to the file catalogue
  • overall system performance remains to be seen…