Efficient Event Catalog for Distributed User Analysis in Grid Environment

The Grid Collector: Using an Event Catalog to Speed up User Analysis in Distributed Environment
Wei-Ming Zhang, Kent State University
Kesheng Wu, Arie Shoshani, Lawrence Berkeley National Laboratory
Victor Perevoztchikov, Jerome Lauret, Brookhaven National Laboratory
CHEP 2004

Problem to Resolve: A View of a Typical Analysis Process
• Users want to analyze “some” (not all) events
• Events are stored in millions of files
• Files are distributed on many storage systems
• To perform an analysis, a user needs to
  • Prepare an analysis
    • Write the analysis code
    • Specify the events of interest
  • Run an analysis
    • Locate the files containing the events of interest
    • Prepare disk space for the files
    • Transfer the files to the disks
    • Recover from any errors
    • Read the events of interest from files
    • Remove the files

Run an Analysis
• Locate the files containing the events of interest
  • Need a catalog over events
• Prepare disk space for the files
  • Need a component to manage space automatically
• Transfer the files to the disks
  • Need to automate file streaming into available space
  • Need automated transfers from mass storage
• Recover from any errors
  • Need automatic recovery from transient failures
• Read the events of interest from files
  • Need automatic iteration over events
• Remove the files
  • Need automatic garbage collection
Alternative: static disk population, not optimal for large datasets or constrained resources

Design Goals of Grid Collector
Primary goal: make analysts more productive by
• Allowing them to specify events of interest using meaningful physical quantities
  • numberOfPrimaryTracks > 1000 AND SumOfPt > 20
• Reading only events of interest
• Automating the management of distributed files and disks
Secondary goals
• Working with the existing ROOT-based analysis framework
• Using a minimal amount of storage
• Benefiting a majority of the users

Components of the Grid Collector
Legend: red = new components, purple = existing components
• Locate the files containing the events of interest
  • Event Catalog, file & replica catalogs
• Prepare disk space and transfer
  • Prepare disk space for the files
    • Disk Resource Manager (DRM)
  • Transfer the files to the disks
    • Hierarchical Resource Manager (HRM) to access HPSS
    • On-demand transfers from HRM to DRM
• Recover from any errors
  • HRM recovers from HPSS failures
  • DRM recovers from network transfer failures
  (Track 4, 344 for SRM/HRM/DRM usage)
• Read the events of interest from files
  • Event Iterator with fast forward capability
• Remove the files
  • DRM performs garbage collection using pinning and lifetime
Consistent with other SRM-based strategies and tools
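The garbage collection "using pinning and lifetime" mentioned above can be pictured with a toy disk cache. This is only a sketch under stated assumptions: the class `DiskCache`, its methods, and the eviction order are hypothetical illustrations and are not the DRM interface. A file may be removed once it is unpinned or its lifetime has expired.

```python
class DiskCache:
    """Toy disk cache with pinning and lifetimes (hypothetical, not the DRM API)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.files = {}  # name -> {"size", "pins", "expires"}

    def used(self):
        return sum(f["size"] for f in self.files.values())

    def add(self, name, size, now, lifetime):
        """Stage a file with one pin, collecting garbage first if space is short."""
        if self.used() + size > self.capacity:
            self.collect(now, needed=size)
        if self.used() + size > self.capacity:
            raise RuntimeError("no releasable space for " + name)
        self.files[name] = {"size": size, "pins": 1, "expires": now + lifetime}

    def release(self, name):
        """Drop one pin; the file becomes a candidate for removal at zero pins."""
        self.files[name]["pins"] -= 1

    def collect(self, now, needed=0):
        """Remove unpinned or expired files, oldest expiry first, until space suffices."""
        for name in sorted(self.files, key=lambda n: self.files[n]["expires"]):
            if self.capacity - self.used() >= needed:
                break
            f = self.files[name]
            if f["pins"] <= 0 or f["expires"] <= now:
                del self.files[name]


cache = DiskCache(capacity_bytes=100)
cache.add("a.root", 60, now=0, lifetime=10)
cache.release("a.root")                      # analysis finished with a.root
cache.add("b.root", 60, now=1, lifetime=10)  # a.root is collected to make room
remaining = sorted(cache.files)
```

In this sketch a pinned, unexpired file is never removed, so a request that cannot be satisfied fails instead of evicting data that an analysis is still reading.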

Grid Collector: Architecture
[Architecture diagram: servers and clients, with the Grid Collector and its Administrator]
• Catalog-building operations: Fetch tag file, Load subset, Rollback, Commit
• Index Builder. In: STAR tag file. Out: bitmap index
• Event Catalog. In: conditions. Out: logical files, event IDs
• Replica Catalog, File Locator. In: logical name. Out: physical location
• File Scheduler. In: physical file
• Storage: HRM 1, HRM 2, DRM, NFS, local disk
• Client side: analysis code, new query, Event Iterator

Event Catalog
• The Event Catalog is built from the information in the “tag files”
  • Tags are arbitrary (user defined) and arranged as arrays
  • e.g., Run #, Event #, production time, sum of Pt…
• The Event Catalog (EC) also contains persistent logical file names for each event
  • In STAR, the EC persistent logical file names are composed of the tag file name and the production tag
• As a tag file is registered with the File Catalog, its content can also be placed in the Event Catalog
• Main operations to build the Event Catalog include: fetch tag files, load new events into the EC, roll back, and commit
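The load, roll back, and commit operations listed above suggest a transactional build. The following is a minimal sketch of that idea, assuming tags arrive as equal-length arrays keyed by attribute name. The class `EventCatalogBuilder` and its field names are hypothetical and do not reflect the actual Grid Collector code.

```python
class EventCatalogBuilder:
    """Toy transactional loader for an event catalog (hypothetical sketch)."""

    def __init__(self):
        self.committed = []  # rows visible to queries
        self.pending = []    # rows loaded but not yet committed

    def load(self, logical_file, tags):
        """Stage events from one tag file; tags maps attribute name -> array of values."""
        lengths = {len(values) for values in tags.values()}
        if len(lengths) != 1:
            raise ValueError("tag arrays must all have the same length")
        for i in range(lengths.pop()):
            row = {name: values[i] for name, values in tags.items()}
            row["logical_file"] = logical_file  # persistent logical file name
            self.pending.append(row)

    def rollback(self):
        """Discard everything loaded since the last commit."""
        self.pending.clear()

    def commit(self):
        """Make the staged events visible to queries."""
        self.committed.extend(self.pending)
        self.pending.clear()


builder = EventCatalogBuilder()
builder.load("run1.tags", {"run": [1, 1], "event": [10, 11], "SumOfPt": [12.5, 3.0]})
builder.rollback()  # e.g. the tag file turned out to be bad
builder.load("run2.tags", {"run": [2], "event": [7], "SumOfPt": [25.0]})
builder.commit()
```

Staging rows separately from committed rows means a failed or unwanted load never becomes visible to queries.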

Performance of Event Catalog
• The Event Catalog uses compressed bitmap indices
  • The most commonly used index is the B-tree
  • The most efficient one is often the projection index
• The following table reports the size and the average query processing time
  • 1-attribute, 2-attribute, and 5-attribute refer to the number of attributes in a query
• Compressed bitmap indices are about half the size of B-trees, and are 10 times faster
• Compressed bitmap indices are larger than projection indices, but are 3 times faster
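To illustrate why bitmap indices suit multi-attribute range queries, here is a minimal sketch of an uncompressed, equality-encoded bitmap index that uses Python integers as bit vectors. It is an assumption-laden toy: the Event Catalog uses compressed bitmaps, which this sketch does not attempt, and the function names and sample values are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when values[i] == value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def rows_greater_than(index, threshold):
    """Answer 'attribute > threshold' by OR-ing the bitmaps of qualifying values."""
    result = 0
    for value, bitmap in index.items():
        if value > threshold:
            result |= bitmap
    return result


def bitmap_to_rows(bitmap):
    """List the positions of the set bits, i.e. the selected event rows."""
    rows, row = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Made-up tag values for five events.
tracks = [1200, 300, 1500, 900, 1001]
sum_pt = [25, 30, 10, 40, 21]

# numberOfPrimaryTracks > 1000 AND SumOfPt > 20 becomes one bitwise AND.
hits = rows_greater_than(build_bitmap_index(tracks), 1000) \
    & rows_greater_than(build_bitmap_index(sum_pt), 20)
selected = bitmap_to_rows(hits)
```

Each additional attribute in a query costs one more bitwise operation, which is the property that keeps multi-attribute queries cheap.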

Event Catalog is Fast
[Figure: log-log plot of query processing time for queries of different sizes]
The compressed bitmap index is at least 10X faster than the B-tree and 3X faster than the projection index

Grid Collector: Works with Remote Clients
• Main servers
  • Grid Collector for coordination and query interpretation
  • File Catalogs for locating multiple copies of files
  • HRM for managing storage sites including HPSS
• Client side requires a DRM for managing local disk storage
• An Event Iterator can access local files with or without a DRM
• Multiple Event Iterators can work on the same set of events
[Diagram: servers and clients]

GC Requires Minimal Disk Space
• The Event Catalog has the same information as the tag files plus some extra information
• Since the tag files are much smaller than the other files, the size of the Event Catalog is also relatively small
• Time to build the catalog is much less than the time to generate the tag files
• For 13 million events in a 62 GeV production (STAR 2004),
  • Event Catalog size: 27 GB (tags: 6.0 GB, MuDST: 4.1 TB, event: 8.6 TB, raw: 14.6 TB)
  • Production time: 3.5 months, 300+ CPUs
  • Time to build the catalog: 5 days on one machine
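The figures above can be put on a common scale with a little arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB), 30-day months, and exactly 300 CPUs, none of which the slide states. The variable names are illustrative only.

```python
GB = 1.0
TB = 1000.0 * GB  # assumption: decimal units

catalog_size = 27 * GB
data_sizes = {
    "tags": 6.0 * GB,
    "MuDST": 4.1 * TB,
    "event": 8.6 * TB,
    "raw": 14.6 * TB,
}

# Catalog size as a fraction of each data product.
fractions = {name: catalog_size / size for name, size in data_sizes.items()}

# Build cost relative to production cost, in CPU-days.
production_cpu_days = 3.5 * 30 * 300  # assumption: 30-day months, 300 CPUs
build_cpu_days = 5 * 1
build_fraction = build_cpu_days / production_cpu_days

for name, fraction in fractions.items():
    print(f"catalog / {name}: {fraction:.4f}")
print(f"build / production CPU-days: {build_fraction:.6f}")
```

Under these assumptions the catalog is well under 1% of the MuDST, event, and raw data volumes, and building it costs a small fraction of the production CPU time.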

Grid Collector Speeds up Analyses
• Test machine: 2.8 GHz Xeon, 27 MB/s read speed
• Without Grid Collector, an analysis typically reads all events of a run
• Speedup = time to read all events in a run / time to read selected events with Grid Collector
• Using Grid Collector is preferred: speedup ≥ 1
• When searching for rare events, say, selecting one event out of 1000, using GC is 20 to 50 times faster
• Using GC to read 1/2 of the events, speedup > 1.5; to read 1/10 of the events, speedup > 2
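The speedup definition above is a simple ratio, sketched below. The second function is a hypothetical cost model added purely for illustration: it assumes a fixed overhead per run plus a cost proportional to the fraction of events read. Neither the model nor the overhead value comes from the measurements reported here.

```python
def speedup(time_all_events, time_selected_events):
    """Speedup = time to read all events / time to read only the selected events."""
    if time_selected_events <= 0:
        raise ValueError("time_selected_events must be positive")
    return time_all_events / time_selected_events


def modeled_speedup(selected_fraction, overhead_fraction):
    """Hypothetical model: cost = fixed overhead + share of the remaining read time.

    Times are relative to reading the whole run (= 1.0), so selecting every
    event gives a speedup of exactly 1.
    """
    cost = overhead_fraction + selected_fraction * (1.0 - overhead_fraction)
    return speedup(1.0, cost)


# Illustrative overhead of 2% of the full read time (an assumed value).
for fraction in (1.0, 0.5, 0.1, 0.001):
    print(fraction, round(modeled_speedup(fraction, 0.02), 2))
```

A model of this shape shows why the speedup stays well below 1/fraction for very rare events: once few events are read, the fixed overhead dominates.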

Success Stories
• Searching for anti-3He (Lee Barnby, Birmingham)
  • Previous studies identified collision events that possibly contain anti-3He and need further analysis
• Searching for strangelets (Aihong Tang, BNL)
  • Previous studies identified events that behave close to strangelets and need further investigation
• Without Grid Collector, one has to retrieve every file from mass storage systems and scan them for the wanted events, which may take weeks or months
• With Grid Collector, both jobs completed within a day

Main Benefits of Using Grid Collector
• If you gather statistics on lots of events,
  • Grid Collector allows you to work with files not already on disk
• If you search for rare events, Grid Collector allows you to
  • Specify the events with ease
  • Access only relevant files
  • Read only selected events
• If you want to try some analysis ideas outside of the main computer centers,
  • Grid Collector allows you to select the wanted events easily, and manages files and space for you

How To Use Grid Collector
• In STAR, analyses use the abstract interface StIOMaker
  • StIOMaker can now handle all files including MuDST
  • StIOMaker uses StFile (interface) -> StFileI (implementation)
• Replace StFileI with StGridCollector
  • StIOMaker requires an StFile object
  • One currently uses “new StFile(…)” to create a StFileI object (default mode)
  • Grid Collector provides a new way, “StGridCollector::Create(SELECT geant, event WHERE …)”
• Iterate through events as usual

How To Select Events
• Basic syntax
  • SELECT [MuDst|event|…] WHERE SumOfPt > 10 AND chargedMultiplicity > 300 AND…
• The SELECT clause identifies the type of files to analyze
• The WHERE clause defines the events of interest
  • The WHERE clause consists of range conditions joined with the logical operators AND, OR, NOT
  • All tags and a few MetaData Catalog keywords can be used in the WHERE clause
  • Variables with multiple values can be addressed with an index, e.g., scaAnalysisMatrix[7]
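As a rough illustration of how a WHERE clause turns into a list of files and event IDs, the sketch below evaluates range conditions joined by AND over a small in-memory catalog. It is a simplification under stated assumptions: it handles only AND (not OR, NOT, or indexed variables), scans rows instead of using an index, and the names `parse_where`, `select_events`, and the catalog layout are hypothetical.

```python
import operator
import re

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}
CONDITION = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<)\s*([-+]?\d+(?:\.\d+)?)\s*$")


def parse_where(where):
    """Parse 'a > 1 AND b <= 2' into [(name, comparison, value), ...]; AND only."""
    conditions = []
    for part in re.split(r"\s+AND\s+", where):
        match = CONDITION.match(part)
        if match is None:
            raise ValueError("unsupported condition: " + part)
        name, op, value = match.groups()
        conditions.append((name, OPS[op], float(value)))
    return conditions


def select_events(catalog, where):
    """Return {logical_file: [event_id, ...]} for rows satisfying every condition."""
    conditions = parse_where(where)
    result = {}
    for row in catalog:
        if all(compare(row[name], value) for name, compare, value in conditions):
            result.setdefault(row["file"], []).append(row["event"])
    return result


# Made-up catalog rows: one per event, with its logical file and tag values.
catalog = [
    {"file": "f1", "event": 1, "SumOfPt": 12.0, "chargedMultiplicity": 350},
    {"file": "f1", "event": 2, "SumOfPt": 8.0, "chargedMultiplicity": 400},
    {"file": "f2", "event": 3, "SumOfPt": 15.0, "chargedMultiplicity": 100},
    {"file": "f3", "event": 4, "SumOfPt": 11.0, "chargedMultiplicity": 301},
]
selected = select_events(catalog, "SumOfPt > 10 AND chargedMultiplicity > 300")
```

Grouping the matching event IDs by file shows which files need to be transferred at all: here `f2` contains no selected events and would never be fetched.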

Related Work
• The initial concept of Grid Collector was developed in the Storage Access Coordination System (STACS)
  • STACS was a monolithic system
• Other projects with similar objectives or similar vision
  • PROOF: parallel ROOT, designed for tight clusters, requires files on disk
  • JAS3: uses a dataset catalog
  • ARDA: jobs are specified in terms of files
  • Most of these either read all events in a file or require manual file management

Summary
• Grid Collector works with the existing ROOT-based analysis framework to speed up analysis jobs
  • It is efficient and requires minimal additional storage
  • It automatically retrieves files from remote mass storage
  • Analysis tasks read only the selected events
• Software status
  • Currently in use by STAR users (testers)
  • Capable of indexing all new events as they are produced
• Contact information
  • John Wu John.Wu@nersc.gov
  • Jerome Lauret lauret@bnl.gov
  • Wei-Ming Zhang zhang@hpaq.kent.edu
