Grid Data Management

Presentation Transcript


  1. Grid Data Management
     • Introduction
     • Physics Analysis
     • Data Hierarchy
     • GRID Services
     • Virtual Data Scenario
     • GRID Data Management
     • Service Graph
     • Development Tools
     • Unified Modelling Language
     • Compiler Efficiency
     • Database Access Benchmark
     • System Monitoring Prototype
     [Diagram: From Files To Objects. Classes shown: Pevent Obj, Pcalo Region, PMDT_Detector, Ptruth Vertex, Ptruth Track, PMDT_Digit, Pcalo Digit, PEventObj Vector, PsiDetector, PEvent, PSiDigit]
     Tony Doyle - University of Glasgow

  2. Physics Analysis
     [Diagram: data flow between tiers, labelled "INCREASING DATA FLOW"]
     • Tier 0,1: Collaboration wide
     • Tier 2: Analysis Groups
     • Tier 3, 4: Physicists
     Other labels in the diagram: Raw Data; ESD: Data or Monte Carlo; Event Tags; Event Selection; Calibration Data; Analysis Object Data (AOD); Analysis, Skims; Physics Objects; Physics Analysis

  3. Data Hierarchy: “RAW, ESD, AOD, TAG”
     • RAW: Recorded by DAQ. Triggered events. Detector digitisation. ~2 MB/event
     • ESD: Reconstructed information. Pseudo-physical information: clusters, track candidates (electrons, muons), etc. ~100 kB/event
     • AOD: Selected information. Physical information: transverse momentum, association of particles, jets, (best) id of particles, physical info for relevant “objects”. ~10 kB/event
     • TAG: Analysis information. Relevant information for fast event selection. ~1 kB/event
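The per-event sizes quoted on this slide imply a reduction of roughly a factor of 20 from RAW to ESD and a factor of 10 at each later step. A minimal sketch of that arithmetic follows. It assumes decimal units (1 MB = 1000 kB), and the sample of 10^9 events is purely hypothetical, chosen for illustration and not taken from the slides.

```python
# Approximate per-event sizes from the slide, in kB (assuming 1 MB = 1000 kB).
SIZE_KB = {"RAW": 2000.0, "ESD": 100.0, "AOD": 10.0, "TAG": 1.0}


def reduction_factors(sizes):
    """Return the size ratio between each tier and the next one down."""
    tiers = list(sizes)
    return {f"{a}->{b}": sizes[a] / sizes[b] for a, b in zip(tiers, tiers[1:])}


def total_terabytes(sizes, n_events):
    """Total volume per tier in TB (1 TB = 1e9 kB) for n_events events."""
    return {tier: kb * n_events / 1e9 for tier, kb in sizes.items()}


if __name__ == "__main__":
    N_EVENTS = 10**9  # hypothetical sample size, not a figure from the slides
    print(reduction_factors(SIZE_KB))
    print(total_terabytes(SIZE_KB, N_EVENTS))
```

Under these assumptions the same sample that needs 2000 TB as RAW data fits in 1 TB as TAG data, which is why fast event selection starts from the TAG.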

  4. GRID Services
     • Grid Services
       • Resource Discovery
       • Scheduling
       • Security
       • Monitoring
       • Data Access
       • Policy
     • Athena/Gaudi Services
       • Application manager
       • “Job Options” service
       • Event persistency service
       • Detector persistency
       • Histogram service
       • User interfaces
       • Visualization
       • Database
       • Event model
       • Object federations
     Extensible interfaces and protocols being specified and developed:
     Tools: 1. UML 2. Java
     Protocols: 1. XML 2. MySQL 3. LDAP (bracketed on the slide with the label “DataGRID Toolkit”)

  5. GRID Data Management: Virtual Data Scenario
     • Example analysis scenario:
       • Physicist issues a query from Athena for a Monte Carlo dataset
       • Issues:
         • How expressive is this query?
         • What is the nature of the query: declarative
         • Creating new queries and language
       • Algorithms are already available in local shared libraries
       • An Athena service consults an ATLAS Virtual Data Catalog
     • Consider possibilities:
       • TAG file exists on local machine (e.g. Glasgow)
         • Analyze it
       • ESD file exists in a remote store (e.g. Edinburgh)
         • Access relevant event files, then analyze that
       • RAW file no longer exists (e.g. RAL)
         • Regenerate, re-reconstruct, re-analyze
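The three possibilities on this slide amount to a fallback chain: use a local TAG file if there is one, otherwise fetch from a remote ESD store, otherwise regenerate from scratch. The sketch below illustrates that chain only. The `Entry` record and the `plan` function are hypothetical names invented for this example; they are not the Athena or ATLAS Virtual Data Catalog interfaces, which the slides do not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    """One hypothetical catalogue record for a dataset at a given tier."""
    tier: str            # "TAG", "ESD" or "RAW"
    site: Optional[str]  # site holding the file, or None if it no longer exists


def plan(entries: List[Entry], local_site: str) -> List[str]:
    """Return the steps to run, following the three cases on the slide."""
    by_tier: Dict[str, Entry] = {e.tier: e for e in entries}

    # Case 1: a TAG file exists on the local machine.
    tag = by_tier.get("TAG")
    if tag is not None and tag.site == local_site:
        return ["analyze TAG locally"]

    # Case 2: an ESD file exists in some (possibly remote) store.
    esd = by_tier.get("ESD")
    if esd is not None and esd.site is not None:
        return [f"fetch relevant ESD events from {esd.site}", "analyze"]

    # Case 3: nothing usable is left, so the data must be recreated.
    return ["regenerate", "re-reconstruct", "re-analyze"]


if __name__ == "__main__":
    catalogue = [Entry("ESD", "Edinburgh"), Entry("RAW", None)]
    print(plan(catalogue, local_site="Glasgow"))
```

Ordering the checks from cheapest to most expensive is the design choice here: the physicist's declarative query stays the same, and only the plan changes with what the catalogue reports.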

  6. GRID Data Management
     • Goal: develop middle-ware infrastructure to manage petabyte-scale data
     [Diagram: software structure, with UK label]
     • High Level Services: Query Optimisation & Access Pattern Manag.; Replica Manager
     • Medium Level Services: Data Mover; Data Accessor; Data Locator
     • Core Services: Storage Manager; Meta Data Manager
     • Storage back ends: Castor; HPSS; Local Filesystem
     • Secure Region
     Identify Key Areas Within Software Structure
     Service levels reasonably well defined

  7. Identifying Key Areas
     • 5 areas for development
       • Data Accessor - hides specific storage system requirements. Mass Storage Management group.
       • Replication - improves access by wide-area caching. Globus toolkit offers sockets and a communication library, Nexus.
       • Meta Data Management - data catalogues, monitoring information (e.g. access pattern), grid configuration information, policies. MySQL over Lightweight Directory Access Protocol (LDAP) being investigated.
       • Security - ensuring consistent levels of security for data and meta data.
       • Query optimisation - “cost” minimisation based on response time and throughput. Monitoring Services group.
     Identifiable UK Contributions (slide labels: RAL, RAL)
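The slide describes query optimisation as “cost” minimisation based on response time and throughput, without giving a formula. One simple way to read that is sketched below: the cost of a replica is a fixed response time plus the transfer time at its throughput, and the optimiser picks the cheapest. The linear cost model, the `Replica` record and the site names are all assumptions made for this illustration, not the WP2 design.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Replica:
    site: str               # hypothetical site name
    latency_s: float        # assumed fixed response time before data flows
    throughput_mb_s: float  # assumed sustained transfer rate


def cost(replica: Replica, size_mb: float) -> float:
    """Estimated seconds to obtain size_mb from this replica (assumed model)."""
    return replica.latency_s + size_mb / replica.throughput_mb_s


def cheapest(replicas: Iterable[Replica], size_mb: float) -> Replica:
    """Pick the replica with the lowest estimated cost for this request."""
    return min(replicas, key=lambda r: cost(r, size_mb))


if __name__ == "__main__":
    replicas = [
        Replica("site-a", latency_s=0.1, throughput_mb_s=1.0),
        Replica("site-b", latency_s=5.0, throughput_mb_s=10.0),
    ]
    for size in (1.0, 1000.0):
        print(size, "MB ->", cheapest(replicas, size).site)
```

With these made-up numbers the low-latency site wins for a 1 MB request and the high-throughput site wins for a 1000 MB request, which is the trade-off the two cost terms are meant to capture.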

  8. Identifying Key Areas
     • 4 tasks defined in current UK WP2
       • Service Discovery - locate grid services (Wolfgang Hoschek, Gavin McCance +...)
       • SQL Database Service - store, query and retrieve metadata (Wolfgang Hoschek, Gavin McCance +...)
       • Query Optimisation - “cost” model (Kurt Stockinger +…)
       • Data Mining - semi-automatic discovery of event patterns, associations and anomalies: Grid metadata and HEP applications
     Slide labels: UK; UK + CERN = UK++

  9. Service Graph
     [Diagram: graph of nodes sds.padova-infn.it, sds.anl.gov, sds.trieste-infn.it, sds.infn.it, sds.ral.uk, sds.cern.ch, sds.bologna-infn.it]
     • Allowed?
     • Hierarchical Model
     • All nodes “Grid Aware”
     • Optimisation? - combine all info on nodes from e.g. ScotGRID locally and advertise via Globus

  10. Unified Modelling Language
      • Standard method to define the architecture = UML
      • Standard tool = TogetherSoft? Free for academic use. Runs under Linux.
      • DB Driver for MySQL under Linux?
        “I tried to generate an import/export module for MySQL under linux by copying the db2 .config file and replacing the various column types by the ones that are available in MySQL. This works apart from the fact that the primary key generation fails and a schema is generated (which MySQL doesn't support). The Access97 type of primary key generation is fine for MySQL. I have seen that Access uses a specialized DB import/export class. How can I generate one for MySQL?”
      Determine correct tools by testing.

  11. Compiler Efficiency
      Numerically intensive simulations:
      • Minimal input and output data
      • ATLAS Monte Carlo (gg → H → bb): 228 sec per 3.5 Mb event on 800 MHz Linux box
      Compiler Tests: LINPACK
          Compiler          Speed (MFlops)
          Fortran (g77)     27
          C (gcc)           43
          Java (jdk)        41
      Industry Standard Compilers + OO Methods
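For readers unfamiliar with how a LINPACK-style MFlops figure is obtained: the benchmark times the solution of a dense n × n linear system and divides the conventional operation count, 2n³/3 + 2n², by the elapsed time. The sketch below follows that recipe in pure Python. It is an illustration of the method only; being interpreted, its result is not comparable with the compiled g77, gcc and jdk figures above, and the matrix size and random seed are arbitrary choices.

```python
import random
import time


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (in place)."""
    n = len(b)
    for k in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_mflops(n=100, seed=0):
    """Time one solve of a random n x n system and convert to MFlops."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    solve(a, b)
    elapsed = time.perf_counter() - start
    flops = 2.0 * n**3 / 3.0 + 2.0 * n**2  # conventional LINPACK operation count
    return flops / elapsed / 1e6


if __name__ == "__main__":
    print(f"{linpack_mflops():.2f} MFlops (pure Python, n=100)")
```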

  12. System Monitoring Prototype
      http://ppewww.ph.gla.ac.uk/~skilli/grid1.html
      Tools:
      1. Linux Kernel Info = /proc/stat
      2. Enquire = Java client-server
      3. Histograms = Java Analysis Studio
      4. TCP/IP = Local WAN
      • Instantaneous CPU Usage
      • Scalable Architecture
      • Individual Node Info.

  13. Industrial Partnership
      [Diagram: service monitor sending pings over LAN and WAN]
      • Adoption of OPEN Industry Standards + OO Methods
      • Monitoring Tools Exist. Standard?
      • Research Council, Industry

  14. System Monitoring Prototype
      Instantaneous CPU, disk, memory. Input from /proc/stat (cpu columns: user nice system idle):
          cpu 46960715938237646044637
          disk 51306 0 0 0
          disk_rio 11002 0 0 0
          disk_wio 40304 0 0 0
          disk_rblk 87872 0 0 0
          disk_wblk 322378 0 0 0
          page 29693 49417
          swap 33 1447
          intr 18916942 7339601 27941 0 2 2 0 3 0 1 0 9331361 0 869060 1 619454 729516 0 0
          ctxt 62664003
          btime 984922120
          processes 107015
      • Individual Node Info. is input to single Grid node
      • Combined Info into e.g. distributed MySQL database
      • “Why start here?” Need well-understood simple system to start tests and calibrate commercially available solutions.
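The “instantaneous CPU usage” shown by the prototype can be derived from two readings of the aggregate `cpu` line in /proc/stat: the counters are cumulative, so usage over an interval is the non-idle share of the difference. The prototype itself used a Java client-server; the Python sketch below only illustrates the calculation, and the sample values in it are made up for the example rather than taken from the slide.

```python
def parse_cpu_line(stat_text):
    """Return (user, nice, system, idle) counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Only the first four counters are used; newer kernels append more.
            user, nice, system, idle = (int(v) for v in fields[1:5])
            return user, nice, system, idle
    raise ValueError("no aggregate 'cpu' line found")


def cpu_usage(before, after):
    """Fraction of time the CPU was busy between two (user, nice, system, idle) samples."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        return 0.0  # no ticks elapsed between the samples
    idle = deltas[3]
    return (total - idle) / total


def read_proc_stat(path="/proc/stat"):
    """Read the kernel statistics file (Linux only)."""
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    # Made-up samples standing in for two reads of /proc/stat a moment apart.
    first = parse_cpu_line("cpu 100 0 50 850\nctxt 1000\n")
    second = parse_cpu_line("cpu 160 0 90 950\nctxt 2000\n")
    print(f"busy fraction: {cpu_usage(first, second):.2f}")
```

On a Linux node, calling `read_proc_stat()` twice with a short sleep in between and passing both results through `parse_cpu_line` gives the per-node figure that would then be forwarded to the Grid node.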

  15. Database Access Benchmark
      Many applications require database functionality
      • e.g. MySQL database daemon
      • Basic 'crash-me' and associated tests
      • Access times for basic insert, modify, delete, update database operations
      • e.g. (on 256 Mbyte, 800 MHz Red Hat 6.2 Linux box):
          350k data insert operations    149 seconds
          10k query operations            97 seconds
      Currently favoured HEP DataBase application, e.g. BaBar, ZEUS software
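Converted to rates, the timings on this slide are about 2,350 inserts per second and about 103 queries per second. The sketch below does that conversion and shows the general shape of such a timing test. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in so the example is self-contained; it is not the MySQL 'crash-me' suite, and the table and column names are invented for the illustration.

```python
import sqlite3
import time


def ops_per_second(n_ops, seconds):
    """Convert an operation count and elapsed time into a rate."""
    return n_ops / seconds


def time_inserts(n_rows):
    """Time n_rows single-row inserts into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, tag TEXT)")
    start = time.perf_counter()
    for i in range(n_rows):
        conn.execute("INSERT INTO events (id, tag) VALUES (?, ?)", (i, f"evt{i}"))
    conn.commit()
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.close()
    return count, elapsed


if __name__ == "__main__":
    # Figures quoted on the slide.
    print(f"inserts: {ops_per_second(350_000, 149):.0f} per second")
    print(f"queries: {ops_per_second(10_000, 97):.0f} per second")
    # Stand-in measurement on the local machine.
    count, elapsed = time_inserts(10_000)
    print(f"sqlite3 stand-in: {ops_per_second(count, elapsed):.0f} inserts per second")
```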

  16. WP2 - Open Issues
      • Many… Early Days
      • Working Standards?
      • Scope Of UK Contribution
        • Service Discovery
        • SQL Database Service
        • Query Optimisation
        • Data Mining
      • Development Tools?
      • UML: TogetherSoft
      • Database: MySQL
      • GDMP
      • System Monitoring Standard
      • Grid-Enabled Files Objects..
      • Input/Contributions welcome….
      Teamwork
      [Diagram: From Files To Objects. Classes shown: Grid Disk, Grid Memory, GridEvent Obj, Grid CPU, Grid Network, Memory Digit, CPU Digit, Disk Digit, Network Digit, GridEvent Obj Vector, GridEvent]
