
Presentation Transcript


  1. The GIOD Project (Globally Interconnected Object Databases) for High Energy Physics
  A Joint Project between Caltech (HEP and CACR), CERN and Hewlett Packard
  http://pcbunn.cithep.caltech.edu/
  I2-DSI Applications Workshop, Julian Bunn/Caltech & CERN, March 1999

  2. CERN’s Large Hadron Collider - 2005 to >2025
  • Biggest machine yet built: a proton-proton collider
  • Four experiments: ALICE, ATLAS, CMS, LHCb

  3. WorldWide Collaboration
  • CMS: >1700 physicists, 140 institutes, 30 countries
  • 100 Mbytes/sec from online systems
  • ~1 Pbyte/year raw data
  • ~1 Pbyte/year reconstructed data
  • Data accessible across the globe

  4. Data Distribution Model
  [Diagram: data flow from the online system down to physicist workstations]
  • There is a "bunch crossing" every 25 nsecs; there are 100 "triggers" per second, and each triggered event is ~1 MByte in size (a back-of-envelope check follows below)
  • Online System → Offline Processor Farm (~10 TIPS): ~100 MBytes/sec (down from ~PBytes/sec at the detector)
  • Offline Processor Farm → CERN Computer Centre (HPSS): ~100 MBytes/sec
  • CERN Computer Centre → Regional Centres in the USA, France, Italy and Germany (~1 TIPS each): ~622 Mbits/sec or Air Freight (deprecated)
  • Regional Centres → Institutes (~0.1 TIPS): ~622 Mbits/sec
  • Physicists work on analysis "channels"; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server (physics data cache)
  • Institute server → physicist workstations (Pentium II 300 MHz): ~1 MBytes/sec
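To make these rates concrete, here is a minimal arithmetic sketch in Java. The ~10^7 seconds of running per year is an assumption (a common HEP rule of thumb, not stated on the slide), and the class name is illustrative:

public class LhcDataRates {
    public static void main(String[] args) {
        double triggerRateHz = 100.0;      // 100 "triggers" per second (from the slide)
        double eventSizeMB = 1.0;          // each triggered event is ~1 MByte
        double secondsPerYear = 1.0e7;     // ASSUMPTION: ~10^7 s of beam time per year

        double mbPerSec = triggerRateHz * eventSizeMB;          // ~100 MBytes/sec
        double pbPerYear = mbPerSec * secondsPerYear / 1.0e9;   // MBytes -> PBytes

        System.out.printf("~%.0f MBytes/sec -> ~%.0f PByte/year of raw data%n",
                          mbPerSec, pbPerYear);
    }
}

This reproduces the ~100 MBytes/sec offline stream and the ~1 Pbyte/year raw data figure quoted on slide 3.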

  5. Large Hadron Collider - Computing Models
  • Requirement: Computing hardware, network and software systems to support timely and competitive analysis by a worldwide collaboration
  • Solution: Hierarchical networked ensemble of heterogeneous, data-serving and processing computing systems
  • Key technologies:
    • Object-Oriented software
    • Object Database Management Systems (ODBMS)
    • Sophisticated middleware for query brokering (Agents)
    • Hierarchical Storage Management systems
    • Networked collaborative environments

  6. The GIOD Project Goals
  • Build a small-scale prototype Regional Centre using:
    • Object-Oriented software, tools and an ODBMS
    • Large-scale data storage equipment and software
    • High-bandwidth LANs and WANs
  • Measure, evaluate and tune the components of the Centre for LHC Physics
  • Confirm the viability of the proposed LHC Computing Models
  • Use measurements of the prototype as input to simulations of realistic future LHC Computing Models

  7. Why ODBMS?
  • The OO programming paradigm:
    • is the modern industry direction
    • is supported by the C++ and Java high-level languages
    • offers an excellent choice of both free and commercial class libraries
    • suits our problem space well: a rich hierarchy of complex data types (raw data, tracks, energy clusters, particles, missing energy, time-dependent calibration constants), sketched below
  • Allows us to take full advantage of industry developments in software technology
  • Need to make some objects "persistent": raw data, and newly computed, useful objects
  • Need an object store that supports an evolving data model and scales to many PetaBytes (10^15 bytes)
    • An (O)RDBMS won't work: one year's data would need a virtual table with 10^9 rows and many columns
  • Require persistent-object location transparency and replication across heterogeneous systems: multiple platforms, arrays of software versions, many applications, widely distributed in the collaboration
  • Need to banish huge "logbooks" of correspondences between event numbers, run numbers, event types, tag information, file names, tape numbers, site names etc.
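As a minimal illustration of such a type hierarchy, here is a plain-Java sketch. The class and field names are illustrative assumptions, not the actual GIOD schema; in the real system these classes would derive from the ODBMS's persistence-capable base class so that whole object graphs can be stored and navigated directly:

import java.util.ArrayList;
import java.util.List;

// Illustrative event model; all names are hypothetical, not the GIOD schema.
class RawData       { byte[] payload; }          // raw detector data
class Track         { double pt, eta, phi; }     // reconstructed track
class EnergyCluster { double energy, eta, phi; } // calorimeter cluster

class Event {
    long runNumber, eventNumber;
    RawData raw;
    List<Track> tracks = new ArrayList<Track>();
    List<EnergyCluster> clusters = new ArrayList<EnergyCluster>();
}

Flattening such a nested, evolving structure into fixed relational rows is what makes the (O)RDBMS alternative unattractive here.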

  8. ODBMS - choice of Objectivity/DB
  • Commercial ODBMS:
    • embody hundreds of person-years of development effort
    • tend to conform to standards
    • offer a rich set of management tools and language bindings
    • at least one (Objectivity/DB) seems capable of handling PetaBytes
  • Objectivity is the best choice for us right now:
    • Very large databases can be created as "Federations" of very many smaller databases, which themselves are distributed and/or replicated amongst servers on the network
    • Features data replication and fault tolerance
    • I/O performance, overhead and efficiency are very similar to traditional HEP systems
    • OS support (NT, Solaris, Linux, Irix, AIX, HP-UX, etc.)
    • Language bindings (C++, Java, [C, SmallTalk, SQL++ etc.])
    • Commitment to HEP as a target business sector
    • Close relationships built up with the company, at all levels
    • Attractive licensing schemes for HEP

  9. Storage management - choice of HPSS
  • Need to "backend" the ODBMS with a large-scale media management system, because:
    • "Tapes" are still foreseen to be most cost-effective (may be DVDs in practice)
    • System reliability is not enough to avoid "backup copies"
  • Unfortunately, large-scale data archives are a niche market
  • HPSS is currently the best choice:
    • Appears to scale into the PetaByte storage range
    • Heavy investment of CERN/Caltech/SLAC… effort to make HPSS evolve in directions suited for HEP
    • Unfortunately, only supported on a couple of platforms
  • A layer between the ODBMS and an HPSS filesystem has been developed: it is interfaced to Objectivity's Advanced Multithreaded Server. This is one key to the development of the system middleware.

  10. ODBMS worries
  • Buoyancy of the commercial marketspace?
    • Introduction of Computer Associates' "Jasmine" pure ODBMS (targeted at multimedia data)
    • Oracle et al. paying lip service to OO, with Object features "bolted on" to their fundamentally RDBMS technology
    • Breathtaking fall of Versant stock!
    • Still no IPO for Objectivity
  • Conversion of "legacy" ODBMS data from one system to another?
    • 100 PetaBytes via an ODMG-compliant text file?! (see the arithmetic below)
    • Good argument for keeping raw data outside the ODBMS, in simple binary files (BUT this doubles storage needs)
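A back-of-envelope sketch of why a 100 PetaByte text export is implausible. The 100 MBytes/sec sustained rate is an assumption (roughly the offline stream rate quoted earlier), and the class name is illustrative:

public class ExportTime {
    public static void main(String[] args) {
        double bytes = 100.0e15;          // 100 PetaBytes to convert
        double rate = 100.0e6;            // ASSUMPTION: 100 MBytes/sec sustained
        double seconds = bytes / rate;    // 1e9 seconds
        double years = seconds / (365.0 * 24 * 3600);
        System.out.printf("Export at 100 MBytes/sec would take ~%.0f years%n", years);
    }
}

That is roughly 32 years, before even accounting for the expansion of binary data into text.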

  11. Federated Database - Views of the Data
  [Diagram: views of the data in the federated database, built from Detector, Track and Hit objects]

  12. ODBMS tests
  • Developed a simple scaling application: matching 1000s of sky objects observed at different wavelengths
  • Runs entirely in cache (so disk I/O performance can be neglected); applies a matching algorithm between pairs of objects in different databases (a plain-Java sketch follows below)
  [Diagram: objects at one wavelength in Database 1 matched against objects at another wavelength in Database 2, using indexes, predicates or cuts]
  • Looked at usability, efficiency and scalability for:
    • number of objects
    • location of objects
    • object selection mechanism
    • database host platform
  • Results:
    • The application is platform independent
    • The database is platform independent
    • No performance loss for remote clients
    • Fastest access: objects are "indexed"
    • Slowest: using predicates
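A minimal in-memory sketch of the matching kernel, in plain Java. The real test selected persistent objects through the ODBMS's indexes and predicates, which are omitted here, and the tolerance-based position match is an assumption about the algorithm:

import java.util.Arrays;
import java.util.List;

class SkyObject {
    double ra, dec;   // sky position in degrees (illustrative fields)
    SkyObject(double ra, double dec) { this.ra = ra; this.dec = dec; }
}

class SkyMatcher {
    // Brute-force pairwise scan between the two "databases"; the GIOD tests
    // compared this style of scan against indexed access and predicate-based
    // selection on the persistent equivalents.
    static int match(List<SkyObject> db1, List<SkyObject> db2, double tolDeg) {
        int matches = 0;
        for (SkyObject a : db1)
            for (SkyObject b : db2)
                if (Math.abs(a.ra - b.ra) < tolDeg && Math.abs(a.dec - b.dec) < tolDeg)
                    matches++;
        return matches;
    }

    public static void main(String[] args) {
        List<SkyObject> db1 = Arrays.asList(new SkyObject(10.0, 20.0));
        List<SkyObject> db2 = Arrays.asList(new SkyObject(10.0005, 20.0003));
        System.out.println(match(db1, db2, 0.001) + " match(es)");
    }
}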

  13. ODBMS tests - other tests
  • Looked at Java binding performance (~3 times slower than the C++ binding)
  • Created a federated database in HPSS-managed storage, using NFS export
  • Tested database replication from CERN to Caltech

  14. ODBMS tests
  • The Caltech Exemplar was used as a convenient testbed for Objectivity multiple-client tests
    • Results: the Exemplar is very well suited to this workload; with two (of four) node filesystems it was possible to utilise 150 processors in parallel with very high efficiency
    • Outlook: expect to utilise all processors with near-100% efficiency when all four filesystems are engaged
  • Evaluated the usability and performance of the Versant ODBMS, Objectivity's main competitor
    • Result: Versant is a decent "fall-back" solution for us

  15. GIOD - Database of "real" LHC events
  • Would like to evaluate system performance with realistic Event objects
  • Caltech/HEP submitted a successful proposal to NPACI to generate ~1,000,000 fully-simulated multi-jet QCD events
    • Directly study Higgs backgrounds for the first time
  • The computing power of Caltech's 256-CPU (64 GByte shared memory) HP Exemplar makes this possible in ~a few months
  • Event production has run on the Exemplar since May '98: ~1,000,000 events of ~1 MByte each
  • Used by GIOD as a copious source of "raw" LHC event data
  • Events are analysed using Java Analysis Studio and "scanned" using a Java applet

  16. Large data transfer over the CERN-USA link to Caltech
  [Plot: transfer throughput over time, annotated "Try one file...", "Let it rip", "HPSS fails", "Tidy up..."]
  • Transfer of ~31 GBytes of Objectivity databases from Shift20/CERN to HPSS/Caltech
  • Achieved ~11 GBytes/day (equivalent to ~4 TBytes/year, or ~1 PByte/year when scaled to a 622 Mbits/sec link; see the arithmetic below)
  • An HPSS hardware problem at Caltech, not the network, caused the transfer to abort
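A sketch checking both figures with plain arithmetic. The reading that 1 PByte/year occupies only ~40% of a 622 Mbits/sec link is my interpretation of the slide's extrapolation, and the class name is illustrative:

public class TransferRates {
    public static void main(String[] args) {
        // Achieved rate over the CERN-USA link
        double tbPerYear = 11.0 * 365.0 / 1000.0;   // ~4 TBytes/year from 11 GBytes/day

        // What 1 PByte/year demands of a 622 Mbits/sec link
        double secondsPerYear = 365.0 * 24 * 3600;
        double requiredMbps = 1.0e15 * 8 / secondsPerYear / 1.0e6;  // ~254 Mbits/s
        double linkFraction = requiredMbps / 622.0;                 // ~0.41

        System.out.printf("~%.1f TBytes/year achieved; 1 PByte/year needs ~%.0f Mbits/s"
                + " (~%.0f%% of a 622 Mbits/s link)%n",
                tbPerYear, requiredMbps, linkFraction * 100);
    }
}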

  17. GIOD - Database Status
  • Over 200,000 fully simulated di-jet events in the database
  • Population continuing using parallel jobs on the Exemplar (from a pool of over 1,000,000 events)
  • Building the TAG database:
    • To optimise queries, each event is summarised by a small object containing its salient features (a sketch follows below)
    • The TAG objects are kept in a dedicated database, which is replicated to client machines
  • Preparing for WAN tests with SDSC
  • Preparing for HPSS/AMS installation and tests
  • For MONARC: making a replica at Padua/INFN (Italy)
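A hedged sketch of what such a TAG summary object might look like; every field name here is an illustrative assumption, since the actual schema was shown only on the slide graphic:

// Compact per-event summary, kept in a dedicated replicated database
// so queries can be screened against it before touching full events.
// Field names are hypothetical, not the real GIOD TAG schema.
class EventTag {
    long runNumber, eventNumber;   // keys back to the full event
    int nJets;                     // number of reconstructed jets
    float sumEt;                   // scalar sum of transverse energy
    float missingEt;               // missing transverse energy
    float leadingJetEt;            // ET of the hardest jet
}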

  18. GIOD - WAN/LAN Tests
  [Diagram: the WAN/LAN testbed. At Caltech, the Exemplar SPP2000 runs parallel CMSOO production jobs against the Master FDB (80 GByte), with an HP C200 and a SparcStation 20 holding a Temporary FDB (10 GByte) on Ethernet; a FORE ATM switch provides 155 Mbits/s ATM links and an OC12/622 Mbits/s connection to the San Diego Supercomputer Center (SDSC), where an IBM SP2 (AIX) with HPSS-managed tapes holds a Clone FDB. Traffic comprises DB files (ftp) and Oofs traffic.]

  19. MONARC - Models Of Networked Analysis At Regional Centers
  Caltech, CERN, FNAL, Heidelberg, INFN, KEK, Marseilles, Munich, Orsay, Oxford, Tufts, …
  [Diagram: regional centre hierarchy. CERN (10^7 MIPS, 200 TByte robot), FNAL (2x10^6 MIPS, 200 TByte robot) and Caltech (n x 10^6 MIPS, 50 TByte robot), interconnected at 622 Mbits/s (with optional Air Freight), each serving desktops.]
  • GOALS:
    • Specify the main parameters characterizing the Model's performance: throughputs, latencies
    • Determine the classes of Computing Models feasible for LHC (matched to network capacity and data handling resources)
    • Develop "Baseline Models" in the "feasible" category
    • Verify resource requirement baselines (computing, data handling, networks)
  • REQUIRES:
    • Define the Analysis Process
    • Define Regional Centre Architectures
    • Provide Guidelines for the final Models

  20. JavaCMS - 2D Event Viewer Applet
  • Created to aid in Track Fitting algorithm development
  • Fetches objects directly from the ODBMS
  • The Java binding to the ODBMS is very convenient to use

  21. CMSOO - Java 3D Applet
  • Attaches to any GIOD database and allows the user to view/scan all events in the federation, at multiple detail levels
  • Demonstrated at the Internet-2 meeting in San Francisco in Sep ’98, and at SuperComputing ’98 in Florida at the iGrid, NPACI and CACR stands
  • Running on a 450 MHz HP "Kayak" PC with an fx4 graphics card: excellent frame rates in free rotation of a complete event (~5 times the performance of a Riva TNT)
  • Developments: "drill down" into the database for picked objects; refit tracks

  22. Java Analysis Studio

  // Example JAS analysis module from the slide, reformatted for readability.
  // The histograms (njetHist, massHist, sumetHist, missetHist, et1vset2Hist)
  // are members of the enclosing module, declared elsewhere in the original project.
  public void processEvent(final EventData d) {
      final CMSEventData data = (CMSEventData) d;
      final double ET_THRESHOLD = 15.0;
      Jet jets[] = new Jet[2];
      Iterator jetItr = (Iterator) data.getObject("Jet");
      if (jetItr == null) return;
      int nJets = 0;
      double sumET = 0.;
      FourVectorRecObj sum4v = new FourVectorRecObj(0., 0., 0., 0.);
      while (jetItr.hasMoreElements()) {
          Jet jet = (Jet) jetItr.nextElement();
          sum4v.add(jet);                  // accumulate the total event 4-vector
          double jetET = jet.ET();
          sumET += jetET;
          if (jetET > ET_THRESHOLD && nJets <= 1) {
              jets[nJets] = jet;           // keep the first two jets above threshold
              nJets++;
          }
      }
      njetHist.fill(nJets);
      if (nJets >= 2) {                    // dijet event!
          FourVectorRecObj dijet4v = jets[0];
          dijet4v.add(jets[1]);            // note: add() modifies jets[0] in place
          massHist.fill(dijet4v.get_mass());
          sumetHist.fill(sumET);
          missetHist.fill(sum4v.pt());
          et1vset2Hist.fill(jets[0].ET(), jets[1].ET());
      }
  }

  23. GIOD - Summary
  • LHC Computing Models specify:
    • Massive quantities of raw, reconstructed and analysis data in an ODBMS
    • Distributed data analysis at CERN, Regional Centres and Institutes
    • Location transparency for the end user
  • GIOD is investigating:
    • Usability, scalability and portability of Object-Oriented LHC codes
    • In a hierarchy of large servers and medium/small client machines
    • With fast LAN and WAN connections
    • Using realistic raw and reconstructed LHC event data
  • GIOD has:
    • Constructed a large set of fully simulated events and used these to create a large OO database
    • Learned how to create large database federations
    • Developed prototype reconstruction and analysis codes that work with persistent objects
    • Deployed facilities and database federations as useful testbeds for Computing Model studies

  24. GIOD - Interest in I2-DSI
  • LHC Computing: timely access to powerful resources
    • Measure the prevailing network conditions
    • Predict and manage the (short-term) future conditions
    • Implement QoS with policies on end-to-end links
    • Provide for movement of large datasets
    • Match the Network, Storage and Compute resources to the needs
    • Synchronize their availability in real time
    • Overlay the distributed, tightly-coupled ODBMS on a loosely-coupled set of heterogeneous servers on the WAN
  • Potential areas of research with I2-DSI:
    • Test ODBMS replication in burst mode, using I2 backbones up to the Gbits/sec range
    • Experiment with data "localization" strategies: roles of caching, mirroring, channeling; interaction with Objectivity/DB
    • Experiment with policy-based resource allocation strategies
    • Evaluate Autonomous Agent implementations
