
Presentation Transcript


  1. U.S. Physics Data Grid Projects Paul Avery University of Florida http://www.phys.ufl.edu/~avery/ avery@phys.ufl.edu International Workshop on HEP Data Grids Kyungpook National University, Daegu, Korea, Nov. 8-9, 2002 Paul Avery

  2. “Trillium”: US Physics Data Grid Projects • Particle Physics Data Grid (PPDG) • Data Grid for HENP experiments • ATLAS, CMS, D0, BaBar, STAR, JLAB • GriPhyN • Petascale Virtual-Data Grids • ATLAS, CMS, LIGO, SDSS • iVDGL • Global Grid lab • ATLAS, CMS, LIGO, SDSS, NVO • Data intensive expts. • Collaborations of physicists & computer scientists • Infrastructure development & deployment • Globus + VDT based Paul Avery

  3. Why Trillium? • Many common aspects • Large overlap in project leadership • Large overlap in participants • Large overlap in experiments, particularly LHC • Common projects (monitoring, etc.) • Common packaging • Common use of VDT, other GriPhyN software • Funding agencies like collaboration • Good working relationship on grids between NSF and DOE • Good complementarity: DOE (labs), NSF (universities) • Collaboration of computer science/physics/astronomy encouraged • Organization from the “bottom up” • With encouragement from funding agencies Paul Avery

  4. Driven by LHC Computing Challenges • Complexity: Millions of detector channels, complex events • Scale: PetaOps (CPU), Petabytes (Data) • Distribution: Global distribution of people & resources (1800 physicists, 150 institutes, 32 countries) Paul Avery

  5. Global LHC Data Grid Experiment (e.g., CMS) [Tier architecture diagram] Online System → Tier 0 (CERN Computer Center, >20 TIPS) at 100-200 MBytes/s; Tier 0 → Tier 1 national centers (Korea, Russia, UK, USA) at 2.5 Gbits/s; Tier 1 → Tier 2 centers at 2.5 Gbits/s; Tier 2 → Tier 3 (institutes) at ~0.6 Gbits/s; Tier 3 → Tier 4 (physics caches, PCs, other portals) at 0.1-1 Gbits/s. Resource ratio Tier0/(Σ Tier1)/(Σ Tier2) ~ 1:1:1 Paul Avery

  6. LHC Tier2 Center (2001): “Flat” switching topology [Diagram: router to WAN, FEth/GEth switch, 20-60 nodes (dual 0.8-1 GHz P3), 1 TByte RAID, >1 data server] Paul Avery

  7. LHC Tier2 Center (2002-2003): “Hierarchical” switching topology [Diagram: router to WAN, GEth switch feeding GEth/FEth switches, 40-100 nodes (dual 2.5 GHz P4), 2-4 TBytes RAID, >1 data server] Paul Avery

  8. LHC Hardware Cost Estimates • Buy late, but not too late: phased implementation • R&D Phase 2001-2004 • Implementation Phase 2004-2007 • R&D to develop capabilities and computing model itself • Prototyping at increasing scales of capability & complexity [Chart: trend curves labeled 1.1 years, 1.4 years, 2.1 years, 1.2 years] Paul Avery

  9. Particle Physics Data Grid “In coordination with complementary projects in the US and Europe, PPDG aims to meet the urgent needs for advanced Grid-enabled technology and to strengthen the collaborative foundations of experimental particle and nuclear physics.” Paul Avery

  10. PPDG Goals • Serve high energy & nuclear physics (HENP) experiments • Funded 2001 – 2004 @ US$9.5M (DOE) • Develop advanced Grid technologies • Use Globus to develop higher level tools • Focus on end to end integration • Maintain practical orientation • Networks, instrumentation, monitoring • DB file/object replication, caching, catalogs, end-to-end movement • Serve urgent needs of experiments • Unique challenges, diverse test environments • But make tools general enough for wide community! • Collaboration with GriPhyN, iVDGL, EDG, LCG • Recent work on ESNet Certificate Authority Paul Avery

  11. PPDG Participants and Work Program • Physicist + CS involvement • D0, BaBar, STAR, CMS, ATLAS • SLAC, LBNL, Jlab, FNAL, BNL, Caltech, Wisconsin, Chicago, USC • Computer Science Program of Work • CS1: Job description language • CS2: Schedule, manage data processing, data placement activities • CS3: Monitoring and status reporting (with GriPhyN) • CS4: Storage resource management • CS5: Reliable replication services • CS6: File transfer services • CS7: Collect/document experiment practices → generalize • … • CS11: Grid-enabled data analysis Paul Avery

  12. GriPhyN = App. Science + CS + Grids • Participants • US-CMS High Energy Physics • US-ATLAS High Energy Physics • LIGO/LSC Gravity wave research • SDSS Sloan Digital Sky Survey • Strong partnership with computer scientists • Design and implement production-scale grids • Develop common infrastructure, tools and services (Globus based) • Integration into the 4 experiments • Broad application to other sciences via “Virtual Data Toolkit” • Strong outreach program • Funded by NSF for 2000 – 2005 • R&D for grid architecture (funded at $11.9M +$1.6M) • Integrate Grid infrastructure into experiments through VDT Paul Avery

  13. GriPhyN: PetaScale Virtual-Data Grids [Architecture diagram] Production teams, individual investigators, and workgroups (~1 Petaflop, ~100 Petabytes) work through interactive user tools; these sit on top of virtual data tools, request planning & scheduling tools, and request execution & management tools; underneath are resource management, security and policy, and other Grid services; transforms run over distributed resources (code, storage, CPUs, networks) and raw data sources. Paul Avery

  14. GriPhyN Research Agenda • Based on Virtual Data technologies (fig.) • Derived data, calculable via algorithm • Instantiated 0, 1, or many times (e.g., caches) • “Fetch value” vs “execute algorithm” • Very complex (versions, consistency, cost calculation, etc.) • LIGO example • “Get gravitational strain for 2 minutes around each of 200 gamma-ray bursts over the last year” • For each requested data value, need to • Locate the item and the algorithm that produces it • Determine costs of fetching vs calculating • Plan data movements & computations required to obtain results • Schedule the plan • Execute the plan Paul Avery
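The per-request loop just listed (locate the item, weigh fetching against recomputing, plan, schedule, execute) can be sketched as follows. This is an illustrative outline only, not GriPhyN or Chimera code: the class names, attributes, and cost formulas are hypothetical.

```python
# Minimal sketch of the "fetch value" vs "execute algorithm" decision for one
# requested data product. All names and cost models here are hypothetical.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Replica:                 # an existing physical copy of the data product
    site: str
    size_gb: float
    bandwidth_gbps: float

@dataclass
class Derivation:              # the recipe that can recreate the product
    transformation: str
    cpu_hours: float
    free_slots: int

def fetch_cost(r: Replica) -> float:
    """Crude transfer-time estimate in hours (GB -> Gb, divided by link speed)."""
    return (r.size_gb * 8) / (r.bandwidth_gbps * 3600)

def compute_cost(d: Derivation) -> float:
    """Crude wall-clock estimate in hours, assuming perfect parallelism."""
    return d.cpu_hours / max(d.free_slots, 1)

def plan(replicas: List[Replica], derivation: Derivation) -> Tuple[str, object]:
    """Locate the cheapest existing copy; fall back to re-deriving the product."""
    if replicas:
        best = min(replicas, key=fetch_cost)
        if fetch_cost(best) <= compute_cost(derivation):
            return ("fetch", best)
    return ("execute", derivation)

# Example: one cached copy at a Tier2 site vs. re-running the strain extraction
print(plan([Replica("tier2.example.edu", 50.0, 0.6)],
           Derivation("ligo-strain-extract", 200.0, 64)))
```

A real planner would then turn the chosen option into concrete data movements and jobs, schedule and execute them, and register any newly produced replica so later requests can reuse it.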

  15. Virtual Data Concept [Diagram: a “fetch item” request resolved against major facilities/archives, regional facilities/caches, or local facilities/caches] • Data request may • Compute locally • Compute remotely • Access local data • Access remote data • Scheduling based on • Local policies • Global policies • Cost Paul Avery

  16. iVDGL: A Global Grid Laboratory “We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science.” From NSF proposal, 2001 • International Virtual-Data Grid Laboratory • A global Grid laboratory (US, EU, Asia, South America, …) • A place to conduct Data Grid tests “at scale” • A mechanism to create common Grid infrastructure • A laboratory for other disciplines to perform Data Grid tests • A focus of outreach efforts to small institutions • U.S. part funded by NSF (2001 – 2006) • $14.1M (NSF) + $2M (matching) • International partners bring own funds Paul Avery

  17. iVDGL Participants • Initial experiments (funded by NSF proposal) • CMS, ATLAS, LIGO, SDSS, NVO • Possible other experiments and disciplines • HENP: BTEV, D0, CMS HI, ALICE, … • Non-HEP: Biology, … • Complementary EU project: DataTAG • DataTAG and US pay for 2.5 Gb/s transatlantic network • Additional support from UK e-Science programme • Up to 6 Fellows per year • None hired yet Paul Avery

  18. iVDGL Components • Computing resources • Tier1 laboratory sites (funded elsewhere) • Tier2 university sites → software integration • Tier3 university sites → outreach effort • Networks • USA (Internet2, ESNet), Europe (Géant, …) • Transatlantic (DataTAG), Transpacific, AMPATH, … • Grid Operations Center (GOC) • Indiana (2 people) • Joint work with TeraGrid on GOC development • Computer Science support teams • Support, test, upgrade GriPhyN Virtual Data Toolkit • Coordination, management Paul Avery

  19. iVDGL Management and Coordination [Organization chart] U.S. piece: US Project Directors, US External Advisory Committee, US Project Steering Group, and a Project Coordination Group overseeing the Facilities, Core Software, Operations, Applications, GLUE Interoperability, and Outreach teams; International piece: collaborating Grid projects (DataTAG, TeraGrid, EDG, LCG?, Asia) and candidate experiments/disciplines (BTEV, D0, CMS HI, ALICE, PDC, Bio, Geo, ?) Paul Avery

  20. iVDGL Work Teams • Facilities Team • Hardware (Tier1, Tier2, Tier3) • Core Software Team • Grid middleware, toolkits • Laboratory Operations Team • Coordination, software support, performance monitoring • Applications Team • High energy physics, gravity waves, virtual astronomy • Nuclear physics, bioinformatics, … • Education and Outreach Team • Web tools, curriculum development, involvement of students • Integrated with GriPhyN, connections to other projects • Want to develop further international connections Paul Avery

  21. US-iVDGL Data Grid (Sep. 2001) [Map of Tier1, Tier2, and Tier3 sites: SKC, Boston U, Wisconsin, PSU, BNL, Argonne, Fermilab, J. Hopkins, Indiana, Hampton, Caltech, UCSD/SDSC, UF, Brownsville] Paul Avery

  22. US-iVDGL Data Grid (Dec. 2002) [Map of Tier1, Tier2, and Tier3 sites: SKC, Boston U, Wisconsin, Michigan, PSU, BNL, Fermilab, LBL, Argonne, J. Hopkins, NCSA, Indiana, Hampton, Caltech, Oklahoma, Vanderbilt, UCSD/SDSC, FSU, Arlington, UF, FIU, Brownsville] Paul Avery

  23. Possible iVDGL Participant: TeraGrid [Diagram: 13 TeraFlops across four sites (NCSA/PACI 8 TF, 240 TB; SDSC 4.1 TF, 225 TB; Caltech; Argonne) linked by a 40 Gb/s backbone, each with site resources, HPSS or UniTree storage, and external network connections] Paul Avery

  24. International Participation • Existing partners • European Data Grid (EDG) • DataTAG • Potential partners • Korea T1 • China T1? • Japan T1? • Brazil T1 • Russia T1 • Chile T2 • Pakistan T2 • Romania ? Paul Avery

  25. Current Trillium Work • Packaging technologies: PACMAN • Used for VDT releases → very successful & powerful • Evaluated for Globus, EDG • GriPhyN Virtual Data Toolkit 1.1.3 released • Vastly simplifies installation of grid tools • New changes will further simplify configuration complexity • Monitoring (joint efforts) • Globus MDS 2.2 (GLUE schema) • Caltech MonALISA • Condor HawkEye • Florida Gossip (low level component) • Chimera Virtual Data System (more later) • Testbeds, demo projects (more later) Paul Avery

  26. Virtual Data: Derivation and Provenance • Most scientific data are not simple “measurements” • They are computationally corrected/reconstructed • They can be produced by numerical simulation • Science & eng. projects are more CPU and data intensive • Programs are significant community resources (transformations) • So are the executions of those programs (derivations) • Management of dataset transformations important! • Derivation: Instantiation of a potential data product • Provenance: Exact history of any existing data product Programs are valuable, like data. They should be community resources Paul Avery
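The transformation/derivation/provenance vocabulary above can be made concrete with a small sketch. The class names and the provenance walk below are illustrative assumptions, not the actual Chimera object model.

```python
# Illustrative sketch: a transformation is the program, a derivation is one
# recorded execution of it, and provenance is the chain of derivations that
# produced a given dataset. Names and fields are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Transformation:            # a community program (the "recipe")
    name: str
    version: str

@dataclass
class Derivation:                # one execution of a transformation
    transformation: Transformation
    inputs: List[str]            # logical file names consumed
    outputs: List[str]           # logical file names produced
    parameters: Dict[str, str] = field(default_factory=dict)

def provenance(lfn: str, log: List[Derivation]) -> List[Derivation]:
    """Walk back through recorded derivations to get the exact history of lfn."""
    history, targets = [], {lfn}
    for d in reversed(log):                  # assumes log is in execution order
        if targets & set(d.outputs):
            history.append(d)
            targets |= set(d.inputs)         # follow this step's inputs further back
    return list(reversed(history))

# Example: raw image -> calibrated image -> shape catalog
calib = Derivation(Transformation("calibrate", "v2"), ["raw.fits"], ["calib.fits"])
shape = Derivation(Transformation("shape-analysis", "v1"), ["calib.fits"], ["shapes.db"])
print([d.transformation.name for d in provenance("shapes.db", [calib, shape])])
# -> ['calibrate', 'shape-analysis']
```

The same record supports both uses on the slide: replaying the chain forward rebuilds a data product, and reading it backward audits exactly how an existing product was made.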

  27. Motivations (1) “I’ve found some interesting data, but I need to know exactly what corrections were applied before I can trust it.” “I’ve detected a mirror calibration error and want to know which derived data products need to be recomputed.” “I want to search a database for dwarf galaxies. If a program that performs this analysis exists, I won’t have to write one from scratch.” “I want to apply a shape analysis to 10M galaxies. If the results already exist, I’ll save weeks of computation.” [Diagram: Data is consumed-by/generated-by a Transformation; Data is the product-of a Derivation; a Derivation is an execution-of a Transformation] Paul Avery

  28. Motivations (2) • Data track-ability and result audit-ability • Universally sought by GriPhyN applications • Facilitates tool and data sharing and collaboration • Data can be sent along with its recipe • Repair and correction of data • Rebuild data products (cf. “make”) • Workflow management • A new, structured paradigm for organizing, locating, specifying, and requesting data products • Performance optimizations • Ability to re-create data rather than move it Paul Avery

  29. “Chimera” Virtual Data System • Virtual Data API • A Java class hierarchy to represent transformations & derivations • Virtual Data Language • Textual for people & illustrative examples • XML for machine-to-machine interfaces • Virtual Data Database • Makes the objects of a virtual data definition persistent • Virtual Data Service (future) • Provides a service interface (e.g., OGSA) to persistent objects Paul Avery

  30. Virtual Data Catalog Object Model [object model diagram] Paul Avery

  31. Chimera as a Virtual Data System • Virtual Data Language (VDL) • Describes virtual data products • Virtual Data Catalog (VDC) • Used to store VDL • Abstract Job Flow Planner • Creates a logical DAG (dependency graph) • Concrete Job Flow Planner • Interfaces with a Replica Catalog • Provides a physical DAG submission file to Condor-G • Generic and flexible • As a toolkit and/or a framework • In a Grid environment or locally • Currently in beta version [Pipeline diagram, logical → physical: VDL (XML) → VDC → Abstract Planner → DAX (XML) → Concrete Planner + Replica Catalog → DAG → DAGMan] Paul Avery
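The logical-to-physical pipeline in the diagram amounts to two planning passes: an abstract pass that derives a dependency graph from shared logical file names, and a concrete pass that binds jobs to physical locations through a replica catalog before handing the graph to DAGMan/Condor-G. The sketch below is illustrative pseudocode under assumed names, not the real Chimera planners or the VDL/DAX formats.

```python
# Two-stage planning sketch: abstract (logical DAG) then concrete (site binding).
from typing import Dict, List, Tuple

# Each logical job: (job_name, input logical file names, output logical file names)
LogicalJob = Tuple[str, List[str], List[str]]

def abstract_plan(jobs: List[LogicalJob]) -> Dict[str, List[str]]:
    """Build the job dependency map implied by shared logical file names."""
    producer = {out: name for name, _ins, outs in jobs for out in outs}
    return {name: sorted({producer[f] for f in ins if f in producer})
            for name, ins, _outs in jobs}

def concrete_plan(jobs: List[LogicalJob],
                  deps: Dict[str, List[str]],
                  replica_catalog: Dict[str, List[str]]) -> List[str]:
    """Bind each job to a site that already holds one of its inputs."""
    plan = []
    for name, ins, _outs in jobs:
        sites = [s for f in ins for s in replica_catalog.get(f, [])]
        site = sites[0] if sites else "any-site"
        plan.append(f"run {name} at {site}, after {deps[name] or ['nothing']}")
    return plan

# Example: 'analyze' depends on the file that 'simulate' produces
jobs = [("simulate", ["config.dat"], ["events.root"]),
        ("analyze", ["events.root"], ["histos.root"])]
deps = abstract_plan(jobs)                  # the abstract ("DAX"-like) stage
rc = {"config.dat": ["ufl.edu"]}            # replica catalog: LFN -> known sites
for step in concrete_plan(jobs, deps, rc):  # the concrete ("DAG"-like) stage
    print(step)
```

Keeping the abstract stage free of physical file names is what lets the same virtual-data definition be planned again later against whatever replicas and sites happen to exist.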

  32. Chimera Application: SDSS Analysis. What is the size distribution of galaxy clusters? Chimera Virtual Data System + GriPhyN Virtual Data Toolkit + iVDGL Data Grid (many CPUs) → galaxy cluster size distribution Paul Avery

  33. US-CMS Testbed [Map: Wisconsin, Fermilab, Caltech, UCSD, Florida] Paul Avery

  34. Other CMS Institutes Encouraged to Join [Testbed map: Wisconsin, Fermilab, Caltech, UCSD, Florida] • Expressions of interest • Princeton • Brazil • South Korea • Minnesota • Iowa • Possibly others Paul Avery

  35. Grid Middleware Used in Testbed [Testbed sites: Wisconsin, Fermilab, Caltech, UCSD, Florida] • Virtual Data Toolkit 1.1.3 • VDT Client: • Globus Toolkit 2.0 • Condor-G 6.4.3 • VDT Server: • Globus Toolkit 2.0 • mkgridmap • Condor 6.4.3 • ftsh • GDMP 3.0.7 • Virtual Organization (VO) Management • LDAP Server deployed at Fermilab • GroupMAN (adapted from EDG) used to manage the VO • Use DOE Science Grid certificates • Accept EDG and Globus certificates Paul Avery
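The VO-management bullets come down to mapping authenticated certificate subject DNs onto local accounts, typically via a Globus grid-mapfile that tools such as mkgridmap/GroupMAN regenerate from the LDAP VO data. Below is a minimal sketch of that mapping; the DNs, account names, and parsing code are hypothetical, and only the grid-mapfile line format (a quoted subject DN followed by a local account) is standard.

```python
# Illustrative sketch of grid-mapfile based authorization. The entries below
# are made up; only the line format "<quoted subject DN> <local account>" is real.
from typing import Dict

EXAMPLE_GRIDMAP = '''
"/DC=org/DC=doegrids/OU=People/CN=Alice Example 12345" uscms01
"/O=Grid/O=CERN/OU=cern.ch/CN=Bob Example" uscms02
'''

def parse_gridmap(text: str) -> Dict[str, str]:
    """Parse grid-mapfile lines into a {subject DN: local account} map."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        dn, _, account = line.rpartition('" ')   # split at the closing quote
        mapping[dn.lstrip('"')] = account.strip()
    return mapping

def authorize(subject_dn: str, gridmap: Dict[str, str]) -> str:
    """Map an authenticated certificate subject to a local account, or refuse."""
    try:
        return gridmap[subject_dn]
    except KeyError:
        raise PermissionError(f"no grid-mapfile entry for {subject_dn}")

gridmap = parse_gridmap(EXAMPLE_GRIDMAP)
print(authorize("/DC=org/DC=doegrids/OU=People/CN=Alice Example 12345", gridmap))
```

Accepting EDG and Globus certificates, as the slide notes, then just means trusting the corresponding certificate authorities so those subjects can appear in the map.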

  36. Commissioning the CMS Grid Testbed • A complete prototype (fig.) • CMS Production Scripts • Globus • Condor-G • GridFTP • Commissioning: Require production quality results! • Run until the Testbed "breaks" • Fix Testbed with middleware patches • Repeat procedure until the entire Production Run finishes! • Discovered/fixed many Globus and Condor-G problems • Huge success from this point of view alone • … but very painful Paul Avery

  37. CMS Grid Testbed Production [Architecture diagram: a master site running IMPALA, mop_submitter, DAGMan, Condor-G, and GridFTP dispatches production jobs to remote sites 1…N, each with a batch queue and a GridFTP server for moving data back] Paul Avery
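The fan-out in the diagram (the master site wrapping per-site production jobs into a DAG that DAGMan/Condor-G dispatches to the remote batch queues, with GridFTP bringing output back) can be sketched by generating a DAGMan input file. Site names, submit-file names, and event counts below are hypothetical; only the DAGMan JOB/VARS/PARENT-CHILD line syntax is standard.

```python
# Sketch: generate a DAGMan input file with one production job per remote site
# and a final collection job. The .sub files named here are hypothetical
# Condor-G submit descriptions; the collection step stands in for the GridFTP
# transfers back to the master site.
SITES = ["wisconsin", "fermilab", "caltech", "ucsd", "florida"]

def make_dag(sites, events_per_site=30000):
    lines = []
    for s in sites:
        lines.append(f"JOB produce_{s} produce_{s}.sub")
        lines.append(f'VARS produce_{s} nevents="{events_per_site}"')
    lines.append("JOB collect collect.sub")
    for s in sites:
        lines.append(f"PARENT produce_{s} CHILD collect")
    return "\n".join(lines)

if __name__ == "__main__":
    # DAGMan would run the resulting file via condor_submit_dag on the master site
    print(make_dag(SITES))
```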

  38. Production Success on CMS Testbed [Diagram: MCRunJob (Linker, ScriptGenerator, Configurator, MasterScript, Self Description) with a “DAGMaker” producing VDL/requirements for Chimera and MOP] • Results • 150k events generated, ~200 GB produced • 1.5 weeks continuous running across all 5 testbed sites • 1M event run just started on larger testbed (~30% complete!) Paul Avery

  39. Grid Coordination Efforts • Global Grid Forum (www.gridforum.org) • International forum for general Grid efforts • Many working groups, standards definitions • Next one in Japan, early 2003 • HICB (High energy physics) • Joint development & deployment of Data Grid middleware • GriPhyN, PPDG, iVDGL, EU-DataGrid, LCG, DataTAG, Crossgrid • GLUE effort (joint iVDGL – DataTAG working group) • LCG (LHC Computing Grid Project) • Strong “forcing function” • Large demo projects • IST2002 Copenhagen • Supercomputing 2002 Baltimore • New proposal (joint NSF + Framework 6)? Paul Avery

  40. WorldGrid Demo • Joint Trillium-EDG-DataTAG demo • Resources from both sides in Intercontinental Grid Testbed • Use several visualization tools (Nagios, MapCenter, Ganglia) • Use several monitoring tools (Ganglia, MDS, NetSaint, …) • Applications • CMS: CMKIN, CMSIM • ATLAS: ATLSIM • Submit jobs from US or EU • Jobs can run on any cluster • Shown at IST2002 (Copenhagen) • To be shown at SC2002 (Baltimore) • Brochures now available describing Trillium and demos • I have 10 with me now (2000 just printed) Paul Avery

  41. WorldGrid Paul Avery

  42. Summary • Very good progress on many fronts • Packaging • Testbeds • Major demonstration projects • Current Data Grid projects are providing good experience • Looking to collaborate with more international partners • Testbeds • Monitoring • Deploying VDT more widely • Working towards new proposal • Emphasis on Grid-enabled analysis • Extending Chimera virtual data system to analysis Paul Avery

  43. Grid References • Grid Book • www.mkp.com/grids • Globus • www.globus.org • Global Grid Forum • www.gridforum.org • TeraGrid • www.teragrid.org • EU DataGrid • www.eu-datagrid.org • PPDG • www.ppdg.net • GriPhyN • www.griphyn.org • iVDGL • www.ivdgl.org Paul Avery
