
DATA GRIDS for Science and Engineering



Presentation Transcript


  1. DATA GRIDS for Science and Engineering: Worldwide Analysis at Regional Centers
  Harvey B. Newman, Professor of Physics, Caltech
  Islamabad, August 21, 2000

  2. LHC Vision: Data Grid Hierarchy
  • Bunch crossings every 25 nsec; 100 triggers per second; each event is ~1 MByte in size (see the arithmetic sketch below)
  • Online System: ~PByte/sec off the detector, ~100 MBytes/sec into the experiment offline farm
  • Tier 0 +1: CERN Computer Centre (> 20 TIPS) with HPSS mass storage
  • Tier 1: regional centers (FNAL, Italy, UK, France), each with HPSS, linked to CERN at ~0.6-2.5 Gbits/sec
  • Tier 2: Tier2 centers, linked at ~2.5 Gbits/sec
  • Tier 3: institute workgroup servers (~0.25 TIPS each), linked at ~622 Mbits/sec
  • Tier 4: physicists' workstations with a physics data cache, at 100-1000 Mbits/sec
  • Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels
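A back-of-the-envelope sketch (not from the original deck) reproducing two rates implied by the slide: the 100 Hz trigger on ~1 MByte events gives the ~100 MBytes/sec feed into the offline farm, and a 2.5 Gbit/s Tier 1 link moves a 100 GByte analysis sample in a few minutes. The 100 GByte sample size is an illustrative choice, not a figure from the slide.

```python
# Back-of-the-envelope rates from the slide's figures (illustrative only).

MB = 1e6          # bytes
GB = 1e9

bunch_crossing_rate = 1 / 25e-9        # one crossing per 25 ns -> 40 MHz
trigger_rate_hz     = 100              # events accepted per second
event_size_bytes    = 1 * MB           # ~1 MByte per stored event

# Rate flowing from the online system to the CERN offline farm (Tier 0+1)
offline_rate = trigger_rate_hz * event_size_bytes            # bytes/s
print(f"Offline farm input: {offline_rate / MB:.0f} MB/s")   # ~100 MB/s

# Time to ship a (hypothetical) 100 GB analysis sample over a 2.5 Gbit/s
# Tier 1 link
link_gbps   = 2.5
sample_size = 100 * GB
transfer_s  = sample_size * 8 / (link_gbps * 1e9)
print(f"100 GB over {link_gbps} Gbit/s: ~{transfer_s / 60:.0f} minutes")
```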

  3. Grids: Next Generation Web
  • Web: uniform access to HTML documents (http://)
  • Grid: flexible, high-performance access to all significant resources: computers, data stores, software catalogs, sensor nets, colleagues
  • On-demand creation of powerful virtual computing and data systems

  4. Roles of Projects for HENP Distributed Analysis
  • RD45, GIOD: networked object databases
  • Clipper/GC, FNAL/SAM: high-speed access, processing and analysis of files and object data
  • SLAC/OOFS: distributed file system + Objectivity interface
  • NILE, Condor: fault-tolerant distributed computing
  • MONARC: LHC computing models: architecture, simulation, strategy
  • PPDG: first distributed data services and Data Grid system prototype
  • ALDAP: OO database structures and access methods for astrophysics and HENP data
  • GriPhyN, EU Data Grid: production-scale Data Grids

  5. Grid Services Architecture [*]
  • Applications: a rich set of HEP data-analysis related applications
  • Application Toolkits: remote data, remote computation, remote visualization, remote collaboration, and remote sensors toolkits, ...
  • Grid Services: protocols, authentication, policy, resource discovery & management, instrumentation, ...
  • Grid Fabric: data stores, networks, computers, display devices, ...; associated local services (see the layering sketch below)
  [*] Adapted from Ian Foster: there are computing grids, access (collaborative) grids, data grids, ...
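To make the layering concrete, here is a minimal Python sketch of the four layers named on the slide. The class and resource names (FabricResource, GridServices, RemoteDataToolkit, "cern-hpss", "fnal-farm") are illustrative assumptions, not an actual Grid API.

```python
# Minimal sketch of the four layers on the slide (names are illustrative).
from dataclasses import dataclass

# Grid Fabric: the raw local resources
@dataclass
class FabricResource:
    name: str
    kind: str          # "compute", "storage", "network", ...
    capacity: float    # e.g. CPU-seconds/day or GB

# Grid Services: uniform authentication/discovery over the fabric
class GridServices:
    def __init__(self, resources):
        self.resources = resources
    def authenticate(self, user: str) -> bool:
        return True                      # stand-in for real security/policy
    def discover(self, kind: str):
        return [r for r in self.resources if r.kind == kind]

# Application Toolkit: a remote-data toolkit built on the services layer
class RemoteDataToolkit:
    def __init__(self, services: GridServices):
        self.services = services
    def stage(self, dataset: str, min_gb: float):
        stores = self.services.discover("storage")
        return next(s for s in stores if s.capacity >= min_gb)

# Application: an analysis job that only sees the toolkit
services = GridServices([FabricResource("cern-hpss", "storage", 500_000),
                         FabricResource("fnal-farm", "compute", 1e6)])
assert services.authenticate("physicist")
toolkit = RemoteDataToolkit(services)
print(toolkit.stage("cms-hlt-sample", min_gb=10_000).name)
```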

  6. The Grid Middleware Services Concept
  • Standard services that:
    • Provide uniform, high-level access to a wide range of resources (including networks)
    • Address interdomain issues: security, policy
    • Permit application-level management and monitoring of end-to-end performance
  • Broadly deployed, like Internet protocols
  • An enabler of application-specific tools as well as of applications themselves

  7. Application Example: Condor Numerical Optimization
  • Exact solution of the "nug30" quadratic assignment problem on June 16, 2000:
    14, 5, 28, 24, 1, 3, 16, 15, 10, 9, 21, 2, 4, 29, 25, 22, 13, 26, 17, 30, 6, 20, 19, 8, 18, 7, 27, 12, 11, 23
  • Used the "MW" framework, which maps a branch-and-bound problem onto a master-worker structure (see the sketch below)
  • Condor-G delivered 3.46E8 CPU-seconds in 7 days (peak of 1009 processors), using parallel computers, workstations, and clusters
  • MetaNEOS: Argonne, Northwestern, Wisconsin
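The master-worker structure that MW provides can be illustrated with a toy, purely sequential branch-and-bound loop: a "master" holds the pool of subproblems and the incumbent, and hands one subproblem at a time to a worker function. This is a sketch of the pattern only; the instance is a tiny 0/1 knapsack, not the nug30 QAP, and nothing here uses Condor or MW itself.

```python
# Toy master-worker branch and bound in the spirit of the "MW" framework
# described on the slide (a sequential sketch, not Condor/MW itself).
import heapq

values  = [60, 100, 120]      # tiny 0/1 knapsack instance (illustrative data)
weights = [10, 20, 30]
capacity = 50

def bound(level, value, weight):
    """Optimistic bound: add remaining items fractionally."""
    b, w = value, weight
    for v, wt in zip(values[level:], weights[level:]):
        if w + wt <= capacity:
            b, w = b + v, w + wt
        else:
            return b + v * (capacity - w) / wt
    return b

def worker(task, best):
    """Expand one subproblem; return (new incumbents, child tasks)."""
    level, value, weight = task
    if level == len(values):
        return [value], []
    incumbents, children = [], []
    for take in (1, 0):
        v = value + take * values[level]
        w = weight + take * weights[level]
        if w <= capacity and bound(level + 1, v, w) > best:
            children.append((level + 1, v, w))
            if v > best:
                incumbents.append(v)
    return incumbents, children

# "Master": keeps the task pool and the incumbent, hands tasks to workers.
best, pool = 0, [(0, 0, 0)]
while pool:
    task = heapq.heappop(pool)
    incumbents, children = worker(task, best)
    best = max([best] + incumbents)
    for c in children:
        heapq.heappush(pool, c)

print("optimum value:", best)    # 220 for this instance
```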

  8. Emerging Data Grid User Communities
  • NSF Network for Earthquake Engineering Simulation Grid (NEES): integrated instrumentation, collaboration, simulation
  • Grid Physics Network (GriPhyN): ATLAS, CMS, LIGO, SDSS
  • Particle Physics Data Grid (PPDG)
  • EU Data Grid
  • Access Grid; VRVS: supporting group-based collaboration
  • And:
    • The Human Genome Project
    • The Earth System Grid and EOSDIS
    • Federating Brain Data
    • Computed Microtomography
    • The Virtual Observatory (US + Int'l)

  9. The Particle Physics Data Grid (PPDG)
  • Collaborators: ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U. Wisc/CS
  • First Round Goal: optimized cached read access to 10-100 GBytes drawn from a total data set of 0.1 to ~1 Petabyte
  • Matchmaking and resource co-scheduling: SRB, Condor, HRM, Globus
  • Site-to-Site Data Replication Service at 100 MBytes/sec between a primary site (data acquisition, CPU, disk, tape robot) and a regional site (CPU, disk, tape robot)
  • Multi-Site Cached File Access Service linking universities (CPU, disk, users), regional sites (tape, CPU, disk, robot), and the primary site (DAQ, tape, CPU, disk, robot); see the sketch below
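A rough sketch of the cached file access idea: serve a logical file from the local cache on a hit, otherwise pull it from the best-connected replica site. The site names, mount points, and link speeds are hypothetical placeholders, not PPDG's actual services.

```python
# Illustrative sketch of the "multi-site cached file access" idea on the
# slide: read from a local cache if possible, otherwise pull the file from
# the best-connected replica site. (Names and numbers are hypothetical.)
import os, shutil, time

REPLICA_SITES = {             # site -> (mount point, nominal link in MB/s)
    "primary":  ("/grid/primary", 100),
    "regional": ("/grid/regional", 30),
}
LOCAL_CACHE = "/grid/cache"

def cached_read(lfn: str) -> str:
    """Return a local path for logical file name `lfn`, fetching on a miss."""
    os.makedirs(LOCAL_CACHE, exist_ok=True)
    local = os.path.join(LOCAL_CACHE, lfn)
    if os.path.exists(local):
        return local                       # cache hit
    # cache miss: prefer the site with the fastest nominal link
    site, (mount, mbps) = max(REPLICA_SITES.items(), key=lambda kv: kv[1][1])
    t0 = time.time()
    shutil.copy(os.path.join(mount, lfn), local)
    print(f"fetched {lfn} from {site} in {time.time() - t0:.1f} s "
          f"(nominal {mbps} MB/s)")
    return local
```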

  10. GriPhyN: PetaScale Virtual Data Grids
  • Build the foundation for petascale Virtual Data Grids
  • (Architecture diagram) Production teams, individual investigators, and workgroups use interactive user tools; these drive request planning & scheduling tools, request execution & management tools, and virtual data tools; underneath sit resource management, security and policy, and other Grid services; transforms connect the raw data source to distributed resources (code, storage, computers, and networks)

  11. EU-Grid Project Work Packages

  12. Grid Tools for CMS "HLT" Production: A. Samar, M. Hafeez (Caltech), with CERN and FNAL
  • Distributed job execution and data handling goals: transparency, performance, security, fault tolerance, automation
  • Jobs are executed locally or remotely (e.g. submitted to Site A or Site B)
  • Data is always written locally by the job
  • The data is then replicated to remote sites (e.g. Site C); see the sketch below
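The job/data flow on the slide can be sketched as: run the job at the chosen site, let it write its output locally, then replicate that output to every other site. The site paths and the example "cmsim" command are hypothetical; this is not the actual CMS production tooling.

```python
# Sketch of the job/data flow on the slide: run locally, write locally,
# then replicate the output to the other sites. (Purely illustrative.)
import pathlib, shutil, subprocess

SITES = {"siteA": pathlib.Path("/data/siteA"),
         "siteB": pathlib.Path("/data/siteB"),
         "siteC": pathlib.Path("/data/siteC")}

def run_job(local_site: str, job_cmd: list[str], output_name: str) -> None:
    local_dir = SITES[local_site]
    local_out = local_dir / output_name
    # 1. execute the job at the chosen site; it writes its output locally
    subprocess.run(job_cmd + ["--output", str(local_out)], check=True)
    # 2. replicate the locally written data to every remote site
    for site, remote_dir in SITES.items():
        if site != local_site:
            shutil.copy2(local_out, remote_dir / output_name)
            print(f"replicated {output_name}: {local_site} -> {site}")

# e.g. run_job("siteA", ["cmsim", "--events", "1000"], "hlt_sample.db")
# ("cmsim" is a stand-in for whatever production executable is used)
```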

  13. GRIDs in 2000: Summary
  • Grids are changing the way we do science and engineering
  • Key services and concepts have been identified, and development has started
  • Major IT challenges remain
  • Opportunities for collaboration
  • Transition of services and applications to production use is starting to occur
  • In the future, more sophisticated integrated services and toolsets could drive advances in many fields of science and engineering
  • High Energy Physics, facing the need for petascale virtual data, is an early adopter and a leading Data Grid developer

  14. The GRID Book
  • Book published by Morgan Kaufmann: www.mkp.com/grids
  • Globus: www.globus.org
  • Grid Forum: www.gridforum.org

  15. French GRID Initiative Partners
  • Computing centres:
    • IDRIS, the CNRS high-performance computing centre
    • IN2P3 Computing Centre
    • CINES, the higher-education intensive computing centre
    • CRIHAN, the regional computing centre in Rouen
  • Network departments:
    • UREC, the CNRS network department
    • GIP Renater
  • Computer science CNRS & INRIA labs:
    • Université Joseph Fourier
    • ID-IMAG
    • LAAS
    • RESAM
    • LIP and PSMN (Ecole Normale Supérieure de Lyon)
  • Industry:
    • Société Communication et Systèmes
    • EDF R&D department
  • Applications development teams (HEP, bioinformatics, Earth observation):
    • IN2P3, CEA, Observatoire de Grenoble, Laboratoire de Biométrie, Institut Pierre Simon Laplace

  16. LHC Tier 2 Center in 2001 (diagram): a Gigabit Ethernet switch feeding Fast Ethernet switches for the farm nodes; a data server with RAID and Fibre Channel disk plus a DLT tape drive; a router with OC-3 and OC-12 wide-area links; VRVS MPEG2 video-conferencing

  17. ESG Prototype Inter-communication Diagram: GSI-wuftpd / GSI-pftpd servers and GSI-ncftp clients with local disk at ANL, ISI, LLNL (PCMDI), LBNL, NCAR and SDSC; an LDAP-based Replica Catalog and a Request Manager (reached via the LDAP C API or script, and CORBA); a GIS with NWS; HPSS mass storage, HRM, and disk on Clipper

  18. GriPhyN Scope
  • Several scientific disciplines:
    • US-CMS: high energy physics
    • US-ATLAS: high energy physics
    • LIGO: gravity wave experiment
    • SDSS: Sloan Digital Sky Survey
  • Requesting $70M from NSF to build Grids
    • 4 Grid implementations, one per experiment
    • Tier2 hardware, networking, people, R&D
  • Common problems across the different implementations
  • Partnership with CS professionals, IT, industry
  • R&D from the NSF ITR Program ($12M)

  19. Data Grids: Better Global Resource Use and Faster Turnaround
  • Efficient resource use and improved responsiveness through:
    • Treatment of the ensemble of site and network resources as an integrated (loosely coupled) system
    • Resource discovery and prioritization
    • Data caching, query estimation, co-scheduling, transaction management (see the sketch below)
    • Network and site "instrumentation": performance tracking, monitoring, problem trapping and handling
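One way to read "treating sites and networks as an integrated system" is that the scheduler estimates end-to-end turnaround, including data movement, before placing work. The sketch below does exactly that; the site names, queue waits, dataset size, and link speeds are made up for illustration.

```python
# Illustrative sketch of treating sites plus networks as one loosely coupled
# system: pick where to run by estimating turnaround from cached data,
# queue length, and link speed. (Numbers and site names are hypothetical.)

SITES = [
    # name, has_cached_copy, queue_wait_s, link_mbps (to where the data is)
    ("cern",    True,  1800, None),
    ("fnal",    False,  300, 622),
    ("caltech", False,   60, 155),
]

DATASET_GB = 50

def turnaround(has_copy, queue_wait_s, link_mbps):
    # transfer time is zero when a cached replica is already on site
    transfer_s = 0 if has_copy else DATASET_GB * 8e3 / link_mbps
    return queue_wait_s + transfer_s

best = min(SITES, key=lambda s: turnaround(*s[1:]))
for name, *rest in SITES:
    print(f"{name:8s} estimated turnaround {turnaround(*rest):7.0f} s")
print("schedule job at:", best[0])
```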

  20. Emerging Production Grids
  • NSF National Technology Grid
  • NASA Information Power Grid

  21. EU HEP Data Grid Project

  22. Grid (IT) Issues to be Addressed
  • Data caching and mirroring strategies
  • Object collection extract/export/transport/import for large or highly distributed data transactions
  • Query estimators and query monitors (cf. ATLAS/GC work)
    • Enable flexible, resilient prioritisation schemes
    • Query redirection, priority alteration, fragmentation, etc.
  • Pre-emptive and realtime data/resource matchmaking (see the sketch below)
  • Resource discovery
  • Co-scheduling and queueing
  • State, workflow, and performance-monitoring instrumentation; tracking and forward prediction
  • Security: authentication (for resource allocation/usage and priority); running an international certificate authority
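The matchmaking bullet can be illustrated with a toy attribute-matching step: compare a job's requirements against advertised resource attributes and rank the matches. This mimics the spirit of ClassAd-style matchmaking but is not Condor's actual ClassAd language or API; all names and numbers are invented.

```python
# Toy attribute matchmaking in the spirit of the "data/resource matchmaking"
# bullet above (illustrative only).

resources = [
    {"name": "tier2-caltech", "cpus": 64,  "free_disk_gb": 2000, "site": "us"},
    {"name": "tier1-fnal",    "cpus": 512, "free_disk_gb":  500, "site": "us"},
]

job = {"needs_cpus": 32, "needs_disk_gb": 1000,
       "rank": lambda r: r["cpus"]}        # prefer more CPUs among matches

def matches(job, r):
    """A resource matches when it satisfies the job's requirements."""
    return (r["cpus"] >= job["needs_cpus"]
            and r["free_disk_gb"] >= job["needs_disk_gb"])

candidates = [r for r in resources if matches(job, r)]
best = max(candidates, key=job["rank"]) if candidates else None
print("matched resource:", best["name"] if best else "none")
```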

  23. Why Now?
  • The Internet as infrastructure
    • Increasing bandwidth, advanced services; a need to explore higher throughput
  • Advances in storage capacity
    • A Terabyte for ~$40k (or ~$10k)
  • Increased availability of compute resources
    • Dense (Web) server clusters, supercomputers, etc.
  • Advances in application concepts
    • Simulation-based design, advanced scientific instruments, collaborative engineering, ...

  24. PPDG Work at Caltech and SLAC
  • Work on the NTON connections between Caltech and SLAC
    • Tests with 8 OC-3 adapters on the Caltech Exemplar, multiplexed across to a SLAC Cisco GSR router; throughput was limited by the small MTU in the GSR
    • A Dell dual Pentium III server with two OC-12 (622 Mbps) ATM cards, configured to allow aggregate transfers of more than 100 MBytes/sec in both directions between Caltech and SLAC
    • So far, 40 MBytes/sec has been reached on one OC-12 (see the arithmetic below)
  • Monitoring tools installed at Caltech/CACR
    • PingER installed to monitor WAN HEP connectivity
    • A Surveyor device will be installed soon for very precise measurement of network traffic speeds
  • Investigations into a distributed resource management architecture that co-manages processors and data
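The figures on the slide can be cross-checked with some quick arithmetic. This sketch ignores SONET/ATM framing overhead, so the ceilings it prints are slightly optimistic.

```python
# Quick arithmetic behind the figures on the slide (illustrative check;
# protocol overhead is ignored).

OC3_MBPS  = 155.52          # SONET OC-3 line rate
OC12_MBPS = 622.08          # SONET OC-12 line rate

def mbytes_per_sec(mbps):   # line rate in Mbit/s -> MByte/s
    return mbps / 8

# Two OC-12 ATM cards in the Dell server, one direction
print(f"2 x OC-12 one-way ceiling: ~{2 * mbytes_per_sec(OC12_MBPS):.0f} MB/s")

# Achieved so far: 40 MB/s on a single OC-12
achieved = 40
print(f"40 MB/s is ~{100 * achieved / mbytes_per_sec(OC12_MBPS):.0f}% "
      "of one OC-12's ceiling")

# The Exemplar-to-GSR test multiplexed 8 OC-3 adapters
print(f"8 x OC-3 ceiling: ~{8 * mbytes_per_sec(OC3_MBPS):.0f} MB/s")
```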

  25. Participants
  • Main partners: CERN, INFN (I), CNRS (F), PPARC (UK), NIKHEF (NL), ESA Earth Observation
  • Other sciences: Earth observation, biology, medicine
  • Industrial participation: CS SI (F), DataMat (I), IBM (UK)
  • Associated partners: Czech Republic, Finland, Germany, Hungary, Spain, Sweden (mostly computer scientists)
  • Work with the US: underway; formal collaboration being established
  • Industry and Research Project Forum with representatives from: Denmark, Greece, Israel, Japan, Norway, Poland, Portugal, Russia, Switzerland

  26. GriPhyN: First Production-Scale "Grid Physics Network"
  • Develop a new form of integrated distributed system, while meeting the primary goals of the LIGO, SDSS and LHC scientific programs
  • Focus on Tier2 centers at universities, in a unified hierarchical Grid of five levels
  • 18 centers, with four sub-implementations: 5 each in the US for LIGO, CMS, ATLAS; 3 for SDSS
  • Near-term focus on LIGO and SDSS handling of real data; LHC "Data Challenges" with simulated data
  • Cooperation with PPDG, MONARC and the EU Grid Project
  • http://www.phys.ufl.edu/~avery/GriPhyN/

  27. GriPhyN: Petascale Virtual Data Grids
  • An effective collaboration between physicists, astronomers, and computer scientists
  • Virtual Data: a hierarchy of compact data forms, user collections and remote data transformations is essential, even with future Gbps networks (see the sketch below)
  • Coordination among multiple sites is required
  • Coherent strategies are needed for data location, transport, caching and replication, structuring, and resource co-scheduling for efficient access
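A minimal sketch of the virtual data idea as read from this slide: a derived product is returned from cache if it has already been materialized or replicated, and otherwise re-derived on demand from its recorded transformation. The catalog contents and product names are hypothetical.

```python
# Minimal sketch of "virtual data": a derived data product is either fetched
# from a cache/replica or re-derived on demand from its recorded
# transformation. (Catalog contents are hypothetical.)

materialized = {"raw_run42": "events for run 42 (raw)"}   # what exists now

# derivation catalog: product -> (input product, transformation)
derivations = {
    "reco_run42":   ("raw_run42",  lambda raw:  f"reconstructed<{raw}>"),
    "ntuple_run42": ("reco_run42", lambda reco: f"ntuple<{reco}>"),
}

def materialize(product: str) -> str:
    """Return the product, deriving (and caching) it recursively if needed."""
    if product in materialized:              # already cached or replicated
        return materialized[product]
    parent, transform = derivations[product] # otherwise re-derive it
    result = transform(materialize(parent))
    materialized[product] = result           # cache the compact derived form
    return result

print(materialize("ntuple_run42"))
# -> ntuple<reconstructed<events for run 42 (raw)>>
```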

  28. Sloan Digital Sky Survey Data Grid
  • Three main functions:
    • Raw data processing on a Grid (FNAL): rapid turnaround with TBs of data; accessible storage of all image data
    • Fast science analysis environment (JHU): combined data access + analysis of calibrated data; distributed I/O layer and processing layer, shared by the whole collaboration (see the sketch below)
    • Public data access: SDSS data browsing for astronomers and students; a complex query engine for the public
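The "combined data access + analysis" function can be sketched as a server-side selection over a calibrated-object catalog followed by client-side analysis of the much smaller result. The schema, cuts, and numbers below are invented for illustration and do not reflect the real SDSS catalog.

```python
# Illustrative sketch of combined data access + analysis over a toy
# calibrated-object catalog. (Schema and cuts are hypothetical.)

catalog = [
    # (object id, right ascension, declination, r-band magnitude)
    (1, 185.0, 2.1, 17.2),
    (2, 186.4, 2.3, 21.9),
    (3, 184.7, 1.8, 19.5),
]

def query(catalog, ra_range, dec_range, mag_limit):
    """Server-side-style selection: sky region plus magnitude cut."""
    return [obj for obj in catalog
            if ra_range[0] <= obj[1] <= ra_range[1]
            and dec_range[0] <= obj[2] <= dec_range[1]
            and obj[3] <= mag_limit]

# Client-side analysis of the (much smaller) query result
selection = query(catalog, ra_range=(184, 186), dec_range=(1.5, 2.5),
                  mag_limit=20)
mean_mag = sum(obj[3] for obj in selection) / len(selection)
print(f"{len(selection)} objects selected, mean r magnitude {mean_mag:.2f}")
```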
