
Virtual Data Grid Architecture

Virtual Data Grid Architecture. Ewa Deelman, Ian Foster, Carl Kesselman, Miron Livny. GriPhyN Summary.


Presentation Transcript


  1. Virtual Data Grid Architecture
     Ewa Deelman, Ian Foster, Carl Kesselman, Miron Livny

  2. GriPhyN Summary
     The GriPhyN research agenda aims at IT advances that will enable groups of scientists distributed worldwide to harness Petascale processing, communication, and data resources to transform raw experimental data into scientific discoveries. The goals of the GriPhyN project are to achieve the fundamental IT advances required to realize Petascale Virtual Data Grids and to demonstrate, evaluate, and transfer these research results via the creation of a Virtual Data Toolkit to be used by the four major physics experiments and other projects.

  3. Major Points
     • Project has two complementary & supporting elements
        • IT research project: will be judged on contributions to knowledge
        • CS/application partnership: will also be judged on successful transfer to experiments
     • Two associated unifying concepts
        • Virtual data as the central intellectual concept
        • Toolkit as a central deliverable and technology transfer vehicle

  4. Virtual Data as a Key Intellectual Challenge and Unifying Concept
     “These characteristics combine to enable the definition and delivery of a potentially unlimited virtual space of data products derived from other data. In this virtual space, requests can be satisfied via direct retrieval of materialized products and/or computation, with local and global resource management, policy, and security constraints determining the strategy used.”

  5. Virtual Data (contd)
     “The concept of virtual data recognizes that all except irreproducible raw experimental data need ‘exist’ physically only as the specification for how they may be derived. The grid may materialize zero, one, or many copies of derivable data depending on probable demand and the relative costs of computation, storage, and transport.”
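
     The trade-off named in this quote (probable demand against the relative costs of computation, storage, and transport) can be illustrated with a toy cost comparison. This is only a sketch under stated assumptions: the `Costs` fields, the `cheaper_to_materialize` function, and the linear cost model are hypothetical and are not taken from the GriPhyN design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical per-product costs, all in the same arbitrary unit."""
    compute: float    # cost to derive the product once from its inputs
    storage: float    # cost to keep one materialized copy for the planning period
    transport: float  # cost to ship the stored copy to one requester


def cheaper_to_materialize(costs: Costs, expected_requests: float) -> bool:
    """Compare two simple strategies for a derivable data product.

    Strategy A: recompute the product for every request.
    Strategy B: compute it once, store one copy, and ship it per request.
    Returns True when strategy B has the lower expected cost.
    """
    recompute_every_time = expected_requests * costs.compute
    store_one_copy = (
        costs.compute + costs.storage + expected_requests * costs.transport
    )
    return store_one_copy < recompute_every_time


if __name__ == "__main__":
    costs = Costs(compute=10.0, storage=5.0, transport=1.0)
    for demand in (1, 10):
        print(demand, cheaper_to_materialize(costs, demand))
```

     With these made-up numbers a single expected request favors recomputation, while ten expected requests favor keeping a stored copy. A real grid would also have to weigh policy, security, and multiple copy locations, which this sketch ignores.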

  6. (Simple) Virtual Data Example
     • (LIGO) “Gravitational strain for 2 minutes around each of 200 gamma-ray bursts over the last year”
     • For each requested data value, need to
        • Determine if it is materialized; if so, where; if not, how to compute it
        • Plan data movements and computations required to obtain all results
        • Execute this plan
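
     The first two steps of this example (find out whether a product is materialized, and otherwise plan how to derive it) can be sketched with two in-memory catalogs. Everything here is an assumption made for illustration: the `Derivation` record, the `plan` function, the catalog layout, and the product and site names are hypothetical, and derivations are assumed to be acyclic. Executing the resulting plan is left out.

```python
from typing import NamedTuple


class Derivation(NamedTuple):
    """Hypothetical recipe: which transformation produces a product, and from what."""
    transformation: str
    inputs: tuple[str, ...]


def plan(product, replicas, derivations, steps=None, seen=None):
    """Return an ordered list of steps that obtains `product`.

    replicas:    logical name -> list of locations holding a materialized copy
    derivations: logical name -> Derivation describing how to compute it
    Each step is ("fetch", name, location) or ("run", transformation, name).
    """
    steps = [] if steps is None else steps
    seen = set() if seen is None else seen
    if product in seen:          # already planned earlier in this request
        return steps
    seen.add(product)

    if replicas.get(product):    # materialized: retrieve an existing copy
        steps.append(("fetch", product, replicas[product][0]))
    elif product in derivations:  # virtual: obtain the inputs, then compute
        recipe = derivations[product]
        for name in recipe.inputs:
            plan(name, replicas, derivations, steps, seen)
        steps.append(("run", recipe.transformation, product))
    else:
        raise LookupError(f"{product!r} is neither materialized nor derivable")
    return steps


if __name__ == "__main__":
    replicas = {"raw_strain": ["site_a"], "burst_list": ["site_b"]}
    derivations = {
        "strain_window": Derivation("extract_window", ("raw_strain", "burst_list")),
    }
    for step in plan("strain_window", replicas, derivations):
        print(step)
```

     The sketch always takes the first listed replica and always prefers retrieval over computation; choosing between the two on cost or policy grounds is exactly the part a real planner would need to decide.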

  7. GriPhyN Goals
     “Explore concept of virtual data and its applicability to data-intensive science,” i.e.,
     • Transparency with respect to location
        • Known concept; but how to realize in a large-scale, performance-oriented Data Grid?
     • Transparency with respect to materialization
        • To determine: is this useful?
     • Automated management of computation
        • Issues of scale, transparency

  8. Primary GriPhyN R&D Components
     [Figure; labels recovered from the diagram:]
     • Users: Production Team, Individual Investigator, Other Users
     • Interactive User Tools
     • Virtual Data Tools; Request Planning and Scheduling Tools; Request Execution Management Tools
     • Performance Estimation and Evaluation
     • Resource Management Services; Security and Policy Services; Other Grid Services
     • Transforms; Raw data source
     • Distributed resources (code, storage, computers, and network)

  9. Data Grid Reference Architecture: Purpose
     • Identify primary components of a Data Grid architecture (part vocabulary, part requirements definition, part strategy)
     • Suggest potential implementation approaches
     • Identify principal areas in which uncertainty exists and hence research is required

  10. Observations on Architecture
      • We need an architecture so that we can
         • Coordinate our own activities
         • Coordinate with other Data Grid projects
         • Explain to others (experiments, NSF, CS community) what we are doing
      • An architecture must:
         • Facilitate CS research activities by simplifying evaluation of alternatives
         • Not preclude experimentation with (radically) alternative approaches

  11. Documents
      • A Data Grid Reference Architecture
      • Representing Virtual Data: A Catalog Architecture for Location and Materialization Transparency
      • Virtual Data Research Challenges
      • Requirements documents from CMS, LIGO, SDSS

  12. Data Grid Reference Architecture
      [Figure; component labels: User Applications, Request Formulation, Virtual Data Catalogs, Request Manager, Request Planner, Request Executor, Storage Systems, Code Repositories, Computers, Networks]

  13. Relationship Between Components
      [Figure; surviving label text: “Virtual Data Data Grids Grids”]

  14. Layered Grid Architecture
      Grid layers, top to bottom:
      • Application
      • User: “Specialized services”: user- or application-specific distributed services
      • Collective: “Managing multiple resources”: ubiquitous infrastructure services
      • Resource: “Sharing single resources”: negotiating access, controlling use
      • Connectivity: “Talking to things”: communication (Internet protocols) & security
      • Fabric: “Controlling things locally”: access to, & control of, resources
      Shown alongside the Internet Protocol Architecture: Application, Transport, Internet, Link

  15. GriPhyN Data Grid Reference Architecture
      • Application: Discipline-Specific Data Grid Application
      • Collective: Catalogs, Replica Management, Request Management, Community Policy, …
      • Resource: Access to data, access to computers, access to network performance data, …
      • Connectivity: Communication, service discovery (DNS), authentication, delegation
      • Fabric: Storage Systems, Compute Systems, Networks, Code Repositories, …

  16. Existing Components
      • Globus Toolkit
         • MDS-2 information service: access to static & dynamic configuration & state information
         • GRAM resource access protocol
         • GridFTP data access and transfer protocol
         • Replica catalog, replica management
         • Grid Security Infrastructure: single sign on
      • Condor, Condor-G resource management
      • SRB catalog services

  17. Globus Data Grid Components
      [Figure; labels recovered from the diagram:]
      • Application; Attribute Specification; Metadata Catalog
      • Logical Collection and Logical File Name; Replica Catalog; Multiple Locations
      • Replica Selection; Selected Replica; Performance Information & Predictions (MDS, NWS)
      • GridFTP commands
      • Replica Location 1, Replica Location 2, Replica Location 3 (Disk Cache, Tape Library, Disk Array, Disk Cache)
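
      The replica selection step on this slide, where one of several replica locations is chosen using performance information and predictions, might look roughly like the sketch below. It is illustrative only: `select_replica`, the location names, and the bandwidth figures are hypothetical, and no Globus, MDS, or NWS interface is being modeled.

```python
def select_replica(locations, predicted_bandwidth, file_size_bytes):
    """Pick the location with the lowest predicted transfer time.

    locations:           candidate replica locations for one logical file
    predicted_bandwidth: location -> predicted bytes per second (hypothetical
                         stand-in for a performance prediction service)
    file_size_bytes:     size of the file to be transferred
    """
    # Ignore locations with no usable prediction.
    usable = [loc for loc in locations if predicted_bandwidth.get(loc, 0) > 0]
    if not usable:
        raise LookupError("no replica location has a usable bandwidth prediction")
    return min(usable, key=lambda loc: file_size_bytes / predicted_bandwidth[loc])


if __name__ == "__main__":
    locations = ["replica_location_1", "replica_location_2", "replica_location_3"]
    predictions = {
        "replica_location_1": 10e6,
        "replica_location_2": 50e6,
        "replica_location_3": 0.0,   # e.g. currently unreachable
    }
    print(select_replica(locations, predictions, file_size_bytes=2e9))
```

      Ranking by predicted transfer time alone is a simplification; storage latency (for example a tape library versus a disk cache) and access policy could be folded into the same key function.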

  18. Catalog Architecture

  19. Short-Term (2001) Developments
      • Deployment of, and experimentation with, basic tools: data movement, data location, computation management
         • Already started in CMS and LIGO
      • Requirements definition for experiments
         • Already started with documents from CMS, LIGO, SDSS
      • Virtual data catalog prototype
      • Prototyping of other elements TBD
      • Work breakdown with EDG, PPDG

  20. Goals for this Meeting
      • Identify major areas in which the Data Grid Reference Architecture needs improvement
      • Identify how each CS research thrust contributes to this refinement process, and on what schedule
         • Research, software, and/or experiments
      • Identify how each application area will contribute to evaluating DGRA ideas
         • Experiments conducted
