GriPhyN/PPDG Data Grid Architecture
Ian Foster
Mathematics and Computer Science Division, Argonne National Laboratory
Department of Computer Science, The University of Chicago
Data Grid Architecture
• Goal: Define requirements and principal elements of a Data Grid system
• Working bottom up, identify elements for which we have a solid base of experience and code
• Working top down, identify and prioritize higher-level elements
Data Grid Architecture
[Figure: architecture diagram. Components shown: Application, Planner, Executor, Catalog Services, Info Services, Monitoring, Repl. Mgmt., Policy/Security, Reliable Transfer Service, Compute Resource, Storage Resource; "DAG" appears twice as a label. Components are annotated with tools for which some (perhaps partial) solution exists: MCAT; GriPhyN catalogs; MDS; MDS; GDMP; DAGMAN, Kangaroo; GSI, CAS; Globus; GRAM; GridFTP; GRAM; SRM.]
Storage Resource Architecture
• Storage Resource Manager
• Common protocols
• Storage System (e.g., HPSS, NeST)
  - Architecture, includes: RAID, Cluster, Hierarchical
  - Capabilities, may support: Reservation, Monitoring
Compute & Storage Resources: Protocol View
• Compute resources
  - GRAM for reliable remote access
  - GridFTP for data stage in/out
  - MDS for discovery and monitoring
• Storage resources
  - GridFTP for data access & transfer
  - MDS for discovery and monitoring
  - GRAM for remote management
Protocols: Observations
• GridFTP
  - FTP + extensions for performance, flexibility
• GRAM-1 and GRAM-2
  - GRAM-1: remote access to computers, with UW extensions for enhanced reliability
  - GRAM-2: reservation, management of many different resources
• MDS-2
  - Scalable, secure discovery and monitoring
• Common security infrastructure
Reliable Transfer Services
• Requirements
  - Secure, reliable, high-performance transfer
  - User-specified error recovery strategies
  - Address hard and soft (performance) errors
• Status
  - GridFTP with pluggable monitoring and error recovery defined
  - Globus reliable data transfer service specified and alpha version available
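The requirement for user-specified recovery from hard and soft errors can be sketched as a retry loop driven by a policy object. This is a minimal illustration, not the Globus reliable transfer service or the GridFTP API: `HardError`, `SoftError`, `RecoveryPolicy`, `reliable_transfer`, `fake_transfer`, and the site names are all hypothetical.

```python
import time
from dataclasses import dataclass


class HardError(Exception):
    """Hypothetical: the transfer failed outright (e.g. connection refused)."""


class SoftError(Exception):
    """Hypothetical: the transfer works but is below the required performance."""


@dataclass
class RecoveryPolicy:
    """User-specified recovery strategy (all fields are assumptions)."""
    max_attempts: int = 3
    backoff_seconds: float = 0.0
    switch_source_on_soft_error: bool = True


def reliable_transfer(sources, transfer, policy):
    """Call transfer(source) until it succeeds or attempts run out.

    Returns (source_that_succeeded, log), where log records every attempt.
    """
    log = []
    index = 0
    for attempt in range(1, policy.max_attempts + 1):
        source = sources[index % len(sources)]
        try:
            transfer(source)
            log.append((attempt, source, "ok"))
            return source, log
        except SoftError:
            log.append((attempt, source, "soft"))
            # Soft error: only move on if the user's policy asks for it.
            if policy.switch_source_on_soft_error:
                index += 1
        except HardError:
            log.append((attempt, source, "hard"))
            # Hard error: always try the next replica.
            index += 1
        time.sleep(policy.backoff_seconds)
    raise RuntimeError(f"transfer failed after {policy.max_attempts} attempts")


def fake_transfer(source):
    """Stand-in for a real transfer: siteA is down, siteB is slow, siteC works."""
    if source == "siteA":
        raise HardError("connection refused")
    if source == "siteB":
        raise SoftError("throughput below threshold")


source, log = reliable_transfer(
    ["siteA", "siteB", "siteC"], fake_transfer, RecoveryPolicy()
)
```

Keeping the strategy in a separate policy object is one way to let users change recovery behavior without touching the transfer loop itself.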
Executor
• Requirements
  - Reliable management of the execution of a set of computations and data movements
• Current status
  - UW DAGMan, which executes a supplied directed acyclic graph (DAG)
• Future directions
  - Error handling and recovery
  - Ability to express alternative strategies
  - Beyond DAGs
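Executing a supplied DAG means running each job only after the jobs it depends on have finished. The sketch below shows that idea with Python's standard `graphlib` module; it is not DAGMan's input format or behavior, and `run_dag` and the job names are hypothetical.

```python
from graphlib import TopologicalSorter


def run_dag(dependencies, actions):
    """Run every job after all of its prerequisites.

    dependencies maps each job to the set of jobs that must finish first.
    actions maps each job to a zero-argument callable.
    TopologicalSorter raises graphlib.CycleError if the graph is not acyclic.
    """
    completed = []
    for job in TopologicalSorter(dependencies).static_order():
        actions[job]()
        completed.append(job)
    return completed


# Hypothetical workflow: stage data in, run two simulations, merge, stage out.
dependencies = {
    "stage_in": set(),
    "simulate_a": {"stage_in"},
    "simulate_b": {"stage_in"},
    "merge": {"simulate_a", "simulate_b"},
    "stage_out": {"merge"},
}

trace = []
actions = {job: (lambda job=job: trace.append(job)) for job in dependencies}
order = run_dag(dependencies, actions)
```

This sequential version omits the items listed under future directions: a failed job simply raises, and there is no way to express an alternative strategy.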
Planner
• Requirement
  - Develop a plan to satisfy a user request
• Current status
  - Various application-specific prototypes, e.g., for LIGO and CMS
• Future plans
  - Query estimation
  - Virtual data materialization
  - Etc.
Catalog Services Architecture
• Transparency with respect to location
  - Metadata catalog: attributes -> object name
  - Grid Container Management System: object name -> logical file
  - Replica catalog: logical name -> physical name
• Transparency with respect to materialization
  - Transformation catalog: software details
  - Derived metadata catalog: attributes -> derived object name
  - Derived data catalog: object name -> transformation
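Location transparency amounts to chaining the three mappings listed above: attributes to object name, object name to logical file, and logical name to physical names. The sketch below models each catalog as a plain dictionary; the catalog contents, the `resolve` function, and the example URLs are hypothetical and do not reflect the MCAT or GriPhyN catalog schemas.

```python
# Metadata catalog: attributes -> object name
METADATA_CATALOG = {
    frozenset({("experiment", "CMS"), ("run", 42)}): "cms-run42-events",
}

# Grid Container Management System: object name -> logical file
CONTAINER_CATALOG = {
    "cms-run42-events": "lfn:cms/run42.dat",
}

# Replica catalog: logical name -> physical names (one per replica)
REPLICA_CATALOG = {
    "lfn:cms/run42.dat": [
        "gsiftp://storage-a.example.org/data/run42.dat",
        "gsiftp://storage-b.example.org/cache/run42.dat",
    ],
}


def resolve(attributes):
    """Follow the catalog chain from descriptive attributes to replicas."""
    object_name = METADATA_CATALOG[frozenset(attributes.items())]
    logical_file = CONTAINER_CATALOG[object_name]
    return REPLICA_CATALOG[logical_file]


replicas = resolve({"experiment": "CMS", "run": 42})
```

The caller names data only by its attributes; which site holds a copy is decided at the last step, so replicas can move without changing the request.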
Security and Policy
• Requirements
  - Conveniently express and implement community policies for resource access
• Current solutions
  - Grid Security Infrastructure: PKI-based, single sign-on, authentication, authorization
• Soon to be available
  - Community Authorization Service
Community Authorization (Prototype shown August 2001)
Laura Pearlman, Steve Tuecke, Von Welch, others
1. CAS request, with resource names and operations (User to CAS)
   - CAS holds: user/group membership, resource/collective membership, collective policy information
   - CAS checks: Does the collective policy authorize this request for this user?
2. CAS reply, with capability and resource CA info (CAS to User)
3. Resource request, authenticated with capability (User to Resource)
   - Resource holds: local policy information
   - Resource checks: Is this request authorized for the CAS? Is this request authorized by the capability?
4. Resource reply (Resource to User)
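The four-step exchange can be sketched in a few lines. Everything here is an assumption made for illustration: the capability is a JSON body signed with an HMAC under a shared demo key, which stands in for the PKI-based credentials that GSI and the actual CAS prototype use, and the policy tables, names, and functions are hypothetical.

```python
import hashlib
import hmac
import json

# Stand-in for the CAS credential. Assumption: a shared key replaces the
# public-key signatures that a real deployment would verify.
CAS_KEY = b"demo-cas-key"

# Collective policy held by the CAS: (user, resource, operation)
COMMUNITY_POLICY = {("alice", "storage1", "read")}

# Local policy held by the resource: what the CAS as a whole may do there
LOCAL_POLICY = {("CAS", "storage1", "read"), ("CAS", "storage1", "write")}


def sign(body):
    return hmac.new(CAS_KEY, body.encode(), hashlib.sha256).hexdigest()


def cas_request(user, resource, operation):
    """Steps 1-2: return a capability if the collective policy allows it."""
    if (user, resource, operation) not in COMMUNITY_POLICY:
        return None
    body = json.dumps(
        {"user": user, "resource": resource, "operation": operation},
        sort_keys=True,
    )
    return {"body": body, "signature": sign(body)}


def resource_request(capability, resource, operation):
    """Steps 3-4: the resource checks the capability and its local policy."""
    if capability is None:
        return False
    # Was the capability really issued by the CAS?
    if not hmac.compare_digest(sign(capability["body"]), capability["signature"]):
        return False
    # Is this request authorized for the CAS?
    if ("CAS", resource, operation) not in LOCAL_POLICY:
        return False
    # Is this request authorized by the capability?
    granted = json.loads(capability["body"])
    return granted["resource"] == resource and granted["operation"] == operation


capability = cas_request("alice", "storage1", "read")
allowed = resource_request(capability, "storage1", "read")
```

The resource keeps the final say: a request succeeds only when both the local policy for the CAS and the capability issued to the user permit it.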
MDS-2 Information Service Architecture
• Multiple "sensors": resource existence, status, location, etc.
• Each sensor speaks two protocols, either natively or via gateways
  - Registration: notify others of existence
  - Enquiry: respond to requests for state
• Can build many services on this basis
  - Resource discovery & characterization
  - System monitoring: which resources are up?
• To do: expand set of "sensors"
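The split between registration and enquiry can be sketched with two small classes: a sensor announces its existence to a directory, and the directory later asks registered sensors for their state to answer discovery queries. `Sensor`, `Directory`, and the resource names are hypothetical; this does not model the actual MDS-2 wire protocols.

```python
class Sensor:
    """Hypothetical information source describing one resource."""

    def __init__(self, name, read_state):
        self.name = name
        self._read_state = read_state

    def register(self, directory):
        # Registration protocol: notify others of existence.
        directory.accept_registration(self)

    def enquire(self):
        # Enquiry protocol: respond to a request for current state.
        return self._read_state()


class Directory:
    """Hypothetical aggregate service built on the two protocols."""

    def __init__(self):
        self._sensors = {}

    def accept_registration(self, sensor):
        self._sensors[sensor.name] = sensor

    def discover(self, predicate):
        """Return names of registered resources whose state matches."""
        return sorted(
            name
            for name, sensor in self._sensors.items()
            if predicate(sensor.enquire())
        )


directory = Directory()
Sensor("cluster-a", lambda: {"status": "up", "cpus": 64}).register(directory)
Sensor("cluster-b", lambda: {"status": "down", "cpus": 128}).register(directory)
Sensor("tape-store", lambda: {"status": "up", "cpus": 0}).register(directory)

# System monitoring: which resources are up?
up = directory.discover(lambda state: state["status"] == "up")
```

Because the directory only depends on the two calls, a new kind of sensor can be added without changing the services built on top.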
Questions and Next Steps
• Many open issues remain, e.g.
  - Grid-enabled storage: requirements?
  - Catalog architecture: adequacy? Naming?
  - Request planning, execution: requirements?
  - Replica management: requirements?
  - Priorities for additions to current toolkit
• Next steps
  - Produce DGRA v2 document
  - Package software based on specification
  - Evaluate and refine software and document