Diligent – EGEE JRA1 Meeting, 2004 December 16
EGEE Data Management
Peter Kunszt
Contents
• Component Overview
• gLite Catalogs
• Overview
• Concepts
• Implementations
• Distribution
• gLite Transfer Management
• Scheduling model
• Implementation
• Deployment models
• Distribution mechanisms
• Discussion
Dec. 16 Diligent – JRA1 Workshop Peter Kunszt
Guiding Principles
• Service Oriented Architecture
• Web Services
• Interoperability
• Portability
• Modularity
• Scalability
• Building on existing components in a lightweight manner: AliEn, LCG, Condor, Globus, SRM, ...
Data Management Tasks
• File Management
  • Storage
  • Access
  • Placement
  • Cataloguing
  • Security
• Metadata Management
  • Secure database access
  • Schema management
  • File-based metadata
  • Generic metadata
Product Overview
• File Storage
  • Storage Elements with SRM (Storage Resource Manager) interface
  • POSIX I/O interface through glite-io
  • Supports transfer protocols (bbftp, https, ftp, gsiftp, rfio, dcap, …)
• Catalogs
  • File and Replica Catalog
  • File Authorization Service
  • Metadata Catalog
  • Distribution of catalogs, conflict resolution (messaging)
• Transfer
  • Top-level Data Scheduler as global entry point (there may be many)
  • Site File Placement Service managing transfers and catalog interactions
  • Site File Transfer Service managing incoming transfers (the network resource)
File Movement and Management
• Data scheduling and high-level optimization
• Job-like data transfers (queuing, ordering, etc.)
• Possibility to use reliable managed file transfer
• Site self-consistency (locality of reference)
• SRM-based managed storage (permanent and volatile)
File Movement and Management • Internals
Metadata Catalog Contents
[Diagram labels: a Logical File Name (unique, user-defined, mutable) and a Global Unique IDentifier (unique, system-defined, immutable, UUID), shown together with SymLinks and Storage URLs]
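The naming scheme on this slide can be illustrated with a small in-memory model. This is only a sketch under assumptions: `CatalogEntry`, `Catalog`, and all method names are hypothetical and are not the Fireman interface. It shows the one property the slide states explicitly: the Logical File Name is user-defined and mutable, while the GUID is system-defined and immutable.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical model of one catalogued file (not the gLite API)."""
    lfn: str                                       # user-defined, mutable
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))  # system-defined, immutable
    surls: list = field(default_factory=list)      # Storage URLs recorded for the file
    symlinks: list = field(default_factory=list)   # SymLinks recorded for the file


class Catalog:
    """Keeps two indices so a file can be found by LFN or by GUID."""

    def __init__(self):
        self._by_lfn = {}
        self._by_guid = {}

    def create(self, lfn):
        if lfn in self._by_lfn:
            raise ValueError(f"LFN already exists: {lfn}")
        entry = CatalogEntry(lfn)
        self._by_lfn[lfn] = entry
        self._by_guid[entry.guid] = entry
        return entry

    def rename(self, old_lfn, new_lfn):
        # The LFN changes; the GUID index is left untouched.
        if new_lfn in self._by_lfn:
            raise ValueError(f"LFN already exists: {new_lfn}")
        entry = self._by_lfn.pop(old_lfn)
        entry.lfn = new_lfn
        self._by_lfn[new_lfn] = entry
        return entry

    def add_replica(self, lfn, surl):
        self._by_lfn[lfn].surls.append(surl)

    def lookup_guid(self, guid):
        return self._by_guid[guid]
```

Renaming a file through `rename` leaves `lookup_guid` working with the original identifier, which is the reason for keeping a second, immutable name.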
Concepts
• Directories
• Symlinks
• Authorization: ACL and base (unix) permissions
• File metadata (size, ctime, mtime, checksum, status, type)
• File-based metadata (key-value pairs on files), the schema is associated per directory
• Extensible metadata including schema manipulation
• Maybe virtual directories (cached metadata queries) in the future
Interface Design
[Diagram of the catalog interfaces. Labels: End-user Interface; Base Interfaces, Service Interfaces, Feature Interfaces. Interfaces shown: FAS, FiReMan, MetadataCatalog, FileCatalog, ReplicaCatalog, MetadataSchema, FASBase, MetadataBase, ServiceBase, SEIndex]
Metadata Capabilities
• Metadata directly in the File Catalog
  • Like POSIX file metadata: key-value pairs stored.
  • Metadata Schema (description of key-value pairs) may be different for each directory, but all files in the same directory share the same keys
  • Limited query and search capabilities to single directory or single schema: the hierarchy has to restrict the query (we don’t allow a global find-like operation on metadata)
• Unconstrained Metadata
  • Any schema possible
  • Schema manipulation interface available
  • Generic query interface (just pass in a query string)
• Application-specific Metadata
  • On top of any of these two gLite specifications, applications can build their own metadata interface
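The first option (metadata directly in the File Catalog) can be sketched as follows. The class and method names are hypothetical and chosen for illustration only; they are not the MetadataCatalog or MetadataSchema interfaces. The sketch assumes one schema per directory and queries that never leave that directory, as described above.

```python
class DirectoryMetadata:
    """Hypothetical model: every file in a directory shares one set of keys."""

    def __init__(self, schema):
        self.schema = set(schema)   # allowed keys for this directory
        self.files = {}             # file name -> {key: value}

    def set_attrs(self, name, **attrs):
        unknown = set(attrs) - self.schema
        if unknown:
            raise KeyError(f"keys not in directory schema: {sorted(unknown)}")
        self.files.setdefault(name, {}).update(attrs)

    def query(self, **conditions):
        # The search is scoped to this directory only; there is no global find.
        return sorted(
            name
            for name, attrs in self.files.items()
            if all(attrs.get(k) == v for k, v in conditions.items())
        )


runs = DirectoryMetadata(schema=["detector", "year"])
runs.set_attrs("run1.dat", detector="A", year=2004)
runs.set_attrs("run2.dat", detector="B", year=2004)
matches = runs.query(year=2004, detector="A")
```

Restricting each query to one directory object is what keeps the lookup cheap; a global search would have to visit every schema in the hierarchy.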
gLite Catalog Implementations
• Fireman Interface
  • Oracle 9i implementation
  • MySQL implementation
• MetadataCatalog Interface
  • MySQL implementation
  • Oracle 9i implementation
• MetadataSchema Interface
  • MySQL implementation
  • Oracle 9i implementation
• Apply interfaces to existing implementations
  • Will have a Fireman interface also over the AliEn FC
  • Fireman interface over the LCG FC
  • MetadataCatalog and MetadataSchema over existing application catalogs
  • …
Legend: DONE / In progress or planning
Catalog Deployment Models
• Single central catalog (AliEn, LCG-2 model)
  • All operations go there
• Local catalogs with a central component
  • Update operation only on local catalogs
  • Update operation on both local and central catalogs
• Local catalogs, no central component – only indices for certain queries
Distribution Mechanism 1
• Data Scheduler (global and local schedulers)
  • Global scheduler (VO-specific) takes requests like
    • Copy set of files from A to B
    • Make set of files available at C
    • Upload files from GSIFTP server to D
    • Delete files
    • Maybe also metadata operations
  • Local scheduler fetches tasks from known global schedulers
    • Coupled tightly to a local transfer service
    • Manage transfer where the local site is a target
    • Assure atomicity of transfer and catalog operations
• Transfer Service
  • Queue data transfers to/from a given Storage Element (SRM)
  • Receives jobs from local scheduler
  • Manages transfers through a set of states
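The pull model on this slide can be sketched roughly as below. Everything here is an assumption made for illustration: the state names are invented (the slide does not list them), and `Transfer`, `run_local_scheduler`, and the `copy` callable are hypothetical stand-ins, not gLite components. The local scheduler picks up only tasks targeting its own site, and the catalog is updated only after a copy succeeds, which is a crude stand-in for the atomicity requirement.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names; the slide only says "a set of states".
    SUBMITTED = "submitted"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"


ALLOWED = {
    State.SUBMITTED: {State.ACTIVE},
    State.ACTIVE: {State.DONE, State.FAILED},
    State.DONE: set(),
    State.FAILED: set(),
}


class Transfer:
    def __init__(self, file, dest_site):
        self.file = file
        self.dest_site = dest_site
        self.state = State.SUBMITTED

    def advance(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state


def run_local_scheduler(global_queue, site, copy, catalog):
    """Fetch tasks whose target is `site`, run them, register replicas on success."""
    mine = [t for t in global_queue if t.dest_site == site]
    for t in mine:
        global_queue.remove(t)          # the task now belongs to this site
        t.advance(State.ACTIVE)
        if copy(t):                     # `copy` returns True on success
            catalog.add((t.file, site)) # catalog changes only after a good copy
            t.advance(State.DONE)
        else:
            t.advance(State.FAILED)
    return mine
```

Tasks for other sites stay in the global queue untouched, so several local schedulers can poll the same global scheduler independently.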
Distribution Mechanism 2
• Certainly possible to just rely on DB replication
• Middleware distribution of updates between catalogs
  • Using a messaging system (JMS using JORAM)
  • Publish updates to message queue locally
  • Subscribe to updates at central catalogs / index nodes
  • Asynchronous messaging queues take care of update delivery
  • Scales well to the number of sites we deal with
  • However, error messages have to be queued for retrieval as well
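The publish/subscribe flow can be illustrated with an in-process stand-in. This sketch uses Python's standard `queue.Queue` in place of a JMS queue; it does not use JMS or JORAM, and `LocalCatalog`, `CentralIndex`, and the rejection rule are hypothetical. It shows local catalogs publishing updates without waiting, a central index consuming them later, and errors travelling back through a second queue.

```python
import queue


class LocalCatalog:
    """Hypothetical site catalog that publishes every update it makes."""

    def __init__(self, site, updates):
        self.site = site
        self.entries = {}
        self.updates = updates        # shared outgoing queue (stand-in for JMS)
        self.errors = queue.Queue()   # errors are queued for later retrieval

    def register(self, lfn, surl):
        self.entries[lfn] = surl
        self.updates.put((self.site, lfn, surl))  # publish and return at once


class CentralIndex:
    """Hypothetical subscriber recording which sites hold which LFN."""

    def __init__(self):
        self.sites_by_lfn = {}

    def drain(self, updates, error_queues):
        while True:
            try:
                site, lfn, surl = updates.get_nowait()
            except queue.Empty:
                break
            if not lfn.startswith("/"):
                # Illustrative validation rule, not a gLite requirement.
                error_queues[site].put(f"rejected {lfn!r}: not an absolute LFN")
                continue
            self.sites_by_lfn.setdefault(lfn, set()).add(site)
```

Because `register` returns as soon as the update is queued, the publishing site never learns synchronously that an update was rejected, which is why the error path needs its own queue.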
To be understood
• What to distribute and how
  • All of the data? (Replication)
  • Just parts? (Indexing)
  • Read-write mechanisms and updates between many copies (Policies)
• Metadata usage
  • Schema manipulation capabilities – what is really needed
  • Metadata services by experiments may interface with gLite or implement the gLite interfaces themselves
  • Is a set of canned queries good enough? If so, users do not need a generic query interface.
  • Does all of the metadata need to be local? Or will some metadata have to be fetched from remote sites?
  • What kinds of distributed queries are necessary at all?
  • What kind of metadata is for local/laptop usage?
  • What kinds of update semantics are needed, if any? (Single instance, single master, multi master)
Summary
• gLite Data Management provides a complete set of file management middleware including data and catalog distribution
• Many extensible modules based on simple interfaces. Capabilities may easily be extended if needed.
• Actual usage patterns need to be understood in order to set up an efficient deployment scenario.
• Still many difficult open questions which have to be answered individually for each Grid VO. We are looking forward to working with the community to address these issues.