
Data Management GridPP and EDG

Data Management GridPP and EDG. Gavin McCance, University of Glasgow, May 9, 2002. http://www.gridpp.ac.uk/datamanagement http://cern.ch/grid-data-management. Overview: status of data management work; products delivered to 1.2; GDMP 3.0; Reptor: replica manager; Spitfire.


Presentation Transcript


  1. Data Management GridPP and EDG • Gavin McCance, University of Glasgow • May 9, 2002 • http://www.gridpp.ac.uk/datamanagement • http://cern.ch/grid-data-management

  2. Overview • Status of data management work • Products delivered to 1.2 • GDMP 3.0 • Reptor: replica manager • Spitfire • Optor: grid simulation • What’s currently available and future plans

  3. WP2: Data Management • Work is done within the EDG WP2 team (based at CERN) • Replication • Replica catalogue • Replica manager • Query Optimisation* • Grid replica optimisation • Meta-data management* • Secure, transparent access to meta-data • Service discovery • * Direct UK involvement

  4. General Status • Deliverables on target • Major software released for 1.2 • UK manpower based at Glasgow: • 2.5 RAs: me, Will Bell, Paul Millar (50%) • 1 PhD student, David Cameron • 1 more student to come in September

  5. File Replication • Requires: replica catalogue or replica location service • Keeps track of the mapping between a logical file name (LFN) and its physical file names • Requires: replica manager or replica management service • High-level tool that actually does the replication and manages which files are being replicated • [Figure: logical file File-1 (LFN) with physical copies of File-1 at Paris, Chicago and Glasgow]
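The LFN-to-physical-file mapping described on this slide can be pictured as a small lookup structure. The following is a minimal sketch only: the class name, method names and the example PFN strings are hypothetical and are not taken from the Globus replica catalogue or any EDG software.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Toy catalogue: one logical file name (LFN) maps to many physical file names (PFNs)."""

    def __init__(self):
        # LFN -> set of PFNs (hypothetical in-memory layout)
        self._replicas = defaultdict(set)

    def register(self, lfn, pfn):
        """Record that a physical copy of the logical file exists at pfn."""
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn, pfn):
        """Forget a physical copy; unknown entries are ignored."""
        self._replicas[lfn].discard(pfn)

    def list_replicas(self, lfn):
        """Return all known physical copies, sorted for stable output."""
        return sorted(self._replicas.get(lfn, ()))


catalogue = ReplicaCatalogue()
# Example PFNs are invented for illustration.
for site in ("paris", "chicago", "glasgow"):
    catalogue.register("File-1", f"gsiftp://se.{site}.example/data/File-1")
```

A replica manager would sit on top of such a catalogue, copying files between storage elements and calling `register` after each successful copy.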

  6. File Replication • Current replication functionality is provided by GDMP 3.0, new for the 1.2 release! • Used for mirroring of storage elements • Implements a subscription-based replication model with security, and updates the Globus replica catalogue

  7. GDMP 3.0 • [Figure: Site A and Site B] • Site ‘B’ subscribes to site A’s files • ‘A’ produces a new file; ‘B’ will be notified of this • ‘B’ then starts the transfer of new files from ‘A’ • Replica catalogue at ‘B’ is updated to reflect the new file replica.
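The four steps on this slide (subscribe, notify, transfer, update the catalogue) can be sketched as follows. This is an illustrative in-memory model, not GDMP code: the `Site` class, its methods and the file name are assumptions, and the real tool adds security and wide-area transfer that are not modelled here.

```python
class Site:
    """Toy model of a storage site taking part in subscription-based replication."""

    def __init__(self, name):
        self.name = name
        self.files = {}        # file name -> content held at this site
        self.subscribers = []  # sites that asked to be notified of new files
        self.catalogue = {}    # file name -> set of site names known to hold a replica

    def subscribe_to(self, producer):
        """Step 1: this site subscribes to the producer's files."""
        producer.subscribers.append(self)

    def unsubscribe_from(self, producer):
        producer.subscribers.remove(self)

    def publish(self, filename, data):
        """Step 2: the producer creates a file and notifies every subscriber."""
        self.files[filename] = data
        self.catalogue.setdefault(filename, set()).add(self.name)
        for subscriber in list(self.subscribers):
            subscriber.notify(self, filename)

    def notify(self, producer, filename):
        """Steps 3 and 4: pull the new file, then record the new replica locally."""
        self.files[filename] = producer.files[filename]
        self.catalogue.setdefault(filename, set()).update({producer.name, self.name})


site_a, site_b = Site("A"), Site("B")
site_b.subscribe_to(site_a)
site_a.publish("run1.dat", b"event data")
```

After `publish`, site B holds its own copy and its catalogue lists both A and B as holders of `run1.dat`.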

  8. GDMP 3.0 • Changes w.r.t. 2.*: • New security model: host certificates • Server delegation, i.e. accounts on SE not necessarily required • Client-only install possible • Basic space management • Stand-alone server option • ‘unsubscribe’ option

  9. GDMP 3.0 status • Final version of GDMP released for 1.2 • In future, GDMP will be absorbed into the Replica Manager Service, which will offer richer functionality • SRPM, RPM, tarball, User Guide, Quick Config for EDG SEs: • http://cmsdoc.cern.ch/cms/grid/

  10. Replica Location Service • Current Globus replica catalogue is LDAP based • To be replaced with the new Replica Location Service built on the ‘GIGGLE’ framework • Joint EDG WP2 / Globus / PPDG project • Trade-offs: global consistency, space, query / update overhead, reliability

  11. RLS model… • Reliable local state • Relaxed global consistency • Soft-state updates to global index nodes permit graceful behaviour in the face of network problems • Secure access • Implemented as a web service

  12. Hierarchical indexing • The higher-level RLI contains pointers to lower-level RLIs or LRCs • RLI = Replica Location Index • LRC = Local Replica Catalog • [Figure: hierarchy of three RLIs above five LRCs, each LRC paired with a Storage Element]

  13. Scalable, reliable • LFN Namespace partitioned among RLIs • Redundant RLIs for reliability • Lossy compression • Higher level RLIs may lose accuracy about mappings
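The combination of exact local state, soft-state updates and lossy summaries at the index level can be illustrated with a small model. The slides do not say which compression scheme is used; the sketch below assumes a Bloom filter, which is one common choice, and every class and method name in it is hypothetical.

```python
import hashlib


class BloomFilter:
    """Lossy set summary: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class LocalReplicaCatalogue:
    """LRC: exact, reliable LFN -> PFN mappings for one site."""

    def __init__(self, name):
        self.name = name
        self.mappings = {}

    def register(self, lfn, pfn):
        self.mappings.setdefault(lfn, []).append(pfn)

    def summary(self):
        """Compress the set of known LFNs into a lossy summary."""
        bloom = BloomFilter()
        for lfn in self.mappings:
            bloom.add(lfn)
        return bloom


class ReplicaLocationIndex:
    """RLI: keeps only the latest summary from each LRC (soft state)."""

    def __init__(self):
        self.summaries = {}

    def soft_state_update(self, lrc):
        self.summaries[lrc.name] = lrc.summary()

    def candidates(self, lfn):
        """LRCs that may hold the LFN; the caller must confirm with each LRC."""
        return sorted(name for name, bloom in self.summaries.items() if lfn in bloom)


glasgow = LocalReplicaCatalogue("glasgow")
glasgow.register("lfn:File-1", "gsiftp://se.glasgow.example/data/File-1")
index = ReplicaLocationIndex()
index.soft_state_update(glasgow)
```

Because the index only stores summaries, `candidates` can name an LRC that does not actually hold the file, and it can miss mappings registered since the last update. A client therefore treats the answer as a hint and queries the named LRCs for the exact mapping.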

  14. RLS status • Currently Alpha for developers • http://cern.ch/grid-data-management/replica-location-service/RLS.html • New version will be progressively integrated with other replication software. • Testbed deployment in September release

  15. Replica Management Service • Web Service under development (Reptor) • Will absorb GDMP functionality and extend it • Will use the Replica Location Service • Two facets • Core Replica Management API • Optimisation API

  16. Core Reptor API • Similar to GDMP API • registerEntry • copyFile • copyAndRegisterFile • replicateFile • deleteFile • listReplicas
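To show how the six operations listed on this slide might fit together, here is a minimal in-memory sketch. Only the method names come from the slide; the argument lists, the behaviour of each call and the class itself are assumptions made for illustration and do not describe the actual Reptor interface.

```python
class ReplicaManagerSketch:
    """Toy replica manager; storage and catalogue are plain dictionaries."""

    def __init__(self):
        self.storage = {}    # PFN -> file content (stands in for storage elements)
        self.catalogue = {}  # LFN -> set of PFNs

    def registerEntry(self, lfn, pfn):
        """Record an existing physical file as a replica of lfn."""
        if pfn not in self.storage:
            raise KeyError(f"no such physical file: {pfn}")
        self.catalogue.setdefault(lfn, set()).add(pfn)

    def copyFile(self, source_pfn, dest_pfn):
        """Copy a physical file without touching the catalogue."""
        self.storage[dest_pfn] = self.storage[source_pfn]

    def copyAndRegisterFile(self, source_pfn, dest_pfn, lfn):
        """Copy, then register the new copy under lfn."""
        self.copyFile(source_pfn, dest_pfn)
        self.registerEntry(lfn, dest_pfn)

    def replicateFile(self, lfn, dest_pfn):
        """Create a further replica of an already registered logical file."""
        source_pfn = sorted(self.catalogue[lfn])[0]
        self.copyAndRegisterFile(source_pfn, dest_pfn, lfn)

    def deleteFile(self, lfn, pfn):
        """Remove one replica from both catalogue and storage."""
        self.catalogue[lfn].discard(pfn)
        self.storage.pop(pfn, None)

    def listReplicas(self, lfn):
        return sorted(self.catalogue.get(lfn, ()))


manager = ReplicaManagerSketch()
manager.storage["se-a:/data/f1"] = b"payload"   # a file already on storage element A
manager.registerEntry("lfn:f1", "se-a:/data/f1")
manager.replicateFile("lfn:f1", "se-b:/data/f1")
```

After these calls, `manager.listReplicas("lfn:f1")` returns both physical copies.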

  17. Interactions with SE • Defined file types (file type: physical file attribute): • Master: permanent • Secondary copy: permanent, durable or volatile

  18. RMS Current Status • Testbed can use GDMP for 1.2 • Defined Reptor API currently wraps the Globus Replica Manager • Will be developed progressively • Full version on testbed in September • Technical reports: http://cern.ch/grid-data-management/publications.html

  19. Grid Query Optimisation • Best place for a job? • Joint WP1 / WP2 question… • Approach: 2-Phase Optimisation: • Phase 1: Find a suitable CE for job execution given the distribution of the files it will access • Phase 2: Re-optimise file access during job execution (because the Grid is dynamic, resource status changes over time)

  20. Optimisation API • initFilePrefetch(LFN[], CE, protocol[], fraction) • cancelFilePrefetch(LFN[], CE) • getBestFile(LFN[], protocol[], fraction) • getNetworkCosts(SE1, SE2, filesize, protocol) (from WP7) • getIOCosts(SE, PFN) (from WP5)
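One way to read getBestFile is as a cost-minimising choice among the replicas of each requested file. The sketch below follows that reading under simplifying assumptions: the function signature is reduced (no protocol list or fraction), the replica table and bandwidth figures are invented, and the cost function is a stand-in for whatever getNetworkCosts would return from WP7.

```python
def get_best_file(lfns, replicas, network_cost, local_se):
    """For each LFN, pick the replica that is cheapest to reach from local_se.

    replicas: LFN -> list of (storage_element, pfn, size_in_bytes)  (hypothetical layout)
    network_cost: callable(se_from, se_to, size_in_bytes) -> estimated cost
    """
    best = {}
    for lfn in lfns:
        candidates = replicas.get(lfn, [])
        if not candidates:
            continue  # no known replica; the caller decides what to do
        best[lfn] = min(
            candidates,
            key=lambda rep: network_cost(rep[0], local_se, rep[2]),
        )
    return best


# Invented bandwidths in bytes per second between storage elements.
BANDWIDTH = {
    ("glasgow", "glasgow"): 1e9,
    ("cern", "glasgow"): 1e7,
    ("chicago", "glasgow"): 1e6,
}


def transfer_time(se_from, se_to, size):
    """Stand-in cost model: estimated seconds to move the file."""
    return size / BANDWIDTH[(se_from, se_to)]


replica_table = {
    "lfn:f1": [("cern", "cern:/f1", 10**9), ("chicago", "chicago:/f1", 10**9)],
    "lfn:f2": [("chicago", "chicago:/f2", 10**6), ("glasgow", "glasgow:/f2", 10**6)],
}
choice = get_best_file(["lfn:f1", "lfn:f2", "lfn:missing"], replica_table,
                       transfer_time, "glasgow")
```

With these figures the CERN copy is chosen for `lfn:f1` and the local Glasgow copy for `lfn:f2`; `lfn:missing` is simply absent from the result.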

  21. Grid Replica Optimisation • Controlled intelligent replication to optimise grid over the longer term • Collect getBestFile requests • ‘Intelligence’ based on algorithms • Test replication algorithms on data-centric grid simulator

  22. Optor: replica optimiser simulation • Simulate prototype Grid • Input site policies and experiment data files. • Introduce replication algorithm: • Files are always replicated to the local storage. • If necessary, the oldest files are deleted.
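The basic algorithm on this slide (always replicate to local storage, delete the oldest files when space runs out) can be sketched in a few lines. This is not Optor code: the class is hypothetical, and "oldest" is taken here to mean the file that arrived at the site first, which is an assumption.

```python
class LocalStorage:
    """Toy storage element that replicates every requested file locally."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = {}         # file name -> size; dict order is arrival order
        self.network_bytes = 0  # traffic caused by fetching remote files

    def used(self):
        return sum(self.files.values())

    def access(self, name, size):
        """Return True on a local hit; otherwise fetch and replicate the file."""
        if name in self.files:
            return True
        self.network_bytes += size
        if size <= self.capacity:
            # Delete the oldest replicas until the new file fits.
            while self.used() + size > self.capacity:
                oldest = next(iter(self.files))
                del self.files[oldest]
            self.files[name] = size
        return False


storage = LocalStorage(capacity=2)
hits = [storage.access(name, 1) for name in ["a", "b", "a", "c", "a"]]
```

In this run the third access is a local hit, while the fifth is a miss because `a` was deleted to make room for `c`. Comparing `network_bytes` against a run with no local replication is the kind of measurement such a simulator would report.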

  23. Optor first results • Even a basic replication algorithm significantly reduces network traffic and program running times. • New economics-based algorithms under investigation! • http://ppewww.ph.gla.ac.uk/ScotGRID/Optor

  24. Meta-data Management • Spitfire v1.1.0 delivered • A grid-enabled database service • Grid-enabled front end to any type of RDBMS • Examples: • Grid meta-data: replica catalogue, service registry • Application meta-data: experimental data catalogues, calibration data

  25. V1.1.0 XSQL Spitfire • Current version (v1.1.0) is based on XSQL templates on the server, e.g. <role="Read-only"/> <query> SELECT FILENAME FROM HFS_DATASET WHERE RNNO={@run} AND TRIGGER={@trig} AND STATUS={@stat} </query> • File URL = http://filecat1.atlas.cern.ch/hfs/findDataSet.xsql

  26. V1.1.0 Spitfire client • Any HTTP client: either your own app, or a web-browser form • POST an HTML FORM to http://filecat1.atlas.cern.ch/hfs/findDataSet.xsql with parameters run=25555, trig=highlumi, stat=good • The operation is performed on the database, and the result is sent back to the client…
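Since any HTTP client will do, the form POST described on this slide can be prepared with the Python standard library. The URL and the three parameters are the ones quoted on the slide; the helper names are made up, and the example only builds the request because the 2002 server is not assumed to be reachable. Client certificates for the SSL case are not shown.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# URL and parameter names as quoted on the slide.
TEMPLATE_URL = "http://filecat1.atlas.cern.ch/hfs/findDataSet.xsql"


def build_query(run, trig, stat):
    """Prepare a form-encoded POST that fills the {@run}, {@trig} and {@stat} slots."""
    body = urlencode({"run": run, "trig": trig, "stat": stat}).encode("ascii")
    return Request(
        TEMPLATE_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def send(request, timeout=10):
    """Send the request and return the raw response body (not called here)."""
    with urlopen(request, timeout=timeout) as response:
        return response.read()


request = build_query(25555, "highlumi", "good")
```

Calling `send(request)` would perform the POST; the server-side template then substitutes the three values into its SQL query.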

  27. Security Mechanism • [Figure: flow diagram of request handling] • HTTP + SSL request with client certificate enters the Servlet Container (SSLServletSocketFactory, TrustManager) • Is the certificate signed by a trusted CA? (Trusted CAs) • Has the certificate been revoked? (Revoked Certs repository) • Security Servlet and Authorization Module: Does the user specify a role? If no, find the default • Role ok? (Role repository) • Translator Servlet: map role to connection id (Role / Connection mappings) • Request and connection ID are passed to the Connection Pool and on to the RDBMS
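The decision chain in this diagram can be written out as a single function. This is a sketch of the checks as they appear in the figure, not Spitfire's implementation: the data structures, the `Certificate` fields and all names are hypothetical, and real certificate validation (signatures, expiry) is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    subject: str
    issuer: str
    serial: int


class AccessDenied(Exception):
    pass


def authorise(cert, requested_role, *, trusted_cas, revoked,
              user_roles, default_roles, role_connections):
    """Walk the checks from the diagram and return a database connection id."""
    # 1. Is the certificate signed by a trusted CA?
    if cert.issuer not in trusted_cas:
        raise AccessDenied("certificate not signed by a trusted CA")
    # 2. Has the certificate been revoked?
    if (cert.issuer, cert.serial) in revoked:
        raise AccessDenied("certificate has been revoked")
    # 3. Does the user specify a role? If not, find the default.
    role = requested_role or default_roles.get(cert.subject)
    # 4. Role ok?
    if role is None or role not in user_roles.get(cert.subject, set()):
        raise AccessDenied("role not permitted for this user")
    # 5. Map role to connection id.
    if role not in role_connections:
        raise AccessDenied("no connection mapped to role")
    return role_connections[role]


policy = dict(
    trusted_cas={"Example CA"},
    revoked={("Example CA", 99)},
    user_roles={"alice": {"read-only", "writer"}},
    default_roles={"alice": "read-only"},
    role_connections={"read-only": "conn-ro", "writer": "conn-rw"},
)
alice = Certificate(subject="alice", issuer="Example CA", serial=1)
connection_id = authorise(alice, None, **policy)
```

Here `alice` specifies no role, so the default read-only role is used and the read-only connection id is returned; passing `"writer"` as the second argument would select the other connection.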

  28. V1.1.0 • V1.1.0 available for 1.2 release now! • SRPM, RPM, tarball installation • User / Admin / Quick Install guides • http://cern.ch/hep-proj-spitfire

  29. New Spitfire client (dev) • Users can use either this or the v1.1.0 static (XSQL template based) functionality • A database client API has been defined • Will be implemented as a grid service using standard web service technologies

  30. Client side API to access remote database • DB Admin • Create(), Drop(), Alter() Table, Database • DB Core functionality • Insert(), Update(), Delete(), Select() • DB Role admin • Secure, role based authorisation • DB Information • Schema, Quotas, Disk space

  31. Extra functionality • To be developed: • Distributed querying • Replication of meta-data • Automated expiration and cleanup • Discussions with UK DBTF and GGF Database Group

  32. Service Index • How do I find a specific grid service? • E.g. replica location server, image database, information service • XML Service description • What, where, attributes, how to contact. • Scalable architectures for querying this have been developed • Service index web service • W. Hoschek’s thesis and paper (WP2@CERN) • API developed

  33. More Info • More information available at… http://www.gridpp.ac.uk/datamanagement http://cern.ch/grid-data-management
