Preservation and Long Term Access to Data and Records in a Knowledge-based Society
Reagan W. Moore, San Diego Supercomputer Center



Presentation Transcript


  1. Preservation and Long Term Access to Data and Records in a Knowledge-based Society
     Reagan W. Moore, San Diego Supercomputer Center
     moore@sdsc.edu
     http://www.npaci.edu/DICE/

  2. Data and Knowledge Systems Group
     • Staff: Reagan Moore, Ilkai Altintas, Chaitan Baru, Sheau Yen Chen, Charles Cowart, Amarnath Gupta, George Kremenek, M. Kulrul, Bertram Ludäscher, Richard Marciano, A. Memon, XuFei Qian, Roman Olshanowsky, Arcot Rajasekar, Abe Singer, Michael Wan, Ilya Zaslavsky, Bing Zhu
     • Graduate Students: A. Bagchi, S. Bansal, A. Behere, R. Bharath, S. Bharath, L. Sui
     • Undergraduate Interns: N. Cotofana, D. Le, J. Trang, L. Yin, +/- NN

  3. Topics
     • Building persistent archives
     • Data grids
     • Authenticity mechanisms
     • Managing technology evolution
     • Knowledge-based access

  4. Archival Processes
     • Appraisal - determine the archivable content.
     • Accession - determine the initial physical location for the data, and the relationship of the new collection to existing collections.
     • Arrangement - add administration control, describe the information content (provenance, authenticity, structure, administrative), and decompose digital objects into their components as needed.
     • Description - complete the definition of collection attributes by iterating between arrangement, reformatting, and representation.
     • Preservation - build an archivable form of the digital entities, characterize the collection context, and manage their storage.
     • Access - provide query mechanisms for discovering, retrieving, and presenting the digital entities.

  5. ERA Concept model

  6. Common Approach (digital library, persistent archive, data grid)
     • Logical name space used to organize digital entities and associate attributes
     • Separation of information management from data storage management
     • Definition of abstraction mechanisms for dealing with repositories
     • Emergence of need for knowledge management

  7. SDSC Storage Resource Broker & Meta-data Catalog: Levels of Abstraction (diagram; labels only)
     • Application clients: C, C++ Libraries; Unix Shell; Linux I/O; Web; WSDL; DLL / Python; Java, NT Browsers; Prolog Predicate
     • Server layers: Consistency Management / Authorization-Authentication; Prime Server; Logical Name Space; Latency Management; Data Transport; Metadata Transport
     • Catalog Abstraction: Databases (DB2, Oracle, Sybase)
     • Storage Abstraction: Databases (DB2, Oracle, Postgres); Archives (HPSS, ADSM, UniTree, DMF); File Systems (Unix, NT, Mac OSX); Servers (HRM)

  8. Authenticity
     • Guarantee that the data has not been changed
     • Collection-owned data, accessible only through the data handling system
     • Support roles defining access (curation, owner, annotation, read)
     • Support access controls mapping users to roles
     • Audit trails that record all operations on files
     • Digital signatures - cryptographic checksums
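The authenticity mechanisms listed on this slide can be illustrated with a minimal Python sketch. It is not the SRB implementation: the `Collection` class, the `ROLE_PERMISSIONS` table, and the operations each role allows are assumptions made for illustration, and a plain SHA-256 checksum stands in for a full digital signature.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical mapping from role to permitted operations; the slide names
# the roles but does not say what each one allows.
ROLE_PERMISSIONS = {
    "curation": {"read", "annotate", "write"},
    "owner": {"read", "annotate", "write"},
    "annotation": {"read", "annotate"},
    "read": {"read"},
}


@dataclass
class Collection:
    """Collection-owned data, reachable only through these methods."""
    user_roles: dict                                  # user -> role
    _files: dict = field(default_factory=dict)        # name -> bytes
    _checksums: dict = field(default_factory=dict)    # name -> SHA-256 hex digest
    audit_trail: list = field(default_factory=list)   # one entry per attempted operation

    def _authorize(self, user, operation, name):
        # Map user -> role -> allowed operations, and log every attempt,
        # including the ones that are refused.
        role = self.user_roles.get(user)
        allowed = operation in ROLE_PERMISSIONS.get(role, set())
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_trail.append((stamp, user, operation, name, allowed))
        if not allowed:
            raise PermissionError(f"{user!r} may not {operation} {name!r}")

    def write(self, user, name, data: bytes):
        self._authorize(user, "write", name)
        self._files[name] = data
        self._checksums[name] = hashlib.sha256(data).hexdigest()

    def read(self, user, name) -> bytes:
        self._authorize(user, "read", name)
        return self._files[name]

    def verify(self, name) -> bool:
        # True only if the stored bytes still match the recorded checksum.
        return hashlib.sha256(self._files[name]).hexdigest() == self._checksums[name]
```

Because every access goes through `_authorize`, refused operations are recorded in the audit trail alongside successful ones, and `verify` detects any change made to the stored bytes outside the `write` path.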

  9. Managing Technology Evolution
     • Data grids provide interoperability mechanisms to access data in multiple administration domains and multiple types of storage systems.
     • Persistent archives migrate collections from old technology to new technology to support presentation on new systems.
     • Both require the ability to access heterogeneous systems.

  10. Presentation of Digital Objects (diagram): Application; Operating System; Storage System; Display System; Digital Object

  11. Technology Management - Emulation (diagram): Old Application; Wrap Application; New Operating System; New Storage System; New Display System; Digital Object

  12. Technology Management (diagram): Old Application; Add Operating System Call; New Operating System; New Storage System; New Display System; Digital Object

  13. Technology Management (diagram): Old Application; Add Operating System Call; New Operating System; Add Operating System Call; Old Storage System; Old Display System; Digital Object

  14. Technology Management - Migration (diagram): New Application; New Operating System; New Storage System; New Display System; Migrate Encoding Format; Digital Object

  15. Technology Management - SDSC (diagram): New Application; New Operating System; Wrap Storage System; Wrap Display System; Old Storage System; Old Display System; Migrate Encoding Format; Digital Object
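The combination on the last of these slides, wrapping an old storage system while migrating the encoding format, can be sketched in Python as follows. Every name here is hypothetical (`StorageSystem`, `OldTapeArchive`, `WrappedOldStorage`, `NewStorage`), and a character-encoding change from Latin-1 to UTF-8 is used only as a simple stand-in for an encoding-format migration.

```python
from abc import ABC, abstractmethod


class StorageSystem(ABC):
    """Uniform interface that new applications are written against."""

    @abstractmethod
    def get(self, name: str) -> bytes: ...

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...


class OldTapeArchive:
    """Hypothetical legacy system with its own call names."""

    def __init__(self):
        self._tapes = {}

    def stage_in(self, label, payload):
        self._tapes[label] = payload

    def stage_out(self, label):
        return self._tapes[label]


class WrappedOldStorage(StorageSystem):
    """Wrapper: exposes the legacy system through the uniform interface."""

    def __init__(self, legacy: OldTapeArchive):
        self._legacy = legacy

    def get(self, name):
        return self._legacy.stage_out(name)

    def put(self, name, data):
        self._legacy.stage_in(name, data)


class NewStorage(StorageSystem):
    """Stand-in for a newer storage system (in-memory for the sketch)."""

    def __init__(self):
        self._objects = {}

    def get(self, name):
        return self._objects[name]

    def put(self, name, data):
        self._objects[name] = data


def migrate_encoding(data: bytes, old="latin-1", new="utf-8") -> bytes:
    # Re-encode the content; the digital object's meaning is unchanged.
    return data.decode(old).encode(new)


def migrate_collection(source: StorageSystem, target: StorageSystem, names):
    # Read through the wrapper, migrate the encoding, write to the new system.
    for name in names:
        target.put(name, migrate_encoding(source.get(name)))
```

The design point is that `migrate_collection` depends only on the `StorageSystem` interface, so the same code works whether the source is a wrapped legacy system or a current one.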

  16. Accessing Archived Data
     • Name transparency
        • Access data without knowing the file name
        • Map from attributes to a local file name
     • Location transparency
        • Access data without knowing where it is stored
        • Map from global file name to local file name
     • Collection transparency
        • Access data without knowing the collection attributes
        • Map from concept space to collection attributes
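The three mappings on this slide can be sketched as lookups in a small in-memory catalog. This is an illustrative Python sketch only; the `LogicalNameSpace` class, its method names, and the sample paths are assumptions and do not reflect the SRB or MCAT interfaces.

```python
class LogicalNameSpace:
    """Toy catalog illustrating name, location, and collection transparency."""

    def __init__(self):
        self._replicas = {}    # global name -> list of (resource, local path)
        self._attributes = {}  # global name -> dict of attributes
        self._concepts = {}    # concept -> dict of attribute constraints

    def register(self, global_name, resource, local_path, **attributes):
        self._replicas.setdefault(global_name, []).append((resource, local_path))
        self._attributes.setdefault(global_name, {}).update(attributes)

    def locate(self, global_name):
        # Location transparency: global file name -> local file names.
        return list(self._replicas[global_name])

    def find(self, **attributes):
        # Name transparency: attributes -> names, without knowing any file name.
        return sorted(
            name
            for name, attrs in self._attributes.items()
            if all(attrs.get(key) == value for key, value in attributes.items())
        )

    def define_concept(self, concept, **attributes):
        self._concepts[concept] = attributes

    def find_by_concept(self, concept):
        # Collection transparency: concept -> collection attributes -> names.
        return self.find(**self._concepts[concept])
```

A caller that knows only a concept such as "pictures" can chain `find_by_concept` and `locate` to reach a local file without ever seeing attribute names, global names, or storage locations.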

  17. Information Management - Logical Name Space
     • Set of attributes to describe digital entities that are registered into the logical name space
        • SRB metadata - Unix file system semantics
        • Provenance metadata - Dublin Core
        • Resource metadata - User access control lists
        • Discipline metadata - User defined attributes
     • Each digital entity may have unique attributes
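One way to picture the four attribute groups on this slide is as a per-entity record with one dictionary per group. The sketch below is an assumption made for illustration, not the MCAT schema; the `EntityMetadata` class and its field names are hypothetical, and only `creator` and `title` are taken from the Dublin Core element set.

```python
from dataclasses import dataclass, field


@dataclass
class EntityMetadata:
    """Attribute groups attached to one registered digital entity."""
    system: dict = field(default_factory=dict)      # Unix file system semantics (size, owner, ...)
    provenance: dict = field(default_factory=dict)  # Dublin Core elements (creator, title, ...)
    resource: dict = field(default_factory=dict)    # access control list: user -> permission
    discipline: dict = field(default_factory=dict)  # user-defined attributes

    def flatten(self) -> dict:
        # Produce "group.key" attribute names, so each entity can carry
        # its own set of attributes without a fixed schema.
        flat = {}
        for group in ("system", "provenance", "resource", "discipline"):
            for key, value in getattr(self, group).items():
                flat[f"{group}.{key}"] = value
        return flat
```

Since each group is an open dictionary, two entities in the same collection can carry entirely different discipline attributes, which matches the slide's last point.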

  18. Knowledge Management - Discovery across Collections
     • Mapping from collection attributes to discipline concepts
     • Make queries based on discipline concepts
     • Characterization of relationships between attributes
        • Semantic / logical - cross-walks
        • Procedural / temporal - records management
        • Structural / spatial - GIS
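A semantic cross-walk of the kind named on this slide can be sketched as a table that maps each collection's attribute names onto shared discipline concepts, so that one concept-level query runs across collections. The collections, attribute names, and the `CROSSWALK`, `to_concepts`, and `query` names below are all invented for illustration.

```python
# Hypothetical cross-walk: collection -> {collection attribute: discipline concept}
CROSSWALK = {
    "buoys": {"temp_c": "temperature", "site": "location"},
    "stations": {"air_temp": "temperature", "station_id": "location"},
}


def to_concepts(collection, record):
    """Rename a record's attributes to discipline concepts, dropping unmapped ones."""
    mapping = CROSSWALK[collection]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def query(collections, concept, predicate):
    """Return (collection, concept-level record) pairs whose concept value satisfies predicate."""
    hits = []
    for name, records in collections.items():
        for record in records:
            view = to_concepts(name, record)
            if concept in view and predicate(view[concept]):
                hits.append((name, view))
    return hits
```

For example, `query(data, "temperature", lambda t: t > 15)` searches both collections even though one stores the value as `temp_c` and the other as `air_temp`.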

  19. Knowledge Based Data Grids (diagram; labels only)
     • Columns: Ingest Services; Management; Access Services
     • Knowledge: Relationships Between Concepts; Knowledge Repository for Rules; Knowledge or Topic-Based Query / Browse; XTM DTD; Rules - KQL; (Model-based Access)
     • Information: Attributes, Semantics; Information Repository; Attribute-based Query; XML DTD; SDLIP; (Data Handling System - SRB)
     • Data: Fields, Containers, Folders; Storage (Replicas, Persistent IDs); Grids; Feature-based Query; MCAT/HDF

  20. Further Information http://www.npaci.edu/DICE
