
Beyond Storage: Rethinking the role of repositories in scholarly communication

DELOS Workshop Digital Repositories: Interoperability and Common Services, May 11, 2005. Sandy Payette, Cornell University.


Presentation Transcript


  1. Beyond Storage: Rethinking the role of repositories in scholarly communication. DELOS Workshop Digital Repositories: Interoperability and Common Services, May 11, 2005. Sandy Payette, Cornell University

  2. First… is there a problem?

  3. Existing scholarly communication system
     • Does not mirror the reality of the scholarly process
     • Published information artifacts do not resemble the rich information that is produced along the process
     • Not evolved enough to enable easy and effective integration and dissemination of new, rich forms of digital information

  4. The Future: Rich Scholarly Information Networks

  5. Roles of digital repositories today
     • Early Dissemination:
       • Enhance upstream scholarly communication
       • Improvement over traditional pre-print (paper) sharing among scholars
     • Open Access:
       • Harnad’s “subversive proposal”
       • Possibility of bypassing or eliminating traditional publisher model
     • Document Discovery:
       • Searching for documents in a repository
       • Federation or metadata harvest for search over multiple repositories
     • Storage and Archiving:
       • E-print archives: author self-archiving gives scholars control over their intellectual output
       • Institutional repositories: institutions commit to preservation

  6. Evolutionary, but not revolutionary
     • In many ways repositories represent an evolution of the traditional publishing paradigm
       • Submit documents
       • Gain access to documents…
       • Share results earlier in the scholarly process, and electronically
     • Still locked into document-centric paradigm
       • Store documents to promote access
       • Store documents to promote archiving
       • Index documents to promote search and discovery
       • Citation analysis to understand relationships of documents

  7. Signs of Change – Scholars exercising the network
     • Grid computing in sciences
       • Share computing resources
       • Share services and distributed virtual file systems
       • Examples
         • Enabling Grids for E-Science (http://public.eu-egee.org/)
         • National Virtual Observatory (http://www.us-vo.org/)
     • Humanities computing
       • Hyperlinked historical documentary editions
       • New Forms of Digital Scholarship
         • Rossetti archive (http://www.rossettiarchive.org/)
         • Perseus (www.perseus.tufts)
         • Pompeii Forum (http://pompeii.virginia.edu)
         • Tibetan and Himalayan Digital Library (thdl.org)

  8. Vision for more revolutionary approach

  9. The revolutionary opportunity…
     • Looming on the horizon is the potential of a future scholarly communication system that is
       • Highly collaborative
       • Network-based
       • Data-intensive
       • Process-oriented
     • We can change the way research and education is conducted by exposing rich knowledge-oriented information assets
     • Digital repositories must be rationalized within this broader vision.

  10. New Functionality
      • Content aggregation: combining information entities in novel ways
      • Knowledge integration: capturing semantic and factual relationships among information entities
      • Information reuse: allowing secondary, tertiary products
      • Information transformation: combining information entities with computational services
      • Collaboration and contribution: blurring the line between authors, publishers, users, experts…

  11. A New Scholarly Information System: 3 Basic Requirements
      • Redefine the “information unit” of scholarly communication
      • Create a scholarly communication system that better supports the process of research and learning
      • Record the “crumb trails” of the scholarly process

  12. (1) The new “information unit”
      • Documents
      • Text
      • Data
      • Simulations
      • Images
      • Video
      • Computations
      • Automated Analyses
      • Aggregations

  13. (2) Process-oriented Scholarly Communication System
      • Decompose the traditional process (Roosendaal & Geurts)
        • Registration (establish intellectual priority of result)
        • Certification (certify quality and validity of result)
        • Awareness (ensure accessibility)
        • Archiving (ensure availability for future use)
        • Rewarding (means to support tenure, promotion, compensation)
      • But, they missed some things…

  14. (2) Process-oriented Scholarly Communication System
      • Add new services to the mix
        • Workflow
        • Collaborative functions (e.g., annotation, re-use)
        • Data mining and analysis
        • Preservation monitoring and migration
      • Expose all as network-accessible atomic services
        • Service discovery
        • Service invocation
        • Service aggregation, orchestration, choreography
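The idea of exposing functions as atomic services that can be discovered, invoked, and orchestrated can be sketched in a few lines. This is only an in-memory illustration: `Service`, `ServiceRegistry`, and the two example services are hypothetical names, and a real deployment would put each service behind a network interface rather than a Python callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Service:
    """A hypothetical atomic service: a name, a category, and a callable."""
    name: str
    kind: str
    invoke: Callable[[str], str]


class ServiceRegistry:
    """Toy registry covering discovery, invocation, and orchestration."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, service: Service) -> None:
        self._services[service.name] = service

    def discover(self, kind: str) -> List[Service]:
        # Service discovery: find every registered service of a given kind.
        return [s for s in self._services.values() if s.kind == kind]

    def orchestrate(self, names: List[str], payload: str) -> str:
        # Service orchestration: feed each service's output into the next.
        for name in names:
            payload = self._services[name].invoke(payload)
        return payload


registry = ServiceRegistry()
registry.register(Service("annotate", "collaboration", lambda text: text + " [annotated]"))
registry.register(Service("migrate", "preservation", lambda text: text.upper()))

result = registry.orchestrate(["annotate", "migrate"], "dataset-42")
print(result)
```

Keeping each service a single-purpose callable is what makes aggregation cheap here: orchestration is just function composition over registry lookups.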

  15. Process-orientation - workflows
      [Diagram. Ingest-oriented process, starting from a SIP: Validate bytestreams, Ingest to Repo, Assign Access Policy, Index and Register, Link to Simulation Service. Preservation-oriented process, acting on a Digital Object: Format Migration, Make Copies, Visit The Doctor, Object Versioning In Repo, Ingest To Archive. Both processes draw on a World of Services.]
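An ingest-oriented workflow of this kind can be sketched as a list of small steps applied in sequence to a digital object. The step order, the dictionary-based object, and the field names below are all assumptions made for illustration; the slide's diagram does not fix them.

```python
from typing import Callable, Dict, List

DigitalObject = Dict[str, object]
Step = Callable[[DigitalObject], DigitalObject]


def validate_bytestreams(obj: DigitalObject) -> DigitalObject:
    # Reject a submission package that carries no content at all.
    if not obj.get("bytestreams"):
        raise ValueError("no bytestreams to validate")
    return {**obj, "validated": True}


def ingest_to_repo(obj: DigitalObject) -> DigitalObject:
    return {**obj, "location": "repository"}


def assign_access_policy(obj: DigitalObject) -> DigitalObject:
    # "restricted" is a placeholder policy name, not a real policy language.
    return {**obj, "policy": "restricted"}


def index_and_register(obj: DigitalObject) -> DigitalObject:
    return {**obj, "indexed": True}


def run_workflow(obj: DigitalObject, steps: List[Step]) -> DigitalObject:
    """Apply each step in order and record which steps ran."""
    history = list(obj.get("history", []))
    for step in steps:
        obj = step(obj)
        history.append(step.__name__)
    return {**obj, "history": history}


# Assumed ordering of the ingest-oriented process.
INGEST_STEPS: List[Step] = [
    validate_bytestreams,
    ingest_to_repo,
    assign_access_policy,
    index_and_register,
]

sip = {"id": "demo:1", "bytestreams": [b"\x89PNG"]}
stored = run_workflow(sip, INGEST_STEPS)
print(stored["history"])
```

Because the workflow is only a list, a preservation-oriented process could reuse `run_workflow` with a different set of steps.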

  16. (3) Record the “crumb trails”
      • Events
        • Critical state transitions of information assets
        • Preservation-noteworthy events
      • Provenance
        • When we enable re-use and re-combination of assets, we must be able to show where each asset came from
      • Relationships
        • Among information assets
        • Versions of an asset
        • Between agents and assets
        • Between services and assets
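One way to picture a crumb trail is an append-only event log in which each derivation event names the asset it came from, so provenance can be read back as a chain. The `Event` and `CrumbTrail` classes and the sample asset names are hypothetical; this is a sketch of the idea, not any repository's actual audit format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """A recorded state transition of an information asset."""
    asset_id: str
    kind: str                      # e.g. "ingested", "derived", "migrated"
    agent: str                     # person or service responsible
    derived_from: Optional[str] = None
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class CrumbTrail:
    """Append-only log of events with a simple provenance lookup."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def record(self, event: Event) -> None:
        self._events.append(event)

    def provenance(self, asset_id: str) -> List[str]:
        # Map each derived asset to its source, then walk back to the origin.
        parents = {e.asset_id: e.derived_from for e in self._events if e.derived_from}
        chain = [asset_id]
        while chain[-1] in parents and parents[chain[-1]] not in chain:
            chain.append(parents[chain[-1]])
        return chain


trail = CrumbTrail()
trail.record(Event("raw-data", "ingested", "alice"))
trail.record(Event("clean-data", "derived", "cleaning-service", derived_from="raw-data"))
trail.record(Event("figure-1", "derived", "bob", derived_from="clean-data"))

print(trail.provenance("figure-1"))
```

The `agent` field covers the agent-to-asset and service-to-asset relationships from the slide: the same log answers both "where did this come from" and "who or what touched it".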

  17. How are current repository technologies poised?

  18. Selected repositories with notable features re: the vision
      • Open-source repository software
        • Fedora
        • DSpace
      • Installed Systems
        • aDORe (Los Alamos National Laboratory)
        • arXiv
      • Grid projects
        • Storage Resource Broker (SRB)
        • Chimera

  19. Fedora vs. the vision
      • Flexible digital object model
      • Services associated with digital objects
      • Relationships among digital objects
        • Relationship ontology
        • RDF-based metadata
        • Search the repository “as a graph”
      • Upcoming – new security architecture
        • Policy enforcement (XACML)
          • Repository policy
          • Object policies (fine-grained control)

  20. Fedora Repository – Web Services [Diagram: Web Services Exposure]

  21. Fedora Objects – RDF Graph view [Diagram: Member Object, Collection Object]

  22. DSpace vs. the vision
      • The related Simile project is most interesting
        • Significance: semantic web technologies brought to the task of search and discovery across different repository systems
        • RDF-based search across heterogeneous metadata formats
        • Ontology-based
      • DSpace History system
        • Event recording
        • RDF-based
      • Opportunity in DSpace 2
        • Web service exposure?
        • Service-based dissemination architecture?

  23. LANL’s aDORe vs. the vision
      • Standards-based repository architecture
        • OAI-PMH
        • MPEG-21 DIDL
        • OpenURL
      • Very good example of the use of simple protocols to enable modular service-based architecture
      • Services dynamically associated with objects
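To give a sense of how simple a protocol like OAI-PMH is, a harvest request is just an HTTP URL carrying a `verb` and a few arguments. The sketch below builds such a URL with the standard library; the six verbs and the `metadataPrefix` and `set` arguments come from the OAI-PMH specification, while the base URL and the set name are placeholders and have nothing to do with aDORe's actual endpoints.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL; arguments set to None are dropped."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    query.update({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url}?{urlencode(query)}"


# Placeholder base URL and set name, for illustration only.
url = oai_request_url(
    "https://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
    set="physics",
)
print(url)
```

A harvester would fetch this URL and parse the XML response; nothing beyond plain HTTP GET is needed, which is the low barrier to entry the slide points to.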

  24. aDORe architecture
      [Diagram. Labels include: publisher content delivered over OAI-PMH; Pre-Ingest and Ingest; Repo Index; MPEG-21 DIP Engine; DID with DIM; A&I application; OpenURL; FTXT; Registry of transformations; Profile/Behavior Registry; Identifier Resolver (CNRI handle, Java, C).]
      Slide courtesy of Herbert Van de Sompel

  25. arXiv vs. the vision • Progress in decomposition and distribution of traditional steps in scholarly publishing value chain

  26. arXiv – service pathways (decomposed and distributed)

  27. Selected Grid vs. the vision
      • SRB
        • Distributed, virtualized file system
        • Support for very large amounts of data
        • Data grid compatible with computational grid
        • Possible as backend persistent store for other repository systems (e.g., DSpace, Fedora)
      • Chimera
        • Derived data as first class information entities
        • Information model (Virtual Data System)
        • Process model (Virtual Data Language)

  28. New Technical Architecture

  29. The architecture challenge
      • Current situation
        • Heterogeneous repository systems
        • Heterogeneous object models (or no object model)
        • Multiple protocols and service APIs
        • Services lacking formal interface definitions
      • Can these resources ever play nicely together?
      • Need common abstractions…

  30. Solution: Information Network Overlay
      [Diagram. Layers, top to bottom: Client Layer; Information Network API; Network Representation Layer; Source Layer (Publisher Repositories, Document Repositories, Web Resources, Data Stores, Databases).]

  31. Translate to Technical Requirements
      • Rich information objects
        • Integration of local and remote sources
        • Mixed genre
      • Dynamic information objects
        • Integration with local and distributed services
      • Graph-based information model to enable overlay
        • Nodes are information objects
        • Edges are relationships among those objects
      • Service-oriented process model:
        • Coordination of information entities and services
        • Workflow; multi-step executions; transformations
      • Interoperable access and management API for objects
      • Fine granularity access control
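The graph-based information model, with information objects as nodes and relationships as labeled edges, can be illustrated with a small adjacency structure. The `InformationGraph` class, the identifiers, and the relationship names are invented for this sketch and do not reflect any particular repository's ontology.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class InformationGraph:
    """Nodes are information objects; edges are labeled relationships."""

    def __init__(self) -> None:
        self._edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def relate(self, subject: str, relationship: str, target: str) -> None:
        self._edges[subject].append((relationship, target))

    def related(self, subject: str, relationship: str) -> List[str]:
        # Follow one kind of edge out of a node.
        return [t for r, t in self._edges.get(subject, []) if r == relationship]

    def reachable(self, start: str) -> Set[str]:
        # Everything reachable from a node, regardless of edge label.
        seen: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for _, target in self._edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


graph = InformationGraph()
# Objects may live in different repositories; the overlay only needs identifiers.
graph.relate("article:1", "cites", "dataset:7")
graph.relate("article:1", "hasPart", "figure:2")
graph.relate("dataset:7", "derivedFrom", "simulation:3")

print(graph.related("article:1", "cites"))
print(sorted(graph.reachable("article:1")))
```

Since the graph stores only identifiers and relationship labels, it can sit above heterogeneous sources without caring how each one stores its objects.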

  32. Pathways Project
      • National Science Foundation Funding 2004-2007 (http://www.infosci.cornell.edu/pathways)
      • Van de Sompel, Payette, Erickson, Lagoze, Warner. Rethinking Scholarly Communication: Building the System that Scholars Deserve. D-Lib Magazine, September 2004.

  33. Vision: “Graphite” Information Model
      • Most things can be represented as a graph of nodes and arcs.
      Cornell/LANL Pathways Project

  34. Service-oriented process model
      • Key challenge is to integrate a distributed service model within the information network overlay.
      • Technologies to watch
        • OWL-S (W3C)
          • Ontology-based service descriptions
          • Service modeled within semantic web
        • Netkernel (1060research)
          • Enables a graph-like overlay for URI-identified resources
          • Information entities and services can be accommodated
        • Grid technologies (Open Grid Services Infrastructure)
          • Enables creation of ‘virtual organizations’ that can share distributed computational resources and services
          • Web-services and WSDL in latest incarnation

  35. The W3C’s Take on Things…
      • People and communities have data stores and programs to share
      • Vision: Expanding Web of machine accessible resources
      • Key Web technologies:
        • Web Services: Web of programs*
          • Standards for interactions between programs on the Web
          • Easier to expose and use services
        • Semantic Web: Web of data*
          • Standards for data, relationships, descriptions on the Web
          • Easier to Search for, Share, Aggregate, Extend information
      • * abstractions :-)
      Source: http://www.w3.org/2004/Talks/0923-sb-whoiw3c/slide12-0.html

  36. Conclusions: Implications for digital repositories

  37. Beyond Storage: Must understand new scholarly activities and new technical developments… so we can frame repositories within a broader service-oriented architecture.

  38. What basic changes can occur now?
      • Expose repositories as web services
      • Support compound digital objects
        • Local and remote content
        • Any media type
      • Provide a way to associate services with objects (dynamic views)
      • Provide ability to assert relationships among objects
      • Move toward ontology-based metadata
      • Enable easy integration of repository with other services
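Several of these changes can be pictured together in one data structure: a compound object that mixes local and remote content of any media type, carries asserted relationships, and has services attached as dynamic views. The class names, fields, identifiers, and URL below are hypothetical and are not the object model of Fedora or any other system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Datastream:
    """One piece of content, stored locally or referenced remotely."""
    label: str
    media_type: str
    location: str

    @property
    def is_remote(self) -> bool:
        return self.location.startswith(("http://", "https://"))


@dataclass
class CompoundObject:
    identifier: str
    datastreams: List[Datastream] = field(default_factory=list)
    relationships: List[Tuple[str, str]] = field(default_factory=list)
    # Services attached to the object, each producing a dynamic view of it.
    views: Dict[str, Callable[["CompoundObject"], str]] = field(default_factory=dict)

    def disseminate(self, view_name: str) -> str:
        return self.views[view_name](self)


obj = CompoundObject("demo:42")
obj.datastreams.append(Datastream("article", "application/pdf", "/data/demo42/article.pdf"))
obj.datastreams.append(Datastream("video", "video/mp4", "https://media.example.org/demo42.mp4"))
obj.relationships.append(("isMemberOf", "demo:collection"))
obj.views["summary"] = lambda o: f"{o.identifier}: {len(o.datastreams)} datastreams"

print(obj.disseminate("summary"))
```

Attaching views as callables keeps the stored content separate from how it is presented, so a new service can be associated with an object without changing the object itself.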

  39. Example: Fedora Service Framework (2005-2007)

  40. Research Challenges
      • Enable low barrier to entry
        • Simple protocols (e.g., like OAI)
        • Light-weight (REST vs. SOAP?)
        • Simple tools to create overlays
        • Note complexity in setting up Grid-based services
      • Integration of information and service models
      • Security and Trust
        • Authentication and trust among repositories and services
        • Interoperability of authorization policy
      • Preservation
        • Distributed and dynamic resources

  41. Beyond Storage: Questions and Discussion!
