Repositories and Scholarly Communication Ecosystems

Alex D. Wade, Director for Scholarly Communication, Microsoft External Research

Presentation Transcript


  1. Repositories and Scholarly Communication Ecosystems Alex D. Wade Director for Scholarly Communication Microsoft External Research

  2. A bit about me… Academic Librarian • University of Michigan Libraries • University of California, Berkeley • University of Washington • Natural Sciences Library • Engineering Library • Philosophy Librarian • Systems Librarian

  3. A bit about me… Corporate Shill

  4. Microsoft Research Labs External Research Groups Technology Learning Labs Collaborative Institutes and Centers

  5. Microsoft External Research • Division within Microsoft Research focused on partnerships between academia, industry and government to advance research in fields that rely heavily upon advanced computing • Supporting groundbreaking research to help advance human potential and the wellbeing of our planet • Developing advanced technologies and services to support every stage of the research process • Microsoft External Research is committed to interoperability and to providing open access, open tools, and open technology http://research.microsoft.com/collaboration/about/

  6. Repository Trends & Predictions • Clouds (storage and computing) • Data (pick your natural disaster metaphor) • Enhanced Publications • Transparency (of Repository as a ‘place’) • Deposit • Discovery

  7. Mission • Tailor Microsoft software to meet the specific needs of the academic research community • Our approach: • Conduct applied projects to enhance academic productivity by evolving Microsoft’s scholarly communication offerings

  8. Why • Increase relevance of (current) Microsoft software • Integration • Extensibility • Interoperability • Inform future software directions • New products and features • Exposure of Microsoft Research areas • Information Retrieval • Data Mining • NLP & Entity Extraction • Machine Translation

  9. Zentity – a Research Output Repository Platform • Native support for RSS, OAI-PMH, OAI-ORE, AtomPub and SWORD • A semantic computing platform to store and expose relationships between digital assets • Flexible data model enables many scenarios and can be easily extended over time • v.1 (v.2 available later this month!): http://research.microsoft.com/zentity/
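Native OAI-PMH support means any standard harvester can pull a repository's metadata. As a minimal sketch of the harvesting side, assuming a hypothetical endpoint `http://repository.example.org/oai` and only Python's standard library, the code below builds a `ListRecords` request for Dublin Core (`oai_dc`) records and extracts identifiers, titles and the `resumptionToken` used to page through large result sets. The sample response is hand-written and abbreviated; it is not real Zentity output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a follow-up page sends only verb + token."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

def parse_records(xml_text):
    """Return [(identifier, title), ...] and the resumption token (or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append((ident, title))
    # An empty <resumptionToken/> marks the last page, so "" becomes None.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS) or None
    return records, token

# Hand-written, abbreviated sample response (hypothetical identifiers).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would loop: fetch the first URL, parse it, and keep requesting `list_records_url(base, resumption_token=token)` until the token comes back as `None`.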

  10. Hybrid Approach • Triple stores: evolution friendly; poor performance; no need to model everything in advance; semantic interpretation at the application level • Relational schema: evolution not so easy; great opportunities for optimization; model everything in advance • Zentity Platform: maintain a balance; try to model the frequently used entities in our app domain; try to capture the frequently used relationships; allow for extensibility (relationships, properties)
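To make the hybrid idea concrete, here is a sketch using SQLite. It is not Zentity's actual schema: table and column names are hypothetical. Frequently used entities get fixed relational columns, while relationships and ad hoc properties live in generic, triple-style tables that can grow without schema changes.

```python
import sqlite3

# Hypothetical schema: modelled-in-advance core plus extensible generic tables.
SCHEMA = """
CREATE TABLE resource (            -- frequently used entity, fixed columns
    id    INTEGER PRIMARY KEY,
    kind  TEXT NOT NULL,           -- e.g. 'Paper', 'Person'
    title TEXT NOT NULL
);
CREATE TABLE relationship (        -- triple-style, new predicates need no DDL
    subject   INTEGER NOT NULL REFERENCES resource(id),
    predicate TEXT    NOT NULL,
    object    INTEGER NOT NULL REFERENCES resource(id)
);
CREATE TABLE property (            -- ad hoc key/value properties
    resource_id INTEGER NOT NULL REFERENCES resource(id),
    name        TEXT    NOT NULL,
    value       TEXT
);
CREATE INDEX rel_by_predicate ON relationship(predicate, subject);
"""

def build_demo():
    """Create an in-memory database with a few illustrative rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO resource VALUES (?, ?, ?)", [
        (1, "Paper", "Repositories in the Cloud"),
        (2, "Person", "A. Author"),
        (3, "Dataset", "Survey responses"),
    ])
    db.executemany("INSERT INTO relationship VALUES (?, ?, ?)", [
        (1, "author", 2),
        (1, "cites", 3),   # a brand-new relationship type, no schema change
    ])
    db.execute("INSERT INTO property VALUES (1, 'language', 'en')")
    return db

def related(db, subject_id, predicate):
    """Titles of resources linked from subject_id by the given predicate."""
    rows = db.execute(
        """SELECT r.title FROM relationship AS rel
           JOIN resource AS r ON r.id = rel.object
           WHERE rel.subject = ? AND rel.predicate = ?
           ORDER BY r.id""",
        (subject_id, predicate),
    )
    return [title for (title,) in rows]
```

The fixed `resource` table is where indexing and query optimization pay off most; the `relationship` and `property` tables trade some performance for the ability to add predicates such as `cites` at run time.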

  11. Key Features • Core data model with extensibility, which can be used to create custom data models, even for domains other than Scholarly Communications • Built-in Scholarly Works data model with predefined resources • Extensive Search similar to Advanced Query Syntax (AQS) • Pluggable Authentication and Authorization Security API • Basic web-based user interface to browse and manage resources, with reusable custom controls (Scholarly Works only) • RSS/Atom, OAI-PMH, AtomPub and SWORD services for exposing resource information • Extensive help, with code samples, for developers extending the platform

  12. Additional Features • Change history management for tracking changes to resource metadata and relationships • Various ASP.NET custom controls such as ResourceProperties, ResourceListView, TagCloud, etc. • Import/export BibTeX for managing citations • Prevent duplicates using the Similarity Match API • RDFS parser provides functionality to construct an RDF graph from RDF/XML • OAI-PMH to expose metadata to external search engine crawlers • OAI-ORE support for Resource Maps in RDF/XML • AtomPub implementation for supporting deposits to the repository
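The slide does not say how the Similarity Match API decides that two resources are duplicates. One simple technique, shown here purely as an illustration, is to normalize titles and compare them with `difflib.SequenceMatcher`; the function names and the 0.9 threshold are assumptions, not Zentity's API.

```python
import difflib
import re

def normalize(title):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())

def find_likely_duplicates(new_title, existing_titles, threshold=0.9):
    """Return (title, score) pairs whose similarity to new_title >= threshold.

    SequenceMatcher.ratio() is 2*M/T, where M is the number of matching
    characters and T the total length of both strings.
    """
    target = normalize(new_title)
    matches = []
    for title in existing_titles:
        score = difflib.SequenceMatcher(None, target, normalize(title)).ratio()
        if score >= threshold:
            matches.append((title, round(score, 3)))
    return sorted(matches, key=lambda m: m[1], reverse=True)
```

Character-level ratios are cheap and catch punctuation or conjunction differences ("&" vs "and"); a production system would likely also compare authors, years and identifiers before flagging a duplicate.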

  13. Zentity Stack

  14. Zentity Visual Explorer

  15. Pivot (Microsoft Live Labs)

  16. Zentity + Pivot Viewer

  17. Research Information Centre – a VRE (Virtual Research Environment) Framework • Version 1.0 (Open Source under Ms-PL): http://ric.codeplex.com/

  18. Research Information Centre Framework • Collaborative environment for researchers • Personal site for each researcher and project site for each project • Federated search, tags, annotations, ratings, etc. • Social networking, real-time communication, blogs, wikis • Project site navigation and tools based on the project lifecycle • Version 1.0 (Open Source under Ms-PL): http://ric.codeplex.com/

  19. RIC Framework - Features • Managing a project’s life cycle. • Managing research-related information. • Facilitating Collaboration between team members and other colleagues. • Managing ongoing experiments. • Disseminating results.

  20. RIC Framework – A Sample Research Model • Generic project tools: calendar, task list, RSS feeds, alerts & notifications, federated search, real-time communication, blogs, wikis • Plan Studies: investigate new ideas, search literature, background research, research plan • Obtain Funding: funding sources, application information • Conduct Research: centralized storage, information sharing, project tracking • Disseminate Results: project publications management

  21. RIC Framework – Personal Portal

  22. RIC Framework – Project Portal

  23. RIC 2.0 • Just getting started! • Goals: • More lightweight & modular • Concurrent community development • Support for Cloud deployment scenarios • First features: • SharePoint/RIC to Repository deposit via SWORD • Trident Scientific Workflow Engine integration
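SWORD (v1.x) is a deposit profile layered on AtomPub: a client POSTs a package to a collection URI and describes it with extra HTTP headers. Below is a minimal sketch of what a SharePoint/RIC deposit might send, assuming SWORD 1.3 header names (`X-Packaging`, `X-No-Op`, `X-Verbose`), HTTP Basic authentication and a hypothetical collection URI. The request is only built, not sent, and the METS/DSpace packaging URI is just one example value.

```python
import base64
import urllib.request

def build_sword_deposit(collection_url, package_bytes, filename,
                        packaging="http://purl.org/net/sword-types/METSDSpaceSIP",
                        username=None, password=None, no_op=True):
    """Prepare (but do not send) a SWORD 1.3-style deposit request."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": packaging,                 # how the server should unpack it
        "X-No-Op": "true" if no_op else "false",  # "true" asks for a dry run
        "X-Verbose": "true",                      # request a detailed response
    }
    if username is not None:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        headers["Authorization"] = f"Basic {token}"
    return urllib.request.Request(collection_url, data=package_bytes,
                                  headers=headers, method="POST")
```

Passing the returned request to `urllib.request.urlopen` would perform the deposit; with `X-No-Op: true` a compliant server should validate the request without actually storing the package.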

  24. Clouds

  25. Repositories in the Cloud • We can expect digital library environments to follow trends similar to those in the commercial sector • Leverage computing and data storage in the cloud • Small organizations need access to large-scale resources • Scientists are already experimenting with Amazon S3 and EC2 services • For many of the same reasons • Little/no resource-sharing across library infrastructures • High storage costs • Physical space limitations • Low resource utilization • Excess capacity • High costs of acquiring, operating and reliably maintaining machines are prohibitive • Little support for developers and system operators

  26. Built to be interoperable • Web standards (HTTP, XML, SOAP, REST, etc.) • Programming language support • .NET SDK • Ruby SDK • Java SDK

  27. Cloud Data Centers: Economies of Scale • Data centers range in size from “edge” facilities to megascale (100K to 1M servers) • Offer real economies of scale • Approximate costs for a small-size center (1K servers) and a larger, 400K-server center • Data center estimates from James Hamilton

  28. Windows Azure Platform Availability • Northern Europe • North Central USA • Eastern Asia • Western Europe • South Central USA • Southeast Asia

  29. This has happened before…

  30. Courtesy: DuraCloud

  31. Collaboration (RIC in the Cloud) Research Information Centre Business Productivity Online Suite

  32. Data

  33. Realizing Jim Gray’s Vision for Data-Intensive Scientific Discovery • Jim Gray = eScience • A Transformed Scientific Method

  34. Free PDF download, or Amazon Kindle version & paperback print-on-demand • “The impact of Jim Gray’s thinking is continuing to get people to think in a new way about how data and software are redefining what it means to do science.” — Bill Gates, Chairman, Microsoft Corporation • “One of the greatest challenges for 21st-century science is how we respond to this new era of data-intensive science. This is recognized as a new paradigm beyond experimental and theoretical research and computer simulations of natural phenomena—one that requires new tools, techniques, and ways of working.” — Douglas Kell, University of Manchester • “The contributing authors in this volume have done an extraordinary job of helping to refine an understanding of this new paradigm from a variety of disciplinary perspectives.” — Gordon Bell, Microsoft Research • http://research.microsoft.com/fourthparadigm/

  35. Jim Gray’s Call to Action • Listed 7 key areas for action by funding agencies: 1. Fund both development and support of software tools 2. Invest at all levels of the funding ‘pyramid’ 3. Fund development of ‘generic’ Laboratory Information Management Systems 4. Fund research into scientific data management, data analysis, data visualization, new algorithms and tools

  36. Jim Gray’s Call to Action (continued) Remaining three key areas for action relate to the future of Scholarly Communication and Libraries: 5. Establish Digital Libraries that support the other sciences like the NLM does for Medicine 6. Fund development of new authoring tools and publication models 7. Explore development of digital data libraries that contain scientific data (not just the metadata) and support integration with published literature

  37. A RESTful Interface for Data http://www.odata.org

  38. URL Conventions • Addressing lists and items • Presentation options http://www.odata.org
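The two conventions on the slide can be illustrated with a small URL builder: a bare entity set addresses a list, `EntitySet(key)` addresses one item, and `$`-prefixed system query options such as `$filter`, `$orderby`, `$top`, `$skip`, `$select` and `$format` control presentation. The service root and entity names below are hypothetical, and this is a sketch of the conventions rather than a complete OData client.

```python
from urllib.parse import quote

# Characters left unescaped so key literals and function calls stay readable.
SAFE = "',()"

def key_literal(key):
    """Format an entity key: strings are single-quoted, embedded quotes doubled."""
    if isinstance(key, str):
        return "'" + key.replace("'", "''") + "'"
    return str(key)

def odata_url(service_root, entity_set, key=None, **options):
    """Address a list (entity_set) or one item (entity_set(key)), then append
    keyword options as system query options, e.g. filter=... -> $filter=..."""
    path = entity_set if key is None else f"{entity_set}({key_literal(key)})"
    url = service_root.rstrip("/") + "/" + quote(path, safe=SAFE)
    if options:
        url += "?" + "&".join(
            f"${name}={quote(str(value), safe=SAFE)}" for name, value in options.items()
        )
    return url
```

For example, `odata_url(root, "Products", filter="Price gt 20", top=5)` yields `.../Products?$filter=Price%20gt%2020&$top=5`, the same kind of URL a browser, Excel or any HTTP client can fetch directly.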

  39. OData Producers and Consumers • OData Producers: SharePoint 2010; IBM WebSphere; Windows Azure Table Storage & SQL Azure; Zentity 2.0; services: Facebook Insights, Netflix, Open Government Data Initiative, Open Science Data Initiative, DBpedia • OData Consumers: web browsers; Excel 2010; LINQPad; client libraries for JavaScript, PHP, Java, iPhone (Objective-C), Windows Phone 7 and .NET • http://www.odata.org

  40. OGDI SDK - (http://ogdi.codeplex.com/)

  41. Project Trident – a Scientific Workflow Workbench Share workflows via Author, Execute and Monitor Workflows Compose and modify workflows via drag & drop canvas View data products, performance metrics, and provenance data, and write them directly into repository Version 1.2 (Open Source under Apache 2.0 License): http://tridentworkflow.codeplex.com/
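Trident itself is a .NET workbench; the toy Python runner below only illustrates the general idea of executing workflow steps in dependency order while recording provenance (which step ran, on which inputs, for how long). The step names and the provenance record format are assumptions, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
import time

def run_workflow(steps, deps):
    """Run steps in dependency order and return (outputs, provenance).

    steps: name -> callable taking its dependencies' outputs as keyword args.
    deps:  name -> set of step names it depends on (every name must be a step).
    """
    sorter = TopologicalSorter()
    for name in steps:
        sorter.add(name, *deps.get(name, ()))
    outputs, provenance = {}, []
    for name in sorter.static_order():        # raises CycleError on cycles
        inputs = {d: outputs[d] for d in sorted(deps.get(name, ()))}
        started = time.perf_counter()
        outputs[name] = steps[name](**inputs)
        provenance.append({
            "step": name,
            "inputs": sorted(inputs),
            "seconds": time.perf_counter() - started,
        })
    return outputs, provenance

# Hypothetical three-step pipeline.
steps = {
    "load": lambda: [3, 1, 2],
    "clean": lambda load: sorted(load),
    "summarize": lambda clean: {"min": clean[0], "max": clean[-1]},
}
deps = {"clean": {"load"}, "summarize": {"clean"}}
```

The provenance list is the kind of record that could be written into a repository alongside the data products.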

  42. Data Curation Add-in for Microsoft Excel • Microsoft Research, in partnership with the California Digital Library’s Curation Center • Collaboration with Tricia Cruse & John Kunze • Part of DataONE (an NSF DataNet project) • Proposed functionality under consideration: • Versioning: revision history and original raw data can be protected and recovered • Time stamps: easily determine when the data were created and last updated • “Workbook builder”: select from globally shared standardized layouts for capturing data • Export metadata in standard formats (e.g., a DataCite citation or an EML document that describes the dataset(s) in a workbook) so that researchers can readily share their data • Globally shared vocabulary of terms for data descriptions (e.g., column names), with the ability to add new terms as needed, to enable wide collaboration between researchers • Import term descriptions from the shared vocabulary and annotate them to refine local definitions • Deposit data and metadata into a data archive to preserve and publish research data • PROPOSED
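For the proposed metadata export, a citation can be assembled from a few workbook-level fields. This sketch follows the general DataCite citation pattern `Creator (PublicationYear): Title. Publisher. Identifier`, with an optional version inserted after the title; the `DatasetMetadata` fields are hypothetical and the DOI in the test is a placeholder.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DatasetMetadata:
    """Hypothetical workbook-level metadata an add-in might collect."""
    creators: List[str]      # "Family, Given" form
    publication_year: int
    title: str
    publisher: str
    identifier: str          # e.g. a DOI or ARK URL
    version: str = ""

def datacite_style_citation(md):
    """Format Creator (Year): Title. [Version X.] Publisher. Identifier"""
    parts = [f"{'; '.join(md.creators)} ({md.publication_year}): {md.title}."]
    if md.version:
        parts.append(f"Version {md.version}.")
    parts.append(f"{md.publisher}.")
    parts.append(md.identifier)
    return " ".join(parts)
```

Generating the citation from structured fields, rather than free text, means the same metadata can also feed an EML document or a data-archive deposit.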

  43. Enhanced Publications

  44. GenePattern Reproducible Research Add-in Services: Connects to GenePattern database Relationships: Inline graphics are synchronized to dataset Data: Control and execute query pipelines into GenePattern Data: Resulting data (and provenance) stored within Word document Source code and binary: http://GenepatternWordAddin.codeplex.com

  45. Creative Commons Add-in for Office Intent: Insert Creative Commons licenses from within Word, Excel and PowerPoint Services: Integrates with the Creative Commons Web API to create new licenses Relationships: License information stored as RDF/XML within the document’s OOXML Source code and binary: http://ccaddin2007.codeplex.com
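The slide says license information is stored as RDF/XML inside the document. As a sketch under stated assumptions, the code below builds a minimal RDF/XML fragment with the Creative Commons Rights Expression Language (ccREL) vocabulary, stating that a work (identified by a hypothetical URN) carries a given license. How the add-in actually embeds such a fragment in the OOXML package is not shown.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CC = "http://creativecommons.org/ns#"
# Register prefixes so the serialized XML uses rdf: and cc: rather than ns0:.
ET.register_namespace("rdf", RDF)
ET.register_namespace("cc", CC)

def license_rdf(work_uri, license_uri):
    """Return an RDF/XML string asserting: <work_uri> cc:license <license_uri>."""
    root = ET.Element(f"{{{RDF}}}RDF")
    work = ET.SubElement(root, f"{{{CC}}}Work", {f"{{{RDF}}}about": work_uri})
    ET.SubElement(work, f"{{{CC}}}license", {f"{{{RDF}}}resource": license_uri})
    return ET.tostring(root, encoding="unicode")
```

Because the license is a URI rather than free text, any RDF-aware tool reading the fragment can resolve exactly which license terms apply.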

  46. Ontology Add-in for Word Services: Ontology download web service • John Wilbanks • Phil Bourne • Lynn Fink Intent: Term recognition & disambiguation Relationships: Ontology browser Source code and binary: http://research.microsoft.com/ontology/
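Term recognition against a downloaded ontology can be approximated by a greedy longest-match dictionary lookup. This sketch handles recognition only, not disambiguation; the lexicon and its `EX:` concept IDs are made up for illustration, whereas a real add-in would draw labels and identifiers from the ontology web service.

```python
import re

def tag_terms(text, lexicon):
    """Greedy longest-match lookup of (possibly multi-word) lexicon terms.

    lexicon maps lower-case terms to concept IDs.
    Returns (start, end, matched_text, concept_id) tuples.
    """
    tokens = [(m.start(), m.end(), m.group().lower()) for m in re.finditer(r"\w+", text)]
    max_len = max((len(term.split()) for term in lexicon), default=0)
    hits, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tok for _, _, tok in tokens[i:i + n])
            if candidate in lexicon:
                start, end = tokens[i][0], tokens[i + n - 1][1]
                hits.append((start, end, text[start:end], lexicon[candidate]))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return hits

# Hypothetical lexicon.
LEXICON = {"cell": "EX:0001", "cell membrane": "EX:0002", "protein": "EX:0003"}
```

Longest match ensures "cell membrane" is tagged as one concept instead of "cell" alone; choosing among several concepts that share a label would be the disambiguation step.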

  47. Article Authoring Add-in for Word • Read, convert, and author NLM XML documents • ORE Resource Map creation • v.2 beta 3: http://research.microsoft.com/authoring/

  48. Chemistry Add-in for Word Author/edit 1D and 2D chemistry. Change chemical layout styles. • Peter Murray-Rust • Joe Townsend • Jim Downing Intent: Recognizes chemical dictionary and ontology terms Relationships: Navigate and link referenced chemistry Data: Semantics stored in Chemistry Markup Language:
      <?xml version="1.0" ?>
      <cml version="3" convention="org-synth-report" xmlns="http://www.xml-cml.org/schema">
        <molecule id="m1">
          <atomArray>
            <atom id="a1" elementType="C" x2="-2.9149999618530273" y2="0.7699999809265137" />
            <atom id="a2" elementType="C" x2="-1.5813208400249916" y2="1.5399999809265137" />
            <atom id="a3" elementType="O" x2="-0.24764171819695613" y2="0.7699999809265134" />
            <atom id="a4" elementType="O" x2="-1.5813208400249912" y2="3.0799999809265137" />
            <atom id="a5" elementType="H" x2="-4.248679083681063" y2="1.5399999809265137" />
            <atom id="a6" elementType="H" x2="-2.914999961853028" y2="-0.7700000190734864" />
            <atom id="a7" elementType="H" x2="-4.248679083681063" y2="-1.907348645691087E-8" />
            <atom id="a8" elementType="H" x2="1.0860374036310796" y2="1.5399999809265132" />
          </atomArray>
          <bondArray>
            <bond atomRefs2="a1 a2" order="1" />
            <bond atomRefs2="a2 a3" order="1" />
            <bond atomRefs2="a2 a4" order="2" />
            <bond atomRefs2="a1 a5" order="1" />
            <bond atomRefs2="a1 a6" order="1" />
            <bond atomRefs2="a1 a7" order="1" />
            <bond atomRefs2="a3 a8" order="1" />
          </bondArray>
        </molecule>
      </cml>
      Intelligence: Verifies validity of authored chemistry • Open Source Project (Apache 2.0 License): http://research.microsoft.com/chem4word/
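The CML above encodes acetic acid (CH3COOH). As a sketch of the kind of checking the slide describes as verifying validity, the code below reads atoms and bonds from CML (coordinates omitted for brevity), derives the Hill-order molecular formula, and flags atoms whose summed bond orders differ from a simplified table of typical valences. The valence table and the numeric-only handling of `order` are assumptions; Chem4Word's actual validation is not described on the slide.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CML_NS = {"cml": "http://www.xml-cml.org/schema"}
# Simplified assumption: neutral atoms with their most common valence only.
TYPICAL_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}

def hill_formula(cml_text):
    """Molecular formula in Hill order: C, then H, then others alphabetically
    (all elements alphabetical when no carbon is present)."""
    root = ET.fromstring(cml_text)
    counts = Counter(a.get("elementType") for a in root.iterfind(".//cml:atom", CML_NS))
    order = []
    if "C" in counts:
        order = ["C"] + (["H"] if "H" in counts else [])
    order += sorted(e for e in counts if e not in order)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)

def valence_problems(cml_text):
    """Return (atom_id, element, bond_order_sum) for atoms that look wrong."""
    root = ET.fromstring(cml_text)
    elements = {a.get("id"): a.get("elementType")
                for a in root.iterfind(".//cml:atom", CML_NS)}
    used = Counter()
    for bond in root.iterfind(".//cml:bond", CML_NS):
        order = int(bond.get("order"))          # assumes numeric orders only
        for ref in bond.get("atomRefs2").split():
            used[ref] += order
    return [(aid, el, used[aid]) for aid, el in sorted(elements.items())
            if el in TYPICAL_VALENCE and used[aid] != TYPICAL_VALENCE[el]]

# The slide's molecule, with 2D coordinates dropped.
ACETIC_ACID = """<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C"/> <atom id="a2" elementType="C"/>
      <atom id="a3" elementType="O"/> <atom id="a4" elementType="O"/>
      <atom id="a5" elementType="H"/> <atom id="a6" elementType="H"/>
      <atom id="a7" elementType="H"/> <atom id="a8" elementType="H"/>
    </atomArray>
    <bondArray>
      <bond atomRefs2="a1 a2" order="1"/> <bond atomRefs2="a2 a3" order="1"/>
      <bond atomRefs2="a2 a4" order="2"/> <bond atomRefs2="a1 a5" order="1"/>
      <bond atomRefs2="a1 a6" order="1"/> <bond atomRefs2="a1 a7" order="1"/>
      <bond atomRefs2="a3 a8" order="1"/>
    </bondArray>
  </molecule>
</cml>"""
```

Storing the structure as explicit atoms and typed bonds, rather than as a drawing, is what makes checks like these possible at authoring time.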
