
Standing on the Shoulder?

Standing on the Shoulder?. Curation and the Record of Science Chris Rusbridge JISC/CNI 2006. Contents. Curation Sustainability Data resources Context Access and re-use Citation, archiving and preserving Breaking news: OAIS Review.


Presentation Transcript


  1. Standing on the Shoulder? Curation and the Record of Science Chris Rusbridge JISC/CNI 2006

  2. Contents • Curation • Sustainability • Data resources • Context • Access and re-use • Citation, archiving and preserving • Breaking news: OAIS Review

  3. “If I have seen a little further it is by standing on the shoulders of giants” • Newton’s letter to Hooke (1676); possibly a snide remark linked to Hooke’s stature • Attributed to Bernard of Chartres by John of Salisbury, 1159 (Metalogicon) • Citation of the evidence base is fundamental

  4. Curation • Data increasingly important as evidence • Experimental verifiability (the basis of science) • Unrepeatable observations & experiments (particularly environmental in broadest sense) • Legal, compliance & transactions • Cultural resources • For evidential value, data must be curated

  5. Curation • “Maintaining and adding value to a trusted body of digital information for current and future use”

  6. Lynch remarks • Closing the 2005 Curation Conference • 3 views of digital curation • Collection as a living thing • Whole life process, evolving object(s) • Finite process, handover to preservation

  7. Sustainability and exit strategy • Most critical resource for curation: present and future money supply! • Plan for the long term, but have a succession plan • Sustained approach not project mentality

  8. Data resource stages • Curated data is created… • Observations? Fixed! • Or Acquired… • Data brought/bought from outside • Ingest • Development • Derived, refined, combined, processed data • Potentially many stages

  9. [Diagram labels: University research group 1; University research group 2; NASA research group 3; local decision-making body] Slide from Rajendra Bose

  10. Some illustrations: UK census • 1881 census (UKDA) • Hand-written individual return forms: data conversion issue (reference form available); digitisation and access issues • 1961 census (TNA/NDAD) • First census analysed by computer (first major UK-wide computer project?); individual returns closed until 2062: data preservation issue! • 2001 census (ONS/CDU) • Data corrections and adjustments: curation issue

  11. Khosrow Hejazian

  12. Student databases • Glasgow: 1960s flat files • Converted to Indexed Sequential • Converted to IDMS-X ~1983 • Converted to Ingres ~1994 still current • All students since 1960s • All prior students who have returned • All General Council <100 years • Think of what has changed in that time! • Faculties, depts, grade structures, regulations… • Curation problem!

  13. Another university • Also 3rd or 4th generation system • Previous data not carried forward • Available on tapes • Let’s hope they are properly looked after, re-tensioned, metadata & documentation available… • Dataset preservation nightmare! • (Urban myth? Told by senior manager!)

  14. Curation of emails • Lots of metadata and context (RFC 822) • Often highly distributed • Split conversations • Unknown numbers of copies • Personal choice of clients • Legal requirements! • Controlled filing and controlled deletion needed…
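The metadata and context mentioned on this slide can be illustrated with a minimal sketch, using Python's standard `email` package. The sample message, the addresses and the `curation_metadata` helper are all hypothetical; the point is only that RFC 822-style headers such as `Message-ID`, `In-Reply-To` and `References` carry the linkage needed to reassemble split conversations.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical message, invented for illustration only.
RAW = """\
From: alice@example.org
To: bob@example.org
Subject: Re: Census data corrections
Date: Mon, 09 Jan 2006 10:15:00 +0000
Message-ID: <reply-1@example.org>
In-Reply-To: <orig-1@example.org>
References: <orig-1@example.org>

Agreed, the adjusted tables should be filed with the originals.
"""


def curation_metadata(raw: str) -> dict:
    """Extract the headers a curator might record alongside the message body."""
    msg = message_from_string(raw)
    return {
        "message_id": msg["Message-ID"],
        # In-Reply-To and References link a reply back to its conversation.
        "in_reply_to": msg["In-Reply-To"],
        "references": (msg["References"] or "").split(),
        "sent": parsedate_to_datetime(msg["Date"]).isoformat(),
        "subject": msg["Subject"],
    }


meta = curation_metadata(RAW)
print(meta)
```

Header extraction of this kind does not solve the distribution problem (unknown numbers of copies, personal choice of clients), but it shows what context is already present in each copy.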

  15. [Image labels: TWOMASS (Infrared); SDSS (Visual)] Slide from Rajendra Bose

  16. Slide from Rajendra Bose

  17. Example… • National Virtual Observatory • Johns Hopkins press release: “Scientists working to create the NVO, an online portal for astronomical research unifying dozens of large astronomical databases, confirmed discovery of [a] new brown dwarf recently. The star emerged from a computerized search of information on millions of astronomical objects in two separate astronomical databases. Thanks to an NVO prototype, that search, formerly an endeavor requiring weeks or months of human attention, took approximately two minutes.”

  18. Context • Data meaningless without context • Linkage • Metadata of many kinds • Workflow! • Provenance • Computational lineage • Authenticity

  19. Access and re-use • Ethics and rights control access • Weak in expressing this long-term • Collaboration tools • Annotation, discussion, review • Re-use leading to change and development • “Publication” • Not just in “print” • Underlying data should be “published”, too • Citation…

  20. Citation • Needs a stable resource to cite… OWL Web Ontology Language Reference W3C Proposed Recommendation 15 December 2003 This version: http://www.w3.org/TR/2003/PR-owl-ref-20031215/ Latest version: http://www.w3.org/TR/owl-ref/ Previous version: http://www.w3.org/TR/2003/CR-owl-ref-2003081

  21. Citation… • The date alone (as in common web citation approaches) is not enough! • Cited object likely to have changed… • Citation should link to the cited object as it was! [6] The CIA World Factbook. www.cia.gov/cia/publications/factbook/. Retrieved on 8 Jan 2006.
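One way to see why the retrieval date alone is not enough: a hedged sketch in which a citation also records a fingerprint of the content as retrieved. The functions `cite_snapshot` and `matches_citation`, and the idea of appending a truncated SHA-256 digest to the citation string, are assumptions made for illustration; they are not a citation standard and not something proposed on the slide.

```python
import hashlib
from datetime import date


def cite_snapshot(title: str, url: str, content: bytes, retrieved: date) -> str:
    """Build a citation string that records what the cited object looked like."""
    digest = hashlib.sha256(content).hexdigest()
    # Truncating the digest to 16 hex characters is a readability choice
    # (an assumption of this sketch), not a recommendation.
    return (f"{title}. {url}. Retrieved on {retrieved.isoformat()}. "
            f"sha256:{digest[:16]}")


def matches_citation(citation: str, content: bytes) -> bool:
    """Check whether the content seen today is the content that was cited."""
    cited = citation.rsplit("sha256:", 1)[1]
    return hashlib.sha256(content).hexdigest().startswith(cited)


citation = cite_snapshot(
    "The CIA World Factbook",
    "www.cia.gov/cia/publications/factbook/",
    b"page content as retrieved",  # placeholder bytes, not real page content
    date(2006, 1, 8),
)
print(citation)
print(matches_citation(citation, b"page content after a later revision"))
```

A fingerprint only detects that the cited object has changed. Linking to the object as it was still requires an archived copy of that state.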

  22. Citation needs… • An efficient way to reference and access “archived” past states of a changing dataset (work in progress, Buneman et al) • Less important for original observations • Don’t mess with those data • Less important for incremental datasets • Later stuff should not invalidate earlier • Very important for revisable datasets • Eg Genomics… datasets that result from the combined work of curators, or contain opinions or facts likely to change

  23. XMLArch: System Architecture [Diagram labels: Relational Database; Data Extractor; XML Snapshot at time t; XML Archiver; Pre-processor; Version Merger; XML Archive at time t - 1; XML Archive at time t] Carwyn Edwards
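The idea of merging each new snapshot into a single archive, so that any past state can be referenced, can be sketched roughly as follows. This is not the XMLArch implementation: it assumes flat keyed records rather than hierarchical XML, integer version numbers merged in increasing order, and hypothetical names (`merge_snapshot`, `state_at`). Each value is stored once with the range of versions in which it held.

```python
from typing import Dict, List, Optional, Tuple

# key -> list of (first_version, last_version, value);
# last_version is None while the value is still current.
Archive = Dict[str, List[Tuple[int, Optional[int], str]]]


def merge_snapshot(archive: Archive, snapshot: Dict[str, str], version: int) -> None:
    """Merge the snapshot taken at `version` into the archive, in place."""
    for key, value in snapshot.items():
        history = archive.setdefault(key, [])
        if history and history[-1][1] is None:
            if history[-1][2] == value:
                continue  # unchanged: the open interval simply extends
            first, _, old = history[-1]
            history[-1] = (first, version - 1, old)  # close the old value
        history.append((version, None, value))
    # Close the interval of any record that is missing from this snapshot.
    for key, history in archive.items():
        if key not in snapshot and history and history[-1][1] is None:
            first, _, old = history[-1]
            history[-1] = (first, version - 1, old)


def state_at(archive: Archive, version: int) -> Dict[str, str]:
    """Reconstruct the dataset as it was at `version`."""
    state = {}
    for key, history in archive.items():
        for first, last, value in history:
            if first <= version and (last is None or version <= last):
                state[key] = value
    return state


archive: Archive = {}
merge_snapshot(archive, {"gene1": "A", "gene2": "B"}, 1)
merge_snapshot(archive, {"gene1": "A2", "gene3": "C"}, 2)
print(state_at(archive, 1))
print(state_at(archive, 2))
```

Because unchanged values are stored once, the archive grows with the amount of change rather than with the number of snapshots, which is what makes citing "the dataset as of version n" affordable for revisable datasets.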

  24. Preservation • Use preserves • Money preserves • Redundancy good, monoculture bad? • LOCKSS-type & other approaches… • Bits are fragile and robust • Don’t rely on portable media • Look after them well • Technology changes… • How fast? What impact? • Metadata matters! (Know what you’ve got)
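The "redundancy good" and "bits are fragile" points can be illustrated with a simple fixity audit over several copies. This is a sketch under stated assumptions: the replica names and the `audit_replicas` helper are hypothetical, and the plain majority comparison shown here is far simpler than the polling protocol that LOCKSS actually uses.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def checksum(data: bytes) -> str:
    """Fixity value for one copy of an object."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: Dict[str, bytes]) -> Tuple[str, List[str]]:
    """Return the majority checksum and the names of copies that disagree."""
    digests = {name: checksum(data) for name, data in replicas.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(name for name, d in digests.items() if d != majority)
    return majority, damaged


# Hypothetical copies held at three sites; one has suffered bit damage.
replicas = {
    "site_a": b"census returns 1961",
    "site_b": b"census returns 1961",
    "site_c": b"census returns 1951",
}
majority, damaged = audit_replicas(replicas)
print(damaged)
```

With a single copy there is nothing to compare against, and with identical storage everywhere a common fault can damage every copy at once, which is one reading of "monoculture bad".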

  25. Preservation • We can’t do it alone • Collective responsibility • We can’t rely on anyone else • Institutional responsibility

  26. It’s about time… • From the very short • Good management (don’t under-estimate but don’t over-estimate) • Through the medium term • Curation: use it or lose it • Gather ye metadata while ye may! • Preservation relay • To the very long term • High commitment, high cost, high risk • Harder to do en masse

  27. OAIS • “Announcement of a Comment Period for the Five Year Review of the Reference Model for an Open Archival Information System (OAIS) Standard” • “… must be reviewed every five years and a determination made to reaffirm, modify, or withdraw the existing standard.” • “…any revision must remain backward compatible with regard to major terminology and concepts.” • “… we do not plan to expand the general level of detail” • “… reduce ambiguities and fill in any missing or weak concepts” • Make suggestions and express interest until 30/10/06 • OAIS-support@delight.gsfc.nasa.gov

  28. Are we standing on the hard shoulder (the road side) waiting for a ride? • Or are we supporting the shoulders of giants (building the evidence bases for future science)?
