
What will you need to know?

The role of metadata in keeping digital content alive. Robin Wendler, Harvard University Library. November 2, 2005. r_wendler@harvard.edu. The Crystal Ball. John William Waterhouse. Private collection.


Presentation Transcript


  1. What will you need to know? The role of metadata in keeping digital content alive. Robin Wendler, Harvard University Library. November 2, 2005. r_wendler@harvard.edu. The Crystal Ball. John William Waterhouse. Private collection.

  2. Let me count the ways digital stuff goes bad… • Media become obsolete • Media decay • Formats are superseded • Proprietary formats may be orphaned • Hardware breaks • Software is orphaned • Encryption may hinder preservation • User requirements change

  3. How will you know? • Preservation Planning • Monitor your data through metadata for • Integrity • Renderability • Understandability • Authenticity • Identity • Responsibility • Monitor the community • Format support • Requirements

  4. What will you do? • Identify materials at risk • Analyze options • Categorize objects • Formal characteristics • Purpose • Antecedents • Communicate with owners • Perform preservation actions • Create audit trail. All of these steps use and/or generate metadata.

  5. Gradual understanding • OAIS • (1st workshop 1995; Blue Book 2002) • http://www.ccsds.org/docu/dscgi/ds.py/Get/File-143/650x0b1.pdf • NLA PANDORA (1996-) • http://pandora.nla.gov.au/index.html • CEDARS (1998-2002) • http://www.leeds.ac.uk/cedars/index.htm • NEDLIB (1998-2000) • http://www.kb.nl/coop/nedlib/ • OCLC/RLG Preservation Framework Working Group (2000-2001) • http://www.oclc.org/research/projects/pmwg/wg1.htm • PREMIS (2003-2005) • http://www.oclc.org/research/projects/pmwg/

  6. Preservation Metadata “…the information necessary to carry out, document and evaluate the processes that support the long-term retention and accessibility of digital content.” “Moving digital objects and their metadata across space and time requires standard mechanisms for encoding and exchange” – Brian Lavoie • Viewed from a preservation lens, all metadata is preservation metadata • Categories of metadata overlap; a single piece of metadata can serve many purposes

  7. OAIS Functional Model. Archival information systems are permeated by metadata. Metadata is the difference between a repository and just files on a disk.

  8. OAIS Information Model

  9. OAIS Content Information Framework, Expanded by OCLC/RLG WG. (Diagram columns: OAIS Model; OCLC/RLG Extensions.) Still a framework, not a set of usable, defined elements.

  10. OAIS Preservation Description Information Framework • Reference: provides identifiers and describes the mechanisms by which IDs are assigned • Context: documents relationships of content to its environment (why created, other formats, editions) • Provenance: documents the history, changes, and custody of content • Fixity: documents data integrity checks or validation and verification keys to ensure no unauthorized changes
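
The fixity idea above can be sketched in a few lines of Python. This is a minimal illustration, not something taken from the slides: the function names `compute_digest`, `check_fixity`, and `demo`, the choice of SHA-256, and the file name `page1.tif` are all assumptions made for the example.

```python
import hashlib
import os
import tempfile


def compute_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path, recorded_digest, algorithm="sha256"):
    """True if the file still matches the digest recorded earlier."""
    return compute_digest(path, algorithm) == recorded_digest


def demo():
    """Record a digest, verify it, then alter the file and verify again."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "page1.tif")  # hypothetical file name
        with open(path, "wb") as handle:
            handle.write(b"original bytes")
        recorded = compute_digest(path)          # stored as fixity metadata
        unchanged = check_fixity(path, recorded)
        with open(path, "ab") as handle:         # simulate an unauthorized change
            handle.write(b"!")
        changed = check_fixity(path, recorded)
    return unchanged, changed


if __name__ == "__main__":
    print(demo())
```

The recorded digest and the name of the algorithm are exactly the kind of values a repository would keep as fixity metadata and re-check on a schedule.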

  11. Metadata relevant to preservation • Storage management and fixity • Technical characteristics • Structure • Provenance • Rights • Digital signature trail, where applicable • Intellectual access / description

  12. PREMIS: Preservation Metadata Implementation Strategies • Surveyed implementation of digital repositories, assessed adoption of metadata standards (2003/2004) • Defined a core set of implementable preservation metadata elements (2005) • Implementation-independent • Explicit or implicit • Not reinventing the wheel • Descriptive, rights, agents • Favor values that can be supplied automatically • Defined associated XML schemas • Set up ongoing maintenance activity • http://www.loc.gov/standards/premis/

  13. PREMIS Data Model. (Diagram entities: Intellectual Entities, Rights, Objects, Agents, Events.)

  14. Metadata must adhere to the “right thing”: • Representation: The set of files, including structural metadata, needed for a complete and reasonable rendition of an Intellectual Entity. • File: A named and ordered sequence of bytes that is known by an operating system. • Bitstream: Contiguous or non-contiguous data within a file that has meaningful common properties for preservation purposes. Importance of object modeling: • Any can express an Intellectual Entity • All are kinds of Objects in PREMIS • All can be affected by Events • Rights adhere to all

  15. Sample PREMIS “semantic unit”

  16. Core Object Metadata (Yes, this is to make you sweat): objectIdentifier, preservationLevel, objectCategory, objectCharacteristics, compositionLevel, fixity, messageDigestAlgorithm, messageDigest, messageDigestOriginator, size, format, formatDesignation, formatName, formatVersion, formatRegistry, formatRegistryName, formatRegistryKey, formatRegistryRole, significantProperties, inhibitors, inhibitorType, inhibitorTarget, inhibitorKey, creatingApplication, creatingApplicationName, creatingApplicationVersion, dateCreatedByApplication, originalName, storage, contentLocation, storageMedium, environment, environmentCharacteristic, environmentPurpose, environmentNote, dependency, dependencyName, dependencyIdentifier, software, swName, swVersion, swType, swOtherInformation, swDependency, hardware, hwName, hwType, hwOtherInformation, signatureInformation, signatureInformationEncoding, signer, signatureMethod, signatureValue, signatureValidationRules, signatureProperties, keyInformation

  17. Significant Properties “…objective technical characteristics subjectively considered important, or subjectively determined characteristics.” • Requires identification in advance of what’s crucial, what might be at risk, and how to codify it. Images: Mondrian. Composition with large red plane, yellow, black, gray and blue. 1921. Haags Gemeentemuseum, The Hague. Monet. Waterloo Bridge, London, at Sunset, 1904. Collection of Mr. and Mrs. Paul Mellon. National Gallery of Art.

  18. Rights • Different flavors • Rights • Permissions • Licenses • Submission Agreements • Multiple rights languages • XrML (eXtensible rights Markup Language) • http://www.xrml.org/ • ODRL (Open Digital Rights Language) • http://odrl.net/ • Designed to support DRM • Complex • Patent/licensing issues • PREMIS Rights • Lightweight • Focused on right to preserve • Statements, rather than DRM

  19. PREMIS Permission Statement • permissionStatementIdentifier • linkingObject • grantingAgent • grantingAgreement • permissionGranted • act • restriction • termOfGrant • startDate • endDate • permissionNote
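
As a rough sketch of how a permission statement with a term of grant might be consulted before a preservation action, the Python below models a few of the listed units. It is not the PREMIS schema: the class layout, the `permits` method, treating a missing end date as open-ended, and treating the end date as inclusive are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PermissionStatement:
    """Simplified stand-in for a permission statement (illustrative only)."""
    permission_statement_identifier: str
    linking_object: str
    granting_agent: str
    act: str                          # the permitted act, e.g. "replicate"
    start_date: date                  # termOfGrant startDate
    end_date: Optional[date] = None   # None is assumed to mean open-ended
    restriction: str = ""
    permission_note: str = ""

    def permits(self, act, on):
        """True if `act` is granted on the date `on` (end date inclusive)."""
        if act != self.act:
            return False
        if on < self.start_date:
            return False
        return self.end_date is None or on <= self.end_date


statement = PermissionStatement(
    permission_statement_identifier="P1",   # hypothetical identifiers
    linking_object="obj-1",
    granting_agent="agent-1",
    act="replicate",
    start_date=date(2005, 1, 1),
    end_date=date(2010, 12, 31),
)
```

A statement like this only records what was granted; it does not enforce anything, which matches the slide's point that PREMIS rights are statements rather than DRM.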

  20. Event Metadata • Events in the life of a digital object • What was done • Who did it • When • Who authorized it • What was the outcome • General • PREMIS Events • Specific, e.g. • AES Process History

  21. PREMIS Events • Must be related to one or more objects • Can be related to one or more agents • Consist of • eventIdentifier • eventType • eventDateTime • eventDetail • eventOutcomeInformation • linkingAgentIdentifier • linkingObjectIdentifier
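
One way to picture the event record described above is a small Python data class whose fields mirror the listed semantic units. The class itself, the snake_case field names, and the sample values are assumptions for illustration; only the rule that an event must link to at least one object comes from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PremisEvent:
    """Illustrative event record; field names mirror the units listed above."""
    event_identifier: str
    event_type: str
    event_date_time: str
    event_detail: str = ""
    event_outcome_information: str = ""
    # Agents are optional: an event *can* be related to one or more agents.
    linking_agent_identifiers: List[str] = field(default_factory=list)
    # Objects are required: an event *must* be related to one or more objects.
    linking_object_identifiers: List[str] = field(default_factory=list)

    def __post_init__(self):
        if not self.linking_object_identifiers:
            raise ValueError("an event must link to at least one object")


audit_trail = [
    PremisEvent(
        event_identifier="E1",              # hypothetical sample values
        event_type="migration",
        event_date_time="2005-11-02T10:00:00",
        event_outcome_information="success",
        linking_agent_identifiers=["agent-1"],
        linking_object_identifiers=["obj-1"],
    )
]
```

Appending one such record per preservation action gives the audit trail mentioned earlier in the talk.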

  22. Beyond PREMIS • Format-specific technical metadata • Detailed event metadata • Structural metadata / content packaging • Specific descriptive metadata

  23. Technical Metadata • Formally characterizes • a class of objects • an individual object • Some technical metadata applies to all formats, most is specific to a category of formats, e.g. • NISO Z39.87: Technical Metadata for Still Images • http://www.niso.org/standards/resources/Z39_87_trial_use.pdf • MIX (XML schema for Z39.87): http://www.loc.gov/standards/mix// • Audio Engineering Society Core Technical Metadata for Audio – in draft • TextMD • http://dlib.nyu.edu/METS/textmd.xsd

  24. Structural Metadata • Not only content, but also metadata and ‘binding’ must be preserved • Enables a complex object to be assembled from its constituent parts • Content, Metadata, Relationships, Behaviors

  25. Structural and Packaging Metadata • Many formats developed in different communities, e.g., • Digital library = METS • http://www.loc.gov/standards/mets/ • Commercial media = MPEG 21 DIDL • Available from ISO www.iso.org • Learning objects = IMS Content Packaging • http://www.imsglobal.org/content/packaging/ • Space data = XFDU – still in draft • http://www.ccsds.org/docu/dscgi/ds.py/GetRepr/File-1912/html • Audio-visual = Advanced Authoring Format (AAF) • http://www.aafassociation.org/html/techinfo/index.html • Television = Television Material Exchange Format (MXF) • Available from SMPTE www.smpte.org • No consolidation of formats, but dialog and mapping

  26. METS Basics • METS provides a framework for • Content files • Metadata • Descriptive • Structural • Technical • Provenance • Source • Relationships • Behaviors • Suitable for • Open Archival Information Systems • Archival information package (AIP) • Submission information package (SIP) • Dissemination information package (DIP) • Display and navigation of digital objects • Sharing of digital objects among libraries and archives

  27. RLG’s METS Viewer. (Screenshot; labeled regions: Structural Metadata, Descriptive Metadata, Behaviors, Content.)

  28. Structure of a METS File. METS contains: • metsHdr: header describing the METS file itself • fileSec: inventory or manifest of component files • dmdSec: descriptive metadata • amdSec: administrative metadata (technical, source, rights, provenance) • structMap: structure map, the heart of METS • structLink: structural map linking, i.e., hyperlinks • behaviorSec*: executable behaviors. * Less commonly used

  29. Structure Map <div LABEL="Title page"> <div LABEL="title page" ORDER="1" TYPE= <fptr FILEID="A"> </div> <div LABEL="Preface"> <div LABEL="page i" ORDER="2" ORDERLABEL="i"> <fptr FILEID="B"> </div> <div LABEL="page ii" ORDER="3"> <fptr FILEID="C"> </div> <div LABEL="Chapter 1"> <div LABEL="page 1" ORDER="4"> <fptr FILEID="D"> </div> <div LABEL="page 2" ORDER="5"> <fptr FILEID="E">… Resulting outline: Title page; Preface (page i, page ii); Chapter 1 (page 1, page 2…)
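
To show how a structure map lets software reassemble an object, here is a small Python sketch that parses a structMap and derives both the outline and the page sequence. The XML is a tidied, namespace-free version of the slide's fragment: the closing tags, the nesting of pages under their sections, and the omission of the truncated TYPE attribute are assumptions, and `outline` and `reading_order` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Simplified structMap modeled on the slide (no METS namespace, tags closed).
STRUCT_MAP = """
<structMap>
  <div LABEL="Title page">
    <div LABEL="title page" ORDER="1"><fptr FILEID="A"/></div>
  </div>
  <div LABEL="Preface">
    <div LABEL="page i" ORDER="2" ORDERLABEL="i"><fptr FILEID="B"/></div>
    <div LABEL="page ii" ORDER="3"><fptr FILEID="C"/></div>
  </div>
  <div LABEL="Chapter 1">
    <div LABEL="page 1" ORDER="4"><fptr FILEID="D"/></div>
    <div LABEL="page 2" ORDER="5"><fptr FILEID="E"/></div>
  </div>
</structMap>
"""


def outline(element, depth=0):
    """Walk nested <div>s, returning (depth, label, file ids) tuples."""
    rows = []
    for div in element.findall("div"):
        file_ids = [fptr.get("FILEID") for fptr in div.findall("fptr")]
        rows.append((depth, div.get("LABEL"), file_ids))
        rows.extend(outline(div, depth + 1))
    return rows


def reading_order(root):
    """Return file ids sorted by the ORDER attribute of their <div>."""
    pages = [div for div in root.iter("div") if div.get("ORDER") is not None]
    pages.sort(key=lambda div: int(div.get("ORDER")))
    return [div.find("fptr").get("FILEID") for div in pages]


root = ET.fromstring(STRUCT_MAP)

if __name__ == "__main__":
    for depth, label, file_ids in outline(root):
        print("  " * depth + label, file_ids)
    print(reading_order(root))
```

The FILEID values are only pointers; in a full METS document they would resolve to entries in the fileSec.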

  30. Referring to Metadata. METS does not define descriptive or administrative metadata elements. dmdSec and amdSec are buckets or sockets where externally defined metadata can be supplied or referenced. • Three techniques: • In-line XML • Wrapped base-64 encoded data • Pointers to external information (e.g., URNs, handles) • METS Board endorses a range of recommended “extension schemas”. (Diagram: METS with metsHdr, fileSec, dmdSec, amdSec, structMap, structLink, behaviorSec.)
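
The three techniques can be sketched with Python's standard library. The element names mdWrap, xmlData, binData, and mdRef are METS elements, but everything else here is an assumption for illustration: the helper function names, the omission of the METS namespace, and the sample payloads and URL.

```python
import base64
import xml.etree.ElementTree as ET

# Fully qualified name of the xlink:href attribute.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def inline_xml(section_id, mdtype, payload):
    """Technique 1: embed externally defined XML inside mdWrap/xmlData."""
    section = ET.Element("dmdSec", ID=section_id)
    wrap = ET.SubElement(section, "mdWrap", MDTYPE=mdtype)
    ET.SubElement(wrap, "xmlData").append(payload)
    return section


def wrapped_binary(section_id, mdtype, raw_bytes):
    """Technique 2: wrap non-XML metadata as base-64 text in mdWrap/binData."""
    section = ET.Element("dmdSec", ID=section_id)
    wrap = ET.SubElement(section, "mdWrap", MDTYPE=mdtype)
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    ET.SubElement(wrap, "binData").text = encoded
    return section


def external_pointer(section_id, mdtype, loctype, href):
    """Technique 3: point to metadata held elsewhere with mdRef."""
    section = ET.Element("dmdSec", ID=section_id)
    ET.SubElement(
        section, "mdRef", {"MDTYPE": mdtype, "LOCTYPE": loctype, XLINK_HREF: href}
    )
    return section


if __name__ == "__main__":
    title = ET.Element("title")            # hypothetical stand-in payload
    title.text = "Reports of the president and treasurer"
    sections = [
        inline_xml("D1", "MODS", title),
        wrapped_binary("D2", "MARC", b"raw record bytes"),
        external_pointer("D3", "MARC", "URL", "http://example.org/record"),
    ]
    for section in sections:
        print(ET.tostring(section, encoding="unicode"))
```

In-line XML keeps the package self-contained, base-64 wrapping carries non-XML records unchanged, and a pointer avoids duplication at the cost of depending on the external system staying reachable.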

  31. Use of MODS Extension Schema for Descriptive Metadata. Structure map: <div LABEL="Reports of the president and treasurer" DMDID="D1"> <div LABEL="Chapter 1" DMDID="CH1"> <div LABEL="page 1" ORDER="3"> <fptr FILEID="D"> <div LABEL="page 2" ORDER="4"> <fptr FILEID="E">… (Outline: Book; Chapter 1; page 1; page 2…) Descriptive metadata: <dmdSec ID="D1"> <mdWrap MDTYPE="MODS"> <xmlData> <mods:mods xmlns:mods="http://www.loc.gov/mods/v3" xsi:schemaLocation=http://www.loc.gov/mods/v3 …> <mods:name> <mods:displayForm>Radcliffe College</mods:displayForm> </mods:name> <mods:titleInfo> <mods:title>Reports of the president and treasurer for...</mods:title> </mods:titleInfo> </mods:mods> </xmlData> </mdWrap> Catalog record: <mdRef LOCTYPE="URL" MDTYPE="MARC" xlink:href=http://... BNI3165"/>

  32. Where does all this metadata come from? • Look, Ma, no hands! (as much as possible, that is…) • Don’t make people create it • Machines are faster, cheaper, more accurate • Don’t make people read it • Use controlled values • Expect bulk preservation of like objects • Artisanal preservation is not affordable • Develop and share tools to automate creation, ingest, extraction, exchange

  33. JHOVE: JSTOR/Harvard Object Validation Environment. http://hul.harvard.edu/jhove/ • Format Identification • Format Validation • Well-formedness (Syntactical) • Validity (Semantic) • Format Characterization • Modules for AIFF, ASCII, BYTESTREAM, GIF, HTML, JPEG, JPEG2000, PDF, TIFF, UTF8, WAVE, XML
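
Format identification, the first of the three steps above, is often done by checking a file's leading bytes against known signatures. The Python below is a minimal sketch of that general technique, not JHOVE's implementation or API: the `SIGNATURES` table and the `identify` function are hypothetical, only a few of the listed formats are covered, and falling back to the label "BYTESTREAM" for unrecognized content is an assumption borrowed from the module list.

```python
# Each entry pairs a format label with a test on the file's leading bytes.
SIGNATURES = [
    ("GIF", lambda b: b[:6] in (b"GIF87a", b"GIF89a")),
    ("JPEG", lambda b: b[:3] == b"\xff\xd8\xff"),
    ("PDF", lambda b: b[:5] == b"%PDF-"),
    ("TIFF", lambda b: b[:4] in (b"II*\x00", b"MM\x00*")),
    ("WAVE", lambda b: b[:4] == b"RIFF" and b[8:12] == b"WAVE"),
    ("AIFF", lambda b: b[:4] == b"FORM" and b[8:12] in (b"AIFF", b"AIFC")),
]


def identify(header):
    """Return the first format whose signature matches the header bytes."""
    for name, matches in SIGNATURES:
        if matches(header):
            return name
    return "BYTESTREAM"  # generic fallback when nothing matches


def identify_file(path, header_size=16):
    """Read only the first few bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify(handle.read(header_size))


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n"))
    print(identify(b"GIF89a" + b"\x00" * 10))
```

A matching signature only suggests a format. Validation (well-formedness and validity) and characterization still require parsing the whole file, which is the work the format-specific modules do.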

  34. Automatic Exposure • RLG initiative advocates for capturing standard technical metadata about digital images automatically as part of image creation: • engage manufacturers in dialog about what technical metadata their products currently capture vs what is required for digital archiving • leverage existing industry efforts • identify and evaluate tools for harvesting technical metadata and explore how those tools can scale to serve the entire community.

  35. Format Registries • Detailed documentation of how typed content is represented • Persistent, unambiguous association between public identifiers for digital formats and their documentation • Lists of systems and services which use or produce the format • Must be inclusive, detailed, rigorous, public, and sustainable • Format Registry projects: • PRONOM • http://www.nationalarchives.gov.uk/pronom/ • Global Digital Format Registry • http://hul.harvard.edu/gdfr/ • TOM • http://tom.library.upenn.edu/ • FRED – demonstration system • http://tom.library.upenn.edu/fred/

  36. Other Registries (Extant and Posited) • Registry of Digital Masters • “I will preserve this digital thing” • http://www.oclc.org/digitalpreservation/why/digitalregistry/default.htm • Profile registries • “I restrict this broader standard in the following ways” • Metadata Element/Schema registries • “I use these elements to mean these things” • http://www.xml.org/xml/registry.jsp • http://www.ukoln.ac.uk/projects/iemsr/ • Etc. • Environment registries • Hardware/software configurations in which given software is known to work

  37. Digital Information Community benefits from metadata cooperation… • Develop common understanding • Crucial metadata • Standards! • Trusted repository certification • Acceptable preservation strategies • Needs and costs • Automate capture/creation of metadata • Work with equipment manufacturers • Develop open source tools • Share burden • Monitor/document digital formats • Avoid duplicate digitization
