Creating Working Digital Libraries • Howard Besser • UCLA School of Education & Information • http://www.gseis.ucla.edu/~howard
Creating Working Digital Libraries • Moving from Digital Collections to Digital Libraries • Interoperability • Importance of Standards • Longevity • Best Practices for Managing Digital Projects • Some Wild Musings
Moving from Digital Collections to Digital Libraries • What’s the difference? • Recent history of Library Automation
Developmental Stages • Experiment with methods • Build real operational systems • Build interoperable operational systems
Traditional Digital Library Model (diagram) • four DLs, each with its own search & presentation layer • users
Ideal Digital Library Model (diagram) • four DLs behind a single search & presentation layer • users
Developmental Stages • Experiment with methods • Build real operational systems • Build interoperable operational systems • For DL Initiatives • For OPACs • For I & A Services • For Image Retrieval
Key problems we’re facing • Discovery • Interoperability • Longevity
For Interoperability, Digital Libraries Need Standards • Descriptive Metadata for consistent description • Discovery Metadata for finding • Administrative Metadata for viewing and maintaining • Structural Metadata for navigation • ... Terms & Conditions Metadata for controlling access...
Metadata is not just indexing terms • CBIR attributes used for retrieval on color, shape, texture, etc. • Structural attributes used for page-turning • Administrative attributes used for managing a digital work over time • IPR attributes to limit unauthorized use • Identification attributes to determine what application software is needed to view a particular digital work • Can be located anywhere
Why are Standards and Metadata consensus important? • Managing digital files over time • Longevity • Interoperability • Veracity • Recording in a consistent manner • Will give vendors incentive to create applications that support this
Why Standards? • Why do we need standards? • To make information universally available to users • To facilitate sharing and interchange of information • To preserve information (make it safe from changes in hardware and software) • Standards only work if communities widely accept them, but they’re necessary for communities to work together
Serious Longevity Problems • What we know from prior widespread digital file formats • Images separating from their metadata • Inaccessibility of software needed to view an image • Inability to even decode the file format of an image
Journal Archiving • License, don’t own; may not even be able to obtain the right to make an archival copy • Increasingly no paper back-up at all • Usually we don’t have the important redundancy factor • Stanford’s LOCKSS Project (Lots of Copies Keeps Stuff Safe) and its problems (http://lockss.stanford.edu)
The Short Life of Digital Info: Digital Longevity Problems • Disappearing Information • The Viewing Problem • The Scrambling Problem • The Inter-relation Problem • The Custodial Problem • The Translation Problem
The Viewing Problem • Digital Info requires a whole infrastructure to view it • Each piece of that infrastructure is changing at an incredibly rapid rate • How can we ever hope to deal with all the permutations and combinations?
The Scrambling Problem • Dangers from: • Compression to ease storage & delivery • Container Architecture to enhance digital commerce
The Inter-relation Problem • -Info is increasingly inter-related to other info • -How do we make our own Info persist when it points to and integrates with Info owned by others? • -What is the boundary of a set of information (or even of a digital object)?
The Custodial Problem • How do we decide what to save? • Who should save it? • How should they save it? • -methods for later access: emulation, migration, etc. • -issues of authenticity and evidence
The Translation Problem • Content translated into new delivery devices changes meaning • -A photo vs. a painting • -If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format? • Behaviors
Pieces of the Solution (1/2) • -We need to insist upon clearly readable standardized ways for digital objects to self-identify their formats • -We should discourage scrambling • -We need to better understand how information inter-relates to other Info, and what constitutes the “boundaries” of Info objects
Pieces of the Solution (2/2) • -People and organizations wishing to make information persist need guidelines on how to go about doing it • -We need to better understand how translating from one storage or display format to another affects the meaning of a work • -We need to save the “behaviors” of a digital object, not just its “contents”
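The call for digital objects to self-identify their formats can be illustrated with the leading "magic" bytes that several common image formats already carry. This is a minimal sketch, not a complete format registry; the function names `identify_format` and `identify_file` are hypothetical and only a handful of signatures are listed.

```python
# Leading "magic" bytes for a few widely used image formats.
# The list is deliberately short and illustrative, not a full registry.
MAGIC_NUMBERS = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def identify_format(header: bytes) -> str:
    """Return a format name from the first bytes of a file, or 'unknown'."""
    for magic, name in MAGIC_NUMBERS:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the start of the file and identify it (hypothetical helper)."""
    with open(path, "rb") as f:
        return identify_format(f.read(16))
```

A file whose first bytes match none of the known signatures comes back as "unknown", which is the situation the slide warns about: without a readable self-identification, even decoding the format becomes guesswork.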
Metadata can be the first line of defense • Can tell you • where the file is (if you can’t find the file) • where more info about the file is (if you have the file but most other metadata has become separated) • what the file format is • what the compression scheme is • what application program and version is needed for the file
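The items on this slide can be pictured as a small record stored alongside (or apart from) the file itself. The sketch below assumes a JSON "sidecar" representation; the class name `PreservationRecord`, its field names, and the example values are all hypothetical and are not taken from any metadata standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PreservationRecord:
    """One field per item on the slide; all names are illustrative."""
    file_location: str        # where the file is
    more_info_location: str   # where more info about the file is
    file_format: str          # what the file format is
    compression: str          # what the compression scheme is
    application: str          # application program needed to view it
    application_version: str  # version of that application


def to_sidecar(record: PreservationRecord) -> str:
    """Serialize the record as JSON text that can travel with the file."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def from_sidecar(text: str) -> PreservationRecord:
    """Rebuild the record from its JSON sidecar text."""
    return PreservationRecord(**json.loads(text))


# Hypothetical example values.
example = PreservationRecord(
    file_location="https://example.org/masters/photo-0001.tif",
    more_info_location="https://example.org/catalog/photo-0001",
    file_format="TIFF 6.0",
    compression="none",
    application="any TIFF 6.0 baseline reader",
    application_version="n/a",
)
```

Keeping the record in a plain, widely readable encoding is the design point: the metadata is only a first line of defense if it stays legible after the file and its original software have drifted apart.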
Groups Working on the Big Longevity Problem • http://sunsite.Berkeley.EDU/Imaging/Databases/Longevity/ • CPA Task Force • Getty “Time & Bits” Conference & follow-up • NEDLIB, CURL, Michigan • Internet Archive • Long Now
Migration/Refreshing • Impact on evidential value
Best Practices for Managing Digital Projects • Who will your users be? • Best Practices Guidelines • Workflow and Management Issues
Why are you Managing this Information? • Organizational mission & type • Users • Uses
Scanning Best Practices • Think about users (and potential users), uses, and type of material/collection • Scan at the highest quality that the likely potential users, uses, and material justify, and no higher • Do not let today’s delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery • Many documents that appear to be bitonal are actually better represented with greyscale scans • Include a color bar and ruler in the scan • Use objective measurements to determine scanner settings (do NOT attempt to make the image look good on your particular monitor or use image processing to color correct) • Don’t use lossy compression • Store in a common (standardized) file format • Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)
Digital Object Behaviors • Book example
Metadata Standards (from MOA2) • Administrative Metadata • for enhancing resource management • Structural Metadata • for reflecting internal hierarchies and relationships between parts • Raw/Seared/Cooked
Workflow and Management Issues • Managing multiple image files • Persistent Identification • Making your works accessible throughout the Net
The number of variant forms of a work can be enormous • different views of the same object • different scans of the same photo • different resolutions • different compression schemes • different compression ratios • different file storage formats • different details of the same image • ...
Identification/Provenance • how to deal with different versions (browse, hi-res, medium res) derived from the same scan or different encoding schemes (TIFF, PICT, JFIF) • Vocabulary Standards to express this • VRA Surrogate Categories • CIMI’s “Image Elements”
Persistent IDs--the Problem • Need to separate work ID from work location • URNs probably won’t be ready until 2003 • Becomes a business process issue when one organization maintains the resource and another organization references it (i.e., licensed from vendors or managed by separate administrative structures)
More Persistent IDs--the Approach for today • PURLs • Handles • HTTP redirects • And worry about costs now and conversion costs when URNs become feasible
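What PURLs, Handles, and HTTP redirects have in common is a lookup table that separates a work's ID from its current location. The sketch below models that idea in memory only; it is not the PURL or Handle software, and the class name `PersistentIdResolver`, the ID syntax, and the example.org URLs are assumptions made for illustration.

```python
class PersistentIdResolver:
    """Maps stable work IDs to whatever URL currently holds the work."""

    def __init__(self):
        self._locations = {}

    def register(self, work_id: str, url: str) -> None:
        """Record (or update) the current location for a work ID."""
        self._locations[work_id] = url

    def resolve(self, work_id: str):
        """Return (status, headers) the way a redirecting server might."""
        url = self._locations.get(work_id)
        if url is None:
            return 404, {}
        return 302, {"Location": url}


# Hypothetical usage: the ID stays fixed while the location changes.
resolver = PersistentIdResolver()
resolver.register("work/0001", "https://old-host.example.org/img/0001.tif")
resolver.register("work/0001", "https://new-host.example.org/img/0001.tif")
```

Anyone citing `work/0001` is unaffected by the move; only the table maintained by the responsible organization changes, which is why the slides treat this as a business-process issue as well as a technical one.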
Data Set Management: More issues with referencing IDs • References for mirror sites • References for back-up sites when main site is down or bottle-necked • References for off-site copies and archival copies
Making your works accessible throughout the Net • The DLF/Mellon meeting • An administrative and political issue as much as a technical one
Some Wild Musings • Movement towards packages and away from MARC • The disappearance of OPACs
Containers and Packages of Metadata: Warwick, not MARC • modular • overlapping • extensible • community-based • designed for a networked world to aid commonality between communities while still providing full functionality within each community
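References to mirror and back-up sites can be sketched as an ordered list of locations per ID, with the first reachable one winning. The availability check is passed in as a function so the example stays self-contained; `first_available`, the `is_up` callback, and the example.org URLs are hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, Optional


def first_available(locations: Iterable[str],
                    is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first location the check reports as reachable, else None."""
    for url in locations:
        if is_up(url):
            return url
    return None


# Hypothetical ordering: main site first, then mirror, then archival copy.
COPIES = [
    "https://main.example.org/work/0001",
    "https://mirror.example.org/work/0001",
    "https://archive.example.org/work/0001",
]
```

In a real deployment the check would probe the network or consult a status service; here it is left to the caller so that the ordering policy stays separate from how availability is measured.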
DC Qualifiers • allow one community to express important nuances and qualifications, while still making the basic information available to communities with simple needs • our community can reflect alternate title, transliterated title, and main title, yet they will all be found by a simple Web search on “title”
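The way qualified titles remain findable under plain “title” can be sketched as a "dumb-down" step that strips the qualifier and keeps the base element. The dotted `element.qualifier` keys and the function name `dumb_down` are assumptions made for this illustration, not a syntax taken from the slides.

```python
def dumb_down(record: dict) -> dict:
    """Collapse qualified keys like 'title.alternative' to their base element.

    Values from all qualifiers of the same element are gathered into one
    list, so a simple search on the base element sees every one of them.
    """
    simple = {}
    for key, value in record.items():
        element = key.split(".", 1)[0]  # drop everything after the first dot
        simple.setdefault(element, []).append(value)
    return simple


# Hypothetical record using the three title variants named on the slide.
qualified = {
    "title.main": "Main Title",
    "title.alternative": "Alternate Title",
    "title.transliterated": "Transliterated Title",
    "creator": "Example Creator",
}
```

The community that needs the distinctions keeps the qualified record; a service with simple needs indexes the output of `dumb_down` and still retrieves the work on any of its titles.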
Crosswalks • mapping between differing metadata structures • eliminate the need for monolithic, universally adopted standards • focus on flexibility and interoperability • RDF-based metadata registries
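A crosswalk is, at its simplest, a field-to-field mapping table applied to each record. The sketch below uses a deliberately tiny, simplified mapping from three MARC field tags to Dublin Core element names; real crosswalks work at the subfield level and cover far more fields, and the function name `crosswalk` and the sample record are hypothetical.

```python
# Simplified, illustrative mapping from MARC field tags to Dublin Core
# element names. A production crosswalk would be much more detailed.
MARC_TO_DC = {
    "245": "title",    # title statement
    "100": "creator",  # main entry, personal name
    "650": "subject",  # subject added entry, topical term
}


def crosswalk(record: dict, mapping: dict):
    """Translate a {field: [values]} record; also report unmapped fields."""
    translated = {}
    unmapped = []
    for field, values in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)  # keep track of what the mapping loses
            continue
        translated.setdefault(target, []).extend(values)
    return translated, unmapped


# Hypothetical source record.
marc_like = {
    "245": ["An Example Title"],
    "100": ["Example, Author"],
    "650": ["Digital libraries", "Metadata"],
    "999": ["local note"],
}
```

Returning the unmapped fields makes the cost of the crosswalk visible: flexibility between communities comes with some loss of detail, and it helps to know exactly which fields were left behind.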
Do we still need OPACs? • Why repeat almost identical bibliographic descriptions in each local system? • Why not store only local information locally, and link to bibliographic descriptions stored in the major utilities? • Could our acquisition systems for monographs begin to use the acquisition systems imposed on us by our parent organizations (like those for supplies)?
Creating Working Digital Libraries • Moving from Digital Collections to Digital Libraries • Interoperability • Importance of Standards • Longevity • Best Practices for Managing Digital Projects • Some Wild Musings
Creating Working Digital Libraries • Howard Besser • UCLA School of Education & Information • http://www.getty.edu/gri/standard/intrometadata/ • http://www.ifla.org/II/metadata.htm • http://sunsite.Berkeley.EDU/Imaging/Databases/#standards • http://sunsite.Berkeley.EDU/moa2/ • http://sunsite.Berkeley.EDU/Longevity/ • http://purl.oclc.org/metadata/dublin_core/ • http://www.gseis.ucla.edu/~howard/image-meta.html • http://www.gseis.ucla.edu/~howard/Metadata/UC-May00/ • http://sunsite.berkeley.edu/Metadata/sp2000.html • http://www.gseis.ucla.edu/~howard/