Down and Dirty Digitization: Everything you need to know about putting content online

Roy Tennant, California Digital Library


Presentation Transcript


  1. Down and Dirty Digitization: Everything you need to know about putting content online • Roy Tennant, California Digital Library

  2. Outline • Project Planning • Selecting Material to Digitize • Digitization Purpose • Basic Imaging Principles • Capturing Images • Editing Images • Best Practices • Conversion to Text • Metadata • Access Systems • Skills Required of Staff • Preservation

  3. Project Planning • Who will do the work? • What systems will be required? • What are the specifications for images and metadata? • How much will the project cost? • Who will own and manage the digital products that will be produced? Steve Chapman, from Handbook for Digital Projects, NEDCC

  4. Selecting Material to Digitize • Publishing rights • Available support/funding opportunity • Critical mass • Uniqueness • Reputation • Audience and potential use • Diversity of material type • Ability to stand on its own and fit in with other collections

  5. What Do We Preserve? • The body or the soul? • The artifact • The intellectual content • How do we decide that the artifact has preservation value? • Who decides?

  6. The Artifact • The “look and feel” • The experience of interacting with a specific object • Consequences: • Choices for providing access are limited • Time and money spent on recreating the artifact may be better spent on increasing access • In some cases, preserving the look and feel actually harms other uses

  7. Written Material • Handwritten texts (diaries, etc.), or those with handwritten notations (manuscript drafts, etc.) can easily be considered to have artifactual value • But how much artifactual value do printed texts have? • And born-digital texts? • What’s it worth to you?

  8. “If the goal of preservation is persistent utility, then functionality rather than aesthetics should drive system design.” - Stephen Chapman, “Content Follows Form: Preservation via Systems Design,” Microform & Imaging Review

  9. Persistent Utility • Form must be allowed to be altered or destroyed to retain or enhance function • If function cannot be retained or enhanced, then form should be preserved

  10. Considerations for Retaining Items in Original Format • Age • Evidential value • Aesthetic value • Scarcity • Associational value • Market value • Exhibition value

  11. “The issue is not to evaluate the artifact per se to determine what survives and what does not…The issue is the need to agree on a method for interrogating the individual artifact, that would, in a climate of finite resources, help make a good decision about whether and how to preserve it.” — Council on Library and Information Resources, The Evidence in Hand: the Report of the Task Force on the Artifact in Library Collections

  12. How Do We Preserve It? Preservation costs by method calculated by the Library of Congress Preservation Directorate

  13. Types of Materials • Printed text/simple line art • Mixed • Halftones • Manuscripts • Continuous tone (From Anne Kenney et al., Moving Theory into Practice)

  14. Benchmarking • The process whereby you determine your digitization requirements using the material you will digitize

  15. Resolution • The number of pixels in a given area defines the resolution of an image • [Figure labels: One pixel; 1”; 500 x 1,000 pixels]
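
As a rough illustration of the definition on this slide, resolution along one dimension is the pixel count divided by the physical length, and the pixel dimensions of a scan are the physical size multiplied by the scanning resolution. The sketch below is only an example; the function names and the letter-size original are assumptions, not part of the presentation.

```python
from __future__ import annotations


def dots_per_inch(pixels: int, inches: float) -> float:
    """Return the resolution (pixels per inch) along one dimension."""
    if inches <= 0:
        raise ValueError("inches must be positive")
    return pixels / inches


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return the (width, height) in pixels of an original scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


if __name__ == "__main__":
    # Hypothetical example: a letter-size page scanned at 300 dpi.
    print(pixel_dimensions(8.5, 11, 300))
```

Doubling the scanning resolution doubles both pixel dimensions, so the pixel count (and the uncompressed file size) grows fourfold.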

  16. Dynamic Range (bit-depth) • [Figure: sample images labeled 1 bit, 8 bit grayscale, 8 bit color, and 24 bit color; format labels GIF, GIF, JPEG] • 1 bit = black or white • 8 bits = 256 shades • 16 bits = thousands • 24 bits = millions • 36 bits = billions
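
The figures on this slide follow from the fact that n bits can encode 2 to the power n distinct values. A minimal Python sketch of that arithmetic (the function name is made up for illustration):

```python
def tonal_values(bits: int) -> int:
    """Return how many distinct values a pixel of the given bit depth can hold."""
    if bits < 1:
        raise ValueError("bit depth must be at least 1")
    return 2 ** bits


if __name__ == "__main__":
    # Bit depths listed on the slide.
    for bits in (1, 8, 16, 24, 36):
        print(f"{bits:>2} bits = {tonal_values(bits):,} values")
```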

  17. RGB Color Space • Color channels: Red, Green, Blue • 8 bits per channel = 24 bit color image • 12 bits per channel = 36 bit color image

  18. Image Compression • Lossless — the image is unchanged after compression (no image data is lost) • Typical file size: 50% of original • Example: LZW compression • Lossy — the image is altered after compression (image data is lost) • Example: JPEG
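
To make the lossless case concrete, the sketch below compresses some bytes and checks that decompression returns exactly the original data. It uses DEFLATE from Python's standard `zlib` module as a stand-in, not the LZW scheme named on the slide, and the sample data is a hypothetical substitute for real image bytes. The ratio achieved depends entirely on the content, so the 50% figure above should not be expected from this toy input.

```python
import zlib


def compress_lossless(data: bytes) -> bytes:
    """Compress bytes with DEFLATE (a lossless scheme, used here in place of LZW)."""
    return zlib.compress(data)


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Return compressed size as a fraction of the original size."""
    return len(compressed) / len(original)


# Hypothetical stand-in for raw image data: alternating black and white pixels.
sample = bytes([0, 0, 0, 255, 255, 255]) * 10_000

if __name__ == "__main__":
    packed = compress_lossless(sample)
    restored = zlib.decompress(packed)
    print("identical after round trip:", restored == sample)
    print(f"ratio: {compression_ratio(sample, packed):.4f}")
```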

  19. TIFF • Tagged Image File Format • Most often used to save “master versions” of images (unedited) • Can be compressed or uncompressed

  20. CompuServe GIF • Graphics Interchange Format (GIF) • Maximum 8 bits/pixel: 256 colors (shades) • Good for: • Text and line art • Thumbnails • Not good for: • Full-color pictures • Anything that requires more than 256 colors

  21. JPEG • Joint Photographic Experts Group • JPEG is actually a compression scheme; the image file format is JFIF (JPEG File Interchange Format) • Good for: • Full-color pictures • Anything that requires more than 256 colors • Not good for: • Text or line art

  22. New Image Formats • Portable Network Graphics (PNG) - from the W3C to replace the CompuServe GIF format and provide more capabilities • JPEG2000 - An upgrade of the JPEG format • FlashPix - from a consortium of commercial companies, to provide much higher-resolution images in a way that allows speedy network delivery • MrSID - From LizardTech, good for large format materials (maps, panoramic photos, etc.)

  23. Capturing Images • Technologies • Digital Cameras • Flatbed Scanners • Film Scanners • Kodak PhotoCD • Outsourcing • Standards and Best Practices

  24. Digital Cameras • Phase One PowerPhase FX: 10,500 x 12,600 pixels, 760MB (48 bit RGB) • BetterLight Super6K: 6,000 x 8,000 pixels, 136MB (24 bit RGB) $16,990
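
The file sizes quoted for these cameras can be roughly reproduced by multiplying pixel count by bytes per pixel. The sketch below assumes uncompressed data and binary megabytes (1 MB = 1,048,576 bytes); the function names are illustrative only, and the results land near, not exactly on, the slide's figures.

```python
def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Return the size in bytes of an uncompressed image."""
    return width_px * height_px * bits_per_pixel // 8


def to_megabytes(num_bytes: int) -> float:
    """Convert bytes to binary megabytes (assumption: 1 MB = 1,048,576 bytes)."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Pixel dimensions and bit depths as given on the slide.
    for label, w, h, bpp in (
        ("PowerPhase FX", 10_500, 12_600, 48),
        ("Super6K", 6_000, 8_000, 24),
    ):
        size = uncompressed_bytes(w, h, bpp)
        print(f"{label}: {to_megabytes(size):.0f} MB")
```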

  25. Flatbed Scanners • Minimum requirements: • 600 x 1200 dpi optical resolution • 36-bit color • Not for slides or transparencies; best for 8 1/2” x 11” or 8 1/2” x 14” originals • Sheet feeder (often optional) helpful for digitizing text

  26. Film Scanners • For 35mm slides and negatives; others available for larger formats • $600 - $3,000 • Most around 2700-4000 dpi, 30-36 bit color

  27. Kodak PhotoCD • Take pictures with a normal camera, but have your pictures “developed” onto a PhotoCD • Uses a proprietary image format (ImagePAC), but offers very high resolution (4 different resolutions)

  28. Outsourcing: Pros and Cons • Benefits: • No ramp-up costs (both time and money) • Probably higher quality, at least to begin with • High volume capability • Drawbacks: • May be more costly if you have underutilized staff time • No internal capability or experience developed (that is, when the money runs out, so does your chance to do anything more) • Rare items may require in-house digitization

  29. Outsourcing: How • Write an RFQ (Request for Quote) outlining: • Type and amount of material being digitized • Quality requirements • Volume per unit of time requirements • For RFQ guidance and samples, see RLG Tools for Digital Imaging: • www.rlg.org/preserv/RLGtools.html

  30. Digital Image Work Flow • [Diagram labels: Rotate, Crop, Retouch, Brightness/Contrast; Resize, Sharpen; Original TIFF or PCD, 10-100+MB; JPEG, 100K; GIF, 10K; Indexed Color Space; RGB Color Space; Stored offline; Stored online]

  31. Editing Images • Rotating • Cropping • Retouching • Adjusting • Resizing • Sharpening • Saving

  32. Image Editing Demonstration

  33. Conversion to Text • Optical Character Recognition (OCR) software is required (Caere OmniPage Pro, Xerox TextBridge, etc.) • Quality and typography of originals are key • When OCR accuracy is below 99.5%, it is less expensive to have the text re-keyed offshore • For some applications, uncorrected text is sufficient
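
The 99.5% rule of thumb above can be expressed as a simple check. This sketch assumes accuracy is measured per character on a proofread sample; that measurement method, the function names, and the sample counts are assumptions for illustration, and only the threshold comes from the slide.

```python
REKEY_THRESHOLD = 0.995  # from the slide: below this, re-keying is said to cost less


def character_accuracy(total_chars: int, error_chars: int) -> float:
    """Return the fraction of characters recognized correctly in a checked sample."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    if not 0 <= error_chars <= total_chars:
        raise ValueError("error_chars must be between 0 and total_chars")
    return 1 - error_chars / total_chars


def should_rekey(total_chars: int, error_chars: int,
                 threshold: float = REKEY_THRESHOLD) -> bool:
    """Return True when measured accuracy falls below the re-keying threshold."""
    return character_accuracy(total_chars, error_chars) < threshold


if __name__ == "__main__":
    # Hypothetical sample: 2,000 characters proofread, 20 errors found.
    print(should_rekey(2000, 20))
```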

  34. Imaging Best Practices • General guidelines for archival versions: • Photos, illustrations, maps, etc.: • 300-600dpi • 24-36 bit color • B/W Text document: • 300-600dpi • 8 bit grayscale • Negatives and Slides: • 2000-4000 pixels in longest dimension • 24-36 bit color for color; 8 bit grayscale for B/W
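
One way to put these guidelines to work is to record them as data and check a planned capture against the lower bound. The sketch below is a hypothetical encoding: the dictionary keys and function name are invented, and negatives and slides are left out because their guideline is stated in pixels rather than dpi.

```python
# Archival guideline ranges from the slide, as (minimum, maximum).
GUIDELINES = {
    "photo": {"dpi": (300, 600), "bits": (24, 36)},    # photos, illustrations, maps
    "bw_text": {"dpi": (300, 600), "bits": (8, 8)},    # B/W text documents
}


def meets_minimum(kind: str, dpi: int, bits: int) -> bool:
    """Return True if a capture setting reaches the low end of the guideline."""
    guide = GUIDELINES[kind]
    return dpi >= guide["dpi"][0] and bits >= guide["bits"][0]


if __name__ == "__main__":
    print(meets_minimum("photo", 300, 24))
    print(meets_minimum("bw_text", 200, 8))
```

This only tests the minimum; it does not flag settings above the range.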

  35. Imaging Best Practices “The key to image quality is not to capture at the highest resolution or bit depth possible, but to match the conversion process to the informational content of the original, and to scan at that level--no more, no less.”— Moving Theory Into Practice

  36. Metadata: Types • Structured description of an object or collection of objects • Three basic types: • descriptive - e.g., title, creator, subject - used for discovery • administrative - e.g., resolution, bit depth - used for managing the collection • structural - e.g., table of contents page, page 34, etc. - used for navigation

  37. Metadata: Appropriate Level • Collection-level access: • Discovery metadata describes the collection • Example: Archival finding aid encoded in SGML; see http://www.oac.cdlib.org/ • Item-level access: • Discovery metadata describes the item • Example: individual metadata records for each item; see http://jarda.cdlib.org/cgi-bin/imagesearch.pl

  38. Collection Level Access • [Diagram labels: Search Interface (library catalog or dedicated); Individual Finding Aid (shown twice); Images]

  39. Item Level Access • [Diagram labels: Search Interface (dedicated); Finding Aids; Images]

  40. jarda.cdlib.org/search.html

  41. Metadata: Granularity • <name>William Randolph Hearst</name> • <name> <first>William</first> <middle>Randolph</middle> <last>Hearst</last></name> • Consider all uses for the metadata • Design for the most granular use • Store it in a machine-parseable format
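
The advantage of the more granular form is that the simpler one can always be derived from it, while the reverse requires guessing where one name part ends and the next begins. A minimal sketch using Python's standard `xml.etree.ElementTree`; the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_name(first: str, middle: str, last: str) -> ET.Element:
    """Build the granular <name> element shown on the slide."""
    name = ET.Element("name")
    for tag, value in (("first", first), ("middle", middle), ("last", last)):
        ET.SubElement(name, tag).text = value
    return name


def display_name(name: ET.Element) -> str:
    """Collapse a granular <name> into a single display string."""
    parts = [name.findtext(tag, default="") for tag in ("first", "middle", "last")]
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    element = build_name("William", "Randolph", "Hearst")
    print(ET.tostring(element, encoding="unicode"))
    print(display_name(element))
```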

  42. Metadata: Qualification • <name role="creator">William Randolph Hearst</name> • <subject scheme="LCSH">Builder -- Castles -- Southern California</subject>
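
Qualifiers such as `role` and `scheme` let software select exactly the values it needs. The sketch below shows one possible way to filter on them with the standard library; the `<record>` wrapper element and the function name are assumptions added so the two slide examples form a single parseable document.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# The two elements from the slide, wrapped in a hypothetical <record> root.
RECORD = """<record>
  <name role="creator">William Randolph Hearst</name>
  <subject scheme="LCSH">Builder -- Castles -- Southern California</subject>
</record>"""


def values_with_attribute(xml_text: str, tag: str, attr: str, value: str) -> list[str]:
    """Return the text of every <tag> element whose attribute equals `value`."""
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root.iter(tag) if el.get(attr) == value]


if __name__ == "__main__":
    print(values_with_attribute(RECORD, "name", "role", "creator"))
    print(values_with_attribute(RECORD, "subject", "scheme", "LCSH"))
```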

  43. Metadata: Machine Parseability • The ability to pull apart and reconstruct metadata via software • For example, this: <name> <first>William</first> <middle>Randolph</middle> <last>Hearst</last></name> • Can easily become this: <DC.creator>Hearst, William Randolph</DC.creator>
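
The transformation on this slide can be sketched in a few lines of Python with the standard `xml.etree.ElementTree` module. This is one possible approach, not a method described in the presentation; the function name and the "Last, First Middle" ordering rule are assumptions inferred from the example.

```python
import xml.etree.ElementTree as ET

GRANULAR = (
    "<name><first>William</first><middle>Randolph</middle>"
    "<last>Hearst</last></name>"
)


def to_dc_creator(name_xml: str) -> str:
    """Convert a granular <name> element into a <DC.creator> element string."""
    name = ET.fromstring(name_xml)
    first = name.findtext("first", default="").strip()
    middle = name.findtext("middle", default="").strip()
    last = name.findtext("last", default="").strip()

    # Assumed ordering rule: "Last, First Middle".
    given = " ".join(part for part in (first, middle) if part)
    creator = ET.Element("DC.creator")
    creator.text = f"{last}, {given}" if given else last
    return ET.tostring(creator, encoding="unicode")


if __name__ == "__main__":
    print(to_dc_creator(GRANULAR))
```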

  44. Metadata: Standards • Metadata: • Collection Level: • Encoded Archival Description (EAD) - lcweb.loc.gov/ead/ • Item Level: • MARC • Dublin Core - purl.org/DC/ • MODS - www.loc.gov/standards/mods/ • Harvesting: • Open Archives Initiative, www.openarchives.org

  45. Access Systems • Exhibit • Browse • Search

  46. Access Systems: Exhibit • Goals: • Inviting • Easy to navigate • Highlight selected parts of a collection • Teach • Requirements: • Great graphic design • Informative and succinct commentary • Interesting subject matter
