
CS 502: Computing Methods for Digital Libraries

Lecture 9: Conversion to Digital Formats. Anne Kenney, Cornell University Library.


Presentation Transcript


  1. CS 502: Computing Methods for Digital Libraries • Lecture 9: Conversion to Digital Formats • Anne Kenney, Cornell University Library

  2. What are Digital Images? • Electronic snapshots taken of a scene or scanned from documents • sampled and mapped as a grid of dots or picture elements (pixels) • each pixel is assigned a tonal value (black, white, grays, colors), represented in binary code • the code is stored as is or reduced (compressed) • the code is read and interpreted to create an analog version

  3. Four Scanning Methods • Bitonal • Grayscale • Special Treatment • Color

  4. Digital Image Quality is Governed By: • resolution and threshold • bit depth • image enhancement • color management • compression • system performance • operator judgment and care

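  5. Resolution • determined by the number of pixels used to represent the image • expressed in dots per inch (dpi); the dots actually fill each square inch, so the pixel count scales with the square of the dpi • increasing resolution increases the level of detail captured and geometrically increases file size (doubling the dpi quadruples the pixel count)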
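As a rough illustration of the file-size claim on slide 5, the sketch below computes uncompressed image size from physical dimensions, dpi, and bit depth. The function name and the 8 x 10 inch example page are assumptions made for illustration; real file formats add headers and may pad rows, so the figures are approximations.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate raw image size, ignoring file headers and row padding."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # Hypothetical 8 x 10 inch page captured as a 1-bit (bitonal) image.
    for dpi in (200, 300, 600):
        size = uncompressed_size_bytes(8, 10, dpi, 1)
        print(f"{dpi} dpi, 1-bit: {size:,} bytes")
```

Going from 300 dpi to 600 dpi doubles the pixel count in each direction, so the raw size grows by a factor of four.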

  6. Effects of Resolution [sample images at 600 dpi, 300 dpi, and 200 dpi]

  7. Threshold Setting in Bitonal Scanning • defines the point on a scale from 0 to 255 at which gray values are interpreted as either black or white

  8. Effects of Threshold [sample images at threshold = 60 and threshold = 100]
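The thresholding described on slides 7 and 8 can be sketched in a few lines. This is a minimal illustration, not a scanner's actual algorithm: it assumes 0 is black and 255 is white, and that values below the threshold become black (scanner software may use the opposite convention). The function name and sample pixel values are hypothetical.

```python
def to_bitonal(gray_rows, threshold):
    """Map 8-bit gray values (0-255) to 0 (black) or 1 (white).

    Assumption: values strictly below `threshold` are rendered black.
    """
    return [[0 if value < threshold else 1 for value in row] for row in gray_rows]


if __name__ == "__main__":
    sample = [[50, 80, 120]]  # hypothetical gray values for three pixels
    print(to_bitonal(sample, 60))
    print(to_bitonal(sample, 100))
```

Under this convention, raising the threshold from 60 to 100 turns more mid-gray pixels black, which thickens strokes and can fill in fine detail.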

  9. Bit Depth • number of bits used to represent each pixel, typically 8 bits or more per channel • representing 256 (2^8) levels for grayscale and 16.7 million (2^24) levels for color • example: 8-bit grayscale pixel: 00000000 = black, 11111111 = white

  10. Bit Depth • increasing bit depth increases the level of gray or color information that can be represented and arithmetically increases file size • affects resolution requirements
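To make the bit-depth arithmetic concrete, the sketch below counts the tonal levels available at a given bit depth and requantizes an 8-bit gray value to fewer bits. Dropping low-order bits is only one simple way to reduce bit depth and is an assumption here; the function names are hypothetical.

```python
def tonal_levels(bits):
    """Number of distinct values representable with `bits` bits per pixel."""
    return 2 ** bits


def reduce_bit_depth(value_8bit, bits):
    """Requantize an 8-bit value (0-255) to `bits` bits by dropping low-order bits."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    return value_8bit >> (8 - bits)


if __name__ == "__main__":
    for bits in (1, 3, 8, 24):
        print(f"{bits}-bit: {tonal_levels(bits):,} levels")
    print(reduce_bit_depth(255, 3))  # white in a 3-bit scale
```

A 3-bit image has only 8 gray levels, which is why smooth tonal gradations break into visible bands when compared with an 8-bit image.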

  11. Effects of Grayscale on Image Quality [sample images at 3-bit gray and 8-bit gray]

  12. Image Enhancement • can be used to improve image capture • use raises concerns about fidelity and authenticity

  13. Effects of Filters [sample images with no filters used and with maximum enhancement]

  14. Image Editing

  15. Compression • reduces file size for processing, storage, transmission, and display • image quality may be affected by the compression techniques used and the level of compression applied

  16. Compression Variables • lossless versus lossy compression • proprietary vs. open schemes • level of industry support • bitonal vs. gray/color

  17. Common Compression Schemes • bitonal: ITU Group 4 (lossless); JBIG (ISO 11544, lossless); CPC (lossy); DigiPaper • grayscale/color: LZW (lossless); JPEG (lossy); Kodak Image Pac (“visually lossless”); fractal and wavelet compression

  18. Effects of JPEG Compression [sample images, 300 dpi, 8-bit grayscale: uncompressed TIFF and JPEG at 18.5:1 compression]

  19. Compression Observations • the richer the file, the more efficient and sustainable the compression • the more complex the image, the poorer the compression
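The observation that complex images compress poorly can be demonstrated with any lossless compressor. The sketch below uses Python's standard `zlib` module (DEFLATE), which is a stand-in and not one of the schemes named on slide 17; the "flat" and "noisy" byte strings are synthetic stand-ins for a blank page and a highly detailed image.

```python
import random
import zlib


def compression_ratio(data):
    """Original size divided by losslessly compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, 9))


# Synthetic 8-bit "images": a uniform white area and random noise.
flat = bytes([255]) * 10_000
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(10_000))

if __name__ == "__main__":
    print(f"flat:  {compression_ratio(flat):.1f}:1")
    print(f"noisy: {compression_ratio(noisy):.2f}:1")
    # Lossless means the original bytes are recovered exactly.
    print(zlib.decompress(zlib.compress(noisy)) == noisy)
```

The uniform data shrinks dramatically while the noise barely compresses at all, and in both cases decompression restores the original bytes exactly.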

  20. Equipment used and its performance over time • scanners offer wide range of capabilities to capture detail, dynamic range, and color • scanners with same stated functionality can produce different results • calibration, age of equipment, and environment affect quality

  21. Equipment used and its performance over time • attributes and capabilities of monitor and/or printer are also factors • assess quality visually and computationally • use targets • control QC environment • increasing availability of software to assess resolution, tone, color, artifacts

  22. Image Capture: Create digital objects rich enough to be useful over time in the most cost-effective manner.

  23. How to determine what’s good enough? • Connoisseurship of document attributes • Objective characterizations • Translation between analog and digital: measurement to scanning requirement to corresponding image metrics • e.g., detail size → resolution → MTF • tonal range → bit depth → signal-to-noise ratio

  24. Case Study • Brittle Books: printed text, use of metal type, commercial publishers, objective measurement, use of Quality Index from micrographics • 600 dpi 1-bit capture adequately preserves informational content of text-based materials
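The slide does not state the Quality Index formula. The sketch below assumes the bitonal benchmarking formula from Cornell's digital imaging guidance, QI = (dpi × 0.039 × h) / 3, where h is the height in millimeters of the smallest significant character and 0.039 approximates the millimeter-to-inch conversion. Treat both the formula and the 1 mm example character as assumptions rather than content from this lecture.

```python
MM_TO_INCH = 0.039  # approximate conversion used in the assumed formula


def bitonal_quality_index(dpi, char_height_mm):
    """Assumed bitonal QI formula: (dpi * 0.039 * h) / 3."""
    return dpi * MM_TO_INCH * char_height_mm / 3


def required_bitonal_dpi(target_qi, char_height_mm):
    """Invert the assumed formula to find the dpi needed for a target QI."""
    return 3 * target_qi / (MM_TO_INCH * char_height_mm)


if __name__ == "__main__":
    # Hypothetical smallest character height of 1 mm.
    print(f"QI at 600 dpi: {bitonal_quality_index(600, 1.0):.1f}")
    print(f"dpi for QI 8:  {required_bitonal_dpi(8, 1.0):.0f}")
```

Under these assumptions, 600 dpi bitonal capture of 1 mm characters gives a QI just under 8, which is in line with the case study's conclusion that 600 dpi 1-bit scanning is adequate for printed text.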

  25. Ensuring Full Informational Capture: “No More, No Less” [chart: image quality and utility plotted against cost, with the desired point of capture marked]

  26. Create One Scan To Serve Multiple Uses • Derive alternative formats/approaches to meet current and future information needs • Base “derivative” requirements on document attributes, technical infrastructure, user requirements, and cost • Understand technical links affecting presentation and utility of derivatives

  27. User Requirements • completeness • legibility • speed of delivery • “cooked” files

  28. Derivatives from a Digital Master • the richer the image, the better the derivative • a derivative from a rich file is superior in quality to one from a poorer scan • the richer the image, the better the image processing

  29. [Figure: an 8” x 10” document scanned at 200 dpi (1,600 x 2,000 pixels) compared with an 800 x 600 pixel monitor. At 100 dpi the document is 800 x 1,000 pixels; at 60 dpi it is 480 x 600 pixels.]
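The numbers on slide 29 follow from multiplying the document's dimensions in inches by the dpi. A minimal sketch of that arithmetic follows; the function names are hypothetical.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a document rendered at the given dpi."""
    return (round(width_in * dpi), round(height_in * dpi))


def fit_dpi(width_in, height_in, screen_w, screen_h):
    """Highest dpi at which the whole document fits on the screen."""
    return min(screen_w / width_in, screen_h / height_in)


if __name__ == "__main__":
    print(pixel_dimensions(8, 10, 200))  # the 200 dpi master
    print(pixel_dimensions(8, 10, 100))  # fills the 800-pixel screen width
    print(pixel_dimensions(8, 10, 60))   # fills the 600-pixel screen height
    print(fit_dpi(8, 10, 800, 600))
```

An 8 x 10 inch page fits entirely on an 800 x 600 monitor only at 60 dpi or below; at 100 dpi it matches the screen width but requires vertical scrolling.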

  30. Compression/File Format Comparison for Derivative Files [sample images] • GIF: compressed 6:1 (NARA) • JPEG: compressed 20:1 (LC) • TIFF: uncompressed

  31. Alternatives for Displaying Oversize Images • File formats and compression schemes that support multi-resolution image delivery, e.g., wavelet compression, GridPix, Flashpix • User tools for representing scale (Blake Project ImageSizer, a Java applet) and improving image quality

  32. Recommendations Coalescing • Intent of conversion drives decisions • issues of access considered at conversion • notion of long-term utility and cross-institutional resources gaining ground • Access images will change with: • changing user needs and capabilities • changes in technologies: file formats, technical infrastructure, compression, web browsers, processing programs, scaling routines
