This framework outlines the process for publishing oral history interviews on the web, including audio formats, transcriptions, search capabilities, and metadata cataloging. It covers cataloging options, cataloging decisions, challenges faced, and the associated costs.
 
                
A Framework for Publishing Oral History Interviews to the Web
Stephen Paul Davis
Director, Libraries Digital Program
Columbia University
OCLC Western Digital Forum
August 2006 (rev. 10/2011)
The Players
• Columbia's Libraries Digital Program
• Columbia Center for Oral History (formerly: Oral History Research Office)
• Columbia's Digital Knowledge Ventures (ceased operations)
• Backstage Library Works (formerly: OCLC Preservation Services)
• George Blood, L.P. (formerly: Safe Sound Archive)
• OCLC Digital Archive
The Characters
• Bennett Cerf – publisher
• Kenneth Clark – psychologist, social activist
• Mamie Clark – psychologist, social activist
• Moe Foner – labor activist
• Andrew Heiskell – publisher
• Edward I. Koch – political figure
• Mary Lasker – philanthropist
• John B. Oakes – newspaper editor
• Frances Perkins – political figure
• Frank Stanton – leader in broadcasting
The Script
• Sessions: 10 interviewees in 193 individual interview sessions
• Recordings: 205 hours on 170 tapes (109 cassettes, 53 five-inch reels, 8 seven-inch reels)
• Transcriptions:
  • 11,064 pages of typescript in 72 notebook binders
  • 2,644 pages in MS Word format
• Related material: name indexes, biographies, tables of contents, photos
The Plot
• Online audio in Real & MP3 format, both downloadable & streaming
• Audio segments directly correlated with transcriptions at the paragraph level
• Page images of transcriptions in PDF
• OCR'd transcriptions plus TEI/XML markup
• Full-text search and retrieval
• Name index entries linked back to references in text
• Abstract of each interview
• A general introduction
• A few pictures
• Rights and permissions cleared in advance
The Revised Plot
• Online audio in Real & MP3 format, both downloadable & streaming
• Audio segments directly correlated with transcriptions at the session level (originally: paragraph level)
• Page images of transcriptions in PDF
• Re-keyed transcriptions (originally: OCR'd) plus TEI/XML markup
• Full-text search and retrieval
• Name index entries linked back to references in text
• Abstract of each interview
• Three general introductory essays & a video interview with OHRO director emeritus (originally: a general introduction)
• Ten introductions for the interviewees
• 50 pictures (originally: a few)
• Ten new, detailed tables of contents
• Ten audio & text “excerpts” to provide interview lead-ins
• Rights and permissions cleared in advance
• Dropped: Robert F. Wagner, Kitty Carlisle Hart, Alice Hartley Neel, Schuyler Garrison Chapin, Ed Koch (1997)
• Almost dropped: Foner (bad language)
• Added: Mamie Clark, Mary Lasker, Frances Perkins, John Oakes
Cataloging & Metadata
Cataloging options:
• Audio: the original audio collection, the complete WAV files, the complete MP3 files, the segmented Real files
• Transcriptions: the original typescripts and/or Word files; the converted XML files; the generated HTML files
Cataloging decisions:
• Previous catalog records for oral history transcripts left intact under “Reminiscences of …”
• New collection-level catalog record created for the entire Notable New Yorkers (NNY) site
• New “analytic” catalog records created for each Notable New Yorker subsite as a component of the NNY collection site:
  773 0_ |7 nnbc |a Notable New Yorkers |h [electronic resource]. |w (OCoLC65181290)
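The analytic records point back to the collection record through MARC field 773 (Host Item Entry). As a rough illustration only, the sketch below splits the pipe-delimited display form shown on the slide into its tag, indicators, and subfields. The function name `parse_marc_display` is hypothetical, and the pipe-delimited layout is assumed from the slide text rather than taken from any cataloging system's export format.

```python
def parse_marc_display(display: str) -> dict:
    """Split a pipe-delimited MARC display string into its parts.

    Assumes the layout seen on the slide: 'TAG INDICATORS |c value |c value ...'.
    This is an illustrative helper, not part of any MARC library.
    """
    head, *chunks = display.split("|")
    tag, indicators = head.split(maxsplit=1)
    subfields = []
    for chunk in chunks:
        chunk = chunk.strip()
        if not chunk:
            continue
        # The first character run is the subfield code, the rest is its value.
        code, _, value = chunk.partition(" ")
        subfields.append((code, value.strip()))
    return {"tag": tag, "indicators": indicators.strip(), "subfields": subfields}


# The 773 field exactly as it appears on the slide.
host_item = parse_marc_display(
    "773 0_ |7 nnbc |a Notable New Yorkers |h [electronic resource]. |w (OCoLC65181290)"
)

for code, value in host_item["subfields"]:
    print(code, value)
```

Here subfield `a` carries the name of the host collection and subfield `w` carries the control number of the collection-level record, which is what ties each subsite record to the Notable New Yorkers site as a whole.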
Ticket Prices
• Scanning, keying & XML markup: $12,200
• Audio transfers, file header edits, MP3 creation & media: $13,720
• Audio time coding & post-processing: $9,000
• Web site (outsource): $17,150
  • Pre-production, $2,600
  • Rights research & permissions, $1,000
  • Web site design, $3,850
  • Web programming, $7,500
  • Copy editing & QA, $1,400
  • XSLT generation of HTML from METS/TEI, $2,000
• Additional site content: $12,800
  • Introductory essays, $5,700
  • Tables of contents, etc., $5,900
  • Video shoot & post-production, $1,200
• Oral History Research Office contributions: “Priceless”
  • Text preprocessing
  • Audio inventory
  • Rights and permissions clearances
  • Editorial review
• Libraries Digital Program contributions: “Ditto”
  • Project and vendor coordination
  • Text QC, post-processing, METS file creation
  • Text indexing & retrieval system (Lucene)
  • Application integration
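The figures above can be tallied with a few lines of Python. This is only a sketch: the grouping of sub-items under "Web site (outsource)" and "Additional site content" is inferred from the order of the bullets on the slide, and in-kind staff contributions are left out because the slide gives no dollar value for them.

```python
# Top-level budget lines, in dollars, as listed on the slide.
top_level = {
    "Scanning, keying & XML markup": 12_200,
    "Audio transfers, file header edits, MP3 creation & media": 13_720,
    "Audio time coding & post-processing": 9_000,
    "Web site (outsource)": 17_150,
    "Additional site content": 12_800,
}

# Sub-items as listed; their assignment to a parent line is an assumption.
sub_items = {
    "Web site (outsource)": [2_600, 1_000, 3_850, 7_500, 1_400, 2_000],
    "Additional site content": [5_700, 5_900, 1_200],
}

# Total cash outlay across the five top-level lines.
total = sum(top_level.values())

# Difference between each parent line and the sum of its listed sub-items.
differences = {
    name: sum(parts) - top_level[name] for name, parts in sub_items.items()
}

print(f"Total of top-level lines: ${total:,}")
for name, diff in differences.items():
    print(f"{name}: sub-items differ from stated amount by ${diff:,}")
```

The top-level lines add up to $64,870. The sub-items under "Additional site content" match their parent line exactly, while the six sub-items listed after "Web site (outsource)" add up to $18,350, which is $1,200 more than the stated $17,150. The slide does not explain the difference, so one of the sub-items may belong elsewhere or a figure may have been revised.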
Challenges 1
Problems with Rights & Permissions
• Permission status uncertain
• Permission withdrawn
• Permission equivocal
Problems with Source Material
• Incomplete / outdated inventory of original media
• Missing tapes, audio files
• Patrons using the only (single) copy of transcripts
• Misnumbered pages in transcriptions
• Missing pages in transcriptions
Scanning & Keying Vendor / Digital Program Relations
• Novelty of / unfamiliarity with oral history content
• Delays in providing vendor with source material
• Recognition that typescripts could not be OCR’d because of poor quality; 100% rekeying of originals required instead
• Clarity, interpretation, accuracy of markup specs
Challenges 2
Web Design Vendor / Digital Program Relations
• Outsourced design of a web site intended to be maintained afterwards in-house
• Differences in development process, methodology
• Difference between a “one shot” site and an ongoing collection-driven site
• Differences in design “values,” e.g., aesthetics versus usability; “teaching & learning” ethos versus “easy & effective access” ethos; role of branding
• Differences in familiarity and experience with full-text / cross-text search and retrieval
• Availability of time to meet & discuss issues, project management by email, deadlines
Curatorial / Digital Program Relations
• Curatorial time and staffing constraints
• Curatorial enthusiasm leading to requirements creep
• Assumptions about feasibility of “last minute changes”
Textual Issues
• Identity of the “master file” after online publication?
• “Fixity” of transcriptions in MS Word
• Retaining consistency of references / citations in paper version and in online version
Challenges 3
Issues Relating to the Practice of Oral History
• Publishing oral history interviews reflecting older, “outdated” practice along with those reflecting current practice
• Making available original, unedited audio files in conjunction with transcriptions reviewed & edited by the interviewees
• Web exposure of interviews that were originally to be available only onsite to scholars and researchers
• Influence on current and prospective interview subjects who know that their comments will be published on the Web
The Moral (Lessons Learned) 1
• Commit to doing more planning up front than you think you need to do;
• Set up a rigorous schedule of face-to-face meetings with key stakeholders even if they don't think you need to;
• Make sure all content pieces are agreed to, in hand, fixed, and have clear permissions to publish before agreeing to do the project (or at least before contracting with vendors);
• Oral histories are by their nature fuzzy in their fixity;
• Widows often object to their husbands' bad language long after their husbands are gone;
• Keep detailed inventories of all content pieces before, during and after the project (good asset management);
• Enthusiasm can often lead to scope creep;
The Moral (Lessons Learned) 2
• Push off non-essential scope creep to Phase 2;
• Don't try to edit Emeritus' prose;
• Many people don't like RealMedia / RealPlayer any more (I blame Microsoft);
• Curators often have other things to do than what you're interested in having them do;
• Library Digital Program staff always have other things to do than the project the curator is interested in;
• If a digital project is successful it becomes a permanent part of your life and will always need care and feeding even if you think you're finished with it, so get used to it;
• There are less expensive ways to do projects like Notable New Yorkers, but not that much less expensive.