
The Cost of Archiving: The AILLA Perspective

The Cost of Archiving: The AILLA Perspective. Susan Smythe Kung, PhD skung@austin.utexas.edu www.ailla.utexas.org 3rd INNET Conference “Costing and sustainable finding of endangered language archives” April 29, 2014.


Presentation Transcript


  1. The Cost of Archiving: The AILLA Perspective Susan Smythe Kung, PhD skung@austin.utexas.edu www.ailla.utexas.org 3rd INNET Conference “Costing and sustainable finding of endangered language archives” April 29, 2014

  2. Increased number of deposits due to: • Increased awareness of need to preserve primary language materials • Increased awareness of AILLA • “New” requirement (of US federal funding agencies) for a Data Management Plan (DMP) – NSF requirement since Jan. 2011.

  3. 3 Parts • Part 1 – AILLA’s background • Part 2 – The costing exercise • Part 3 – AILLA’s administrative costs

  4. Part 1: AILLA’s Background

  5. AILLA is a digital repository of multimedia resources in and about the indigenous languages of Latin America. It is a small, special collection within the Benson Latin American Collection at UT-Austin. • The collections consist of • linguistic primary source field data such as field notes, audio and video recordings, photos and sketches in a wide range of genres (stories, myths, chants, songs, conversations, prayers, rituals, etc.) • analyzed data such as grammars, dictionaries, ethnographies, and manuscripts.

  6. AILLA’s Mission: • Preservation: To preserve irreplaceable materials in and about indigenous languages of Latin America, especially primary source field data of the type that has traditionally not been publicly available. • Access: To make these materials and/or their metadata available to everyone, especially indigenous people, over the Internet.

  7. History: • Founded as a joint project between College of Liberal Arts (COLA) and the University of Texas Libraries (UTL) by • Joel Sherzer, Anthropology • Anthony Woodbury, Linguistics • Mark McFarland, UTL Digital Initiatives • Project began in 2000 with seed money from COLA. • Pilot site launched March 2001. • Permanent site launched Jan. 31, 2003. • Repository and website upgrade to take place 2015-2017 (we hope!).

  8. Today: • Jointly supported by COLA and UTL. • Part of LLILAS Benson Latin American Studies and Collections • Located inside the Nettie Lee Benson Latin American Collection on the campus of the University of Texas at Austin

  9. AILLA Collection Statistics: (stats as of August 29, 2014) • 298 languages • 22 Latin American countries • 12,796 resources • 100,041 media files • 19,294 audio recordings (6,773 hrs, 14 min, 18 sec) • 2,373 video recordings (1,215 hrs, 34 min, 23 sec)

  10. AILLA Collection Statistics (cont’d): • 5,302 digital texts (97,580 pages) • 38,491 scanned pages • 4,331 images • Only 20% restricted access • 1.8 TB • 138 depositors • Over 5,000 registered users from all over the world

  11. AILLA Staff: • Full-time Manager – Susan Kung (supported by COLA & UTL) • 2 Graduate Research Assistants, 20 hrs/wk ea. (supported by grant-funded projects)

  12. Work is also done by: • UTL Digital Library Services staff, who provide server management and minimal technical support – their salaries do NOT come out of the AILLA budget. • Undergraduate interns (paid university stipends; independent research credit; volunteer) • MLIS/MSIS Capstone (thesis) projects • Volunteers

  13. AILLA’s Costs: • Digitization, curation, ingestion – Part 2 • Data and Metadata storage – Part 2 • Administration – Part 3 • Software development and maintenance – Not covered here • Data and metadata migration – Not covered here

  14. Part 2: The Costing Exercise Collection 1: Analog Collection 2: Born-Digital

  15. Collection 1 Analog Contents: • 20 audio cassettes, each 60 min. long, in good condition (unknown number of recording events) • Metadata spreadsheet for recordings on cassettes • 5 transcriptions, hand-written, (unknown # of pages) • 200 photographs on photo paper + paper list of photo contents • Collection size (after digitization) = 100 GB

  16. Additional specifications needed at AILLA for Collection 1: • Q1: How many different speech events are on each tape? Our preference is to separate different speech events into separate resources. • I’ll assume there are 3 narratives (of about 10 minutes) per side for a total of (3x2x20) 120 speech events. • Q2: How long are the transcriptions? • I’ll assume they are about 25 pages each, for a total of 125 scanned pages. • Q3: How many research participants were involved? • I’ll assume that there were 10 participants.
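The assumed counts on slide 16 can be reproduced with a few lines of arithmetic. This is only a sketch: the constant names are illustrative, and the per-side and per-transcription figures are the presenter's stated assumptions, not information supplied by a depositor.

```python
# Figures from slides 15 and 16. The "assumed" values are the presenter's
# working assumptions, not facts about a real deposit.
CASSETTES = 20
SIDES_PER_CASSETTE = 2
EVENTS_PER_SIDE = 3            # assumed: 3 narratives of about 10 minutes per side
TRANSCRIPTIONS = 5
PAGES_PER_TRANSCRIPTION = 25   # assumed average length

speech_events = EVENTS_PER_SIDE * SIDES_PER_CASSETTE * CASSETTES
scanned_pages = TRANSCRIPTIONS * PAGES_PER_TRANSCRIPTION

print(speech_events)  # 120
print(scanned_pages)  # 125
```

Three narratives of about 10 minutes fill one 30-minute side, which is consistent with the 60-minute cassettes listed in the collection contents.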

  17. A resource is AILLA’s term for an organized bundle or set of related files. A resource might consist of • just 1 file, e.g., a single mp3 audio file of a recorded narrative, or • numerous files, e.g., simultaneous audio and video recordings of a speech event, plus an Elan transcription, or a semester’s worth of recorded lectures about indigenous languages, plus the class syllabus and handouts.

  18. Collection 1 Audio: Required Tasks • Digitize the cassettes: each side of each cassette = 1 wav file; total wavs = 40; file names = tape1_sideA, etc. • Edit the wav files into individual speech events and assign AILLA IDs: assuming 3 speech events per side, total speech events = 3x2x20 = 120 wav files • Convert wav (archival) files to mp3 (access) files • Add AILLA IDs to the metadata spreadsheet and collect additional metadata about each speech event, e.g., length of wav, recording specifications, original source, etc.
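The naming scheme for the 40 digitized sides can be sketched as follows. The function name and the `.wav` extension are assumptions added for illustration; the slide gives only the pattern `tape1_sideA, etc.` and does not describe how AILLA actually generates the names.

```python
from string import ascii_uppercase


def side_filenames(cassettes: int = 20, sides: int = 2) -> list[str]:
    """Return one file name per cassette side, following the tape1_sideA pattern.

    Hypothetical helper: the ".wav" extension is an assumption.
    """
    return [
        f"tape{tape}_side{ascii_uppercase[side]}.wav"
        for tape in range(1, cassettes + 1)
        for side in range(sides)
    ]


names = side_filenames()
print(len(names))  # 40
print(names[:3])   # ['tape1_sideA.wav', 'tape1_sideB.wav', 'tape2_sideA.wav']
```

Each of these 40 files would then be split into its individual speech events, which receive AILLA IDs rather than tape-based names.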

  19. Collection 1 Paper Transcriptions: Required Tasks • Scan each page and create 5 multi-page tif files. Simultaneously assign AILLA IDs as filenames. • Add a row to the spreadsheet for each AILLA ID & its metadata. • Convert tif (archival) files to pdf/a (access) files.

  20. Collection 1 Paper Photos: Required Tasks • Scan each photo and create 8 multipage tif files of 25 photos each; assign AILLA IDs. • Add AILLA IDs, photo contents from paper list, and other metadata to the MD spreadsheet • Convert tif (archival) files to jpg (access) files.
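The grouping of 200 photos into 8 multipage tif files can be checked with a short batching sketch. Numbering the photos 1 to 200 in sequence is an assumption made for illustration; the slide does not say how the photos are ordered.

```python
PHOTOS = 200          # from the Collection 1 contents list
PHOTOS_PER_TIF = 25   # from slide 20

# Photo numbers 1..200 grouped into consecutive batches of 25,
# one batch per multipage tif file (sequential numbering is assumed).
batches = [
    list(range(start, min(start + PHOTOS_PER_TIF, PHOTOS + 1)))
    for start in range(1, PHOTOS + 1, PHOTOS_PER_TIF)
]

print(len(batches))                      # 8
print(batches[0][0], batches[0][-1])     # 1 25
print(batches[-1][0], batches[-1][-1])   # 176 200
```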

  21. Collection 1: Ingestion Required Tasks • Create a collection for the depositor • Add all of the research participants to AILLA’s “people database” – assume 10 participants • Upload all files to the server (100 GB) • These steps are done together, but consecutively: • Create 121 AILLA resources (120 speech events & 1 photo resource), • Link the relevant files, • Enter the metadata & assign access level, and • Complete Spanish (or English) translations.

  22. Collection 1: Total One-time Cost = $1,922.34

  23. Collection 1: Recurring Cost = ??? • Yearly server storage for 100 GB = $66/yr • Future file conversion when/if archival and access formats change = ???? • Future upgrades of digital repository and asset management software = ??? • Future file and metadata migration when repository and asset management software upgrades = ???

  24. Collection 2 Born-Digital Contents: • 150 audio wav files, average length = 15 min. • 20 video mp4 files, average length = 30 min. • 250 digital images • 120 eaf files (20 for video, 100 for audio) • Metadata spreadsheet listing contents of all files • Collection size = 150 GB

  25. Additional specifications needed at AILLA for Collection 2: • Q1: How many research participants were involved? • Again, I’ll assume that there were 10 participants. • Q2:What is the file format of the digital images? • I’ll assume that it is jpg

  26. Collection 2: Required Tasks for Digital Collections • Massage the metadata (study its organization, rearrange as necessary, add missing info) • Rename files w/ AILLA IDs: • Rename audio and video files and add the AILLA IDs to the MD spreadsheet; • Match each eaf file to its corresponding audio or video file, assign the appropriate related AILLA ID, rename the file, and rearrange MDS if necessary. • Create mp3 access copies from the wav files.

  27. Collection 2: Ingestion Required Tasks • Create a collection for the depositor • Add all of the research participants to AILLA’s “people database” – assume 10 participants • Upload all files to the server (150 GB) • These steps are done together, but consecutively: • Create 171 AILLA resources (150 audio, 20 video & 1 photo resource), • Link the relevant files, • Enter the metadata & assign access level, and • Complete Spanish (or English) translations.

  28. Collection 2: Total One-time Cost = $1,910.15

  29. Collection 2: Recurring Cost = ??? • Yearly server storage for 150 GB = $99/yr • Future file conversion when/if archival and access formats change = ??? • Future upgrades of digital repository and asset management software = ??? • Future file and metadata migration when repository and asset management software upgrades = ???
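The two storage figures on slides 23 and 29 imply the same per-gigabyte rate, which the sketch below makes explicit. The function names and the 5-year horizon are hypothetical; the projection covers only the one-time cost plus yearly storage, because the conversion, upgrade, and migration costs are marked unknown on the slides.

```python
# (size in GB, yearly storage fee in USD), from slides 23 and 29.
STORAGE_FEES = {
    "Collection 1": (100, 66.00),
    "Collection 2": (150, 99.00),
}
# One-time costs in USD, from slides 22 and 28.
ONE_TIME = {"Collection 1": 1922.34, "Collection 2": 1910.15}


def rate_per_gb(size_gb: float, annual_fee: float) -> float:
    """Yearly storage fee per gigabyte implied by the slide figures."""
    return annual_fee / size_gb


def known_cost(name: str, years: int) -> float:
    """One-time cost plus storage for `years` years.

    Assumes a flat yearly fee and excludes every cost listed as unknown.
    """
    _, annual_fee = STORAGE_FEES[name]
    return round(ONE_TIME[name] + annual_fee * years, 2)


for name, (size_gb, fee) in STORAGE_FEES.items():
    # Both collections work out to 0.66 USD per GB per year.
    print(name, rate_per_gb(size_gb, fee), known_cost(name, years=5))
```

Treating the fee as flat over time is an assumption; slide 36 notes that the method for calculating the storage charge had not yet been settled.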

  30. Price List Categories • Digitization of analog media and digital video transfer (all formats except mp4, mpeg, mpg) • Curation & organization • File conversion • Ingestion (file upload, collection creation, participant metadata entry, resource creation and metadata entry) • Server storage fees

  31. Category 1: Analog Media and video transfer Part I

  32. Category 1: Analog Media and video transfer Part II

  33. Category 2: Curation & Organization

  34. Category 3: File Splitting and Conversion

  35. Category 4: Ingestion

  36. Category 5: Storage Fees I haven’t quite figured out how to calculate this charge. I think it’s better to charge a flat fee up front (which can be written into a grant budget), but I want to hear the results of our DELAMAN discussion.

  37. Part 3: AILLA’s Administrative Costs

  38. 3 Areas that fund AILLA’s Administrative Costs: • Institutional Support • Grants – Direct Costs • Grants – Indirect Costs A 4th area, the AILLA endowment, which was and still is built from monetary donations to AILLA, will cover some costs (to be determined) in the future, but it has not been accessed yet.

  39. Institutional Support covers: • Manager’s salary & fringe (UTL & COLA) • Office space & some furniture (UTL) • Phone service (COLA) • Electricity (UT) • Manager’s travel for professional development (UTL & COLA) • Computer ITS – COLA • Server ITS – UTL

  40. Direct Grant Costs (currently) cover: • Manager travel to get collections & to make presentations about them at conferences • 2 GRAs: salary, fringe & tuition remission • Depositor/collaborator trips to AILLA • Shipping • Some server costs

  41. Direct Grant Costs have covered (past): All of the above, plus: • PC and Mac computers and laptops • Scanners – 2 flat bed, 2 ADF • Software – digitization and conversion • Audio equipment (tape cassette decks, MD deck, reel-to-reel players) • Workshops organized by AILLA (including travel for invited participants)

  42. Indirect Grant Costs cover: • Computers for administration, digitization and ingestion • Other computer accessories – sound cards, storage media, printers • Software – both administrative and for digitization and conversion. • Equipment repair (e.g., cassette decks, reel-to-reel players) • Office supplies (paper, printer ink, pens, pencils, sticky notes, paper clips, etc.) • Printing - AILLA brochure, business cards • Shipping • Visitor expenses (e.g., lunches, parking) • Manager’s membership dues • Administrative cloud storage • Some office furniture

  43. Operating Budget Counting direct and indirect costs from AILLA’s grants, our operating budget is about $75,000. This figure does not include the administrative costs that are provided by UT-Austin. BUT we have a data backlog of approximately 3 years: we do not have time to process unsolicited deposits because we are so busy with our “solicited” deposit (our DEL grant to archive Terrence Kaufman’s collection).

  44. Thank you! www.ailla.utexas.org Please send comments or questions to ailla@ailla.utexas.org or skung@austin.utexas.edu
