A Digital Library Repository Utilizing the Open Archives Initiative
Developed to meet the needs of UTK Library Special Collections
The Problem:
Tremendous quantities of valuable information exist in museums, libraries, and research centers, but they are not available in a standardized format via centralized search engines. How can the connection be made?
• Musical scores and sound tracks
• Historical documents
• Theses and dissertations
• Scientific records
• Mathematical findings
• Photos and videos
The Open Archives Solution:
• Translation of records into a common format and language: XML & Unqualified Dublin Core
• Storage of these translations
• Response to a standardized set of queries
• Gather document descriptions from repositories into large databases, using OAI harvesters
• Set up search engines to offer up information in these databases
Required For Translation:
• Understanding of XML and XML schemas
• Determining the correct mapping of information to Unqualified Dublin Core elements, in order to translate legacy files into a metadata format supported by the Open Archives Initiative
• Scripts to reduce the labor of translation
A Common Language: Dublin Core
The 15 elements of Unqualified Dublin Core:
• Content: Title, Description, Coverage, Relation, Source, Subject, Type
• Intellectual Property: Contributor, Creator, Publisher, Rights
• Instantiation: Date, Format, Identifier, Language
A Common Framework: XML schemas
The XML schema constrains each element of the document, providing rules and a framework for parsing:
  …
  <complexType name="dublincoreType">
    <choice minOccurs="0" maxOccurs="unbounded">
      <element name="subject" minOccurs="0" maxOccurs="unbounded" type="string"/>
    </choice>
  </complexType>
</schema>
A Common Format: XML
From a TEI Lite SGML file segment:
  <PROFILEDESC><TEXTCLASS><KEYWORDS SCHEME="LCSH"><LIST>
    <ITEM>Letters</ITEM>
    <ITEM>Cherokee Indians--Claims against</ITEM>
    <ITEM>Tennessee</ITEM></LIST></KEYWORDS>
  </TEXTCLASS></PROFILEDESC></TEIHEADER>
To an Unqualified Dublin Core XML file segment:
  <subject>Letters</subject>
  <subject>Cherokee Indians Claims against</subject>
  <subject>Tennessee</subject>
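The keyword-to-subject step above could be scripted roughly as follows. This is only a minimal sketch, not the scripts used for the UTK repository: the function name `tei_keywords_to_dc_subjects` and the regular-expression approach are assumptions made for illustration, and the sample text is the segment from the slide with its tags repaired.

```python
import re
from xml.sax.saxutils import escape

# Sample TEI Lite segment, copied from the slide (hypothetical test input).
TEI_SEGMENT = """<PROFILEDESC><TEXTCLASS><KEYWORDS SCHEME="LCSH"><LIST>
<ITEM>Letters</ITEM>
<ITEM>Cherokee Indians--Claims against</ITEM>
<ITEM>Tennessee</ITEM></LIST></KEYWORDS>
</TEXTCLASS></PROFILEDESC>"""

# SGML tag names are case-insensitive, so match <ITEM> in any case.
ITEM_RE = re.compile(r"<ITEM>(.*?)</ITEM>", re.IGNORECASE | re.DOTALL)


def tei_keywords_to_dc_subjects(sgml_text):
    """Return one Dublin Core <subject> element per TEI <ITEM> keyword."""
    subjects = []
    for raw in ITEM_RE.findall(sgml_text):
        text = " ".join(raw.split())  # collapse line breaks and extra spaces
        if text:
            # escape() protects &, < and > so the output stays valid XML
            subjects.append("<subject>%s</subject>" % escape(text))
    return subjects


if __name__ == "__main__":
    for line in tei_keywords_to_dc_subjects(TEI_SEGMENT):
        print(line)
```

A regular expression is used here because legacy SGML is not guaranteed to be well-formed XML; a full SGML parser would be more robust for real collections.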
Selected Portions of a TEI-Lite SGML record
  <TEIHEADER> <FILEDESC> <TITLESTMT>
  <TITLE>[Letter] July 8, 1839, Washington City DC, [to] HP King, Qualla Town / William Holland Thomas: a machine-readable transcription of an image</TITLE>…
  <AUTHOR>Thomas, William Holland</AUTHOR> …
  <PUBLISHER>The University of Tennessee Libraries</PUBLISHER>
  <IDNO>wt025</IDNO>…
  <AVAILABILITY><P>This work is the property of the Special Collections Library, University of Tennessee, Knoxville, TN. It may be used freely by individuals for research, teaching, and personal use as long as this statement of availability is included in the text.</P></AVAILABILITY></PUBLICATIONSTMT>
  <SOURCEDESC><BIBL>…
  <DATE VALUE="1839-07-08">July 8, 1839</DATE>…
  <NOTE TYPE="summary">This document is a letter dated July 8, 1839 to H.P. King from William Holland Thomas with instructions for running the Indian Store.</NOTE> …
  <PROFILEDESC> <TEXTCLASS> <KEYWORDS SCHEME="LCSH"><LIST>
  <ITEM>Cherokee Indians</ITEM>
  <ITEM>Government relations</ITEM>
  </LIST> </KEYWORDS></TEXTCLASS></PROFILEDESC>…
  <TEXT><BODY><DIV1 TYPE="letter">…
… Translated to XML Unqualified Dublin Core
  <title>[Letter] July 8, 1839, Washington City DC, [to] HP King, Qualla Town</title>
  <contributor>The University of Tennessee Libraries, Knoxville</contributor>
  <contributor>Southeastern Native American Documents Collection (GALILEO (Georgia statewide project)) GAGAL</contributor>
  <creator>Thomas, William Holland</creator>
  <publisher>The University of Tennessee Libraries</publisher>
  <date>July 8, 1839</date>
  <description>This document is a letter dated July 8, 1839 to H.P. King from William Holland Thomas with instructions for running the Indian Store.</description>
  <identifier>Document ID: wt025</identifier>
  <identifier>http://www.helios.dii.utk.edu/oai/sgm/00178.html</identifier>
  <subject>Cherokee Indians</subject>
  <subject>Government relations</subject>
  <rights>This work is the property of the Special Collections Library, University of Tennessee, Knoxville, TN. It may be used freely by individuals for research, teaching, and personal use as long as this statement of availability is included in the text.</rights>
  <type>letter</type>
  <type>computer file</type>
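One way to picture the translation from the TEI record to the Dublin Core record is as a crosswalk table applied field by field. The sketch below is an illustration only: the `TEI_TO_DC` mapping is inferred from the two records shown on these slides, the function name `build_dc_record` is hypothetical, and the bare `<dc>` root element is a simplification (a real oai_dc record carries namespace declarations that are omitted here).

```python
import xml.etree.ElementTree as ET

# Assumed crosswalk, inferred from the sample TEI and Dublin Core records.
TEI_TO_DC = {
    "TITLE": "title",
    "AUTHOR": "creator",
    "PUBLISHER": "publisher",
    "DATE": "date",
    "NOTE": "description",
    "IDNO": "identifier",
    "ITEM": "subject",
    "AVAILABILITY": "rights",
}


def build_dc_record(tei_fields):
    """Build a simplified Dublin Core element tree.

    tei_fields is a list of (TEI tag name, text) pairs already pulled
    out of the legacy file; tags missing from the crosswalk are skipped.
    """
    record = ET.Element("dc")  # simplified root, no namespaces
    for tag, text in tei_fields:
        dc_name = TEI_TO_DC.get(tag.upper())
        if dc_name is None:
            continue
        ET.SubElement(record, dc_name).text = text
    return record


if __name__ == "__main__":
    fields = [
        ("AUTHOR", "Thomas, William Holland"),
        ("PUBLISHER", "The University of Tennessee Libraries"),
        ("IDNO", "wt025"),
        ("ITEM", "Cherokee Indians"),
        ("ITEM", "Government relations"),
        ("DIV1", "ignored: not in the crosswalk"),
    ]
    print(ET.tostring(build_dc_record(fields), encoding="unicode"))
```

Keeping the mapping in a table rather than in code makes it easier to add a second crosswalk (for example MARC to DC) without touching the record-building logic.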
Translation Tools:
Crosswalks available:
• MARC to DC: http://www.loc.gov/marc/dccross.html
• Shown in action at: http://alcme.oclc.org/marc2dc/index.html
Others:
• http://www.sinica.edu.tw/~metadata/tool/mapping-foreign.html
• http://www.lub.lu.se/tk/metadata/MDin9612.html
• http://www.getty.edu/research/institute/standards/intrometadata/3_crosswalks/index.html
Storage of OAI Records
• MySQL: small, fast, and free: http://www.mysql.com
• Use scripts to load the database and retrieve information
• Store entire records, already marked up in Unqualified Dublin Core, for quick response; or
• Store fields untagged, with multiple values for a field separated by tags, and retag upon request, for flexibility. This structure allows a record to be entered once and retrieved in various formats upon request.
• For local search engines, also store hardcoded XML files in a directory.

  mysql> create table gsm(
      -> id char(10) not null,
      -> primary key (id),
      -> date char(10),
      -> path char (80),
      -> listit text);

  # Fetch the stored records of one set whose dates fall within the requested range
  $sth = $dbh->prepare("select listit from $set where date <= '$until' and date >= '$from' order by id");
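The store-and-retrieve pattern above can be tried out without a MySQL server. The following is a rough sketch that substitutes Python's built-in `sqlite3` module for MySQL and Perl DBI; the helper names (`open_store`, `add_record`, `list_records`) and the `ALLOWED_SETS` check are assumptions added for illustration, while the table layout and the date-range query follow the slide.

```python
import sqlite3

# Table names cannot be bound as query parameters, so the set name is
# checked against a fixed list before being placed in the SQL text.
ALLOWED_SETS = {"gsm"}


def open_store():
    """Create an in-memory database with the gsm table from the slide."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """create table gsm (
               id char(10) not null primary key,
               date char(10),
               path char(80),
               listit text)"""
    )
    return conn


def add_record(conn, rec_id, date, path, listit):
    """Store one record; listit holds the pre-tagged Dublin Core text."""
    conn.execute(
        "insert into gsm (id, date, path, listit) values (?, ?, ?, ?)",
        (rec_id, date, path, listit),
    )


def list_records(conn, set_name, from_date, until_date):
    """Return stored records in a set with from_date <= date <= until_date."""
    if set_name not in ALLOWED_SETS:
        raise ValueError("unknown set: %r" % set_name)
    sql = (
        "select listit from %s where date <= ? and date >= ? order by id"
        % set_name
    )
    return [row[0] for row in conn.execute(sql, (until_date, from_date))]


if __name__ == "__main__":
    conn = open_store()
    add_record(conn, "gsm0001", "1999-05-01", "gsm/0001.xml", "<title>A</title>")
    add_record(conn, "gsm0002", "2003-01-15", "gsm/0002.xml", "<title>B</title>")
    print(list_records(conn, "gsm", "1999-02-02", "2002-01-01"))
```

Dates stored as YYYY-MM-DD strings compare correctly as plain text, which is why a `char(10)` column is enough for the range test. Binding the dates as parameters, rather than interpolating them into the SQL string, avoids quoting problems with harvester-supplied values.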
Response: Offer up document descriptions via a standardized set of queries and responses: the Open Archives Initiative Protocol
1) 6 verbs, with 5 required and/or optional arguments
2) Unique identifiers, optional sets, and metadata prefixes
3) Flow control & resumption tokens
4) Error codes
Verbs and arguments: The Open Archives Protocol
• Identify
• ListSets
• ListMetadataFormats: optional: identifier
• ListIdentifiers: required: metadataPrefix (oai_dc); optional: from, until, set, resumptionToken
• ListRecords: required: metadataPrefix (oai_dc); optional: from, until, set, resumptionToken
• GetRecord: required: identifier and metadataPrefix
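From a harvester's point of view, the verb list above amounts to a small table of required and optional arguments. The sketch below builds a request URL from that table; the base URL `http://example.org/oai`, the function name `build_request`, and the `VERB_ARGS` structure are hypothetical, and the argument rules are taken from the slide rather than checked against a particular protocol version.

```python
from urllib.parse import urlencode

# verb -> (required arguments, optional arguments), as listed on the slide
_LIST_OPTIONAL = {"from", "until", "set", "resumptionToken"}
VERB_ARGS = {
    "Identify": (set(), set()),
    "ListSets": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListIdentifiers": ({"metadataPrefix"}, _LIST_OPTIONAL),
    "ListRecords": ({"metadataPrefix"}, _LIST_OPTIONAL),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
}


def build_request(base_url, verb, args=None):
    """Return a request URL, or raise ValueError for a bad verb or argument."""
    args = dict(args or {})
    if verb not in VERB_ARGS:
        raise ValueError("badVerb: %s" % verb)
    required, optional = VERB_ARGS[verb]
    missing = required - set(args)
    unknown = set(args) - required - optional
    if missing or unknown:
        raise ValueError(
            "badArgument: missing=%s unknown=%s" % (sorted(missing), sorted(unknown))
        )
    # Sort the arguments so the same request always yields the same URL.
    query = urlencode([("verb", verb)] + sorted(args.items()))
    return "%s?%s" % (base_url, query)


if __name__ == "__main__":
    print(build_request("http://example.org/oai", "Identify"))
    print(
        build_request(
            "http://example.org/oai",
            "ListRecords",
            {"metadataPrefix": "oai_dc", "from": "1999-02-02", "set": "gsm"},
        )
    )
```

Arguments are passed as a dictionary rather than as keyword arguments because `from` is a reserved word in Python.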
Identifiers, Sets, and Metadata Prefixes
  Current Sets                          Input as "Set"   Sample Identifiers
  Bessie Harvey Collection              har              oai:tkn:har/har0001
  Cherokee                              che              oai:tkn:che/che0003
  Civil War Collection                  civ              oai:tkn:civ/civ0001
  Electronic Theses and Dissertations   etd              oai:tkn:etd/etd0002
  Emancipator                           emn              oai:tkn:emn/emn0001
  Encoded Archival Description          ead              oai:tkn:ead/ead0003
  Great Smoky Mountains                 gsm              oai:tkn:gsm/gsm0045
  Library Development Review            ldr              oai:tkn:ldr/ldr0002
  Roth Photography Collection           rth              oai:tkn:rth/rth0034
  Tennessee Documentary History         tdh              oai:tkn:tdh/tdh0005
  Videos                                vid              oai:tkn:vid/vid0001
Supported metadata prefix: oai_dc
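The sample identifiers appear to follow the pattern oai:tkn:SET/RECORD, with the set code repeated as the first part of the record name. Assuming that pattern holds for every record (an inference from the samples, not a documented rule), an identifier could be split into its parts as in this sketch; the function name `parse_identifier` is hypothetical.

```python
def parse_identifier(identifier):
    """Split an identifier such as 'oai:tkn:gsm/gsm0045' into its parts.

    Assumes the layout oai:<repository>:<set>/<record> seen in the sample
    identifiers; raises ValueError for anything that does not fit it.
    """
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai":
        raise ValueError("not an oai identifier: %r" % identifier)
    repository, local = parts[1], parts[2]
    set_code, slash, record = local.partition("/")
    if not slash or not set_code or not record:
        raise ValueError("missing set or record in: %r" % identifier)
    return {"repository": repository, "set": set_code, "record": record}


if __name__ == "__main__":
    print(parse_identifier("oai:tkn:gsm/gsm0045"))
```

A repository could use the set part to pick the database table to query and the record part as the key within that table.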
Flow Control and Resumption Tokens
For ListIdentifiers, ListSets, and ListRecords
  <resumptionToken>LRrtdc20f19990202u20020101</resumptionToken>
• LR or LI: ListRecords or ListIdentifiers
• rt: number or letter combination indicating which set comes next
• dc: metadata format
• 20: which record number to start with this time
• f19990202: from date 1999-02-02
• u20020101: until date 2002-01-01
The token specifies the database call to make when this resumption token is returned.
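Because the token encodes the whole follow-up query, the repository can decode it without keeping any session state. The sketch below decodes the sample token; the regular expression and the function name `parse_resumption_token` are assumptions based only on the field breakdown above, and the sketch assumes the metadata format code is always `dc`, since oai_dc is the only supported prefix.

```python
import re

# Field layout taken from the sample token LRrtdc20f19990202u20020101.
TOKEN_RE = re.compile(
    r"^(?P<verb>LR|LI)"          # ListRecords or ListIdentifiers
    r"(?P<next_set>[A-Za-z0-9]+?)"  # number or letter combination: which set next
    r"dc"                        # metadata format (assumed to be dc only)
    r"(?P<start>\d+)"            # record number to start with
    r"f(?P<from_date>\d{8})"     # from date, YYYYMMDD
    r"u(?P<until_date>\d{8})$"   # until date, YYYYMMDD
)

VERBS = {"LR": "ListRecords", "LI": "ListIdentifiers"}


def _iso(yyyymmdd):
    """Turn 19990202 into 1999-02-02."""
    return "%s-%s-%s" % (yyyymmdd[:4], yyyymmdd[4:6], yyyymmdd[6:])


def parse_resumption_token(token):
    """Decode a resumption token into the arguments of the next query."""
    match = TOKEN_RE.match(token.strip())
    if match is None:
        raise ValueError("badResumptionToken: %r" % token)
    return {
        "verb": VERBS[match.group("verb")],
        "next_set": match.group("next_set"),
        "start": int(match.group("start")),
        "from": _iso(match.group("from_date")),
        "until": _iso(match.group("until_date")),
    }


if __name__ == "__main__":
    print(parse_resumption_token("LRrtdc20f19990202u20020101"))
```

A token that does not match the layout is rejected with the protocol's badResumptionToken code in the message. A set field that itself contained "dc" followed by digits would be ambiguous under this layout; the sketch does not try to resolve that case.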
Error Codes: version 2.0
• badArgument
• badResumptionToken
• badVerb
• cannotDisseminateFormat
• idDoesNotExist
• noMetadataFormats
• noRecordsMatch
• noSetHierarchy
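In version 2.0 of the protocol, an error is reported as an `<error>` element whose `code` attribute holds one of the values above and whose text is a human-readable message. A minimal sketch of producing that element follows; the function name `error_element` and the message wording are illustrative assumptions, and the surrounding response envelope is left out.

```python
import xml.etree.ElementTree as ET

# The eight error codes listed for protocol version 2.0.
ERROR_CODES = {
    "badArgument",
    "badResumptionToken",
    "badVerb",
    "cannotDisseminateFormat",
    "idDoesNotExist",
    "noMetadataFormats",
    "noRecordsMatch",
    "noSetHierarchy",
}


def error_element(code, message):
    """Return the serialized <error> element for a known error code."""
    if code not in ERROR_CODES:
        raise ValueError("unknown error code: %r" % code)
    element = ET.Element("error", {"code": code})
    element.text = message  # ElementTree escapes &, < and > on output
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    print(error_element("badVerb", "Illegal OAI verb"))
```

Checking the code against a fixed set keeps a typo in the repository script from producing a response that harvesters cannot interpret.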
OAI 1.1 Test Interface and Local Search Engine: http://oai.sunsite.utk.edu/1.1.html
• Search by word or phrase
• Search by all or any field and set
• Sort by date or set
• Returns lists of identifiers or short file descriptions, each with links to the full file in HTML, XML, and the online document
More Information: www.openarchives.org
Crosswalks:
• http://www.sinica.edu.tw/~metadata/tool/mapping-foreign.html
• http://www.lub.lu.se/tk/metadata/MDin9612.html
• http://www.getty.edu/research/institute/standards/intrometadata/3_crosswalks/index.html
Pre-developed repositories, harvesters, search engines, and more:
• http://www.openarchives.org/tools/tools.html
Current service providers, who can offer searches of your records from your repository responses:
• http://www.openarchives.org/service/listproviders.html