This presentation examines the role of metadata in information retrieval and in organizing the social processes of academic self-archiving. It describes a crisis in author self-archiving: formal archives are small and metadata-poor, while informal archiving makes retrieval difficult and lacks supporting infrastructure. The proposed remedies include a standardized metadata supply format, a basic record model, and quality control of the metadata. Intermediaries such as librarians and publishers are invited to collaborate so that the model can work across all disciplines.
New Century, New Metadata
Thomas Krichel
http://openlib.org/home/krichel
University of Surrey, Hitotsubashi University and Long Island University
Why Metadata?
• Fun
• Information retrieval
• Support for organizing social processes
Crisis of Author Self-archiving
• Formal archiving
  - Small
  - Metadata-poor
• Informal archiving
  - Information retrieval is difficult
  - Lack of supporting infrastructure
Improving Formal Archiving
• Strengthen metadata provision
• Broaden the mission of archiving
• Allow archived material to be used in many user services
• Report better on how archived material is used
• Strengthen the relationship with overlay services
Improving Informal Archiving
• Build a standardized metadata supply format
• Harvest that metadata into larger digital libraries
• Offer archival backup for papers
Metadata to Support Self-archiving
• Simple to compose
  - Intuitive vocabulary specific to the academic process, e.g. “author” instead of “creator”
• Widely applicable
  - All disciplines and publication forms
• High quality, i.e. controlled
Metadata Control
• Any processing that is done to the metadata before it is included in a user service
• Essential when metadata is harvested
Types of Control
• Syntactic control
• Relational control
• Retrieval control
• Identity control
• Verity control
• Accession control
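To make "processing before inclusion in a user service" concrete, here is a minimal sketch of a control pipeline that implements only syntactic control. It assumes that syntactic control means rejecting records that are not well-formed XML or that use element names outside an agreed vocabulary; the vocabulary set and the function names `syntactic_control` and `run_controls` are hypothetical, and the other control types would plug in as further functions.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional

# Hypothetical vocabulary: the slides name these elements but give no full list.
KNOWN_ELEMENTS = {
    "person", "document", "organization", "group",            # nouns
    "title", "email",                                         # adjectives
    "hasauthor", "isauthorof", "ispublishedby", "hasmember",  # verbs
}

# A control step takes the raw record text and returns an error message or None.
Control = Callable[[str], Optional[str]]


def syntactic_control(record: str) -> Optional[str]:
    """Assumed meaning: reject malformed XML and unknown element names."""
    try:
        root = ET.fromstring(record)
    except ET.ParseError as exc:
        return f"not well-formed: {exc}"
    for elem in root.iter():  # iter() also visits the root element itself
        if elem.tag not in KNOWN_ELEMENTS:
            return f"unknown element <{elem.tag}>"
    return None


def run_controls(records: List[str], controls: List[Control]) -> List[str]:
    """Keep, in order, only the records that pass every control step."""
    return [r for r in records if all(c(r) is None for c in controls)]
```

Because each check is a plain function, a harvester could run the cheap syntactic check first and the more expensive checks (for example, relational control) only on records that survive it.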
Basic Model
• Four different record types
  - Document
  - Group
  - Person
  - Organization
Group and Document
• There is only one document type.
• Groups are used to refine the status of the document.
• The group construct is meant to be defined by librarians, publishers and other intermediaries.
Person and Institution
• Person and institution records admit very similar attributes.
• It is hoped that intermediaries will contribute organizational information.
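The four record types can be sketched as plain data classes. This is an illustration under assumptions: the field names (`name`, `email`, `affiliations`, `author_ids`, `group_ids`) and the helper `documents_in_group` are hypothetical, and the slides do not say which attributes each record carries. The sketch keeps the stated design points: a single document type whose status is refined by group membership, and person and organization records with similar attributes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Organization:
    id: str
    name: str
    homepage: Optional[str] = None


@dataclass
class Person:
    id: str
    name: str
    email: Optional[str] = None
    affiliations: List[str] = field(default_factory=list)  # organization ids


@dataclass
class Group:
    id: str
    name: str  # e.g. a working-paper series, defined by an intermediary


@dataclass
class Document:
    # Only one document type: its status comes from the groups it belongs to.
    id: str
    title: str
    author_ids: List[str] = field(default_factory=list)
    group_ids: List[str] = field(default_factory=list)


def documents_in_group(docs: List[Document], group_id: str) -> List[Document]:
    """Return the documents whose status is refined by the given group."""
    return [d for d in docs if group_id in d.group_ids]
```

For example, a working paper and a journal article would both be `Document` records, distinguished only by belonging to a "series" group or a "journal" group.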
Implementation of the Basic Model
• RePEc
  - 100,000 documents
  - 100 groups (series)
  - 500 authors
  - 5,000 institutions
• Example
  - http://ideas.uqam.ca/EDIRC/data/frbgvus.html
• It is possible to do the same for ReLIS
Basic Grammar
• XML syntax
• Three groups of XML elements
  - Nouns: elements for the items described
  - Adjectives: elements that describe nouns
  - Verbs: elements that relate nouns
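One way to read the noun/adjective/verb grammar is as nesting rules that a validator can check. The sketch below assumes that a noun may contain only adjectives and verbs, that a verb may contain only nouns, and that adjective contents are left unchecked. These rules, the word lists, and the function `grammar_errors` are hypothetical readings of the slide, not a published schema.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical word lists built from the element names used in the slides.
NOUNS = {"person", "document", "organization", "group"}
ADJECTIVES = {"title", "email"}
VERBS = {"isauthorof", "hasauthor", "ispublishedby", "hasmember"}


def grammar_errors(elem: ET.Element) -> List[str]:
    """Assumed rules: nouns hold adjectives or verbs; verbs hold nouns."""
    if elem.tag in NOUNS:
        allowed = ADJECTIVES | VERBS
    elif elem.tag in VERBS:
        allowed = NOUNS
    elif elem.tag in ADJECTIVES:
        return []  # adjectives may have child elements of their own
    else:
        return [f"<{elem.tag}> is not a known noun, adjective or verb"]
    errors = []
    for child in elem:
        if child.tag not in allowed:
            errors.append(f"<{child.tag}> not allowed inside <{elem.tag}>")
        errors.extend(grammar_errors(child))  # check the subtree recursively
    return errors
```

Under these rules a `<document>` placed directly inside a `<person>` is flagged, because the relationship between the two must be stated by a verb.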
Modular Design
<person>
  <isauthorof>
    <document>
      <ispublishedby>
        <organization>
          <hasmember>
            <person></person>
          </hasmember>
        </organization>
      </ispublishedby>
    </document>
  </isauthorof>
</person>
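In the modular design, relationships are expressed purely by nesting: noun, then verb, then noun. The sketch below walks such a record and turns each noun > verb > noun step into a (subject, verb, object) triple. The function name `triples` and the restricted noun list are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

NOUNS = {"person", "document", "organization"}

# The nested record from the slide.
MODULAR = """
<person><isauthorof>
  <document><ispublishedby>
    <organization><hasmember>
      <person></person>
    </hasmember></organization>
  </ispublishedby></document>
</isauthorof></person>
"""


def triples(noun: ET.Element) -> List[Tuple[str, str, str]]:
    """Read each noun > verb > noun nesting step as a relationship triple."""
    result = []
    for verb in noun:      # children of a noun (verbs; adjectives yield nothing)
        for obj in verb:   # children of a verb are the related nouns
            if obj.tag in NOUNS:
                result.append((noun.tag, verb.tag, obj.tag))
                result.extend(triples(obj))  # follow the chain downwards
    return result
```

Calling `triples(ET.fromstring(MODULAR.strip()))` yields the person-isauthorof-document, document-ispublishedby-organization and organization-hasmember-person relations. The drawback of this design is visible in the innermost `<person>`: each nested record must either repeat its data or stay anonymous.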
Relational Design
<person id="kmarxthered">
  <email>k.marx@highgate.london.uk</email>
</person>
<document id="kapital">
  <title>Das Kapital</title>
  <hasauthor>
    <person id="kmarxthered"/>
  </hasauthor>
</document>
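In the relational design, a verb holds only an empty noun that points to another record by `id`, so a harvester needs to index records and check that every reference resolves. The sketch below does this for the two slide records. The `<collection>` wrapper and the names `index_records` and `unresolved_references` are assumptions; the slides show the records as separate items.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# The two slide records, wrapped in a hypothetical <collection> element.
RECORDS = """
<collection>
  <person id="kmarxthered"><email>k.marx@highgate.london.uk</email></person>
  <document id="kapital"><title>Das Kapital</title>
    <hasauthor><person id="kmarxthered"/></hasauthor>
  </document>
</collection>
"""


def index_records(root: ET.Element) -> Dict[str, ET.Element]:
    """Map each top-level record's id to the full record."""
    return {rec.get("id"): rec for rec in root}


def unresolved_references(root: ET.Element) -> List[Optional[str]]:
    """List ids referenced inside verbs that match no top-level record."""
    known = index_records(root)
    missing = []
    for rec in root:
        for verb in rec:        # adjectives such as <title> have no children
            for ref in verb:    # empty nouns inside a verb act as references
                if ref.get("id") not in known:
                    missing.append(ref.get("id"))
    return missing
```

With both records present, the author reference in `kapital` resolves to the person record, and that record's email can be looked up; if the person record is removed, `unresolved_references` reports `kmarxthered`.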
Other Features
• A lang qualifier can be attached to every element; its value is an ISO 639-1 code if it has two letters and a code from the bibliographic variant of ISO 639-2 if it has three letters.
• Nouns have an id.
• Verbs have startdate and enddate qualifiers and, of course, an id.
• Adjectives can have child elements.
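The qualifier rules lend themselves to a simple check. This sketch assumes the qualifiers are XML attributes named `lang`, `startdate` and `enddate`, and that dates use the ISO 8601 `YYYY-MM-DD` form; the code sets are small illustrative subsets, not the full ISO 639 lists, and `check_lang` and `qualifier_errors` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import List

# Small illustrative subsets; a real checker would load the full ISO code lists.
ISO_639_1 = {"en", "fr", "de", "ja", "es"}
ISO_639_2B = {"eng", "fre", "ger", "jpn", "spa"}  # bibliographic (B) variant


def check_lang(value: str) -> bool:
    """Two letters: ISO 639-1; three letters: ISO 639-2 bibliographic codes."""
    if len(value) == 2:
        return value in ISO_639_1
    if len(value) == 3:
        return value in ISO_639_2B
    return False


def qualifier_errors(root: ET.Element) -> List[str]:
    """Flag unknown lang codes and start dates that fall after end dates."""
    errors = []
    for elem in root.iter():
        lang = elem.get("lang")
        if lang is not None and not check_lang(lang):
            errors.append(f"<{elem.tag}> has bad lang '{lang}'")
        start, end = elem.get("startdate"), elem.get("enddate")
        # date.fromisoformat raises ValueError on malformed dates.
        if start and end and date.fromisoformat(start) > date.fromisoformat(end):
            errors.append(f"<{elem.tag}> ends before it starts")
    return errors
```

Note that the bibliographic rule matters: `ger` is accepted while the terminological code `deu` is rejected, even though both denote German.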
Remaining Problems
• Resolvability rules for identifiers
• Dates and history
• Subject classification using the group mechanism
• Aliasing of element names
To Be Done…
• Complete list of verbs and adjectives
• Schema design
• Parsing and validation software
• Conversion of ReLIS as a test collection
Collaboration is welcome.
Thanks for listening. Have a happy New Year.