Garbage In, Garbage Out: Input Standards and Metadata • Scheme is only half of the equation • Consistency is key • Controlled vocabulary for all • Subjects • Names • Common descriptive terms
Access Points - Purposes • To identify (e.g., an entity known to the user) • To collocate (i.e., bring together related entities/works) • To aid in evaluating or selecting (e.g., Has this author written something newer on the subject? Which of several works with the same title do I want? What level of subject treatment is needed: a whole work on the subject? A chapter? A paragraph?) • To locate the image, etc.
Access Points for Names and Titles - Purposes • To facilitate the retrieval of names and titles that are imperfectly remembered • To facilitate the retrieval of names and titles that are expressed differently in different information packages • To facilitate the retrieval of names and titles that have changed over time • To collocate expressions and manifestations of works • To collocate works that are related to other works
Access Points for Names and Titles - How Accomplished • Name and Title Authority Control • All access points (whether main or added entries) need to be under authority control so that • persons or entities with the same name can be distinguished from each other • all names used by a person or body, or all manifestations of a name of a person or body will be brought together • all differing titles of the same work can be brought together • Therefore, current practice dictates either the establishment of a “heading” for each name or title as an access point or the provision of pointers to draw different representations of names or titles together • Headings are kept track of in authority files; RDF provides a model for linking entities
Name Authority Standards • LCNAF (Library of Congress Name Authority File) – constructed according to principles set out in AACR2R • Getty Vocabulary tools (artist names; geographic names) – VRA Core Categories calls for use of the Getty vocabulary • ISAAR(CPF) – International Standard Archival Authority Record for Corporate Bodies, Persons and Families • EAC – Encoded Archival Context (for describing creators of archival collections) • DCMI Agents – creators, contributors, and publishers – to be used in Dublin Core records
DCMI Agents: Working Definitions • Agent: A person (author, publisher, sculptor, editor, director, etc.) • or a group (organization, corporation, library, orchestra, country, federation, etc.) • or an automaton (weather recording device, software translation program, etc.) • that has a role in the lifecycle of a resource. • Agent Record: A collection of elements describing an agent. • Agent Authority Record: An agent record that includes the particular name that is preferred (considered authoritative) within a particular community (e.g., libraries).
Controlled Subject Terminology - Purposes • To provide subject access to information packages in a catalog or index • To collocate surrogate records for information packages of a like nature • To provide suggested synonyms and syndetic structure to aid a user in subject searching • To save the users’ time
Controlled Subject Terminology - How Accomplished • Conceptual analysis – describe aboutness in natural language • Translate that analysis into the framework of the controlled vocabulary system (e.g., use of single concept terms vs. use of phrases, compound concepts, and precoordinated subdivisions) • Use controlled vocabulary system rules to create controlled subject access points to be added to metadata records
Controlled Vocabularies • Subject heading lists • LCSH (Library of Congress Subject Headings) • FAST (Faceted Access to Subject Terminology) • Sears List of Subject Headings • MeSH (Medical Subject Headings) • Thesauri • AAT (Art & Architecture Thesaurus) • Thesaurus of ERIC Descriptors • Many more...
Names: What Do We Need • Enable the user to retrieve all relevant items associated with a person or group • Enable the user to retrieve all relevant items associated with a name regardless of the fullness or spelling of the name used for the person or group • Enable names to be browsed by either last name or first name but displayed in natural order
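The browse-versus-display requirement can be sketched in a few lines of Python. This is a minimal illustration, not a cataloging tool: `natural_order` is a hypothetical helper, and it assumes headings of the simple form "Surname, Forename" with no dates or qualifiers.

```python
def natural_order(heading: str) -> str:
    """Turn an inverted 'Surname, Forename' heading into 'Forename Surname'."""
    surname, sep, rest = heading.partition(",")
    if not sep:
        return heading.strip()  # not inverted (e.g., a corporate name)
    return f"{rest.strip()} {surname.strip()}"


headings = ["Taylor, Arlene G.", "Smiraglia, Richard P."]

# Browse by last name: sort on the inverted form as stored.
by_last_name = sorted(headings)
# Browse by first name: sort on the natural-order form.
by_first_name = sorted(headings, key=natural_order)
# Display: always show the natural-order form.
display = [natural_order(h) for h in by_last_name]
```

Storing one inverted form and deriving the display form keeps a single source of truth for each name.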
Names: Existing Tools • ANAC (Automated Name Authority Control system) • The Perseus project developed its own named entity extractor optimized for Civil War–era names. Uses MADS • Stanford Natural Language Processing tools
Authorities • Authority Control governs usage of a controlled vocabulary. This is managed with • Authority Files, that consist of • Authority Records, each of which records a term and its variants as well as evidence. They are created using • Authority Work, bibliographic detective work usually.
Authorities • Each authority record exists to control a term, known in library cataloging as a “heading” • The only “entity” is the controlled heading • The relationships are among the heading and variant forms of the heading • Everything else in the authority record is evidentiary or used for file control
Role of Authority Work • Authority work, in which terms and names are verified and validated, is a critical part of documentation practice. • The concept originated in the library cataloging domain in the days of manual card catalogs and indexes when strict consistency was necessary for minimal access. • Today authority work has extended to other information management communities and its processes and procedures have benefited greatly from computerization. • The development and application of standard controlled vocabularies is a significant outcome of authority work.
Authority Work Characteristics • Authority files are compilations of authorized terms or headings used by a single organization or consortium in cataloging, indexing, or documentation • Authority control is a system of procedures that maintains consistent information in database records.
Authority Work Characteristics • An authority file is a controlled vocabulary, but not all controlled vocabularies are authority files. • Authority files are an integral part of most automated information systems but you will find differing levels of implementation depending on the system. • Authority work procedures may be automated, but the intellectual processes needed to create quality authority files are still best accomplished by humans.
Attributing Works in the Anglo-American Cataloguing Rules • A work may be attributed to an individual creator, it may be attributed to a corporate emanator, or it may be entered under its title. • Individuals: chiefly responsible for the creation of intellectual (artistic, etc.) content (21.1A1). Responsibility may be shared or mixed … • Corporate body: an organization with a name that acts as an entity … and causes a work of collective thought or activity to emanate … (21.1B2). Governments, churches, universities, corporations, conferences, etc.
A “Heading” Contains, but is Not Equal to, A “Name” • A heading includes: • The authorized form of name (title, etc.) • Manipulated in various ways (inverted, for instance) • Qualifiers to make it unique • The name is Richard P. Smiraglia • The heading is Smiraglia, Richard P., 1952-
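The name-to-heading relationship can be sketched as a small function. This is an illustration under simplifying assumptions: `make_personal_heading` is a hypothetical helper, the caller already knows which part is the surname, and an open birth date is the only qualifier handled.

```python
from typing import Optional


def make_personal_heading(forenames: str, surname: str,
                          birth_year: Optional[int] = None) -> str:
    """Invert the name and, if given, append an open birth date as a qualifier."""
    heading = f"{surname}, {forenames}"
    if birth_year is not None:
        heading += f", {birth_year}-"
    return heading


# The name is "Richard P. Smiraglia"; the heading adds inversion and a date.
heading = make_personal_heading("Richard P.", "Smiraglia", 1952)
print(heading)  # Smiraglia, Richard P., 1952-
```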
Constituting Headings: Personal Names • The name of the creator as found in his published works. • If more than one name, choose the latest. • If more than one form, choose that found most often most recently. • If all else fails, choose the fullest form. • Add dates and middle names to resolve conflicts.
Constituting Headings: Corporate Names • The name of the corporate body as found in its published works. • If more than one name use all. • If more than one form, choose the one found most often in its works. • Add terms as qualifiers to resolve conflicts. • Who (Musical group) • Apollo (Spaceship)
Constituting Headings: Subordinate Entry • Government or Corporate Entities with generic names or names implying subordination “Department” “Division” “Bureau” “Committee” etc. • Entered under the name of the intermediate unit with a distinctive name. • California. Employment Data and Research Division. • NOT: California. Employment Development Department. Employment Data and Research Division.
Authority Control • Maintains consistency of usage of names of individuals, corporate bodies, and titles of works. • Always: • Smiraglia, Richard P., 1952- • Not Smiraglia, R.P. • Not Smiraglia, Richard • Always: • Taylor, Arlene G., 1941- • Not Dowell, Arlene Taylor, 1941-
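A minimal sketch of this "always / not" behavior, assuming a simple in-memory lookup: the class and function names are hypothetical, and matching is by exact string ignoring case, which is far cruder than real authority matching.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AuthorityRecord:
    heading: str                                        # the authorized form
    variants: List[str] = field(default_factory=list)   # forms that must not be used


def build_index(records: List[AuthorityRecord]) -> Dict[str, str]:
    """Map every known form (authorized or variant) to the authorized heading."""
    index = {}
    for rec in records:
        for form in [rec.heading, *rec.variants]:
            index[form.casefold()] = rec.heading
    return index


def authorized_form(index: Dict[str, str], name: str) -> Optional[str]:
    """Return the heading to use, or None if the name is not yet under control."""
    return index.get(name.casefold())


index = build_index([
    AuthorityRecord("Smiraglia, Richard P., 1952-",
                    ["Smiraglia, R.P.", "Smiraglia, Richard"]),
    AuthorityRecord("Taylor, Arlene G., 1941-",
                    ["Dowell, Arlene Taylor, 1941-"]),
])
print(authorized_form(index, "Smiraglia, R.P."))  # Smiraglia, Richard P., 1952-
```

A `None` result is the signal that authority work is still needed for that name.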
Authority Records • Authority control works through the use of authority records • Authority records record: • Authority work—the actual decision-making process of the cataloger • Variant forms found along the way • References in the catalog from recognized variant forms
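References from variant forms can be generated mechanically once the variants are recorded. The sketch below assumes the conventional "see" wording for such references; `see_references` is a hypothetical helper, not part of any cataloging system.

```python
from typing import List


def see_references(heading: str, variants: List[str]) -> List[str]:
    """Build one 'variant see heading' reference per recorded variant form."""
    return [f"{variant} see {heading}" for variant in sorted(variants)]


refs = see_references("Taylor, Arlene G., 1941-", ["Dowell, Arlene Taylor, 1941-"])
for line in refs:
    print(line)  # Dowell, Arlene Taylor, 1941- see Taylor, Arlene G., 1941-
```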
A new model of “authority file” • The authority records of creators are meant to include a much more complex set of information than traditional bibliographic authority records, exactly because they are devoted to implementing the model of separate description of archives and creators • Dates of existence, history and geography, functions, occupations, and activities … political, social, cultural context in which the creator worked
From a Data Modeling Standpoint …. • Thus the only entity in an authority record is the authorized heading (or “term”) • Its variants are attributes, but could also be seen as equivalents • The rest is functional: Notes (two types: evidentiary and non-evidentiary), Usage, Control • A flat file model: Authority File (AF) and Bibliographic File (BF) • Headings in the Authority File govern usage in the Bibliographic File. • One “Dickens” in the AF governs all “Dickens” in the BF. • Usage is inferential.
Online, new models emerged • 1. Online flat-file models simply used the authority file as an occasional filter. All headings from the bibliographic file were run against it periodically for validation. • 2. An ER model separated the headings from their representations in bibliographic records. This reduced redundancy dramatically. Every heading is stored only in the authority file, and copied as needed into the displays arranged from the bibliographic file. All “Dickens” resides only in the AF, with links from the BF.
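The ER model can be sketched with two small structures. Everything here is illustrative: the record id `a1`, the field names, and the two sample titles are assumptions chosen to show the linking, not data from any real system.

```python
# Authority file (AF): each heading is stored exactly once, keyed by a record id.
authority_file = {"a1": "Dickens, Charles"}

# Bibliographic file (BF): records hold a link to the AF, not a copy of the heading.
bibliographic_file = [
    {"title": "Bleak House", "creator_id": "a1"},
    {"title": "Great Expectations", "creator_id": "a1"},
]


def display(record: dict) -> str:
    """Copy the heading into the display at the moment it is needed."""
    return f'{authority_file[record["creator_id"]]}. {record["title"]}'


before = [display(r) for r in bibliographic_file]

# A change made once in the AF appears in every linked display.
authority_file["a1"] = "Dickens, Charles, 1812-1870"
after = [display(r) for r in bibliographic_file]
```

In the flat-file model the same change would have to be applied to every bibliographic record that carries a copy of the heading.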
Authority Control • Traditional Functions • Ensures that access points are unique and consistent in content and form • Provides a network of linkages for variant and related headings in the catalog • Improves precision & recall for database searches
Reasons for Authority Control Success • AC operates within a well-defined and bounded universe—the library catalog • Creation of access points based on principles & standardized practices that guide the process • Authority work is aided by reference to authoritative lists • Performed by highly trained individuals • Part of library culture • Understand cause and effect in the information retrieval process
Functions of the Authority File • Document decisions • Serve as reference tool • Control forms of access points • Support access to bibliographic file • Link bibliographic and authority files
Users and Tasks • Users • Authority record creators and reference librarians • Library patrons • User tasks • Find: find an entity or set of entities corresponding to stated criteria • Identify: identify an entity • Contextualize: place a person, corporate body, work, etc. in context • Justify: document the authority record creator’s reason for choosing the name or form of name on which an access point is based
Traditional Authority Control in Libraries • Which names do we control? • Names of authors and some contributors of published books • Composers of sheet music • Names of corporate bodies responsible for official publications • Names associated with resources catalogued since 1981 • Names associated with audio or audio visual resources, where possible • Which names do we exclude? • Names of authors of journal articles or chapters of published books • Contributors whose names fall towards the end of the alphabet or whose contribution we regard as insignificant • Names associated with archival or manuscript material • Names derived from older catalogues • Names associated with most Web Resources • Names in the content management system / institutional repository
Expectations • There is a gap between ambition and delivery • Only some names on some types of resources are controlled • User expectations are changing • Silos: • Libraries / Archives / Repositories / Museums • National practices • Institutional practices • Variance over time • Is partial authority control acceptable to users? If not, will it be acceptable to administrators?
Workflows • Current workflows are not scalable • Retrospective • Cataloger driven • Decision making • Is A. Rose PhD the same person as Dr. Alex Rose, University of London? • What other information is available? • Is it sufficient to match or disambiguate the identities? • Is there a website / contact details?
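The first step of the "A. Rose" decision, checking whether two name strings could refer to the same person at all, can be sketched as below. This is a rough heuristic under stated assumptions: the affiliation has already been split off, the surname is the last word, and the `TITLES` list is illustrative. A match means "compatible", never "same person"; the remaining questions on the slide still need other evidence.

```python
import re
from typing import List

TITLES = {"dr", "phd", "prof", "mr", "mrs", "ms"}  # illustrative, not exhaustive


def name_tokens(raw: str) -> List[str]:
    """Lower-case the name, drop punctuation, and remove titles and degrees."""
    words = re.findall(r"[^\W\d_]+", raw.casefold())
    return [w for w in words if w not in TITLES]


def compatible(a: str, b: str) -> bool:
    """True when surnames match and each forename or initial agrees with its counterpart."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb or ta[-1] != tb[-1]:
        return False
    return all(x.startswith(y) or y.startswith(x)
               for x, y in zip(ta[:-1], tb[:-1]))


print(compatible("A. Rose PhD", "Dr. Alex Rose"))  # True
```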
Rethinking the Process • Capture information about the person, family or corporate body at the time the resource is created • Devolve responsibility to authors, publishers, researchers and academics • Libraries and bibliographic agencies focus on quality control, complex relationships and conflict resolution. • Capture information in a way that is machine intelligible. • Identification of entities not disambiguation of headings
Automation • Identification • ISNI • Contextual information • FRAD, RDA, MARC 21 • Matching • VIAF, Names • Linking • Controlled vocabularies • Confidence
Models • Interparty - http://www.interparty.org/ • EU Project 2002-2003 • Public identities not persons • Linking not merging • Authority – i.e. who makes the assertion • FRAD Functional Requirements for Authority Data • Extension of FRBR model to authority data • Separation of names from the person, family or corporate body • Recognition of different rules http://www.ifla.org/publications/ifla-series-on-bibliographic-control-34
ISNI: International Standard Name Identifier • ISNI 1422 4586 3573 0476 • Registration Metadata • http://www.isni.org/ • Draft ISO Standard (ISO 27729) • A “bridge” • Identification of Public Identities • Natural Persons • Legal Persons • Fictional Characters • Groups • Incorporated entities • Libraries, rights management, book trade, publishers, media content industries
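An ISNI is 16 characters long, and the last one is a check character. The slide does not describe the calculation; the sketch below assumes the ISO 7064 MOD 11-2 scheme used for ISNI, and the function names are hypothetical.

```python
import re


def isni_check_character(first15: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in first15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_isni(isni: str) -> bool:
    """Accept forms such as 'ISNI 1422 4586 3573 0476' or '1422458635730476'."""
    compact = isni.upper().replace("ISNI", "").replace(" ", "").replace("-", "")
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return isni_check_character(compact[:15]) == compact[15]


print(is_valid_isni("ISNI 1422 4586 3573 0476"))  # True
```

A passing check only shows the number is well formed; it does not show that the ISNI has been assigned to anyone.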
VIAF: The Virtual International Authority File • Match & Link Authority Files • Reduce costs • Increase utility • Retrospective alignment of bibliographic data • OCLC • Bibliothèque nationale de France • Bibliotheca Alexandrina (Egypt) • National Library of the Czech Republic • Deutsche Nationalbibliothek • National Library of Israel • Library of Congress/NACO • National Library of Sweden • Vatican Library • Prototype: http://viaf.org/ • Linked Data: http://outgoing.typepad.com/outgoing/2009/09/viaf-as-linked-data.html
It’s not just about libraries… • FOAF: Friend of a Friend • Social networking metadata • Granularity of parts of a name • http://xmlns.com/foaf/spec/ • EAC-CPF: Encoded Archival Context – Corporate Bodies, Persons, and Families • Communication standard for exchange of authority records • ISAAR (CPF) • Draft Standard http://eac.staatsbibliothek-berlin.de/
Thoughts • Controlling names remains important in the context of linked data and the Semantic Web • Identification and collocation of variants is more important than establishing a preferred form • Current techniques are not scalable • Automation and participation are the way forward • Web services for identification • No simple solution • Extension of the collaborative model
FRAD • Functional Requirements for Authority Data • IFLA Division of Bibliographic Control working group 1999- • April 2007 draft for world-wide review • Approved March 2009
FRAD Entities • Name by which bibliographic entities are known (in the “real” world) • Identifier assigned to those entities • Controlled access point based on those names or identifiers • These are the heart of the authority data
Name • A character or group of words and/or characters by which an entity is known • The basic name or term itself • As found in the “real” world
Definition: Identifier • A number, code, word, phrase, logo, device, etc. that is uniquely associated with an entity, and serves to differentiate that entity from other entities within the domain in which the identifier is assigned • Not only bibliographic identifiers
Definition: Controlled Access Point • A name, term, code, etc. under which a bibliographic or authority record or reference will be found • Includes established or authorized headings and variant headings or references
Basic FRAD Model • BIBLIOGRAPHIC ENTITIES are known by NAMES and/or IDENTIFIERS • NAMES and/or IDENTIFIERS are the basis for CONTROLLED ACCESS POINTS
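The basic model can be written down as two small data classes. This is a loose sketch for orientation only: the class and field names are hypothetical and do not reproduce the attributes defined in FRAD itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BibliographicEntity:
    kind: str                                              # e.g., person, family, corporate body, work
    names: List[str] = field(default_factory=list)         # as found in the "real" world
    identifiers: List[str] = field(default_factory=list)


@dataclass
class ControlledAccessPoint:
    value: str
    based_on: str            # the name or identifier it was built from
    authorized: bool = True  # False for a variant heading or reference


person = BibliographicEntity(kind="person", names=["Richard P. Smiraglia"])

# Names are the basis for controlled access points, both authorized and variant.
access_points = [
    ControlledAccessPoint("Smiraglia, Richard P., 1952-", based_on=person.names[0]),
    ControlledAccessPoint("Smiraglia, R.P.", based_on=person.names[0], authorized=False),
]
authorized = [ap.value for ap in access_points if ap.authorized]
```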
More FRAD Entities • Rules governing construction of a controlled access point • Agency applying the rules, and creating/modifying the controlled access point
MADS • MODS users kept asking for a compatible authority record • Metadata Authority Description Schema • April 2004, Preliminary version out for review • December 2004 new draft out for review • April 2005 version 1.0 published