METADATA as we know it: MARC in context An overview Prepared by: Eva Bolkovac As part of a staff training initiative for the JDC/STOD/Cataloging Subcommittee June 2006
JDC/STOD/Cataloging Subcommittee: Chair: Marena Fisher Core members since October 2005: Kathryn Trotti Eva Guggemos Ana Amelia Contrastano Additional members: Joan Swanekamp Shaundolyn Slaughter Eva Bolkovac Consultant: Stephen C. Jones, Chair of STOD
Quote from an early Metadata Practitioner, or is it déjà vu (all over again) I can not help thinking that the golden age of cataloging is over, and the difficulties and discussions which have furnished an innocent pleasure to so many will interest them no more. (Charles A. Cutter, published in 1904)
Reference work relies on good catalog records “The work of the reference department covers everything necessary to help the reader in his inquiries, including … expert aid in the use of the catalog.” (Isadore Gilbert Mudge, 1936) from Guide to reference books. 6th ed. Chicago: American Library Association
What does this mean for us? • Cataloging=Metadata • Metadata can be harvested automatically by indexing robots • Metadata can be embedded in a digital object • Cataloging *is* a Public Service that increases the usefulness of information: it aids “resource description and discovery” • Metadata helps people find the information they are looking for
What does this mean for us? Cont. • Metadata also supports managing information (including access/rights management) and the long-term preservation of information (digital archiving) • Metadata is broader in scope than the traditional role of the technical services librarian/Cataloger • Increased collaboration
Cataloging=Metadata • When we catalog a book, a serial, a map, etc., we describe that particular item using a metadata standard, MARC21, together with other rules, like AACR2. We create a catalog record. • The delivery platform, or mechanism for the catalog record is the LMS, the Library Management System (like our Orbis). • Cataloging requires special skills.
Cataloging=Metadata • When we catalog digitized objects (an image of a picture, a manuscript, a finding aid, etc.), we describe that particular item using a metadata standard such as DC, TEI, or EAD, together with other standards, perhaps AACR2 (or its future version, RDA). We create a metadata record. • The delivery platform, or mechanism, for the metadata record is the Web, or some other digital management software. The LMS is MARC-based; it is not designed to deliver metadata records on the Web. • Cataloging requires special skills.
Sounds similar? • Yes, traditional library cataloging is a form of metadata, BUT THE DIFFERENCE IS IN TECHNOLOGY! • When we apply non-MARC, XML-based metadata standards, the technical environment changes completely: from the MARC-based system to the Internet and Web delivery. • Cataloging principles remain very similar whether applying the MARC21 metadata standard or other non-MARC metadata standards. • The new technology brings its own new sets of rules.
The library’s goal is: • To provide simultaneous access to its traditional library collections as well as its digital collections, in a seamless, integrated manner [searching across multiple data types and databases].
If we didn’t catalog…………. For the user it would mean…………………
Metadata is: • A simple and classic definition: data about data or information about information • More accurately: structured data or information about an information resource • Used differently in different user communities according to their needs • Machine-understandable information designed to be indexed and retrieved on the Web, not by online catalogs • In libraries, a formal scheme used to describe an object/resource (including digital) • MARC21 *is* metadata (ISO 2709): an international standard used for bibliographic data in library catalogs. It can also be used to describe digital objects, with limitations
Metadata does: • Through cataloging – facilitates discovery of relevant information • Facilitates interoperability • Facilitates resource discovery. Same as in a quality catalog record!
A metadata record is… • A file of information, usually presented as an XML document • It captures the basic characteristics of a data or information resource – structured data about data (same concept as in a MARC catalog record) • Data elements are defined for a metadata record by the rules of a particular standard that is applied • Created and maintained
Different metadata standards for different folks • DC – Dublin Core (ISO 15836): the “CIP” of the digital world – simple • DC can be expressed in XML (RDF/XML); the Resource Description Framework (RDF) is a data model designed to integrate multiple metadata schemes • QDC – Qualified Dublin Core – more sophisticated than simple DC • EAD – Encoded Archival Description, created to display finding aids on the Web • TEI – Text Encoding Initiative, for electronic text (there is a Lite version, too)
Different metadata standards for different folks cont. • MathML – Mathematical Markup Language (an application of XML), represents mathematical symbols and formulae • FGDC – Federal Geographic Data Committee, for maps (although MARC can be used to describe a map, it is not designed to convey complex numeric information for GIS – Geographic Information Systems – data sets) • ONIX – Online Information Exchange: an international, XML-based standard used by publishers and the book trade for bibliographic and trade information; libraries may receive ONIX records in the future
Simple Dublin Core: DC (can be embedded in the head of an HTML document) The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights. Each Dublin Core element is optional and may be repeated. The Dublin Core Metadata Initiative (DCMI) has established standard ways to refine elements and encourage the use of encoding and vocabulary schemes. There is no prescribed order in Dublin Core for presenting or using the elements.
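As a rough illustration of embedding Simple Dublin Core in the head of an HTML document, the Python sketch below renders a few elements as meta tags. The "DC." name prefix and the "schema.DC" link follow the common DCMI convention for HTML; the function name dc_meta_tags and the sample values (adapted from the Sandburg record shown later in this presentation) are assumptions made for the example.

```python
from html import escape

# Namespace URI of the 15-element Dublin Core Metadata Element Set.
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(elements):
    """Render (element, value) pairs as HTML <meta> tags.

    Dublin Core elements are optional and repeatable, so the input is a
    list of pairs rather than a dictionary.
    """
    lines = ['<link rel="schema.DC" href="%s">' % DC_NAMESPACE]
    for name, value in elements:
        lines.append(
            '<meta name="DC.%s" content="%s">' % (name, escape(value, quote=True))
        )
    return "\n".join(lines)


# Illustrative values only; not an official record.
record = [
    ("title", "Arithmetic"),
    ("creator", "Sandburg, Carl, 1878-1967"),
    ("contributor", "Rand, Ted"),
    ("date", "1993"),
    ("language", "eng"),
    ("subject", "Arithmetic -- Poetry"),
    ("subject", "American poetry"),  # the same element, repeated
]

print(dc_meta_tags(record))
```

Because no order is prescribed, the tags could be emitted in any sequence; the list order here is only a convenience.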
Qualified Dublin Core: QDC • Refines the 15 DC elements, making them more specific. Some of these refinements are: Title: Alternative; Date: Created, Issued, Modified; Format: Extent; Relation: Is Version Of, Is Part Of, Is Format Of • QDC includes recommended encoding schemes, which help in interpreting the element value (e.g., LCSH)
XML – Extensible Markup Language • Is not a metadata format itself, but can be used to express metadata formats • A language container: a ‘metalanguage’ • A W3C (World Wide Web Consortium) standard • XML tags have no predefined meaning • XML is a syntax for creating data structure standards • A flexible text format, important for data exchange on the Web • Unlike HTML, it does not specify how to display data on the Web (bold, color, etc.). That is done through XSLT – Extensible Stylesheet Language Transformations.
Example of a simple XML record
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!-- Edited with XML Spy v2006 (http://www.altova.com) -->
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
Same XML record with an error The XML record is transformed using XSLT, a stylesheet language for XML, to display it on the Web. The record has to be well formed (and valid, where a DTD or schema applies); a record that is not well formed cannot be parsed, so it cannot be displayed. http://www.w3schools.com/xml/note_error.xml
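The well-formedness requirement can be checked with a few lines of Python. This is a minimal sketch using the standard library parser; the broken variant is a hypothetical error (a mismatched closing tag) introduced for the example, and the XML declaration is left out for simplicity. Note that this parser checks well-formedness only; it does not validate against a DTD or schema.

```python
import xml.etree.ElementTree as ET

GOOD = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

# Hypothetical error: the closing tag no longer matches the opening tag
# (XML tag names are case sensitive).
BAD = GOOD.replace("</heading>", "</Heading>")


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print("good record well formed:", is_well_formed(GOOD))
print("bad record well formed:", is_well_formed(BAD))
```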
Mappings and crosswalks between metadata formats • Crosswalks facilitate moving metadata from one scheme to another by mapping the data elements, semantics, and syntax • They facilitate interoperability and exchange of metadata • Like translating from one language to another • Examples at: http://www.oclc.org/research/projects/mswitch/1_crosswalks.htm and http://www.loc.gov/marc/marcdocz.html • Difficulties arise in crosswalks between different metadata formats (field definitions do not always match) • Best practices for standardized records: http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?CrosswalkingLogic
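A crosswalk can be pictured as a lookup table from elements in one scheme to elements in another. The Python sketch below uses a deliberately simplified, illustrative MARC-to-Dublin Core mapping; it is an assumption made for this example, not the official Library of Congress crosswalk, and the names MARC_TO_DC and crosswalk are hypothetical. Fields with no target element are reported separately, which is one way the difficulties noted above become visible.

```python
# Illustrative, simplified mapping from (MARC tag, subfield code) to a
# Dublin Core element. NOT the official Library of Congress crosswalk.
MARC_TO_DC = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("520", "a"): "description",
    ("650", "a"): "subject",
    ("700", "a"): "contributor",
}


def crosswalk(marc_fields):
    """Map (tag, subfield code, value) triples to (DC element, value) pairs.

    Fields that have no mapping are returned separately so that data
    lost in translation stays visible.
    """
    mapped, unmapped = [], []
    for tag, code, value in marc_fields:
        element = MARC_TO_DC.get((tag, code))
        if element is None:
            unmapped.append((tag, code, value))
        else:
            mapped.append((element, value))
    return mapped, unmapped


marc_fields = [
    ("100", "a", "Sandburg, Carl,"),
    ("245", "a", "Arithmetic /"),
    ("250", "a", "1st ed."),  # no target element in this sketch
    ("260", "b", "Harcourt Brace Jovanovich,"),
    ("650", "a", "American poetry."),
]

dc_pairs, leftovers = crosswalk(marc_fields)
print(dc_pairs)
print(leftovers)
```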
One size does not fit all • No single standard is suitable for all purposes • Proliferation of standards • Obvious advantages exist to having a single standard for cataloging both digital and non-digital materials
Searching empowered by metadata • Through analysis of resource content • Appropriate thesauri • Designated fields for data exchange and migration • Richer than keyword
METADATA TYPES: • Descriptive (such as author, title, abstract); it is the most standardized -- MARC, MODS, DC • Structural (such as how resource is put together, pages, chapters) –METS (Metadata Encoding and Transmission Standard), XML • Administrative (such as technical information, file type, track history of creation and changes, access/rights management and intellectual property, preservation metadata to archive the resource) –ERMI (Electronic Resource Management Initiative), PREMIS (PREservation Metadata Implementation Strategies)
Descriptive metadata type/format: MARC21 • Machine Readable Cataloging record, an international communication standard, ISO 2709 • Originally designed in the late 1960s to aid in the transfer of bibliographic data onto magnetic tape, and to replace printed catalog cards with an electronic form • MARC is not a cataloging code • A carrier for bibliographic information, such as titles, names, subjects, notes, publication information, and physical descriptions of objects
MARC21 • Standard for exchanging bibliographic, holdings and other data between libraries. • Allows for data elements for different types of material: a foundation that most library catalogs are built on.
Catalog this! Using MARC21 formats! • Books • Continuing resources (serials) • Integrating resources • Maps • Music (scores) • Sound recordings • Visual materials • Electronic resources • Mixed materials
Levels of description: full, minimal and in-between • Describe a book • Describe a collection, a whole set, separate volumes in the set • Describe a photograph • Describe a chapter in a book • Describe a video • Describe an electronic resource, a digital object
Describing information on the web: • MARC is designed for use in library catalogs, by automated library systems, not for use on the web • MARC can be transformed to be displayed on the web: MARCXML, MODS • To describe other objects on the web: use Dublin Core metadata standard, use XML, use TEI for text, EAD for digital finding aids, etc
MARCXML • A framework for working with MARC data in an XML environment • Complete MARC record is represented in XML, no loss of data • Can convert back to MARC easily, no loss of data • All MARC formats (book, map, music, etc) are supported • Customizable for local solutions
MARC record
LDR 01281cam 2200337 a 4500
001 ocm25508902\
003 OCoLC
005 20060530010502.0
008 920219s1993\\\\caua\\\j\\\\\\000\0\eng\\
010 \\$a 92005291
040 \\$aDLC$cDLC$dOCLCQ$dBAKER
020 \\$a0152038655 :$c{dollar}15.95
042 \\$alcac
050 00$aPS3537.A618$bA88 1993
082 00$a811/.52$220
049 \\$aYUSS
MARC record continued
100 1\$a Sandburg, Carl,$d1878-1967.
245 10$a Arithmetic /$cCarl Sandburg ; illustrated as an anamorphic adventure by Ted Rand.
250 \\$a 1st ed.
260 \\$a San Diego :$bHarcourt Brace Jovanovich,$cc1993.
300 \\$a 1 v. (unpaged) :$bill. (some col.) ;$c26 cm.
500 \\$a One Mylar sheet included in pocket.
520 \\$a A poem about numbers and their characteristics. Features anamorphic, or distorted, drawings which can be restored to normal by viewing from a particular angle or by viewing the image's reflection in the provided Mylar cone.
End of MARC record…
650 \0$a Arithmetic $vJuvenile poetry.
650 \0$a Children's poetry, American.
650 \1$a Arithmetic $vPoetry.
650 \1$a American poetry.
650 \1$a Arithmetic $v Poetry.
650 \1$a American poetry.
650 \1$a Visual perception.
700 1\$a Rand, Ted, $e ill.
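To make the content designation in the record above concrete, here is a small Python sketch that splits one displayed field into its tag, indicators, and subfields. It assumes the display convention used in this record (a backslash stands for a blank indicator, and "$" introduces a subfield code); the function name parse_field is hypothetical, and control fields such as 001 or 008, which carry no indicators or subfields, are not handled.

```python
import re

# Tag, a space, two indicator characters, then the subfields starting at "$".
FIELD_RE = re.compile(r"^(\d{3}) (.)(.)(\$.*)$")


def _indicator(char):
    """Turn the display placeholder for a blank indicator into a space."""
    return " " if char == "\\" else char


def parse_field(line):
    """Split one displayed variable data field into its components."""
    match = FIELD_RE.match(line.strip())
    if match is None:
        raise ValueError("not a variable data field: %r" % line)
    tag, ind1, ind2, rest = match.groups()
    subfields = []
    for chunk in rest.split("$"):
        if chunk:  # skip the empty piece before the first "$"
            subfields.append((chunk[0], chunk[1:].strip()))
    return {
        "tag": tag,
        "ind1": _indicator(ind1),
        "ind2": _indicator(ind2),
        "subfields": subfields,
    }


author = parse_field(r"100 1\$a Sandburg, Carl,$d1878-1967.")
edition = parse_field(r"250 \\$a 1st ed.")
print(author)
print(edition)
```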
MARCXML: Example • MARC expressed in XML: just a different way of encoding MARC • Tags keep their semantics • 1:1 mapping • No loss of data during conversion • Extensible – can be customized • http://www.loc.gov/standards/marcxml/Sandburg/sandburg.xml
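The 1:1 mapping can be sketched in Python with the standard library. The element and attribute names (record, datafield, subfield, tag, ind1, ind2, code) and the namespace are those of the MARCXML schema; the function names to_marcxml and from_marcxml are hypothetical, and the leader and control fields are left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

MARCXML_NS = "http://www.loc.gov/MARC21/slim"


def _q(name):
    """Qualify an element name with the MARCXML namespace."""
    return "{%s}%s" % (MARCXML_NS, name)


def to_marcxml(datafields):
    """Build a MARCXML <record> from data fields.

    Each data field is (tag, ind1, ind2, [(subfield code, value), ...]).
    """
    ET.register_namespace("marc", MARCXML_NS)
    record = ET.Element(_q("record"))
    for tag, ind1, ind2, subfields in datafields:
        field = ET.SubElement(
            record, _q("datafield"), {"tag": tag, "ind1": ind1, "ind2": ind2}
        )
        for code, value in subfields:
            ET.SubElement(field, _q("subfield"), {"code": code}).text = value
    return record


def from_marcxml(record):
    """Read the data fields back out of a MARCXML <record>."""
    fields = []
    for field in record.findall(_q("datafield")):
        subfields = [(sf.get("code"), sf.text) for sf in field.findall(_q("subfield"))]
        fields.append((field.get("tag"), field.get("ind1"), field.get("ind2"), subfields))
    return fields


fields = [
    ("100", "1", " ", [("a", "Sandburg, Carl,"), ("d", "1878-1967.")]),
    ("245", "1", "0", [("a", "Arithmetic /"), ("c", "Carl Sandburg.")]),
]

xml_text = ET.tostring(to_marcxml(fields), encoding="unicode")
print(xml_text)

# Round trip: converting back yields the same tags, indicators and subfields.
print(from_marcxml(ET.fromstring(xml_text)) == fields)
```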
MODS: Metadata Object Description Schema • XML-based descriptive metadata standard • A subset of its data elements is derived from MARC21 • Highly compatible with MARC21 (but not a MARC replacement) • Richer than Dublin Core • Uses natural language tags rather than numeric tags • Accommodates special requirements for digital resources
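The contrast between numeric MARC tags and language-based MODS tags is easiest to see in a short example. The Python sketch below builds a very small MODS fragment; titleInfo, title, name, namePart, originInfo and dateIssued are MODS elements, while the function name build_mods and the choice of values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"


def _q(name):
    """Qualify an element name with the MODS namespace."""
    return "{%s}%s" % (MODS_NS, name)


def build_mods(title, personal_name, date_issued):
    """Build a minimal MODS description with language-based tags."""
    ET.register_namespace("mods", MODS_NS)
    mods = ET.Element(_q("mods"))

    # Corresponds roughly to MARC 245 $a.
    title_info = ET.SubElement(mods, _q("titleInfo"))
    ET.SubElement(title_info, _q("title")).text = title

    # Corresponds roughly to MARC 100 $a.
    name = ET.SubElement(mods, _q("name"), {"type": "personal"})
    ET.SubElement(name, _q("namePart")).text = personal_name

    # Corresponds roughly to MARC 260 $c.
    origin = ET.SubElement(mods, _q("originInfo"))
    ET.SubElement(origin, _q("dateIssued")).text = date_issued
    return mods


mods = build_mods("Arithmetic", "Sandburg, Carl", "1993")
print(ET.tostring(mods, encoding="unicode"))
```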
MARC limitations in the digital environment • Lack of expandability due to rigorous record formats (goes back to the production of printed catalog cards) • Weaknesses in describing bibliographic attributes of digitized resources • Incompatible with other MARC formats • Bibliographic relationships are not easily represented • Can’t be processed directly by web applications
So, is MARC still needed? YES! • But it’s one of the metadata standards we can use – not the only one • OPAC is not necessarily the “center of discovery” • Can be retooled, repurposed, transformed… • Still the best way to describe resources for discovery, identification and retrieval in traditional library catalogs (like Voyager)
MARC21 standards: • MARC21 Format for Bibliographic Data • MARC21 Format for Authority Data • MARC21 Format for Holdings Data • MARC21 Format for Classification Data • MARC21 Format for Community Information
Bibliographic record type: • A carrier primarily for bibliographic information about printed or manuscript textual materials, maps, music, serials, visual materials, electronic resources • ..or any source of information which can be represented in a catalog record
Authority records: • Are a carrier for information concerning the authorized forms of names, titles, subjects (and subject divisions) to be used in constructing ACCESS POINTS • Describe names and terms which need to be standardized for optimal retrieval of data • Include personal, corporate, geographic names and controlled vocabularies
Value of Authority control • Most efficient and effective mechanism for optimal retrieval of information • Without authority control, access to information can be severely compromised
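A toy Python sketch can show why authority control helps retrieval: every variant form a user might type resolves to one authorized heading. The authorized heading below comes from the Sandburg record earlier in this presentation; the variant forms, the normalization rule, and the names AUTHORITY_FILE and authorized_form are hypothetical and far simpler than a real authority file.

```python
import string

# Hypothetical authority file: authorized heading -> variant forms.
AUTHORITY_FILE = {
    "Sandburg, Carl, 1878-1967": [
        "Carl Sandburg",
        "Sandburg, Carl",
        "Sandburg, C.",
    ],
}

_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(name):
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(name.lower().translate(_PUNCTUATION).split())


def build_index(authority_file):
    """Index every variant (and the heading itself) by normalized form."""
    index = {}
    for heading, variants in authority_file.items():
        for form in [heading] + variants:
            index[normalize(form)] = heading
    return index


INDEX = build_index(AUTHORITY_FILE)


def authorized_form(query):
    """Return the authorized heading for a query, or None if unknown."""
    return INDEX.get(normalize(query))


print(authorized_form("carl sandburg"))
print(authorized_form("SANDBURG, C"))
print(authorized_form("Rand, Ted"))
```

Without such an index, each variant spelling would retrieve only the records that happen to use that exact form.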
Holdings records: • Are a carrier for holdings information for three types of bibliographic items: • Single-part • Multipart • Serial (may include copy-specific information, information needed for local processing, maintenance, preservation or version information) • Indicate the number and locations of copies of a resource cataloged in the bibliographic record
Classification records: • A carrier for information about classification numbers and the captions associated with them that are formulated according to a specific authoritative classification scheme
Community Information records: • A carrier for descriptions of non-bibliographic resources that fulfill the information needs of a community.
MARC • In the beginning there were different flavors (MARC, USMARC, CANMARC, UKMARC, AUSMARC), later harmonized as MARC21 • Information is stored in a consistent form • Data is manipulated by a computer • Allows for communication between systems • Accommodates extensive data elements • A highly complex communication or data structure standard that provides concise data management
MARC has three components: • Record structure (based on ISO 2709 and ANSI Z39.2) • Content designation – the codes used to tag elements of data within a MARC record • Data content of a record – the description of the object we are coding (a book, a map, etc.), formulated according to data formatting standards (AACR, LCSH, LC Classification, DDC, etc.)
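The record structure component can be illustrated with a simplified Python sketch of the ISO 2709 layout: a 24-byte leader, a directory of 12-byte entries (tag, field length, starting position), and the fields themselves, separated by control characters. The function names build_record and parse_record are hypothetical, the fixed leader values are chosen for the example, and control fields (00X) are not handled.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SF = b"\x1f"  # subfield delimiter


def build_record(fields):
    """Serialize data fields into a simplified ISO 2709 record.

    Each field is (tag, ind1, ind2, [(subfield code, value), ...]).
    """
    directory, data = b"", b""
    for tag, ind1, ind2, subfields in fields:
        body = (ind1 + ind2).encode("utf-8")
        for code, value in subfields:
            body += SF + code.encode("utf-8") + value.encode("utf-8")
        body += FT
        # Directory entry: tag (3), field length (4), starting position (5).
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(body), len(data))
        data += body
    base = 24 + len(directory) + 1  # leader + directory + its terminator
    length = base + len(data) + 1   # plus the record terminator
    # Leader: record length, fixed example codes, base address of data.
    leader = b"%05dnam a22%05d a 4500" % (length, base)
    return leader + directory + FT + data + RT


def parse_record(raw):
    """Read the data fields back using the leader and the directory."""
    base = int(raw[12:17])          # base address of data, from the leader
    directory = raw[24:base - 1]    # without the directory's terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length, start = int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]  # drop terminator
        ind1, ind2 = chr(body[0]), chr(body[1])
        subfields = [
            (chunk[:1].decode("utf-8"), chunk[1:].decode("utf-8"))
            for chunk in body[2:].split(SF)[1:]
        ]
        fields.append((tag, ind1, ind2, subfields))
    return fields


fields = [
    ("100", "1", " ", [("a", "Sandburg, Carl,"), ("d", "1878-1967.")]),
    ("245", "1", "0", [("a", "Arithmetic /"), ("c", "Carl Sandburg.")]),
]
raw = build_record(fields)
print(raw[:24])                     # the leader
print(parse_record(raw) == fields)  # round trip through the structure
```

Lengths and positions are counted in bytes, which is why the sketch works on bytes rather than on text strings.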
MARC supporting documentation • Character sets: MARC-8 (8-bit encoding); UCS/Unicode UTF-8 (variable-width encoding, 1 to 4 bytes per character); 15,000+ characters; Latin, Cyrillic, Hebrew, Arabic, CJK • Code lists: countries, geographical, languages, sources, relators http://www.loc.gov/marc/specifications/
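As a quick illustration of how UTF-8 represents the scripts listed above, this Python sketch reports how many bytes one sample character from each script occupies. The sample characters and the function name utf8_length are arbitrary choices made for the example.

```python
def utf8_length(char):
    """Return the number of bytes a single character takes in UTF-8."""
    return len(char.encode("utf-8"))


# One arbitrary sample character per script.
samples = [
    ("Basic Latin", "A"),
    ("Latin with diacritic", "é"),
    ("Cyrillic", "Ж"),
    ("Hebrew", "א"),
    ("Arabic", "ع"),
    ("CJK", "中"),
]

for script, char in samples:
    print("%-22s U+%04X  %d byte(s)" % (script, ord(char), utf8_length(char)))
```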
We’ve got standards! • Standards are a set of rules and guidelines that provide a common framework • They aid interoperability • Standards bodies and initiatives: ISO (International Organization for Standardization); NISO; W3C; DLF (Digital Library Federation): METS (Metadata Encoding and Transmission Standard); OAI (Open Archives Initiative) • ISBN, ISSN, ISMN (music) are ISO standards • ISO/IEC 11179 – IT Metadata Registries (MDR) • ISO 2709 – Format for information exchange