Understanding Digital Encoding and Metadata in Text Resources
Explore ontology, mark-up languages, TEI, Dublin Core, EAD, MARC, and more in this comprehensive guide to structuring and managing data effectively. Learn about different encoding standards and how to create custom schemas.
Digital Encoding • What’s behind E-text Resources?
Before Metadata... • How to structure your data • Ontology: a specification of a conceptualization of reality • http://www.ontoknowledge.org/oil/ • OWL (Web Ontology Language) • http://www.w3.org/TR/owl-features/
Mark-up Languages • GML: Generalized Markup Language • IBM (1969-1980) • SGML: Standard Generalized Markup Language • American National Standards Institute (ANSI) (1980- ) • XML: Extensible Markup Language • W3C (1996- )
What is Metadata? • http://www.vraweb.org/metadata.html
Examples of Metadata schema • TEI (Text Encoding Initiative) • Dublin Core • EAD (Encoded Archival Description) • MARC (MAchine-Readable Cataloging) • RDF (Resource Description Framework)
Variations of TEI • TEI: Full DTD, uses SGML (437 elements) • TEI Lite: Subset of the TEI DTD, uses SGML (140 elements) • Bare Bones TEI: An even smaller subset of TEI developed in 1994 primarily as a learning tool, uses SGML (33 elements) • TEI xLite: TEI Lite DTD, rewritten in XML
TEI in the Metadata World • TEI and its variations have become some of the most widely used encoding standards for texts in the humanities. • While the full original form of TEI is perhaps too elaborate for many projects, its abridged versions, such as TEI xLite, have been used on many successful projects. • http://www.tei-c.org/Applications/index-lang.html
<tei.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A Sample of TEI</title></titleStmt>
      <publicationStmt><p>An unpublished document.</p></publicationStmt>
      <sourceDesc><p>Document created in electronic form.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body>
    <p>This is a TEI encoded document!</p>
  </body></text>
</tei.2>
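The TEI sample above is well-formed XML, so a general-purpose XML parser can read it. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the function name `summarize_tei` is hypothetical and not part of any TEI tool. It only shows that the header and the text body can be addressed separately; it does not validate the document against a TEI DTD.

```python
import xml.etree.ElementTree as ET

# The sample document from the slide, reproduced as a string.
TEI_SAMPLE = """<tei.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A Sample of TEI</title></titleStmt>
      <publicationStmt><p>An unpublished document.</p></publicationStmt>
      <sourceDesc><p>Document created in electronic form.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body>
    <p>This is a TEI encoded document!</p>
  </body></text>
</tei.2>"""


def summarize_tei(xml_text):
    """Return the header title and the body paragraphs (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # The title lives in the metadata header, not in the text itself.
    title = root.findtext("teiHeader/fileDesc/titleStmt/title")
    # The encoded text lives under <text><body>.
    paragraphs = [p.text for p in root.findall("text/body/p")]
    return {"title": title, "paragraphs": paragraphs}


summary = summarize_tei(TEI_SAMPLE)
print(summary["title"])
print(summary["paragraphs"])
```

Keeping the header lookup and the body lookup on separate paths mirrors the TEI split between metadata about the file and the encoded text.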
Dublin Core • Dublin Core: all elements are optional. • The base set of DC elements = 15 elements • title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights • DC-Qualified: • http://dublincore.org/documents/usageguide/qualifiers.shtml • http://www.dstc.edu.au/Research/Projects/rdf/dc-in-rdf-ex.html • http://www.ukoln.ac.uk/metadata/dcdot/
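Because every Dublin Core element is optional, a record can carry any subset of the 15 base elements. The sketch below, assuming Python's standard `xml.etree.ElementTree`, builds such a record. The `metadata` wrapper element and the helper name `build_dc_record` are assumptions made for illustration; Dublin Core itself does not prescribe a wrapper.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 base elements listed on the slide.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_dc_record(fields):
    """Build a record from a dict of element name -> value (hypothetical helper)."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not base Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("metadata")  # assumed wrapper element, not part of DC
    for name in DC_ELEMENTS:
        if name in fields:  # all elements are optional, so skip missing ones
            child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            child.text = fields[name]
    return record


record = build_dc_record(
    {"title": "A Sample of TEI", "format": "text/xml", "language": "en"}
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

An empty dictionary is also accepted here, which reflects the "all elements are optional" rule rather than any recommendation to publish empty records.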
EAD • Developed in the mid-1990s, with version 1.0 of the DTD released in 1998 • Version 2 (called EAD 2002) • 146 elements defined in EAD 2002 DTD • http://www.loc.gov/ead/tglib/element_index.html
MARC • MARC21 • http://www.loc.gov/marc/ • MARC21XML • http://www.loc.gov/standards/marcxml/ • MODS (Metadata Object Description Schema) • http://www.loc.gov/standards/mods/
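To give a feel for MARC in its XML form, here is a small sketch, assuming Python's standard `xml.etree.ElementTree`. The record is a simplified, made-up fragment (it omits the leader and control fields and has not been validated against the MARC XML schema), and `get_subfield` is a hypothetical helper. Field 245 subfield `a` is the title and field 100 subfield `a` is the main personal name.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Library of Congress MARC XML schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Simplified, invented record for illustration only.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">A sample title /</subfield>
    <subfield code="c">Jane Doe.</subfield>
  </datafield>
</record>"""


def get_subfield(record, tag, code):
    """Return the first matching subfield value, or None (hypothetical helper)."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return record.findtext(path, namespaces=MARC_NS)


record = ET.fromstring(SAMPLE_RECORD)
print(get_subfield(record, "245", "a"))
print(get_subfield(record, "100", "a"))
```

The numeric tags and single-letter subfield codes are what distinguish MARC from the named elements of Dublin Core or MODS.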
Can’t find a schema to fit your project? • Make your own! • Custom making a DTD (Document Type Definition) • http://www.w3schools.com/dtd/default.asp • http://webapp1.dlib.indiana.edu/letopis/search.jsp?lang=en
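As a rough illustration of what a custom DTD looks like, the sketch below embeds a small, invented DTD for a `letter` document as an internal subset and parses the document with Python's standard `xml.etree.ElementTree`. All element names here are hypothetical. Note the assumption: the standard-library parser reads the DTD but does not validate against it, so the order check at the end is a hand-written stand-in, not real DTD validation. A validating parser would be needed for that.

```python
import xml.etree.ElementTree as ET

# A document with an internal DTD subset; every name is invented.
LETTER_DOC = """<!DOCTYPE letter [
  <!ELEMENT letter (sender, recipient, body)>
  <!ELEMENT sender (#PCDATA)>
  <!ELEMENT recipient (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST letter date CDATA #REQUIRED>
]>
<letter date="1994-05-01">
  <sender>A. Author</sender>
  <recipient>B. Reader</recipient>
  <body>Hello from a custom schema.</body>
</letter>"""

# The content model declared for <letter> above, restated by hand.
EXPECTED_CHILDREN = ["sender", "recipient", "body"]


def follows_letter_model(root):
    """Hand-written check mirroring the DTD; not a DTD validator."""
    has_date = "date" in root.attrib  # declared #REQUIRED in the DTD
    child_tags = [child.tag for child in root]
    return has_date and child_tags == EXPECTED_CHILDREN


root = ET.fromstring(LETTER_DOC)
print(root.tag, root.get("date"))
print(follows_letter_model(root))
```

The `<!ELEMENT letter (sender, recipient, body)>` line is the core of the schema: it fixes which child elements a letter contains and in what order.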
The End! Further Reading