Not just numbers on shelves: using the DDC for information retrieval. Gordon Dunsire. Presented at the Symposium “Bridging the class(ification) divide: the new DDC languages and retrieval possibilities”, 27 April 2010, Bibliotheca Alexandrina, Alexandria, Egypt.
Not just numbers on shelves: using the DDC for information retrieval • Gordon Dunsire • Presented at the Symposium “Bridging the class(ification) divide: the new DDC languages and retrieval possibilities”, 27 April 2010, Bibliotheca Alexandrina, Alexandria, Egypt
Overview • “Traditional” uses of the DDC • Machine-readability opens up possibilities for subject-based information retrieval • Hierarchical and linear browse • Keyword search • Terminology services (hub-spoke) • Multilingual retrieval • Semantic web • EDUG IT survey
Traditional use of the DDC • Shelfmarking • Shelf location in a linear sequence • Notation can be fitted to a (book) spine • Subject grouping • Notation brings similar topics together and keeps separate topics apart • Collection analysis by subject or discipline • Management information by subject • Loans, acquisitions, etc.
Digital environment • Notation <> Captions • Notation in catalogue record can be (automatically) matched to human-friendly caption(s) • Opposite of classification process, where caption is matched to notation • Sometimes via Relative Index • Length of caption not a limiting factor • Length of notation also not limiting • No need to truncate notation • Notation/caption changes (legacy) more easily managed
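The notation-to-caption matching described on this slide can be sketched in a few lines. This is a minimal illustration, not the method of any particular catalogue system: the `CAPTIONS` table holds only a handful of sample entries, and the function name `caption_for` and its fallback to the nearest broader class are assumptions made for the example.

```python
from typing import Optional

# Sample captions only (assumption); a real system would load the full schedules.
CAPTIONS = {
    "300": "Social sciences",
    "320": "Political science",
    "321": "Systems of governments and states",
    "510": "Mathematics",
}


def to_key(prefix: str) -> str:
    """Turn a digit prefix into display notation: '32' -> '320', '32104' -> '321.04'."""
    padded = prefix.ljust(3, "0")
    return padded if len(padded) == 3 else padded[:3] + "." + padded[3:]


def caption_for(notation: str) -> Optional[str]:
    """Match a notation to a caption, falling back to the nearest broader class."""
    digits = notation.replace(".", "")
    # Try the full notation first, then drop one digit at a time.
    for length in range(len(digits), 0, -1):
        key = to_key(digits[:length])
        if key in CAPTIONS:
            return CAPTIONS[key]
    return None


print(caption_for("321.04"))
```

Because the lookup works on the whole notation, nothing has to be truncated to fit a spine label, and a changed caption is handled by updating one table entry.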
Information retrieval • Notation hierarchy can be used to display caption hierarchy • Built notation (i.e. with added subdivisions) can be parsed to identify facet captions • e.g. place, time • Keywords can be found inside captions • Notation can be linked to caption variants • Translations of the DDC • “Captions” or subject headings outside of the schedules
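As a rough sketch of the first point, a caption hierarchy can be derived from the notation alone by taking successively longer prefixes. The caption table is a small sample and the function names are hypothetical. Parsing built notation into facets is not attempted here, since that needs the DDC tables and number-building rules.

```python
# Sample captions only (assumption); keys are DDC notations.
CAPTIONS = {
    "300": "Social sciences",
    "320": "Political science",
    "321": "Systems of governments and states",
}


def broader_chain(notation):
    """List the notation and all broader notations, broadest first."""
    digits = notation.replace(".", "")
    chain = []
    for length in range(1, len(digits) + 1):
        padded = digits[:length].ljust(3, "0")
        # Re-insert the decimal point after the third digit.
        key = padded if len(padded) == 3 else padded[:3] + "." + padded[3:]
        chain.append(key)
    return chain


def caption_hierarchy(notation):
    """Return (notation, caption) pairs for every level that has a known caption."""
    return [(n, CAPTIONS[n]) for n in broader_chain(notation) if n in CAPTIONS]


for n, caption in caption_hierarchy("321.04"):
    print(n, caption)
```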
Linear browse • Captions listed in alphabetical order • With or without Relative Index • Already in alphabetical order • Possibility of keyword-in-context (KWIC) or keyword-out-of-context (KWOC) indexes • Each significant word in caption rotated to the front (or extracted) and interfiled in alphabetical order • Possibility of integration with subject headings • Or substitute for headings
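The rotation and extraction described above can be sketched as follows. This is a simplified illustration: the stopword list, the " / " separator marking the rotation point, and the function names are assumptions, and a production KWIC index would normally also align the keyword in a fixed column.

```python
# Assumed stopword list; words in it are not treated as significant.
STOPWORDS = {"of", "and", "the", "in", "for", "through"}


def significant_words(caption):
    """Yield (position, normalized word) for each significant word in a caption."""
    for i, word in enumerate(caption.split()):
        normalized = word.strip(",.;:").lower()
        if normalized and normalized not in STOPWORDS:
            yield i, normalized


def kwic(captions):
    """Rotate each significant word to the front and interfile alphabetically."""
    entries = []
    for notation, caption in captions.items():
        words = caption.split()
        for i, _ in significant_words(caption):
            rotated = caption if i == 0 else " ".join(words[i:] + ["/"] + words[:i])
            entries.append((rotated, notation))
    return sorted(entries, key=lambda entry: entry[0].lower())


def kwoc(captions):
    """Extract each significant word as a heading in front of the full caption."""
    entries = [
        (keyword, caption, notation)
        for notation, caption in captions.items()
        for _, keyword in significant_words(caption)
    ]
    return sorted(entries)


sample = {"320": "Political science", "321": "Systems of governments and states"}
for rotated, notation in kwic(sample):
    print(f"{rotated}  [{notation}]")
```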
Hierarchical browse • Captions and/or notations exposed at one “level” only • Controlled by numeric notation • First digit = level 1; first 2 digits = level 2, etc. • Decimal notation so maximum of 10 topics at each level • User drills down in hierarchical order from the top (broadest topic) • Or drills up from specific to general • Levels can be expressed as tag clouds • Topics weighted by notation (3xx, 32x, 321 ...)
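A minimal sketch of drill-down and drill-up, assuming topics are keyed by their leading digits so that the key length is the level. The table is a small sample and the function names are hypothetical.

```python
# Sample topics keyed by leading digits (assumption): "3" is level 1, "32" level 2.
TOPICS = {
    "3": "Social sciences",
    "32": "Political science",
    "33": "Economics",
    "34": "Law",
    "321": "Systems of governments and states",
    "323": "Civil and political rights",
    "324": "The political process",
}


def drill_down(prefix):
    """Return the topics one level below prefix (at most ten, one per digit)."""
    return [prefix + d for d in "0123456789" if prefix + d in TOPICS]


def drill_up(prefix):
    """Return the next broader topic, or None at the top level."""
    return prefix[:-1] if len(prefix) > 1 else None


def display(prefix):
    """Show a prefix as a three-digit notation with its caption."""
    return f"{prefix.ljust(3, '0')} {TOPICS[prefix]}"


for key in drill_down("32"):
    print(display(key))
```

Only one level is exposed at a time, so the user never sees more than ten choices per step.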
Keyword retrieval • Captions included in: • DDC keyword index • Subject keyword index • e.g. with subject headings • General keyword index • e.g. with titles, notes, etc. • DDC caption terminology distinct from other major subject heading schemes • Alternative terms (and spellings) • DDC caption: “Acquisition through exchange, gift, deposit” • LCSH: “Book donations” [neither term in Relative Index]
Terminology services (1) • Captions, headings, terms from any scheme can be “classified” by DDC • i.e. assigned a DDC notation • Notation becomes a bridge or link between headings from different schemes • Hub-and-spoke, with DDC as the hub and each different scheme as a spoke • More efficient than one-to-one mappings between headings • Combinatorial explosion • 3 schemes > 3 mappings • 4 schemes > 6 mappings ...
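The mapping counts on this slide follow from counting pairs: n schemes mapped directly to one another need n(n-1)/2 mappings, whereas with DDC as the hub each scheme needs only one mapping, to the hub. A short sketch, with hypothetical function names:

```python
def pairwise_mappings(schemes: int) -> int:
    """Mappings needed when every scheme is mapped directly to every other."""
    return schemes * (schemes - 1) // 2


def hub_mappings(schemes: int) -> int:
    """Mappings needed when each scheme is mapped once, to the DDC hub."""
    return schemes


for n in (3, 4, 10):
    print(n, pairwise_mappings(n), hub_mappings(n))
```

The two approaches cost the same at three schemes; beyond that, the pairwise count grows quadratically while the hub count grows linearly.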
Terminology services (2) • Hub (i.e. DDC notation) is transparent to user • Term A > DDC notation < Term B • Term A <> Term B • Approach used by High-Level Thesaurus (HILT) project • Successful, but scalability an issue • Even though more efficient than the term-to-term approach • Scalability might be more achievable in a distributed environment • i.e. Semantic Web
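The switching pattern "Term A > DDC notation < Term B" can be sketched as two dictionary lookups. The scheme names and terms below are invented for illustration and are not HILT data or the HILT interface.

```python
# Hypothetical spokes: each scheme maps its own terms to a DDC notation (the hub).
SPOKES = {
    "scheme_a": {"Mathematics": "510", "Economics": "330"},
    "scheme_b": {"Maths": "510", "Economic theory": "330"},
}


def switch_term(term, source, target):
    """Return the target-scheme terms that share a DDC notation with the source term."""
    notation = SPOKES[source].get(term)  # Term A > DDC notation
    if notation is None:
        return []
    # DDC notation < Term B: the user sees only the resulting terms, never the hub.
    return sorted(t for t, n in SPOKES[target].items() if n == notation)


print(switch_term("Mathematics", "scheme_a", "scheme_b"))
```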
Translations • Caption to caption translation • English caption <> Arabic caption • But notation is common, and language-free • Non-English translation is similar to non-DDC topic/subject heading scheme • Intrinsic hub-spoke architecture • Arabic caption <> English caption (= notation) <> German caption • Arabic caption <> German caption • Translations can be automatically switched • “Instance” notation remains the same
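Because the notation stored in a record is language-free, switching the display language is a lookup rather than a re-cataloguing step. The sketch below assumes a small caption table keyed by notation and language code. The German and Arabic strings are plain illustrative translations, not quotations from the official DDC translations, and the record layout is hypothetical.

```python
# Illustrative captions (assumption), keyed by notation, then language code.
CAPTIONS = {
    "510": {"en": "Mathematics", "de": "Mathematik", "ar": "الرياضيات"},
    "330": {"en": "Economics", "de": "Wirtschaft", "ar": "الاقتصاد"},
}


def display_subject(record, language):
    """Render a record's subject in the requested language; the record is unchanged."""
    return CAPTIONS[record["ddc"]][language]


def translate_caption(caption, source, target):
    """Caption-to-caption translation, routed through the shared notation."""
    for by_language in CAPTIONS.values():
        if by_language.get(source) == caption:
            return by_language[target]
    return None


record = {"title": "An example title", "ddc": "510"}  # hypothetical record
print(display_subject(record, "de"))
print(translate_caption("الرياضيات", "ar", "de"))
```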
DDC and the Semantic Web • OCLC is developing a representation of the DDC in Resource Description Framework (RDF) • The basis of the Semantic Web • http://dewey.info • Includes notations, captions, notes, and legacy (audited changes) • Only DDC Summaries available so far • 11 languages including English • Can be added to the linked-data “soup” • Distributed processing, development and services
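To give a feel for what a class with a notation and multilingual captions looks like as RDF, the sketch below writes N-Triples by hand using the W3C SKOS vocabulary. The base URI is a placeholder, not the URI pattern used by dewey.info, and the choice of SKOS properties here is an assumption for illustration rather than a description of OCLC's actual data model.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/ddc/class/"  # placeholder, not dewey.info's URI pattern


def escape(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def class_triples(notation, captions, broader=None):
    """Return N-Triples lines for one class: notation, captions, broader class."""
    subject = f"<{BASE}{notation}>"
    lines = [f'{subject} <{SKOS}notation> "{escape(notation)}" .']
    for language, caption in sorted(captions.items()):
        # One language-tagged label per translation.
        lines.append(f'{subject} <{SKOS}prefLabel> "{escape(caption)}"@{language} .')
    if broader is not None:
        lines.append(f"{subject} <{SKOS}broader> <{BASE}{broader}> .")
    return lines


for line in class_triples("510", {"en": "Mathematics", "de": "Mathematik"}, broader="500"):
    print(line)
```

Each line is an independent statement, which is what allows such data to be merged with triples published elsewhere.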
Thank you • G.dunsire@strath.ac.uk • EDUG IT (links to applications) • http://www.slainte.org.uk/edugit/ • Dewey.info (DDC in RDF) • http://dewey.info/