Enhancing Cataloging with FRBR: Implementation in Large Databases
This presentation from OCLC Research explores implementing the Functional Requirements for Bibliographic Records (FRBR) model on large databases. FRBR clusters bibliographic items into a four-level structure (Work, Expression, Manifestation, and Item), with the potential to improve cataloging, discovery, and delivery. Drawing on an analysis of the 47 million bibliographic records in WorldCat, it discusses challenges such as records that lack an author main entry or uniform title, and presents an algorithm for clustering records into works. The aim is to bring versions of works together and let users navigate to their level of interest.
Presentation Transcript
Implementing FRBR on Large Databases Thomas Hickey Diane Vizine-Goetz OCLC Research
What is FRBR • IFLA study group report: Functional Requirements for Bibliographic Records • Bibliographic model independent of cataloging rules • Clusters bibliographic items into a four-level structure • Work • Expression • Manifestation • Item
Control of Entities in FRBR (diagram) • Entities: Work, Expression, Manifestation, Item; Person, Corporate Body; Concept, Object, Event, Place • Surrogates: Uniform titles, Citations, Names, Subjects
Why FRBR? • Potential to improve: • Cataloging • Discovery • Delivery • By • Bringing versions of works together • Showing relationships of various kinds • Enabling users to navigate to level of interest
Research on FRBR & WorldCat • Subsets • By library, region • Example/problem sets • Shakespeare, the Bible • Humphry Clinker • 1,000 random works • By genre • Dissertations • Fiction • Whole file, 47 million bibliographic records
Our Approach • Concentrating on work-level • Problems with expression-level clusters • Efficient, maintainable, understandable • Few, if any, false matches with correct cataloging • Err on the side of missed matches • Some accommodation of frequent variants • Compare with manually clustered
The Algorithm • A key is generated for each record • Extract author, title • Look up in NACO authority file • Added entry information as needed • Form a key from bibliographic record • Author, title, added entry information • These can be sorted, compared
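The key-generation steps on this slide can be sketched roughly as follows. This is a minimal illustration, not OCLC's code: the dict-shaped record, the `normalize` function (a crude stand-in for NACO normalization), the optional `authority` lookup table, and the function names are all assumptions made for the example. Only the key layout (author subfields joined by a backslash, then a slash, then the title) is taken from the keys shown on the later slides.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Rough stand-in for NACO normalization (an assumption, not the real rules)."""
    # Decompose accented letters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and delete apostrophes so "Alice's" becomes "alices".
    text = text.lower().replace("'", "").replace("\u2019", "")
    # Turn other punctuation (except commas) into spaces, then squeeze whitespace.
    text = re.sub(r"[^a-z0-9, ]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def make_key(record, authority=None):
    """Build an author/title key from a dict-shaped record (hypothetical layout)."""
    # "author" is assumed to be a list of subfields, e.g. name and dates.
    author = "\\".join(normalize(part) for part in record.get("author", []))
    if authority:
        # Map a variant name to its established heading when one is known.
        author = authority.get(author, author)
    # Prefer the uniform title when the record has one.
    title = normalize(record.get("uniform_title") or record.get("title", ""))
    return f"{author}/{title}" if author else title


def cluster(records, authority=None):
    """Group record ids by key; records with identical keys form one work cluster."""
    clusters = defaultdict(list)
    for record in records:
        clusters[make_key(record, authority)].append(record["id"])
    return dict(clusters)
```

Because the keys are plain strings, they can be sorted and compared directly, so the same grouping could also be done by sorting a key file and collecting adjacent duplicates, which may suit a whole-file run better than an in-memory dictionary.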
Problems • Many (17%) records do not have • Author main entry • Uniform title • In general these cannot be matched • Look at added entries • Information at the expression and manifestation levels • Handled separately • 180,000 clusters involving ~400,000 records
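One way to picture the fallback for records without an author main entry or uniform title is sketched below. The slide only says that added entries are consulted; the record layout, the `simplify` helper, the key format, and the decision to return `None` for unmatched records are assumptions made for illustration, chosen to follow the stated preference for missed matches over false ones.

```python
import re


def simplify(text):
    """Lowercase and strip punctuation; a crude placeholder for real normalization."""
    return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9 ]", " ", text.lower())).strip()


def key_or_none(record):
    """Return a work key, or None when the record should stay unclustered."""
    title = simplify(record.get("uniform_title") or record.get("title", ""))
    author = simplify(record.get("author", ""))
    if author:
        return f"{author}/{title}"
    if record.get("uniform_title"):
        return title
    # No author main entry and no uniform title: fall back on added entries.
    added = sorted(simplify(name) for name in record.get("added_entries", []))
    if added and title:
        # Hypothetical format: title followed by the sorted added-entry names.
        return title + "/" + "/".join(added)
    # Nothing reliable to match on, so prefer a missed match to a false one.
    return None
```

Sorting the added entries keeps the key stable when two records list the same names in a different order.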
Top 10 WorldCat Clusters
# Recs  Author/Title Key
8,383   bible\n t
8,055   bible
6,174   bible\authorized
4,033   bible\o t\psalms
3,964   haggadah
3,477   great britain/treaties etc
2,402   bible\o t
2,248   koran
2,153   arabian nights
Top 10 from a Public Library
# Recs  Author/Title Key
89      bible\authorized
85      mother goose
84      chopin, frederic\1810 1849/piano music
81      schulz, charles m/peanuts
63      davis, jim/garfield
61      moore, clement clarke\1779 1863/night before christmas
60      mozart, wolfgang amadeus\1756 1791/instrumental music
58      bach, johann sebastian\1685 1750/cantatas
57      beethoven, ludwig van\1770 1827/sonatas
56      twain, mark\1835 1910/adventures of huckleberry finn
Results • Manual estimate: 1.5 manifestations/work in WorldCat • Algorithm: ~1.3 • 25,844 clusters have 20 or more records • 401,659 clusters have 5 or more records
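Figures like these can be derived from a list of per-record keys. The sketch below is only an illustration of that arithmetic: the function name, the input shape (one key per record), and the default thresholds of 5 and 20 (chosen to mirror the slide) are assumptions.

```python
from collections import Counter


def cluster_stats(keys, thresholds=(5, 20)):
    """Return (mean records per cluster, {threshold: clusters at or above it})."""
    keys = list(keys)
    # Each distinct key is one cluster; its count is the cluster size.
    sizes = Counter(keys)
    if not sizes:
        return 0.0, {t: 0 for t in thresholds}
    mean = len(keys) / len(sizes)
    at_least = {t: sum(1 for n in sizes.values() if n >= t) for t in thresholds}
    return mean, at_least
```

The mean is simply total records divided by total clusters. Applied to the fiction subset figures reported later in the deck, 2,665,662 records in 1,758,479 clusters gives about 1.5 records per cluster.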
Preliminary Plans • Build structures for FRBR into new catalog • Expose FRBR clustering for searching • Make visible in cataloging • As consensus on implementation is developed • As cataloging rules accommodate FRBR
Spin-offs • NACO normalization code • Testbed • Server • Authority work • ePrints UK • FRBR in other projects • FictionFinder • NDLTD union catalog
Fiction Subset • 2,665,662 WorldCat records • 1,758,479 work clusters • 1.5 records/cluster • 3,866 clusters have 20 or more records • 50,540 clusters have 5 or more records
Top 10 clusters for fiction
# Recs  Author/Title Key
1,296   defoe, daniel\1661 1731/robinson crusoe
1,248   carroll, lewis\1832 1898/alices adventures in wonderland
971     cervantes saavedra, miguel de\1547 1616/don quixote
828     stevenson, robert louis\1850 1894/treasure island
689     twain, mark\1835 1910/adventures of huckleberry finn
624     twain, mark\1835 1910/adventures of tom sawyer
618     swift, jonathan\1667 1745/gullivers travels
600     andersen, h c\hans christian\1805 1875/tales
581     stowe, harriet beecher\1811 1896/uncle toms cabin
570     arabian nights
FictionFinder • Employs work clusters in a prototype system for searching and browsing bibliographic records for fiction • Indexes records at the work level and organizes displays by work and expression (primarily language) • Includes records for textual items; additional modes of expression (moving image, sound) to be added later
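A display organized by work and then by expression, with language as the main expression-level distinction, could be modeled as a two-level grouping. The sketch below is an assumption-laden illustration, not FictionFinder's design: the record fields (`id`, `work_key`, `language`) and the `"und"` placeholder for an unknown language are invented for the example.

```python
from collections import defaultdict


def group_for_display(records):
    """Nest record ids as work key -> language -> list of ids."""
    tree = defaultdict(lambda: defaultdict(list))
    for record in records:
        # Treat a missing or empty language code as undetermined.
        language = record.get("language") or "und"
        tree[record["work_key"]][language].append(record["id"])
    # Convert to plain dicts so the result is easy to print or serialize.
    return {work: dict(languages) for work, languages in tree.items()}
```

A search hit on any record in a cluster can then be shown once at the work level, with the language groups listed beneath it.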
395 records for author “crichton, michael\1942” clustered into 17 entries
Benefits • Aggregated displays for works and expressions • Enhancement of (fiction) records at work level • with elements from records within the work cluster (e.g., summaries, genre terms, subject headings, class numbers) • with external data (e.g., literary prizes, prequels/sequels, evaluative content)
Challenges • Identifying appropriate bibliographic data for systematically grouping or differentiating works and expressions • Works • Genre (graphic novel vs. novel) • Genre + mode of expression (audio book vs. radio play) • Degree of modification (abridgement of juvenile work vs. an adaptation for young children) • Expressions • Translators, illustrators, editors
Next Steps • FRBR algorithm • Explore applications • Refine algorithm as needed • FictionFinder • Add records for sound and image • Conduct user studies
Links • Functional Requirements for Bibliographic Records - Final Report • http://www.ifla.org/VII/s13/frbr/frbr.htm • Experiments with the IFLA Functional Requirements for Bibliographic Records (FRBR) • http://www.dlib.org/dlib/september02/hickey/09hickey.html • OCLC Research Activities and IFLA's Functional Requirements for Bibliographic Records • http://www.oclc.org/research/projects/frbr/index.shtm • Implementing FRBR on Large Databases • http://staff.oclc.org/~vizine/CNI/OCLCFRBR.htm