
Controlled Vocabularies: Name Authority Control

Controlled Vocabularies: Name Authority Control. University of California, Berkeley School of Information Management and Systems SIMS 202: Information Organization and Retrieval. Review. Dublin Core Other Metadata Systems Cognitive basis of categorization and subject classification.


Presentation Transcript


  1. Controlled Vocabularies: Name Authority Control • University of California, Berkeley • School of Information Management and Systems • SIMS 202: Information Organization and Retrieval

  2. Review • Dublin Core • Other Metadata Systems • Cognitive basis of categorization and subject classification

  3. Dublin Core Elements • Title • Creator • Subject • Description • Publisher • Other Contributors • Date • Resource Type • Format • Resource Identifier • Source • Language • Relation • Coverage • Rights Management
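A description built from these elements can be pictured as a mapping from element name to a list of values. The following is only a minimal sketch in Python, not an official Dublin Core encoding: the helper `make_record` is hypothetical, the sample values are borrowed from the authority records later in this transcript, and treating every element as optional and repeatable is an assumption of the sketch.

```python
# The fifteen element names listed on the slide.
DC_ELEMENTS = (
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Other Contributors", "Date", "Resource Type", "Format",
    "Resource Identifier", "Source", "Language", "Relation",
    "Coverage", "Rights Management",
)


def make_record(values):
    """Build a record from {element: value or list of values} (hypothetical helper)."""
    record = {}
    for element, content in values.items():
        if element not in DC_ELEMENTS:
            raise KeyError(f"not a Dublin Core element: {element}")
        # Store every value as a list so that an element can be repeated.
        if isinstance(content, (list, tuple)):
            record[element] = list(content)
        else:
            record[element] = [content]
    return record


record = make_record({
    "Title": "Gideon's Day",
    "Creator": "Marric, J. J.",
    "Date": "1955",
})
```

The element names are checked, but nothing constrains the values: "Marric, J. J." and "Creasey, John" would both be accepted as Creator for the same person.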

  4. Issues in Dublin Core • Lack of guidance on what to put into each element • How to structure or organize at the element level? • How to ensure consistency across descriptions for the same persons, places, things, etc.?

  5. More Metadata Systems • The following is a sample of metadata systems for a variety of special types of data, documents, and objects.

  6. Types of Metadata Systems and Standards • Naming and ID systems – URLs, ISBNs • Bibliographic description – MARC, Dublin Core, TEI, etc. • Music – SMDL • Images and objects – CIMI, VRA Core Categories • Numeric data – DDI, SDSM • Geospatial data – FGDC • Collections – EAD

  7. Metadata Resources • Check the Links section from the class home page • The best site is the “Digital Library: Metadata Resources” page from IFLA at http://www.ifla.org/ifla/II/metadata.htm

  8. Hierarchical vs. Faceted (Subject Heading vs. Descriptor) Category Systems

  9. Controlled Vocabulary (the following slides follow Bates 88)
     • Start with the text of the document
     • Attempt to “control” or regularize:
       • The concepts expressed within
         • mutually exclusive
         • exhaustive
       • The language used to express those concepts
         • limit the normal linguistic variations
         • regulate word order and structure of phrases
         • reduce the number of synonyms or near-synonyms
     • Also, provide cross-references between concepts and their expression.

  10. Classification Schemes • Classify possible concepts. • Goals: • Completely distinct conceptual categories (mutually exclusive) • Complete coverage of conceptual categories (exhaustive)

  11. Assigning Headings vs. Descriptors • Subject headings: assign one (or a few) complex heading(s) to the document • Descriptors: mix and match • How would we describe recipes using each technique?

  12. Subject Heading vs. Descriptor
      WILSONLINE (subject headings):
      • Athletes
      • Athletes -- Health & Hygiene
      • Athletes -- Nutrition
      • Athletes -- Physical Exams
      • …
      • Athletics
      • Athletics -- Administration
      • Athletics -- Equipment -- Catalogs
      • …
      • Sports -- Accidents and injuries
      • Sports -- Accidents and injuries -- prevention
      ERIC (descriptors):
      • Athletes
      • Athletic Coaches
      • Athletic Equipment
      • Athletic Fields
      • Athletics
      • …
      • Sports psychology
      • Sportsmanship

  13. Subject Headings vs. Descriptors
      Subject headings:
      • Describe the contents of an entire document
      • Designed to be looked up in an alphabetical index
      • Look up document under its heading
      • Few (1-5) headings per document
      Descriptors:
      • Describe one concept within a document
      • Designed to be used in Boolean searching
      • Combine to describe the desired document
      • Many (5-25) descriptors per document
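The contrast between looking a document up under one complex heading and combining descriptors in a Boolean search can be sketched as follows. This is an illustrative Python sketch only: the document identifiers and their assignments are hypothetical, while the heading and descriptor strings are taken from the WILSONLINE and ERIC lists shown earlier.

```python
# Subject-heading style: each document is filed under a few complex headings,
# and the searcher looks up one heading in an alphabetical index.
heading_index = {
    "Athletes -- Nutrition": ["doc1"],
    "Sports -- Accidents and injuries -- prevention": ["doc2"],
}

# Descriptor style: each document carries many single-concept terms,
# and the searcher combines terms at search time.
descriptor_index = {
    "doc1": {"Athletes", "Athletic Equipment"},
    "doc2": {"Athletes", "Athletic Coaches", "Sportsmanship"},
    "doc3": {"Athletic Coaches", "Sportsmanship"},
}


def lookup_heading(heading):
    """Return the documents filed under exactly this heading."""
    return heading_index.get(heading, [])


def boolean_and(*descriptors):
    """Return the documents indexed with every one of the given descriptors."""
    wanted = set(descriptors)
    return sorted(doc for doc, terms in descriptor_index.items() if wanted <= terms)
```

For example, `boolean_and("Athletic Coaches", "Sportsmanship")` returns `['doc2', 'doc3']`, a combination that no single pre-built heading in this toy index expresses.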

  14. Hierarchical Classification • Each category is successively broken down into smaller and smaller subdivisions • No item occurs in more than one subdivision • Each level is divided out by a “characteristic of division” (also known as a feature) • Example: distinguish Literature based on: • Language • Genre • Time Period

  15. Hierarchical Classification [Tree diagram: Literature is divided by language (English, French, Spanish, ...); each language is divided by genre (Prose, Poetry, Drama, ...); each genre is divided by period (16th, 17th, 18th, 19th).]

  16. Labeled Categories for Hierarchical Classification
      LITERATURE
      • 100 English Literature
        • 110 English Prose
          • 111 English Prose 16th Century
          • 112 English Prose 17th Century
          • 113 English Prose 18th Century
          • ...
        • 120 English Poetry
          • 121 English Poetry 16th Century
          • 122 English Poetry 17th Century
          • ...
        • 130 English Drama
          • 131 English Drama 16th Century
          • …
      • 200 French Literature
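A hierarchical scheme of this kind can be generated by dividing successively on language, then genre, then period. The sketch below is a hedged Python illustration: the three-digit notation with one digit per level is an assumption made for the example, and `enumerate_classes` is a hypothetical name.

```python
LANGUAGES = ["English", "French", "Spanish"]
GENRES = ["Prose", "Poetry", "Drama"]
PERIODS = ["16th Century", "17th Century", "18th Century", "19th Century"]


def enumerate_classes():
    """Yield (code, label) pairs, dividing by language, then genre, then period."""
    for i, language in enumerate(LANGUAGES, start=1):
        yield f"{i}00", f"{language} Literature"
        for j, genre in enumerate(GENRES, start=1):
            yield f"{i}{j}0", f"{language} {genre}"
            for k, period in enumerate(PERIODS, start=1):
                # Every leaf sits in exactly one place in the tree.
                yield f"{i}{j}{k}", f"{language} {genre} {period}"


classes = dict(enumerate_classes())
```

Because every combination is enumerated in advance, the list grows multiplicatively: three languages, three genres, and four periods already give 48 labeled categories.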

  17. Faceted Classification • Create a separate, free-standing list for each characteristic of division (feature). • Combine features to create a classification.

  18. Faceted Classification along with Labeled Categories
      Facets:
      • A Language: a English, b French, c Spanish
      • B Genre: a Prose, b Poetry, c Drama
      • C Period: a 16th Century, b 17th Century, c 18th Century, d 19th Century
      Combined notations:
      • Aa: English Literature
      • AaBa: English Prose
      • AaBaCa: English Prose 16th Century
      • AbBbCd: French Poetry 19th Century
      • BcCd: Drama 19th Century
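Combining facet values into a notation, and reading a notation back, can be sketched in a few lines. This is a minimal Python sketch under the assumption that a notation is a sequence of two-character pairs (facet letter, then value letter); `encode` and `decode` are hypothetical helper names.

```python
# One free-standing list per characteristic of division.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}


def encode(**chosen):
    """Build a notation from keyword arguments such as genre='Poetry'."""
    notation = ""
    for letter, (name, values) in FACETS.items():
        wanted = chosen.get(name.lower())
        if wanted is None:
            continue  # a facet may simply be left out
        code = next(k for k, v in values.items() if v == wanted)
        notation += letter + code
    return notation


def decode(notation):
    """Turn a notation such as 'AbBbCd' back into a readable label."""
    if len(notation) % 2:
        raise ValueError("notation must consist of facet/value letter pairs")
    labels = []
    for i in range(0, len(notation), 2):
        facet, value = notation[i], notation[i + 1]
        labels.append(FACETS[facet][1][value])
    return " ".join(labels)
```

Here `encode(language="French", genre="Poetry", period="19th Century")` gives `'AbBbCd'`, and `decode("AaBaCa")` gives `'English Prose 16th Century'`. Unlike the enumerated hierarchy, only the ten facet values are stored; combinations are built on demand.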

  19. Important Question: How to use both types of classification structures? • How to look through them? • How to use them in search?

  20. Today • More on Controlled vocabularies • Choice of names • Form of names • Name Authority files • Types of Controlled Vocabularies

  21. Controlled Vocabularies • Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information.

  22. Controlled Vocabularies • Names and name authorities & Other Types of Controlled Vocabulary (Today) • Design of controlled vocabularies for subject access -- Thesaurus design (Thursday)

  23. Names • Cutter’s objectives of bibliographic description: • To enable a person to find a document of which the author is known • To show what the library has by a given author • The first serves access • The second serves collocation

  24. Problems with Names • How many names should be associated with a document? • Which of these should be the “main entry”? • What form should each of the names take? • What references should be made from other possible forms of names that haven’t been used?

  25. The problem • Proliferation of the forms of names • Different names for the same person • Different people with the same names • Examples • from Books in Print (semi-controlled but not consistent) • ERIC author index (not controlled)

  26. Rules for description • AACR II and other sets of descriptive cataloging rules provide guidelines for: • Determining the number of name entries • Choosing a main entry • Deciding on the form of name to be used • Deciding when to make references

  27. Authority control • Authority control is concerned with the creation and maintenance of a set of terms that have been chosen as the standard representatives (also known as established headings) based on some set of rules. • If you have rules, why do you need to keep track of all of the headings? Can’t you just infer the headings from the rules?

  28. Conditions of Authorship? • Single person or single corporate entity • Unknown or anonymous authors • Fictitiously ascribed works • Shared responsibility • Collections or editorially assembled works • Works of mixed responsibility (e.g. translations) • Related Works

  29. Added Entries • Personal names • Collaborators • Editors, compilers, writers • Translators (in some cases) • Illustrators (in some cases) • Other persons associated with the work (such as the honoree in a Festschrift). • Corporate Names • Any prominently named corporate body that has involvement in the work beyond publication, distribution, etc.

  30. Choice of Name • AACR II says that the predominant form of the name used in a particular author’s writings should be chosen as the form of name. • References should be made from the other forms of the name.

  31. Form of the Name • When names appear in multiple forms, one form needs to be chosen. Criteria for choice are: • Fullness (e.g. full names vs. initials only) • Language of the name • Spelling (choose predominant form) • Entry element: • John Smith or Smith, John? • Mao Zedong or Zedong, Mao? (Mao Tse Tung?)
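The entry-element question shows why a mechanical rule is not enough. The Python sketch below is an illustration only and not what AACR II prescribes: `invert_name` is a hypothetical naive rule that moves the last word to the front, and the exception set `FAMILY_NAME_FIRST` is a hypothetical stand-in for a cataloger's decision.

```python
def invert_name(name):
    """Naively move the last word to the front: 'John Smith' -> 'Smith, John'."""
    parts = name.split()
    if len(parts) < 2:
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


# Names already written family name first (hypothetical exception list).
FAMILY_NAME_FIRST = {"Mao Zedong"}


def entry_form(name):
    """Apply the naive rule unless the name is a recorded exception."""
    if name in FAMILY_NAME_FIRST:
        return name
    return invert_name(name)
```

The naive rule alone turns "Mao Zedong" into "Zedong, Mao", which is the wrong entry element; the decision has to be recorded somewhere rather than inferred.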

  32. Name Authority Files: different names for the same person
      ID:NAFL8057230 ST:p EL:n STH:a MS:c UIP:a TD:19910821174242
      KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:05-14-80
      RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-21-91
      Other Versions: earlier
      040    DLC$cDLC$dDLC$dOCoLC
      053    PR6005.R517
      100 10 Creasey, John
      400 10 Cooke, M. E.
      400 10 Cooke, Margaret,$d1908-1973
      400 10 Cooper, Henry St. John,$d1908-1973
      400 00 Credo,$d1908-1973
      400 10 Fecamps, Elise
      400 10 Gill, Patrick,$d1908-1973
      400 10 Hope, Brian,$d1908-1973
      400 10 Hughes, Colin,$d1908-1973
      400 10 Marsden, James
      400 10 Matheson, Rodney
      400 10 Ranger, Ken
      400 20 St. John, Henry,$d1908-1973
      400 10 Wilde, Jimmy
      500 10 $wnnnc$aAshe, Gordon,$d1908-1973

  33. Name Authority Files
      ID:NAFO9114111 ST:p EL:n STH:a MS:n UIP:a TD:19910817053048
      KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:06-03-91
      RFE:a CSC:c SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-19-91
      040    OCoLC$cOCoLC
      100 10 Marric, J. J.,$d1908-1973
      500 10 $wnnnc$aCreasey, John
      663    Works by this author are entered under the name used in the item. For a listing of other names used by this author, search also under$bCreasey, John
      670    OCLC 13441825: His Gideon's day, 1955$b(hdg.: Creasey, John; usage: J.J. Marric)
      670    LC data base, 6/10/91$b(hdg.: Creasey, John; usage: J.J. Marric)
      670    Pseuds. and nicknames dict., c1987$b(Creasey, John, 1908-1973; British author; pseud.: Marric, J. J.)

  34. Name Authority Files: different people writing with the same name
      ID:NAFL8166762 ST:p EL:n STH:a MS:c UIP:a TD:19910604053124
      KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:08-20-81
      RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 06-06-91
      Other Versions: earlier
      040    DLC$cDLC$dDLC$dOCoLC
      100 10 Butler, William Vivian,$d1927-
      400 10 Butler, W. V.$q(William Vivian),$d1927-
      400 10 Marric, J. J.,$d1927-
      670    His The durable desperadoes, 1973.
      670    His The young detective's handbook, c1981:$bt.p. (W.V. Butler)
      670    His Gideon's way, 1986:$bCIP t.p. (William Vivian Butler writing as J.J. Marric)
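How an authority file resolves variant names can be sketched with a small lookup table. The Python below is a simplified illustration, not a MARC parser: it copies the established headings (field 100), a few see-from references (field 400), and one see-also link (field 500) from the three records above, writes the `$d` dates after a comma, and uses hypothetical function names.

```python
# Established heading -> see-from references (only a few are included).
AUTHORITY = {
    "Creasey, John": [
        "Cooke, M. E.", "Fecamps, Elise", "Marsden, James", "Wilde, Jimmy",
    ],
    "Marric, J. J., 1908-1973": [],
    "Butler, William Vivian, 1927-": [
        "Butler, W. V. (William Vivian), 1927-", "Marric, J. J., 1927-",
    ],
}

# See-also links between established headings.
SEE_ALSO = {"Marric, J. J., 1908-1973": ["Creasey, John"]}


def build_index(authority):
    """Map every form of a name, established or variant, to its heading."""
    index = {}
    for heading, variants in authority.items():
        for form in [heading, *variants]:
            index.setdefault(form, heading)
    return index


INDEX = build_index(AUTHORITY)


def resolve(name):
    """Return (established heading, see-also headings), or None if unknown."""
    heading = INDEX.get(name)
    if heading is None:
        return None
    return heading, SEE_ALSO.get(heading, [])


def candidates(name):
    """Return every heading reachable from a form that starts with `name`."""
    return sorted({h for form, h in INDEX.items() if form.startswith(name)})
```

`resolve("Fecamps, Elise")` leads to "Creasey, John" (different names, same person), while `candidates("Marric, J. J.")` returns two headings, one for Creasey's pseudonym and one for Butler (same name, different people). Neither answer could be inferred from cataloging rules alone; it has to be recorded in the file.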

  35. Other Types of Controlled Vocabularies • Gazetteers (Geographic Names) • Code lists (e.g. LC Language Codes) • Subject Heading Lists • Classification Schemes • Thesauri

  36. Structure of an IR System (adapted from Soergel, p. 19)
      [Diagram of an Information Storage and Retrieval System.
      Search line: Interest profiles & Queries → Formulating query in terms of descriptors → Storage of profiles → Store 1: Profiles / Search requests.
      Storage line: Documents & data → Indexing (Descriptive and Subject) → Storage of Documents → Store 2: Document representations.
      Rules of the game = Rules for subject indexing + Thesaurus (which consists of Lead-In Vocabulary and Indexing Language).
      Store 1 and Store 2 → Comparison / Matching → Potentially Relevant Documents.]

  37. Uses of Controlled Vocabularies • Library Subject Headings, Classification and Authority Files. • Commercial Journal Indexing Services and databases • Yahoo, and other Web classification schemes • Online and Manual Systems within organizations • SunSolve • MacArthur

  38. Types of Indexing Languages • Uncontrolled Keyword Indexing • Indexing Languages: controlled, but not structured • Thesauri: controlled and structured • Classification Systems: controlled, structured, and coded • Faceted Classification Systems

  39. Indexing Languages • An index is a systematic guide designed to indicate topics or features of documents in order to facilitate retrieval of documents or parts of documents. • An indexing language is the set of terms used in an index to represent topics or features of documents, and the rules for combining or using those terms.

  40. Indexing Languages • Library of Congress Subject Headings • Yellow Pages Topics • Wilson Indexes (“Readers’ Guide”)

  41. Thesauri • A Thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among Synonymous, Equivalent, Broader, Narrower and other Related Terms
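The links a thesaurus maintains can be sketched as a small data structure. The Python below is an illustrative sketch, not the structure of any real thesaurus: the class and method names are hypothetical, the term names are borrowed from the ERIC list shown earlier, and the relationships among them (including the non-preferred term "Sportspeople") are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    used_for: set = field(default_factory=set)   # synonymous/equivalent terms
    broader: set = field(default_factory=set)    # broader terms
    narrower: set = field(default_factory=set)   # narrower terms
    related: set = field(default_factory=set)    # other related terms


class Thesaurus:
    def __init__(self):
        self.terms = {}     # preferred terms (descriptors)
        self.lead_in = {}   # non-preferred term -> preferred term

    def add(self, name):
        return self.terms.setdefault(name, Term(name))

    def add_synonym(self, preferred, non_preferred):
        self.add(preferred).used_for.add(non_preferred)
        self.lead_in[non_preferred] = preferred

    def add_broader(self, narrow, broad):
        # Broader and narrower links are always recorded in both directions.
        self.add(narrow).broader.add(broad)
        self.add(broad).narrower.add(narrow)

    def add_related(self, a, b):
        self.add(a).related.add(b)
        self.add(b).related.add(a)

    def preferred(self, word):
        """Return the descriptor to use for `word`, or None if unknown."""
        if word in self.terms:
            return word
        return self.lead_in.get(word)


thesaurus = Thesaurus()
thesaurus.add_synonym("Athletes", "Sportspeople")          # invented link
thesaurus.add_broader("Athletic Equipment", "Athletics")   # invented link
thesaurus.add_related("Athletes", "Athletics")             # invented link
```

A searcher who types "Sportspeople" is led to the descriptor "Athletes" through the lead-in vocabulary, and from there to related and narrower terms.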

  42. Thesauri (cont.) • National and International Standards for Thesauri • ANSI/NISO Z39.19-1994 -- American National Standard Guidelines for the Construction, Format, and Management of Monolingual Thesauri • ANSI/NISO Draft Standard Z39.4-199x -- American National Standard Guidelines for Indexes in Information Retrieval • ISO 2788 -- Documentation -- Guidelines for the establishment and development of monolingual thesauri • ISO 5964 -- Documentation -- Guidelines for the establishment and development of multilingual thesauri

  43. Thesauri (cont.) • Examples: • The ERIC Thesaurus of Descriptors • The Art and Architecture Thesaurus • The Medical Subject Headings (MeSH) of the National Library of Medicine

  44. Classification Systems • A classification system is an indexing language often based on a broad ordering of topical areas. Thesauri and classification systems both use this broad ordering and maintain a structure of broader, narrower, and related topics. Classification schemes commonly use a coded notation for representing a topic and its place in relation to other terms.

  45. Classification Systems (cont.) • Examples: • The Library of Congress Classification System • The Dewey Decimal Classification System • The ACM Computing Reviews Categories • The American Mathematical Society Classification System

  46. Automatic Indexing and Classification • Automatic indexing is typically the simple deriving of keywords from a document and providing access to all of those words. • More complex Automatic Indexing Systems attempt to select controlled vocabulary terms based on terms in the document. • Automatic classification attempts to automatically group similar documents using either: • A fully automatic clustering method. • An established classification scheme and set of documents already indexed by that scheme.
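The simplest form of automatic indexing, deriving keywords and giving access to all of them, can be sketched as an inverted index. This is a minimal Python sketch: the stopword list, the sample documents, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# A tiny stopword list, assumed for the example.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}


def keywords(text):
    """Derive keywords: lowercase words that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def build_inverted_index(docs):
    """Map every derived keyword to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in keywords(text):
            index[word].add(doc_id)
    return index


docs = {
    "d1": "The design of controlled vocabularies",
    "d2": "Name authority control in the library",
}
index = build_inverted_index(docs)
```

No vocabulary control is applied here: "control" and "controlled" end up as separate entries, so a search on one misses the document indexed under the other.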

  47. Clustering
      • Agglomerative methods: polythetic, exclusive or overlapping, unordered; clusters are order-dependent.
      • [Diagram: documents grouped into clusters.]
      • Rocchio’s method:
        1. Select initial centers (i.e. seed the space)
        2. Assign docs to highest matching centers and compute centroids
        3. Reassign all documents to centroid(s)
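The three steps listed for Rocchio's method can be sketched as follows. This is a simplified Python reading of the slide, not necessarily the full published algorithm: term-frequency vectors and cosine similarity are assumptions, each document is assigned to a single cluster, and all function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a document as a bag of lowercase words."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Average a list of term vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def assign(vectors, centers):
    """For each document, return the index of its best-matching center."""
    return [max(range(len(centers)), key=lambda c: cosine(v, centers[c]))
            for v in vectors]


def cluster(texts, seed_indexes):
    vectors = [vectorize(t) for t in texts]
    # Step 1: select initial centers (here, chosen seed documents).
    centers = [vectors[i] for i in seed_indexes]
    # Step 2: assign documents to the best center and compute centroids.
    labels = assign(vectors, centers)
    centers = [centroid([v for v, label in zip(vectors, labels) if label == c])
               for c in range(len(centers))]
    # Step 3: reassign all documents to the centroids.
    return assign(vectors, centers)


texts = ["english prose", "english poetry", "french drama", "french poetry drama"]
labels = cluster(texts, seed_indexes=[0, 2])
```

The result depends on which documents are chosen as seeds, which is one sense in which such clusters are order-dependent.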

  48. Automatic Class Assignment
      • Polythetic, exclusive or overlapping, usually ordered; clusters are order-independent and usually based on an intellectually derived scheme.
      • [Diagram: documents are submitted to a search engine and sorted into classes.]
      • Steps:
        1. Create pseudo-documents representing intellectually derived classes.
        2. Search using document contents.
        3. Obtain ranked list.
        4. Assign document to the N categories ranked over threshold, OR assign to the top-ranked category.
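The four steps can be sketched with a toy ranking function standing in for the search engine. This Python sketch makes several assumptions: cosine similarity over word counts plays the role of the search engine, the class names and pseudo-document texts are hypothetical, and so are the function names.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a text as a bag of lowercase words."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Step 1: pseudo-documents representing intellectually derived classes.
PSEUDO_DOCS = {
    "English Poetry": "english poetry verse sonnet",
    "French Drama": "french drama play theatre",
}


def rank_classes(text, pseudo_docs):
    """Steps 2 and 3: search with the document contents, return a ranked list."""
    query = vectorize(text)
    scored = [(cosine(query, vectorize(p)), name) for name, p in pseudo_docs.items()]
    return sorted(scored, reverse=True)


def assign_classes(text, pseudo_docs, threshold=None):
    """Step 4: keep classes ranked over the threshold, or only the top one."""
    ranked = rank_classes(text, pseudo_docs)
    if threshold is None:
        return [ranked[0][1]]
    return [name for score, name in ranked if score > threshold]


top = assign_classes("an english sonnet in verse", PSEUDO_DOCS)
```

With a threshold, a document may receive several classes or none at all; without one, it always receives exactly the top-ranked class.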
