
The Significance of Vocabulary

Michael Buckland, School of Information Management and Systems, University of California, Berkeley. An economic claim: Vocabulary problems reduce the benefits and return on investment in information services.


Presentation Transcript


  1. The Significance of Vocabulary. Michael Buckland, School of Information Management and Systems, University of California, Berkeley

  2. The Significance of Vocabulary • An economic claim: Vocabulary problems reduce the benefits and return on investment in information services. • Vocabulary is used for indexicality, therefore issues of identity are central to LIS. • Vocabulary is central to digital libraries. • Vocabulary is central to explaining the history of conceptions of LIS!

  3. A correctly formed Library of Congress Subject heading, but who would think of such search terms? God --- Knowableness --- History of doctrines --- Early church, ca. 30-600 --- Congresses.

  4. Economic Rationale: • Massive investment in repositories • Large investment in categorization schemes: classifications, thesauri, concept codes, headings, … • Categorization schemes usually specialized and stylized • Increasingly unfamiliar to searchers, hence ineffective, inefficient use

  5. Remedy: Support for searching unfamiliar metadata vocabularies: an interface to translate the searcher’s vocabulary into the system’s vocabulary.

  6. Examples: Automobile import and export data (Census Bureau). Automobiles? No data. Cars? “Railway or tramway stock”. (Passenger motor vehicles, spark ignition engine.)

  7. “Automobiles”, also known as . . . in Library of Congress Classification: TL 205; in U.S. Patent Classification: 180/280; in Standard Industrial Classification: 3711

  8. Example: Coastal pollution. F SU COASTAL POLLUTION: 0; F TW COASTAL POLLUTION; SUMMARIZE SUBJECTS. MeSH: Seawater; Water pollution; Bacteria; Water microbiology; Air pollution; Environmental monitoring; Bathing beaches. LCSH: Marine pollution; Coastal zone management; Water --- Pollution; Petroleum industry and trade; Beach erosion; Coasts; Barrier islands.

  9. International Harmonized Commodity Classification System: “Computer” • HS 84: “Nuclear reactors, boilers, machines and mechanical appliances” • HS 8471: “Automatic data processing machines and units thereof, magnetic or optical readers, machines for transcribing data” • HS 847120: “Digital auto data proc mach contng in the same housing a CPU and input & output device”
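The nesting on slide 9 (84, then 8471, then 847120) follows the Harmonized System convention that a 2-digit chapter prefixes a 4-digit heading, which in turn prefixes a 6-digit subheading. A minimal sketch of walking that hierarchy is below; the function name `hs_path` and the placeholder label are hypothetical, and the label table holds only the three entries quoted on the slide.

```python
# Labels copied from slide 9; a real system would load the full HS tables.
HS_LABELS = {
    "84": "Nuclear reactors, boilers, machines and mechanical appliances",
    "8471": ("Automatic data processing machines and units thereof, magnetic "
             "or optical readers, machines for transcribing data"),
    "847120": ("Digital auto data proc mach contng in the same housing a CPU "
               "and input & output device"),
}


def hs_path(code):
    """Return (prefix, label) pairs from chapter down to the given code.

    Hypothetical helper: it assumes the code is a digit string and that
    levels sit at 2, 4 and 6 digits.
    """
    path = []
    for width in (2, 4, 6):
        if len(code) >= width:
            prefix = code[:width]
            path.append((prefix, HS_LABELS.get(prefix, "(label not loaded)")))
    return path


if __name__ == "__main__":
    for prefix, label in hs_path("847120"):
        print(prefix, "-", label)
```

Showing the full path makes the slide's point concrete: a searcher looking for “computer” has to pass through a chapter labelled with nuclear reactors and boilers.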

  10. INSPEC Thesaurus subdomain-based indexes: • “Water” subdomain: Fission reactor safety; Fission reactor fuel; Polymers; Organic insulating materials; Water supply; Cable insulation; Insulation testing; and Insulating oils. • “Biology” subdomain: Water; Biomechanics; Physiological models; Neurophysiology; Cellular effects of radiation. • “Information Studies” subdomain: Agriculture; Natural resources; Forecasting theory; Operations research; Erosion.

  11. Example: Vietnam War. U.C. MELVYL Online Catalog: FIND XSU VIETNAM WAR, Search Results: 0 records; FIND XSU VIETNAMESE CONFLICT, Search Results: 4,190 records

  12. Dictionaries don’t always help. Emanuel Goldberg: Aerial photography using a “Drachen”. Actual meaning: Aerodynamic tethered balloon. Standard contemporary English was: Aerostat. German: Drachen (= Kite in dictionary)

  13. “Entry vocabulary” search interfaces: • Software and algorithms map natural language vocabulary to specialized metadata terms. • Allows users to enter ordinary language queries while taking advantage of existing subject headings, categorization • Uses co-occurrence statistics to link users’ ordinary language terms to system vocabularies • Statistical association between lexical items in titles and abstracts and the system’s metadata vocabulary • Suggests most likely system vocabulary
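The co-occurrence idea on slide 13 can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual algorithm: the training records are invented, the names `build_index` and `suggest` are hypothetical, and the score (the share of a word's occurrences that carry a given heading, summed over query words) is a simple stand-in for whatever association statistic the Entry Vocabulary Modules used.

```python
from collections import Counter, defaultdict

# Hypothetical training records: (title text, assigned subject headings).
RECORDS = [
    ("oil spills and coastal pollution", ["Marine pollution"]),
    ("coastal pollution from sewage outfalls",
     ["Marine pollution", "Water -- Pollution"]),
    ("beach erosion on barrier islands", ["Beach erosion", "Barrier islands"]),
    ("managing coastal development", ["Coastal zone management"]),
]


def build_index(records):
    """Count how often each title word co-occurs with each heading."""
    word_counts = Counter()
    cooccur = defaultdict(Counter)
    for title, headings in records:
        for word in set(title.lower().split()):
            word_counts[word] += 1
            for heading in headings:
                cooccur[word][heading] += 1
    return word_counts, cooccur


def suggest(query, word_counts, cooccur, top_n=3):
    """Rank system headings for an ordinary-language query.

    Assumed scoring: for each known query word, add the fraction of that
    word's records that were assigned the heading.
    """
    scores = Counter()
    for word in set(query.lower().split()):
        if word not in word_counts:
            continue  # word never seen in training titles
        for heading, n in cooccur[word].items():
            scores[heading] += n / word_counts[word]
    return [heading for heading, _ in scores.most_common(top_n)]


if __name__ == "__main__":
    wc, co = build_index(RECORDS)
    print(suggest("coastal pollution", wc, co))
```

With these toy records, the query "coastal pollution" ranks "Marine pollution" first, which mirrors the situation on slide 8 where the searcher's phrase is not itself a subject heading.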

  14. Thesaurus navigation: • Facilitates browsing where structure is present: broader, narrower, related terms • Guides searcher to other parts of the structure. Retrieval set analysis: • Navigation within micro-domain
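As an illustration of the broader, narrower, and related links mentioned on slide 14, here is a minimal sketch of a navigable thesaurus structure. The `Term` class, the helper names, and the sample relationships are all hypothetical; the sample links are made up for the demonstration and are not taken from LCSH, MeSH, or INSPEC.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)


def build_thesaurus(broader_pairs, related_pairs):
    """Build terms from (narrower, broader) and (term, related term) pairs."""
    terms = {}

    def get(name):
        if name not in terms:
            terms[name] = Term(name)
        return terms[name]

    for narrow, broad in broader_pairs:
        get(narrow).broader.append(broad)
        get(broad).narrower.append(narrow)
    for a, b in related_pairs:
        get(a).related.append(b)
        get(b).related.append(a)
    return terms


def ancestors(terms, name):
    """Follow broader-term links upward, nearest level first."""
    seen, queue = [], list(terms[name].broader)
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(terms[current].broader)
    return seen


if __name__ == "__main__":
    # Illustrative relationships only.
    thesaurus = build_thesaurus(
        broader_pairs=[("Marine pollution", "Water pollution"),
                       ("Water pollution", "Pollution")],
        related_pairs=[("Marine pollution", "Coastal zone management")],
    )
    print(ancestors(thesaurus, "Marine pollution"))
    print(thesaurus["Marine pollution"].related)
```

Storing each link in both directions is what lets an interface guide the searcher up, down, or sideways from whichever term a query first lands on.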

  15. Web access: • WWW forms-based application supported by Perl • Supports searches on remote repositories • Four subdomain dictionaries in three databases: BIOSIS (Biological Abstracts): subdomain “water”; INSPEC: subdomains “information science”, “water”; U.S. Patent Office classification

  16. Statement of work: • Varied prototype Entry Vocabulary Modules. • Unintrusive development of EVMs by agents • Sensitivity to subdomains. • Natural language processing to augment statistical term frequency. • Recommendations for metadata “codebooks” for numeric databases. • www.sims.berkeley.edu/metadata/
