
Building Taxonomies Part 3

Alice Redmond-Neal, Access Innovations, Inc. Enterprise Search Summit, New York City, May 21, 2006.

Presentation Transcript


  1. Building Taxonomies Part 3 Alice Redmond-Neal Access Innovations, Inc. Enterprise Search Summit New York City, May 21, 2006

  2. Build a taxonomy – simple steps • Get paper and pencil • Sharpen pencil • Define subject field • Collect terms • Organize terms • Fill in gaps • Flesh out and interrelate terms • You’re done!

  3. Define subject field (figure: Sociology, Psychology, Education, Law) • Review a representative collection of content • Determine: • Core areas • Peripheral topics • Scope can be modified later

  4. Before you go on: Build or buy? • Survey existing thesaurus/taxonomy resources for your domain • Test for • Scope • Depth • Make-or-break terms • Cost • Don’t reinvent the wheel!

  5. Collect terms • Your documents and databases • Departmental terminology • Textbooks and their indexes (indices) • Book tables of contents and indexes • Journal quarterly indexes • Encyclopedias • Lexicons and glossaries on the topic • Web resources • Users and experts • Search logs

  6. Gather terms from search logs • Beyond the Spider: The Accidental Thesaurus (Richard Wiggins, Information Today, Oct 2002) • Top ~100 search terms from search logs • Match each to the web page with the appropriate answer • Basis for favorites or best bets, presented at the top of the results list (AKA behavior-based taxonomy) • Not a thesaurus or taxonomy, but still a useful source of terms
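
A minimal sketch of the counting half of this technique, assuming the search log has already been reduced to one raw query string per entry. The function name `top_search_terms` and the sample queries are hypothetical; matching each top query to the page that answers it remains a manual step.

```python
from collections import Counter

def top_search_terms(queries, n=100):
    """Return the n most frequent queries, after light normalization."""
    # Lowercase and collapse whitespace so "Payroll " and "payroll" are counted together.
    normalized = (" ".join(q.lower().split()) for q in queries)
    # Skip queries that are empty after normalization.
    counts = Counter(q for q in normalized if q)
    return counts.most_common(n)

# Hypothetical sample log: one raw query string per entry.
log = ["Payroll", "payroll ", "vacation policy", "PAYROLL", "Vacation  Policy", "badge"]
print(top_search_terms(log, n=2))  # [('payroll', 3), ('vacation policy', 2)]
```

The normalization is deliberately light; spelling variants such as "fibre" and "fiber" still count separately, which is useful because those variants are themselves candidates for equivalence (UF) entries.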

  7. Organize terms – roughly • Sort terms into several major categories: logical groups of similar concepts that become Top Terms • Identify core areas and peripheral topics • 10–20 to start • Consider moving proper names to authority files • Result: a loose collection of terms under several main headings • Rough and tentative: see how it fits as you go • Initial gap analysis • Add / modify / delete as needed

  8. Labelling a concept – cognitive linguistics • The most-used labels fall in the middle of the range from abstract to specific, which matters for search • A linguistic universal, true across cultures • Unique beginner • Life form • Generic • Specific • Varietal • Practical application? Insurance → Health insurance → Group health insurance

  9. Craft the Top Terms • Toughest job and most important step! • Dictates further organization • Determines how browsers/searchers perceive the taxonomy • Coverage • Formality • Establish the concept first, tweak the wording later

  10. The term record = subject term, heading, node, category, descriptor, class • Main Term (MT) • Top Term (TT) • Broader Terms (BT) • Narrower Terms (NT) • Related Terms (RT) • See also (SA) • Scope Note (SN) • History (H) • NonPreferred Term (NP) • Used for (UF), See (S) • (diagram labels: Taxonomy, Thesaurus) • See Lexicographer’s lexicon
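
One way to hold a term record in software is sketched below, assuming one record per preferred term with non-preferred terms stored in its `used_for` list. The class and field names are hypothetical; they simply mirror the abbreviations on the slide and are not taken from any particular thesaurus tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TermRecord:
    """One preferred term and its relationships; fields follow the slide's abbreviations."""
    main_term: str                                      # MT
    top_term: Optional[str] = None                      # TT
    broader: List[str] = field(default_factory=list)    # BT
    narrower: List[str] = field(default_factory=list)   # NT
    related: List[str] = field(default_factory=list)    # RT
    see_also: List[str] = field(default_factory=list)   # SA
    scope_note: str = ""                                # SN
    history: str = ""                                   # H
    used_for: List[str] = field(default_factory=list)   # UF: non-preferred terms that point here

rec = TermRecord(
    main_term="Health insurance",
    top_term="Insurance",
    broader=["Insurance"],
    narrower=["Group health insurance"],
)
print(rec.broader, rec.narrower)  # ['Insurance'] ['Group health insurance']
```

Using `default_factory` gives every record its own empty lists, so adding an NT to one record never leaks into another.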

  11. Usefulness of a term – the “duh” factor • Some terms are so basic for a domain that they have little or no value • “Sports” in Sports Illustrated • “Technology” in Technology Review • “Golf” in Golf Magazine • How useful will the term be for indexing? • Apply to everything in the domain? • Distinguish important concepts? • If term is needed, specify limited use conditions in Scope Note
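
The question "Apply to everything in the domain?" can be checked roughly against an already-indexed collection. This is a minimal sketch, assuming each document is represented as the set of terms assigned to it; the function names, the sample documents, and the 0.9 threshold are all hypothetical choices, not a standard.

```python
def term_coverage(term, documents):
    """Fraction of documents whose assigned terms include `term`."""
    if not documents:
        return 0.0
    return sum(1 for tags in documents if term in tags) / len(documents)

def duh_candidates(documents, threshold=0.9):
    """Terms assigned to at least `threshold` of all documents."""
    all_terms = set().union(*documents)
    return sorted(t for t in all_terms if term_coverage(t, documents) >= threshold)

# Hypothetical indexed collection from a golf magazine: each document is a set of assigned terms.
docs = [
    {"Golf", "Putting"},
    {"Golf", "Course design"},
    {"Golf", "Putting", "Equipment"},
]
print(duh_candidates(docs))  # ['Golf']
```

A flagged term is only a candidate: if it is still needed, the slide's advice is to keep it with limited use conditions in its Scope Note.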

  12. Hierarchy structures – variations on a theme • Not predetermined • Wines → type → variety → region → cost • Or Wines → cost → type… • Varies by user group and needs • May have multiple views of the same content • Standard alpha view or customized notation • Affects information architecture, i.e., how the website functions
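
To illustrate multiple views of the same content, here is a minimal sketch, assuming each item carries its attribute values as a flat record. Reordering the `levels` list produces a different hierarchy from identical data. The function name, wine records, and attribute keys are hypothetical.

```python
from collections import defaultdict

def build_view(items, levels):
    """Nest item names under attribute values, one level per key in `levels`."""
    if not levels:
        return sorted(item["name"] for item in items)
    groups = defaultdict(list)
    for item in items:
        groups[item[levels[0]]].append(item)
    # Recurse on the remaining levels within each group.
    return {value: build_view(group, levels[1:]) for value, group in sorted(groups.items())}

# Hypothetical wine records; attribute names are illustrative.
wines = [
    {"name": "Wine A", "type": "Red", "cost": "Under $20"},
    {"name": "Wine B", "type": "White", "cost": "Under $20"},
    {"name": "Wine C", "type": "Red", "cost": "$20 and up"},
]

print(build_view(wines, ["type", "cost"]))  # Wines → type → cost
print(build_view(wines, ["cost", "type"]))  # Wines → cost → type
```

Keeping the attributes flat and deriving the tree on demand is one way to serve different user groups without maintaining several copies of the taxonomy.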

  13. How do terms relate? • Hierarchical relationships: parents and their children • Equivalence relationships: aliases • Associative relationships: cousins • (diagram labels: Taxonomy, Thesaurus)

  14. Hierarchical relationships • Broader Term represents the category • Narrower Term represents the specific • Three types: • Generic relationship (BTG/NTG) • Whole-part relationship (BTP/NTP) • Instance relationship (BTI/NTI) • BTs/NTs have a reciprocal relationship
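
Because BTs and NTs are reciprocal, software can write both directions in a single operation so the two never drift apart. This is a minimal sketch, assuming an in-memory store; the class name `Hierarchy` and the `kind` labels are hypothetical stand-ins for the BTG/NTG, BTP/NTP, and BTI/NTI types.

```python
from collections import defaultdict

class Hierarchy:
    """BT/NT store that records both directions of every link, so they stay reciprocal."""

    KINDS = {"generic", "partitive", "instance"}  # BTG/NTG, BTP/NTP, BTI/NTI

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its Broader Terms
        self.narrower = defaultdict(set)  # term -> its Narrower Terms
        self.kind = {}                    # (narrower, broader) -> relationship type

    def add(self, narrower_term, broader_term, kind="generic"):
        if kind not in self.KINDS:
            raise ValueError(f"unknown relationship type: {kind!r}")
        # One call writes the BT link and its reciprocal NT link.
        self.broader[narrower_term].add(broader_term)
        self.narrower[broader_term].add(narrower_term)
        self.kind[(narrower_term, broader_term)] = kind

    def is_reciprocal(self):
        """True if every BT link has a matching NT link and vice versa."""
        bt_links = {(n, b) for n, bts in self.broader.items() for b in bts}
        nt_links = {(n, b) for b, nts in self.narrower.items() for n in nts}
        return bt_links == nt_links

h = Hierarchy()
h.add("Elections", "Politics")
h.add("Mayoral elections", "Elections")
h.add("Middle ear", "Ear", kind="partitive")
h.add("Baltic Sea", "Seas", kind="instance")
print(sorted(h.narrower["Elections"]), h.is_reciprocal())  # ['Mayoral elections'] True
```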

  15. Broader to Narrower Terms • Politics → Elections → Presidential elections, Gubernatorial elections, Mayoral elections • (levels: Generic, Specific, Varietal)

  16. Hierarchy – Generic (genus-species) relationship • Inheritance or inclusion: what’s true of the parent (BT) is true for all children (NTs) • Applies to entities, actions, properties, and agents, not just biological taxonomies • Value → Cultural value, Economic value, Moral value, Social value • Teachers → Adult educators, School teachers, Special ed teachers, Student teachers • Thinking → Contemplation, Divergent thinking, Lateral thinking, Reasoning

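  17. Generic relationship test – 1 • Both terms in the same fundamental category • “All-and-some” test • Rodents / Squirrels: SOME rodents are squirrels, ALL squirrels are rodents • Pests / Squirrels: SOME pests are squirrels, NOT ALL squirrels are pests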

  18. Generic relationship test – 2 (diagram: Pests, Squirrels, Rodents) • ALL squirrels are rodents • NOT ALL squirrels are pests • NOT ALL pests are rodents
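
The all-and-some test can be mimicked on sample members of each concept. This is only an illustration: indexers apply the test to concepts, not to enumerable lists, so the sets below are hypothetical stand-ins chosen to mirror the slide's diagram. The function name is also hypothetical.

```python
def passes_all_and_some(narrower_members, broader_members):
    """All-and-some test on sample members of two concepts."""
    narrower, broader = set(narrower_members), set(broader_members)
    all_in_broader = narrower <= broader                # "ALL squirrels are rodents"
    only_some = bool(narrower) and narrower != broader  # "SOME rodents are squirrels"
    return all_in_broader and only_some

# Hypothetical sample members, chosen only to mirror the slide's diagram.
rodents = {"red squirrel", "grey squirrel", "house mouse", "beaver"}
squirrels = {"red squirrel", "grey squirrel"}
pests = {"grey squirrel", "house mouse", "cockroach"}

print(passes_all_and_some(squirrels, rodents))  # True: Squirrels BTG Rodents
print(passes_all_and_some(squirrels, pests))    # False: not all squirrels are pests
print(passes_all_and_some(pests, rodents))      # False: not all pests are rodents
```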

  19. Hierarchy – Whole-part relationship • Also known as meronymy or partonomy • Four types allowed in thesaurus standards • Body systems and organs • Ear → Middle ear • Geographical locations • Bernalillo County → Albuquerque • Fields of study • Geology → Physical geology • Hierarchical organizational/corporate/social/political structures • Diocese → Parish

  20. Hierarchy – Instance relationship • General category (common noun) = BT • Individual example (proper noun) = NT • Seas → Baltic Sea, Caspian Sea, Mediterranean Sea • New York museums → Guggenheim Museum, Museum of Modern Art, Museum of Natural History • Essentially identical to the “final node” in taxonomies • Best practice: long list → move to authority file

  21. Polyhierarchical relationship • A term can logically fit under more than one Broader Term, i.e., it can have Multiple Broader Terms (MBT) • New to ANSI/NISO standards • Spoons → Sporks; Forks → Sporks • Nurses → Nurse administrators; Health administrators → Nurse administrators • Finance → Accounting; Careers → Accounting
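
With multiple broader terms, a term no longer has a single path to the top. This is a minimal sketch that lists every path, assuming the BT links form no cycles (a cycle would make this recursion run forever, so a real tool should check for one). The function name is hypothetical, and "Cutlery" is a hypothetical Top Term added only to give the example some depth.

```python
def paths_to_top(term, broader):
    """Every path from `term` up to a Top Term, following all of its Broader Terms."""
    parents = broader.get(term, [])
    if not parents:  # no BT: this term is a Top Term
        return [[term]]
    return [[term] + path for parent in parents for path in paths_to_top(parent, broader)]

# Polyhierarchy from the slide; "Cutlery" is a hypothetical Top Term.
broader = {
    "Sporks": ["Spoons", "Forks"],
    "Spoons": ["Cutlery"],
    "Forks": ["Cutlery"],
    "Nurse administrators": ["Nurses", "Health administrators"],
}

for path in paths_to_top("Sporks", broader):
    print(" → ".join(reversed(path)))
# Cutlery → Spoons → Sporks
# Cutlery → Forks → Sporks
```

A browse interface could show "Sporks" in both places, while an index stores it only once.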

  22. Equivalence relationship • Preferred Term • Thesaurus term, valid for indexing • Thesaurus notation: USE • NonPreferred Term • Not valid for indexing • An alias or imposter • Entry point that directs the user to the Preferred Term • Thesaurus notation: UF or NPT • Examples: Spiders UF Arachnids • Plant pathology USE Phytopathology

  23. Equivalence – when to use • Synonyms, slang, quasi-synonyms • Scientific and trade names • Ibuprofen UF Motrin™ • Lexical variants • Fiber optics UF Fibre optics • Mouse UF Mice • Upward posting of narrow concepts not specified in the taxonomy or thesaurus • Social class UF Elite, Middle class, Working class • Get equivalent terms from search logs, brainstorming…
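
At search or indexing time, the equivalence relationship works as a lookup from an entry term to its Preferred Term. This is a minimal sketch, assuming UF lists are kept on the preferred terms and inverted into a USE index. The function names are hypothetical, and case-insensitive matching via `str.casefold` is a design choice rather than a standard requirement.

```python
def build_use_index(uf_lists):
    """Invert UF lists (preferred -> aliases) into a USE lookup (alias -> preferred)."""
    use = {}
    for preferred, aliases in uf_lists.items():
        for alias in aliases:
            key = alias.casefold()
            # An entry term that points to two different preferred terms is ambiguous.
            if key in use and use[key] != preferred:
                raise ValueError(f"{alias!r} -> {use[key]!r} and {preferred!r}")
            use[key] = preferred
    return use

def resolve(entry_term, uf_lists, use):
    """Map a user's entry term to the Preferred Term valid for indexing, or None."""
    key = entry_term.casefold()
    for preferred in uf_lists:
        if preferred.casefold() == key:
            return preferred
    return use.get(key)

uf_lists = {
    "Ibuprofen": ["Motrin"],
    "Fiber optics": ["Fibre optics"],
    "Mouse": ["Mice"],
    "Social class": ["Elite", "Middle class", "Working class"],  # upward posting
}
use = build_use_index(uf_lists)
print(resolve("fibre optics", uf_lists, use))   # Fiber optics
print(resolve("Working class", uf_lists, use))  # Social class
```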

  24. Associative relationship • Related Terms (RTs) ~ cousins • “…terms related conceptually but not hierarchically, and are not part of an equivalence set” (i.e., not synonyms) • Should siblings be Related Terms? • Both terms are valid thesaurus terms for indexing and have a reciprocal relationship • Expands the user’s awareness and reflects thesaurus coverage of unanticipated areas • Standards describe specific types (see Lexicon)
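
Two properties of RTs can be checked mechanically: every RT link should be reciprocal, and an RT pair should not also be a BT/NT pair. This is a minimal sketch, assuming RTs and BTs are stored as term-to-set mappings. The function names and sample data are hypothetical.

```python
def missing_reciprocal_rts(related):
    """(a, b) pairs where a lists b as an RT but b does not list a."""
    return sorted((a, b) for a, rts in related.items() for b in rts
                  if a not in related.get(b, ()))

def hierarchical_rts(related, broader):
    """RT pairs that are also BT/NT pairs, which the RT definition excludes."""
    return sorted((a, b) for a, rts in related.items() for b in rts
                  if b in broader.get(a, ()) or a in broader.get(b, ()))

# Hypothetical RT and BT data.
related = {
    "Teachers": {"Teaching methods"},
    "Schools": {"Teachers"},
    "Ear": {"Middle ear"},
    "Middle ear": {"Ear"},
}
broader = {"Middle ear": {"Ear"}}

print(missing_reciprocal_rts(related))       # [('Schools', 'Teachers'), ('Teachers', 'Teaching methods')]
print(hierarchical_rts(related, broader))    # [('Ear', 'Middle ear'), ('Middle ear', 'Ear')]
```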

  25. Sibling rivalry and facets • Format and sense of sibling terms should be consistent • If siblings don’t coexist well, separate them • Subdivide large groups of terms into facets (mutually exclusive subcategories) • Growing demand with faceted navigation • Facet examples • Properties, Materials, Agents, Actions, Influence • Objects, Styles and periods, Color, Shape (Art & Architecture Thesaurus)

  26. Faceted classification • Pharmaceuticals • (by action) • Anti-inflammatory agents… • (by chemical structure) • Alkaloids… • (by indication) • Pain… • (by use) • Immunosuppression… Facet indicators (aka Node labels), not to be used for indexing

  27. Faceting challenge Propose facet indicators and subgroup these paint varieties into facets. • Paint • Oil paint • High-gloss paint • Interior paint • Matte paint • Latex paint • Semi-gloss paint • Exterior paint

  28. Do you agree? • Paint • (by type) • Oil paint • Latex paint • (by use) • Interior paint • Exterior paint • (by surface) • High-gloss paint • Matte paint • Semi-gloss paint
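
The proposed facets can drive faceted navigation directly. This is a minimal sketch, assuming each product is indexed with one term per facet. The facet indicators serve only as dictionary keys and never appear among a product's index terms. The function names and the product catalog are hypothetical, and combining selections with AND across facets is one design choice among several.

```python
# Facets from the proposed answer; facet indicators are keys, never index terms.
PAINT_FACETS = {
    "by type": {"Oil paint", "Latex paint"},
    "by use": {"Interior paint", "Exterior paint"},
    "by surface": {"High-gloss paint", "Matte paint", "Semi-gloss paint"},
}

def facet_of(term):
    """Return the single facet a term belongs to; facets must be mutually exclusive."""
    matches = [facet for facet, terms in PAINT_FACETS.items() if term in terms]
    if len(matches) != 1:
        raise ValueError(f"{term!r} is in {len(matches)} facets, expected exactly 1")
    return matches[0]

def filter_products(products, selected_terms):
    """Names of products indexed with every selected term (at most one term per facet)."""
    facets = [facet_of(t) for t in selected_terms]
    if len(set(facets)) != len(facets):
        raise ValueError("select at most one term per facet")
    return sorted(p["name"] for p in products if set(selected_terms) <= p["terms"])

# Hypothetical catalog, indexed with one term from each facet.
products = [
    {"name": "Product 1", "terms": {"Latex paint", "Exterior paint", "Matte paint"}},
    {"name": "Product 2", "terms": {"Oil paint", "Interior paint", "High-gloss paint"}},
    {"name": "Product 3", "terms": {"Latex paint", "Interior paint", "Matte paint"}},
]
print(filter_products(products, ["Matte paint"]))                    # ['Product 1', 'Product 3']
print(filter_products(products, ["Matte paint", "Interior paint"]))  # ['Product 3']
```

Because `facet_of` rejects any term found in zero or several facets, it also serves as a quick check that the proposed subgroups really are mutually exclusive.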

  29. Scope Notes (SN) • Indicate meaning of the term in the context of this thesaurus, for this audience • Stress – Metal, Psychological, Physiological • Indicate any restriction in meaning • Indicate range of topics covered • Provide direction for indexers; for terms often confused, may suggest an alternative term • Use only as needed – not for every term • Establish and stick with consistent format • Be concise
