Taxonomies: Ensuring compatibility and crosswalks

Marjorie M. K. Hlava, Access Innovations / Data Harmony (mhlava@accessinn.com)

Presentation Transcript


  1. Taxonomies: Ensuring compatibility and crosswalks • Marjorie M. K. Hlava • Access Innovations / Data Harmony • mhlava@accessinn.com

  2. Background • "Underlying the information architecture for web sites and search are taxonomies. The standards for thesauri, taxonomies, ontologies, the Semantic Web, and topic maps are converging. • Where do they differ and where are they the same? • This one-hour talk will cover the ISO, ANSI/NISO, and W3C terminology and controlled vocabulary standards, as well as the differences between the new standards and the previous editions. • Finally, it will discuss the crosswalks and registries underway between these development communities."

  3. What we will cover today • Background • Overview of standards • Specifics on three items: NISO Z39.19, BS 8723, and the IFLA guidelines • Thoughts on a registry

  4. Why are taxonomies hot? • Search doesn't work without tagged data • Websites need them to display information and to tag navigation back to content

  5. What's happening to the business? • Carpetbaggers • Differences of opinion • Want to build on existing taxonomies • Need for standards • Need for crosswalks • Need for international communication • Need for general registries of taxonomies

  6. The Problem – KEEPING UP • Many players we know and don't know • Between controlled vocabulary standards: ISO 2788 and 5964, BS 8723 • Groups developing guidelines and standards: W3C with SKOS and OWL • Governments worldwide developing and mandating taxonomies • Communities working to increase reuse and mapping interoperability between controlled vocabularies

  7. Traditional Standards • ISO • TC 46 • SC 9 • ANSI • NISO • Z39.19 • BSI • BS 8723 • W3C • OWL • SKOS • US Government • Office of Management and Budget • European Union

  8. Thesaurus related • NISO Z39.19 2006 www.niso.org • BSI (BS 8723), the basis for the next ISO revision • ISO 2788 - Monolingual (1986) • ISO 5964 - Multilingual (1985) www.iso.ch/iso/en/ISOOnline.frontpage • ISO 5127 - Information and documentation - Vocabulary • OWL from W3C • SKOS, the W3C thesaurus standard

  9. Thesaurus and Indexing Standards – ANSI/NISO • ANSI/NISO Z39.19 - 2003 Guidelines for the Construction, Format, and Management of Monolingual Thesauri • NISO Z39.19-200x Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies • NISO TR02-1997 Guidelines for Indexes and Related Information Retrieval Devices by James D. Anderson

  10. The standards • NISO Z39.19 2006 www.niso.org • BSI (BS 8723) - the basis for the next ISO revision • ISO 2788 - Monolingual (1986) • ISO 5964 - Multilingual (1985) www.iso.ch/iso/en/ISOOnline.frontpage • ISO 5127 - Information and documentation - Vocabulary • OWL from W3C • SKOS - the W3C thesaurus standard

  11. Z39.19 - What's new? • The old standard: covers documents; vocabulary type: thesauri; single BT; post-coordinated; printed formats; monolingual vocabularies • The revised standard: covers content objects; vocabulary types: lists, synonym rings, taxonomies; pre-coordinated; Web format; multilingual vocabularies (general); polyhierarchical; interoperability; facet analysis

  12. British Standards - BS 8723 • Structured vocabularies for information retrieval – Guide • Part 1: General • Part 2: Thesauri • Part 3: Vocabularies other than thesauri • Part 4: Interoperability between vocabularies • Part 5: Interoperability with applications

  13. ISO TC 37 • Scope: Standardization of principles, methods and applications relating to terminology and other language resources • TC 37/SC 1 - Principles and methods • TC 37/SC 2 - Terminography and lexicography • TC 37/SC 3 - Computer applications for terminology • TC 37/SC 4 - Language resource management

  14. Other ISO standards: Concept-oriented terminology • ISO 704:2000 Terminology work - Principles and methods • ISO 860:1996 Terminology work - Harmonization of concepts and terms • ISO 1087-1:2000 Terminology work - Vocabulary - Part 1: Theory and application • ISO 1087-2:2000 Terminology work - Vocabulary - Part 2: Computer applications • ISO 10241:1992 Preparation and layout of international terminology standards

  15. Sample ISO - Data Categories • ISO 12200:1999 Computer applications in terminology - Machine-readable terminology interchange format (MARTIF) - Negotiated interchange • ISO 12616:2002 Translation-oriented terminography • ISO/TR 12618:1994 Computer aids in terminology - Creation and use of terminological databases and text corpora • ISO 12620:1999 Computer applications in terminology - Data categories, used to create glossaries

  16. ISO Thesaurus and Indexing Standards • ISO 2788:1986 Documentation - Guidelines for the establishment and development of monolingual thesauri • ISO 5964:1985 Documentation - Guidelines for the establishment and development of multilingual thesauri • ISO 5963:1985 Documentation - Methods for examining documents, determining their subjects, and selecting indexing terms • ISO 999:1996 Information and documentation - Guidelines for the content, organization and presentation of indexes

  17. ISO TC 46/SC 9 • Information and Documentation - Identification and Description • TC 46 is ISO's Technical Committee (TC) for information and documentation standards. • SC 9 is the TC 46 Subcommittee (SC) that develops and maintains ISO standards on the identification and description of information resources.

  18. ANSI/NISO Thesaurus and Indexing Standards • ANSI/NISO Z39.19 - 2005 Guidelines for the Construction, Format, and Management of Monolingual Thesauri • NISO Z39.19-200x Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies • NISO TR02-1997 Guidelines for Indexes and Related Information Retrieval Devices by James D. Anderson

  19. Reports to use • Report on the Workshop on Electronic Thesauri, November 4-5, 1999 http://www.niso.org/news/events_workshops/thes99rprt.html • Final Report to the ALCTS/CCS Subject Analysis Committee: Subcommittee on Subject Relationships/Reference Structures, June 1997 http://archive.ala.org/alcts/organization/ccs/sac/rpt97rev.html

  20. Other links • http://esw.w3.org/topic/SkosDev/ThesaurusLinks/XmlFormats • MARC-21 XMLSchema. • Zthes Z39.50 profile for thesaurus navigation (2001). • TML thesaurus markup language (1999). • ADL Thesaurus Protocol XML formats (2002). • MeSH XML format (2001). • GEMET XML format (2003). • APAIS XML thesaurus format, an extension of Zthes (2000). • Open University thesaurus schemas (2002). • Soergel XML thesaurus specification (2001).

  21. W3C • OWL – Web Ontology Language • RDF – Resource Description Framework • Topic Maps • SKOS – Simple Knowledge Organization System • Which community to serve? • Build on the current standard • Might make this link next
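To make the SKOS item concrete, here is a minimal Python sketch, assuming a thesaurus entry is held as a dictionary of Z39.19-style tags (UF, BT, NT, RT). It writes the entry as SKOS Turtle using one common convention: UF terms become `skos:altLabel`, and BT, NT, and RT become `skos:broader`, `skos:narrower`, and `skos:related`. The SKOS namespace and property names are standard; the `ex:` base IRI, the slug-style concept identifiers, and the helper names are hypothetical, and the Rice entry is the thesaurus example from slide 40.

```python
# Minimal sketch: one thesaurus entry -> SKOS Turtle text.
# Assumption: the ex: namespace and the slug-based concept IDs are hypothetical.
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/vocab/"  # hypothetical concept namespace

# Thesaurus tag -> SKOS property that points at another concept.
CONCEPT_LINKS = (("BT", "broader"), ("NT", "narrower"), ("RT", "related"))


def slug(label: str) -> str:
    """Derive a hypothetical concept identifier from a term."""
    return label.lower().replace(" ", "-")


def literal(label: str) -> str:
    """Quote a label as an English Turtle string literal."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@en'


def to_skos_turtle(entry: dict) -> str:
    statements = [f"skos:prefLabel {literal(entry['term'])}"]
    # Non-preferred (UF) terms become alternative labels of the same concept.
    statements += [f"skos:altLabel {literal(t)}" for t in entry.get("UF", [])]
    # BT/NT/RT link to other concepts, not to labels.
    for tag, prop in CONCEPT_LINKS:
        statements += [f"skos:{prop} ex:{slug(t)}" for t in entry.get(tag, [])]
    body = " ;\n    ".join(statements)
    return (
        f"@prefix skos: <{SKOS_NS}> .\n"
        f"@prefix ex: <{BASE}> .\n\n"
        f"ex:{slug(entry['term'])} a skos:Concept ;\n    {body} ."
    )


rice = {"term": "Rice", "UF": ["paddy"], "BT": ["Cereals", "Plant products"],
        "NT": ["Brown rice"], "RT": ["Rice straw"]}
print(to_skos_turtle(rice))
```

A fuller converter would also describe the linked concepts (Cereals, Brown rice, and so on) with their own labels; this sketch stops at a single entry.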

  22. Other things to watch • Other W3C and ISO areas • Support groups • Blogs • Communities of Practice • SIMILE • Web 2.0 activities • WSDL – Web Services Description Language

  23. Other Relevant ISO & W3C Standards • For translation, terminology, and applied linguistics standards, go to: http://appling.kent.edu/ResourcePages/LTStandards/Chart/standards.chart.htm#Ontology • Markup Languages • Metadata Resources • Character Coding • Access Protocols and Interoperability • Content Creation, Manipulation, and Maintenance • Authoring Standards • Text and Content Markup • Translation Standards • Terminology and Lexicography Standards • ISO TC 37 Standards • Terminology Interchange Standards • Controlled Language Standards • Taxonomy and Ontology Standards • Corpus Management Standards • Locale-Related Standards

  24. SIMILE • Semantic Interoperability of Metadata and Information in unLike Environments • Forming a data reference for open source taxonomies

  25. Revised Standards for Controlled Vocabularies • U.S. Standard (NISO Z39.19 - 2005) • British Standard (BS 8723 - 2005) • IFLA Guidelines - 2005

  26. U.S. Standard for Controlled Vocabularies – NISO Z39.19 • NISO Z39.19-200x Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies • Some of the slides are based on Emily Fayen's 2004.6 SLA presentation, Margie Hlava's talk at the 2005 Data Harmony User Group meeting, and Marcia Zeng's presentation at the NKOS meeting in Denver

  27. A little bit of history… • ANSI/NISO Z39.19, Guidelines for the Construction, Format, and Management of Monolingual Thesauri – 1993 • The most frequently requested NISO standard • In spite of its age, the standard is still relevant • 1999: NISO Workshop on Electronic Thesauri http://www.niso.org/news/events_workshop/thes99rpt.html • 2002: NISO initiates revision of Z39.19 • 2004: 1993 edition reaffirmed • 2005: new standard published

  28. Scope • Expand beyond thesaurus • Make more user-friendly • Explain important concepts • Explain principles of vocabulary control • Include electronic information environment • Include additional user search methods: • Browse • Navigate • Keyword searching • Expand beyond A & I services • Include Web applications

  29. The Team: • Vivian Bliss – Microsoft • Carol Brent – ProQuest • John Dickert – DTIC • Lynn El-Hoshy – Library of Congress • Marjorie Hlava – Access Innovations • Stephen Hearn – ALA • Sabine Kuhn – Chemical Abstracts Service • Pat Kuhr – H.W. Wilson Company • Diane McKerlie – DMA Consulting • Peter Morville – Semantic Studios • Stuart Nelson – National Library of Medicine • Allan Savage – National Library of Medicine • Diane Vizine-Goetz – OCLC • Marcia Lei Zeng – Special Libraries Association

  30. Z39.19 Chapters • Introduction • Scope • Referenced Standards • Definitions, Abbreviations, and Acronyms • Controlled Vocabularies – Purpose, Concepts, Principles, and Structure • Term Choice, Scope, and Form • Compound Terms • Relationships • Displaying Controlled Vocabularies • Interoperability • Construction, Testing, Maintenance, and Management Systems

  31. Z39.19 - What's new? • The old standard: covers documents; vocabulary type: thesauri; single BT; post-coordinated; printed formats; monolingual vocabularies • The revised standard: covers content objects; vocabulary types: lists, synonym rings, taxonomies; pre-coordinated; Web format; multilingual vocabularies (general); polyhierarchical; interoperability; facet analysis

  32. Principles of Controlled Vocabularies • Four important principles of vocabulary control guide the design and development of controlled vocabularies: • eliminating ambiguity • controlling synonyms • establishing relationships among terms where appropriate • testing and validation of terms
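Two of these principles, eliminating ambiguity and controlling synonyms, lend themselves to automated testing. The following is a minimal Python sketch, not a procedure taken from Z39.19: it assumes the vocabulary is stored as a set of preferred terms plus (entry term, preferred term) USE references, and the function name and sample terms are hypothetical. It flags USE references whose target is not a preferred term, labels that are both preferred and non-preferred, and entry terms that lead to more than one preferred term. Some thesauri deliberately point a compound entry term to a combination of preferred terms, so the last check only marks a term for editorial review.

```python
from collections import defaultdict


def check_vocabulary(preferred, use_refs):
    """Return a list of problems found in a controlled vocabulary.

    preferred -- iterable of preferred terms
    use_refs  -- iterable of (entry_term, preferred_term) USE references
    Comparisons ignore case, so "Paddy" and "paddy" count as the same label.
    """
    problems = []
    pref_keys = {term.casefold() for term in preferred}
    targets = defaultdict(set)  # entry term (case-folded) -> preferred terms
    for entry, target in use_refs:
        key = entry.casefold()
        targets[key].add(target)
        if target.casefold() not in pref_keys:
            problems.append(f"{entry!r} USE {target!r}: target is not a preferred term")
        if key in pref_keys:
            problems.append(f"{entry!r} is both a preferred and a non-preferred term")
    # Ambiguity: one entry term leading to several preferred terms.
    for key, found in targets.items():
        if len(found) > 1:
            problems.append(f"{key!r} leads to several preferred terms: {sorted(found)}")
    return problems


PREFERRED = {"Rice", "Cereals", "Mercury (planet)", "Mercury (metal)"}
USE_REFS = [
    ("paddy", "Rice"),
    ("Mercury", "Mercury (planet)"),
    ("Mercury", "Mercury (metal)"),
    ("grain", "Grains"),  # "Grains" was never made a preferred term
]
for problem in check_vocabulary(PREFERRED, USE_REFS):
    print(problem)
```

The sample separates the homograph Mercury with parenthetical qualifiers; the bare entry term "Mercury" is then flagged because an indexer could not tell which preferred term it should lead to.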

  33. Types of vocabulary control

  34. Lists • A list is a simple group of terms. • Example: Alabama, Alaska, Arkansas, California, Colorado, ... • Frequently used in Web site pick lists and pull-down menus

  35. Synonym Rings A synonym ring is a list of synonyms or near synonyms that are used interchangeably for retrieval purposes

  36. Synonym Rings – Examples • Synonym rings are usually found as sets of lists that allow users to access all content containing any of the terms. • e.g., cholesterol: Cholesterol, Blood Cholesterol, Serum Cholesterol, Good Cholesterol, Bad Cholesterol, LDL, ... • Frequently used in systems where the content is not indexed or the indexing vocabulary is not controlled

  37. An example from International SEMATECH. A search for Silicon would look like this: Your search was submitted as "SILICON" or "SI"

  38. Synonym rings are used: • To expand queries for content objects: any one of the terms retrieves content containing any of the terms in the cluster • With unstructured natural-language formats: the interface draws together similar terms • With search engines: to help control the diversity of the language
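The query expansion described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SEMATECH implementation: the rings are hypothetical sets built from the examples on these slides, each term is assumed to belong to at most one ring, and the upper-cased, OR-joined output only mimics the message shown in the SEMATECH example.

```python
# Hypothetical synonym rings: no member is preferred; all are interchangeable.
RINGS = [
    {"silicon", "si"},
    {"cholesterol", "blood cholesterol", "serum cholesterol",
     "good cholesterol", "bad cholesterol", "ldl"},
]


def build_ring_index(rings):
    """Map every member term (lower-cased) to the ring that contains it.
    Assumes each term appears in at most one ring."""
    return {term.lower(): ring for ring in rings for term in ring}


def expand_query(term, index):
    """Expand a one-term query to every member of its ring, ORed together.
    Terms not found in any ring are searched as entered."""
    own = term.lower()
    ring = index.get(own, {own})
    ordered = [own] + sorted(ring - {own})  # user's term first, then the rest
    return " or ".join(f'"{t.upper()}"' for t in ordered)


INDEX = build_ring_index(RINGS)
print(expand_query("Silicon", INDEX))  # "SILICON" or "SI"
```

Because no ring member is preferred, a search on "SI" expands to the same set of terms as a search on "SILICON"; only the display order changes.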

  39. Taxonomies • A taxonomy is a set of preferred terms, all connected by a hierarchy or polyhierarchy. • Example: Chemistry > Organic chemistry > Polymer chemistry > Nylon • Frequently used in Web navigation systems
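Because every term in a taxonomy sits in a hierarchy, a navigation system can derive breadcrumb trails from parent links alone. The Python sketch below is illustrative only: it assumes the taxonomy is stored as a child-to-parents mapping with no cycles, and it gives Nylon a second, hypothetical parent (Synthetic fibers under Textiles) to show how a polyhierarchy yields more than one path to the same term.

```python
# Hypothetical taxonomy stored as child -> parents; two parents = polyhierarchy.
PARENTS = {
    "Organic chemistry": ["Chemistry"],
    "Polymer chemistry": ["Organic chemistry"],
    "Nylon": ["Polymer chemistry", "Synthetic fibers"],  # second parent is hypothetical
    "Synthetic fibers": ["Textiles"],                    # hypothetical branch
}


def paths_to_root(term, parents):
    """Return every top-down path from a root term to `term`.
    Assumes the hierarchy contains no cycles."""
    ups = parents.get(term, [])
    if not ups:  # no parent: term is a top-level (root) category
        return [[term]]
    return [path + [term] for up in ups for path in paths_to_root(up, parents)]


for path in paths_to_root("Nylon", PARENTS):
    print(" > ".join(path))
# Chemistry > Organic chemistry > Polymer chemistry > Nylon
# Textiles > Synthetic fibers > Nylon
```

With a single-parent (mono-hierarchical) taxonomy, every term has exactly one breadcrumb; the second path appears only because of the polyhierarchical link.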

  40. Thesauri • A thesaurus is a controlled vocabulary with multiple types of relationships. • Example: Rice: UF paddy; BT Cereals; BT Plant products; NT Brown rice; RT Rice straw

  41. Thesauri (cont.) • Relationship types: • Equivalence (USE/Used For) – indicates the preferred term in a synonym relationship • Hierarchy – indicates broader and narrower terms • Associative (Related Term) – almost unlimited types of relationships may be used • The thesaurus is the most complex format for controlled vocabularies, and it is widely used.
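Thesaurus relationships are reciprocal: a UF entry implies a USE reference back to the preferred term, every BT implies an NT in the other direction, and RT is symmetric. The minimal Python sketch below, built on the Rice example, assumes an editor enters each relationship once as a (term, tag, other term) triple and lets the software fill in the reciprocals; the function and variable names are hypothetical.

```python
from collections import defaultdict

# Reciprocal of each relationship tag: equivalence, hierarchy, association.
RECIPROCAL = {"UF": "USE", "USE": "UF", "BT": "NT", "NT": "BT", "RT": "RT"}


def build_thesaurus(statements):
    """statements: (term, tag, other_term) triples, each entered once.
    Returns term -> tag -> sorted list of terms, with reciprocals added."""
    th = defaultdict(lambda: defaultdict(set))
    for term, tag, other in statements:
        th[term][tag].add(other)
        th[other][RECIPROCAL[tag]].add(term)
    return {term: {tag: sorted(v) for tag, v in rels.items()}
            for term, rels in th.items()}


def preferred(term, th):
    """Resolve a non-preferred entry term to its preferred term via USE."""
    use = th.get(term, {}).get("USE")
    return use[0] if use else term


RICE = [
    ("Rice", "UF", "paddy"),
    ("Rice", "BT", "Cereals"),
    ("Rice", "BT", "Plant products"),
    ("Rice", "NT", "Brown rice"),
    ("Rice", "RT", "Rice straw"),
]
TH = build_thesaurus(RICE)
print(TH["Cereals"])            # {'NT': ['Rice']}
print(TH["paddy"])              # {'USE': ['Rice']}
print(preferred("paddy", TH))   # Rice
```

Generating reciprocals from a single entry is a design choice of this sketch; it keeps the two directions of each relationship from drifting apart as the vocabulary is edited.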

  42. Interoperability • One of the most important issues from the 1999 workshop • Question: How to • compare indexes • perform searches • merge databases that have been developed using different controlled vocabularies?

  43. Interoperability (CONT.) • Factors Affecting Interoperability • Multilingual Controlled Vocabularies • Searching • Indexing • Merging Databases • Merging Controlled Vocabularies • Achieving Interoperability • Storage and Maintenance of Relationships among Terms in Multiple Controlled Vocabularies
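One way to achieve interoperability when merging databases is a crosswalk: a table that maps each term of one vocabulary to a term of another, together with how close the match is. The Python sketch below is a hypothetical illustration, not a method prescribed by Z39.19: it borrows the names of the SKOS mapping properties (`exactMatch`, `broadMatch`) as plain labels, and the vocabularies, terms, and function name are invented. It re-indexes a record from vocabulary B into vocabulary A, reporting mappings that lose precision and terms that have no mapping yet.

```python
# Hypothetical crosswalk: vocabulary B term -> (match type, vocabulary A term).
# Match-type names are borrowed from SKOS mapping properties.
CROSSWALK = {
    "Paddy rice": ("exactMatch", "Rice"),
    "Long-grain rice": ("broadMatch", "Rice"),  # A's term is broader than B's
    "Grains": ("exactMatch", "Cereals"),
}


def reindex(record_terms, crosswalk):
    """Translate a record's B terms into A terms for a merged database.

    Returns (A terms, inexact mappings to review, unmapped B terms)."""
    mapped, lossy, unmapped = [], [], []
    for term in record_terms:
        if term not in crosswalk:
            unmapped.append(term)  # needs a new crosswalk entry
            continue
        match, target = crosswalk[term]
        if target not in mapped:  # avoid indexing the same A term twice
            mapped.append(target)
        if match != "exactMatch":  # anything but an exact match needs review
            lossy.append(f"{term} -> {target} ({match})")
    return mapped, lossy, unmapped


print(reindex(["Paddy rice", "Long-grain rice", "Grains", "Irrigation"], CROSSWALK))
# (['Rice', 'Cereals'], ['Long-grain rice -> Rice (broadMatch)'], ['Irrigation'])
```

Keeping the match type alongside each mapping lets a merged database report where precision was lost, rather than silently treating every mapped term as equivalent.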

  44. II. The British Standard BS 8723: Structured Vocabularies for Information Retrieval – Guide • Slides based on the presentation by Stella G. Dextre Clarke, Alan Gilchrist, and Leonard Will at ISKO 2004, London

  45. Existing BSI/ISO thesaurus standards • ISO 2788-1986 Guidelines for the establishment and development of monolingual thesauri = BS 5723:1987 • ISO 5964-1985 Guidelines for the establishment and development of multilingual thesauri = BS 6723:1985

  46. What needs updating? • Printed versus electronic application • Guidance on management software • Interoperability: • Mapping between thesauri and other types of vocabulary • Formats/protocols for data exchange with downstream applications • Applicability to end-user applications, not just those for information professionals

  47. Outline of new standard BS 8723: Structured vocabularies for information retrieval – Guide • Part 1 - Definitions, symbols and abbreviations • Part 2 - Thesauri • Part 3 - Vocabularies other than thesauri • Part 4 - Interoperability between vocabularies • Part 5 - Interoperation between vocabularies and other components of information storage and retrieval systems

  48. Part 3 chapters • Classification schemes • Subject heading lists • Taxonomies • Ontologies • Semantic nets (?) • Search thesauri

  49. Issues for Part 3 • How much guidance is needed on how to build other sorts of vocabulary? • Should we describe the idiosyncrasies of existing schemes, even where we judge there is a ‘better’ way? • Pick out the characteristics of different vocabulary types that govern when and how you can map them. • But some of the observable characteristics might not be what we’d recommend.

  50. Part 4: Interoperability between vocabularies • Huge demand for accessing information indexed with another language and/or vocabulary • 'Mapping': the Semantic Web is just one application • Includes multilingual thesauri, a special case of mapping between vocabularies • Applies where more than one language or vocabulary is in use, and access to all resources is through one vocabulary
