This presentation covers quality in taxonomies, as presented by Dr. Claude Vogel, Founder & CTO, at KM World 2000. It discusses two definitions of quality, "best value for the money" and "nominal conformance," and cites standards such as ISO 2788 and ISO 5964 for thesaurus and taxonomy development. It explores quality factors such as maintainability, flexibility, and usability, and outlines the steps of a taxonomy project: kick-off, requirements review, lexicon review, taxonomy review, tags review, and final review.
Quality Taxonomies Dr. Claude Vogel Founder & CTO KM World 2000
Ontology / Taxonomy • [Diagram labels: Static Discovery, Root Ontology, Taxonomy Generation, Dynamic Discovery]
What is Quality? • “Best value for the money” • By this definition, a costly product is expected to deliver high performance, and a low-cost product or service can be expected to deliver less. For example, a rough demo delivery is both predictable and acceptable, because its quality profile is low conformance at low cost.
What is Quality? • “Good Quality is Nominal Conformance” • Taxonomy Quality is defined as Taxonomy Conformance to: • Valid requirements; • Explicitly documented development standards; and, • Implicit characteristics that are expected of all professionally developed taxonomies, such as the desire for good maintainability.
Standards • ISO 2788-1986 • International Organization for Standardization. Documentation—Guidelines for the Establishment and Development of Monolingual Thesauri. 2nd ed. n.p.: ISO, 1986. (ISO 2788-1986(E)). (Available in the U.S. from American National Standards Institute) • ISO 5964-1985 • International Organization for Standardization. Documentation—Guidelines for the Establishment and Development of Multilingual Thesauri. n.p.: ISO, 1985. (ISO 5964-1985(E)). (Available in the U.S. from American National Standards Institute) • ANSI/NISO Z39.19-1993 • National Information Standards Organization. Guidelines for the Construction, Format, and Management of Monolingual Thesauri. Bethesda, MD: NISO Press, 1994. 69p. (ANSI/NISO Z39.19-1993) • SEMIO Quality Plan v1 2000 • ISO/IEC 13250 Topic Maps • RDF • Please refer to RDF at http://www.w3.org/RDF and XML at http://www.w3.org/XML
Project Plan • Kick-off • Requirements Review • Lexicon Review • Taxonomy Review • Tags Review • Final Review
1. Kick-off • Objectives • Purpose • Scope • Scale • Users • Conditions of receipt • Roles • Supplier • Customer • Admin • KE • Experts • Users • Planning • Training and Transfer
2. Requirements Review • Sources • Lexicon • Ontology • Install
Sources • Dispersion (Multiplicity, Size, Homogeneity) • Refresh • Access
Typical Patterns • Disparity • Adjust sources • Adjust crawl strategy • Isolate communities / taxonomies
Lexicon • Vocabularies, etc. • Substitutions: Acronyms, Synonyms, etc. • Preferred Keywords: Brand Names, etc. • Banned Keywords
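The lexicon requirements above (substitutions, preferred keywords, banned keywords) can be sketched as a small normalization step. This is only an illustration under stated assumptions: the entries in `SUBSTITUTIONS`, `PREFERRED`, and `BANNED` and the function name `normalize_terms` are hypothetical and do not come from the slides.

```python
# Hypothetical lexicon requirements; none of these entries come from the slides.
SUBSTITUTIONS = {"a/c": "air conditioning", "st loan": "short-term loan"}
PREFERRED = {"short-term loan"}
BANNED = {"misc", "other"}


def normalize_terms(terms):
    """Apply substitutions, drop banned keywords and duplicates, list preferred terms first."""
    cleaned = []
    for term in terms:
        key = term.strip().lower()
        key = SUBSTITUTIONS.get(key, key)  # expand acronyms / synonyms
        if key in BANNED or key in cleaned:
            continue
        cleaned.append(key)
    # Stable sort: preferred keywords (key False) come before the others (key True).
    return sorted(cleaned, key=lambda t: t not in PREFERRED)


result = normalize_terms(["A/C", "Misc", "ST Loan", "brakes", "a/c"])
print(result)  # ['short-term loan', 'air conditioning', 'brakes']
```

Keeping the three lists as plain data makes them easy to review with the customer during the requirements review.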
Typical Patterns • Lack of requirements • Use Librarian Resources
Ontology • Thesaurus? • Is the information domain analysis complete, consistent, and accurate? • Is the partitioning of the problem complete?
Typical Patterns • Directory versus Taxonomy • Isolate “directory” branches • Thesaurus versus Taxonomy • Put an ontology on top of thesaurus • Check ASAP match of thesaurus generics with extracted lexicon • Very high level design for top categories requirements • Plan to work bottom-up • See also Taxonomy (functions, combinations, etc.)
Install • Implementation / Integration: • Are external and internal interfaces properly defined? • Are all requirements traceable to the system level? • Has prototyping been conducted for the user/customer? • Is performance achievable within the constraints imposed by other system elements? • Are requirements consistent with schedule, resources, and budget?
Typical Patterns • Scale • Security • Missing Documents
3. Lexicon Review • Coverage • Extracted words / Words • (Extracted Index / Index) • Sources benchmarking • Coverage • Extraction quality • Topic distribution • Structure • Most Frequent Phrases • Most Productive Generics • Substitutions • Exceptions
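The "Extracted words / Words" ratio can be sketched as follows. The slide does not say whether the counts are over distinct words or over all occurrences, so this sketch assumes distinct, case-folded words; the function name `lexicon_coverage` and the sample data are hypothetical.

```python
def lexicon_coverage(extracted_terms, corpus_tokens):
    """Share of distinct corpus words that also appear in the extracted lexicon."""
    vocabulary = {token.lower() for token in corpus_tokens}
    if not vocabulary:
        return 0.0
    extracted = {term.lower() for term in extracted_terms}
    return len(extracted & vocabulary) / len(vocabulary)


# Toy corpus and lexicon, for illustration only.
corpus = "the term loan is a liability and the tax is a liability".split()
lexicon = ["loan", "liability", "tax"]
coverage = lexicon_coverage(lexicon, corpus)
print(coverage)  # 0.375 (3 of 8 distinct words)
```

The same ratio can be computed per source to compare sources during benchmarking.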
Typical Patterns • Low level of frequency / quality for the most meaningful content • Increase size of value corpus • Filter and re-import lexicon
4. Taxonomy Review • Taxonomy Operation • Correctness • Reliability • Usability • Integrity • Efficiency • Taxonomy Revision • Maintainability • Flexibility • Testability • Taxonomy Transition • Portability • Reusability • Interoperability
Folk Taxonomies Design • The Berlin and Kay model: Taxonomy = Nomenclature + Terminology • Ranks: Unique Beginner, Life Form, Generic, Specific, Varietal • Example terms: Tax, Liability, Loan, Term loan, Short-term loan
Correctness • Accuracy • Completeness • Consistency
Accuracy • Precision • Recall
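Precision and recall for a single category can be computed from the set of documents the taxonomy assigned to it and the set a reviewer expected. The sketch below uses the standard definitions; the document identifiers and the function name `precision_recall` are hypothetical.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for one category, given two collections of document ids."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Documents the taxonomy filed under a category vs. documents a reviewer expected there.
assigned = {"doc1", "doc2", "doc3", "doc4"}
expected = {"doc2", "doc3", "doc5"}
precision, recall = precision_recall(assigned, expected)
print(precision, recall)
```

Here two of the four assigned documents are correct (precision 0.5), and two of the three expected documents were found (recall 2/3).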
Completeness • [Diagram labels: Taxonomy, Maps, Lexicon, Collection]
Concentration Works Against Quality • Tagging Coverage • Ontology Coverage • Hook Coverage • Map Coverage • Lexical Coverage • Collection Coverage • [Diagram labels: Tagging, Taxonomy, Maps, Lexicon, Document Collection]
Consistency: Typical Patterns • Objectivization • Hyperonymy • Speciation • Necessity
Objectivization • Example: Employment (Firing, Hiring, Salaries) • Avoid functional categories • Don’t mix functions / objects • Exhaust scripts • Match idiomatic phrases
Genericity • Example: Parts (Air Conditioning, Belts and Hoses, Body, Brake System, Chassis, Engine, Exhaust System, Fuel System, Glass, Ignition) • Avoid meronymy • Don’t mix meronymy / hyperonymy • Exhaust prototypes
Speciation (WordNet) • Example chain: Person > Unwelcome person > Unpleasant person > Selfish person > Opportunist > Backscratcher • Avoid “strings” of categories • Avoid (non-idioms) properties for categories
Necessity • Avoid non-productive categories • Avoid combinations of categories
Nomenclature (Design Structure) Quality Index • Balance • Depth • Width • [Tree diagram: nodes labeled UB, lf, g, s, v arranged from Level 0 to Level 4] • Legend: UB = unique beginner, lf = life-form, g = generic, s = specific, v = varietal
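Depth, width, and balance can be measured on a taxonomy stored as a parent-to-children mapping. The slide does not define how balance is scored, so this sketch assumes one possible reading: the gap between the deepest and the shallowest leaf. The toy tree reuses the legend's abbreviations, but its shape and the function names are hypothetical.

```python
# Toy taxonomy as a parent -> children mapping (illustrative shape only).
TAXONOMY = {
    "UB": ["lf1", "lf2"],
    "lf1": ["g1"],
    "g1": ["s1"],
    "s1": ["v1"],
    "lf2": ["g2", "g3"],
}


def leaf_depths(tree, root):
    """Return the level of every leaf, counting the root as level 0."""
    depths, stack = [], [(root, 0)]
    while stack:
        node, level = stack.pop()
        children = tree.get(node, [])
        if not children:
            depths.append(level)
        stack.extend((child, level + 1) for child in children)
    return depths


def width_by_level(tree, root):
    """Return the number of nodes at each level, starting with level 0."""
    widths, frontier = [], [root]
    while frontier:
        widths.append(len(frontier))
        frontier = [child for node in frontier for child in tree.get(node, [])]
    return widths


depths = leaf_depths(TAXONOMY, "UB")
depth = max(depths)                      # deepest level reached
balance_gap = max(depths) - min(depths)  # assumed balance measure: 0 means perfectly balanced
widths = width_by_level(TAXONOMY, "UB")
print(depth, balance_gap, widths)  # 4 2 [1, 2, 3, 1, 1]
```

A large gap points to branches that were developed much further than their siblings.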
Complexity Index • Cyclomatic complexity increases with the number of cross-references within the taxonomy, giving an indication of its complexity and of the difficulty of testing it. • Taxonomy Complexity Index combines: • autonomy • closure • similarity • typicality • commonality • redundancy • stability
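Why cross-references raise the complexity can be seen with McCabe's cyclomatic number, V(G) = E - N + 2P, which comes from software control-flow graphs. Applying it to a taxonomy graph is an assumption about what the slide intends, and the function name and sample data below are hypothetical. In a pure tree E = N - 1, so V(G) = 1, and each cross-reference adds one.

```python
def cyclomatic_complexity(nodes, hierarchy_edges, cross_references, components=1):
    """McCabe's V(G) = E - N + 2P, treating cross-references as extra edges."""
    edges = len(hierarchy_edges) + len(cross_references)
    return edges - len(nodes) + 2 * components


# Toy taxonomy: a five-node tree plus two cross-references (illustrative only).
nodes = ["Tax", "Liability", "Loan", "Term loan", "Short-term loan"]
hierarchy = [("Tax", "Liability"), ("Liability", "Loan"),
             ("Loan", "Term loan"), ("Term loan", "Short-term loan")]
cross_refs = [("Short-term loan", "Liability"), ("Term loan", "Tax")]

tree_only = cyclomatic_complexity(nodes, hierarchy, [])
with_refs = cyclomatic_complexity(nodes, hierarchy, cross_refs)
print(tree_only, with_refs)  # 1 3
```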
Maturity index • IEEE Std 982.1-1988 suggests a software maturity index; applied to a taxonomy, it provides an indication of the stability of the taxonomy. • Maturity Index combines: • number of modules in the current ontology / taxonomy. • number of modules in the current ontology / taxonomy that have been changed. • number of modules added to the current ontology / taxonomy. • number of modules deleted from the previous version of the ontology / taxonomy.
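The four counts listed above combine in the commonly cited form of the software maturity index, SMI = (M_T - (F_a + F_c + F_d)) / M_T, where M_T is the number of modules in the current release and F_a, F_c, F_d are the modules added, changed, and deleted. Treating taxonomy branches as "modules" is an assumption of this sketch, and the counts in the example are made up.

```python
def maturity_index(total_modules, changed, added, deleted):
    """SMI = (M_T - (F_a + F_c + F_d)) / M_T; values near 1.0 suggest a stable taxonomy."""
    if total_modules <= 0:
        raise ValueError("total_modules must be positive")
    return (total_modules - (added + changed + deleted)) / total_modules


# Hypothetical revision: 200 branches, of which 20 changed, 10 added, 5 deleted.
smi = maturity_index(total_modules=200, changed=20, added=10, deleted=5)
print(smi)  # 0.825
```

Tracking the index across revisions shows whether the taxonomy is converging toward a stable state.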
5. Tags Review • Document coverage • Concepts coverage
<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag>
      <tagname>Liability</tagname>
      <weight>1.289</weight>
    </tag>
    <tag>
      <tagname>Federal Funds</tagname>
      <weight>0.746</weight>
    </tag>
  </document>
</tagset>
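Document coverage and concept coverage can be estimated from a tagset file like the one on the Tags Review slide. The sketch below parses that XML with Python's standard `xml.etree.ElementTree`; the `collection` and `concepts` lists, the second URL, and the coverage definitions (tagged documents / all documents, used concepts / all concepts) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Tagset from the Tags Review slide.
TAGSET_XML = """
<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag><tagname>Liability</tagname><weight>1.289</weight></tag>
    <tag><tagname>Federal Funds</tagname><weight>0.746</weight></tag>
  </document>
</tagset>
"""


def read_tags(xml_text):
    """Map each document URL to its {tag name: weight} dictionary."""
    root = ET.fromstring(xml_text.strip())
    tags = {}
    for doc in root.findall("document"):
        url = doc.findtext("docurl")
        tags[url] = {tag.findtext("tagname"): float(tag.findtext("weight"))
                     for tag in doc.findall("tag")}
    return tags


# Hypothetical collection and taxonomy concepts, for illustration only.
collection = ["http://www.TaxSource.com", "http://example.com/untagged"]
concepts = ["Liability", "Federal Funds", "Loan", "Term loan"]

tags = read_tags(TAGSET_XML)
document_coverage = sum(1 for url in collection if tags.get(url)) / len(collection)
used = {name for doc_tags in tags.values() for name in doc_tags}
concept_coverage = len(used & set(concepts)) / len(concepts)
print(document_coverage, concept_coverage)  # 0.5 0.5
```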
6. Final Review • Receipt • Maintenance
Quality Taxonomies Claude Vogel cvogel@semio.com KM World 2000