
Ontology Best Practices

Presentation Transcript


  1. Ontology Best Practices: A Software Developer’s View

  2. Software Development Today… • Many different software languages and programs • Wide variety of domains and architectures • Each software language has a single syntax • Children are writing software programs • Software tools range from simplistic to very powerful • Successful software programs designed by lead architect • Support tools for software development are plentiful • Software development has existed for a few decades

  3. Ontologies in the Semantic Web Today… • Many different ontologies, no two alike • Different syntaxes (Turtle, RDF/XML, etc.) • Ontology nuances are poorly understood, even by “experts” • Very limited inferencing • Inferencing mostly limited to syllogisms • Usually requires an ontology specialist and a domain expert • Frequently designed by committee • The concept of ontologies has been around for hundreds of years
    Why can’t we get it right?

  4. Similarities between Ontologies and Software • Ontologies define classes of things, e.g., Person • Ontologies define properties associated with the classes of things • Ontologies allow for Individuals – members of classes with particular property values • Classes inherit information from parent classes • (Object Oriented) Programming defines classes of things • Programming defines properties and methods associated with each class • Programming uses instances of classes with particular property values • Classes inherit information (data and methods) from super-classes

  5. Ontology Example (RDF/XML syntax)
    <owl:Class rdf:about="#Opera">
      <rdfs:subClassOf rdf:resource="#MusicDrama"/>
    </owl:Class>
    <owl:ObjectProperty rdf:ID="hasComposer">
      <rdfs:domain rdf:resource="#MusicDrama"/>
      <rdfs:range rdf:resource="#Composer"/>
    </owl:ObjectProperty>
    <owl:DatatypeProperty rdf:ID="numberOfActs">
      <rdf:type rdf:resource="&owl;FunctionalProperty"/>
      <rdfs:domain rdf:resource="#MusicDrama"/>
      <rdfs:range rdf:resource="&xsd;positiveInteger"/>
    </owl:DatatypeProperty>
    …
    <Opera rdf:ID="Tosca">
      <hasComposer rdf:resource="#Giacomo_Puccini"/>
      <hasLibrettist rdf:resource="#Victorien_Sardou"/>
      <hasLibrettist rdf:resource="#Giuseppe_Giacosa"/>
      <hasLibrettist rdf:resource="#Luigi_Illica"/>
      <premiereDate rdf:datatype="&xsd;date">1900-01-14</premiereDate>
      <premierePlace rdf:resource="#Roma"/>
      <numberOfActs rdf:datatype="&xsd;positiveInteger">3</numberOfActs>
    </Opera>

  6. Software Example (Java)
    import java.util.*; // Set, HashSet, Date

    public class Opera extends MusicDrama {
        private Composer composer;                                        // single-valued
        private Set<Librettist> librettistSet = new HashSet<Librettist>(); // multi-valued
        private Date premiereDate;
        private City premierePlace;
        private int numberOfActs;

        public void setComposer(Composer theComposer) {
            composer = theComposer;
        }

        public void addLibrettist(Librettist librettist) {
            librettistSet.add(librettist);
        }

        public void setNumberOfActs(int numActs) {
            // Enforces the same constraint as the xsd:positiveInteger range
            if (numActs < 1) {
                throw new IllegalArgumentException("Number of Acts must be positive");
            }
            numberOfActs = numActs;
        }
        …
    }

  7. Challenge 1: Ontology Syntax & Language • As of OWL2, there are 5 different syntaxes for OWL • 3 different OWL levels (Lite, DL, Full), plus OWL2 • Ontologies can be specified in any level, any syntax • But tools are expected to handle them all • Conversely, there is one Java language and syntax, one JavaScript syntax, etc. • Some earlier languages, e.g., Fortran, had different versions, but normalized to a single one • Java compilers and tools are not expected to handle C programs or JavaScript programs
    Reduce the number of languages and levels. Normalize on a common language. Language standardization leads to wider adoption.

  8. Challenge 2: Ontology Inclusion
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xmlns:dc="http://dublincore.org/documents/dcmi-namespace/"
        xmlns:foaf="http://xmlns.com/foaf/0.1/">
    • Namespace reference NOT the same as importing • To import another ontology, use <owl:imports …> • Further confusion: Namespace URLs do not necessarily indicate the location of the ontology definition • DublinCore (dc) links to a Namespace Policy Document • FOAF links to a Vocabulary Specification • No well-developed system or suite of upper-level ontologies • Cannot easily find a basic ontology in a particular domain • Lots of reinventing of the wheel… • Increases cost of implementation • Barrier to entry • Limits rate of adoption

  9. Challenge 2: Ontology Inclusion, continued • Conversely, importing references and definitions is simple in programming languages:
    package ontology.example;
    import java.text.SimpleDateFormat;
    import java.util.*;
    • Package declaration defines the location of the defined class; imports define the path to imported class definitions • Repositories for common libraries enable reuse (Apache, SourceForge, etc.) • Could be better organized
    Make importing consistent and easier. Use the path to the ontology definition as the namespace URL. Create repositories for upper-level ontologies.
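The namespace point can be made concrete in Java: a prefix declaration such as xmlns:foaf only abbreviates IRIs; it does not fetch or import anything. A minimal sketch, using a hypothetical `PrefixMap` class (not part of any RDF library), that expands a prefixed name by plain string concatenation:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: expands prefixed names such as foaf:Person into full IRIs. */
final class PrefixMap {
    private final Map<String, String> prefixes = new HashMap<>();

    /** Records a prefix, like xmlns:foaf="..."; nothing is fetched or imported. */
    void declare(String prefix, String namespaceIri) {
        prefixes.put(prefix, namespaceIri);
    }

    /** Expands "prefix:local" by string concatenation: the namespace is only a naming prefix. */
    String expand(String prefixedName) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed name: " + prefixedName);
        }
        String namespace = prefixes.get(prefixedName.substring(0, colon));
        if (namespace == null) {
            throw new IllegalArgumentException("Undeclared prefix: " + prefixedName);
        }
        return namespace + prefixedName.substring(colon + 1);
    }

    public static void main(String[] args) {
        PrefixMap map = new PrefixMap();
        map.declare("foaf", "http://xmlns.com/foaf/0.1/");
        System.out.println(map.expand("foaf:Person")); // http://xmlns.com/foaf/0.1/Person
    }
}
```

Nothing in this step tells a tool where the FOAF definitions live or loads them; in OWL that takes a separate owl:imports statement, whereas a Java import both names the class and leads the compiler to its definition.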

  10. Challenge 3: Poor Documentation • OWL standard stretched across many documents • Written by/for the standards committee • W3C documents more focused on syntactic correctness than usefulness • Confusing and intimidating to novices • No consistent examples • Difficult to find OWL2 examples of Property Chain Inclusion in RDF/XML syntax • Multiple techniques for defining the same thing, even within a particular syntax • Leaves developers wondering which one is best • Steepens the learning curve • Affects adoption rate

  11. Challenge 3: Java Documentation Examples

  12. Challenge 3: Poor Documentation, continued • Conversely, software development has huge support • Java community: many online tutorials (particularly at the official Java site), examples, etc. • Finding help is easy • Well-defined documentation standards • Consistent style, recommendations • Documentation at multiple levels: • Online tutorials • Code and library documentation (e.g., Javadoc) • Freeware as examples (or solutions)
    Documentation and examples are severely needed.

  13. Challenge 4: Unfriendly Naming Conventions & Style • Some ontologies use incomprehensible class names, property names, or individuals • mesh:A01.378.800.667.430.705 – what is it? • Hinders reuse by related ontologies • Prevents adoption of ontology • Many different ways to define/declare information about a Class • Many different ways to organize contents of ontology • Classes then Properties • Properties related to classes near class definitions

  14. Challenge 4: Unfriendly Naming Conventions & Style • Some ontologies use incomprehensible class names, property names, or individuals • mesh:A01.378.800.667.430.705 – what is it? • Answer: mesh:Thumb • Hinders reuse by related ontologies • Prevents adoption of ontology • Many different ways to define/declare information about a Class • Many different ways to organize contents of ontology • Classes then Properties • Properties related to classes near class definitions

  15. Challenge 4: Unfriendly Naming Conventions & Style, continued • Conversely, software developers usually follow style guides • Define naming conventions for classes (nouns, camel case with initial capital) and methods (verb phrases, camel case, initial lowercase) • Classes, methods, and variables should be named to indicate what they are • Established style elements • e.g., the preferred way to use an iterator • Possibly reinforced by code sharing • Posting code publicly for help debugging it • Common code organizational style
    Class & Property names should indicate purpose.
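Naming rules like these can be checked mechanically. A minimal sketch of a hypothetical lint helper, `NamingCheck`, that applies the Java conventions above to ontology local names (UpperCamelCase for classes, lowerCamelCase for properties); the rules are an assumption borrowed from Java style, not an OWL requirement:

```java
import java.util.regex.Pattern;

/** Hypothetical lint rules borrowed from Java naming conventions. */
final class NamingCheck {
    // Classes: initial capital, then letters/digits only (UpperCamelCase), e.g. MusicDrama
    private static final Pattern CLASS_NAME = Pattern.compile("[A-Z][A-Za-z0-9]*");
    // Properties: initial lowercase, then letters/digits only (lowerCamelCase), e.g. hasComposer
    private static final Pattern PROPERTY_NAME = Pattern.compile("[a-z][A-Za-z0-9]*");

    static boolean isGoodClassName(String localName) {
        return CLASS_NAME.matcher(localName).matches();
    }

    static boolean isGoodPropertyName(String localName) {
        return PROPERTY_NAME.matcher(localName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isGoodClassName("Thumb"));                   // true
        System.out.println(isGoodClassName("A01.378.800.667.430.705")); // false
        System.out.println(isGoodPropertyName("hasComposer"));          // true
    }
}
```

A pattern check only catches opaque identifiers such as the MeSH code; it cannot tell whether a well-formed name actually indicates purpose, so it complements rather than replaces a style guide.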

  16. Challenge 5: Awkward/Limited Properties • Enumerated Datatypes
    <owl:DatatypeProperty rdf:ID="tennisGameScore">
      <rdfs:range>
        <owl:DataRange>
          <owl:oneOf>
            <rdf:List>
              <rdf:first rdf:datatype="&xsd;integer">0</rdf:first>
              <rdf:rest>
                <rdf:List>
                  <rdf:first rdf:datatype="&xsd;integer">15</rdf:first>
                  <rdf:rest>
                    <rdf:List>
                      <rdf:first rdf:datatype="&xsd;integer">30</rdf:first>
                      …
    • Single Value Properties • Functional Property… or…
    <owl:Class rdf:ID="Operetta">
      <rdfs:subClassOf rdf:resource="#MusicalWork"/>
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:onProperty rdf:resource="#hasLibrettist"/>
          <owl:minCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:minCardinality>
        </owl:Restriction>
      …
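For comparison with the RDF/XML above, here is how the same two needs are commonly met in Java: an enum for an enumerated datatype, and a plain field versus a collection for single- versus multi-valued properties. A sketch only; `TennisGameScore` and `Operetta` are hypothetical, and the enum assumes the truncated list continues with 40, as in standard tennis scoring:

```java
import java.util.ArrayList;
import java.util.List;

/** The enumerated datatype as a Java enum: only these four values can exist. */
enum TennisGameScore {
    LOVE(0), FIFTEEN(15), THIRTY(30), FORTY(40);

    private final int points;

    TennisGameScore(int points) {
        this.points = points;
    }

    int points() {
        return points;
    }

    /** Maps an integer back to a score, rejecting anything outside the enumeration. */
    static TennisGameScore fromPoints(int points) {
        for (TennisGameScore s : values()) {
            if (s.points == points) {
                return s;
            }
        }
        throw new IllegalArgumentException("Not a tennis game score: " + points);
    }
}

/** Hypothetical class: a plain field is single-valued; a List is multi-valued. */
final class Operetta {
    private String composer;                                    // at most one value
    private final List<String> librettists = new ArrayList<>(); // any number of values

    void setComposer(String composer) { this.composer = composer; } // replaces, never adds
    void addLibrettist(String librettist) { librettists.add(librettist); }
    String getComposer() { return composer; }
    List<String> getLibrettists() { return librettists; }
}
```

In Java the single-versus-multi decision is made once, by the field's type, rather than through a separate property characteristic or cardinality restriction.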

  17. Challenge 5: Awkward/Limited Properties, continued • No way to specify a range of values on properties • Defining quantities with units
    <Measurement>
      <observedSubject rdf:resource="#JaneDoe"/>
      <observedPhenomenon rdf:resource="#Weight"/>
      <observedValue>
        <Quantity>
          <quantityValue rdf:datatype="&xsd;float">59.5</quantityValue>
          <quantityUnit rdf:resource="#Kilogram"/>
        </Quantity>
      </observedValue>
      <timeStamp rdf:datatype="&xsd;dateTime">2003-01-24T09:00:08+01:00</timeStamp>
    </Measurement>
    • No way to have values that vary over time • Person hasLocation ??? • Properties of properties would solve this • But it could create as many messes as it solves…
    Make Properties more useful.
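In Java, both of these are routine. A minimal sketch with hypothetical classes: `Quantity` keeps a value and its unit together, and `LocationHistory` answers "where was this person at time t?" by storing each value with the instant it took effect. Units are plain strings here as a simplifying assumption:

```java
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical value type: a number is never stored without its unit. */
final class Quantity {
    final double value;
    final String unit; // e.g. "Kilogram"; a real system might use an enum or a units library

    Quantity(double value, String unit) {
        this.value = value;
        this.unit = unit;
    }
}

/** Hypothetical time-varying property: each location is keyed by the instant it took effect. */
final class LocationHistory {
    private final TreeMap<Instant, String> byTime = new TreeMap<>();

    void record(Instant since, String location) {
        byTime.put(since, location);
    }

    /** Returns the location in effect at the given instant, or null if none was recorded yet. */
    String locationAt(Instant when) {
        Map.Entry<Instant, String> entry = byTime.floorEntry(when); // latest key <= when
        return entry == null ? null : entry.getValue();
    }
}
```

The sorted map does the work: `floorEntry` finds the most recent change at or before the requested time, so the property has a well-defined value at every instant after the first recording.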

  18. Challenge 6: Open-World Assumption A curious thing about the ontological problem is its simplicity. It can be put in three Anglo-Saxon monosyllables: 'What is there?' It can be answered, moreover, in a word--'Everything.' - Willard Van Orman Quine • OWL uses the Open World Assumption (OWA): a fact that cannot be inferred is unknown, not false • Individuals can potentially belong to multiple classes, even those which should be distinct, unless the classes are explicitly declared disjoint • Contrary to normal human thought processes • “A Person cannot be a Car too!” • Implementing distinct classes or property restrictions can be computationally expensive • Declaring N classes mutually distinct is an O(n²) operation • Adding a property restriction on a class creates more (anonymous) classes
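The O(n²) cost follows from counting pairs: making n classes mutually distinct constrains every unordered pair, which is n(n-1)/2 pairs. The sketch below, with a hypothetical `Disjointness` helper, simply enumerates those pairs; each one is a distinctness constraint a reasoner has to respect, whether it is written as a separate owl:disjointWith statement or grouped into a single OWL2 owl:AllDisjointClasses axiom:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: declaring n classes mutually distinct constrains every pair of them. */
final class Disjointness {
    /** Generates each unordered pair once: n(n-1)/2 pairs, which grows as O(n^2). */
    static List<String[]> disjointPairs(List<String> classes) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < classes.size(); i++) {
            for (int j = i + 1; j < classes.size(); j++) {
                pairs.add(new String[] {classes.get(i), classes.get(j)});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> classes = List.of("Person", "Car", "City", "Composer");
        System.out.println(disjointPairs(classes).size()); // 4 * 3 / 2 = 6
    }
}
```

Doubling the number of distinct classes therefore roughly quadruples the number of pairwise constraints.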

  19. Challenge 6: Open-World Assumption, continued • OWA is reasonable for some domains, but not others • Causes severe challenges in inferencing • Some software products allow turning off OWA • But this requires the developer to know, implicitly, which assumption (OWA or not) the ontology was built for • Conversely, software programs define individuals explicitly as members of particular classes • Cannot be a member of a different class (except superclasses) • Properties (fields) have default values • Each is explicitly either single- or multi-valued
    Support exclusive classes and default values. An ontology could specify its OWA/CWA handling.
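The difference between the two assumptions shows up as a three-valued answer. A minimal sketch, assuming a hypothetical `WorldAssumption` fact base that records only asserted class memberships and declared disjointness (no subclass reasoning), with a flag to choose the closed-world behavior:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical tiny fact base contrasting open- and closed-world answers. */
final class WorldAssumption {
    enum Answer { TRUE, FALSE, UNKNOWN }

    private final Map<String, Set<String>> typesOf = new HashMap<>();      // asserted memberships
    private final Map<String, Set<String>> disjointWith = new HashMap<>(); // declared disjointness

    void assertType(String individual, String cls) {
        typesOf.computeIfAbsent(individual, k -> new HashSet<>()).add(cls);
    }

    void declareDisjoint(String a, String b) {
        disjointWith.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        disjointWith.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Is the individual a member of cls? closedWorld selects CWA instead of OWA. */
    Answer isA(String individual, String cls, boolean closedWorld) {
        Set<String> known = typesOf.getOrDefault(individual, Set.of());
        if (known.contains(cls)) {
            return Answer.TRUE;
        }
        // Membership in a class declared disjoint with cls rules cls out under either assumption.
        for (String t : known) {
            if (disjointWith.getOrDefault(t, Set.of()).contains(cls)) {
                return Answer.FALSE;
            }
        }
        // Otherwise CWA treats "not stated" as false, while OWA leaves it undetermined.
        return closedWorld ? Answer.FALSE : Answer.UNKNOWN;
    }
}
```

With only Jane asserted as a Person, asking whether Jane is a Car yields UNKNOWN under OWA and FALSE under CWA; declaring Person and Car disjoint makes the answer FALSE under both.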

  20. Challenge 7: The Data Challenge • Ontologies allow for Classes, Properties, Things • Confusion over what is an Individual vs. a Class • OWL-Full specifically promotes this confusion! • Some ontologies include a large number of Individuals in their OWL files • But we’re building for the Semantic WEB • No one wants to wait 10 minutes to download/access an ontology! • Many ontologies separate the Assertions (ABox) from the Terminology (TBox) • ABox = individuals, TBox = classes, properties • But this isn’t part of the OWL standard!

  21. Challenge 7: The Data Challenge, continued • Software programs define classes, but create instances of those classes at run-time • Data typically stored outside of software programs, accessed at run-time • Model-View-Controller software pattern separates the Model (the data) from the Controller (the logic manipulating the data) and the View (the presentation of the information) • Databases: separate data (tables) from data model (schema)
    Separate Individuals from Classes and Properties. Make Class/Property definitions web-accessible. Don’t put Individuals in the web-accessible OWL file. Access Individuals through SPARQL.
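The separation recommended above mirrors a database's schema/data split. A minimal sketch, assuming hypothetical `TBox` and `ABox` classes, single inheritance, an acyclic hierarchy, and one asserted class per individual; in a deployed system the A-Box would live in a triple store queried through SPARQL rather than in an in-memory map:

```java
import java.util.*;

/** Hypothetical T-Box: the small, shareable schema (here, only subclass links). */
final class TBox {
    private final Map<String, String> superclassOf = new HashMap<>();

    void subClassOf(String sub, String sup) {
        superclassOf.put(sub, sup);
    }

    /** True if cls equals target or reaches it by following superclass links (assumes no cycles). */
    boolean isSubClassOf(String cls, String target) {
        for (String c = cls; c != null; c = superclassOf.get(c)) {
            if (c.equals(target)) {
                return true;
            }
        }
        return false;
    }
}

/** Hypothetical A-Box: individuals and their asserted classes, stored separately. */
final class ABox {
    private final Map<String, String> typeOf = new LinkedHashMap<>();

    void add(String individual, String cls) {
        typeOf.put(individual, cls);
    }

    /** Answers "all instances of cls" by consulting the separately loaded T-Box. */
    List<String> instancesOf(String cls, TBox schema) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : typeOf.entrySet()) {
            if (schema.isSubClassOf(e.getValue(), cls)) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Only the `TBox` needs to be published as a small web document; the `ABox` can grow without making the schema any slower to download.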

  22. Challenge 8: Inconsistent Reasoners • Different Reasoner implementations yield different results on same ontology • OWA/CWA • Different rule implementations • Performance optimization • Causes ontologies to be dependent on choice of reasoner • If you do not use the same reasoner as the ontology developer, you may not get expected results • Help forums filled with “try this reasoner instead…” • No way for an ontology to specify its reasoner requirements

  23. Challenge 8: Inconsistent Reasoners, continued • In software development, when two different compilers or software versions give different answers, this is a BUG! • Software testers develop test suites to verify proper functionality • Testing typically evaluates as many aspects of the software program as possible
    Need consistent reasoners: validation suites, documented expected behavior. Need a mechanism that allows ontologies to define their reasoner requirements. Again, lack of standardization will limit adoption.
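A reasoner conformance suite can work like a unit-test suite: fix the inputs, record the expected entailments, and report any reasoner that disagrees. A minimal sketch with a hypothetical `Reasoner` interface reduced to a single question (which superclasses does a class have?) and a toy transitive implementation that assumes an acyclic, single-parent hierarchy; real suites would cover far more constructs:

```java
import java.util.*;

/** Hypothetical minimal reasoner interface: which named superclasses does cls entail? */
interface Reasoner {
    Set<String> superclassesOf(String cls);
}

/** Sketch of a conformance suite: each case records the entailments every reasoner must produce. */
final class ConformanceSuite {
    /** Returns descriptions of failed cases; an empty list means the reasoner conforms. */
    static List<String> run(Reasoner reasoner, Map<String, Set<String>> expected) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Set<String>> c : expected.entrySet()) {
            Set<String> actual = reasoner.superclassesOf(c.getKey());
            if (!actual.equals(c.getValue())) {
                failures.add(c.getKey() + ": expected " + c.getValue() + " but got " + actual);
            }
        }
        return failures;
    }
}

/** A toy reasoner: transitive closure over asserted subclass links. */
final class TransitiveReasoner implements Reasoner {
    private final Map<String, String> parent;

    TransitiveReasoner(Map<String, String> parent) {
        this.parent = parent;
    }

    @Override
    public Set<String> superclassesOf(String cls) {
        Set<String> result = new TreeSet<>();
        for (String p = parent.get(cls); p != null; p = parent.get(p)) {
            result.add(p);
        }
        return result;
    }
}
```

A reasoner that returned only direct parents would fail a case such as "a Truck is a Vehicle, and therefore also a Thing", which is exactly the kind of disagreement the slide calls a bug.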

  24. Challenge 9: Ontology Development Tools • Many different ontology development tools • Different tools support different syntaxes, ontology levels • Ontology editors mostly from commercial vendors • Protégé offers an open-source ontology editor • Tools mostly stand-alone applications • NeOn Toolkit and TopBraid Composer are built on Eclipse, but as standalone applications • Tool installation frequently challenging • Validation tools lacking • Need to validate whether an ontology is well-constructed

  25. Ontology Tools: Protégé 4, TopBraid Composer, Xturtle

  26. Challenge 9: Ontology Development Tools, continued • Software developers have largely migrated to development platforms – particularly Eclipse • Easy to install, with automated updates • Automatically compiles and validates code as you edit • Freeware • Eclipse also can include plugins for editing XML, HTML, connecting to databases, configuring web servers, and much more • Why do OWL editors stand apart?
    Integrate ontology editing with other editors. This would link ontology work to other development tasks.

  27. Challenge 10: Inferencing and Reasoning • Ontology inferencing very limited • OWL supports class-superclass inference: “if X is a truck then X is a vehicle” • OWL2 supports object property chains • Limited: only allows particular kinds of chains and inferences • Can do: Bob hasSister Jane, therefore Bob hasSibling Jane (hasSister is a subproperty of hasSibling) • Cannot do: Bob hasSibling Jane, AND Jane hasGender Female, therefore Bob hasSister Jane (property intersection) – hasGender is a DatatypeProperty • Cannot do: Jane is Bob’s sister, therefore Jane hasGender Female (mixing Datatype and Object properties) • No boolean operations, comparators • Cannot do: Jane hasAge 12; if X hasAge < 18, then X isA Child; therefore Jane isA Child
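By contrast, the "Cannot do" rules on this slide are ordinary code in Java. A minimal sketch, using a hypothetical `PersonRules` class with a string-valued gender and a nullable age (both assumptions made for illustration); the rules are plain Java, not OWL:

```java
import java.util.*;

/** Hypothetical facts about people, plus two rules expressed directly in Java. */
final class PersonRules {
    static final class Person {
        final String name;
        final String gender; // null when unknown
        final Integer age;   // null when unknown
        final Set<Person> siblings = new LinkedHashSet<>();

        Person(String name, String gender, Integer age) {
            this.name = name;
            this.gender = gender;
            this.age = age;
        }
    }

    /** "X hasSibling Y and Y hasGender Female, therefore X hasSister Y" (property intersection). */
    static List<String> sistersOf(Person p) {
        List<String> sisters = new ArrayList<>();
        for (Person s : p.siblings) {
            if ("Female".equals(s.gender)) {
                sisters.add(s.name);
            }
        }
        return sisters;
    }

    /** "If X hasAge < 18, then X isA Child" (a comparator rule); unknown age yields false. */
    static boolean isChild(Person p) {
        return p.age != null && p.age < 18;
    }
}
```

For example, if Bob's siblings include Jane with gender "Female" and age 12, `sistersOf(bob)` returns `[Jane]` and `isChild(jane)` returns true.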

  28. Challenge 10: Inferencing and Reasoning, continued • Performance challenges with more complex ontologies, rule sets • Challenges of forward & backward reasoning • OWL-Full nearly impossible to bound • Conversely, software programs have an unlimited set of ways to enhance information • Only bound by algorithm complexity, designer’s creativity • Software programs automatically support superclass relations
    Support a wider range of inference rules. Provide scope for inferencing (no OWL-Full).

  29. Challenge 11: Slow Enhancements to Standards • OWL in Feb 2004; OWL2 in Oct 2009 • Small number of changes • Long cycle for enhancements to standard • Impossible to keep current due to standardization process, formalisms used • Conversely, software languages update frequently • Java: Major release every 18 months (originally) • Open-source libraries release even faster • Newer versions of software incorporate desired changes quickly • If it doesn’t get out quickly, developers will find alternatives
    OWL needs a more efficient update/release process.

  30. Challenge 12: No Clear Role for Ontology • Software systems don’t use ontologies to access information • RDF triple stores don’t need ontologies to hold data • Databases use schemas to describe how they store information • Many sites & systems claiming to use ontologies only use them for metadata, not content • Sites: Author, Title, Publish Date, etc. • Systems: artifacts of systems – data, functionality of system • Migration to Microformats
    Understand HOW, WHERE and WHY to use ontologies. Create working systems where ontologies work with software, services, and data schemas.

  31. Challenge 13: The Eclectic Ontologist • Crafting ontologies is seen as a specialized task • Ontologists rarely appear in Project Team diagrams • When they do, they are frequently isolated from developers • Sometimes isolated from Subject Matter Experts (biggest mistake of all) • Most software developers do not understand ontologies • To be honest, ontologists do not make it easy… • But software developers deal with highly complex systems all the time • Certainly capable of understanding ontologies
    Break down the barriers to using ontologies. Make ontologies easier to use and integrate.

  32. Examples of “Challenged” Ontologies: DBpedia • Captures information in RDF form effectively • Ontology, however, is huge • Duplicate, redundant, confusing, or useless properties: • Cambridge has a property for “imagesize” • Cambridge has two values for yearPrecipitationMm • What does “location” property indicate? • World of Warcraft has “length” property – with over 25 values! • wikiPageUsesTemplate property – who cares? • Demonstrates value, but also danger, of crowd-sourcing information repositories • No central control or curation

  33. Examples of “Challenged” Ontologies: MeSH
    <owl:Class rdf:about="http://bioonto.de/mesh.owl#A01.047.025">
      <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Abdominal Cavity</rdfs:label>
      <rdfs:subClassOf rdf:resource="http://bioonto.de/mesh.owl#A01.047"/>
    </owl:Class>
    <owl:Class rdf:about="http://bioonto.de/mesh.owl#A01.047.025.600">
      <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Peritoneum</rdfs:label>
      <rdfs:subClassOf rdf:resource="http://bioonto.de/mesh.owl#A01.047.025"/>
    </owl:Class>
    • Unfriendly class naming • Also illustrates confusion between data and classes • Strictly a hierarchical categorization of medical terms, not a true ontology • Class/subclass relationships not correct • Thumb is a subclass of hand, therefore thumbs are hands(!?)

  34. Examples of “Challenged” Ontologies: Cyc
    <owl:Class rdf:about="Mx8NhB4rcacO5KgcQdidVfCUDYB6bg-SZ292ZXJubWVudCBtZWV0aW5nHiu9ZdiAnCkRsZ2tw3ljb3JwD6Q3NGVhZTkxOC0xN2IxLTQxZDktODUyNy05MGI1NGRlOTBmYzM">
      <rdfs:label xml:lang="en">government meeting</rdfs:label>
      <Mx4rwLSVCpwpEbGdrcN5Y29ycA xml:lang="en">a existing object type named &quot;governmentmeeting&quot;</Mx4rwLSVCpwpEbGdrcN5Y29ycA>
      <rdf:type rdf:resource="Mx4rpPHhAOB1EdqAAAACs6hRXg"/>
      <rdfs:subClassOf rdf:resource="Mx4rvVj27ZwpEbGdrcN5Y29ycA"/>
      <owl:sameAs rdf:resource="http://sw.cyc.com/concept/Mx8NhB4rcacO5KgcQdidVfCUDYB6bg-SZ292ZXJubWVudCBtZWV0aW5nHiu9ZdiAnCkRsZ2tw3ljb3JwD6Q3NGVhZTkxOC0xN2IxLTQxZDktODUyNy05MGI1NGRlOTBmYzM"/>
    </owl:Class>
    • Unfriendly class naming • Tries to represent everything • Bizarre class representations

  35. Summary: Good Ontology Design • Semantic Web is about enabling automated processes to comprehend and process information • Design ontologies for use in tools – not just standalone • Separate Individuals from Class & Property definitions (T-Box) • Store Individuals in a SPARQL-accessible Triple Store • Make T-Box OWL available as a web document (small) • Avoid OWL-Full • Pay attention to Properties, not just classes • Design for Reuse by others • Understandable class, property names • Follow conventions for ontology style • Namespace URI should be URL of actual ontology definition • Ontology should be independent of tools used to access, edit, or reason over it

  36. Summary: OWL Improvements • More Inference & Reasoning options • Establish expected behavior for reasoners • Conformance testing suites • Faster cycles for updates • Establish Style Guides/standards • W3C Documents should focus on usability, not formalisms • Formalism necessary, but shouldn’t be the first/only thing found by a search

  37. Summary: OWL Improvements • Make Open-World an option • Exclusive classes • Default values for properties • Simplify ontology definition for common constructs (single value, enumerated datatypes, etc.) • Improve Property Specifications • Temporal Constructs • Cleaner Enumerated datatypes • Property ranges • Datatypes with units • Simplified datatypes – no need for 16 different numeric datatypes

  38. Summary: Community Improvements • Better examples, tutorials, etc. • Multiple examples for every ontology construct, with thorough explanations • All supported languages, levels • Community forums for collaborating, developing ontology-based solutions • Wiki? Forums? Ontology.org? • Site for “open-source” ontologies • Upper-level ontologies for general domains • Medical, Financial, Social Media, etc. • Design for reuse, and demonstrate it • Assume your ontology will be accessed by others

  39. Last Thoughts… • Ontology challenges directly hinder the adoption of ontology-based semantic technologies • Slows acceptance by community • No large-scale adoption = Less $$$ • Large-scale Java adoption occurred in just a few years • Same with other software languages • Language adoption, standardization fuels job growth • Learn what Java did right – other languages and technologies followed Java’s pattern for success
    Until the ontology community addresses these challenges, ontologies will continue to be marginal players in the Semantic Web.

  40. Further Information
