1 / 38

Dickson K.W. Chiu PhD, SMIEEE Text: Antoniou & van Harmelen: A Semantic Web Primer (Chapter 7)

CSIT600f: Introduction to Semantic Web Ontology Engineering. Dickson K.W. Chiu PhD, SMIEEE Text: Antoniou & van Harmelen: A Semantic Web Primer (Chapter 7). Lecture Outline. Introduction Constructing Ontologies Manually Reusing Existing Ontologies Using Semiautomatic Methods

Télécharger la présentation

Dickson K.W. Chiu PhD, SMIEEE Text: Antoniou & van Harmelen: A Semantic Web Primer (Chapter 7)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSIT600f: Introduction to Semantic Web Ontology Engineering Dickson K.W. Chiu PhD, SMIEEE Text: Antoniou & van Harmelen: A Semantic Web Primer (Chapter 7)

  2. Lecture Outline • Introduction • Constructing Ontologies Manually • Reusing Existing Ontologies • Using Semiautomatic Methods • On-To-Knowledge SW Architecture Dickson Chiu 2005

  3. Methodological Questions • How can tools and techniques best be applied? • Which languages and tools should be used in which circumstances, and in which order? • What about issues of quality control and resource management? • Many of these questions for the Semantic Web have been studied in other contexts • E.g. software engineering, object-oriented design, and knowledge engineering Dickson Chiu 2005

  4. Lecture Outline • Introduction • Constructing Ontologies Manually • Reusing Existing Ontologies • Using Semiautomatic Methods • On-To-Knowledge SW Architecture Dickson Chiu 2005

  5. Main Stages in Ontology Development • Determine scope • Consider reuse • Enumerate terms • Define taxonomy • Define properties • Define facets • Define instances • Check for anomalies Not a linear process! Dickson Chiu 2005

  6. Determine Scope • Many possible correct ontology of a specific domain • An ontology is an abstraction of a particular domain, and there are always viable alternatives • What is included in this abstraction should be determined by • the use to which the ontology will be put • by future extensions that are already anticipated • Basic questions to be answered at this stage are: • What is the domain that the ontology will cover? • For what we are going to use the ontology? • For what types of questions should the ontology provide answers? • Who will use and maintain the ontology? Dickson Chiu 2005

  7. Consider Reuse • With the spreading deployment of the Semantic Web, ontologies will become more widely available • We rarely have to start from scratch when defining an ontology • There is almost always an ontology available from a third party that provides at least a useful starting point for our own ontology Dickson Chiu 2005

  8. Enumerate Terms • Write down in an unstructured list all the relevant terms that are expected to appear in the ontology • Nouns form the basis for class names • Verbs (or verb phrases) form the basis for property names • Traditional knowledge engineering tools (e.g. laddering and grid analysis) can be used to obtain • the set of terms • an initial structure for these terms Dickson Chiu 2005

  9. Define Taxonomy • Relevant terms must be organized in a taxonomic hierarchy • Opinions differ on whether it is more efficient/reliable to do this in a top-down or a bottom-up fashion • Ensure that hierarchy is indeed a taxonomy: • If A is a subclass of B, then every instance of A must also be an instance of B (compatible with semantics of rdfs:subClassOf Dickson Chiu 2005

  10. Define Properties • Often interleaved with the previous step • The semantics of subClassOf demands that whenever A is a subclass of B, every property statement that holds for instances of B must also apply to instances of A • It makes sense to attach properties to the highest class in the hierarchy to which they apply • While attaching properties to classes, it makes sense to immediately provide statements about the domain and range of these properties • There is a methodological tension here between generality and specificity: • Flexibility (inheritance to subclasses) • Detection of inconsistencies and misconceptions Dickson Chiu 2005

  11. Define Facets: From RDFS to OWL • Cardinality restrictions • Required values • owl:hasValue • owl:allValuesFrom • owl:someValuesFrom • Relational characteristics • symmetry, transitivity, inverse properties, functional values Dickson Chiu 2005

  12. Define Instances • Filling the ontologies with such instances is a separate step • Number of instances >> number of classes • Thus populating an ontology with instances is not done manually • Retrieved from legacy data sources (DBs) • Extracted automatically from a text corpus Dickson Chiu 2005

  13. Check for Anomalies • An important advantage of the use of OWL over RDF Schema is the possibility to detect inconsistencies • In ontology or ontology+instances • Examples of common inconsistencies • incompatible domain and range definitions for transitive, symmetric, or inverse properties • cardinality properties • requirements on property values can conflict with domain and range restrictions Dickson Chiu 2005

  14. Lecture Outline • Introduction • Constructing Ontologies Manually • Reusing Existing Ontologies • Using Semiautomatic Methods • On-To-Knowledge SW Architecture Dickson Chiu 2005

  15. Existing Domain-Specific Ontologies • Medical domain: Cancer ontology from the National Cancer Institute in the United States • Cultural domain: • Art and Architecture Thesaurus (AAT) with 125,000 terms in the cultural domain • Union List of Artist Names (ULAN), with 220,000 entries on artists • Iconclass vocabulary of 28,000 terms for describing cultural images • Geographical domain: Getty Thesaurus of Geographic Names (TGN), containing over 1 million entries Dickson Chiu 2005

  16. Integrated Vocabularies • Merge independently developed vocabularies into a single large resource • E.g., Unified Medical Language System integrating 100 biomedical vocabularies • The UMLS meta-thesaurus contains 750,000 concepts, with over 10 million links between them • The semantics of a resource that integrates many independently developed vocabularies is rather low • But very useful in many applications as starting point Dickson Chiu 2005

  17. Upper-Level Ontologies • Some attempts have been made to define very generally applicable ontologies • CYC - with 60,000 assertions on 6,000 concepts • IEEE Standard Upperlevel Ontology (SUO) Dickson Chiu 2005

  18. Topic Hierarchies • Some “ontologies” do not deserve this name: • simply sets of terms, loosely organized in a hierarchy • This hierarchy is typically not a strict taxonomy but rather mixes different specialization relations (e.g., is-a, part-of, contained-in) • Such resources often very useful as starting point • Example: Open Directory hierarchy, containing more then 400,000 hierarchically organized categories and available in RDF format Dickson Chiu 2005

  19. Linguistic Resources • Some resources were originally built not as abstractions of a particular domain, but rather as linguistic resources • These have been shown to be useful as starting places for ontology development • E.g.,WordNet, with over 90,000 word senses Dickson Chiu 2005

  20. Ontology Libraries • Attempts are currently underway to construct online libraries of online ontologies • Rarely existing ontologies can be reused without changes • Existing concepts and properties must be refined using rdfs:subClassOf and rdfs:subPropertyOf • Alternative names must be introduced which are better suited to the particular domain using owl:equivalentClass and owl:equivalentProperty • We can exploit the fact that RDF and OWL allow private refinements of classes defined in other ontologies Dickson Chiu 2005

  21. Lecture Outline • Introduction • Constructing Ontologies Manually • Reusing Existing Ontologies • Using Semiautomatic Methods • On-To-Knowledge SW Architecture Dickson Chiu 2005

  22. The Knowledge Acquisition Bottleneck • Manual ontology acquisition remains a time-consuming, expensive, highly skilled, and sometimes cumbersome task • Machine Learning techniques may be used to alleviate • knowledge acquisition or extraction • knowledge revision or maintenance Dickson Chiu 2005

  23. Tasks Supported by Machine Learning • Extraction of ontologies from existing data on the Web • Extraction of relational data and metadata from existing data on the Web • Merging and mapping ontologies by analyzing extensions of concepts • Maintaining ontologies by analyzing instance data • Improving SW applications by observing users Dickson Chiu 2005

  24. Useful Machine Learning Techniques for Ontology Engineering • Clustering • Incremental ontology updates • Support for the knowledge engineer • Improving large natural language ontologies • Pure (domain) ontology learning Dickson Chiu 2005

  25. Machine Learning Techniques for Natural Language Ontologies • Natural language ontologies (NLOs) contain lexical relations between language concepts • They are large in size and do not require frequent updates • The state of the art in NLO learning looks quite optimistic: • A stable general-purpose NLO exist • Techniques for automatically or semi-automatically constructing and enriching domain-specific NLOs exist Dickson Chiu 2005

  26. Machine Learning Techniques for Domain Ontologies • They provide detailed descriptions • Usually they are constructed manually • The acquisition of the domain ontologies is still guided by a human knowledge engineer • Automated learning techniques play a minor role in knowledge acquisition • They have to find statistically valid dependencies in the domain texts and suggest them to the knowledge engineer Dickson Chiu 2005

  27. Machine Learning Techniques for Ontology Instances • Ontology instances can be generated automatically and frequently updated while the ontology remains unchanged • Fits nicely into a machine learning framework • Successful ML applications • Are strictly dependent on the domain ontology, or • Populate the markup without relating to any domain theory • General-purpose techniques not yet available Dickson Chiu 2005

  28. Different Uses of Ontology Learning • Ontology acquisition tasks in knowledge engineering • Ontology creation from scratch by the knowledge engineer • Ontology schema extraction from Web documents • Extraction of ontology instances from Web documents • Ontology maintenance tasks • Ontology integration and navigation • Updating some parts of an ontology • Ontology enrichment or tuning Dickson Chiu 2005

  29. Ontology Acquisition Tasks • Ontology creation from scratch by the knowledge engineer • ML assists the knowledge engineer by suggesting the most important relations in the field or checking and verifying the constructed knowledge bases • Ontology schema extraction from Web documents • ML takes the data and meta-knowledge (like a meta-ontology) as input and generate the ready-to-use ontology as output with the possible help of the knowledge engineer • Extraction of ontology instances from Web documents • This task extracts the instances of the ontology presented in the Web documents and populates given ontology schemas • This task is similar to information extraction and page annotation, and can apply the techniques developed in these areas Dickson Chiu 2005

  30. Ontology Maintenance Tasks • Ontology integration and navigation • Deals with reconstructing and navigating in large and possibly machine-learned knowledge bases • Updating some parts of an ontology that are designed to be updated • Ontology enrichment or tuning • This does not change major concepts and structures but makes an ontology more precise Dickson Chiu 2005

  31. Potentially Applicable Machine Learning Algorithms • Propositional rule learning algorithms • Bayesian learning • generates probabilistic attribute-value rules • First-order logic rules learning • Clustering algorithms • They group the instances together based on the similarity or distance measures between a pair of instances defined in terms of their attribute values Dickson Chiu 2005

  32. Lecture Outline • Introduction • Constructing Ontologies Manually • Reusing Existing Ontologies • Using Semiautomatic Methods • On-To-Knowledge SW Architecture Dickson Chiu 2005

  33. On-To-Knowledge Architecture • Building the Semantic Web involves using • the new languages described in this course • a rather different style of engineering • a rather different approach to application integration • We describe how a number of Semantic Web-related tools can be integrated in a single lightweight architecture using Semantic Web standards to achieve interoperability between tools Dickson Chiu 2005

  34. Knowledge Acquisition • Initially, tools must exist that use surface analysis techniques to obtain content from documents • Unstructured natural language documents: statistical techniques and shallow natural language technology • Structured and semi-structured documents: wrappers induction, pattern recognition Dickson Chiu 2005

  35. Knowledge Storage • The output of the analysis tools is sets of concepts, organized in a shallow concept hierarchy with at best very few cross-taxonomical relationships • RDF/RDF Schema are sufficiently expressive to represent the extracted info • Store the knowledge produced by the extraction tools • Retrieve this knowledge, preferably using a structured query language (e.g. RQL) Dickson Chiu 2005

  36. Knowledge Maintenance and Use • A practical Semantic Web repository must provide functionality for managing and maintaining the ontology: • change management • access and ownership rights • transaction management • There must be support for both • Lightweight ontologies that are automatically generated from unstructured and semi-structured data • Human engineering of much more knowledge-intensive ontologies • Sophisticated editing environments must be able to • Retrieve ontologies from the repository • Allow a knowledge engineer to manipulate it • Place it back in the repository • The ontologies and data in the repository are to be used by applications that serve an end-user • We have already described a number of such applications Dickson Chiu 2005

  37. Technical Interoperability • Syntactic interoperability was achieved because all components communicated in RDF • Semantic interoperability was achieved because all semantics was expressed using RDF Schema • Physical interoperability was achieved because all communications between components were established using simple HTTP connections Dickson Chiu 2005

  38. On-To-Knowledge System Architecture Dickson Chiu 2005

More Related