
Identifying key concepts in an ontology through the integration of cognitive principles with statistical and topological measures


Presentation Transcript


    1. Identifying key concepts in an ontology through the integration of cognitive principles with statistical and topological measures. Silvio Peroni, Enrico Motta and Mathieu d'Aquin, Knowledge Media Institute, The Open University

    2. The context for the work on the SW testbeds in WP8 is given by this research programme on NGSW applications, which we started about three years ago and which is closely aligned with the OK project. In particular, the idea of NGSW is to exploit large-scale semantics by doing away with the classic assumptions characterizing semantic systems (closed conceptualizations, metadata alignment at design time, closed KA, etc.). These features of NGSW closely match the key tenets of the OK project: open systems, the ability to acquire knowledge dynamically, and the ability to handle heterogeneity at run time. In Open Knowledge, the core focus of our work in the initial two years was on developing the two testbeds and on run-time mapping algorithms.

    3. Reusing Ontologies Semantic Web Search Engines like Watson, Sindice, Swoogle, Falcon-s, etc. help in finding and locating semantic information on the Web. However, they don't support the user in quickly understanding what the ontology is about and what it contains.

    4. Summarizing ontologies What is needed is a way to quickly get a general impression of what an ontology is about. When asked to summarize ontologies, people come up with explanations like "The AKT portal ontology can be used to describe academic organizations. It covers concepts such as event, person, technology, project, etc.". That is, they can extract the key concepts which effectively summarize an ontology.

    5. Identifying Key Concepts

    6. Research Issues What are the right concepts that could be used to describe an ontology concisely? Are there any principles/regularities in the way human beings extract key concepts from an ontology? Can these principles be automated, to define algorithms that are able to characterize an ontology the way people do? How effectively do the resulting ontology signatures allow knowledge consumers to locate the information they need?

    7. Identifying key concepts: Approach Integration of cognitive criteria with lexical statistics and with formal and topological criteria. Criteria: Natural categories (Rosch, 1978): information-rich concepts that are basic from a cognitive standpoint, e.g., dog, cat, chair. Density: information-rich concepts from a formal standpoint, i.e., concepts rich in attributes, instances, or subclasses; we use both local and global density measures. Popularity (lexical statistics): familiar words tend to be more descriptive than unfamiliar ones; we use both global and local popularity measures. Best ontology coverage (topological): we want to ensure that for each concept C in the ontology, there is a key concept Ki such that either C ⊑ Ki or Ki ⊑ C.

    8. Computing Natural Categories According to Rosch, people characterize the world primarily in terms of basic objects, such as chair or car. These basic objects are neither the most general ones (e.g. vehicle, furniture) nor the most specific ones (e.g. red car, nice chair). Hence, we consider as basic objects those that are central in the hierarchy and have a simple label. Because of linguistic evolution, a natural category normally has a simple name; for example, Chair is more natural than KitchenChair.

    9. Basic Level: example

    10. Computing Name Simplicity NS(C) = 1 - c * (nC - 1), where nC is the number of compounds in the label of C and c is a constant (in our experiments we use c = 0.3). Examples: NS(Artist) = 1, NS(MusicalArtist) = 0.7.
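A minimal Python sketch of the name-simplicity formula above. The way labels are split into compounds (CamelCase, hyphens, underscores) is an assumption; the slide only gives the formula and the two example values.

```python
import re

def name_simplicity(label, c=0.3):
    # NS(C) = 1 - c * (n_C - 1), where n_C is the number of compounds in
    # the label of C. Splitting on CamelCase, hyphens and underscores is
    # assumed here; the slide does not specify how compounds are counted.
    words = label.replace("-", " ").replace("_", " ")
    tokens = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", words)
    n_compounds = max(len(tokens), 1)
    return 1.0 - c * (n_compounds - 1)

print(name_simplicity("Artist"))         # 1.0
print(name_simplicity("MusicalArtist"))  # 0.7 (two compounds)
```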

    11. Density The density of a concept C is a measure of how richly described the concept is in the ontology. It is computed on the basis of its number of direct sub-concepts, properties and instances. We consider two different types of density: global density and local density.

    12. Global Density

    13. Global Density: example

    14. Local Density The local density of a concept C refers to a density value which is relative to those of the surrounding concepts
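The density formulas themselves are not reproduced in this transcript (slides 12-15 show examples), so the Python sketch below only illustrates the idea under stated assumptions: a raw density as a weighted count of direct sub-concepts, properties and instances (placeholder weights), global density normalised against the densest concept in the ontology, and local density normalised against the concept's immediate neighbourhood.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    subclasses: list = field(default_factory=list)  # direct sub-concepts
    properties: list = field(default_factory=list)  # attached properties
    instances: list = field(default_factory=list)   # direct instances
    parents: list = field(default_factory=list)     # direct super-concepts

def raw_density(c, w_sub=1.0, w_prop=1.0, w_inst=1.0):
    # Weighted count of how richly the concept is described; the weights
    # are placeholders, the slides do not give them.
    return (w_sub * len(c.subclasses)
            + w_prop * len(c.properties)
            + w_inst * len(c.instances))

def global_density(c, ontology):
    # Normalise against the densest concept in the whole ontology.
    max_d = max((raw_density(x) for x in ontology), default=0.0) or 1.0
    return raw_density(c) / max_d

def local_density(c):
    # Compare the concept with its surrounding concepts (direct parents
    # and direct sub-concepts), as slide 14 suggests.
    neighbourhood = [c] + c.parents + c.subclasses
    max_d = max(raw_density(x) for x in neighbourhood) or 1.0
    return raw_density(c) / max_d
```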

    15. Local Density: example

    16. Popularity The popularity of a category is another criterion that can be used to identify whether a particular category C is a key concept. The popularity of a concept C is measured as the number of results returned by querying Yahoo with the name of C as keyword. Compound names are transformed into a sequence of lower-case keywords separated by a space (Marine-Animal, MarineAnimal, marineAnimal and marine_animal are all transformed into marine animal).
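A small Python sketch of the label normalisation described above. The tokenisation rules are inferred from the four example spellings on the slide; the popularity value itself would be the hit count reported by the search engine (Yahoo in the slides) for the resulting keywords, which is not shown here.

```python
import re

def to_query_keywords(label):
    # Marine-Animal, MarineAnimal, marineAnimal and marine_animal all
    # become "marine animal", as described on slide 16.
    s = label.replace("-", " ").replace("_", " ")
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", s)   # split CamelCase
    return " ".join(s.lower().split())

for name in ("Marine-Animal", "MarineAnimal", "marineAnimal", "marine_animal"):
    print(to_query_keywords(name))   # prints "marine animal" four times
```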

    17. Local and global popularity As in the case of density, we also want to take into consideration both the global and local popularity of a concept We compute these analogously to the way we derive global and local densities

    18. Coverage The coverage criterion states that the set of key concepts identified by our algorithm should maximise the coverage of the ontology with respect to its is-a hierarchy. Not only do we want the right type of concepts to be returned by our method, but the right spread of concepts must also be achieved, to provide the best possible illustration of the ontology.

    19. Total Coverage

    20. Partial Coverage

    21. Coverage: formulas
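The coverage formulas of slide 21 are not included in the transcript. The sketch below only encodes the criterion stated on slide 18, namely that every concept C should be related through the is-a hierarchy to some key concept Ki. The `ancestors` and `descendants` names are assumed helper functions supplied by the caller, and the contribution measure actually used by the algorithm may weight coverage differently.

```python
def covered(concept, key_concepts, ancestors, descendants):
    # A concept is covered if some key concept subsumes it or is subsumed
    # by it in the is-a hierarchy (the criterion of slide 18).
    related = ancestors(concept) | descendants(concept) | {concept}
    return any(k in related for k in key_concepts)

def coverage(concepts, key_concepts, ancestors, descendants):
    # Fraction of the ontology's concepts covered by the candidate
    # key-concept set; 1.0 corresponds to total coverage (slide 19),
    # anything less to partial coverage (slide 20).
    if not concepts:
        return 0.0
    hits = sum(covered(c, key_concepts, ancestors, descendants)
               for c in concepts)
    return hits / len(concepts)
```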

    22. The algorithm (1/2) (1) For each class C in O we compute its global and local density, its global and local popularity, and its natural category value. (2) For each class C in O we compute score(C). (3) Given a number k ≤ n (in our experiments k = 15), let S be the set of the k classes in O with the best score, and let T be the set of the n-k classes in O \ S with the best score. If T is empty, we return S and we stop. (4) Otherwise, let c be the average of all the values obtained by invoking contribution(Ci, S ∪ T) for each Ci ∈ S ∪ T, and let a be the average of all the values obtained by invoking overallScore(Ci, S ∪ T), again for each Ci ∈ S ∪ T.

    23. The algorithm (2/2) (5) Let W be the class in T with the worst overallScore(W, S ∪ T) of all the classes in S ∪ T, and let R be the set (S ∪ T) \ {W}. If there is a class B ∈ O \ (S ∪ T) such that the average a' of all the values obtained by invoking overallScore(C, R ∪ {B}), computed for each C ∈ R ∪ {B}, is greater than a, and the average c' of all the values obtained by invoking contribution(C, R ∪ {B}), computed for each C ∈ R ∪ {B}, is greater than or equal to c, then we swap W with B in S ∪ T and we go back to step (4). Otherwise we return S ∪ T and we stop.
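A hedged Python sketch of the selection loop on slides 22-23. The functions score, contribution and overallScore are not defined in this transcript, so they are taken as supplied parameters; the greedy swap logic follows the steps above, with k = 15 as in the experiments.

```python
def select_key_concepts(ontology, n, score, contribution, overall_score, k=15):
    # Steps (3)-(5) of the algorithm; score, contribution and overall_score
    # are assumed callables, since their definitions are not in the slides.
    ranked = sorted(ontology, key=score, reverse=True)
    S = ranked[:k]          # the k best-scoring classes
    T = ranked[k:n]         # the next n - k best-scoring classes
    if not T:
        return S

    current = S + T
    while True:
        # Step (4): averages over the current candidate set.
        c_avg = sum(contribution(x, current) for x in current) / len(current)
        a_avg = sum(overall_score(x, current) for x in current) / len(current)

        # Step (5): try to swap the worst member of T for a better outsider.
        W = min(T, key=lambda x: overall_score(x, current))
        R = [x for x in current if x is not W]
        swap = None
        for B in (x for x in ontology if x not in current):
            cand = R + [B]
            a_new = sum(overall_score(x, cand) for x in cand) / len(cand)
            c_new = sum(contribution(x, cand) for x in cand) / len(cand)
            if a_new > a_avg and c_new >= c_avg:
                swap = B
                break           # accept the first improving swap
        if swap is None:
            return current      # no improving swap exists: stop
        T = [x for x in T if x is not W] + [swap]
        current = R + [swap]
```

The loop terminates because each accepted swap strictly increases the average overall score of the selected set, which can happen only finitely many times over a finite ontology.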

    24. Evaluation Four ontologies were used for our tests. We asked 8 semantic web experts to select up to 20 concepts they considered to be the best descriptors of the ontologies, and we also asked them to try and maximise ontology coverage. For each ontology, a number of concepts emerged which a high percentage of experts considered to be key concepts. On these concepts, the experts showed on average a 74.68% agreement ratio.

    25. Algorithm results (v3) We implemented three versions of the algorithm. V1 exhibited very poor performance (average agreement = 42.56%; no popularity). V2 was much better (average agreement = 63.61%; no natural categories). V3 showed an excellent correlation with the experts (average agreement = 72.08%).

    27. Conclusions We defined an algorithm for computing a summary of an ontology, in the form of key concepts. This algorithm is almost as good as humans at this task. The implementation of the technique will be provided as a service on top of Watson, and used to provide meaningful snapshots of ontologies. Other applications include: supporting new navigation/visualization mechanisms, which can improve over the taxonomic displays provided by current ontology engineering tools; identifying priority concepts in ontology mapping, automatic classification, ontology evolution, etc.; and providing mechanisms for knowledge providers to advertise knowledge contents without publishing the whole ontology.
