Concept Hierarchy Induction by Philipp Cimiano

Presentation Transcript


  1. Concept Hierarchy Induction by Philipp Cimiano

  2. Objective • Structure information into categories • Provide a level of generalization to define relationships between data • Application: backbone of any ontology

  3. Overview • Different approaches to acquiring concept hierarchies from text corpora • Various clustering techniques • Evaluation • Related work • Conclusion

  4. Machine-Readable Dictionaries • Entries such as 'a tiger is a mammal' or 'mammals such as tigers, lions or elephants' • Exploit the regularity of dictionary entries: the head of the first NP of the definition is taken as the hypernym
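
A minimal sketch of the head-of-first-NP heuristic, assuming spaCy with an English model is available; the list of "empty heads" is illustrative and anticipates the Exception slides below.

    import spacy

    nlp = spacy.load("en_core_web_sm")

    # "Empty heads" such as 'part' or 'member' yield invalid is-a relations
    # (see the Exception slides); this list is illustrative, not from the talk.
    EMPTY_HEADS = {"part", "member", "kind", "type", "set", "group"}

    def hypernym_from_definition(term, definition):
        """Return (term, hypernym) from the head of the first NP, or None."""
        for chunk in nlp(definition).noun_chunks:
            head = chunk.root.lemma_.lower()
            if head == term:          # skip the defined term itself
                continue
            if head in EMPTY_HEADS:   # the plain heuristic fails here
                return None
            return (term, head)
        return None

    print(hypernym_from_definition("tiger", "a tiger is a mammal"))
    # -> ('tiger', 'mammal')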

  5. Example

  6. Exception

  7. Exception • is-a(corolla, part): NOT VALID • is-a(republican, member): NOT VALID • is-a(corolla, flower): NOT VALID • is-a(republican, political party): NOT VALID

  8. Exception

  9. Alshawi's solution

  10. Results using MRDs • Dolan et al.: 87% of the extracted hypernym relations are correct • Calzolari cites a precision of > 90% • Alshawi: precision of 77%

  11. Strengths And Weaknesses • Correct, explicit knowledge • Robust basis for ontology learning Weakness: • Dictionary knowledge is domain-independent, so it is of limited use for building domain-specific ontologies

  12. Lexico-Syntactic Patterns Task: automatically learn hyponym relations from corpora. 'Such injuries as bruises, wounds and broken bones' yields: hyponym(bruise, injury), hyponym(wound, injury), hyponym(broken bone, injury)

  13. Hearst patterns 'Such injuries as bruises, wounds and broken bones'
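
For illustration, a minimal matcher for the 'such NP as NP, NP and NP' Hearst pattern, sketched as a word-level regular expression; real systems match over NP chunks and lemmatize the results (bruises -> bruise), which this sketch omits.

    import re

    # 'such NP as NP, NP (and|or) NP' -- a crude word-level approximation.
    SUCH_AS = re.compile(
        r"\bsuch (?P<hyper>[\w-]+(?: [\w-]+)*?) as (?P<hypos>.+)", re.I)

    def hearst_such_as(sentence):
        """Extract hyponym(x, y) pairs from one 'such y as x1, x2...' sentence."""
        m = SUCH_AS.search(sentence)
        if not m:
            return []
        hyper = m.group("hyper").lower()
        hypos = re.split(r", |,? and |,? or ",
                         m.group("hypos").lower().rstrip(". "))
        return [(h, hyper) for h in hypos]

    print(hearst_such_as("Such injuries as bruises, wounds and broken bones"))
    # -> [('bruises', 'injuries'), ('wounds', 'injuries'),
    #     ('broken bones', 'injuries')]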

  14. Requirements • Occur frequently in many text genres • Accurately indicate the relation of interest • Be recognizable with little or no pre-encoded knowledge

  15. Strengths And Weaknesses • Patterns are easily identified and accurate Weaknesses: • The patterns appear rarely • Many is-a relations never appear in a Hearst-style pattern

  16. Distributional Similarity 'You shall know a word by the company it keeps' [Firth, 1957]. The semantic similarity of words is approximated by the similarity of the contexts in which they occur.
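
A minimal sketch of the idea: represent each word by a bag-of-words vector of the tokens around it and compare the vectors with cosine similarity. The toy corpus and window size are illustrative.

    import math
    from collections import Counter

    def context_vectors(sentences, window=2):
        """Bag-of-words context vectors within a +/- window token span."""
        vectors = {}
        for sent in sentences:
            tokens = sent.lower().split()
            for i, tok in enumerate(tokens):
                ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                vectors.setdefault(tok, Counter()).update(ctx)
        return vectors

    def cosine(u, v):
        dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
        norm = math.sqrt(sum(x * x for x in u.values())) \
             * math.sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    vecs = context_vectors([
        "the tourist booked a hotel in rome",
        "the tourist booked an apartment in rome",
    ])
    print(cosine(vecs["hotel"], vecs["apartment"]))  # high: shared contexts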

  17. Using distributional similarity

  18. Strengths And Weaknesses • Produces a reasonable concept hierarchy Weaknesses: • The cluster tree lacks a clear, formal interpretation • Does not provide any intensional description of concepts • Similarities may be accidental (sparse data)

  19. Formal Concept Analysis (FCA)
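
Since the transcript omits the slide body, here is a minimal, self-contained FCA sketch over a toy formal context (terms as objects, verbs they occur with as attributes, loosely modeled on Cimiano's tourism examples; the context itself is made up). It uses the fact that the concept intents of a context are exactly the closures of the object intents under intersection.

    # Toy formal context: terms (objects) x verbs (attributes). Illustrative only.
    context = {
        "hotel":     {"bookable"},
        "apartment": {"bookable", "rentable"},
        "car":       {"bookable", "rentable", "driveable"},
        "bike":      {"bookable", "rentable", "rideable"},
        "excursion": {"bookable", "joinable"},
        "trip":      {"bookable", "joinable"},
    }

    all_attrs = frozenset(a for attrs in context.values() for a in attrs)

    # The intents of a context are the closure of its object intents under
    # intersection (plus the full attribute set for the bottom concept).
    intents = {all_attrs}
    changed = True
    while changed:
        changed = False
        for attrs in context.values():
            for intent in list(intents):
                new = frozenset(attrs) & intent
                if new not in intents:
                    intents.add(new)
                    changed = True

    # Each intent determines its extent: the objects having all its attributes.
    for intent in sorted(intents, key=len):
        extent = sorted(g for g, attrs in context.items() if intent <= attrs)
        print(extent, "<->", sorted(intent))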

  20. FCA output

  21. Similarity measures

  22. Smoothing

  23. Evaluation • Semantic cotopy (SC) • Taxonomic overlap (TO)
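
For reference (the slide body is not in the transcript, so these are the standard definitions from Maedche and Staab as used in Cimiano's evaluations): the semantic cotopy of a concept is the set of its super- and subconcepts, and the taxonomic overlap averages the cotopy agreement between the two taxonomies. In LaTeX:

    SC(c, T) := \{ c' \in C \mid c' \le_T c \;\vee\; c \le_T c' \}

    TO(T_1, T_2) := \frac{1}{|C_1|} \sum_{c \in C_1}
                    \frac{|SC(c, T_1) \cap SC(c, T_2)|}{|SC(c, T_1) \cup SC(c, T_2)|}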

  24. Evaluation Measure

  25. 100% Precision Recall

  26. Low Recall

  27. Low Precision

  28. Results

  29. Results

  30. Results

  31. Results

  32. Strengths And Weaknesses • FCA generates formal concepts • Provides an intensional description Weaknesses: • The size of the lattice can get exponential in the size of the context • Spurious clusters • Finding appropriate labels for the clusters is hard

  33. Problems with Unsupervised Approaches to Clustering • Data sparseness leads to spurious syntactic similarities • Produced clusters can’t be appropriately labeled

  34. Guided Clustering • Hypernyms directly used to guide clustering • WordNet • Hearst • Agglomerative clustering

  35. Similarity Computation The ten most similar terms for each term of the tourism reference taxonomy (table not reproduced in the transcript)

  36. The Hypernym Oracle • Three sources: • WordNet • Hearst patterns matched in a corpus • Hearst patterns matched in the World Wide Web • Record hypernyms and the amount of evidence found in support of each hypernym

  37. WordNet • Collect as hypernyms the lemmas of any synset dominating a synset that contains the term t • Record the number of times each hypernym appears in a dominating synset
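
A minimal sketch of this step using NLTK's WordNet interface (the use of NLTK is my assumption; counting lemmas over all hypernym paths is one reading of "dominating synset"):

    from collections import Counter
    from nltk.corpus import wordnet as wn

    def wordnet_hypernym_evidence(term):
        """Count how often each lemma occurs in a synset dominating `term`."""
        counts = Counter()
        for synset in wn.synsets(term, pos=wn.NOUN):
            for path in synset.hypernym_paths():   # root ... synset, inclusive
                for ancestor in path[:-1]:         # dominating synsets only
                    for lemma in ancestor.lemma_names():
                        counts[lemma.replace("_", " ")] += 1
        return counts

    # e.g. wordnet_hypernym_evidence("hotel").most_common(5)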

  38. Hearst Patterns (Corpus) • Record the number of is-a relations found between the two terms

  39. Hearst Patterns (WWW) • Download 100 Google abstracts for each concept and clue pair

  40. Evidence • Total Evidence for Hypernyms: • time: 4 • vacation: 2 • period: 2
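
Totals like these are just the sum of the per-source counts; with collections.Counter the merge is one expression. The per-source split below is invented for illustration; only the totals match the slide.

    from collections import Counter

    wordnet = Counter({"time": 2, "vacation": 1})
    hearst_corpus = Counter({"time": 1, "period": 2})
    hearst_www = Counter({"time": 1, "vacation": 1})

    print(wordnet + hearst_corpus + hearst_www)
    # Counter({'time': 4, 'vacation': 2, 'period': 2})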

  41. Clustering Algorithm • Input: a list of terms • Calculate the similarity between each pair of terms and sort the pairs from highest to lowest similarity • For each pair to be clustered, consult the oracle (cases 1-4 below; a code sketch follows slide 47)

  42. Consulting the Oracle case 1 • If term 1 is a hypernym of term 2 or vice-versa: • Create appropriate subconcept relationship.

  43. Consulting the Oracle case 2 • Find the common hypernym h of both terms with the greatest evidence • If one term has already been classified as t', distinguish three subcases: t' = h, h is a hypernym of t', or t' is a hypernym of h

  44. Consulting the Oracle case 3 • Neither term has been classified: • Each term becomes a subconcept of the common hypernym.

  45. Consulting the Oracle case 4 • The terms do not share a common hypernym: • Set aside the terms for further processing.

  46. r-matches • For all unprocessed terms, check for r-matches (e.g. 'credit card' matches 'international credit card')

  47. Further Processing • If either term in a pair is already classified as t’, the other term is classified under t’ as well. • Otherwise place both terms under the hypernym of either term with the most evidence. • Any unclassified terms are added under the root concept.
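
A minimal sketch of the whole loop from slides 41-47, under stated assumptions: similarity and oracle are supplied by the caller, the oracle maps a term to a dict of hypernyms with evidence counts, cases 2 and 3 are collapsed into one branch, and r-matching is omitted. All names are illustrative, not from the paper.

    from itertools import combinations

    def guided_clustering(terms, similarity, oracle, root="root"):
        """Sketch of oracle-guided agglomerative clustering (slides 41-47).

        similarity(t1, t2) -> float; oracle(t) -> {hypernym: evidence}.
        Returns a dict mapping each term to its parent concept.
        """
        parent, deferred = {}, []
        # Sort all pairs from highest to lowest similarity (slide 41).
        pairs = sorted(combinations(terms, 2),
                       key=lambda p: similarity(*p), reverse=True)
        for t1, t2 in pairs:
            h1, h2 = oracle(t1), oracle(t2)
            if t2 in h1:                      # case 1: t2 is a hypernym of t1
                parent.setdefault(t1, t2)
            elif t1 in h2:                    # ... or vice versa
                parent.setdefault(t2, t1)
            else:
                common = set(h1) & set(h2)
                if common:                    # cases 2/3 (collapsed here)
                    h = max(common, key=lambda c: h1[c] + h2[c])
                    parent.setdefault(t1, h)  # attach under the common hypernym
                    parent.setdefault(t2, h)  # with the greatest total evidence
                else:                         # case 4: set aside for later
                    deferred.append((t1, t2))
        # Further processing (slide 47), simplified: inherit a partner's parent.
        for t1, t2 in deferred:
            if t1 in parent and t2 not in parent:
                parent[t2] = parent[t1]
            elif t2 in parent and t1 not in parent:
                parent[t1] = parent[t2]
        for t in terms:                       # unclassified terms go under root
            parent.setdefault(t, root)
        return parent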
