Ontologizing Semantic Relations

Marco Pennacchiotti and Patrick Pantel, University of Rome “Tor Vergata” and University of Southern California. ACL 2006

Introduction • Many recent efforts have focused on extracting binary semantic relations between entities (is-a, part-of and other relations). • The output of most of these systems relates plain terms, as in “Italy is-a country” and “orange similar-to blue”. • The terms are not linked to word senses, so it is unclear whether the intended reading is “orange#1 similar-to blue#1” or “orange#2 similar-to blue#1”.

Introduction • Given an instance (x, r, y) of a binary relation r between terms x and y, the ontologizing task is to identify the WordNet senses of x and y for which r holds. • Anchoring approach: x can be disambiguated by retrieving the set of terms that occur in the same relation r with y.

Introduction • Clustering approach: an instance (x, r, y) can be ontologized by finding the senses of x and y that are subsumed by ancestors linked by a conceptual instance of r. • Example: given the conceptual instance (particles#1, PART-OF, substances#1), the instance (proton, PART-OF, element) ontologizes to (proton#1, PART-OF, element#2), since proton#1 is subsumed by particles#1 and element#2 is subsumed by substances#1.
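The proton example above can be sketched in a few lines of Python. This is only an illustration: the hypernym map, the sense inventory, the label `component#1`, and the function names are assumptions made for the sketch, not actual WordNet content or the authors' code.

```python
# Toy hypernym map (child sense -> parent concept). Invented for illustration;
# it is NOT the real WordNet hierarchy.
HYPERNYM = {
    "proton#1": "particles#1",
    "element#1": "component#1",
    "element#2": "substances#1",
}

# Toy sense inventory for the two terms of the instance.
SENSES = {"proton": ["proton#1"], "element": ["element#1", "element#2"]}


def ancestors(sense):
    """Return the sense itself plus every concept above it in the toy map."""
    chain = [sense]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def ontologize(x, y, conceptual_instance):
    """Return the sense pairs of (x, y) subsumed by the conceptual instance."""
    cx, cy = conceptual_instance
    return [
        (sx, sy)
        for sx in SENSES[x]
        for sy in SENSES[y]
        if cx in ancestors(sx) and cy in ancestors(sy)
    ]


result = ontologize("proton", "element", ("particles#1", "substances#1"))
print(result)
```

Only the pair whose two senses both sit under the conceptual instance survives, which is the behaviour the slide describes for (proton#1, PART-OF, element#2).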

Ontologizing Semantic Relations • To attach a relation instance (x, r, y) to WordNet, one must: • Disambiguate x and y, that is, find the subsets S'x ⊆ Sx and S'y ⊆ Sy for which the relation r holds, where Sx and Sy are the sets of all WordNet senses of x and y. • Instantiate the relation in WordNet, using the synsets corresponding to all correct permutations between the senses in S'x and S'y. We denote this set of attachment points as S'xy.
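As a rough illustration of the second step, the attachment points can be pictured as pairings of the retained senses. The sketch below assumes that every pairing of a retained sense of x with a retained sense of y is correct, which is a simplification; the function name and the sense label `element#1` are hypothetical.

```python
from itertools import product


def attachment_points(all_x, all_y, kept_x, kept_y):
    """Pair every retained sense of x with every retained sense of y."""
    # The retained senses must be subsets of the full sense inventories.
    if not set(kept_x) <= set(all_x) or not set(kept_y) <= set(all_y):
        raise ValueError("retained senses must come from the sense inventory")
    return list(product(kept_x, kept_y))


# Toy inventories for the instance (proton, PART-OF, element).
S_x = ["proton#1"]
S_y = ["element#1", "element#2"]

points = attachment_points(S_x, S_y, ["proton#1"], ["element#2"])
print(points)
```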

Ontologizing Semantic Relations • Unlike common algorithms for word sense disambiguation, here it is important to take into consideration the semantic dependency between the two terms x and y.

Anchor Approach • Given an instance (x, r, y): • (1) y is fixed as the anchor, and the algorithm retrieves the set X' of all other terms x' that occur in an instance (x', r, y). • (2) For each sense pair {Sx, Sx'} ∈ Sxx', a similarity score r(Sx, Sx') is calculated using WordNet from two quantities: d(Sx, Sx'), the length of the shortest path connecting the two synsets, and f(Sx'), the number of times sense Sx' occurs in any of the instances of X'.

Anchor Approach • (3) The algorithm inverts the process by setting x as the anchor and computes r(Sy) in the same way. • (4) All possible permutations of senses are computed and scored by averaging r(Sx) and r(Sy). Permutations scoring higher than a threshold τ1 are selected as the attachment points.
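The four steps can be sketched as follows. The slides do not reproduce the scoring formula, so the code assumes a stand-in: each anchor sense contributes f / (d + 1), and the contributions are summed per sense. The toy hierarchy, the sense labels, the instances, and the threshold value are all invented for the example and are not WordNet data or the authors' settings.

```python
from collections import Counter, deque
from itertools import product

# Toy sense inventory and hypernym links (child -> parent), invented here.
SENSES = {
    "nucleus": ["nucleus#1", "nucleus#2"], "atom": ["atom#1", "atom#2"],
    "proton": ["proton#1"], "electron": ["electron#1"], "ion": ["ion#1"],
}
HYPERNYM = {
    "nucleus#1": "particle#1", "proton#1": "particle#1",
    "electron#1": "particle#1", "ion#1": "particle#1", "atom#1": "particle#1",
    "nucleus#2": "cell_part#1", "atom#2": "quantity#1",
    "particle#1": "entity#1", "cell_part#1": "entity#1", "quantity#1": "entity#1",
}


def distance(a, b):
    """Shortest path length between two senses over hypernym links (BFS)."""
    neighbours = {}
    for child, parent in HYPERNYM.items():
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two senses are not connected


def sense_scores(term, anchor_terms):
    """Score each sense of `term` against the senses of the retrieved terms."""
    freq = Counter(s for t in anchor_terms for s in SENSES[t])  # f(S')
    scores = {}
    for s in SENSES[term]:
        total = 0.0
        for other, f in freq.items():
            d = distance(s, other)
            if d is not None:
                total += f / (d + 1)  # ASSUMED form, not taken from the slides
        scores[s] = total
    return scores


def anchor_ontologize(x, r, y, instances, tau1):
    """Steps (1)-(4): anchor on y, then on x, then average and threshold."""
    x_others = [a for a, rel, b in instances if rel == r and b == y and a != x]
    y_others = [b for a, rel, b in instances if rel == r and a == x and b != y]
    rx, ry = sense_scores(x, x_others), sense_scores(y, y_others)
    return [(sx, sy) for sx, sy in product(rx, ry)
            if (rx[sx] + ry[sy]) / 2 > tau1]


INSTANCES = [
    ("nucleus", "PART-OF", "atom"), ("proton", "PART-OF", "atom"),
    ("electron", "PART-OF", "atom"), ("nucleus", "PART-OF", "ion"),
]
selected = anchor_ontologize("nucleus", "PART-OF", "atom", INSTANCES, tau1=0.45)
print(selected)
```

In this toy setting the physics readings of both terms lie closer to the senses of the co-occurring terms, so only that pairing clears the threshold. With real data the threshold τ1 would need tuning, since the stand-in scores are not normalized.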

Clustering Approach • (1) Given an instance (x, r, y), all sense pair permutations Sxy = {Sx, Sy} are retrieved from WordNet. • (2) A set of candidate conceptual instances, Cxy, is formed for each instance from the permutation of each WordNet ancestor of Sx and Sy, following the hypernymy link, up to degree τ2.

Clustering Approach • (3) Each candidate conceptual instance, c = {Cx, Cy}, is scored by its degree of generalization, based on ni, the number of hypernymy links needed to go from Si to Ci. • (4) Each c then receives an overall score defined in terms of Gc, the set of all candidate conceptual instances subsumed by c, and Ic, the set of instances subsumed by c.

Clustering Approach • (5) The algorithm selects the sense pair of x and y that is subsumed by the highest-scoring candidate conceptual instance.
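Steps (1) to (5) can be sketched as below. The two scoring formulas are not reproduced in the slides, so both are stand-ins chosen only to make the sketch runnable: the generalization score is assumed to be 1 / (nx * ny), and the overall score is assumed to be the number of covered instances times the mean generalization score. The toy hierarchy and sense labels are likewise invented.

```python
from collections import defaultdict
from itertools import product

# Toy sense inventory and hypernym links (child -> parent), invented here.
SENSES = {
    "proton": ["proton#1"], "electron": ["electron#1"],
    "element": ["element#1", "element#2"],
    "compound": ["compound#1", "compound#2"],
}
HYPERNYM = {
    "proton#1": "particles#1", "electron#1": "particles#1",
    "element#1": "component#1", "element#2": "substances#1",
    "compound#1": "substances#1", "compound#2": "enclosure#1",
    "particles#1": "entity#1", "substances#1": "entity#1",
    "component#1": "entity#1", "enclosure#1": "entity#1",
}


def ancestors(sense, tau2):
    """Return (ancestor, n) pairs, n = hypernymy links climbed, 1 <= n <= tau2."""
    found, node = [], sense
    for n in range(1, tau2 + 1):
        node = HYPERNYM.get(node)
        if node is None:
            break
        found.append((node, n))
    return found


def cluster_ontologize(instances, tau2):
    """Pick, per instance, the sense pair under the best-scoring candidate."""
    # Steps (1)-(3): candidate -> list of (instance index, sense pair, r).
    records = defaultdict(list)
    for i, (x, _rel, y) in enumerate(instances):
        for sx, sy in product(SENSES[x], SENSES[y]):
            for (cx, nx), (cy, ny) in product(ancestors(sx, tau2),
                                              ancestors(sy, tau2)):
                # ASSUMED generalization score, not taken from the slides.
                records[(cx, cy)].append((i, (sx, sy), 1.0 / (nx * ny)))

    # Step (4): ASSUMED overall score = covered instances * mean r.
    score = {}
    for c, recs in records.items():
        covered = {i for i, _pair, _r in recs}
        score[c] = len(covered) * sum(r for _i, _pair, r in recs) / len(recs)

    # Step (5): choose the sense pair subsumed by the best candidate.
    chosen = {}
    for i, inst in enumerate(instances):
        candidates = [c for c, recs in records.items()
                      if any(j == i for j, _pair, _r in recs)]
        if not candidates:
            continue  # no ancestor within tau2 links
        best = max(candidates, key=score.get)
        chosen[inst] = next(pair for j, pair, _r in records[best] if j == i)
    return chosen, score


INSTANCES = [("proton", "PART-OF", "element"), ("electron", "PART-OF", "compound")]
chosen, score = cluster_ontologize(INSTANCES, tau2=2)
print(chosen)
```

The candidate that covers both instances with little generalization wins, so each instance is attached through the sense that falls under it. Ties between candidates are broken arbitrarily here, which a fuller implementation would need to handle.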

Experimental Results • We experiment with two relations: part-of and causation. • We apply Espresso to a dataset consisting of a sample of articles from the Aquaint (TREC-9) newswire text collection. • Espresso extracted 1,468 part-of instances and 1,129 causation instances. We manually validated the output and randomly selected 200 correct instances of each relation for ontologizing into WordNet 2.0.

Experimental Results • Part-of relation • Causation relation

Conceptual Instances: Other Uses • Support for Relation Extraction Tools • Our system discards the following incorrect instances: (week, CAUSE, coalition) and (demeanor, CAUSE, vacuum), as they are both part of the very low scoring conceptual instance [abstraction#6, CAUSE, state#1]. • Ontology Learning from Text • Word Sense Disambiguation • “the board is composed by members of different countries” => (member, PART-OF, board) • [person#1, PART-OF, organization#1] => (member#1, PART-OF, board#1)

Qualitative Evaluation • Correctness of the conceptual instances • Errors are incorrect conceptual instances such as [attribute#2, CAUSE, state#4]. • Correctness is 82% for the part-of relation and 86% for causation. • Accuracy of the conceptual instances • Errors are instances incorrectly attached to a correct conceptual instance. • Accuracy is 84% for part-of and 76.6% for causation.

Conclusions • We presented two algorithms for automatically ontologizing binary semantic relations. • Our best results were on the part-of relation, where the clustering approach achieved a 13.6% higher F-score than the baseline. • We intend to pursue the ideas presented in Section 5 for using conceptual instances.
