This presentation describes three kinds of resources for improving paraphrase detection: derivational morphology, deep syntax, and domain-specific resources. The CELEX database is used to extract verb-noun relations, and the Comlex lexicon to extract the logical subjects and objects of infinitives. The general normalization grammar relies on about 120 rules exploiting these morphological and syntactic resources. Domain-specific relations between lexical items, such as SYNONYMY, HASN, TURNTO and ISAJ, are exploited by about 150 XIP rules.
Resources for paraphrase detection
Caroline Hagège, Caroline.Hagege@xrce.xerox.com
Caroline Brun, Caroline.Brun@xrce.xerox.com
Resource types
• Derivational morphology
• Deep syntax
• Domain-specific resources
1. Derivational morphology
• Use of the CELEX database (distributed by the LDC)
  • http://www.kun.nl/celex/index.html
• Manual revision of the extracted pairs in order to classify the kind of relation (predicate) holding between them
• Automatic extraction of verbs and corresponding deverbal nouns
  • Suffixes: +OR, +ER, +ION
• Predicates relating nouns and verbs from the same morphological family (~1600 pairs)
• Predicate types:
  • S0: The noun paraphrases the action expressed by the verb.
    e.g. S0(acceleration,accelerate)
  • S1H: The noun corresponds to the first actant of the action expressed by the verb and has a human:+ feature.
    e.g. S1H(writer,write)
1. Derivational morphology (cont'd)
  • S1NH: The noun corresponds to the first actant of the action expressed by the verb and has a human:~ feature.
    e.g. S1NH(abbreviation,abbreviate)
  • S2: The noun corresponds to the second actant of the action expressed by the verb.
    e.g. S2(affirmation,affirm)
• Automatic extraction of nouns and corresponding adjectives
  • Suffix: +AN
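The predicate pairs described above could be stored and queried roughly as follows. This is a minimal sketch, not the authors' implementation: the table holds only the four example pairs from the slides, and the names `DERIVATION_PAIRS`, `relate_noun` and `same_family` are hypothetical.

```python
# Hypothetical in-memory stand-in for the ~1600 CELEX-derived pairs.
# Only the four examples given on the slides are listed here.
DERIVATION_PAIRS = [
    ("S0", "acceleration", "accelerate"),    # noun paraphrases the action
    ("S1H", "writer", "write"),              # first actant, human:+
    ("S1NH", "abbreviation", "abbreviate"),  # first actant, human:~
    ("S2", "affirmation", "affirm"),         # second actant
]

# Index the pairs by noun lemma for constant-time lookup.
NOUN_INDEX = {noun: (pred, verb) for pred, noun, verb in DERIVATION_PAIRS}


def relate_noun(noun):
    """Return (predicate, verb) for a known deverbal noun, else None."""
    return NOUN_INDEX.get(noun.lower())


def same_family(noun, verb):
    """True if the noun is recorded as derived from the given verb."""
    entry = relate_noun(noun)
    return entry is not None and entry[1] == verb.lower()


print(relate_noun("writer"))
print(same_family("acceleration", "accelerate"))
```

Indexing by noun lets a normalization step replace a deverbal noun with its verb plus a predicate label, which is how a nominal and a verbal phrasing can end up with comparable representations.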
2. Deep syntax
• Use of the Comlex lexicon (Grishman et al. 1994) in order to extract the logical subjects/objects of infinitives
  • Example 1: “He ordered Peter to go”
    SUBJ-N(order,he), OBJ-N(order,Peter), SUBJ-N(go,Peter)
  • Example 2: “He promised Peter to go”
    SUBJ-N(promise,he), OBJ-N(promise,Peter), SUBJ-N(go,he)
• Active-passive transformation
• Use of verb class alternations (Levin 1993)
  • Example 3: “Acetone burns easily”
    SUBJ-N(burn,VARIABLE), OBJ-N(burn,acetone)
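Examples 1 and 2 differ only in which argument of the main verb controls the infinitive. The sketch below illustrates that distinction under stated assumptions: the `CONTROL` dictionary is a hypothetical two-entry stand-in for the information taken from Comlex, and `normalize_infinitive` is an illustrative name, not part of XIP.

```python
# Hypothetical control lexicon; in the slides this information comes from Comlex.
# "object": the object of the main verb is the logical subject of the infinitive.
# "subject": the subject of the main verb is the logical subject of the infinitive.
CONTROL = {"order": "object", "promise": "subject"}


def normalize_infinitive(main_verb, subject, obj, infinitive):
    """Return normalized dependencies for 'SUBJECT VERB OBJ to INFINITIVE'."""
    deps = [("SUBJ-N", main_verb, subject), ("OBJ-N", main_verb, obj)]
    control = CONTROL.get(main_verb)
    if control == "object":
        deps.append(("SUBJ-N", infinitive, obj))
    elif control == "subject":
        deps.append(("SUBJ-N", infinitive, subject))
    # Verbs missing from the lexicon get no logical subject for the infinitive.
    return deps


for verb in ("order", "promise"):
    print(normalize_infinitive(verb, "he", "Peter", "go"))
```

With this toy lexicon, "order" yields SUBJ-N(go,Peter) and "promise" yields SUBJ-N(go,he), matching the two examples above. The active-passive and alternation cases are not covered by this sketch.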
About 120 rules exploiting the derivational morphology and deep syntactic resources are necessary for the general normalization grammar.
3. Domain-specific resources
• Hand-made resources, directly encoded as XIP rules
• Creation of specific relations between lexical items (about 30 relations)
  • SYNONYMY relation, e.g. odor-smell
  • HASN relation, e.g. evaporate-volatility
  • TURNTO relation, e.g. evaporate-vapor
  • ISAJ relation, e.g. burn-burnable
• Elaboration of XIP rules exploiting these relations and the normalized syntactic analysis (about 150 rules)
  If ( SUBJ-N(#1[lem:have],#2) & OBJ-N(#1,#3) & HASN(#4,#3) )
  PROPERTY(#2,#4)
• This rule gives equivalent representations to “X has volatility” and “X evaporates”, “X has flammability” and “X burns”, etc.
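The effect of the rule above can be emulated outside XIP as a simple pattern match over dependency triples. This is only a sketch under several assumptions: dependencies are represented as hypothetical `(name, head, dependent)` tuples over lemmas for a single clause, the `HASN` set contains just the two pairs implied by the slide, and `apply_property_rule` is an illustrative name. The slide does not show how the verbal side ("X evaporates") is mapped to PROPERTY, so that side is not modelled here.

```python
# HASN(verb, noun) pairs implied by the slide's examples (hypothetical encoding).
HASN = {("evaporate", "volatility"), ("burn", "flammability")}


def apply_property_rule(deps, hasn=HASN):
    """Derive PROPERTY(#2,#4) from SUBJ-N(have,#2), OBJ-N(have,#3), HASN(#4,#3).

    deps is a set of (name, head, dependent) triples over lemmas.
    """
    derived = set()
    for name1, head1, subj in deps:
        if name1 != "SUBJ-N" or head1 != "have":
            continue
        for name2, head2, obj in deps:
            if name2 != "OBJ-N" or head2 != head1:
                continue
            for verb, noun in hasn:
                if noun == obj:
                    derived.add(("PROPERTY", subj, verb))
    return derived


# "Acetone has volatility"
deps = {("SUBJ-N", "have", "acetone"), ("OBJ-N", "have", "volatility")}
print(apply_property_rule(deps))
```

Because nodes are identified by lemma here, two occurrences of "have" in one sentence would be conflated. XIP's node variables (#1, #2, ...) avoid that problem, which this simplification does not.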