Presentation by Violeta Seretan, in collaboration with Lorenzo Thione and supervised by Martin van den Berg, on decoding the predicate-argument structure of nominalizations. It covers the nominalization problem, the NOMLEX resource, a denominalizer service based on NOMLEX, additional resources from CSLI, and APIs for NOMLEX and the CSLI data. It also reviews related and future work and includes a demo.
Whose presentation is this?
SUBJ(present, Violeta Seretan) (Decoding the predicate-argument structure of nominalizations)
OBL(collaborate, Lorenzo Thione) PP-OBJ(with, Lorenzo Thione)
SUBJ(supervise, Martin van den Berg)
Overview • nominalization problem • NOMLEX resource • Denominalizer service based on NOMLEX • additional resources (CSLI) • APIs for NOMLEX, CSLI • related and future work • demo
Text normalization for QA
• Mark Twain published Adventures of Huckleberry Finn in 1885 in America.
  • Who published H.F.?
  • Where was H.F. published?
  • When was H.F. published?
• QA/NLU needs to deal with a large spectrum of variation in text:
  • morphological: published, publishes
  • syntactic: H.F. was published
  • lexical: {novel, book, masterpiece, work} {publish, write, author, appear}
  • nominalization: the publication
• Normalization (via parsing):
  • base word form: publishes -> publish; published -> publish
  • canonical word order: SUBJ(publish, Mark Twain); OBJ(publish, H.F.)
• Lexical semantic resources:
  • synonyms, hyponyms, hypernyms, …
• What about nominalization?
Nominalization
Since the publication of Huckleberry Finn in 1885, there have been many reactions to the novel, some of them quite extreme.
• When was H.F. published?
Annotated example: publication of Huckleberry Finn -> OBJ(publish, Huckleberry Finn) (diagram labels: deverbal noun, nominalization, matrix verb)
Nominalization: NP having “a systematic correspondence with a clause structure” (Quirk et al. 1985)
Goal: decoding the clause structure
Mapping nominal arguments into verbal roles
• Mark Twain’s publication of his book: possessive determiner + PP adjunct (nominal arguments)
• the book publication by Mark Twain: modifier + PP adjunct (nominal arguments)
Mark Twain (SUBJECT) – publish – book (OBJECT) (verbal roles)
Role ambiguity
Rome’s destruction – SUBJ or OBJ? OBJ(destroy, Rome) or SUBJ(destroy, Rome)
• Rome’s destruction by barbarians: OBJ
• Rome’s destruction of Carthage: SUBJ
Rome’s destruction – OBJ (by default)
John’s admiration – SUBJ (by default)
NOMLEX – NOMinalization LEXicon
• Macleod et al., New York University
• 1,025 deverbal nouns
• detailed mapping from nominal arguments to verb roles
:ORTH "destruction"
:VERB "destroy"
:VERB-SUBC ((NOM-NP :SUBJECT ((N-N-MOD) (DET-POSS) (PP :PVAL ("by")))
                    :OBJECT ((DET-POSS) (N-N-MOD) (PP :PVAL ("of")))
                    :REQUIRED ((OBJECT :DET-POSS-ONLY T :N-N-MOD-ONLY T))))
(diagram labels: role to assign; default role)
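A NOMLEX entry is a Lisp-style property list, so it can be read into nested lists with a few lines of code. The following is a minimal Python sketch, not the Perl or Java code mentioned in this presentation. The helper names (`tokenize`, `parse`, `plist`) are hypothetical, and the outer `(NOM ...)` wrapper is added by assumption, following the "accusation" entry on the NOMLEXML slide.

```python
import re

def tokenize(text):
    # Tokens: "(", ")", a double-quoted string, or a bare symbol such as :ORTH.
    return re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', text)

def parse(tokens):
    """Build nested Python lists, one list per parenthesis pair."""
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok.strip('"'))
    return stack[0]

def plist(items):
    """Turn [':KEY', value, ':KEY2', value2, ...] into a dict."""
    return {items[i]: items[i + 1] for i in range(0, len(items) - 1, 2)}

# The "destruction" entry from the slide, wrapped in (NOM ...) by assumption.
ENTRY = """
(NOM :ORTH "destruction" :VERB "destroy"
     :VERB-SUBC ((NOM-NP :SUBJECT ((N-N-MOD) (DET-POSS) (PP :PVAL ("by")))
                         :OBJECT ((DET-POSS) (N-N-MOD) (PP :PVAL ("of")))
                         :REQUIRED ((OBJECT :DET-POSS-ONLY T :N-N-MOD-ONLY T)))))
"""

entry = parse(tokenize(ENTRY))[0]
fields = plist(entry[1:])          # skip the leading NOM symbol
frame = fields[":VERB-SUBC"][0]    # first (and only) subcategorization frame
roles = plist(frame[1:])           # skip the frame name NOM-NP

print(fields[":ORTH"], "->", fields[":VERB"])
print("SUBJECT positions:", roles[":SUBJECT"])
print("OBJECT positions:", roles[":OBJECT"])
```

Each role maps to the list of nominal positions (possessive determiner, noun modifier, PP with a given preposition) that can realize it.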
NOMLEXML (NOM :ORTH "accusation" :PLURAL "accusations" :PLURAL-FREQ "not rare" :VERB "accuse" :NOUN-SUBC ((NOUN-PP :PVAL ("about"))) :NOM-TYPE ((VERB-NOM)) :VERB-SUBJ ((DET-POSS) (N-N-MOD) (PP :PVAL ("by"))) :SUBJ-ATTRIBUTE ((COMMUNICATOR)) :OBJ-ATTRIBUTE ((COMMUNICATOR)) :VERB-SUBC ((NOM-NP-PP :SUBJECT ((DET-POSS) (N-N-MOD) (PP :PVAL ("by"))) :OBJECT ((PP :PVAL ("against"))) :PVAL ("of")) (NOM-NP :SUBJECT ((DET-POSS) … Perl
NOMLEX API in Java
• com.fxpal.sake.test (NomLexInterface)
• com.fxpal.ltng.services.normalization.noun.nomlex (NomLex, NomLexEntry, NomLexClassConstants, Subcat)
How useful? Oracle acquired PeopleSoft at the end of last year. Oracle’s acquisition of PeopleSoft at the end of last year… Google hits, 10/25/2005: More hits:
Argument-role mapping
Oracle's acquisition of PeopleSoft: possessive (Oracle's) + PP (of PeopleSoft)
SUBJ(acquire, Oracle) OBJ(acquire, PeopleSoft)
:ORTH "acquisition" :VERB "acquire" :VERB-SUBC ((NOM-NP :SUBJECT ((DET-POSS) (N-N-MOD) (PP :PVAL ("by"))) :OBJECT ((N-N-MOD) (PP :PVAL ("of"))))
Denominalizer
• Input: sentence
• Output: for each nominalization, pairs of nominal argument and verb role: (noun, (argument - role)*)*
Examples:
• Oracle's acquisition of PeopleSoft finally materialized after an 18-month struggle between the two companies. (acquisition, (Oracle - SUBJECT) (PeopleSoft - OBJECT))
• Oracle acquisition finally materialized. (acquisition, (Oracle - SUBJECT) (Oracle - OBJECT))
Algorithm (com.fxpal.ltng.services.normalization.noun.*)
parse sentence
for each deverbal noun
  get noun arguments
  for each NOMLEX entry for noun
    for each subcat of the entry
      1. match arguments against subcat
      2. filter assignment results
  select a subcat
  output assignments for selected subcat
Note: overlapping nominalizations are fine, e.g. an increase in product sales
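The loop structure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Java service itself. Parsing is skipped and the noun arguments are given directly. `denominalize`, `toy_match` and `unique_roles` are hypothetical names. The slide does not say how a subcat is selected, so the rule used here (prefer the assignment that covers the most arguments) is an assumption.

```python
def denominalize(nominals, lexicon, match, keep):
    """nominals: {noun: [(argument, position), ...]}, as read off a parse.
    lexicon:  {noun: [entry, ...]}, each entry being a list of subcat frames.
    match(arguments, subcat) -> list of candidate {argument: role} dicts.
    keep(assignment, subcat) -> True if the assignment survives filtering."""
    output = []
    for noun, arguments in nominals.items():
        candidates = []
        for entry in lexicon.get(noun, []):
            for subcat in entry:
                # Step 1 (match) followed by step 2 (filter).
                candidates.extend(
                    a for a in match(arguments, subcat) if keep(a, subcat)
                )
        if candidates:
            # Assumed selection rule: the assignment covering most arguments.
            best = max(candidates, key=len)
            output.append((noun, sorted(best.items())))
    return output

def toy_match(arguments, subcat):
    # Toy matcher: give each argument the first role that licenses its position.
    assignment = {}
    for arg, position in arguments:
        for role, positions in subcat.items():
            if position in positions:
                assignment[arg] = role
                break
    return [assignment] if assignment else []

def unique_roles(assignment, subcat):
    # Toy filter: no verbal role may be filled twice.
    roles = list(assignment.values())
    return len(set(roles)) == len(roles)

# "Oracle's acquisition of PeopleSoft", with positions written as plain strings.
SUBCAT = {"SUBJECT": ["DET-POSS", "N-N-MOD", "PP-by"],
          "OBJECT": ["N-N-MOD", "PP-of"]}
lexicon = {"acquisition": [[SUBCAT]]}
nominals = {"acquisition": [("Oracle", "DET-POSS"), ("PeopleSoft", "PP-of")]}

result = denominalize(nominals, lexicon, toy_match, unique_roles)
print(result)
```

Passing `match` and `keep` as functions keeps the loop independent of how matching and filtering are actually implemented.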
1. Matching
Oracle's acquisition of PeopleSoft finally materialized.
Arguments (acquisition): POSS(acquisition, Oracle) ADJUNCT(acquisition, of) PP-OBJ(of, PeopleSoft)
NOM-NP :SUBJECT ((DET-POSS) (N-N-MOD) (PP :PVAL ("by"))) :OBJECT ((N-N-MOD) (PP :PVAL ("of")))
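The matching step can be sketched as a lookup from each argument's syntactic position to the roles that license it. The Python below is a minimal illustration. The function names and the encoding of positions as `(kind, preposition)` tuples are assumptions, as is the mapping from parser relations (POSS, MOD, ADJUNCT + PP-OBJ) to NOMLEX positions.

```python
# The NOM-NP frame for "acquisition", with positions as (kind, preposition).
NOM_NP = {
    "SUBJECT": [("DET-POSS", None), ("N-N-MOD", None), ("PP", "by")],
    "OBJECT": [("N-N-MOD", None), ("PP", "of")],
}

def positions(noun, deps):
    """Translate parser triples (relation, head, dependent) into
    (argument, NOMLEX position) pairs for the given noun."""
    found = []
    for rel, head, dep in deps:
        if head != noun:
            continue
        if rel == "POSS":
            found.append((dep, ("DET-POSS", None)))
        elif rel == "MOD":
            found.append((dep, ("N-N-MOD", None)))
        elif rel == "ADJUNCT":
            # The adjunct is a preposition; its PP-OBJ is the real argument.
            for rel2, head2, dep2 in deps:
                if rel2 == "PP-OBJ" and head2 == dep:
                    found.append((dep2, ("PP", dep)))
    return found

def alternatives(noun, deps, subcat):
    """For each argument, list every role whose positions include its own."""
    return {
        arg: [role for role, allowed in subcat.items() if pos in allowed]
        for arg, pos in positions(noun, deps)
    }

# "Oracle's acquisition of PeopleSoft"
deps_pp = [("POSS", "acquisition", "Oracle"),
           ("ADJUNCT", "acquisition", "of"),
           ("PP-OBJ", "of", "PeopleSoft")]
alts_pp = alternatives("acquisition", deps_pp, NOM_NP)

# "Oracle's PeopleSoft acquisition"
deps_mod = [("POSS", "acquisition", "Oracle"),
            ("MOD", "acquisition", "PeopleSoft")]
alts_mod = alternatives("acquisition", deps_mod, NOM_NP)

print(alts_pp)
print(alts_mod)
```

With the PP form each argument gets a single role. With the noun-modifier form PeopleSoft stays ambiguous between SUBJECT and OBJECT, which is what the filtering step has to resolve.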
2. Filtering
Oracle's PeopleSoft acquisition finally materialized.
Arguments (acquisition): POSS(acquisition, Oracle) MOD(acquisition, PeopleSoft)
NOM-NP :SUBJECT ((DET-POSS) (N-N-MOD) (PP :PVAL ("by"))) :OBJECT ((N-N-MOD) (PP :PVAL ("of")))
Alternatives: Oracle: SUBJECT; PeopleSoft: SUBJECT, OBJECT
NOMLEX constraints (1)
• Uniqueness Constraint: A verbal role may be filled only once.
Oracle's PeopleSoft acquisition
Matching alternatives: Oracle: SUBJECT; PeopleSoft: SUBJECT, OBJECT
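One way to apply the uniqueness constraint is to enumerate every combination of the matching alternatives and discard those that fill a role twice. The sketch below is illustrative only, and `unique_assignments` is a hypothetical name.

```python
from itertools import product

def unique_assignments(alternatives):
    """alternatives: {argument: [candidate roles]}.
    Returns every full assignment in which no role is used twice."""
    args = list(alternatives)
    kept = []
    for roles in product(*(alternatives[a] for a in args)):
        if len(set(roles)) == len(roles):
            kept.append(dict(zip(args, roles)))
    return kept

# "Oracle's PeopleSoft acquisition"
alts = {"Oracle": ["SUBJECT"], "PeopleSoft": ["SUBJECT", "OBJECT"]}
survivors = unique_assignments(alts)
print(survivors)
```

Since Oracle can only be SUBJECT, the combination SUBJECT/SUBJECT is dropped and PeopleSoft is left as OBJECT.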
NOMLEX constraints (2)
• Ordering Constraint: If there are multiple pre-nominal arguments, they must appear in the order: SUBJECT, INDIRECT OBJECT, DIRECT OBJECT, OBLIQUE.
FX’s printer sales grew by 50%.
Matching alternatives: FX: SUBJECT, OBJECT; printer: SUBJECT, OBJECT
order: FX, printer
verbal roles: SUBJECT, OBJECT
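The ordering constraint can be checked by ranking the roles and requiring the ranks of the pre-nominal arguments to increase from left to right. This is a minimal sketch. `ROLE_RANK` and `respects_order` are hypothetical names, and OBJECT is taken to stand for the direct object.

```python
# Rank of each role in the required pre-nominal order.
# OBJECT is treated here as the direct object (an assumption about labels).
ROLE_RANK = {"SUBJECT": 0, "INDIRECT OBJECT": 1, "OBJECT": 2, "OBLIQUE": 3}

def respects_order(prenominal_roles):
    """prenominal_roles: roles of the pre-nominal arguments, left to right."""
    ranks = [ROLE_RANK[role] for role in prenominal_roles]
    return all(a < b for a, b in zip(ranks, ranks[1:]))

# "FX's printer sales": the two assignments left after the uniqueness check,
# written as (role of FX, role of printer).
candidates = [("SUBJECT", "OBJECT"), ("OBJECT", "SUBJECT")]
kept = [c for c in candidates if respects_order(c)]
print(kept)
```

Only the reading with FX as SUBJECT and printer as OBJECT follows the required order.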
NOMLEX constraints (3)
• Obligatoriness Constraint: By default, the subject and object are optional. A NOMLEX entry can specify obligatory roles to be filled.
circulation - REQUIRED (SUBJECT)
blood circulation: SUBJ(circulate, blood)
destruction - REQUIRED ((OBJECT :DET-POSS-ONLY T :N-N-MOD-ONLY T))
Rome’s destruction: OBJ(destroy, Rome)
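A simple reading of the obligatoriness constraint is to keep only the assignments in which every required role is filled. The Python sketch below follows that reading. It ignores the :DET-POSS-ONLY and :N-N-MOD-ONLY flags, and `apply_required` is a hypothetical name, so treat it as an approximation of the behavior shown in the slide's examples.

```python
def apply_required(assignments, required):
    """Keep the assignments that fill every role listed in `required`."""
    return [a for a in assignments
            if all(role in a.values() for role in required)]

# "Rome's destruction": a possessive can be SUBJECT or OBJECT of "destroy".
lone_possessive = [{"Rome": "SUBJECT"}, {"Rome": "OBJECT"}]
default_reading = apply_required(lone_possessive, ["OBJECT"])

# "Rome's destruction of Carthage": the of-PP already fills OBJECT.
with_of_pp = [{"Rome": "SUBJECT", "Carthage": "OBJECT"}]
full_reading = apply_required(with_of_pp, ["OBJECT"])

print(default_reading)
print(full_reading)
```

When Rome is the only argument it must take the required OBJECT role. When Carthage fills OBJECT, Rome is free to be SUBJECT.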
Selectional Restrictions com.fxpal.ltng.services.normalization.noun.csli (Nouns, Verbs, NounsVerbs)
Applying selectional restrictions
• room reservation
  Alternatives: room - SUBJECT, OBJECT
  reserve - selectional restrictions: SUBJECT: sentient; OBJECT: *
  room - location, physobj
• semantic types for about 5000 N
• selectional restrictions for about 5000 V
  459/941 verbs from NOMLEX (48.78%)
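The "room reservation" example can be sketched as a filter over the candidate roles. The dictionaries below hold only the data shown on the slide. The function name `allowed_roles` and the flat type comparison (no ontology hierarchy) are assumptions, and the actual CSLI API may work differently.

```python
# Data from the slide: semantic types of the noun, restrictions of the verb.
NOUN_TYPES = {"room": {"location", "physobj"}}
VERB_RESTRICTIONS = {"reserve": {"SUBJECT": "sentient", "OBJECT": "*"}}

def allowed_roles(noun, verb, candidate_roles):
    """Drop the roles whose selectional restriction the noun cannot satisfy."""
    types = NOUN_TYPES.get(noun)
    restrictions = VERB_RESTRICTIONS.get(verb)
    if types is None or restrictions is None:
        # No data for this noun or verb: leave the alternatives untouched.
        return list(candidate_roles)
    return [role for role in candidate_roles
            if restrictions.get(role, "*") == "*" or restrictions[role] in types]

roles = allowed_roles("room", "reserve", ["SUBJECT", "OBJECT"])
print(roles)
```

A room is not sentient, so it cannot be the subject of "reserve" and only the OBJECT reading remains.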
Coverage extension
• What if a noun is not in NOMLEX?
• additional deverbal nouns in the CSLI data
  4,087 “event nouns”: 3,348 new, 739 already in NOMLEX
  3,348/1,025: about 327% more data
• NOMLEX template: NOM-NP :SUBJECT ((DET-POSS) (N-N-MOD) (PP :PVAL ("by"))) :OBJECT ((DET-POSS) (N-N-MOD) (PP :PVAL ("of")))
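The fallback can be sketched as a lookup that first tries NOMLEX and then applies the generic template to event nouns known only from the CSLI data. The code is a minimal Python illustration. `subcats_for` is a hypothetical name, and the noun "takeover" is a made-up example of an event noun outside NOMLEX.

```python
# Generic NOM-NP template from the slide, positions as (kind, preposition).
TEMPLATE = {
    "SUBJECT": [("DET-POSS", None), ("N-N-MOD", None), ("PP", "by")],
    "OBJECT": [("DET-POSS", None), ("N-N-MOD", None), ("PP", "of")],
}

def subcats_for(noun, nomlex, event_nouns):
    """Return the NOMLEX frames for a noun, or the template as a fallback."""
    if noun in nomlex:
        return nomlex[noun]
    if noun in event_nouns:
        return [TEMPLATE]
    return []  # not recognized as a deverbal noun at all

nomlex = {"acquisition": [{"SUBJECT": [("DET-POSS", None)],
                           "OBJECT": [("PP", "of")]}]}
event_nouns = {"takeover"}  # hypothetical CSLI-only event noun

fallback = subcats_for("takeover", nomlex, event_nouns)

# Coverage figures from the slide.
new_nouns = 4087 - 739
extra_percent = round(100 * new_nouns / 1025, 1)
print(new_nouns, extra_percent)
```

The template is more permissive than a real entry (a possessive may be SUBJECT or OBJECT), so nouns handled this way depend more heavily on the filtering constraints.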
Future work
• extensive test and evaluation
• other nominalization data
  • deverbal noun recognition
  • mapping information (FrameNet)
• other lexical resources
  • PropBank – semantic roles
  • VerbLex – selectional restrictions
• role assignment in context
  • word sense disambiguation, anaphora, discourse
• collocations
  the author will make no accusation: SUBJ(make, author) -> SUBJ(accuse, author)
Related work • PUNDIT system (Dahl et al., 1987) • SNOWY QA system (Hull and Gomez 1996) • NOMLEX for IE (Meyers et al., 1998) • N-N interpretation (Lapata 2002, Girju et al. 2004)
References
• Dahl, Deborah A., Martha S. Palmer, and Rebecca J. Passonneau. 1987. Nominalizations in PUNDIT. In Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, Stanford, CA.
• Girju, Roxana, Ana-Maria Giuglea, Marian Olteanu, Ovidiu Fortu, Orest Bolohan, and Dan Moldovan. 2004. Support vector machines applied to the classification of semantic relations in nominalized noun phrases. In Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics.
• Hull, Richard, and Fernando Gomez. 1996. Semantic interpretation of nominalizations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, Portland, Oregon, August 1996, pp. 1062-8.
• Lapata, Maria. 2002. The disambiguation of nominalisations. Computational Linguistics 28(3), 357-388.
• Macleod, Catherine, Ralph Grishman, Adam Meyers, Leslie Barrett, and Ruth Reeves. 1998. Nomlex: A lexicon of nominalizations. In Proceedings of the 8th International Congress of the European Association for Lexicography, pages 187–193, Liège, Belgium.
• Meyers, A., et al. 1998. Using NOMLEX to produce nominalization patterns for information extraction. In Proceedings of the COLING-ACL Workshop on Computational Treatment of Nominals.
• Quirk, R., S. Greenbaum, G. Leech, and J. Svartvik. 1985. A Comprehensive Grammar of the English Language. Longman, Harlow.
• Terada, Akira, and Takenobu Tokunaga. 2003. Corpus based method of transforming nominalized phrases into clauses for text mining application. IEICE Transactions on Information and Systems, Vol. E86-D, No. 9, pp. 1736-1744.
Selectional restrictions data
• CSLI resource:
  • nouns (4447)
    • semantic types (ontology)
  • verbs (4858)
    • subcategorizations
    • selectional restrictions
  • noun-verb: 5700 V (9415 N)
    • noun-verb pairs
FrameNet
• aim: word – semantico-syntactic mapping
• semantic roles: frame elements (frame-specific)
• BNC corpus (100M words); American English – LDC, ANC
• more than 600 frames, about 9,000 words
Example: accusation
frame: Judgment_communication
FE (for this word) and their realization:
NOMLEX constraints (4)
• restrictions on possible combinations
• specified in NOMLEX entry
adaptation :NOT ((AND :SUBJECT ((DET-POSS) (N-N-MOD)) :OBJECT ((N-N-MOD))
*plants’ weather adaptation
plants’ adaptation to weather
Note: Not implemented (cannot decide which assignment to remove).
Denominalizer UI com.fxpal.sake.test.DenominalizerTest parse triples output