Using Predicate-Argument Structure for Topic- and Event-based Distillation. Elizabeth Boschee, Michael Levit*, Marjorie Freedman, BBN Technologies. *Now affiliated with ICSI.
Outline • Introduction • Approach • Proposition Trees • Generation • Augmentation • Scoring • Usage • Conclusions and Future Work
Introduction: Distillation Templates • Distillation operates over queries formulated according to fixed “templates”, for example: • Describe the prosecution of [PERSON] for [CRIME] • List facts about [EVENT] • Find statements about [EVENT/TOPIC] made by [PERSON] • Describe reaction of [COUNTRY] to [EVENT/TOPIC] • Etc. • To answer templated queries, a system must be able to decide whether text contains a reference to a query argument • For entity-like arguments (e.g. [PERSON]), standard extraction/coreference techniques are effective • Identifying references to topics/events (formulated in natural language) requires a different approach
Introduction: Sample Distillation Query • Distillation query: List facts about [the widespread looting of Iraqi museums after the US invasion]. • On-topic information can be conveyed without significant overlap with the query terms • Many works of art were stolen from Baghdad galleries in 2003 • Presence of query terms does not guarantee on-topic information • Iraqi museums presented an exhibit on looting in Afghanistan after the US invasion • Some words are more important than others • “looting” vs. “widespread” • Some phrases are more important than others • “Iraqi museums” vs. “after the US invasion” • Approach: Use query predicate-argument structure and extraction information to perform accurate topic identification
Approach • Represent logical structure of query argument and candidate responses using “proposition trees” • Augment proposition trees using additional synonym and extraction information • Define similarity metric over proposition trees • Account for structural transformations, relative importance of query terms, paraphrases and re-wordings, and omissions and additions • Use similarity metric to • Measure relevance of candidate responses to query • Identify redundancy between two candidate responses • This strategy was used in our GALE Year 1 evaluation system and achieved excellent results compared to human participants.
Proposition Tree Generation • Syntactic parser generates parse tree for each sentence • Rule-based proposition finder transforms parses into simple predicate-argument propositions • Each proposition consists of a noun/verb predicate and zero or more arguments, each with a “role” label • Identifies logical subject and logical object as roles • Prepositional roles are not resolved/analyzed (role label remains “of”, “in”, “at”, etc.) • This process includes trace resolution • A set of proposition trees is created for each sentence • Nodes are either predicates or arguments • Branches are “roles” (e.g. “subject” or “of”) • Example: “his arrest in Baghdad” [Figure: proptree rooted at “arrest”, with branch “possessive” to “his” and branch “in” to “Baghdad”]
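The tree described on this slide can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class name `PropNode` and its methods are hypothetical, and the tree is built by hand rather than derived from a parser.

```python
from dataclasses import dataclass, field


@dataclass
class PropNode:
    """One node of a proposition tree: a predicate or an argument (hypothetical class)."""
    word: str
    # Each branch is a (role, child) pair, e.g. ("in", PropNode("Baghdad")).
    children: list = field(default_factory=list)

    def add(self, role, child):
        """Attach a child under the given role label and return self for chaining."""
        self.children.append((role, child))
        return self

    def render(self, indent=0):
        """Return the tree as a list of indented text lines."""
        lines = [" " * indent + self.word]
        for role, child in self.children:
            lines.append(" " * (indent + 2) + f"<{role}>")
            lines.extend(child.render(indent + 4))
        return lines


# Hand-built tree for the slide's example, "his arrest in Baghdad".
tree = (PropNode("arrest")
        .add("possessive", PropNode("his"))
        .add("in", PropNode("Baghdad")))
print("\n".join(tree.render()))
```

The prepositional role is stored as the literal string “in”, mirroring the slide's point that prepositional roles are left unresolved.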
Proposition Tree Augmentation • Automatically supplement document proptrees with names or descriptors obtained through in-document coreference • Example: “his arrest in Baghdad” [Figure: the proptree “arrest” (possessive: his; in: Baghdad) before augmentation, and after augmentation with “his” expanded to (Abu Abbas, Abbas, the PLF leader) and “Baghdad” expanded to (the Iraqi capital)]
Proposition Tree Augmentation • Automatically supplement all proptrees with synonyms • WordNet • Nominalization table • BBN “equivalent name” algorithm • Misspellings • Transliteration variants • Aliases • Acronyms [Figure: the coreference-augmented proptree (his: Abu Abbas, Abbas, the PLF leader; Baghdad: the Iraqi capital) and the same tree after synonym augmentation: arrest (capture, apprehend, apprehension, detain); possessive: his (Abu Abbas, Abbas, the PLF leader, Mohammed Abbas, Abul Abbas); in: Baghdad (the Iraqi capital, Baghhdad, Bagadad, Baghdag, Bagdad)]
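The two augmentation passes can be sketched as follows. This is an illustration under stated assumptions: the function `augment` is hypothetical, and the `coref` and `synonyms` dictionaries are hand-written stand-ins for the coreference output, WordNet, the nominalization table, and the “equivalent name” algorithm, filled with the values shown in the slide's figure.

```python
def augment(tree, coref, synonyms):
    """Return a copy of a (word, branches) tree with a sorted list of alternates per node.

    Hypothetical helper: coref and synonyms are plain lookup tables here.
    """
    word, branches = tree
    alternates = set(coref.get(word, ())) | set(synonyms.get(word, ()))
    # Alternates obtained through coreference receive synonyms/equivalent names too.
    for alt in list(alternates):
        alternates |= set(synonyms.get(alt, ()))
    return (word, sorted(alternates),
            [(role, augment(child, coref, synonyms)) for role, child in branches])


# Stand-in tables using the values from the slide's example.
coref = {"his": ["Abu Abbas", "Abbas", "the PLF leader"],
         "Baghdad": ["the Iraqi capital"]}
synonyms = {"arrest": ["capture", "apprehend", "apprehension", "detain"],
            "Abu Abbas": ["Mohammed Abbas", "Abul Abbas"],
            "Baghdad": ["Baghhdad", "Bagadad", "Baghdag", "Bagdad"]}

tree = ("arrest", [("possessive", ("his", [])), ("in", ("Baghdad", []))])
augmented = augment(tree, coref, synonyms)
for role, (word, alternates, _) in augmented[2]:
    print(role, word, alternates)
```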
Proposition Tree Scoring • Two proptrees are similar if one can be transformed into the other at minimal cost • “his arrest” → “Abbas was captured” • Substitution (synonym): arrest → captured • Substitution (label): possessive → object • Substitution (coreference): his → Abbas • Match score for these two trees: ~80% • Despite zero percent token overlap [Figure: proptree “arrest” with branch “possessive” to “his”, beside proptree “captured” with branch “object” to “Abbas”]
Scoring: Cost Structure • Different tree transformations have different costs • The cost for replacing a word with its synonym is based on the estimated reliability of the synonym • “United Nations” → “UN” is very reliable • “plant” → “works” is less reliable • Certain role substitutions are more costly than others • Changing the role from “in” to “premodifier” is cheap • “the plant in Cernavoda” → “the Cernavoda plant” • Changing the role from “in” to “by” is expensive • “the attack in Iraq” → “the attack by Iraq” • The cost for omitting a word/phrase increases the closer the word/phrase is to the root of the proptree • Because it causes a larger subtree to be omitted • In “the shutdown of the Cernavoda nuclear plant”, “nuclear” can be omitted more easily than “plant”
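One way to picture a cost-weighted match is the sketch below. It is not the authors' scorer: the reliability numbers are invented for illustration (the slides do not give the actual costs), all function names are hypothetical, and child alignment is simplified to taking the best candidate branch for each query branch. Trees are `(word, [(role, subtree), ...])` tuples.

```python
# Illustrative reliabilities only; the source does not publish its cost values.
WORD_RELIABILITY = {("arrest", "captured"): 0.9,    # synonym substitution
                    ("his", "Abbas"): 0.9}          # coreference substitution
ROLE_RELIABILITY = {("possessive", "object"): 0.8,  # assumed fairly cheap
                    ("in", "premodifier"): 0.9,     # cheap relabeling
                    ("in", "by"): 0.1}              # expensive relabeling


def word_sim(q, c):
    return 1.0 if q == c else WORD_RELIABILITY.get((q, c), 0.0)


def role_sim(q, c):
    return 1.0 if q == c else ROLE_RELIABILITY.get((q, c), 0.0)


def size(tree):
    return 1 + sum(size(child) for _, child in tree[1])


def credit(query, cand):
    """Credit, in query-node units, earned by aligning query onto cand."""
    root = word_sim(query[0], cand[0])
    if root == 0.0:
        return 0.0  # an unmatched node forfeits its whole subtree
    total = root
    for q_role, q_child in query[1]:
        total += max((role_sim(q_role, c_role) * credit(q_child, c_child)
                      for c_role, c_child in cand[1]), default=0.0)
    return total


def match_score(query, cand):
    return credit(query, cand) / size(query)


his_arrest = ("arrest", [("possessive", ("his", []))])
abbas_captured = ("captured", [("object", ("Abbas", []))])
plant_in = ("plant", [("in", ("Cernavoda", []))])
premod_plant = ("plant", [("premodifier", ("Cernavoda", []))])
attack_in = ("attack", [("in", ("Iraq", []))])
attack_by = ("attack", [("by", ("Iraq", []))])

print(match_score(his_arrest, abbas_captured))
print(match_score(plant_in, premod_plant), match_score(attack_in, attack_by))
```

Because an unmatched node contributes nothing for itself or anything beneath it, an omission near the root costs more than one at a leaf, which matches the slide's remark about “nuclear” versus “plant”.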
Scoring: Additions/Omissions • Matching of proptrees is actually non-symmetric • Additions are free; omissions are not • “The shutdown of the Cernavoda nuclear plant by the authorities” is a perfect match to the query argument • “The shutdown of the plant” is not • When comparing query topic to candidate response, use only one direction of similarity • When comparing two candidate responses for redundancy, look at both • Names can only be omitted if they appear somewhere else nearby in the document (i.e. are still in focus) • Eliminates matches to “the shutdown of the nuclear plant” when the document is about Chernobyl rather than Cernavoda
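The asymmetry can be shown with a deliberately simplified matcher. This sketch assumes exact word and role matches only (no synonym or coreference costs) and does not model the rule about names staying in focus; the function names `directed` and `redundancy` are hypothetical.

```python
def size(tree):
    return 1 + sum(size(child) for _, child in tree[1])


def covered(query, cand):
    """Count query nodes that are reproduced in cand when the roots are aligned."""
    if query[0] != cand[0]:
        return 0
    total = 1
    for role, q_child in query[1]:
        total += max((covered(q_child, c_child)
                      for c_role, c_child in cand[1] if c_role == role), default=0)
    return total


def directed(query, cand):
    """One-directional similarity: extra material in cand is free, missing material is not."""
    return covered(query, cand) / size(query)


def redundancy(a, b):
    """For redundancy between two responses, report both directions."""
    return directed(a, b), directed(b, a)


plant = ("plant", [("premod", ("nuclear", [])), ("premod", ("Cernavoda", []))])
query = ("shutdown", [("of", plant)])
with_addition = ("shutdown", [("of", plant), ("by", ("authorities", []))])
with_omission = ("shutdown", [("of", ("plant", []))])

print(directed(query, with_addition))
print(directed(query, with_omission))
print(redundancy(query, with_omission))
```

The candidate with the added “by the authorities” still scores as a perfect match to the query, while “the shutdown of the plant” does not; scored in the opposite direction, the shorter phrase is fully contained in the query.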
Scoring: Examples • QUERY: “The arrest of Abu Abbas in Iraq” • The US arrest of Palestinian hardline leader Abu Abbas in Baghdad (0.872619) • The capture of Abu Abbas in Iraq (0.8) • Abbas' capture in Iraq by U.S. military forces (0.739286) • the exile Palestinian radical leader who was arrested near Baghdad in Iraq (0.733333) • Abu Abbas was arrested by US troops near the Iraqi capital of Baghdad. (0.686905) • US troops on Tuesday captured Abu Abbas (0.615476)
Scoring Variations: Subtree • Subtree similarity: • To measure “subtree” similarity of proptree A to proptree B, break proptree A into a set of weighted subtrees • Subtree weight based on size and position within original tree • Score each subtree with respect to proptree B; calculate weighted sum of subtree scores • Gives relatively high score to “The Cernavoda plant was quiet after the shutdown of the plant and all its operations” [Figure: proptree “shutdown” (of: plant; plant premod: nuclear; plant premod: Cernavoda) and its decomposition into subtrees]
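A rough sketch of subtree similarity follows. Several choices here are assumptions rather than details from the slides: the subtrees are taken to be the full subtree below each node of A, the weight is the subtree's size only (position is ignored for brevity), matching is exact-word and exact-role, and the proptree for the example sentence is a hand-written guess at its parse.

```python
def size(tree):
    return 1 + sum(size(child) for _, child in tree[1])


def nodes(tree):
    """Yield every node of the tree as the subtree rooted at that node."""
    yield tree
    for _, child in tree[1]:
        yield from nodes(child)


def covered(a, b):
    """Count nodes of a reproduced in b when the two roots are aligned."""
    if a[0] != b[0]:
        return 0
    total = 1
    for role, a_child in a[1]:
        total += max((covered(a_child, b_child)
                      for b_role, b_child in b[1] if b_role == role), default=0)
    return total


def best_match(sub, b):
    """Score a subtree against the best-matching position anywhere in b."""
    return max(covered(sub, n) for n in nodes(b)) / size(sub)


def subtree_similarity(a, b):
    subs = list(nodes(a))
    total_weight = sum(size(s) for s in subs)  # weight = subtree size (assumption)
    return sum(size(s) * best_match(s, b) for s in subs) / total_weight


# A: "the shutdown of the Cernavoda nuclear plant"
a = ("shutdown", [("of", ("plant", [("premod", ("nuclear", [])),
                                    ("premod", ("Cernavoda", []))]))])
# B: hand-written guess at a proptree for "The Cernavoda plant was quiet after
# the shutdown of the plant and all its operations".
b = ("quiet", [("subject", ("plant", [("premod", ("Cernavoda", []))])),
               ("after", ("shutdown", [("of", ("plant", [])),
                                       ("of", ("operations", []))]))])

print(subtree_similarity(a, b))
```

The pieces “shutdown of plant” and “Cernavoda plant” are each found somewhere in B even though B never states them as one connected structure, so the weighted sum stays well above zero.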
Scoring Variations: Node • Node similarity: • To measure “node” similarity of proptree A to proptree B, break proptree A into a set of weighted nodes • Ignore tree structure and role labels [Figure: proptree “shutdown” (of: plant; plant premod: nuclear; plant premod: Cernavoda) reduced to the node set shutdown, plant, nuclear, Cernavoda] • Score each node with respect to proptree B; calculate weighted sum of node scores • Still uses cost structure for synonyms, coreference, etc. • Gives high score to “In Cernavoda, the plant was quiet after the shutdown of nuclear operations”
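Node similarity can be sketched as a weighted bag-of-nodes comparison. The weighting `1 / (1 + depth)` is an assumption made for this example (the slides only say the nodes are weighted), the function names are hypothetical, and the reliability value for the synonym pair is invented.

```python
def weighted_nodes(tree, depth=0):
    """Yield (word, weight) for every node; the 1/(1+depth) weight is an assumption."""
    word, branches = tree
    yield word, 1.0 / (1 + depth)
    for _role, child in branches:  # role labels are deliberately ignored
        yield from weighted_nodes(child, depth + 1)


def node_similarity(a, b_words, reliability):
    """Weighted share of A's nodes that appear, exactly or via a synonym, among B's words."""
    earned = total = 0.0
    for word, weight in weighted_nodes(a):
        best = max((1.0 if word == w else reliability.get((word, w), 0.0)
                    for w in b_words), default=0.0)
        earned += weight * best
        total += weight
    return earned / total


# A: "the shutdown of the Cernavoda nuclear plant"
a = ("shutdown", [("of", ("plant", [("premod", ("nuclear", [])),
                                    ("premod", ("Cernavoda", []))]))])
# Content words of "In Cernavoda, the plant was quiet after the shutdown of
# nuclear operations", with all structure discarded.
sentence_words = {"Cernavoda", "plant", "quiet", "shutdown", "nuclear", "operations"}
reliability = {("shutdown", "closure"): 0.9}  # invented synonym reliability

print(node_similarity(a, sentence_words, reliability))
print(node_similarity(a, {"closure", "plant"}, reliability))
```

Every node of A occurs in the sentence, so the score is high even though “nuclear” modifies “operations” there rather than “plant”; that tolerance is what distinguishes this variation from the full-tree match.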
Proposition Trees in Answer Finding • In Year 1, proposition tree matching was used primarily in answer selection patterns • A pattern might specify that the candidate answer must have at least 80% similarity with the query event/topic argument • Desired similarity can be specified using any or all of the three variations of proposition tree similarity (full, subtree, and node) • For instance, a pattern might specify that either a minimum 60% full tree match or a minimum 90% node match is acceptable • Similarity score can also optionally consider local context • Similarity score for a sentence becomes a smoothed combination of the raw scores of nearby sentences
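The threshold patterns and the context smoothing might look like the sketch below. The pattern representation, the function names, and the smoothing weights (half from the sentence itself, a quarter from each neighbor) are all assumptions; the slides state only that thresholds can be combined and that nearby sentences are smoothed in.

```python
def accepts(scores, pattern):
    """Accept a candidate if any alternative in the pattern meets its minimum.

    scores:  similarity per variation, e.g. {"full": 0.55, "node": 0.93}
    pattern: minimum per variation,    e.g. {"full": 0.60, "node": 0.90}
    """
    return any(scores.get(name, 0.0) >= minimum for name, minimum in pattern.items())


def smooth(raw, neighbor_weight=0.25):
    """Blend each sentence's raw score with its neighbors (weights are an assumption)."""
    smoothed = []
    for i, score in enumerate(raw):
        left = raw[i - 1] if i > 0 else score
        right = raw[i + 1] if i + 1 < len(raw) else score
        smoothed.append((1 - 2 * neighbor_weight) * score
                        + neighbor_weight * (left + right))
    return smoothed


# "Either minimum 60% full tree match or minimum 90% node match is acceptable."
pattern = {"full": 0.60, "node": 0.90}
print(accepts({"full": 0.55, "node": 0.93}, pattern))
print(accepts({"full": 0.55, "node": 0.70}, pattern))
print(smooth([0.0, 1.0, 0.0]))
```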
Proposition Trees in Redundancy Detection • Proptree matching is also used in redundancy detection • E.g. “the plant was closed” is found to be redundant with “the shutdown of the plant” • Expected to be very useful for redundancy in Year 2 • Redundancy between specific nuggets of information must be identified and removed • “The plant was closed in August” and “the shutdown of the plant due to drought” will be partially redundant • Pinpoint identification of redundancy will allow for combination into: “The plant was closed in August due to drought”
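The combination step is described only as a goal for Year 2, so the following is a speculative sketch of how two partially redundant proptrees could be merged, not a description of the system. The equivalence tables and the function names `same` and `merge` are hypothetical, and the merge looks only at the top-level branches.

```python
# Hypothetical equivalence tables standing in for synonym and role-substitution knowledge.
EQUIVALENT_WORDS = {frozenset({"closed", "shutdown"})}
EQUIVALENT_ROLES = {frozenset({"object", "of"})}


def same(x, y, table):
    return x == y or frozenset({x, y}) in table


def merge(a, b):
    """Union the branches of two trees whose roots are judged equivalent; None otherwise."""
    if not same(a[0], b[0], EQUIVALENT_WORDS):
        return None
    merged = list(a[1])
    for b_role, b_child in b[1]:
        duplicate = any(same(a_role, b_role, EQUIVALENT_ROLES)
                        and same(a_child[0], b_child[0], EQUIVALENT_WORDS)
                        for a_role, a_child in a[1])
        if not duplicate:
            merged.append((b_role, b_child))  # keep only the non-redundant nugget
    return (a[0], merged)


# "The plant was closed in August"
closed = ("closed", [("object", ("plant", [])), ("in", ("August", []))])
# "the shutdown of the plant due to drought"
shutdown = ("shutdown", [("of", ("plant", [])), ("due to", ("drought", []))])

print(merge(closed, shutdown))
```

The shared “plant” argument is recognized as redundant and kept once, while “in August” and “due to drought” both survive, which corresponds to the combined statement given on the slide.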
Conclusions and Future Work • Proposition trees provide an effective way to • Identify appropriate response nuggets in text • Identify and remove redundancy among responses • Strategy used successfully in Year 1 GNG evaluation • Future work: • Better proposition tree augmentation from OntoNotes • Coreference for all noun phrases (not just ACE entities) • Word sense disambiguation (not just blind use of WordNet) • Expand to Chinese (first) and Arabic (second) • Improved weighting on nodes and branches • Investigation of proposition tree algorithms for topic-based document retrieval