Explore how NLP technologies such as Question Analysis, Named-Entity Detection, Coreference Relations, and Categorical Relation Extraction can enhance passage retrieval for question answering. This study demonstrates the effectiveness of these techniques in improving performance in an information retrieval task.
Using Semantic Relations to Improve Passage Retrieval for Question Answering
Tom Morton
Introduction • Paragraph retrieval for natural-language questions. • Correctness of answers to natural language questions can be accurately determined automatically. • Standard precursor to TREC question answering task. • What NLP technologies might help this task and are they robust enough?
NLP Technologies • Question Analysis: • Questions tend to specify the semantic type of their answer. This component tries to identify this type. • Named-Entity Detection: • Named-entity detection determines the semantic type of proper nouns and numeric amounts in text.
How do these technologies help? • Question Analysis: • The category predicted is appended to the question. • Named-Entity Detection: • The NE categories found in text are included as new terms. • This approach requires additional question terms to be in the paragraph.
Example:
What party is John Major in? (ORGANIZATION)
It probably won't be clear for some time whether the Conservative Party has chosen in John Major a truly worthy successor to Margaret Thatcher, who has been a giant on the world stage. +ORGANIZATION +PERSON
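The term-augmentation idea on this slide can be sketched as follows. This is a minimal illustration, not the system's implementation: the tokenizer, the stopword list, the function names, and the overlap score are all assumptions. The only behaviour taken from the slide is that the predicted answer category is appended to the question, the NE categories are added to the paragraph as extra terms, and a category match alone is not enough.

```python
import re

# Hypothetical stopword list, just large enough for the slide's example.
STOPWORDS = {"what", "who", "how", "is", "was", "in", "the", "a", "of", "to"}

def tokenize(text):
    """Lowercase word tokens with stopwords removed (assumed preprocessing)."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def expand_question(question, answer_type):
    """Append the predicted answer category to the question terms."""
    return set(tokenize(question)) | {"+" + answer_type}

def expand_paragraph(paragraph, entity_types):
    """Add the NE categories detected in the paragraph as new terms."""
    return set(tokenize(paragraph)) | {"+" + t for t in entity_types}

def score(question_terms, paragraph_terms):
    """Count shared word terms, but only if the answer category is present.

    A paragraph that matches the category and no other question term scores 0.
    """
    category = {t for t in question_terms if t.startswith("+")}
    if not category <= paragraph_terms:
        return 0
    return len((question_terms - category) & paragraph_terms)

question = expand_question("What party is John Major in?", "ORGANIZATION")
paragraph = expand_paragraph(
    "It probably won't be clear for some time whether the Conservative Party "
    "has chosen in John Major a truly worthy successor to Margaret Thatcher.",
    ["ORGANIZATION", "PERSON"],
)
print(score(question, paragraph))
```

A real retrieval engine would weight terms rather than count them; the count is used here only to keep the sketch short.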
NLP Technologies • Coreference Relations: • Interpretation of a paragraph may depend on the context in which it occurs. • Syntactically-based Categorical Relation Extraction: • Appositive and predicate nominative constructions provide descriptive terms about entities.
How do these technologies help? • Coreference: • Use coreference relationships to introduce new terms referred to but not present in the paragraph’s text.
Example:
How long was Margaret Thatcher the prime minister? (DURATION)
The truth, which has been added to over each of her 11 1/2 years in power, is that they don't make many like her anymore. +MARGARET +THATCHER +PRIME +MINISTER +DURATION
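A rough sketch of coreference-based term expansion, assuming the coreference component has already grouped mentions into entities. The data layout (`entity_mentions` as a dict from entity id to mention strings), the skip list, and the function name are hypothetical; the slide only states that terms referred to but absent from the paragraph are added.

```python
# Words that carry no retrieval value as expansion terms (assumed list).
SKIP = {"she", "her", "he", "him", "his", "it", "its",
        "they", "them", "their", "the", "a", "an"}

def coref_terms(paragraph_text, paragraph_entity_ids, entity_mentions):
    """Return +TERMS from other mentions of entities the paragraph refers to.

    paragraph_entity_ids: ids of entities mentioned in this paragraph.
    entity_mentions: entity id -> all mention strings found in the document.
    """
    present = set(paragraph_text.lower().split())
    added = []
    for entity_id in paragraph_entity_ids:
        for mention in entity_mentions[entity_id]:
            for word in mention.lower().split():
                if word in SKIP or word in present:
                    continue  # pronoun/determiner, or already in the paragraph
                term = "+" + word.upper()
                if term not in added:
                    added.append(term)
    return added

paragraph = ("The truth, which has been added to over each of her 11 1/2 years "
             "in power, is that they don't make many like her anymore.")
mentions = {"e1": ["Margaret Thatcher", "the prime minister", "her"]}
print(coref_terms(paragraph, ["e1"], mentions))
```

The +DURATION term in the slide's example comes from named-entity detection ("11 1/2 years"), so it is not produced by this function.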
How do these technologies help? • Categorical Relation Extraction: • Identifies DESCRIPTION category. • Allows descriptive terms to be used in term expansion.
Example:
Who is Frank Lloyd Wright? (DESCRIPTION)
What architect designed Robie House? (PERSON)
Famed architect Frank Lloyd Wright… +DESCRIPTION
Buildings he designed include the Guggenheim Museum in New York and Robie House in Chicago. +FRANK +LLOYD +WRIGHT +FAMED +ARCHITECT
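One way to picture how descriptive terms feed term expansion is shown below. It assumes the appositive or predicate-nominative relation has already been extracted from a parse; the `Entity` class and both function names are hypothetical, and the extraction step itself is not shown.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """An entity with the descriptive phrases extracted for it (assumed layout)."""
    name: str
    descriptions: list = field(default_factory=list)

def expansion_terms(entity):
    """Terms added to a paragraph that refers to the entity, e.g. via 'he'."""
    words = entity.name.split()
    for description in entity.descriptions:
        words.extend(description.split())
    return ["+" + w.upper() for w in words]

def mark_description(terms, has_categorical_relation):
    """Add the DESCRIPTION category to text that states a categorical relation."""
    return terms + ["+DESCRIPTION"] if has_categorical_relation else terms

wright = Entity("Frank Lloyd Wright", ["Famed architect"])
print(expansion_terms(wright))
print(mark_description(["famed", "architect", "frank", "lloyd", "wright"], True))
```

With these terms in place, "What architect designed Robie House?" can match the second sentence even though it contains neither "architect" nor the name.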
How does it work? • Coreference • Use the approach described at ACL (Morton 2000). • Divide referring expressions into three classes and create a separate resolution approach for each. • Singular third-person pronouns: Statistical • Proper nouns: Rule-based • Definite noun phrases: Rule-based • Apply resolution approaches to text in an interleaved fashion.
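The class-based, interleaved design can be sketched like this. Everything below is a simplified stand-in: the classification heuristics, the head-word matching rule, and the caller-supplied `score` function (in place of the statistical pronoun model) are assumptions for illustration, not the resolvers from Morton 2000.

```python
PRONOUNS = {"he", "she", "him", "her", "his", "hers", "it", "its"}

def classify(mention):
    """Assign a mention to one of the three classes (toy heuristics)."""
    words = mention.split()
    if len(words) == 1 and words[0].lower() in PRONOUNS:
        return "pronoun"
    if all(w[0].isupper() for w in words):
        return "proper_noun"
    if words[0].lower() == "the":
        return "definite_np"
    return "other"

def resolve_by_head(mention, entities):
    """Toy rule: link to the first entity with a mention sharing the last word."""
    head = mention.split()[-1].lower()
    for entity in entities:
        if any(m.split()[-1].lower() == head for m in entity):
            return entity
    return None

def resolve_pronoun(mention, entities, score):
    """Stand-in for the statistical model: pick the best-scoring entity."""
    if not entities:
        return None
    return max(entities, key=lambda entity: score(mention, entity))

def resolve_document(mentions, score):
    """Single pass in text order, so each resolver sees the others' results."""
    entities = []  # each entity is a list of coreferent mentions
    for mention in mentions:
        kind = classify(mention)
        if kind == "pronoun":
            target = resolve_pronoun(mention, entities, score)
        elif kind in ("proper_noun", "definite_np"):
            target = resolve_by_head(mention, entities)
        else:
            target = None
        if target is None:
            entities.append([mention])
        else:
            target.append(mention)
    return entities

# Hypothetical scorer that prefers the entity containing "Margaret Thatcher".
prefer_thatcher = lambda mention, entity: 1.0 if "Margaret Thatcher" in entity else 0.0
print(resolve_document(["Margaret Thatcher", "John Major", "Thatcher", "she"],
                       prefer_thatcher))
```

Note that the pronoun is attached to a whole entity (a list of mentions), not to the most recent mention string.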
Coreference • Pronoun is resolved to entity rather than most recent extent.
[Figure: the pronoun "she" scored against candidate entities: "John Major, a truly worthy…"; "Margaret Thatcher, her, …"; "The Conservative Party"; "the undoubted exception"; "Winston Churchill"; … Probabilities shown: 20%, 70%, 10%, 5%, 10%.]
Conclusion • Developed and evaluated new techniques in: • Coreference Resolution. • Categorical Relation Extraction. • Question Analysis. • Integrated these techniques with existing NLP components: • NE detection, POS tagging, Sentence detection, etc. • Demonstrated that these techniques can be used to improve performance in an information retrieval task. • Paragraph retrieval for natural language questions.
Porting this approach to ACE • A rapidly developed IE system • Built using the same approach • Pipelined Architecture • Easy to construct from existing components • Easy to plug in new components • Statistical Components – Maximum Entropy • Require less hand-tuning • Easy to improve with new training data or better machine learning algorithms
[Pipeline diagram: Input File → Tokenizing/Preprocessing → NE Tagging → Parsing → Nominal Tagging → Relation Extraction → Coreference → Output File]
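The "easy to plug in new components" property of a pipelined architecture can be illustrated with a short sketch. The stage functions here are empty stubs with hypothetical names; they only record which component ran and do no actual tagging.

```python
def run_pipeline(stages, document):
    """Pass the document through each stage in order."""
    for stage in stages:
        document = stage(document)
    return document

def tokenize(doc):
    """Stub tokenizer: whitespace split."""
    return {**doc, "tokens": doc["text"].split()}

def maxent_ne_tagger(doc):
    """Stub standing in for a maximum-entropy NE tagger."""
    return {**doc, "ne_tagger": "maxent"}

def crf_ne_tagger(doc):
    """Stub standing in for a CRF NE tagger."""
    return {**doc, "ne_tagger": "crf"}

def swap_stage(stages, old, new):
    """Return a new pipeline with one component replaced."""
    return [new if stage is old else stage for stage in stages]

baseline = [tokenize, maxent_ne_tagger]
with_crf = swap_stage(baseline, maxent_ne_tagger, crf_ne_tagger)

print(run_pipeline(baseline, {"text": "John Major"})["ne_tagger"])
print(run_pipeline(with_crf, {"text": "John Major"})["ne_tagger"])
```

Because every stage takes and returns the same document structure, replacing the NE tagger leaves the downstream stages untouched.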
Integrating CRF: Results • The CRF tagger significantly improves NE detection, giving a higher entity score. • Better NE detection allows the system to find more relations, giving a higher relation score.
[Charts: "Entity Scores" and "Relation Scores"; labels in each chart: Maxent +BBN, Maxent CRF.]