
Lecture 32 Question Answering

Lecture 32 Question Answering. November 10, 2005. Question Answering Tutorial. John M. Prager IBM T.J. Watson Research Center jprager@us.ibm.com. Tutorial Overview. Ground Rules Part I - Anatomy of QA A Brief History of QA Terminology The essence of Text-based QA


Presentation Transcript


  1. Lecture 32 Question Answering November 10, 2005

  2. Question Answering Tutorial John M. Prager IBM T.J. Watson Research Center jprager@us.ibm.com

  3. Tutorial Overview • Ground Rules • Part I - Anatomy of QA • A Brief History of QA • Terminology • The essence of Text-based QA • Basic Structure of a QA System • NE Recognition and Answer Types • Answer Extraction • Part II - Specific Approaches • By Genre • By System • Part III - Issues and Advanced Topics • Evaluation • No Answer • Question Difficulty • Dimensions of QA • Relationship questions • Decomposition/Recursive QA • Constraint-based QA • Cross-Language QA • References

  4. Part I - Anatomy of QA • Terminology • The Essence of Text-based QA • Basic Structure of a QA System • NE Recognition and Answer Types • Answer Extraction

  5. Some “factoid” questions from TREC8-9 • 9: How far is Yaroslavl from Moscow? • 15: When was London's Docklands Light Railway constructed? • 22: When did the Jurassic Period end? • 29: What is the brightest star visible from Earth? • 30: What are the Valdez Principles? • 73: Where is the Taj Mahal? • 134: Where is it planned to berth the merchant ship, Lane Victory, which Merchant Marine veterans are converting into a floating museum? • 197: What did Richard Feynman say upon hearing he would receive the Nobel Prize in Physics? • 198: How did Socrates die? • 199: How tall is the Matterhorn? • 200: How tall is the replica of the Matterhorn at Disneyland? • 227: Where does dew come from? • 269: Who was Picasso? • 298: What is California's state tree?

  6. Terminology • Question Type • Answer Type • Question Focus • Question Topic • Candidate Passage • Candidate Answer • Authority File/List

  7. Terminology – Question Type • Question Type: an idiomatic categorization of questions for purposes of distinguishing between different processing strategies and/or answer formats • E.g. TREC2003 • FACTOID: “How far is it from Earth to Mars?” • LIST: “List the names of chewing gums” • DEFINITION: “Who is Vlad the Impaler?” • Other possibilities: • RELATIONSHIP: “What is the connection between Valentina Tereshkova and Sally Ride?” • SUPERLATIVE: “What is the largest city on Earth?” • YES-NO: “Is Saddam Hussein alive?” • OPINION: “What do most Americans think of gun control?” • CAUSE&EFFECT: “Why did Iraq invade Kuwait?” • …

  8. Terminology – Answer Type • Answer Type: the class of object (or rhetorical type of sentence) sought by the question. E.g. • PERSON (from “Who …”) • PLACE (from “Where …”) • DATE (from “When …”) • NUMBER (from “How many …”) • … but also • EXPLANATION (from “Why …”) • METHOD (from “How …”) • … • Answer types are usually tied intimately to the classes recognized by the system’s Named Entity Recognizer.
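The wh-word-to-answer-type mapping above can be sketched as a small ordered lookup. This is a minimal illustration; the rule list and type names are assumptions, not any particular system's inventory:

```python
# Minimal sketch: map a question's leading wh-phrase to an answer type.
# Rules are ordered so that "how many" is tried before the bare "how".
ANSWER_TYPE_RULES = [
    ("who", "PERSON"),
    ("where", "PLACE"),
    ("when", "DATE"),
    ("how many", "NUMBER"),
    ("why", "EXPLANATION"),
    ("how", "METHOD"),
]

def answer_type(question: str) -> str:
    q = question.lower()
    for prefix, atype in ANSWER_TYPE_RULES:
        if q.startswith(prefix):
            return atype
    return "UNKNOWN"
```

A real system would instead draw its type inventory from the classes its Named Entity Recognizer can annotate, as the slide notes.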

  9. Terminology – Question Focus • Question Focus: The property or entity that is being sought by the question. • E.g. • “In what state is the Grand Canyon?” • “What is the population of Bulgaria?” • “What colour is a pomegranate?”

  10. Terminology – Question Topic • Question Topic: the object (person, place, …) or event that the question is about. The question might well be about a property of the topic, which will be the question focus. • E.g. “What is the height of Mt. Everest?” • height is the focus • Mt. Everest is the topic

  11. Terminology – Candidate Passage • Candidate Passage: a text passage (anything from a single sentence to a whole document) retrieved by a search engine in response to a question. • Depending on the query and kind of index used, there may or may not be a guarantee that a candidate passage has any candidate answers. • Candidate passages will usually have associated scores, from the search engine.

  12. Terminology – Candidate Answer • Candidate Answer: in the context of a question, a small quantity of text (anything from a single word to a sentence or bigger, but usually a noun phrase) that is of the same type as the Answer Type. • In some systems, the type match may be approximate, if there is the concept of confusability. • Candidate answers are found in candidate passages • E.g. • 50 • Queen Elizabeth II • September 8, 2003 • by baking a mixture of flour and water

  13. Terminology – Authority List • Authority List (or File): a collection of instances of a class of interest, used to test a term for class membership. • Instances should be derived from an authoritative source and be as close to complete as possible. • Ideally, class is small, easily enumerated and with members with a limited number of lexical forms. • Good: • Days of week • Planets • Elements • Good statistically, but difficult to get 100% recall: • Animals • Plants • Colours • Problematic • People • Organizations • Impossible • All numeric quantities • Explanations and other clausal quantities
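An authority-list membership test is essentially set lookup. A minimal sketch, using toy versions of the "good" classes above:

```python
# Toy authority lists for two small, easily enumerated classes.
PLANETS = {"mercury", "venus", "earth", "mars", "jupiter",
           "saturn", "uranus", "neptune", "pluto"}  # as enumerated in 2005
WEEKDAYS = {"monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"}

def in_authority_list(term: str, authority: set) -> bool:
    """Test a term for class membership, normalizing case and whitespace."""
    return term.strip().lower() in authority
```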

  14. Essence of Text-based QA (Single source answers) • Need to find a passage that answers the question. • Find a candidate passage (search) • Check that semantics of passage and question match • Extract the answer

  15. Essence of Text-based QA Search • For a very small corpus, can consider every passage as a candidate, but this is not interesting • Need to perform a search to locate good passages. • If search is too broad, have not achieved that much, and are faced with lots of noise • If search is too narrow, will miss good passages • Two broad possibilities: • Optimize search • Use iteration

  16. Essence of Text-based QA Match • Need to test whether semantics of passage match semantics of question • Count question words present in passage • Score based on proximity • Score based on syntactic relationships • Prove match
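The first two matching strategies (word counting and proximity scoring) can be sketched as follows; this is an illustrative simplification, not a specific system's scorer:

```python
def overlap_score(question: str, passage: str) -> float:
    """Fraction of distinct question words that appear in the passage."""
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0

def proximity_score(keywords: set, passage: str) -> float:
    """Reward passages whose keywords cluster together: the number of
    keyword occurrences divided by the span of text covering them."""
    words = passage.lower().split()
    hits = [i for i, w in enumerate(words) if w in keywords]
    if not hits:
        return 0.0
    return len(hits) / (max(hits) - min(hits) + 1)
```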

  17. Essence of Text-based QA Answer Extraction • Find candidate answers of same type as the answer type sought in question. • Has implications for size of type hierarchy • Where/when/whether to consider subsumption • Consider later

  18. Basic Structure of a QA-System • See for example Abney et al., 2000; Clarke et al., 2001; Harabagiu et al.; Hovy et al., 2001; Prager et al. 2000 • [Diagram] Question → Question Analysis → Query + Answer Type; Query → Search (over Corpus or Web) → Documents/passages; Documents/passages + Answer Type → Answer Extraction → Answer

  19. Essence of Text-based QA High-Level View of Recall • Have three broad locations in the system where expansion takes place, for purposes of matching passages • Where is the right trade-off? • Question Analysis. • Expand individual terms to synonyms (hypernyms, hyponyms, related terms) • Reformulate question • In Search Engine • Generally avoided for reasons of computational expense • At indexing time • Stemming/lemmatization

  20. Essence of Text-based QA High-Level View of Precision • Have three broad locations in the system where narrowing/filtering/matching takes place • Where is the right trade-off? • Question Analysis. • Include all question terms in query • Use IDF-style weighting to indicate preferences • Search Engine • Possibly store POS information for polysemous terms • Answer Extraction • Reward (penalize) passages/answers that (don’t) pass test • Particularly attractive for temporal modification
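The IDF-style weighting mentioned under Question Analysis can be sketched with the standard formulation (shown here purely for illustration):

```python
import math

def idf_weights(query_terms, doc_freq, num_docs):
    """Rarer terms get higher weight, indicating which query terms
    the search should prefer to match."""
    return {t: math.log(num_docs / (1 + doc_freq.get(t, 0)))
            for t in query_terms}
```

With weights in hand, a passage score sums the weights of matched terms rather than a raw count, so matching a rare content word outweighs matching a stopword.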

  21. Answer Types and Modifiers Name 5 French Cities • Most likely there is no type for “French Cities” • So will look for CITY • include “French/France” in bag of words, and hope for the best • include “French/France” in bag of words, retrieve documents, and look for evidence (deep parsing, logic) • use high-precision Language Identification on results • If you have a list of French cities, could either • Filter results by list • Use Answer-Based QA (see later) • Use longitude/latitude information of cities and countries

  22. Answer Types and Modifiers Name a female figure skater • Most likely there is no type for “female figure skater” • Most likely there is no type for “figure skater” • Look for PERSON, with query terms {figure, skater} • What to do about “female”? Two approaches. • Include “female” in the bag-of-words. • Relies on logic that if “femaleness” is an interesting property, it might well be mentioned in answer passages. • Does not apply to, say “singer”. • Leave out “female” but test candidate answers for gender. • Needs either an authority file or a heuristic test. • Test may not be definitive.
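The "heuristic test" in the second approach might look like the following sketch; the name lists and the fall-back behaviour are assumptions for illustration:

```python
# Hypothetical gender test for candidate PERSON answers, backed by a
# small first-name authority file. As the slide warns, the test is not
# definitive, so it returns None when it cannot decide.
FEMALE_FIRST_NAMES = {"katarina", "michelle", "mary", "anna"}
MALE_FIRST_NAMES = {"john", "brian", "david", "james"}

def guess_gender(candidate: str):
    parts = candidate.split()
    first = parts[0].lower() if parts else ""
    if first in FEMALE_FIRST_NAMES:
        return "female"
    if first in MALE_FIRST_NAMES:
        return "male"
    return None  # inconclusive: keep the candidate, seek other evidence
```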

  23. Named Entity Recognition • BBN’s IdentiFinder (Bikel et al. 1999) • Hidden Markov Model • Sheffield GATE (http://www.gate.ac.uk/) • Development Environment for IE and other NLP activities • IBM’s Textract/Resporator (Byrd & Ravin, 1999; Wacholder et al. 1997; Prager et al. 2000) • FSMs and Authority Files • + others • Inventory of semantic classes recognized by NER related closely to set of answer types system can handle

  24. Named Entity Recognition

  25. Probabilistic Labelling (IBM) • In Textract, a Proper name can be one of the following • PERSON • PLACE • ORGANIZATION • MISC_ENTITY (e.g. names of Laws, Treaties, Reports, …) • However, NER needs another class (UNAME) for any proper name it can’t identify. • In a large corpus, many entities end up being UNAMEs. • If, for example, a “Where” question seeks a PLACE, and similarly for the others above, then is being classified as UNAME a death sentence? How will a UNAME ever be searched for?

  26. Probabilistic Labelling (IBM) • When entity is ambiguous or plain unknown, use a set of disjoint special labels in NER, instead of UNAME • Assumes NER is able to rule out some possibilities, at least sometimes. • Annotate with all remaining possibilities • Use these labels as part of answer type • E.g. • UNP <-> could be a PERSON • UNL <-> could be a PLACE • UNO <-> could be an ORGANIZATION • UNE <-> could be a MISC_ENTITY • So • {UNP UNL} <-> could be a PERSON or a PLACE • This would be a good label for Beverly Hills
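Matching under these uncertain labels reduces to set intersection; a minimal sketch:

```python
def type_match(sought_types: set, candidate_labels: set) -> bool:
    """A candidate is acceptable if any of its (possibly uncertain)
    labels overlaps the answer types sought by the question."""
    return bool(sought_types & candidate_labels)
```

For example, a "Where" question seeking {PLACE, UNL} now matches an entity labelled {UNP, UNL} such as Beverly Hills.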

  27. Probabilistic Labelling (IBM) • So “Who” questions that would normally generate {PERSON} as answer type, now generate {PERSON UNP} • Question: “Who is David Beckham married to?” • Answer Passage: “David Beckham, the soccer star engaged to marry Posh Spice, is being blamed for England’s World Cup defeat.” • “Posh Spice” gets annotated with {UNP UNO} • Match occurs, answer found. Crowd erupts!

  28. Issues with NER • Coreference • Should referring terms (definite noun phrases, pronouns) be labelled the same way as the referent terms? • Nested Noun Phrases (and other structures of interest) • What granularity? • Partly depends on whether multiple annotations are allowed • Subsumption and Ambiguity • What label(s) to choose? • Probabilistic labelling

  29. How to Annotate? “… Baker will leave Jerusalem on Saturday and stop in Madrid on the way home to talk to Spanish Prime Minister Felipe Gonzales.” What about: The U.S. ambassador to Spain, Ed Romero?

  30. Answer Extraction • Also called Answer Selection/Pinpointing • Given a question and candidate passages, the process of selecting and ranking candidate answers. • Usually, candidate answers are those terms in the passages which have the same answer type as that generated from the question • Ranking the candidate answers depends on assessing how well the passage context relates to the question • 3 Approaches: • Heuristic features • Shallow parse fragments • Logical proof

  31. Answer Extraction using Features • Heuristic feature sets (Prager et al. 2003+). See also (Radev et al. 2000) • Calculate feature values for each CA, and then calculate linear combination using weights learned from training data. • Ranking criteria: • Good global context: • the global context of a candidate answer evaluates the relevance of the passage from which the candidate answer is extracted to the question. • Good local context: • the local context of a candidate answer assesses the likelihood that the answer fills in the gap in the question. • Right semantic type: • the semantic type of a candidate answer should either be the same as or a subtype of the answer type identified by the question analysis component. • Redundancy: • the degree of redundancy for a candidate answer increases as more instances of the answer occur in retrieved passages.

  32. Answer Extraction using Features (cont.) • Features for Global Context • KeywordsInPassage: the ratio of keywords present in a passage to the total number of keywords issued to the search engine. • NPMatch: the number of words in noun phrases shared by both the question and the passage. • SEScore: the ratio of the search engine score for a passage to the maximum achievable score. • FirstPassage: a Boolean value which is true for the highest ranked passage returned by the search engine, and false for all other passages. • Features for Local Context • AvgDistance: the average distance between the candidate answer and keywords that occurred in the passage. • NotInQuery: the number of words in the candidate answers that are not query keywords.
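The linear combination over these features can be sketched as follows. Feature names follow the slides, but the weights are invented for illustration; as the slides say, real weights are learned from training data:

```python
# Illustrative weights -- a real system learns these from training data.
WEIGHTS = {
    "KeywordsInPassage": 0.3,
    "SEScore": 0.2,
    "AvgDistance": -0.1,   # greater distance should lower the score
    "Redundancy": 0.4,
}

def answer_score(features: dict) -> float:
    """Linear combination of feature values for one candidate answer."""
    return sum(WEIGHTS.get(name, 0.0) * value
               for name, value in features.items())
```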

  33. Answer Extraction using Relationships • Computing Ranking Scores – • Linguistic knowledge to compute passage & candidate answer scores • Perform syntactic processing on question and candidate passages • Extract predicate-argument & modification relationships from parse • Question: “Who wrote the Declaration of Independence?” Relationships: [X, write], [write, Declaration of Independence] • Answer Text: “Jefferson wrote the Declaration of Independence.” Relationships: [Jefferson, write], [write, Declaration of Independence] • Compute scores based on number of question relationship matches • Passage score: consider all instantiated relationships • Candidate answer scores: consider relationships with variable

  34. Answer Extraction using Relationships (cont.) • Example: When did Amtrak begin operations? • Question relationships • [Amtrak, begin], [begin, operation], [X, begin] • Compute passage scores: passages and relationships • In 1971, Amtrak began operations,… • [Amtrak, begin], [begin, operation], [1971, begin]… • “Today, things are looking better,” said Claytor, expressing optimism about getting the additional federal funds in future years that will allow Amtrak to begin expanding its operations. • [Amtrak, begin], [begin, expand], [expand, operation], [today, look]… • Airfone, which began operations in 1984, has installed air-to-ground phones…. Airfone also operates Railfone, a public phone service on Amtrak trains. • [Airfone, begin], [begin, operation], [1984, operation], [Amtrak, train]…
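The matching described above can be sketched as a toy relationship matcher. The tuples below come from the Amtrak example; the matching logic is an illustrative simplification:

```python
def passage_score(question_rels, passage_rels):
    """Count how many question relationships are matched in the passage.
    "X" is the question variable and unifies with anything."""
    score = 0
    for qa, qb in question_rels:
        for pa, pb in passage_rels:
            if (qa == pa or qa == "X") and (qb == pb or qb == "X"):
                score += 1
                break  # each question relationship counts at most once
    return score
```

On the slide's example, the 1971 passage matches all three question relationships while the Airfone passage matches only two, so the correct passage ranks first.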

  35. Answer Extraction using Logic • Logical Proof • Convert question to a goal • Convert passage to set of logical forms representing individual assertions • Add predicates representing subsumption rules, real-world knowledge • Prove the goal • See section on LCC later

  36. Question Answering Tutorial Part II John M. Prager IBM T.J. Watson Research Center jprager@us.ibm.com

  37. Part II - Specific Approaches • By Genre • Statistical QA • Pattern-based QA • Web-based QA • Answer-based QA (TREC only) • By System • SMU • LCC • USC-ISI • Insight • Microsoft • IBM Statistical • IBM Rule-based

  38. Approaches by Genre • By Genre • Statistical QA • Pattern-based QA • Web-based QA • Answer-based QA (TREC only) • Web-based QA • Database-based QA • Considerations • Effectiveness by question-type • Precision and recall • Expandability to other domains • Ease of adaptation to CL-QA

  39. Statistical QA • Use statistical distributions to model likelihoods of answer type and answer • E.g. IBM (Ittycheriah, 2001) – see later section

  40. Pattern-based QA • For a given question type, identify the typical syntactic constructions used in text to express answers to such questions • Typically very high precision, but a lot of work to get decent recall
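A concrete illustration: for "When was X born?" questions, a pattern-based system might apply hand-written surface patterns like these. The patterns and helper are hypothetical examples, not from any cited system:

```python
import re

def birth_year(name: str, text: str):
    """Try a few high-precision surface patterns; return the first hit."""
    patterns = [
        # "Einstein (1879-1955)"
        rf"{re.escape(name)}\s*\((\d{{4}})\s*-",
        # "Mozart was born in ... 1756"
        rf"{re.escape(name)} was born (?:on|in)[^.]*?(\d{{4}})",
    ]
    for pat in patterns:
        m = re.search(pat, text)
        if m:
            return m.group(1)
    return None  # high precision, limited recall: many phrasings missed
```

The final comment is the point of the slide: each pattern is very reliable when it fires, but covering the many ways text can phrase a birth date takes a lot of patterns.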

  41. Web-Based QA • Exhaustive string transformations • Brill et al. 2002 • Learning • Radev et al. 2001

  42. Answer-Based QA • Problem: Sometimes it is very easy to find an answer to a question using resource A, but the task demands that you find it in resource B. • Solution: First find the answer in resource A, then locate the same answer, along with original question terms, in resource B. • Artificial problem, but real for TREC participants.

  43. Answer-Based QA • Web-Based solution: When a QA system looks for answers within a relatively small textual collection, the chance of finding strings/sentences that closely match the question string is small. However, when a QA system looks for strings/sentences that closely match the question string on the web, the chance of finding correct answer is much higher. Hermjakob et al. 2002 • Why this is true: • The Web is much larger than the TREC Corpus (3,000 : 1) • TREC questions are generated from Web logs, and the style of language (and subjects of interest) in these logs are more similar to the Web content than to newswire collections.

  44. Answer-Based QA • Database/Knowledge-base/Ontology solution: • When question syntax is simple and reliably recognizable, can express as a logical form • Logical form represents entire semantics of question, and can be used to access structured resource: • WordNet • On-line dictionaries • Tables of facts & figures • Knowledge-bases such as Cyc • Having found answer • construct a query with original question terms + answer • Retrieve passages • Tell Answer Extraction the answer it is looking for

  45. Approaches of Specific Systems • SMU Falcon • LCC • USC-ISI • Insight • Microsoft • IBM Note: Some of the slides and/or examples in these sections are taken from papers or presentations from the respective system authors

  46. SMU Falcon Harabagiu et al. 2000

  47. SMU Falcon • From question, dependency structure called question semantic form is created • Query is Boolean conjunction of terms • From answer passages that contain at least one instance of answer type, generate answer semantic form • 3 processing loops: • Loop 1 • Triggered when too few or too many passages are retrieved from search engine • Loop 2 • Triggered when question semantic form and answer semantic form cannot be unified • Loop 3 • Triggered when unable to perform abductive proof of answer correctness

  48. SMU Falcon • Loops provide opportunities to perform alternations • Loop 1: morphological expansions and nominalizations • Loop 2: lexical alternations – synonyms, direct hypernyms and hyponyms • Loop 3: paraphrases • Evaluation (Pasca & Harabagiu, 2001). Increase in accuracy in 50-byte task in TREC9 • Loop 1: 40% • Loop 2: 52% • Loop 3: 8% • Combined: 76%

  49. LCC • Moldovan & Rus, 2001 • Uses Logic Prover for answer justification • Question logical form • Candidate answers in logical form • XWN glosses • Linguistic axioms • Lexical chains • Inference engine attempts to verify answer by negating question and proving a contradiction • If proof fails, predicates in question are gradually relaxed until proof succeeds or associated proof score is below a threshold.

  50. LCC: Lexical Chains Q:1518 What year did Marco Polo travel to Asia? Answer: Marco Polo divulged the truth after returning in 1292 from his travels, which included several months on Sumatra Lexical Chains: (1) travel_to:v#1 -> GLOSS -> travel:v#1 -> RGLOSS -> travel:n#1 (2) travel_to#1 -> GLOSS -> travel:v#1 -> HYPONYM -> return:v#1 (3) Sumatra:n#1 -> ISPART -> Indonesia:n#1 -> ISPART -> Southeast_Asia:n#1 -> ISPART -> Asia:n#1 Q:1570 What is the legal age to vote in Argentina? Answer: Voting is mandatory for all Argentines aged over 18. Lexical Chains: (1) legal:a#1 -> GLOSS -> rule:n#1 -> RGLOSS -> mandatory:a#1 (2) age:n#1 -> RGLOSS -> aged:a#3 (3) Argentine:a#1 -> GLOSS -> Argentina:n#1
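The third Marco Polo chain can be reproduced by breadth-first search over a relation graph. The sketch below uses a hand-coded toy graph, where the real system walks WordNet relations and glosses:

```python
from collections import deque

# Toy relation graph (a real system derives edges from WordNet).
RELATIONS = {
    "Sumatra": [("ISPART", "Indonesia")],
    "Indonesia": [("ISPART", "Southeast_Asia")],
    "Southeast_Asia": [("ISPART", "Asia")],
}

def lexical_chain(start, goal):
    """Return the shortest chain of (relation, concept) hops, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in RELATIONS.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(rel, nxt)]))
    return None
```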
