
  1. Question Answering Tutorial
  John M. Prager
  IBM T.J. Watson Research Center
  jprager@us.ibm.com

  2. Tutorial Overview
  • Ground Rules
  • Part I - Anatomy of QA
    • A Brief History of QA
    • Terminology
    • The Essence of Text-based QA
    • Basic Structure of a QA System
    • NE Recognition and Answer Types
    • Answer Extraction
  • Part II - Specific Approaches
    • By Genre
    • By System
  • Part III - Issues and Advanced Topics
    • Evaluation
    • No Answer
    • Question Difficulty
    • Dimensions of QA
    • Relationship questions
    • Decomposition/Recursive QA
    • Constraint-based QA
    • Cross-Language QA
  • References
  John M. Prager, RANLP 2003 Tutorial on Question Answering

  3. Ground Rules
  • Breaks
  • Questions
  • Topics
    • Focus on English Text
    • TREC & AQUAINT & beyond
    • General Principles
    • Tricks-of-the-Trade
    • State-of-the-Art Methodologies
    • My own System vs. My own Research
  • Caution

  4. Caution
  Nothing in this Tutorial is true
  Nothing in this Tutorial is true universally

  5. Part I - Anatomy of QA
  • A Brief History of QA
  • Terminology
  • The Essence of Text-based QA
  • Basic Structure of a QA System
  • NE Recognition and Answer Types
  • Answer Extraction

  6. A Brief History of QA
  • NLP front-ends to Expert Systems
    • SHRDLU (Winograd, 1972)
      • User manipulated, and asked questions about, a blocks world
      • First real demo of the combination of syntax, semantics, and reasoning
  • NLP front-ends to Databases
    • LUNAR (Woods, 1973)
      • User asked questions about moon rocks
      • Used ATNs and procedural semantics
    • LIFER/LADDER (Hendrix et al., 1977)
      • User asked questions about U.S. Navy ships
      • Used a semantic grammar; domain information built into the grammar
  • NLP + logic
    • CHAT-80 (Warren & Pereira, 1982)
      • NLP query system in Prolog, about world geography
      • Definite Clause Grammars
  • “Modern Era of QA”
    • MURAX (Kupiec, 1993)
      • NLP front-end to an encyclopaedia
    • NLP + hand-coded annotations to sources
      • AskJeeves (www.ask.com)
      • START (Katz, 1997)
        • Started with text, extended to multimedia
  • IR + NLP
    • TREC-8 (1999) (Voorhees & Tice, 2000)
  • Today – all of the above

  7. Some “factoid” questions from TREC8-9
  • 9: How far is Yaroslavl from Moscow?
  • 15: When was London's Docklands Light Railway constructed?
  • 22: When did the Jurassic Period end?
  • 29: What is the brightest star visible from Earth?
  • 30: What are the Valdez Principles?
  • 73: Where is the Taj Mahal?
  • 134: Where is it planned to berth the merchant ship, Lane Victory, which Merchant Marine veterans are converting into a floating museum?
  • 197: What did Richard Feynman say upon hearing he would receive the Nobel Prize in Physics?
  • 198: How did Socrates die?
  • 199: How tall is the Matterhorn?
  • 200: How tall is the replica of the Matterhorn at Disneyland?
  • 227: Where does dew come from?
  • 269: Who was Picasso?
  • 298: What is California's state tree?

  8. Terminology
  • Question Type
  • Answer Type
  • Question Focus
  • Question Topic
  • Candidate Passage
  • Candidate Answer
  • Authority File/List

  9. Terminology – Question Type
  • Question Type: an idiomatic categorization of questions for purposes of distinguishing between different processing strategies and/or answer formats
  • E.g. TREC2003:
    • FACTOID: “How far is it from Earth to Mars?”
    • LIST: “List the names of chewing gums”
    • DEFINITION: “Who is Vlad the Impaler?”
  • Other possibilities:
    • RELATIONSHIP: “What is the connection between Valentina Tereshkova and Sally Ride?”
    • SUPERLATIVE: “What is the largest city on Earth?”
    • YES-NO: “Is Saddam Hussein alive?”
    • OPINION: “What do most Americans think of gun control?”
    • CAUSE&EFFECT: “Why did Iraq invade Kuwait?”
    • …

  10. Terminology – Answer Type
  • Answer Type: the class of object (or rhetorical type of sentence) sought by the question. E.g.
    • PERSON (from “Who …”)
    • PLACE (from “Where …”)
    • DATE (from “When …”)
    • NUMBER (from “How many …”)
  • … but also
    • EXPLANATION (from “Why …”)
    • METHOD (from “How …”)
    • …
  • Answer types are usually tied intimately to the classes recognized by the system’s Named Entity Recognizer.
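The wh-word to answer-type mapping can be sketched as a small rule table. This is an illustrative fragment, not any particular system's classifier; the rule list, its ordering, and the ANY fallback are all assumptions.

```python
# Ordered rules mapping question openers to answer types.
# "how many" must be checked before the bare "how" rule.
ANSWER_TYPE_RULES = [
    ("who", "PERSON"),
    ("where", "PLACE"),
    ("when", "DATE"),
    ("how many", "NUMBER"),
    ("why", "EXPLANATION"),
    ("how", "METHOD"),
]

def answer_type(question: str) -> str:
    """Return the answer type for a question, or ANY if no rule fires."""
    q = question.lower()
    for trigger, atype in ANSWER_TYPE_RULES:
        if q.startswith(trigger):
            return atype
    return "ANY"
```

Note that "What …" questions fall through to ANY here; real systems need much finer analysis of the question focus to type them.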

  11. Terminology – Question Focus
  • Question Focus: the property or entity that is being sought by the question.
  • E.g.
    • “In what state is the Grand Canyon?” (focus: state)
    • “What is the population of Bulgaria?” (focus: population)
    • “What colour is a pomegranate?” (focus: colour)

  12. Terminology – Question Topic
  • Question Topic: the object (person, place, …) or event that the question is about. The question might well be about a property of the topic, which will be the question focus.
  • E.g. “What is the height of Mt. Everest?”
    • height is the focus
    • Mt. Everest is the topic

  13. Terminology – Candidate Passage
  • Candidate Passage: a text passage (anything from a single sentence to a whole document) retrieved by a search engine in response to a question.
  • Depending on the query and kind of index used, there may or may not be a guarantee that a candidate passage has any candidate answers.
  • Candidate passages will usually have associated scores from the search engine.

  14. Terminology – Candidate Answer
  • Candidate Answer: in the context of a question, a small quantity of text (anything from a single word to a sentence or bigger, but usually a noun phrase) that is of the same type as the Answer Type.
  • In some systems, the type match may be approximate, if there is the concept of confusability.
  • Candidate answers are found in candidate passages.
  • E.g.
    • 50
    • Queen Elizabeth II
    • September 8, 2003
    • by baking a mixture of flour and water

  15. Terminology – Authority List
  • Authority List (or File): a collection of instances of a class of interest, used to test a term for class membership.
  • Instances should be derived from an authoritative source and be as close to complete as possible.
  • Ideally, the class is small, easily enumerated, and its members have a limited number of lexical forms.
  • Good:
    • Days of week
    • Planets
    • Elements
  • Good statistically, but difficult to get 100% recall:
    • Animals
    • Plants
    • Colours
  • Problematic:
    • People
    • Organizations
  • Impossible:
    • All numeric quantities
    • Explanations and other clausal quantities
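A minimal sketch of how an authority file backs a class-membership test; the helper name make_membership_test is hypothetical, and a real authority file would be loaded from an authoritative source rather than listed inline.

```python
def make_membership_test(instances):
    """Build a membership test from an authority list.

    Normalizes case so that 'Tuesday' and 'tuesday' both match;
    real systems also need to handle variant lexical forms.
    """
    normalized = {term.lower() for term in instances}
    return lambda term: term.lower() in normalized

# Days of the week: a small, easily enumerated class (the "Good" case).
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
is_day = make_membership_test(DAYS)
```

The same constructor works for planets or chemical elements; for open classes like people, no such list can approach completeness.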

  16. Essence of Text-based QA (Single-source answers)
  • Need to find a passage that answers the question:
    • Find a candidate passage (search)
    • Check that semantics of passage and question match
    • Extract the answer

  17. Essence of Text-based QA – Search
  • For a very small corpus, can consider every passage as a candidate, but this is not interesting
  • Need to perform a search to locate good passages:
    • If search is too broad, have not achieved that much, and are faced with lots of noise
    • If search is too narrow, will miss good passages
  • Two broad possibilities:
    • Optimize search
    • Use iteration

  18. Essence of Text-based QA – Match
  • Need to test whether semantics of passage match semantics of question:
    • Count question words present in passage
    • Score based on proximity
    • Score based on syntactic relationships
    • Prove match
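The first two matching strategies (term overlap and proximity) can be sketched as a toy passage scorer. The weighting scheme below is an illustrative assumption, not a published formula, and it does no stemming, so morphological variants only match after the indexing-time lemmatization discussed later.

```python
def passage_score(question_terms, passage_tokens, window=10):
    """Score a passage for a question.

    Score = fraction of question terms present in the passage,
    plus a small bonus when the matched terms all fall within
    a `window`-token span (a crude proximity test).
    """
    terms = {t.lower() for t in question_terms}
    tokens = [t.lower() for t in passage_tokens]
    positions = [i for i, t in enumerate(tokens) if t in terms]
    if not positions:
        return 0.0
    overlap = len({tokens[i] for i in positions}) / len(terms)
    spread = max(positions) - min(positions)
    proximity = 0.5 if spread <= window else 0.0
    return overlap + proximity
```

Syntactic-relationship scoring and proving the match (the last two bullets) need a parser and a prover, and are illustrated in the Answer Extraction sections later.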

  19. Essence of Text-based QA – Answer Extraction
  • Find candidate answers of same type as the answer type sought in question.
  • Has implications for size of type hierarchy
  • Where/when/whether to consider subsumption
  • Consider later

  20. Basic Structure of a QA-System
  • See for example Abney et al., 2000; Clarke et al., 2001; Harabagiu et al.; Hovy et al., 2001; Prager et al., 2000
  • Pipeline (from the slide’s figure): Question → Question Analysis → Query + Answer Type → Search (over Corpus or Web) → Documents/passages → Answer Extraction → Answer

  21. Essence of Text-based QA – High-Level View of Recall
  • Have three broad locations in the system where expansion takes place, for purposes of matching passages
  • Where is the right trade-off?
    • Question Analysis
      • Expand individual terms to synonyms (hypernyms, hyponyms, related terms)
      • Reformulate question
    • In Search Engine
      • Generally avoided for reasons of computational expense
    • At indexing time
      • Stemming/lemmatization

  22. Essence of Text-based QA – High-Level View of Precision
  • Have three broad locations in the system where narrowing/filtering/matching takes place
  • Where is the right trade-off?
    • Question Analysis
      • Include all question terms in query
      • Use IDF-style weighting to indicate preferences
    • Search Engine
      • Possibly store POS information for polysemous terms
    • Answer Extraction
      • Reward (penalize) passages/answers that (don’t) pass test
      • Particularly attractive for temporal modification

  23. Answer Types and Modifiers
  Name 5 French cities
  • Most likely there is no type for “French Cities”
  • So will look for CITY, and either:
    • include “French/France” in bag of words, and hope for the best
    • include “French/France” in bag of words, retrieve documents, and look for evidence (deep parsing, logic)
    • use high-precision Language Identification on results
  • If you have a list of French cities, could:
    • Filter results by list
    • Use Answer-Based QA (see later)
    • Use longitude/latitude information of cities and countries

  24. Answer Types and Modifiers
  Name a female figure skater
  • Most likely there is no type for “female figure skater”
  • Most likely there is no type for “figure skater”
  • Look for PERSON, with query terms {figure, skater}
  • What to do about “female”? Two approaches:
    • Include “female” in the bag-of-words.
      • Relies on the logic that if “femaleness” is an interesting property, it might well be mentioned in answer passages.
      • Does not apply to, say, “singer”.
    • Leave out “female” but test candidate answers for gender.
      • Needs either an authority file or a heuristic test.
      • Test may not be definitive.

  25. Named Entity Recognition
  • BBN’s IdentiFinder (Bikel et al., 1999)
    • Hidden Markov Model
  • Sheffield GATE (http://www.gate.ac.uk/)
    • Development environment for IE and other NLP activities
  • IBM’s Textract/Resporator (Byrd & Ravin, 1999; Wacholder et al., 1997; Prager et al., 2000)
    • FSMs and Authority Files
  • + others
  • The inventory of semantic classes recognized by the NER is closely related to the set of answer types the system can handle

  26. Named Entity Recognition

  27. Probabilistic Labelling (IBM)
  • In Textract, a proper name can be one of the following:
    • PERSON
    • PLACE
    • ORGANIZATION
    • MISC_ENTITY (e.g. names of Laws, Treaties, Reports, …)
  • However, the NER needs another class (UNAME) for any proper name it can’t identify.
  • In a large corpus, many entities end up being UNAMEs.
  • If, for example, a “Where” question seeks a PLACE, and similarly for the others above, then is being classified as UNAME a death sentence? How will a UNAME ever be searched for?

  28. Probabilistic Labelling (IBM)
  • When an entity is ambiguous or plain unknown, use a set of disjoint special labels in NER, instead of UNAME
    • Assumes NER is able to rule out some possibilities, at least sometimes.
    • Annotate with all remaining possibilities
    • Use these labels as part of answer type
  • E.g.
    • UNP <-> could be a PERSON
    • UNL <-> could be a PLACE
    • UNO <-> could be an ORGANIZATION
    • UNE <-> could be a MISC_ENTITY
  • So
    • {UNP UNL} <-> could be a PERSON or a PLACE
    • This would be a good label for Beverly Hills
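Under probabilistic labelling, type matching reduces to a set intersection between the answer-type set generated from the question and a candidate's label set. A minimal sketch, with the function name type_match being an illustrative assumption:

```python
def type_match(question_types, candidate_labels):
    """A candidate matches when any of its (possibly ambiguous)
    labels appears in the answer-type set from the question."""
    return bool(set(question_types) & set(candidate_labels))
```

So a "Who" question producing {PERSON, UNP} matches a candidate labelled {UNP, UNO}, even though the NER never committed to PERSON.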

  29. Probabilistic Labelling (IBM)
  • So “Who” questions that would normally generate {PERSON} as the answer type now generate {PERSON UNP}
  • Question: “Who is David Beckham married to?”
  • Answer Passage: “David Beckham, the soccer star engaged to marry Posh Spice, is being blamed for England’s World Cup defeat.”
  • “Posh Spice” gets annotated with {UNP UNO}
  • Match occurs, answer found. Crowd erupts!

  30. Issues with NER
  • Coreference
    • Should referring terms (definite noun phrases, pronouns) be labelled the same way as the referent terms?
  • Nested Noun Phrases (and other structures of interest)
    • What granularity?
    • Partly depends on whether multiple annotations are allowed
  • Subsumption and Ambiguity
    • What label(s) to choose?
    • Probabilistic labelling

  31. How to Annotate?
  “… Baker will leave Jerusalem on Saturday and stop in Madrid on the way home to talk to Spanish Prime Minister Felipe Gonzales.”
  What about: The U.S. ambassador to Spain, Ed Romero?

  32. Answer Extraction
  • Also called Answer Selection/Pinpointing
  • Given a question and candidate passages, the process of selecting and ranking candidate answers.
  • Usually, candidate answers are those terms in the passages which have the same answer type as that generated from the question
  • Ranking the candidate answers depends on assessing how well the passage context relates to the question
  • 3 approaches:
    • Heuristic features
    • Shallow parse fragments
    • Logical proof

  33. Answer Extraction using Features
  • Heuristic feature sets (Prager et al., 2003). See also (Radev et al., 2000)
  • Calculate feature values for each candidate answer, then calculate a linear combination using weights learned from training data.
  • Ranking criteria:
    • Good global context: the global context of a candidate answer evaluates the relevance to the question of the passage from which the candidate answer is extracted.
    • Good local context: the local context of a candidate answer assesses the likelihood that the answer fills in the gap in the question.
    • Right semantic type: the semantic type of a candidate answer should be either the same as, or a subtype of, the answer type identified by the question analysis component.
    • Redundancy: the degree of redundancy for a candidate answer increases as more instances of the answer occur in retrieved passages.

  34. Answer Extraction using Features (cont.)
  • Features for Global Context
    • KeywordsInPassage: the ratio of keywords present in a passage to the total number of keywords issued to the search engine.
    • NPMatch: the number of words in noun phrases shared by both the question and the passage.
    • SEScore: the ratio of the search engine score for a passage to the maximum achievable score.
    • FirstPassage: a Boolean value which is true for the highest-ranked passage returned by the search engine, and false for all other passages.
  • Features for Local Context
    • AvgDistance: the average distance between the candidate answer and keywords that occur in the passage.
    • NotInQuery: the number of words in the candidate answer that are not query keywords.
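Combining these features into a ranking score is a linear weighted sum. A minimal sketch; the weights below are made-up illustrations, not the trained values from the cited work (note AvgDistance gets a negative weight, since a larger distance from the keywords should lower the score).

```python
# Illustrative weights (assumptions, not trained values).
WEIGHTS = {
    "KeywordsInPassage": 2.0,   # global context
    "NPMatch": 1.5,
    "SEScore": 1.0,
    "FirstPassage": 0.5,
    "AvgDistance": -0.8,        # local context: farther is worse
    "NotInQuery": 0.3,
}

def candidate_score(features):
    """Linear combination of feature values and weights."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

def rank_candidates(candidates):
    """candidates: list of (answer_string, feature_dict) pairs,
    returned best-first by combined score."""
    return sorted(candidates, key=lambda c: candidate_score(c[1]), reverse=True)
```

In the cited approach the weights are learned from training data (questions with known answers) rather than set by hand.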

  35. Answer Extraction using Relationships
  • Computing ranking scores:
    • Use linguistic knowledge to compute passage & candidate answer scores
    • Perform syntactic processing on the question and candidate passages
    • Extract predicate-argument & modification relationships from the parse
  • Question: “Who wrote the Declaration of Independence?”
    • Relationships: [X, write], [write, Declaration of Independence]
  • Answer Text: “Jefferson wrote the Declaration of Independence.”
    • Relationships: [Jefferson, write], [write, Declaration of Independence]
  • Compute scores based on the number of question relationship matches
    • Passage score: consider all instantiated relationships
    • Candidate answer scores: consider relationships with the variable

  36. Answer Extraction using Relationships (cont.)
  • Example: “When did Amtrak begin operations?”
  • Question relationships:
    • [Amtrak, begin], [begin, operation], [X, begin]
  • Compute passage scores: passages and relationships
    • “In 1971, Amtrak began operations, …”
      • [Amtrak, begin], [begin, operation], [1971, begin] …
    • “‘Today, things are looking better,’ said Claytor, expressing optimism about getting the additional federal funds in future years that will allow Amtrak to begin expanding its operations.”
      • [Amtrak, begin], [begin, expand], [expand, operation], [today, look] …
    • “Airfone, which began operations in 1984, has installed air-to-ground phones…. Airfone also operates Railfone, a public phone service on Amtrak trains.”
      • [Airfone, begin], [begin, operation], [1984, operation], [Amtrak, train] …
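The relationship-matching score in this example can be sketched as counting how many question relationships are instantiated in a passage, with the variable X matching any term. The tuple encoding below is an illustrative assumption; real systems match over full parse structures.

```python
def relation_score(question_rels, passage_rels):
    """Count question relationships instantiated in the passage.
    The variable "X" in a question relationship matches any term."""
    score = 0
    for q_rel in question_rels:
        for p_rel in passage_rels:
            if all(q == p or q == "X" for q, p in zip(q_rel, p_rel)):
                score += 1
                break  # count each question relationship at most once
    return score

# Relationships from the slide's example.
question = [("Amtrak", "begin"), ("begin", "operation"), ("X", "begin")]
amtrak_1971 = [("Amtrak", "begin"), ("begin", "operation"), ("1971", "begin")]
airfone = [("Airfone", "begin"), ("begin", "operation"), ("1984", "operation")]
```

The correct passage matches all three question relationships (with 1971 binding X), while the Airfone distractor matches only two, so relationship matching ranks the right passage first where bag-of-words overlap alone might not.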

  37. Answer Extraction using Logic
  • Logical proof:
    • Convert the question to a goal
    • Convert the passage to a set of logical forms representing individual assertions
    • Add predicates representing subsumption rules, real-world knowledge
    • Prove the goal
  • See section on LCC later

  38. Question Answering Tutorial – Part II
  John M. Prager
  IBM T.J. Watson Research Center
  jprager@us.ibm.com

  39. Part II - Specific Approaches
  • By Genre
    • Statistical QA
    • Pattern-based QA
    • Web-based QA
    • Answer-based QA (TREC only)
  • By System
    • SMU
    • LCC
    • USC-ISI
    • Insight
    • Microsoft
    • IBM Statistical
    • IBM Rule-based

  40. Approaches by Genre
  • By Genre
    • Statistical QA
    • Pattern-based QA
    • Web-based QA
    • Answer-based QA (TREC only)
      • Web-based QA
      • Database-based QA
  • Considerations
    • Effectiveness by question-type
    • Precision and recall
    • Expandability to other domains
    • Ease of adaptation to CL-QA

  41. Statistical QA
  • Use statistical distributions to model likelihoods of answer type and answer
  • E.g. IBM (Ittycheriah, 2001) – see later section

  42. Pattern-based QA
  • For a given question type, identify the typical syntactic constructions used in text to express answers to such questions
  • Typically very high precision, but a lot of work to get decent recall
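A classic construction of this genre is the biography pattern "Name (birth-year–", which answers "When was X born?" with high precision. A small illustrative sketch, assuming hand-written patterns for just this one question type; the pattern list and function name are not from any specific system:

```python
import re

# Hand-written surface patterns for birth-year questions.
# {name} is filled in per question; each pattern captures a 4-digit year.
BIRTH_PATTERNS = [
    r"{name} \((\d{{4}})-",           # "Mozart (1756-1791)"
    r"{name} was born in (\d{{4}})",  # "Mozart was born in 1756"
]

def find_birth_year(name, text):
    """Return the first birth year matched by any pattern, else None."""
    for template in BIRTH_PATTERNS:
        m = re.search(template.format(name=re.escape(name)), text)
        if m:
            return m.group(1)
    return None
```

The precision/recall trade-off on the slide is visible even here: the patterns rarely fire wrongly, but covering all the ways text expresses a birth date takes many more hand-written (or learned) patterns.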

  43. Web-Based QA
  • Exhaustive string transformations
    • Brill et al., 2002
  • Learning
    • Radev et al., 2001

  44. Answer-Based QA
  • Problem: Sometimes it is very easy to find an answer to a question using resource A, but the task demands that you find it in resource B.
  • Solution: First find the answer in resource A, then locate the same answer, along with original question terms, in resource B.
  • Artificial problem, but real for TREC participants.

  45. Answer-Based QA
  • Web-based solution: “When a QA system looks for answers within a relatively small textual collection, the chance of finding strings/sentences that closely match the question string is small. However, when a QA system looks for strings/sentences that closely match the question string on the web, the chance of finding a correct answer is much higher.” (Hermjakob et al., 2002)
  • Why this is true:
    • The Web is much larger than the TREC corpus (3,000 : 1)
    • TREC questions are generated from Web logs, and the style of language (and subjects of interest) in these logs is more similar to Web content than to newswire collections.

  46. Answer-Based QA
  • Database/Knowledge-base/Ontology solution:
    • When question syntax is simple and reliably recognizable, can express as a logical form
    • Logical form represents entire semantics of question, and can be used to access a structured resource:
      • WordNet
      • On-line dictionaries
      • Tables of facts & figures
      • Knowledge-bases such as Cyc
  • Having found the answer:
    • Construct a query with original question terms + answer
    • Retrieve passages
    • Tell Answer Extraction the answer it is looking for

  47. Approaches of Specific Systems
  • SMU Falcon
  • LCC
  • USC-ISI
  • Insight
  • Microsoft
  • IBM
  Note: Some of the slides and/or examples in these sections are taken from papers or presentations by the respective system authors

  48. SMU Falcon
  Harabagiu et al., 2000

  49. SMU Falcon
  • From the question, a dependency structure called the question semantic form is created
  • Query is a Boolean conjunction of terms
  • From answer passages that contain at least one instance of the answer type, generate the answer semantic form
  • 3 processing loops:
    • Loop 1: triggered when too few or too many passages are retrieved from the search engine
    • Loop 2: triggered when the question semantic form and answer semantic form cannot be unified
    • Loop 3: triggered when unable to perform an abductive proof of answer correctness

  50. SMU Falcon
  • Loops provide opportunities to perform alternations:
    • Loop 1: morphological expansions and nominalizations
    • Loop 2: lexical alternations – synonyms, direct hypernyms and hyponyms
    • Loop 3: paraphrases
  • Evaluation (Pasca & Harabagiu, 2001): increase in accuracy on the 50-byte task in TREC9
    • Loop 1: 40%
    • Loop 2: 52%
    • Loop 3: 8%
    • Combined: 76%