
Natural Language Processing Question Answering


Presentation Transcript


  1. Natural Language Processing: Question Answering • Meeting 8, 9/25/2012 • Rodney Nielsen

  2. Question Answering • Automating question answering is one of the oldest problems in computational linguistics. • We can usefully divide the kinds of questions we’d like to answer into two groups • Simple (factoid) questions • Where is the spleen? • Complex (narrative) questions • What are the co-morbidities exhibited by patients diagnosed with long QT syndrome?

  3. Corpus-Based Approaches • Focus: • Corpus-based approaches to both kinds • Vs. knowledge-based • Assumptions: • IR system with document index • Relevant user question

  4. Corpus-Based Approaches • Factoid questions • Extract • Complex questions • Summarize • Query-focused summarization

  5. Factoid Q/A

  6. Question Processing • The user’s question plays two roles in the processing... • It serves as the initial basis for creating a query to the IR system • The best query depends on the situation • And it guides the search for an answer in the returned relevant passages • The kind of question dictates the kind of answer; so classify questions based on the type of likely answers

  7. Query Formulation • Who won the Nobel Peace Prize in 2002? • Many answers likely (e.g., large doc collection): Use keywords • won Nobel Peace Prize 2002 • Few answers likely (e.g., small doc collection): Perform query term expansion • won win wins winning succeed gain awarded attain Nobel Peace calm amity Prize award honor 2002 • Very many answers likely (e.g., simple Q to web): Reformulate the query • “the 2002 Nobel Peace Prize was awarded to” • “won the Nobel Peace Prize in 2002”

  8. Query Formulation • Where is the spleen? • Many answers likely (e.g., large doc collection): Use keywords • spleen • Few answers likely (e.g., small doc collection): Perform query term expansion • spleen lien irascibility “short temper” “quick temper” • Very many answers likely (e.g., simple Q to web): Reformulate the query • “the spleen is located in”
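
A minimal sketch, in Python, of the three query-formulation strategies above: keywords, term expansion, and reformulation into likely answer phrasings. The stopword list, synonym dictionary, and reformulation template are illustrative assumptions, not part of any particular system.

```python
# Sketch of the three query-formulation strategies from the slides.
STOPWORDS = {"who", "what", "where", "when", "the", "in", "is", "a", "of"}

def keyword_query(question: str) -> str:
    """Many answers expected: drop question words and stopwords, keep content terms."""
    terms = [t.strip("?.,").lower() for t in question.split()]
    return " ".join(t for t in terms if t and t not in STOPWORDS)

def expanded_query(question: str, synonyms: dict) -> str:
    """Few answers expected: add synonyms/variants for each content term."""
    expanded = []
    for t in keyword_query(question).split():
        expanded.append(t)
        expanded.extend(synonyms.get(t, []))
    return " ".join(expanded)

def reformulated_queries(question: str) -> list:
    """Very many answers expected (web): rewrite as declarative answer patterns."""
    q = question.rstrip("?")
    if q.lower().startswith("where is "):
        entity = q[len("where is "):]
        return [f'"{entity} is located in"']
    return []

print(keyword_query("Who won the Nobel Peace Prize in 2002?"))   # won nobel peace prize 2002
print(expanded_query("Where is the spleen?", {"spleen": ["lien"]}))  # spleen lien
print(reformulated_queries("Where is the spleen?"))  # ['"the spleen is located in"']
```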

  9. Question Classification • Why? • Expected Answer Type classification • What is long QT syndrome? • Definition • What antibiotics treat pneumonia? • Drug

  10. MiPACQ Expected Answer Types • Diagnostic_procedure • Laboratory_procedure • diagnoses__Rel_crt • Health_care_activity • Therapeutic_or_preventive_procedure • degree_of__Rel_crt • Clinical_drug • Definition • Sign_or_Symptom • Finding • causes__Rel_frt_ba • manifestation_of__Rel_frt • Laboratory_or_test_result • treats__Rel_frt_affects • Disorder

  11. Question Classes: Answer Types

  12. Question Classification • Three ways to build a question classifier • Rule-based • What antibiotics treat pneumonia? • If Q matches “What <drug category> (treat|prevent|…) <disorder>” then Expected_Answer_Type = Drug • Train a statistical ML algorithm • Features: • Q wh-word (e.g., What) • Headword of 1st NP following wh-word (e.g., antibiotics) • … • Combination
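
A rough sketch of the rule-based and feature-extraction routes just listed. The regular-expression pattern, the tiny category lexicons, and the crude headword guess are assumptions for illustration; a real system would use a parser for the headword and a trained classifier over many more features.

```python
import re

DRUG_CATEGORIES = {"antibiotics", "ssris", "statins"}        # assumed lexicon
DISORDERS = {"pneumonia", "long qt syndrome", "depression"}  # assumed lexicon

def rule_based_answer_type(question: str):
    """If Q matches 'What <drug category> (treat|prevent) <disorder>' -> Drug."""
    m = re.match(r"what (\w+) (treats?|prevents?) (.+?)\??$", question.lower())
    if m and m.group(1) in DRUG_CATEGORIES and m.group(3) in DISORDERS:
        return "Drug"
    return None

def question_features(question: str) -> dict:
    """Features for a statistical classifier: wh-word and first-NP headword."""
    tokens = question.rstrip("?").split()
    return {
        "wh_word": tokens[0].lower(),                                     # e.g., "what"
        "first_np_head": tokens[1].lower() if len(tokens) > 1 else "",    # crude headword guess
    }

q = "What antibiotics treat pneumonia?"
print(rule_based_answer_type(q))   # -> Drug
print(question_features(q))        # -> {'wh_word': 'what', 'first_np_head': 'antibiotics'}
# In the combined approach, rule hits can override the model or feed in as features.
```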

  13. Factoid Q/A

  14. Passage Processing • The output from most IR systems is a ranked set of documents. But they’re ranked by an idiosyncratic set of ranking principles that may or may not be ok for Q/A • And documents aren’t the best unit for QA • So passage processing extracts and reranks shorter units from the larger returned set of documents

  15. Passage Processing • Is a computer-generated QT interval always accurate? • “… ALWAYS check with your personal physician before taking any action regarding your health! It amazes me the doctors can be accurate measuring off the strips. The readings with the * all had the computer generated statement as follows: 'Prolonged QT interval or TU fusion, consider myocardial disease, electrolyte imbalance, or drug effects'”

  16. Passage Processing • The first step is simply a segmentation task. The simplest methods just segment into paragraphs... • But it’s really language, genre, and application dependent. • Passage ranking is used to rank the passages on the basis of the probability that they contain the answer. • For that you need to know the answer type
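
A minimal sketch of the segmentation step under the simplest assumption mentioned above (paragraphs as passages), with a stand-in filter for the expected answer type. The keyword-based filter is purely illustrative; a real system would run an entity tagger for that answer type.

```python
import re

def segment_into_passages(document: str) -> list:
    """Simplest segmentation: split on blank lines (paragraph boundaries)."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]

def could_contain_answer(passage: str, answer_type: str) -> bool:
    """Crude stand-in filter; a real system would tag entities of the answer type."""
    if answer_type == "Anatomical_location":
        return bool(re.search(r"\b(located|part of|under|behind)\b", passage, re.I))
    return True

doc = "The spleen is located in the upper left part of the abdomen.\n\nIt filters blood."
passages = [p for p in segment_into_passages(doc)
            if could_contain_answer(p, "Anatomical_location")]
print(passages)
```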

  17. Passage Ranking • There are three methods to do passage ranking. • Rule-based • Train a statistical ML algorithm • Combination • All use the same kinds of evidence (features).

  18. Passage Ranking – Features • Number of NEs of the right type • Number of query words • Longest sequence of question words • Rank of the document from which the passage was extracted • Query term proximity • N-gram overlap
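
A sketch of how a few of these features might be computed and combined with hand-set weights. The weights, and passing in the NE count and document rank as precomputed inputs, are illustrative assumptions; in practice a statistical ranker would learn the combination.

```python
def passage_features(passage: str, question_terms: list,
                     named_entities_of_right_type: int, doc_rank: int) -> dict:
    p_tokens = passage.lower().split()
    q_terms = [t.lower() for t in question_terms]

    # Longest run of consecutive passage tokens that are question terms
    longest = run = 0
    for tok in p_tokens:
        run = run + 1 if tok in q_terms else 0
        longest = max(longest, run)

    return {
        "n_right_type_nes": named_entities_of_right_type,
        "n_question_terms": sum(1 for t in q_terms if t in p_tokens),
        "longest_question_term_run": longest,
        "doc_rank": doc_rank,
    }

def score(features: dict) -> float:
    weights = {"n_right_type_nes": 2.0, "n_question_terms": 1.0,
               "longest_question_term_run": 1.5, "doc_rank": -0.1}   # hand-set for illustration
    return sum(weights[k] * v for k, v in features.items())

f = passage_features("The spleen is located in the upper left abdomen",
                     ["where", "spleen", "located"],
                     named_entities_of_right_type=1, doc_rank=3)
print(f, score(f))
```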

  19. Factoid Q/A

  20. Answer Processing • Extract, reformat and present just the relevant information

  21. Answer Processing • Where is the spleen? • The spleen is located in the upper left part of the abdomen under the ribcage. It works as part of the lymphatic system to protect the body, clearing worn-out red blood cells and other foreign bodies from the bloodstream to help fight off infection.

  22. Answer Processing • What is long QT syndrome? • The Long QT Syndrome is a rare disorder of the heart's electrical system that can affect otherwise healthy people. Although the heart's mechanical function is normal, there are defects in ion channels, which are cell structures in the heart muscle. These electrical defects can cause a very fast heart rhythm (arrhythmia) called torsade de pointes. This abnormal rhythm (a form of ventricular tachycardia) is too fast for the heart to beat effectively, and so the blood flow to the brain falls dramatically, causing sudden loss of consciousness, or syncope (fainting).

  23. Answer Processing • Can I treat postpartum depression with Prozac? • Selective serotonin reuptake inhibitors (SSRIs) are first-line agents and are effective in women with postpartum depression. Use standard antidepressant dosages, eg, fluoxetine (Prozac) 10-60 mg/d, sertraline (Zoloft) 50-200 mg/d, paroxetine (Paxil) 20-60 mg/d, citalopram (Celexa) 20-60 mg/d, or escitalopram (Lexapro) 10-20 mg/d. Adverse effects of this drug category include insomnia, jitteriness, nausea, appetite suppression, headache, and sexual dysfunction. • Yes

  24. CU System • Who won the Nobel Peace Prize in 2002? • Saddam Hussein • Former U.S. President Jimmy Carter has won the 2002 Nobel Peace Prize for his worldwide peace and human rights work in an award seen as criticising Washington's drive to oust Iraq's Saddam Hussein.

  25. Answer Processing • Two basic components... • Rule-based patterns for extracting potential answers • Features/heuristics to rank the extracted answers
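
A minimal sketch of these two components: pattern-based extraction of candidate answers followed by a simple ranking heuristic (here, redundancy across passages). The patterns and the heuristic are illustrative assumptions.

```python
import re
from collections import Counter

LOCATION_PATTERNS = [
    r"the {topic} is located in ([^.]+)",
    r"the {topic} lies in ([^.]+)",
]

def extract_candidates(topic: str, passages: list) -> list:
    """Apply extraction patterns instantiated with the question topic."""
    candidates = []
    for passage in passages:
        for pat in LOCATION_PATTERNS:
            candidates += re.findall(pat.format(topic=topic), passage, re.I)
    return candidates

def rank_candidates(candidates: list) -> list:
    """Simple heuristic: prefer answers extracted from more passages (redundancy)."""
    counts = Counter(c.strip().lower() for c in candidates)
    return [answer for answer, _ in counts.most_common()]

passages = [
    "The spleen is located in the upper left part of the abdomen under the ribcage.",
    "As an organ, the spleen is located in the left upper quadrant.",
]
print(rank_candidates(extract_candidates("spleen", passages)))
```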

  26. Answer Type Features

  27. Evaluation • NIST has been running Q/A evaluations as part of its TREC program, both generic Q/A and application-specific (e.g., biomedical). • The typical metric is Mean Reciprocal Rank (MRR). • Assumes that systems return a ranked list of possible answers.
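
Mean Reciprocal Rank is straightforward to compute: for each question, take the reciprocal of the rank of the first correct answer in the system's ranked list (zero if no correct answer appears), then average over questions. The ranked lists below are made-up examples.

```python
def mean_reciprocal_rank(ranked_answers: list, gold_answers: list) -> float:
    total = 0.0
    for answers, gold in zip(ranked_answers, gold_answers):
        for rank, answer in enumerate(answers, start=1):
            if answer in gold:
                total += 1.0 / rank   # reciprocal rank of the first correct answer
                break
    return total / len(ranked_answers)

system_output = [["Oslo", "Jimmy Carter", "Kofi Annan"],   # correct at rank 2 -> 1/2
                 ["the spleen", "the liver"]]              # correct at rank 1 -> 1/1
gold = [{"Jimmy Carter"}, {"the spleen"}]
print(mean_reciprocal_rank(system_output, gold))  # -> 0.75
```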

  28. MiPACQ • Multisource Integrated Platform for Answering Clinical Questions

  29. Complex Question Answering • What are the common signs and symptoms of Long QT Syndrome? • What are the side effects of opium-based pain medication? • Is peripheral edema usually experienced after other symptoms in Diabetes? • What actions were taken due to a diagnosis of Long QT Syndrome in patients seen at/by …? • What factors affect the efficacy of …?

  30. QA Dialogue • Clinical research cohort queries • Date, Age, Location and Gender constraints • What patients were diagnosed with Ω primary or systemic or acquired not secondary not localized or Ω peripheral γ and Ω trophic β or β or α

  31. MiPACQ Clinical QA and Data Mining System – Architecture (PoC) • [Architecture diagram: a clinician or lab investigator poses questions through a Web UI to the QA System Manager, which coordinates question annotation, answer-type and answer-pattern classification, keyword and pattern queries, query validation, query-term semantic expansion, exclusion/expansion analysis, answer analysis and re-ranking, answer summarization, result-set annotation, and follow-up question preprocessing; these components run on ClearTK over UIMA and connect to the i2b2 query interface, a Lucene interface, LexEVS APIs and LexGrid data, the EDT query engine and RDB (with EDT-specific exclusion/expansion and query validation), and MedPedia, sharing data, features, and annotations] • ClearTK v0.9.7, UIMA v2.3.0, Eclipse Galileo & Java v1.5

  32. MiPACQ Text Annotation • [Text-annotation pipeline diagram: cTAKES (Clinical Text Analysis and Knowledge Extraction System) components (sentence boundary detector, tokenizer, context-dependent tokenizer, normalizer, part-of-speech tagger, phrasal chunker, dictionary-lookup named entity annotator, context annotator, negation detector), plus the Identifinder general named entity tagger, the OpenNLP constituent parser and coreference classification, a dependency parser, paragraph segmentation, predicate identification, verb- and noun-based semantic role labelers, event detection and classification, entity and event coreference resolution, and entity and event relation classification, all running on ClearTK over UIMA and sharing data, features, and annotations] • cTAKES currently tags Disorders, Drugs, Signs & Symptoms, Anatomy, and Procedures; should the Laboratory_or_test_result entity be added? • Identifinder and OpenNLP's coreference system and constituency parser will be integrated with ClearTK; all other components will be developed in ClearTK

  33. MiPACQ Architecture

  34. Question Annotation • Can you use Prozac along with lithium? • Interacts-with([Drug](Prozac), [Drug](Lithium))

  35. Evaluation • TREC Q/A evaluation for complex questions • Mean F-measure
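
A simplified sketch of an F-measure over answer "nuggets" (the key facts a complex answer should contain). The official TREC nugget scoring is more involved (precision is length-penalized and recall is weighted more heavily); this just shows the basic precision/recall/F computation on made-up nuggets.

```python
def f_measure(system_nuggets: set, gold_nuggets: set, beta: float = 1.0) -> float:
    matched = system_nuggets & gold_nuggets
    if not matched:
        return 0.0
    precision = len(matched) / len(system_nuggets)
    recall = len(matched) / len(gold_nuggets)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

gold = {"rare disorder", "electrical system of the heart", "causes arrhythmia", "can cause syncope"}
system = {"rare disorder", "causes arrhythmia", "affects ion channels"}
print(round(f_measure(system, gold), 3))   # precision 2/3, recall 2/4 -> F1 ~ 0.571
```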

  36. Questions • 1) Why do they use log when calculating IDF (inverse document frequency)? • 2) In query formulation for a Q/A system, in what situations would you leave the question words (where, when, how, etc.) in the search query?
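
For reference on question 1: the usual IDF weight divides the collection size by the term's document frequency and takes a log, which damps the raw ratio so rare terms do not dominate the score by orders of magnitude. A quick illustration (natural log here; the base only rescales the weights):

```python
import math

def idf(n_documents: int, document_frequency: int) -> float:
    # log dampens N/df: rare terms still score higher, but not a million times higher
    return math.log(n_documents / document_frequency)

print(idf(1_000_000, 10))       # rare term   -> ~11.5
print(idf(1_000_000, 500_000))  # common term -> ~0.69
```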

  37. Questions • Would summarizing questions to a question answering system before processing them lead to better results? • What are some properties of a natural language that would make question answering harder to achieve?

  38. Questions • INFORMATION RETRIEVAL: • Other than stopword removal, stemming, or otherwise limiting the number of terms, what are some practical methods for dealing with an inverted index (keys are terms and values are documents that contain the term) that is too large to store in memory? (The only method I can think of is using a FAST disk-based key-value store (though using Python's built-in implementation is painfully slow))

  39. So far in my implementation of a Wikipedia indexer I have: • ~2,239,119 documents (I thought at one point I had 4 million, so I need to check on this) • ~8,790,964 unique tokens • ~4,964,995 tokens appearing only 1 time (~56.5%) • ~7,674,338 tokens appearing fewer than 10 times (~87.3%) • which to me indicates I probably have an issue with my tokenizer or Wikipedia-markup remover. If that is the case, then just pruning the token list should be sufficient, but I would still be interested in how, say, Google would handle it.
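
One practical option for the inverted-index question on slide 38 is to keep the postings on disk in an embedded database rather than in a Python dict. The sketch below uses the standard-library sqlite3 module with JSON-encoded postings lists; this is only an illustration, and production engines such as Lucene instead write sorted, compressed postings files and merge them.

```python
import sqlite3, json

conn = sqlite3.connect("inverted_index.db")
conn.execute("CREATE TABLE IF NOT EXISTS postings (term TEXT PRIMARY KEY, doc_ids TEXT)")

def add_postings(term: str, doc_ids: list) -> None:
    """Store or replace the postings list for a term on disk."""
    conn.execute("INSERT OR REPLACE INTO postings VALUES (?, ?)", (term, json.dumps(doc_ids)))

def lookup(term: str) -> list:
    """Fetch a term's postings list without loading the whole index into memory."""
    row = conn.execute("SELECT doc_ids FROM postings WHERE term = ?", (term,)).fetchone()
    return json.loads(row[0]) if row else []

add_postings("spleen", [3, 17, 42])
conn.commit()
print(lookup("spleen"))   # -> [3, 17, 42]
```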

  40. Questions • TERM CLUSTERING: • Term clustering is described on page 777 as clustering the terms found in the corpus around the documents that contain those terms • Does that mean we transpose the term-by-document matrix so that we treat the terms as 'documents' and the documents that contain the term as 'terms'? • Is there a specific clustering method that is best suited (or most widely used) for this kind of clustering (where the number of clusters are not known in advance)?
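
A rough sketch of the transposed view asked about in question 40: represent each term by the documents it occurs in (rows of the transposed term-by-document matrix) and cluster with a hierarchical method cut at a distance threshold, so the number of clusters need not be fixed in advance. The tiny matrix and the 0.5 threshold are made-up illustration values.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# rows = terms, columns = documents (the "transposed" view of term-by-document)
terms = ["spleen", "abdomen", "qt", "arrhythmia"]
term_doc = np.array([[1, 1, 0, 0],
                     [1, 1, 0, 0],
                     [0, 0, 1, 1],
                     [0, 0, 1, 1]], dtype=float)

# Agglomerative clustering on cosine distances, cut at a distance threshold
distances = pdist(term_doc, metric="cosine")
clusters = fcluster(linkage(distances, method="average"), t=0.5, criterion="distance")
print(dict(zip(terms, clusters)))   # e.g., {'spleen': 1, 'abdomen': 1, 'qt': 2, 'arrhythmia': 2}
```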

  41. Questions • 1. Indexing is one of the sub-stages of a generic question answering system. How does indexing a document really work in the context of using NLP? • 2. When evaluating an answer to a question, is every answer simply right or wrong, or is it correct only to the degree that enough information could be gathered to extract an answer to the given query?

  42. Questions • 1> For factoid questions, it was mentioned that users are not generally happy if you present them with a single word or a phrase as an answer. For example: "What's the height of Mt. Everest?" "29,028 feet." Now, my question is: is this a fact tested on actual users, or did the authors just assume it to be the case (that users won't be happy with single-word answers)?

  43. Questions • 2> This is a comment. I think latent semantic analysis (as well as latent semantic indexing) and topic models will have an important bearing on question answering. Let me explain what I mean. In the sentence/paragraph selection stage, sentences are selected based on saliency (or centrality) and keyword similarity with the query. Now, this saliency or keyword similarity can either be computed in the vector space of words (terms), as the book mentioned, or it can also be computed in the reduced LSA space (or topic space) that has more "latent" information. At the very least, the latent space can be folded in as one of the features in the saliency computation. The book never mentioned this approach, but I think it might be useful. Not only that, you can fold in separate domain-specific corpora to generate separate latent vectors for each word that are rich in domain information, and leverage this extra information for query-focused summarization (i.e., question answering for complex queries).

  44. Questions • 3> When using PageRank on a sentence graph to measure the centrality of sentences, how are the "flows" between sentences computed? Also, are these flows unidirectional or symmetric (i.e., bidirectional)?

  45. Questions • 4> How do we answer questions like: "What is that green-blue thing over the cupboard?" You mentioned two medical-domain questions in today's class that were similar to the above question, in the sense that these questions were kind of "imprecise". How do we go about solving these imprecise questions? Do we precisiate them (e.g., by removing deixis, presupposition, or implicature), or do we extract keywords from these imprecise questions and use them as a proxy for the original questions?

  46. Questions • 5> This is another general comment. To me, the most appealing feature of a QA system seemed like the fact that you can tie up summarization and QA together to answer more complex queries. However, I think abstractive rather than extractive summarization will be more useful for this purpose.

  47. Questions • 1Q. In a question answering system, what exactly does the system look at in the given input to decide whether it is a question or not? Does it only consider '?', or does it also look for question-starter words like what, when, where, etc. within the input before answering? • Or, in simpler terms: • What if someone types a statement into a question answering system that is not a question at all but accidentally ends with a '?'. Does the QA system consider this statement to be a question or not?

  48. Questions • 2Q. What is the difference between query formulation and query reformulation? On what factors do we base the choice of which method to use for question answering?

  49. Questions • 3Q. How are the sentences from a group of documents picked to produce a condensed form of them in multiple-document summarization? • 4Q. Suppose that multiple-document summarization is performed on a group of documents and a summary of all the documents put together is obtained. If we then perform another summarization process, say single-document summarization, will the document be further summarized?

  50. Questions • In question classification, how large should the answer-type taxonomy be? • In passage retrieval, can you give examples of which ranking method (named entities, question keywords, etc.) is best for which cases? • Can you go over how to calculate sentence centrality?
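
A rough sketch of one way to compute sentence centrality: build a sentence graph whose edges are cosine similarities (symmetric, so the "flows" from question 44 run both ways) and score each sentence by its total similarity to the others. This is the simple degree-style variant; LexRank proper runs PageRank over the same similarity graph. The raw word-count vectors are an illustrative simplification of TF-IDF weighting.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centrality(sentences: list) -> list:
    """Score each sentence by its summed cosine similarity to all other sentences."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    return [sum(cosine(v, w) for j, w in enumerate(vectors) if j != i)
            for i, v in enumerate(vectors)]

sents = ["the spleen filters blood",
         "the spleen is located in the abdomen",
         "long qt syndrome affects the heart"]
scores = centrality(sents)
print(max(zip(scores, sents)))   # the most central sentence and its score
```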
