
Relevance Models and Answer Granularity for Question Answering


Presentation Transcript


  1. Relevance Models and Answer Granularity for Question Answering W. Bruce Croft and James Allan CIIR University of Massachusetts, Amherst

  2. Issues • Formal basis for QA • Question modeling • Answer granularity • Semi-structured data • Answer updating

  3. Formal basis for QA • Problem – QA systems tend to be ad hoc combinations of NLP, IR and other techniques • Performance can be fragile • Requires considerable knowledge engineering • Difficult to transition between question and answer types • some questions can be answered with “facts”, others require more • Process of determining probability of correct answer very similar to probabilistic IR models • i.e. satisfying information need • better answers ≡ better IR

  4. QA Assumptions? • IR first, then QA • Single answer vs. ranked list • Answer is transformed text vs. collection of source texts • who does the inference? • analyst’s current environment • Questions are questions vs. “queries” • experience at WESTLAW • spoken questions? • Answers are text extracts vs. people

  5. Basic Approach • Use language models as basis for a QA system • build on recent success in IR • test limits in QA • Try to capture important aspects of QA • question types • answer granularity • multiple answers • feedback • Develop learning methodologies for QA • Currently more Qs than As

  6. LM for QA (QALM) • View question text as being generated by a mixture of relevance model and question model • relevance model is related to topic of question • question model is related to form of question • Question models are associated with answer models • Answers are document text generated by a mixture of relevance model and answer model • TREC-style QA corresponds to specific set of question and answer models • Default question and answer models lead to the usual IR process • e.g. “what are the causes of asthma?”
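A minimal sketch of the mixture view on this slide: question text is scored under a weighted combination of a relevance (topic) model and a question (form) model. The word probabilities, the mixing weight lambda_mix, and the smoothing floor are illustrative assumptions, not values from the talk.

```python
# Question text viewed as generated by a mixture of a relevance model (topic)
# and a question model (form). All probabilities below are toy assumptions.
import math

relevance_model = {"causes": 0.03, "asthma": 0.05, "allergen": 0.02}    # topical words
question_model = {"what": 0.10, "are": 0.08, "the": 0.12, "of": 0.09}   # question-form words
lambda_mix = 0.5          # assumed mixing weight between the two components
background_prob = 1e-6    # smoothing floor for words unseen in either model

def mixture_log_likelihood(question_tokens):
    """Log-probability of the question under the two-component mixture."""
    total = 0.0
    for w in question_tokens:
        p = (lambda_mix * relevance_model.get(w, background_prob)
             + (1 - lambda_mix) * question_model.get(w, background_prob))
        total += math.log(p)
    return total

print(mixture_log_likelihood("what are the causes of asthma".split()))
```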

  7. Basic Approach [Flow diagram: question text → estimate language models (question model, relevance model, answer model) → answer texts ranked by probability]

  8. Relevance Models: idea [Diagram: a query and its relevant documents both generated from an underlying relevance model R] • For every topic, there is an underlying Relevance Model R • Queries and relevant documents are samples from R • Simple case: R is a distribution over vocabulary (unigram) • estimation straightforward if we had examples of relevant docs • use the fact that query is a sample from R to estimate parameters • use word co-occurrence statistics
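The unigram relevance model sketched here has a compact standard formulation; the following is taken from the general relevance-model framework rather than recovered from the slide, and is offered as a hedged reconstruction:

```latex
% With no example relevant documents, the query itself is treated as a sample
% from R, so P(w|R) is approximated by the probability of observing w together
% with the query words q_1 ... q_k.
\[
  P(w \mid R) \;\approx\; P(w \mid q_1, \ldots, q_k)
  \;=\; \frac{P(w, q_1, \ldots, q_k)}{P(q_1, \ldots, q_k)}
\]
```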

  9. Relevance Model: Estimation with no training data [Diagram: observed query words q1 = israeli, q2 = palestinian, q3 = raids are samples from the relevance model R; the probability of an unseen word w under R must be estimated]

  10. Relevance Model: Putting things together [Diagram: the query words israeli, palestinian, raids are tied to document models M, whose sample probabilities are combined to estimate the probability of an unseen word w under the relevance model]
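A small sketch of the estimation step the diagram alludes to, under the assumption that the joint probability is computed through a set of top-ranked document models M (the word co-occurrence statistics mentioned on slide 8). The document models and the smoothing floor are toy values for illustration.

```python
# Estimate P(w|R) with no training data: combine document models M, weighting
# each by how well it explains the query (israeli, palestinian, raids).
import math

query = ["israeli", "palestinian", "raids"]

# Toy unigram models of a few top-retrieved documents.
doc_models = [
    {"israeli": 0.05, "palestinian": 0.04, "raids": 0.02, "gaza": 0.03},
    {"israeli": 0.03, "palestinian": 0.05, "raids": 0.01, "strikes": 0.02},
]
floor = 1e-6  # probability assumed for words a document model does not contain

def relevance_model(query, doc_models):
    """P(w|R) ~ sum_M P(M) * P(w|M) * prod_i P(q_i|M), renormalized over w."""
    p_m = 1.0 / len(doc_models)                  # uniform prior over document models
    vocab = {w for m in doc_models for w in m}
    unnormalized = {}
    for w in vocab:
        unnormalized[w] = sum(
            p_m * m.get(w, floor) * math.prod(m.get(q, floor) for q in query)
            for m in doc_models)
    z = sum(unnormalized.values())
    return {w: p / z for w, p in unnormalized.items()}

print(relevance_model(query, doc_models))
```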

  11. Retrieval using Relevance Models • Probability Ranking Principle: • P(w|R) is estimated by P(w|Q) (Relevance Model) • P(w|N) is estimated by collection probabilities P(w) • Document-likelihood vs. query-likelihood approach • query expansion vs. document smoothing • can be applied to multiple granularities • i.e. sentence, passage, document, multi-document
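The formula attached to the Probability Ranking Principle bullet did not survive the transcript. Under the document-likelihood view described on this slide, a standard form (an assumption based on the surrounding definitions, not recovered slide content) is:

```latex
% Rank a candidate text D (sentence, passage, document, or multi-document
% unit) by how much more likely it is under the relevance model R than under
% the non-relevance model N, approximated by collection probabilities.
\[
  \mathrm{score}(D)
  \;=\; \log \frac{P(D \mid R)}{P(D \mid N)}
  \;=\; \sum_{w} n(w, D)\, \log \frac{P(w \mid R)}{P(w \mid N)},
\]
% where n(w, D) is the count of w in D.
```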

  12. Question Models • Language modeling (or other techniques) can be used to classify questions based on predefined categories • the best features for each category need to be determined • What is the best process for estimating the relevance and question models? • How are new question models trained? • how many question models are enough?
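One way to realize the language-modeling classification mentioned on this slide is a per-category unigram model with a uniform prior; the categories, probabilities, and smoothing floor below are toy assumptions for illustration only.

```python
# Classify a question by the predefined category whose unigram language model
# best explains it. All probabilities are illustrative, not trained values.
import math

category_models = {
    "person":   {"who": 0.20, "president": 0.05, "invented": 0.03},
    "location": {"where": 0.20, "city": 0.05, "country": 0.04},
    "date":     {"when": 0.20, "year": 0.06, "born": 0.03},
}
prior = {c: 1.0 / len(category_models) for c in category_models}  # uniform prior
floor = 1e-6  # smoothing for words unseen in a category model

def classify(question_tokens):
    """Return the category with the highest (log) posterior for the question."""
    def log_posterior(cat):
        model = category_models[cat]
        return math.log(prior[cat]) + sum(
            math.log(model.get(w, floor)) for w in question_tokens)
    return max(category_models, key=log_posterior)

print(classify("who invented the telephone".split()))  # -> person
```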

  13. Answer Models and Granularity • Answer models are associated with question models • Best features need to be determined • What is best process for estimating probability of answer texts given answer and relevance models? • How is answer granularity modeled? • Learn default granularity for question category and backoff to other granularities • Is there an optimal granularity for a particular question/database? • How are answer models trained? • Relevance feedback approach a possibility
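The default-granularity-with-backoff idea could look roughly like the sketch below: score candidates at the granularity learned for the question category, and back off to coarser units when nothing clears a confidence threshold. The granularity order, defaults, scorer, and threshold are all assumptions made for illustration.

```python
# Backoff over answer granularities: try the category's default unit first,
# then progressively coarser units. Defaults and threshold are assumptions.
GRANULARITIES = ["sentence", "passage", "document", "multi-document"]
DEFAULT_GRANULARITY = {"location": "sentence", "definition": "passage"}  # assumed defaults

def best_answer(candidates_by_granularity, category, score, threshold=0.5):
    """candidates_by_granularity maps a granularity to a list of answer texts."""
    start = GRANULARITIES.index(DEFAULT_GRANULARITY.get(category, "passage"))
    for granularity in GRANULARITIES[start:]:        # back off to coarser units
        candidates = candidates_by_granularity.get(granularity, [])
        scored = sorted(((score(a), a) for a in candidates), reverse=True)
        if scored and scored[0][0] >= threshold:
            return granularity, scored[0][1]
    return None, None

print(best_answer(
    {"sentence": ["Boston."], "passage": ["Boston is the capital of Massachusetts."]},
    "location",
    score=lambda a: len(a) / 50.0,   # toy confidence score
))  # backs off from sentence to passage
```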

  14. Semi-Structured Data • How can structure and metadata indicated by markup be used to improve QA effectiveness? • Examples – tables, passages in Web pages • Answer models will include markup and metadata features • Could be viewed as an indexing problem – what is an answer passage? – contiguous text too limited • Construct “virtual documents” • Current demo: Quasm
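A sketch of the “virtual documents” idea for tables: each cell is expanded with its column header and the page title so that otherwise context-free cells become self-contained, retrievable answer passages. The fields used here are an assumption for illustration; they are not necessarily the ones used by the Quasm demo.

```python
# Turn a marked-up table into "virtual documents": one small answer passage
# per cell, carrying the page title and column header as context/metadata.
def virtual_documents(page_title, headers, rows):
    docs = []
    for row in rows:
        for header, cell in zip(headers, row):
            docs.append({
                "text": f"{page_title}. {header}: {cell}",
                "metadata": {"title": page_title, "column": header},
            })
    return docs

docs = virtual_documents(
    "State Facts",
    ["State", "Capital", "Population"],
    [["Massachusetts", "Boston", "6.4 million"]],   # toy row
)
print(docs[1]["text"])  # "State Facts. Capital: Boston"
```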

  15. Answer Updating • Answers can change or evolve over time • Dealing with multiple answers and answers arriving in streams of information are similar problems • Novelty detection in TDT task • Evidence combination and answer granularity will be important components of a solution • Filtering thresholds also important
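A minimal sketch of the novelty-detection component for answers arriving in a stream: a new candidate is reported only if it is sufficiently dissimilar from answers already accepted. The cosine measure over raw term counts and the 0.8 threshold are illustrative assumptions, not the filtering thresholds referred to on the slide.

```python
# Report a streamed answer only if it is not too similar to answers seen so far.
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two texts using raw term counts."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def is_novel(candidate, accepted_answers, threshold=0.8):
    """True if the candidate differs enough from every previously accepted answer."""
    return all(cosine(candidate, old) < threshold for old in accepted_answers)

accepted = ["The capital of Massachusetts is Boston."]
print(is_novel("The capital of Massachusetts is Boston.", accepted))   # False
print(is_novel("Boston serves as the capital of Massachusetts.", accepted))  # True
```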

  16. Evaluation • The CMU/UMass LEMUR toolkit will be used for development • TREC QA data used for initial tests of general approach • Web-based data used for evaluation of semi-structured data • Modified TDT environment will be used for answer updating experiments • Others….
