
Presentation Transcript


1. Introduction
• Query operations attempt to reformulate the original query in order to improve the effectiveness of information retrieval (IR)
• Query operations typically involve:
  • Expanding the original query
  • Reweighting the terms in the expanded query
  • Using feedback from user actions and/or analysis of the local/global document set

2. Query Operations
Paper 1: http://doi.acm.org/10.1145/243199.243202
Paper 2: http://doi.acm.org/10.1145/511446.511489
Janak Mathuria

3. Motivation
• Word mismatch
  • The vocabularies of users and authors may differ
  • Users often use different words to retrieve information than the authors used to express the concepts
• Short queries
  • Users typically submit very short queries; the average web query is under two words
  • The keywords in the query alone may not suffice to retrieve relevant information

4. Traditional Approaches
• Relevance feedback
  • Perhaps the most popular approach
  • Expansion and term reweighting based on the user marking certain retrieved documents as relevant
  • Easy to understand; provides a controlled process to emphasize or de-emphasize terms (a sketch follows below)
• Automatic local analysis
  • Documents retrieved for the original query are used to expand the query
  • All retrieved documents are thus treated as relevant
  • Requires access to the document contents, not just the term indices
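
The classic instance of this reweighting process is the Rocchio formula. Below is a minimal sketch of a Rocchio-style update, assuming tf-idf vectors represented as plain dicts; the function name and the parameter defaults (alpha, beta, gamma) are illustrative, not taken from either paper.

```python
from collections import defaultdict

def rocchio_update(query_vec, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style relevance feedback: move the query vector toward
    the centroid of user-marked relevant documents and away from the
    centroid of non-relevant ones. All vectors are {term: weight} dicts."""
    new_q = defaultdict(float)
    for term, w in query_vec.items():
        new_q[term] += alpha * w
    for doc in relevant:
        for term, w in doc.items():
            new_q[term] += beta * w / len(relevant)
    for doc in nonrelevant:
        for term, w in doc.items():
            new_q[term] -= gamma * w / len(nonrelevant)
    # terms driven negative are conventionally dropped rather than kept
    return {t: w for t, w in new_q.items() if w > 0}

# Example: feedback pulls in "car" and suppresses "cat"
q = rocchio_update({"jaguar": 1.0},
                   relevant=[{"jaguar": 0.8, "car": 0.6}],
                   nonrelevant=[{"jaguar": 0.4, "cat": 0.9}])
```

Note how the expanded query both adds new terms (from the relevant centroid) and reweights existing ones, which is exactly the two-part operation from slide 1.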

5. Traditional Approaches
• Automatic global analysis
  • Uses the global context of a concept to determine similarities between concepts
  • Concepts can be noun groups (up to 'n' adjacent nouns; see the sketch below)
  • Context can be limited to a certain vicinity of the concept or extended to the entire document
  • A ranked list of phrasal concepts is generated for a query
  • A certain number of the top-ranked phrasal concepts are added to expand the query
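
To make the "noun group" notion concrete, here is a small sketch of how such phrasal concepts could be collected from part-of-speech-tagged text; the Penn-style "NN*" tags and the function name are assumptions for illustration.

```python
def noun_groups(tagged_tokens, max_n=3):
    """Collect concepts as runs of up to max_n adjacent nouns.
    tagged_tokens: list of (token, pos_tag) pairs, tags assumed Penn-style."""
    groups, run = [], []
    for token, tag in tagged_tokens:
        if not tag.startswith("NN"):
            run = []          # a non-noun breaks the current run
            continue
        run.append(token)
        # every suffix of the run ending here (length <= max_n) is a concept
        for i in range(max(0, len(run) - max_n), len(run)):
            groups.append(" ".join(run[i:]))
    return groups

# e.g. [("query","NN"), ("expansion","NN"), ("using","VBG"), ("logs","NNS")]
# yields ["query", "query expansion", "expansion", "logs"]
```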

6. Local Context Analysis
• Proposed by Xu and Croft in the paper Query Expansion Using Local and Global Document Analysis (1996)
• Combines global and local analysis as follows:
  • Considers passages rather than entire documents, since documents may be very long and may contain multiple concepts that are not related
  • A standard IR system retrieves a certain number of top-ranked passages
  • Each concept in the top-ranked passages is ranked by similarity to the query using a form of tf-idf ranking (sketched below)
  • A certain number of the top-ranked concepts are used to expand the query
• Uses context (in the form of passages) and concepts, as in global analysis
• Uses top-ranked passages for query expansion, as in local analysis
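
A rough sketch of that tf-idf-style concept scoring over the top-ranked passages. This simplifies the formula in Xu and Croft's paper (log(1 + af) is used so zero co-occurrence is handled, and the idf normalizations are approximated); all names are illustrative.

```python
import math

def lca_score(concept, query_terms, passages, n_docs, df, delta=0.1):
    """Score a candidate concept by how strongly it co-occurs with *all*
    query terms across the top-n passages, damped by idf so concepts
    that are frequent corpus-wide are penalized.
    passages: list of {term: frequency} dicts for the top-ranked passages.
    df: corpus document frequencies used for idf."""
    n = len(passages)
    idf_c = math.log(n_docs / max(1, df.get(concept, 1)))
    score = 1.0
    for t in query_terms:
        # co-occurrence degree of the concept with query term t
        af = sum(p.get(t, 0) * p.get(concept, 0) for p in passages)
        idf_t = math.log(n_docs / max(1, df.get(t, 1)))
        score *= (delta + math.log(1 + af) * idf_c / math.log(n + 1)) ** idf_t
    return score
```

Every concept found in the passages would be scored this way, and a fixed number of the top scorers added to the query.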

7. LCA Results
• Experiments indicate that in most cases LCA performs better than both global and local analysis
  • Improvements using global analysis were in the range of 5%-10%
  • Improvements using local analysis were in the range of 15%-20%
  • Improvements using LCA were in the range of 20%-25%
• Certain concepts may be filtered out by global analysis because they occur very frequently in the corpus, even though they do not occur frequently in the local document set
• Surprisingly, very little overlap was found between the expansion terms chosen by local analysis and by LCA

8. Expansion Using Query Logs
• Proposed by Cui, Wen, Nie and Ma in the paper Probabilistic Query Expansion Using Query Logs (2002)
• Query logs record which queries were posed and which documents from the result set were selected (i.e., clicked) by the user
• The query-log (QL) approach is based on the assumptions that:
  • Documents selected by the user were relevant to the query
  • Terms in these documents are strongly related to the terms in the query
• Computes correlations between query terms and document terms
  • Correlations are calculated for concepts as well as individual words
• Uses these correlations to expand queries (a sketch follows below)
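
In that spirit, a sketch of deriving term correlations from a click-through log, simplified from the paper's probabilistic formulation (which combines per-term correlations with a product-based cohesion weight; the summation below is a looser stand-in). The data structures are assumptions.

```python
from collections import defaultdict

def term_correlations(click_log, doc_terms):
    """Estimate P(document term | query term) from click-through data:
    terms of a clicked document are credited to every term of the query
    that led to the click. Can be run entirely offline.
    click_log:  iterable of (query_terms, clicked_doc_id) pairs.
    doc_terms:  doc_id -> {term: normalized weight within the document}."""
    weight = defaultdict(lambda: defaultdict(float))
    seen = defaultdict(float)
    for q_terms, doc_id in click_log:
        for wq in q_terms:
            seen[wq] += 1.0
            for wd, w in doc_terms[doc_id].items():
                weight[wq][wd] += w
    return {wq: {wd: v / seen[wq] for wd, v in wds.items()}
            for wq, wds in weight.items()}

def expand_query(query_terms, correlations, k=5):
    """Pick the k expansion terms most correlated with the query."""
    scores = defaultdict(float)
    for wq in query_terms:
        for wd, p in correlations.get(wq, {}).items():
            if wd not in query_terms:
                scores[wd] += p
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because the log only grows, the correlation tables can be recomputed offline and swapped in, which is what keeps the approach cheap at query time (slide 9).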

9. Expansion Using Query Logs
• Exhibits the following important properties:
  • Term correlations can be computed offline
  • Term correlations reflect the preferences of most users
  • Term correlations evolve as query logs accumulate
• Results from the experiments conducted by the authors indicate that:
  • Improvements using LCA were in the range of 20%-25%, consistent with the findings of Xu and Croft
  • Improvements using QL were in the range of 70%-75%

10. Similarities
• Both attempt to improve retrieval through query expansion
• Both are variations of relevance feedback and can be fully automated
  • LCA automates relevance feedback by not taking user selection into account at all; it treats all retrieved passages as relevant
  • QL automates relevance feedback by using historical user selections to determine relevant documents
• Both approaches make use of noun phrases as concepts
  • QL uses word correlations as well as concept correlations
  • LCA uses only concept correlations

11. Differences
• QL uses information outside the corpus for query expansion
• QL is adaptive
  • If the corpus remains static, LCA will yield the same results, while QL will adapt to yield different results
  • QL captures changes in correlations over time more precisely
• QL requires minimal computation at run time and does not require access to documents
  • LCA requires both computation and access to documents at run time
• The QL approach does not use context for query expansion

12. Critique
• The results indicate that QL compares favorably to LCA, but:
  • The comparisons appear to be based on queries for which large amounts of query-log data were available
  • LCA will perform better than QL when adequate query logs are not available
• In QL, more commonly used correlations may cannibalize less commonly used ones
  • This may not be material for commercial IR systems but may matter from a pure IR point of view
• The assumption that selected documents are relevant may not always hold
  • A distinction should probably be made between documents merely clicked and documents selected and actually read

13. Conclusions
• By making use of information outside the corpus, QL does improve over traditional automated query-expansion approaches
• QL appears to be easier to understand as well as easier to implement
• Just as LCA borrows concepts from global analysis to extend local feedback, it may be worthwhile to extend QL by borrowing the following from global analysis:
  • Use context in calculating word and concept correlations
  • Instead of relying entirely on the availability of adequate query logs, use global analysis to obtain initial concept correlations, then refine these correlations as more query logs become available (an illustrative blend is sketched below)
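
As an illustration only of that last suggestion (this scheme appears in neither paper), correlations from global analysis could serve as a prior that is gradually discounted as click evidence accumulates:

```python
def blended_correlation(prior, from_logs, n_clicks, k=50.0):
    """Fall back to a global-analysis prior when few query-log events
    exist, shifting toward the log-derived estimate as evidence grows.
    k (illustrative) controls how quickly the prior is discounted."""
    lam = n_clicks / (n_clicks + k)   # 0.0 with no logs, -> 1.0 with many
    return lam * from_logs + (1.0 - lam) * prior
```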
