This document explores the evolution of information retrieval (IR) systems, focusing on keyword-based approaches, their limitations, and the emergence of conceptual structures in modern topical IR. It discusses research at Fondazione Ugo Bordoni, addressing the context, concepts, and tasks requiring knowledge structures. The effectiveness of vector-based models, term weighting methods like tf-idf and BM25, and innovative frameworks are outlined. Challenges of past conceptual IR approaches are reviewed, with a focus on future directions toward enhancing user experience and integrating diverse search strategies.
Conceptual structures in modern information retrieval Claudio Carpineto Fondazione Ugo Bordoni Roma carpinet@fub.it
Overview • Keyword-based IR and early conceptual approaches • Context and concepts in modern topical IR • Emerging IR tasks requiring knowledge structures • Research at FUB • Conclusions
Vector-based IR [Diagram: documents are indexed as vectors of weighted keywords; the query is indexed as a vector of weighted keywords; matching the two yields the retrieved documents]
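The matching step in the diagram can be sketched as a cosine comparison between sparse keyword vectors. This is a minimal illustration, not the system described in the slides: the function names `cosine` and `rank`, the dictionary representation, and the toy weights are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_u * norm_v)

def rank(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted by decreasing similarity to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: the weights are made up for illustration.
docs = {
    "d1": {"bank": 1.0, "finance": 2.0},
    "d2": {"bank": 1.0, "river": 2.0},
}
query = {"finance": 1.0}
ranking = rank(query, docs)
```

A document that shares no keyword with the query scores zero here, which is exactly the vocabulary problem raised later in the deck.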
Term weighting • tf.idf and vector space model (Salton) very popular in the 70's and 80's • BM25 (Robertson) has been the state of the art in the 90's • Several recent term-weighting functions based on statistical language modeling (Ponte, Lafferty) • A new weighting framework based on deviation from randomness + information gain (FUB + UG)
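To make the first two weighting schemes concrete, here is a hedged sketch of a raw tf.idf score and a BM25 score over tokenized documents. The smoothed idf variant, the parameter defaults `k1=1.2` and `b=0.75`, and the toy collection are assumptions chosen for illustration; the slides do not specify which variants were used.

```python
import math
from collections import Counter

def idf(term, docs):
    """Smoothed inverse document frequency (one common variant, assumed here)."""
    n = len(docs)
    df = sum(1 for d in docs if term in d)
    return math.log(1.0 + (n - df + 0.5) / (df + 0.5))

def tfidf_score(query, doc, docs):
    """Sum of raw term frequency times idf over the query terms."""
    tf = Counter(doc)
    return sum(tf[t] * idf(t, docs) for t in query)

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """BM25: term frequency saturates and is normalized by document length."""
    tf = Counter(doc)
    avgdl = sum(len(d) for d in docs) / len(docs)
    score = 0.0
    for t in query:
        f = tf[t]
        if f == 0:
            continue
        length_norm = k1 * (1.0 - b + b * len(doc) / avgdl)
        score += idf(t, docs) * f * (k1 + 1.0) / (f + length_norm)
    return score

# Hypothetical toy collection: each document is a list of tokens.
collection = [
    ["bank", "finance", "credit"],
    ["bank", "river", "waters"],
    ["finance", "finance", "account"],
]
q = ["finance"]
tfidf = [tfidf_score(q, d, collection) for d in collection]
bm25 = [bm25_score(q, d, collection) for d in collection]
```

In this toy collection, doubling the frequency of "finance" doubles the tf.idf score but raises the BM25 score by less than a factor of two, which is the saturation effect that distinguishes the two schemes.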
Inherent limitations of keyword-based IR • Vocabulary problem • Relations are ignored
Early approaches to conceptual IR • n-grams (Salton 1975, Maarek 1989) • parse trees (Dillon 1983, Metzler 1989) • case relations (Fillmore 1968, Somers 1987) • conceptual graphs (Dick 1991)
Why early conceptual IR was not successful • No best representation scheme • Manual coding too costly • Automated coding too hard • Training required both for the indexer and the user • Effectiveness not clearly demonstrated • Retrieval task often not appropriate
Overview • Vector-based IR and early conceptual approaches • Context and concepts in modern topical IR • Emerging IR tasks requiring knowledge structures • Research at FUB • Conclusions
Evolution of topical IR • Very short queries • Heterogeneous collections • Unreliable sources • Interactive sessions
Model of modern topical IR [Diagram: docs, query, and context; indexing; ranking; visualization; interaction; use]
Ranking based on interdocument similarity • Cluster hypothesis (van Rijsbergen 1978) • Approaches: - Matching the query against document clusters (Willett 1988) - Matching the query against transformed document representations (GVSM, Wong 1987; LSI, Deerwester 1990) - Computing the conceptual distance between query and documents (Order-theoretical ranking, Carpineto 2000)
Order-theoretical ranking [Figure: lattice diagram with nodes for the query and documents D1 to D7, labelled with the terms KBS, NNS, BANK, FINANCE, CREDIT, ACCOUNT, RIVER and WATERS, and with numeric labels from 0 to 4]
Performance of order-theoretical ranking • Better than hierarchic clustering and comparable to best matching on the whole collection • Markedly better than both hierarchic clustering and best matching on non-matching relevant documents • Order-theoretical ranking does not scale up well but it is synergistic with best matching document ranking
Overview • Vector-based IR and early conceptual approaches • Context and concepts in modern topical IR • Emerging IR tasks requiring knowledge structures • Research at FUB • Conclusions
Question Answering Task: Closed-class questions in unrestricted domains with no guarantee of answer and result possibly scattered over multiple documents
Question Answering • Approach: • Recognize type of queries • Retrieve relevant documents • Find sought entities near question words • Fall back to best-matching passage retrieval in case of failure
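The four steps above can be sketched as a small pipeline. Everything here is a deliberately crude assumption for illustration: the `ANSWER_PATTERNS` table, the stop-word list, word overlap as the retrieval score, and treating "near question words" as "in a passage that shares words with the question". It is not the system described in the slides.

```python
import re

# Hypothetical mapping from question word to a regex for the expected answer type.
ANSWER_PATTERNS = {
    "when": re.compile(r"\b\d{4}\b"),  # a four-digit year
}
STOP_WORDS = {"when", "was", "is", "the", "did", "a", "of", "in"}

def question_type(question):
    """Step 1: recognize the query type from its first word (None if unknown)."""
    words = question.lower().split()
    first = words[0] if words else ""
    return first if first in ANSWER_PATTERNS else None

def content_words(question):
    return [w for w in re.findall(r"[a-z]+", question.lower()) if w not in STOP_WORDS]

def overlap(passage, words):
    tokens = set(re.findall(r"[a-z]+", passage.lower()))
    return sum(1 for w in words if w in tokens)

def answer(question, passages):
    words = content_words(question)
    # Step 2: retrieve, here by ranking passages on word overlap with the question.
    ranked = sorted(passages, key=lambda p: overlap(p, words), reverse=True)
    qtype = question_type(question)
    if qtype is not None:
        # Step 3: look for an entity of the sought type in passages sharing question words.
        for p in ranked:
            if overlap(p, words) == 0:
                break
            match = ANSWER_PATTERNS[qtype].search(p)
            if match:
                return ("entity", match.group(0))
    # Step 4: fall back to the best-matching passage.
    return ("passage", ranked[0] if ranked else None)

# Hypothetical toy passages, invented for the example.
toy_passages = ["The river has two banks.", "The tower was built in 1850."]
typed = answer("When was the tower built?", toy_passages)
fallback = answer("Why does the river flood?", toy_passages)
```

The return value is tagged so a caller can tell an extracted entity from a fallback passage, reflecting the task's lack of any guarantee that an answer exists.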
Web Information Retrieval • Current tasks: • named-entity finding task • topic distillation task • Approach: • Use of multiple methods • Combination of results via interpolation and normalization schemes
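One simple way to combine results from multiple methods is to normalize each method's scores to a common range and then interpolate linearly. The sketch below assumes min-max normalization and a single mixing weight `lam`; the slides do not say which schemes were used, and the two score lists (`content_scores`, `link_scores`) are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def interpolate(scores_a, scores_b, lam=0.5):
    """Linear interpolation of two normalized score lists; missing docs count as 0."""
    a = min_max_normalize(scores_a)
    b = min_max_normalize(scores_b)
    fused = {d: lam * a.get(d, 0.0) + (1.0 - lam) * b.get(d, 0.0)
             for d in set(a) | set(b)}
    # Sort by decreasing fused score, breaking ties by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical scores from two methods on different scales.
content_scores = {"p1": 12.0, "p2": 8.0, "p3": 4.0}
link_scores = {"p2": 0.9, "p3": 0.1, "p4": 0.5}
fused_ranking = interpolate(content_scores, link_scores, lam=0.5)
```

Normalizing first matters because the raw scores live on different scales; without it, the method with the larger numeric range would dominate the interpolation regardless of `lam`.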
XML document retrieval Goal: Use document structure to improve precision and recall of unstructured queries “concerts this weekend at Sofia under 20 euros” • Approaches: • Automatic inference of query structure • Semi-automatic query annotation • Hybrid query languages
Overview • Vector-based IR and early conceptual approaches • Context and concepts in modern topical IR • Emerging IR tasks requiring knowledge structures • Research at FUB • Conclusions
Recommender systems “Related keyword” feature versus Context-dependent query reformulation
Combining text retrieval and text mining with concept lattices Goal: Integration of multiple search strategies (querying, browsing, thesaurus climbing, bounding) into a single Web interface
Conclusions The use of conceptual structures surfaces in traditional topic relevance retrieval and is at the heart of many non-topical retrieval tasks. Towards conceptual search: • Understand term meaning • Adapt to the user • Can translate between applications • Explainable • Capable of filtering and summarization