This resource delves into the fundamentals of Classical Information Retrieval (IR), focusing on textual information retrieval systems developed over decades, especially in the context of the World Wide Web. It covers key topics such as document representation, user queries, ranking algorithms, and the interaction between users and IR systems. The text highlights the significance of IR models, evaluation techniques, and system architecture while providing insights into effective searching, routing, and browsing within digital libraries and multimedia environments.
Internet Resources Discovery (IRD) Classic Information Retrieval (IR) T.Sharon - A.Frank
Classical IR
• Deals with textual information retrieval.
• Has existed for a few decades, mostly for text repositories.
• Was pushed strongly by the development of WWW search engines.
IR Topics and their Relationships
[Diagram labels: Applications for IR; Human-Computer Interaction for IR; Textual IR; Retrieved Models & Evaluation; Bibliographic Systems; Interfaces & Visualization; Improvements on Retrieval; The Web; Efficient Processing; Digital Libraries; Multimedia Modeling & Searching]
IR Vocabulary: http://www.cs.jhu.edu/~weiss/glossary.html
Basic Architecture of an IR System
[Diagram labels: Documents; Queries; Document Representation; Query Representation; Comparison]
Interaction of the User with the IR System
[Diagram labels: Retrieval; Database; Browsing]
What is a Query?
• Input:
  • query terms/words that should appear in the text
  • possibly conditions between them
• Output:
  • relevant documents
  • possibly ranked
Information Retrieval Systems
• A generic information retrieval system selects and returns to the user the desired documents from a large set of documents, in accordance with criteria specified by the user.
• Retrieval Functions
  • Document search (ad hoc): the selection of documents from an existing collection of documents.
  • Document routing (filtering): the dissemination of incoming documents to appropriate users on the basis of user interest profiles.
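The difference between the two retrieval functions can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular system: the function names `ad_hoc_search` and `route`, the whitespace tokenizer, the matching rules (all query terms for search, any shared term for routing), and the sample data are all assumptions made for the example.

```python
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase a text and split it on whitespace into a set of terms."""
    return set(text.lower().split())


def ad_hoc_search(query: str, collection: Dict[str, str]) -> List[str]:
    """Ad hoc search: one new query is run against an existing collection.

    Returns the ids of documents that contain every query term.
    """
    terms = tokenize(query)
    return sorted(doc_id for doc_id, text in collection.items()
                  if terms <= tokenize(text))


def route(document: str, profiles: Dict[str, Set[str]]) -> List[str]:
    """Routing (filtering): one incoming document is matched against
    standing user interest profiles.

    Returns the users whose profile shares at least one term with the document.
    """
    terms = tokenize(document)
    return sorted(user for user, interests in profiles.items()
                  if interests & terms)


# Hypothetical data, for illustration only.
collection = {
    "d1": "classical information retrieval for text",
    "d2": "web search engines and ranking",
}
profiles = {"alice": {"ranking", "web"}, "bob": {"multimedia"}}

search_hits = ad_hoc_search("information retrieval", collection)
routed_to = route("new ranking algorithms for search", profiles)
```

In search the collection is fixed and the query changes; in routing the profiles are fixed and the documents arrive over time. With the sample data, `search_hits` contains only `d1` and `routed_to` contains only `alice`.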
The Process of Retrieving Information
[Diagram labels: User Interface; User need; Text; Text Operations; Logical view; Query Operations; Indexing; DB Manager Module; User feedback; Inverted file; Searching; Index; Retrieved docs; Text Databases; Ranking]
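The slide names indexing, an inverted file, and searching as stages of the process. A minimal sketch of those three pieces might look as follows, assuming whitespace tokenization as the only text operation and a conjunctive (all terms) search; the function names and sample documents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_file(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Indexing: map each term to the set of document ids in which it occurs."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Searching: return ids of documents that contain all query terms."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start from the postings of the first term and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Hypothetical documents, for illustration only.
docs = {
    1: "text operations produce a logical view",
    2: "the index is an inverted file",
    3: "ranking of retrieved text",
}
index = build_inverted_file(docs)
text_hits = search(index, "text")
file_hits = search(index, "inverted file")
```

The point of the inverted file is that a query touches only the postings of its own terms rather than scanning every document. Ranking of the retrieved documents is a separate step and is not shown here.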
IR Ranking
• Ranking algorithms
  • The central problem for IR systems is predicting which documents are relevant and which are not.
  • Ranking algorithms are at the core of IR systems.
  • A ranking algorithm operates on basic premises regarding document relevance, and these premises differ from one IR model to another.
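As one concrete instance of a ranking algorithm, the sketch below scores documents by cosine similarity between term-frequency vectors, which is the premise behind the vector model. It is a simplified illustration under stated assumptions: raw term counts are used with no idf weighting, tokenization is a whitespace split, and the names `vectorize`, `cosine`, `rank` and the sample documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def vectorize(text: str) -> Counter:
    """Represent a text as a bag of words with raw term frequencies."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs with score > 0, best score first."""
    q = vectorize(query)
    scored = [(doc_id, cosine(q, vectorize(text))) for doc_id, text in docs.items()]
    scored = [(doc_id, score) for doc_id, score in scored if score > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical documents, for illustration only.
docs = {
    "d1": "ranking algorithms rank documents",
    "d2": "documents about cooking",
    "d3": "ranking of web pages",
    "d4": "multimedia browsing",
}
ranking = rank("ranking documents", docs)
```

Here `d1` matches both query terms and comes first, `d2` and `d3` each match one term, and `d4` shares no term with the query and is left out. A different model (Boolean, probabilistic) would start from different premises and could order the same documents differently.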
A Taxonomy of IR Models
• User Task: Retrieval (Search, Routing)
  • Classic Models: Boolean, Vector, Probabilistic
    • Set Theoretic: Fuzzy, Extended Boolean
    • Algebraic: Generalized Vector, Latent Semantic Index, Neural Networks
    • Probabilistic: Inference Network, Belief Network
  • Structured Models: Non-Overlapping Lists, Proximal Nodes
• User Task: Browsing
  • Browsing: Flat, Structure Guided, Hypertext
Retrieval Models Associations
[Table axes: Logical View of Documents; User Task]
Query Language (1)
• Keyword-based Querying
  • Single-word Queries
  • Context Queries
    • Phrase
    • Proximity
  • Boolean Queries
  • Natural Language
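The context and Boolean query types listed above can be illustrated with a small sketch. The helper names, the keyword parameters `all_of`, `any_of`, `none_of`, and the choice to measure proximity in word positions are assumptions made for this example, not features of a specific query language.

```python
from typing import Iterable, List


def tokens(text: str) -> List[str]:
    """Lowercase a text and split it on whitespace, keeping word order."""
    return text.lower().split()


def phrase_match(text: str, phrase: str) -> bool:
    """Phrase query: the words of `phrase` occur consecutively in `text`."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target
                         for i in range(len(words) - n + 1))


def proximity_match(text: str, first: str, second: str, max_distance: int) -> bool:
    """Proximity query: both words occur within `max_distance` positions
    of each other, in either order."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == first.lower()]
    pos_b = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


def boolean_match(text: str,
                  all_of: Iterable[str] = (),
                  any_of: Iterable[str] = (),
                  none_of: Iterable[str] = ()) -> bool:
    """Boolean query: AND over `all_of`, OR over `any_of`, NOT over `none_of`."""
    words = set(tokens(text))
    any_terms = list(any_of)
    return (all(t in words for t in all_of)
            and (not any_terms or any(t in words for t in any_terms))
            and not any(t in words for t in none_of))


# Hypothetical text, for illustration only.
text = "classical information retrieval deals with textual information"
```

For this text, `phrase_match(text, "information retrieval")` holds while the reversed phrase does not, and `proximity_match(text, "classical", "retrieval", 2)` holds because the two words are two positions apart.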
Query Language (2)
• Pattern Matching
  • Words
  • Prefixes
  • Suffixes
  • Substrings
  • Ranges
  • Allowing errors
  • Regular expressions
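Each pattern type above can be tried against a small vocabulary. The sketch below is one possible reading of the list: "ranges" is taken to mean lexicographic ranges, and "allowing errors" is modeled with Levenshtein edit distance, both of which are assumptions for this example. The vocabulary and patterns are hypothetical.

```python
import re
from typing import List

# Hypothetical vocabulary, for illustration only.
VOCABULARY: List[str] = ["compute", "computer", "computing",
                         "retrieval", "retrieve", "survival"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions),
    computed row by row with the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


prefix_hits = [w for w in VOCABULARY if w.startswith("comput")]
suffix_hits = [w for w in VOCABULARY if w.endswith("val")]
substring_hits = [w for w in VOCABULARY if "triev" in w]
# Range: words lexicographically between the two bounds, inclusive.
range_hits = [w for w in VOCABULARY if "computer" <= w <= "retrieval"]
# Allowing errors: the misspelling "retreival" within two edits.
error_hits = [w for w in VOCABULARY if edit_distance(w, "retreival") <= 2]
# Regular expression: the whole word must match the pattern.
regex_hits = [w for w in VOCABULARY if re.fullmatch(r"comput(e|er)", w)]
```

Scanning the whole vocabulary is enough for an illustration; a real system would typically answer prefix and range patterns from a sorted or tree-structured vocabulary instead.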
Query Language (3)
• Structural Queries
  • Form-like fixed structures
  • Hypertext structure
  • Hierarchical structure
Structural Queries
[Figure: (a) form-like fixed structure, (b) hypertext structure, and (c) hierarchical structure]
Hierarchical Structure
[Figure: An example of a hierarchical structure: the page of a book, its schematic view, and a parsed query to retrieve the figure]
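The figure itself is not reproduced here, so the following is only a rough sketch of the idea: a document is modeled as a tree, and a structural query selects nodes by their position in that tree. The `Node` class, the node kinds, the sample page, and the query "figures inside a section whose title contains a given word" are all hypothetical and are not taken from the original figure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                      # e.g. "chapter", "section", "title", "figure"
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def descendants(node: Node) -> List[Node]:
    """All nodes strictly below `node`, in document order."""
    found: List[Node] = []
    for child in node.children:
        found.append(child)
        found.extend(descendants(child))
    return found


def figures_in_sections_titled(root: Node, word: str) -> List[Node]:
    """Return figures inside any section that has a title child containing `word`."""
    hits: List[Node] = []
    for node in [root] + descendants(root):
        if node.kind != "section":
            continue
        titled = any(c.kind == "title" and word.lower() in c.text.lower().split()
                     for c in node.children)
        if titled:
            hits.extend(d for d in descendants(node) if d.kind == "figure")
    return hits


# Hypothetical page of a book, for illustration only.
page = Node("chapter", children=[
    Node("section", children=[
        Node("title", "Introduction"),
        Node("figure", "Figure 1"),
    ]),
    Node("section", children=[
        Node("title", "Structural Queries"),
        Node("paragraph", "Some body text"),
        Node("figure", "Figure 2"),
    ]),
])
result = [n.text for n in figures_in_sections_titled(page, "structural")]
```

With this sample page the query returns only "Figure 2", since it is the one figure under a section whose title contains "structural". The query combines a content condition (the word in the title) with a structural one (the figure must be inside that section).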