This document discusses the implementation of the Vector Space Model (VSM) in Information Retrieval and how Term Association (TA) can be used in it. It works through the example query "microsoft corporation" and introduces Inverted Lists, which map each keyword to the documents that contain it rather than each document to its keywords. The union and intersection of the keyword lists are then used to retrieve matching documents efficiently. The slides also weigh the space cost of these lists, which grows with the number of non-zero entries in the document-term matrix, against retrieval efficiency, and suggest keeping each list sorted by document id and ordering keywords from the most to the least specific.
Implementation of Vector Space Model March 27, 2006
How TA Can Be Used in Vector Space Model?
• Let us consider a query with the keywords microsoft and corporation, q = (microsoft, corporation)
• Create a table for each keyword, e.g.:

  List_microsoft
  docid   Tf_microsoft * Idf_microsoft
  374     43.2
  289     42.1

  List_corporation
  docid   Tf_corporation * Idf_corporation
  175     23.5
  456     20.1

• These lists are called "Inverted Lists"
  - Space occupied = O(# of non-zero entries in the document-term matrix)
  - So it is not cheap in terms of space
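A minimal Python sketch of how such weighted inverted lists could be built and used to score q. The slides do not fix a weighting scheme, so this assumes tf is the raw term count and idf = log(N / df); the toy collection, the function names `build_inverted_lists` and `score`, and the unnormalized dot product (no cosine normalization) are all hypothetical choices for illustration.

```python
import math
from collections import Counter, defaultdict


def build_inverted_lists(docs):
    """Return {term: [(docid, tf * idf), ...]} with each list sorted by docid.

    Assumed weighting (not specified on the slides): tf is the raw count of
    the term in the document, idf = log(N / df).
    """
    n_docs = len(docs)
    # df[term] = number of documents that contain the term at least once.
    df = Counter()
    for text in docs.values():
        df.update(set(text.split()))

    lists = defaultdict(list)
    # Visiting docids in increasing order keeps every list sorted by docid.
    for docid in sorted(docs):
        for term, tf in Counter(docs[docid].split()).items():
            lists[term].append((docid, tf * math.log(n_docs / df[term])))
    return dict(lists)


def score(query_terms, lists):
    """Unnormalized dot product of a 0/1 query vector with tf*idf document vectors."""
    scores = defaultdict(float)
    for term in query_terms:
        # Only documents that appear on some query term's list are touched.
        for docid, weight in lists.get(term, []):
            scores[docid] += weight
    return dict(scores)


# Hypothetical toy collection, not the one behind the slides.
docs = {
    1: "microsoft corporation annual report",
    2: "microsoft windows release",
    3: "corporation tax law",
    4: "weather report",
}
lists = build_inverted_lists(docs)
# One entry per (term, document) pair with non-zero tf: 4 + 3 + 3 + 2 = 12.
n_entries = sum(len(entries) for entries in lists.values())
print(n_entries)
print(score(["microsoft", "corporation"], lists))
```

The entry count illustrates the space bound on the slide: storage grows with the number of non-zero (term, document) pairs, not with the full size of the document-term matrix.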
How TA Can Be Used in Vector Space Model?
• Inverted List
  - In the original database, the words are listed for each given document
  - In an Inverted List, the documents are listed for each given word; the mapping is reversed, which is why it is called an Inverted List
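The inversion described above can be sketched in a few lines of Python. The document ids, words, and the `invert` helper below are hypothetical; the sketch only shows how a forward mapping (document -> words) becomes an inverted one (word -> documents).

```python
from collections import defaultdict


def invert(forward):
    """Turn a forward index {docid: [words]} into an inverted index {word: [docids]}."""
    inverted = defaultdict(list)
    # Visiting documents in increasing docid order keeps every list sorted.
    for docid in sorted(forward):
        # set() ensures a word repeated within one document adds its docid once.
        for word in sorted(set(forward[docid])):
            inverted[word].append(docid)
    return dict(inverted)


# Hypothetical forward index: each document lists the words it contains.
forward = {
    3: ["corporation", "tax"],
    1: ["microsoft", "corporation"],
    2: ["microsoft", "windows"],
}
inverted = invert(forward)
print(inverted["microsoft"], inverted["corporation"])
```

Producing each list already sorted by docid is a deliberate choice here, since sorted lists are what make the union and intersection steps cheap to compute.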
How TA Can Be Used in Vector Space Model?
• Inverted List
  - Union of List_microsoft and List_corporation
  - Keep each list sorted by document id
  - Intersection of List_microsoft and List_corporation
  - Arrange keywords from the most specific to the least specific
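A minimal sketch of the union (documents containing any query keyword) and intersection (documents containing all of them) as linear-time merges over docid-sorted lists, which is why the lists are kept sorted by document id. It assumes "most specific keyword" means the keyword with the shortest list (fewest documents); the docids and the names `union`, `intersect`, and `intersect_all` are hypothetical.

```python
def union(a, b):
    """Merge two docid-sorted lists into their sorted union in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            out.append(a[i])
            i += 1
        elif a[i] > b[j]:
            out.append(b[j])
            j += 1
        else:  # same docid on both lists: emit it once
            out.append(a[i])
            i += 1
            j += 1
    # At most one of these tails is non-empty.
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def intersect(a, b):
    """Keep only docids present in both sorted lists, in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            out.append(a[i])
            i += 1
            j += 1
    return out


def intersect_all(lists):
    """Intersect several lists, starting from the shortest (assumed most specific)."""
    if not lists:
        return []
    ordered = sorted(lists, key=len)
    result = ordered[0]
    for lst in ordered[1:]:
        if not result:
            break  # empty already, so no later list can add anything
        result = intersect(result, lst)
    return result


# Hypothetical docid-sorted inverted lists.
microsoft = [2, 5, 9, 14]
corporation = [5, 7, 14, 20, 31]
print(union(microsoft, corporation))
print(intersect(microsoft, corporation))
print(intersect_all([corporation, microsoft, [14, 31]]))
```

Starting the intersection from the shortest list keeps every intermediate result no larger than that list and lets the loop stop early once it becomes empty.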