
Inf 722 Information Organisation



  1. Inf 722 Information Organisation Class notes: Information Retrieval Jagdish S. Gangolly Inf 722 Information Organisation (Fall 2007) (Gangolly)

  2. FOA Process • The FOA (Finding Out About) process • Asking a question (query formulation) • Constructing an answer (retrieval algorithms) • Assessing the answer (feedback on relevance)

  3. FOA Process • Query language • Natural or artificial • Vocabulary • Syntax: operators, arguments • Query expansion, specialization, disambiguation, relevance feedback
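
The "operators, arguments" view of query syntax can be illustrated with a minimal Boolean evaluator. This is a sketch, assuming a hypothetical toy inverted index (`INDEX`, `ALL_DOCS`) and a hypothetical nested-tuple query format such as ("AND", "law", ("NOT", "poisson")); neither comes from the slides.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical toy index: term -> set of ids of documents containing it.
INDEX: Dict[str, Set[int]] = {
    "zipf": {1, 3},
    "law": {1, 2, 3},
    "poisson": {2},
}
ALL_DOCS: Set[int] = {1, 2, 3, 4}

# Hypothetical query format: a bare term, or a tuple (operator, argument, ...).
Query = Union[str, Tuple]


def evaluate(query: Query) -> Set[int]:
    """Return the ids of documents matching a Boolean query."""
    if isinstance(query, str):
        return set(INDEX.get(query, set()))  # copy so callers cannot mutate INDEX
    op, *args = query
    results = [evaluate(arg) for arg in args]  # evaluate the arguments recursively
    if op == "AND":
        return set.intersection(*results)
    if op == "OR":
        return set.union(*results)
    if op == "NOT":
        return ALL_DOCS - results[0]
    raise ValueError(f"unknown operator: {op}")


print(sorted(evaluate(("AND", "law", ("NOT", "poisson")))))  # [1, 3]
```

Operators take other queries as arguments, so compound queries nest naturally; query expansion could be layered on top by rewriting a term into an OR of related terms.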

  4. FOA Process • Constructing the answer • Is the information need accurately translated into the query? • How can the answer be provided in a form suitable for the user? • Should the user be given background so that (s)he can verbalise the information need better? • How can both the query and the corpus be represented efficiently and effectively?

  5. FOA Process • Constructing the answer (contd.) • Generate a set of index terms that make the documents in the collection as distinguishable from one another as possible • Conflation algorithms • Removal of function/fluff/stop words (usually closed-class words) • Stripping suffixes (stemming) • Detection of equivalent/associated words
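
Two of the conflation steps above, stop-word removal and suffix stripping, can be sketched in a few lines. The stop list, suffix list and `min_stem` threshold are hypothetical, and this is far cruder than a real stemmer such as Porter's.

```python
from typing import List

# Hypothetical stop list and suffix list, for illustration only.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "are"}
SUFFIXES = ("ing", "ed", "es", "s")  # tried in this order, longest first


def strip_suffix(word: str, min_stem: int = 3) -> str:
    """Drop the first matching suffix, provided at least min_stem characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def conflate(text: str) -> List[str]:
    """Lowercase, split on whitespace, remove stop words, then strip suffixes."""
    tokens = text.lower().split()
    return [strip_suffix(t) for t in tokens if t not in STOP_WORDS]


print(conflate("The indexing of documents and indexed terms"))
# ['index', 'document', 'index', 'term']
```

"indexing" and "indexed" conflate to the same term, which is the purpose of the step; a rule this naive also errs, for example turning "this" into "thi".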

  6. FOA Process • Facets of documents: • Structure (DTD) • Format (CSS, XSL) • Content (XSD) • Unit of interest • Tagging of corpora • Content tagging, grammatical tagging

  7. FOA Process • Step 1: Selection of the corpus to build • Population from which the documents to be included are selected (domain, genre, ...) • Step 2: Selection of tagging, if necessary • Grammatical or other tagging schemes

  8. FOA Process • Step 3: Indexing • Index: $doc_i \rightarrow \{kw_j\}$ • Inverted index (Index$^{-1}$): $kw_j \rightarrow \{doc_i\}$ • Extracting lexical features: • Step a: Selection of tokens and separators • Step b: Stemming decisions on number, gender (for some languages), hyphenation, phrases, idioms, morphological features, … • Step c: Removal of stop words using a list
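
Steps a and c, together with the two index directions, can be sketched as follows. This assumes that tokens are lowercase runs of letters and digits and uses a tiny hypothetical stop list; stemming (step b) is left out to keep the example short.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

STOP_WORDS = {"the", "of", "a", "and", "in"}  # hypothetical stop list (step c)
TOKEN_RE = re.compile(r"[a-z0-9]+")           # step a: runs of letters/digits are tokens


def tokenize(text: str) -> List[str]:
    """Lowercase, split into tokens, and drop stop words (no stemming)."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOP_WORDS]


def build_indexes(docs: Dict[str, str]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (index, inverted index): doc -> {terms} and term -> {docs}."""
    index: Dict[str, Set[str]] = {}
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        terms = set(tokenize(text))
        index[doc_id] = terms
        for term in terms:
            inverted[term].add(doc_id)
    return index, dict(inverted)


docs = {"d1": "The law of least effort", "d2": "Zipf's law and word rank"}
index, inverted = build_indexes(docs)
print(sorted(index["d1"]))      # ['effort', 'law', 'least']
print(sorted(inverted["law"]))  # ['d1', 'd2']
```

The separator choice in step a matters: with this pattern the apostrophe in "Zipf's" is a separator, so the text yields the two tokens "zipf" and "s".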

  9. FOA Process • Use of Zipf’s Law in indexing

  10. FOA Process • Zipf’s Law
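
Zipf's Law states that a word's frequency is roughly inversely proportional to its frequency rank, f(r) ≈ C / r, so rank × frequency stays roughly constant. The sketch below tabulates rank, frequency and their product; the input is synthetic data built to follow the law exactly, whereas real corpora fit it only approximately.

```python
from collections import Counter
from typing import List, Tuple


def rank_frequency(tokens: List[str]) -> List[Tuple[int, str, int, int]]:
    """Return (rank, word, frequency, rank * frequency), most frequent word first."""
    ranked = Counter(tokens).most_common()
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]


# Synthetic tokens whose frequencies are 60 / r for ranks r = 1..5.
tokens = ["w1"] * 60 + ["w2"] * 30 + ["w3"] * 20 + ["w4"] * 15 + ["w5"] * 12
for row in rank_frequency(tokens):
    print(row)  # the last field (rank * frequency) is 60 on every row
```

On a real corpus the same table makes departures from the law visible, typically at the very top and the long tail of the ranking.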

  11. FOA Process • Explanations of Zipf’s Law • Zipf: principle of least effort • Mandelbrot: a more general version of Zipf’s Law, and its similarity to Cantor dust (fractals)
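
Written out, Zipf's Law and Mandelbrot's generalization are commonly stated as below, where r is the rank, f(r) the frequency, and C, P, ρ and B are constants fitted to the corpus. Setting ρ = 0 and B = 1 recovers Zipf's form. The parameter names follow common textbook usage and are not taken from the slides.

```latex
f(r) = \frac{C}{r} \qquad \text{(Zipf)}

f(r) = \frac{P}{(r + \rho)^{B}} \qquad \text{(Mandelbrot)}
```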

  12. FOA Process • Word occurrences as a Poisson process, and the detection of stop words
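
One way to operationalise the Poisson idea, assumed here rather than taken from the slides, is Church and Gale's residual IDF. A Poisson model with rate λ = cf / N (collection frequency over number of documents) predicts that a word appears in a fraction 1 - e^(-λ) of the documents. Function words, scattered at random, roughly match that prediction; content words cluster in fewer documents, so their observed IDF exceeds the predicted one. The toy collection is hypothetical.

```python
import math
from typing import List


def residual_idf(word: str, docs: List[List[str]]) -> float:
    """Observed IDF minus the IDF predicted by a Poisson model (log base 2)."""
    n_docs = len(docs)
    cf = sum(doc.count(word) for doc in docs)   # collection frequency
    df = sum(1 for doc in docs if word in doc)  # document frequency
    if df == 0:
        raise ValueError(f"{word!r} does not occur in the collection")
    lam = cf / n_docs                           # Poisson rate per document
    observed_idf = math.log2(n_docs / df)
    predicted_idf = -math.log2(1 - math.exp(-lam))
    return observed_idf - predicted_idf


# Toy collection: "the" is spread evenly, "zipf" is concentrated in one document.
docs = [
    ["the", "zipf", "zipf", "zipf", "zipf"],
    ["the", "law"],
    ["the", "rank"],
    ["the", "word"],
]
print(round(residual_idf("the", docs), 3))   # -0.662
print(round(residual_idf("zipf", docs), 3))  # 1.338
```

Both words occur four times in total, yet only the clustered one earns a high residual; a low or negative residual marks a stop-word candidate.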

  13. FOA Process • Resolving power of words in discriminating between documents • Relationship between word frequency and word significance (for non-function words), i.e., the more often a content word is used, the more likely it is to signal something important • To be index terms, words must help discriminate between documents
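
Luhn's classic treatment of resolving power keeps words between an upper and a lower frequency cutoff, discarding both very common words (which appear everywhere) and very rare ones (which say little about a document's topic). A minimal sketch; the cutoff values are purely hypothetical and would in practice be tuned per corpus.

```python
from collections import Counter
from typing import List, Set


def luhn_band(tokens: List[str], lower: int, upper: int) -> Set[str]:
    """Keep words whose frequency f satisfies lower <= f <= upper."""
    counts = Counter(tokens)
    return {word for word, freq in counts.items() if lower <= freq <= upper}


tokens = ["the"] * 10 + ["index"] * 3 + ["zipf"] * 2 + ["cantor"]
print(sorted(luhn_band(tokens, lower=2, upper=5)))  # ['index', 'zipf']
```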

  14. FOA Process • Precision v. Recall
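
Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that are retrieved. A small sketch using sets of hypothetical document identifiers:

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall); each is 0.0 when its denominator is empty."""
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


print(precision_recall({"d1", "d2", "d3", "d4"}, {"d2", "d4", "d7"}))
# (0.5, 0.6666666666666666)
```

Retrieving more documents can only keep or raise recall, but usually lowers precision, which is the trade-off the slide refers to.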

  15. FOA Process • Specificity v. Exhaustivity • An index is specific if it reflects the information needs of the users • An index is exhaustive if it reflects all topics covered by the documents • There is a tension between the two

  16. FOA Process • Word frequency: the number of times that a word is used in a document • Document frequency: the number of documents in the corpus in which a word is used; inverse document frequency decreases as document frequency grows, commonly log(N / df) for a corpus of N documents • Robertson-Sparck Jones weighting
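
The quantities on this slide can be sketched as below. The idf uses the common natural-log form log(N / df); other variants exist. The Robertson-Sparck Jones weight is written in its usual form with 0.5 corrections, where N is the collection size, n the number of documents containing the term, R the number of known relevant documents and r the number of relevant documents containing the term. With no relevance information (R = r = 0) it reduces to an idf-like weight, log((N - n + 0.5) / (n + 0.5)). The toy documents are hypothetical.

```python
import math
from typing import List


def tf(word: str, doc: List[str]) -> int:
    """Word (term) frequency: occurrences of word in one document."""
    return doc.count(word)


def df(word: str, docs: List[List[str]]) -> int:
    """Document frequency: number of documents containing word."""
    return sum(1 for doc in docs if word in doc)


def idf(word: str, docs: List[List[str]]) -> float:
    """Inverse document frequency, log(N / df); 0.0 for unseen words."""
    n = df(word, docs)
    return math.log(len(docs) / n) if n else 0.0


def rsj_weight(N: int, n: int, R: int, r: int) -> float:
    """Robertson-Sparck Jones relevance weight with 0.5 corrections."""
    return math.log(((r + 0.5) / (R - r + 0.5)) /
                    ((n - r + 0.5) / (N - n - R + r + 0.5)))


docs = [["zipf", "law", "zipf"], ["poisson", "law"], ["rank", "word"]]
print(tf("zipf", docs[0]) * idf("zipf", docs))  # tf-idf of "zipf" in the first document
print(rsj_weight(N=1000, n=10, R=0, r=0))        # equals log(990.5 / 10.5)
```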

  17. Vector Space Model
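
In the vector space model, documents and queries are represented as vectors of term weights, and documents are ranked by their similarity to the query. A minimal sketch, assuming tf-idf weights and cosine similarity (common choices, not necessarily the ones used in the course) and a hypothetical three-document corpus:

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def idf_table(docs: List[List[str]]) -> Dict[str, float]:
    """Map each term in the corpus to log(N / df)."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in doc_freq.items()}


def vectorize(tokens: List[str], idf: Dict[str, float]) -> Vector:
    """tf-idf vector; terms absent from the corpus are ignored."""
    counts = Counter(tokens)
    return {t: f * idf[t] for t, f in counts.items() if t in idf}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [["zipf", "law", "rank"], ["poisson", "stop", "word"], ["zipf", "mandelbrot", "fractal"]]
idf = idf_table(docs)
doc_vectors = [vectorize(d, idf) for d in docs]
query = vectorize(["zipf", "rank"], idf)
ranking = sorted(range(len(docs)), key=lambda i: cosine(query, doc_vectors[i]), reverse=True)
print(ranking)  # [0, 2, 1]
```

Document 0 shares both query terms and ranks first; document 2 shares only "zipf", which appears in two of the three documents and so carries a low idf weight.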
