Simple, Proven Approaches To Text Retrieval

S. E. Robertson & K. Sparck Jones
Presenters: Tuncer Turhan, Yakup Korkmaz, Ömer Köksal

Term matching / weighting
• Terms and matching
• Sources for term weighting
  • Collection frequency
    n = the number of documents term t(i) occurs in
    N = the number of documents in the collection
    CFW (i) = log N - log n
  • Term frequency
    TF (i,j) = the number of occurrences of term t(i) in document d(j)
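The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy collection, the function names, and the use of the natural logarithm are all assumptions (the slides do not fix a log base).

```python
import math
from collections import Counter


def collection_frequency_weight(n: int, N: int) -> float:
    """CFW(i) = log N - log n. The natural log is an assumption here."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return math.log(N) - math.log(n)


def term_frequencies(tokens: list[str]) -> Counter:
    """TF(i, j) for every term in one document, as a term -> count mapping."""
    return Counter(tokens)


# Hypothetical toy collection of already-tokenized documents.
docs = [
    ["text", "retrieval", "text"],
    ["term", "weighting"],
    ["text", "term"],
    ["retrieval"],
]

N = len(docs)
n_text = sum(1 for d in docs if "text" in d)        # documents containing "text"
cfw_text = collection_frequency_weight(n_text, N)   # log 4 - log 2
tf_text_doc0 = term_frequencies(docs[0])["text"]    # occurrences in the first document
```

A term that occurs in every document gets CFW = 0, so it contributes nothing to matching, while rarer terms get larger weights.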

Term matching / weighting
• Sources for term weighting (Continued)
  • Document length
    DL (j) = the total of term occurrences in document d(j)
    NDL (j) = (DL (j)) / (Average DL for all documents)
• Combining the evidence
  CW (i,j) = [ CFW (i) * TF (i,j) * (K1+1) ] / [ K1 * ( (1-b) + (b * (NDL (j)) ) ) + TF (i,j) ]
  K1 and b are tuning constants.
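As a rough sketch, the combined weight can be written directly from the formula above. The function names are hypothetical, and the defaults K1 = 2.0 and b = 0.75 are illustrative choices only; the slides give no values for the tuning constants.

```python
def normalized_doc_length(dl: float, avg_dl: float) -> float:
    """NDL(j) = DL(j) / average DL over all documents."""
    return dl / avg_dl


def combined_weight(cfw: float, tf: float, ndl: float,
                    k1: float = 2.0, b: float = 0.75) -> float:
    """CW(i, j) = CFW * TF * (K1 + 1) / (K1 * ((1 - b) + b * NDL) + TF).

    k1 and b are the tuning constants; the defaults are assumptions.
    """
    return cfw * tf * (k1 + 1) / (k1 * ((1 - b) + b * ndl) + tf)


# Same term, same TF, in a short and a long document (hypothetical lengths).
avg_dl = 100.0
short_doc = combined_weight(1.0, 2, normalized_doc_length(50, avg_dl))
long_doc = combined_weight(1.0, 2, normalized_doc_length(200, avg_dl))
```

With b > 0, the same term frequency counts for less in a longer-than-average document, so `short_doc` comes out larger than `long_doc`. Setting b = 0 switches length normalization off entirely.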

Iterative searching
• Relevance weights
  r = the number of known relevant documents term t(i) occurs in
  R = the number of known relevant documents for a request
  RW (i) = log [ ( (r+0.5)(N-n-R+r+0.5) ) / ( (n-r+0.5)(R-r+0.5) ) ]
• Query expansion
  OW (i) = r * RW (i)
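The relevance weight and offer weight above can be sketched as follows. The function names, the natural log, and the candidate-term counts are assumptions made for illustration; ranking candidate expansion terms by OW is one plausible way to use the offer weight.

```python
import math


def relevance_weight(r: int, R: int, n: int, N: int) -> float:
    """RW(i) = log[((r+0.5)(N-n-R+r+0.5)) / ((n-r+0.5)(R-r+0.5))]."""
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return math.log(numerator / denominator)


def offer_weight(r: int, R: int, n: int, N: int) -> float:
    """OW(i) = r * RW(i), used to rank candidate expansion terms."""
    return r * relevance_weight(r, R, n, N)


# Hypothetical counts: N documents, R of them known relevant,
# and (r, n) for each candidate expansion term.
N, R = 100, 5
candidates = {
    "retrieval": (4, 10),
    "the": (5, 95),
    "weighting": (2, 3),
}
ranked = sorted(
    candidates,
    key=lambda term: offer_weight(candidates[term][0], R, candidates[term][1], N),
    reverse=True,
)
```

The 0.5 terms keep the logarithm defined when any count is zero. A term such as "the" that appears in nearly every document gets a negative relevance weight even when it occurs in all known relevant documents, so it falls to the bottom of the ranking.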

Iterative searching
• Iterative combination
  CIW (i,j) = [ RW (i) * TF (i,j) * (K1+1) ] / [ K1 * ( (1-b) + (b * (NDL (j)) ) ) + TF (i,j) ]
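A minimal sketch of the iterative combination follows. It has the same shape as the combined weight, with RW(i) in place of CFW(i). Scoring a document by summing the weights of its matching query terms is an assumption here, as are the function names and the default values of K1 and b.

```python
def combined_iterative_weight(rw: float, tf: float, ndl: float,
                              k1: float = 2.0, b: float = 0.75) -> float:
    """CIW(i, j) = RW * TF * (K1 + 1) / (K1 * ((1 - b) + b * NDL) + TF)."""
    return rw * tf * (k1 + 1) / (k1 * ((1 - b) + b * ndl) + tf)


def score_document(query_rw: dict[str, float], doc_tf: dict[str, int],
                   ndl: float, k1: float = 2.0, b: float = 0.75) -> float:
    """Sum CIW over the query terms (assumed scoring rule).

    Terms absent from the document have TF = 0 and contribute nothing.
    """
    return sum(
        combined_iterative_weight(rw, doc_tf.get(term, 0), ndl, k1, b)
        for term, rw in query_rw.items()
    )


# Hypothetical relevance weights for two query terms and one document's TFs.
query_rw = {"retrieval": 1.0, "weighting": 2.0}
doc_tf = {"retrieval": 1, "text": 3}
score = score_document(query_rw, doc_tf, ndl=1.0)
```

Only "retrieval" matches in this example, so the score is the CIW of that single term.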

Details - Elaborations
• First requests
• Longer queries
  QACW (i,j) = QF (i) * CW (i,j)
  QACIW (i,j) = QF (i) * CIW (i,j)
• Elaborations
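The query adjustment for longer queries can be sketched as below, reading QF(i) as the number of times term t(i) occurs in the query (an assumption, since the slide does not define it). The query and the per-term weights are hypothetical values chosen for illustration.

```python
from collections import Counter


def query_adjusted_weight(qf: int, weight: float) -> float:
    """QACW = QF * CW, or QACIW = QF * CIW, depending on the weight passed in."""
    return qf * weight


# Hypothetical long query in which "text" is repeated.
query = ["text", "retrieval", "text"]
qf = Counter(query)

# Hypothetical CW(i, j) values for one document d(j).
cw = {"text": 0.8, "retrieval": 1.2}

qacw = {term: query_adjusted_weight(qf[term], w) for term, w in cw.items()}
```

A term repeated in the query is boosted in proportion to its query frequency, so "text" ends up outweighing "retrieval" here despite its lower per-document weight.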

Thank you for listening…
