CS 533 – 5 min. Presentations M. Sami Arpa Enes Taylan
Amit Singhal, Chris Buckley, and Mandar Mitra. 1996. Pivoted document length normalization. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '96). ACM, New York, NY, USA, 21-29. DOI=10.1145/243199.243206 http://doi.acm.org/10.1145/243199.243206
Pivoted Document Normalization Subject: Automatic information retrieval systems work with documents of varying lengths in a text collection.
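Pivoted Document Normalization Problem: Long documents have an advantage in retrieval over short documents because of:
- Higher term frequencies: a term tends to repeat more often in a long document, which raises its weight.
- More terms: a long document contains more distinct terms, so it matches more queries.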
Pivoted Document Normalization Previous Solutions: Document length normalization, which aims to retrieve documents of all lengths fairly. Common techniques:
- Cosine normalization
- Maximum tf normalization
- Byte length normalization
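To make the three techniques concrete, here is a minimal Python sketch of one common form of each. The function names are hypothetical, and the constant 0.5 in the maximum tf variant is the usual "augmented tf" choice, assumed here rather than taken from the slides.

```python
import math

def cosine_norm(weights):
    """Cosine normalization factor: Euclidean length of the term-weight vector."""
    return math.sqrt(sum(w * w for w in weights))

def max_tf_weights(tfs, a=0.5):
    """Maximum tf normalization: scale each raw tf by the largest tf in the document.
    The constant a=0.5 is an assumed, commonly used value."""
    max_tf = max(tfs)
    return [a + (1 - a) * tf / max_tf for tf in tfs]

def byte_length_norm(text):
    """Byte length normalization factor: size of the document in bytes (UTF-8 assumed)."""
    return len(text.encode("utf-8"))
```

In each case a longer document gets a larger factor (or smaller per-term weights), which offsets its higher term frequencies and larger vocabulary.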
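Pivoted Document Normalization Problem with Previous Solutions: Plotted against document length, the probability of retrieval and the probability of relevance have different slopes because of the normalization factor. With cosine normalization, for example, short documents are retrieved more often than their likelihood of relevance warrants, and long documents less often.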
Pivoted Document Normalization New approach: Pivoted Document Normalization
Pivoted Document Normalization Likelihood of relevance and retrieval:
- Order the documents in a collection by length.
- Divide them into several equal-sized "bins".
- Compute the probability that a randomly selected relevant (or retrieved) document belongs to a given bin.
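The binning steps above can be sketched as follows. This is an illustrative sketch only: the function name, the dictionary input, and the way relevant or retrieved documents are passed in (a plain list of ids, possibly repeated across queries) are assumptions, not details from the slides.

```python
import math

def bin_probabilities(doc_lengths, flagged_ids, num_bins):
    """Estimate P(bin | relevant) or P(bin | retrieved).

    doc_lengths: dict mapping doc id -> document length
    flagged_ids: list of doc ids judged relevant (or retrieved)
    num_bins:    number of equal-sized bins
    """
    # Step 1: order documents by length.
    ordered = sorted(doc_lengths, key=doc_lengths.get)
    # Step 2: assign consecutive runs of documents to equal-sized bins.
    bin_size = math.ceil(len(ordered) / num_bins)
    bin_of = {doc: i // bin_size for i, doc in enumerate(ordered)}
    # Step 3: fraction of flagged documents that fall in each bin.
    counts = [0] * num_bins
    for doc in flagged_ids:
        counts[bin_of[doc]] += 1
    total = len(flagged_ids)
    if total == 0:
        return [0.0] * num_bins
    return [c / total for c in counts]
```

Running it once with the relevant documents and once with the retrieved documents gives the two curves whose slopes are compared.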
Pivoted Document Normalization Pivoted Normalization Scheme:
- "The probability of retrieval of a document is inversely related to the normalization factor."
- To increase the chance that documents of a given length are retrieved, decrease their normalization factor; to decrease that chance, increase the factor.
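Pivoted Document Normalization Method:
- Use an existing normalization method (such as cosine or byte size) to retrieve an initial set of documents.
- Find the amount by which to tilt the existing normalization function around the pivot, the point where the retrieval and relevance curves cross.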
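As a rough illustration of the tilting step, the cited paper expresses the pivoted factor as a line through the pivot: (1 - slope) × pivot + slope × old normalization. The sketch below applies that form; the function name and the sample pivot and slope values are assumptions chosen for illustration only.

```python
def pivoted_norm(old_norm, pivot, slope):
    """Tilt an existing normalization factor around the pivot.

    At old_norm == pivot the factor is unchanged. With slope < 1, factors
    above the pivot shrink (long documents become easier to retrieve) and
    factors below it grow (short documents become harder to retrieve).
    """
    return (1.0 - slope) * pivot + slope * old_norm

# Illustrative values only: pivot = 10, slope = 0.25.
short_doc = pivoted_norm(4.0, pivot=10.0, slope=0.25)    # 8.5, raised from 4.0
at_pivot = pivoted_norm(10.0, pivot=10.0, slope=0.25)    # 10.0, unchanged
long_doc = pivoted_norm(20.0, pivot=10.0, slope=0.25)    # 12.5, lowered from 20.0
```

A slope of 1 leaves the old normalization untouched, so the slope is the single parameter that controls how strongly the function is tilted.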
Pivoted Document Normalization Conclusion:
• If documents of different lengths are retrieved with equal chances, retrieval effectiveness increases.
• The pivoted normalization technique could make previously developed normalization techniques more powerful.