Explore factors influencing users' repeat queries in web search using Yahoo search logs from 114 users over a year. Techniques include query normalization, SVM classification, and more. Examine the impact of rank changes and memory on re-finding information.
 
                
Information Re-Retrieval: Repeat Queries in Yahoo’s Logs • Jaime Teevan, Eytan Adar, Rosie Jones, Michael A. S. Potts • SIGIR 2007
Motivation • Re-finding information is a common activity in Web search • What is the intention behind re-finding information? • What factors favor or indicate a user’s re-finding of information?
Dataset • Search traces of 114 Yahoo users over 1 year (Aug 2004 – July 2005) • 115 queries / trace • A query is considered a repeat when separated by > 30 minutes • 119 volunteers in a controlled experiment • Users were asked to repeat one query they had made 30 minutes to 1 hour earlier
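The 30-minute rule above could be applied to a single user's trace roughly as follows. This is a minimal sketch, not the authors' procedure: the function name `find_repeats`, the `(timestamp, query)` tuple layout, exact string matching, and measuring the gap from the most recent earlier occurrence are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Threshold from the slide: a repeat must be separated by more than 30 minutes.
MIN_GAP = timedelta(minutes=30)

def find_repeats(trace):
    """Return indexes of entries counted as repeat queries.

    trace: list of (timestamp, query) tuples sorted by time (hypothetical layout).
    Assumption: the gap is measured from the most recent earlier occurrence
    of the exact same query string.
    """
    last_seen = {}
    repeats = []
    for i, (ts, query) in enumerate(trace):
        prev = last_seen.get(query)
        if prev is not None and ts - prev > MIN_GAP:
            repeats.append(i)
        last_seen[query] = ts
    return repeats

trace = [
    (datetime(2004, 8, 1, 9, 0), "weather"),
    (datetime(2004, 8, 1, 9, 10), "weather"),   # only 10 minutes later
    (datetime(2004, 8, 1, 10, 0), "weather"),   # 50 minutes after the last one
    (datetime(2004, 8, 1, 10, 5), "news"),      # first occurrence
]
repeat_indexes = find_repeats(trace)
```

Here only the third entry qualifies, since the second follows the first by just 10 minutes.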
Techniques used • Normalizing query terms • Capitalization, stop-word removal, duplicate-word removal, extra white space, stemming • Word order (e.g. “new york department of state” vs “department of state new york”) • Non-alphanumerics (e.g. “sub-urban” vs “sub urban”) • Word merge (e.g. “wal mart” vs “walmart”) • Domain (e.g. “hotmail” vs “hotmail.com”) • Word swap (e.g. “american embassy london” vs “american consulate london”) • SVM classifier • Applied to predict whether a result will be clicked again
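Several of the normalization steps listed above can be sketched in a few lines. This is an illustrative approximation only: the stop-word list, the domain suffixes, and the helper names `query_tokens` and `equivalent` are assumptions, and stemming and word swaps are left out because they need a stemmer or a synonym resource.

```python
import re

# Hypothetical stop-word list; the slides do not say which list was used.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "for", "to"}

def query_tokens(query):
    """Lowercase, strip a trailing domain suffix, split on non-alphanumerics,
    drop stop words and duplicate words (keeping first-seen order)."""
    q = query.lower()
    q = re.sub(r"\.(com|edu|net)\b", "", q)   # domain: hotmail.com -> hotmail
    q = re.sub(r"[^a-z0-9\s]", " ", q)        # non-alphanumerics: sub-urban -> sub urban
    tokens = [t for t in q.split() if t not in STOP_WORDS]
    return list(dict.fromkeys(tokens))        # remove duplicates, preserve order

def equivalent(a, b):
    """True if two queries match after normalization."""
    ta, tb = query_tokens(a), query_tokens(b)
    same_words = sorted(ta) == sorted(tb)     # word order ignored
    same_merged = "".join(ta) == "".join(tb)  # word merge: wal mart vs walmart
    return same_words or same_merged

checks = [
    equivalent("new york department of state", "department of state new york"),
    equivalent("sub-urban", "sub urban"),
    equivalent("wal mart", "walmart"),
    equivalent("Hotmail", "hotmail.com"),
]
```

A word swap such as “american embassy london” vs “american consulate london” is not caught by this sketch.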
Discovery • Navigational queries are one major type of re-finding • Bank, news, mail • .com, .edu, .net • Rank changes affect re-finding
Discovery • Memory fades • Controlled experiment: 30% of queries are misremembered (36/119); 27 of those 36 are equivalent after normalization • Yahoo logs • Indicators of repeat click • # clicks in first query • # clicks in previous query • # unique clicks in previous query
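The three repeat-click indicators above could be computed as features for a classifier along these lines. This is a rough sketch under stated assumptions: the function name `repeat_click_features`, the feature names, and the input layout (one list of clicked URLs per earlier issuance of the same query, oldest first) are hypothetical and not taken from the paper.

```python
def repeat_click_features(history):
    """Compute the three indicators for one repeated query.

    history: list of click lists, one per earlier issuance of the query,
    oldest first (hypothetical layout). Each click list holds clicked URLs.
    """
    first, previous = history[0], history[-1]
    return {
        "clicks_in_first_query": len(first),
        "clicks_in_previous_query": len(previous),
        "unique_clicks_in_previous_query": len(set(previous)),
    }

history = [
    ["a.com", "b.com"],            # first time the query was issued
    ["a.com", "a.com", "c.com"],   # most recent earlier issuance
]
features = repeat_click_features(history)
```

A feature dictionary like this would then be turned into a numeric vector before being passed to an SVM or any other classifier.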