Behavior-driven clustering of queries into topics

Behavior-driven clustering of queries into topics. Luca Maria Aiello, Debora Donato, Umut Ozertem, Filippo Menczer. CIKM 2011, Glasgow.

Presentation Transcript


  1. Behavior-driven clustering of queries into topics. Luca Maria Aiello, Debora Donato, Umut Ozertem, Filippo Menczer. CIKM 2011, Glasgow

  2. USER PROFILING IN SEARCH ENGINES. Granularity levels: query, session, goal, mission, topic. Concise representation; aggregation; meaningful semantics.

  3. MISSIONS AND TOPICS. A search mission is a set of queries that express a complex search need, possibly articulated in smaller goals. A topic is a mental object or cognitive content, i.e., the sum of what can be perceived, discovered, or learned about any real or abstract entity.

  4. QUERY STREAM DECOMPOSITION. Queries in the same mission; queries in consecutive missions (Donato et al.: Do you want to take notes? Identifying research missions in Y! search pad. WWW’10). Taxonomies; user behavior and intent. Same topic; different topic.

  5. MERGING MISSIONS

  6. TOPIC DETECTOR STATS. Gradient Boosted Decision Tree (GBDT) • Aggregation (min, max, avg, std) of 62 query-pair features • AUC 0.95 • 10X cross validation on 500K pairs
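The aggregation step named on this slide can be illustrated with a minimal sketch. It assumes that comparing two query sets yields one row of feature values per cross-set query pair, and that each feature column is collapsed with min, max, mean, and standard deviation. The function name `aggregate_pair_features`, the output ordering, and the toy values are hypothetical; the slide does not list the 62 features or say how the aggregates are laid out.

```python
from statistics import fmean, pstdev

def aggregate_pair_features(pair_features):
    """Collapse rows of per-query-pair features into one fixed-length vector.

    pair_features: list of rows, one row per query pair, one column per feature.
    Returns all column minima, then maxima, then means, then (population)
    standard deviations. This ordering is an assumption made for illustration.
    """
    columns = list(zip(*pair_features))
    aggregated = []
    for agg in (min, max, fmean, pstdev):
        aggregated.extend(float(agg(col)) for col in columns)
    return aggregated

# Toy input: 3 query pairs described by 2 hypothetical features.
toy = [[0.0, 1.0], [0.5, 3.0], [1.0, 2.0]]
vec = aggregate_pair_features(toy)
```

Under this assumption, a vector like `vec` is what a classifier such as a GBDT would receive for each pair of query sets, regardless of how many queries the two sets contain.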

  7. GREEDY AGGLOMERATIVE TOPIC EXTRACTION (GATE). Topic detector applied to pairs of query sets • O(log|M|·|M|²) (heavily parallelizable) • 1. Missions of the same user → supermissions 2. Query sets of different users → higher-level topics
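A greedy agglomerative pass over query sets can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the scorer `term_overlap` is a toy stand-in for the trained topic detector, the `threshold` value is arbitrary, and the naive rescoring loop does not reproduce the complexity quoted on the slide.

```python
from itertools import combinations

def term_overlap(set_a, set_b):
    """Toy stand-in for the topic detector: Jaccard overlap of query terms."""
    terms_a = {t for q in set_a for t in q.split()}
    terms_b = {t for q in set_b for t in q.split()}
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0

def greedy_merge(query_sets, score=term_overlap, threshold=0.3):
    """Repeatedly merge the best-scoring pair of query sets until no pair
    reaches the (hypothetical) threshold."""
    clusters = [frozenset(s) for s in query_sets]
    while len(clusters) > 1:
        best_score, i, j = max(
            (score(clusters[a], clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if best_score < threshold:
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

# Toy missions; the first two share travel-related terms.
missions = [
    {"cheap flights rome", "rome hotels"},
    {"rome flights deals"},
    {"python list sort"},
]
topics = greedy_merge(missions)
```

Running the same routine first on the missions of one user and then on the resulting sets across users would mirror the two steps listed on the slide.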

  8. EVALUATION. 3 months of Y! log; 40K users

  9. EVALUATION: BASELINE. URL cover graph • OSLOM community detection algorithm • Weighted undirected graph • Maximizing local fitness function of clusters • Automatic hierarchy detection. Lancichinetti et al.: Finding statistically significant communities in networks. PLoS ONE, 2011.

  10. EVALUATION: QUERY SET COVERAGE. Fraction of queries considered in the clustering phase: GATE 1, OSLOM 0.2. [Figure: URL cover graph connected components size distribution]

  11. EVALUATION: SINGLETON RATIO. Fraction of queries that remain isolated in singletons: GATE 0.55-0.27, OSLOM 0.88

  12. EVALUATION: AGGREGATION ABILITY. Topics aggregated in two consecutive steps or levels: GATE 500K, OSLOM 100K

  13. EVALUATION: PURITY vs. COVERAGE. Coverage: number of unique clicked URLs for the query. Purity: average pointwise mutual information of pairs of query-related relevant terms; relevant terms are extracted from top clicked results using a predefined dictionary.
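The purity measure on this slide rests on pointwise mutual information, PMI(a, b) = log(p(a, b) / (p(a) p(b))). The sketch below averages PMI over pairs of terms, under assumptions the slide does not confirm: probabilities are estimated as document frequencies over a small collection of term sets, the logarithm is base 2, and pairs that never co-occur are skipped because their PMI is undefined. The names `pmi` and `average_pmi` and the toy documents are hypothetical.

```python
import math
from itertools import combinations

def pmi(term_a, term_b, documents):
    """PMI of two terms, with probabilities estimated as document frequencies."""
    n = len(documents)
    p_a = sum(term_a in d for d in documents) / n
    p_b = sum(term_b in d for d in documents) / n
    p_ab = sum(term_a in d and term_b in d for d in documents) / n
    if p_ab == 0:
        return None  # log(0) is undefined; the caller decides how to treat it
    return math.log2(p_ab / (p_a * p_b))

def average_pmi(terms, documents):
    """Mean PMI over all term pairs that co-occur at least once (an assumption)."""
    values = [pmi(a, b, documents) for a, b in combinations(sorted(terms), 2)]
    defined = [v for v in values if v is not None]
    return sum(defined) / len(defined) if defined else float("nan")

# Toy collection: each set holds the relevant terms of one clicked result.
docs = [{"rome", "flight"}, {"rome", "flight"}, {"python"}, {"python"}]
score_pair = pmi("rome", "flight", docs)
score_topic = average_pmi({"rome", "flight", "python"}, docs)
```

How undefined pairs are handled changes the average noticeably, so any real use would need to fix that choice explicitly.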

  14. EVALUATION: PURITY vs. COVERAGE

  15. EVALUATION: PURITY vs. COVERAGE

  16. USER PROFILING

  17. USER PROFILING FROM TOPICS. Missions → Topic Detector → Topics → User topical profile (example weights: 1.9, 0.0, 0.7, 3.2, 0.0, 0.41, 0.0, 2.9, 0.24, 0.35)
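The slide shows a profile as a vector with one weight per topic but does not say how the weights are computed. The sketch below is one possible reading, stated as an assumption: each weight is the sum of the detector's scores between the user's missions and that topic. The scorer `term_overlap` is again a toy stand-in for the topic detector, and all names and data are hypothetical.

```python
def term_overlap(set_a, set_b):
    """Toy stand-in for the topic detector: Jaccard overlap of query terms."""
    terms_a = {t for q in set_a for t in q.split()}
    terms_b = {t for q in set_b for t in q.split()}
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0

def topical_profile(missions, topics, score=term_overlap):
    """One weight per topic: summed score of the user's missions against it
    (an assumed weighting, not taken from the slides)."""
    return [sum(score(m, t) for m in missions) for t in topics]

# Toy data: two travel missions scored against a travel topic and a coding topic.
user_missions = [{"rome flights"}, {"rome hotels"}]
known_topics = [{"rome flights", "rome hotels"}, {"python sort"}]
profile = topical_profile(user_missions, known_topics)
```

A summed score would be consistent with the example weights exceeding 1, but other weightings (counts, normalized frequencies) would fit the slide equally well.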

  18. PROFILES FOR “PREDICTION”. Sequence of missions of the profiled user vs. sequence of a random one • Sequence-profile match using topic detector • Success: 0.65 (0.72 less frequent, 0.55 most frequent)

  19. CONCLUSIONS. New behavior-driven notion of topics • Bottom-up topic extraction algorithm • Favorable comparison with graph-based clustering • Effective user profiling • Other baselines • More accurate predictions

  20. ACKNOWLEDGMENTS. Fil Menczer: Prof. Informatics @ IU, Director CNetS @ IU. Emre Velisapaoglu: Yahoo! Search Sciences, Yahoo! Labs @ Sunnyvale. Umut Ozertem: Yahoo! Search Sciences, Yahoo! Labs @ Sunnyvale. Debora Donato: Yahoo! Search Sciences, Yahoo! Labs @ Sunnyvale.

  21. Taxonomies; user behavior and intent
