
Understanding Temporal Intent of User Query based on Time-based Query Classification

Pengjie Ren, Zhumin Chen and Jun Ma. Information Retrieval Lab, Shandong University. Presenter: Pengjie Ren. November 18, 2013.


Presentation Transcript


  1. Understanding Temporal Intent of User Query based on Time-based Query Classification
     Pengjie Ren, Zhumin Chen and Jun Ma
     Information Retrieval Lab, Shandong University
     Presenter: Pengjie Ren
     November 18, 2013

  2. Outline • Why Temporal Intent Detection? • Query Temporal Pattern Taxonomy • Query Pattern Detection Framework • Experiment Results • Application • Conclusion and Future Work


  4. Why Temporal Intent Detection?
     • Richard McCreadie (SIGIR 2013): Users tend to prefer rankings that integrate tweets or newswire articles soon after an event breaks, while blogs and Wikipedia pages become more useful over time.
     • Hideo Joho (WWW 2013):
       • 48.2% seek information about the same day on which they perform the search;
       • 32.7% look for past information;
       • 8.1% look for future information;
       • 10.9% say that their information needs do not have specific temporal attributes.
     Automatic temporal intent detection is therefore important for time-sensitive information retrieval, temporal diversity, and related tasks.


  6. Different Temporal Patterns Imply Different Temporal Intents
     Kulkarni A. et al. (WSDM 2011) found several temporal patterns of queries by mining query logs.
     [Figure: query frequency curves from Google Trends]
     However, they do not propose methods to identify those patterns automatically. In this paper, we propose an approach that identifies the different temporal patterns automatically.

  7. Query Temporal Pattern Taxonomy
     [Figure: example query frequency curves for the queries "Java JDK", "Haiti Earthquake", "Earthquake" and "Christmas Present"]
     Clearly, we can use spikes to detect query temporal patterns.

  8. What is a Spike?
     • A spike is a set of contiguous points on the query frequency curve that burst prominently. Generally, it represents an event.
     • Spikes are hard to detect effectively and precisely. In particular, we found that learning a single cutoff line is not effective for identifying all spikes.
     [Figure: query frequency curve with labeled spikes: Japan earthquake, Southeast Asia earthquake, Haiti earthquake, Pakistan earthquake, Virginia earthquake, China earthquake]


  10. Query Pattern Detection Framework
      [Diagram: Query Classification System. Labeled components: Query Log, Query frequency curves, Training Set, Query, Preprocess, Feature Extraction, Classifier (SVM), Query Pattern]

  11. (1) Preprocess
      According to time series analysis, a curve can be decomposed into three components:
      • Trend component
      • Seasonal component
      • Random component
      The seasonal and random components are what we care about in this paper, so the trend component should be removed. We use polynomial regression to model the trend component.
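The trend-removal step can be illustrated with a minimal sketch. It fits a polynomial by ordinary least squares and subtracts it from the curve. This is a simplification: the slides fit the trend under a Student-t likelihood, and the polynomial degree is not stated, so the `degree` parameter and the function names `fit_polynomial` and `detrend` are assumptions made for illustration only.

```python
from typing import List


def fit_polynomial(y: List[float], degree: int) -> List[float]:
    """Least-squares polynomial coefficients (lowest power first), time rescaled to [0, 1]."""
    n = len(y)
    # Rescaling time keeps the normal equations reasonably well conditioned.
    t = [i / (n - 1) for i in range(n)]
    size = degree + 1
    # Normal equations A c = b with A[j][k] = sum t^(j+k) and b[j] = sum y * t^j.
    a = [[sum(ti ** (j + k) for ti in t) for k in range(size)] for j in range(size)]
    b = [sum(yi * ti ** j for yi, ti in zip(y, t)) for j in range(size)]
    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def detrend(y: List[float], degree: int = 2) -> List[float]:
    """Subtract the fitted polynomial trend, leaving the seasonal and random parts."""
    n = len(y)
    coeffs = fit_polynomial(y, degree)
    trend = [sum(c * (i / (n - 1)) ** p for p, c in enumerate(coeffs)) for i in range(n)]
    return [yi - ti for yi, ti in zip(y, trend)]
```

The curve must contain more points than `degree + 1` for the fit to be well defined. A low degree is the safer choice here, since a high-degree polynomial would start to absorb the spikes that the later features rely on.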

  12. (1) Preprocess
      We use a Student-t distribution instead of a Gaussian distribution because we do not have exact training pairs (X, m_t); we have to use (X, F) instead. The S_t and Y_t components therefore act as noise during training, and the Student-t distribution is more robust to noise than the Gaussian distribution.
      [Figure from PRML comparing Student-t and Gaussian fits. Panel labels: "without noise: both work well" and "noise"]
      Loss function: log likelihood.
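The robustness argument can be made concrete by comparing the per-point negative log-likelihood of the two distributions. The sketch below drops additive constants. The scale `sigma` and the degrees of freedom `nu` are illustrative assumptions; the slides do not give the values used.

```python
import math


def gaussian_nll(residual: float, sigma: float = 1.0) -> float:
    """Negative log-likelihood of a residual under N(0, sigma^2), constants dropped."""
    return residual ** 2 / (2.0 * sigma ** 2)


def student_t_nll(residual: float, sigma: float = 1.0, nu: float = 3.0) -> float:
    """Negative log-likelihood under a Student-t with scale sigma and nu degrees of freedom, constants dropped."""
    return 0.5 * (nu + 1.0) * math.log(1.0 + residual ** 2 / (nu * sigma ** 2))


# Print both losses for a small, a moderate and a large residual.
for r in (0.5, 2.0, 10.0):
    print(f"residual={r:5.1f}  gaussian={gaussian_nll(r):8.3f}  student_t={student_t_nll(r):8.3f}")
```

The Gaussian loss grows quadratically with the residual, while the Student-t loss grows only logarithmically. Large residuals, such as spikes sitting on top of the trend, therefore pull a Student-t fit far less than a Gaussian one.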

  13. (1) Preprocess
      [Figure panels: Original Query Curve; Trend Component; Seasonal & Random Component]

  14. (2) Feature Extraction
      For the preprocessed query frequency curves, we define the following features.
      Basic Features: • Mean • Standard Deviation • MR (Max Rate) • SR (Spike Rate)
      Curve Distance Features: • DQoT • DOQ • DAMQ • DPMQ
      Regression Features: • Cutoff • Spikes • PD (Periodic Deviation)

  15. MR (Max Rate)

  16. SR (Spike Rate)
      m is half the period of a spike.
      [Figure panels: OQ, QoT, MQ]

  17. SR (Spike Rate) How to determine the value of m?

  18. Distance between Two Curves
      • F_i(q): the time series F_i shifted by q time units.
      • || ||: the l2 norm.
      This measure finds the optimal alignment (translation q) and the scaling coefficient α for matching the shapes of the two time series. It is difficult to find the optimal solution; in practice, we try all possible shifts q to find an approximate solution.
      Jaewon Yang and Jure Leskovec. Patterns of temporal variation in online media. WSDM, 2011.

  19. Distance between Two Curves Jaewon Yang and Jure Leskovec. Patterns of temporal variation in online media. WSDM, 2011.
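The shift-and-scale search described above can be sketched as follows. The formula itself did not survive in the transcript, so the sketch assumes the form used in the cited Yang and Leskovec paper: the minimum over α and q of ||x − α·y(q)|| / ||x||. Zero padding on shift and the function names `shift` and `curve_distance` are further assumptions.

```python
import math
from typing import List, Tuple


def shift(series: List[float], q: int) -> List[float]:
    """Shift a series by q time units, padding the vacated positions with zeros."""
    n = len(series)
    if q >= 0:
        return [0.0] * min(q, n) + series[: max(n - q, 0)]
    return series[min(-q, n):] + [0.0] * min(-q, n)


def curve_distance(x: List[float], y: List[float]) -> Tuple[float, int, float]:
    """Return (distance, best shift q, best scale alpha), trying every shift of y.

    Both series are assumed to have the same length.
    """
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0.0:
        raise ValueError("x must not be all zeros")
    best = (float("inf"), 0, 0.0)
    n = len(y)
    for q in range(-(n - 1), n):
        yq = shift(y, q)
        yy = sum(v * v for v in yq)
        # For a fixed shift, the least-squares optimal scale has a closed form.
        alpha = sum(a * b for a, b in zip(x, yq)) / yy if yy > 0 else 0.0
        dist = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, yq))) / norm_x
        if dist < best[0]:
            best = (dist, q, alpha)
    return best
```

For a fixed q the best α is available in closed form, so only the shift needs to be searched. Trying every shift costs quadratic time in the curve length, which is acceptable for curves of a few hundred points.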

  20. DQoT, DOQ, DAMQ, DPMQ
      • DQoT: Average distance from annotated QoT curves.
      • DOQ: Average distance from annotated OQ curves.
      • DAMQ: Average distance from annotated AMQ curves.
      • DPMQ: Average distance from annotated PMQ curves.
      This is similar to KNN but costs much less time.
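The four distance features can be computed with one helper that averages the distance from a curve to the annotated curves of each category. In this sketch the distance is passed in as a function, and the default Euclidean distance is only a placeholder for the shift- and scale-invariant measure of slide 18. The names `distance_features` and `annotated` are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence

Curve = Sequence[float]


def euclidean(a: Curve, b: Curve) -> float:
    """Placeholder distance; the slides use a shift- and scale-invariant measure."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def distance_features(
    curve: Curve,
    annotated: Dict[str, List[Curve]],
    distance: Callable[[Curve, Curve], float] = euclidean,
) -> Dict[str, float]:
    """Average distance from `curve` to the annotated curves of each category."""
    return {
        f"D{category}": sum(distance(curve, ref) for ref in refs) / len(refs)
        for category, refs in annotated.items()
    }
```

Called with the keys "QoT", "OQ", "AMQ" and "PMQ", the helper returns the features DQoT, DOQ, DAMQ and DPMQ.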

  21. Cutoff, Spikes, PD
      • Cutoff: The 8 features above are combined to learn a cutoff line. What about training data? The (F, Cutoff) pairs are not known, so we use the annotated (F, Pattern Category) pairs to approximate them. For this curve, because we annotate it as MQ, the cutoff value lies in the pink area.
      • Spikes: Number of spikes.
      • PD: Measures periodicity.
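Given a cutoff value, the Spikes feature can be obtained by counting the maximal runs of contiguous points above the cutoff line, which matches the definition of a spike on slide 8. The sketch below is one straightforward way to do this and is not necessarily the authors' implementation. The name `find_spikes` is illustrative.

```python
from typing import List, Tuple


def find_spikes(curve: List[float], cutoff: float) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of maximal runs strictly above `cutoff`."""
    spikes = []
    start = None
    for i, value in enumerate(curve):
        if value > cutoff and start is None:
            start = i  # a new run begins
        elif value <= cutoff and start is not None:
            spikes.append((start, i - 1))  # the run ended at the previous point
            start = None
    if start is not None:
        spikes.append((start, len(curve) - 1))  # run reaches the end of the curve
    return spikes
```

The number of spikes is then `len(find_spikes(curve, cutoff))`.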


  23. Experiment Results
      • 5,000 queries from Query Track 07-09 of TREC.
      • Corresponding query frequency files from Google Trends.
      • Categories of these queries are annotated manually according to their frequency curves.
      • 5-fold cross-validation.
      [Figure: Classification Performance Comparison for Different Query Categories (PMQ, QoT, AMQ, OQ)]
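The 5-fold protocol can be sketched as a plain index split. Shuffling with a fixed seed is an assumption; the slides do not say how the folds were formed. The name `k_fold_indices` is illustrative.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs with disjoint test folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits
```

With 5,000 queries, each split trains on 4,000 queries and tests on the remaining 1,000.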

  24. Feature Effectiveness Analysis


  26. Application: Temporal Diversity
      Because the temporal intents of a user query are uncertain, we should diversify the search results along the time dimension in order to cover more of the important time units of the query.
      [Formula components: Subtopic Coverage, Temporal Intent Coverage, Novelty]

  27. Application: Temporal Diversity
      Compared methods:
      • MMR (SIGIR'98)
      • xQuAD (WWW'10)
      • IA-Select (WSDM'09)
      • LM+T+D (SIGIR'13)
      • RM+T+S+D (our method)
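For orientation, MMR, the first of the compared methods, can be sketched as a greedy re-ranker that trades relevance against redundancy with respect to the results already selected. The sketch shows the generic MMR criterion only, not the method proposed in the talk. The relevance scores, the similarity function and the trade-off weight `lam` are illustrative assumptions.

```python
from typing import Callable, Dict, List


def mmr(
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick k documents maximizing lam * relevance - (1 - lam) * max similarity to picks."""
    selected: List[str] = []
    candidates = set(relevance)

    def score(doc: str) -> float:
        # Redundancy is the highest similarity to any already selected document.
        redundancy = max((similarity(doc, s) for s in selected), default=0.0)
        return lam * relevance[doc] - (1.0 - lam) * redundancy

    while candidates and len(selected) < k:
        # Sorting the candidates makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

If similarity is defined as 1 for two documents from the same time unit and 0 otherwise, the second pick tends to come from a different time unit than the first, which is the intuition behind diversifying along the time dimension.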


  29. Conclusion
      • We recast temporal intent detection as a classification problem.
      • We propose features that detect temporal intents effectively.
      • We apply the temporal intent results to temporal diversity and achieve high performance.

  30. Future Work
      • More effective features
      • The data sparsity problem for long queries

  31. Thanks a lot for your attention!
