
Context-Aware Query Classification


Presentation Transcript


  1. Context-Aware Query Classification Huanhuan Cao1, Derek Hao Hu2, Dou Shen3, Daxin Jiang4, Jian-Tao Sun4, Enhong Chen1 and Qiang Yang2 1University of Science and Technology of China, 2Hong Kong University of Science and Technology, 3Microsoft Corporation, 4Microsoft Research Asia

  2. Motivation • Understanding Web users' information needs is one of the most important problems in Web search. • Such information can help improve the quality of many Web search services, such as: • Ranking • Online advertising • Query suggestion, etc.

  3. Challenges • The main challenges of query classification: • Lack of feature information • Ambiguity • Multiple intents • The first problem has been studied widely: • Query expansion with top search results • Leveraging a Web directory • However, the second and third problems are far from being solved.

  4. Why is context useful? • Given a query, its context means the previous queries and clicked URLs in the same session. • It is assumed that: • The context is semantically related to the current query. • The context may help to assign appropriate categories to the current query. • It therefore makes sense to exploit context to disambiguate the current query.

  5. Example

  6. Example

  7. Example

  8. Overview • Problem statement • Modeling query context with a CRF • Features of the CRF • Experiments • Conclusion and future work

  9. Problem Statement: Context • In a user search session, suppose the user has issued a series of queries q1q2…qT-1 and clicked some returned URLs U1U2…UT-1. • If the user issues a query qT at time T, we call q1q2…qT-1 and U1U2…UT-1 the query context of qT. • We call each qt (t ∈ [1, T-1]) a contextual query of qT.
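To make the notion of query context concrete, here is a minimal sketch of a session as a sequence of query events; the type and function names are hypothetical and not from the paper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class QueryEvent:
    """One step of a search session: a query and the URLs clicked for it."""
    query: str
    clicked_urls: List[str] = field(default_factory=list)

def query_context(session: List[QueryEvent], t: int) -> List[QueryEvent]:
    """The query context of the t-th query (1-indexed): the earlier queries
    q1...q(t-1) together with their clicked URL sets U1...U(t-1)."""
    return session[: t - 1]

# The context of the 3rd query consists of the first two events.
session = [
    QueryEvent("jaguar", ["en.wikipedia.org/wiki/Jaguar"]),
    QueryEvent("jaguar speed"),
    QueryEvent("jaguar price"),
]
print([e.query for e in query_context(session, 3)])  # ['jaguar', 'jaguar speed']
```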

  10. Query Context • Figure: the query context of qT.

  11. Problem Statement: QC with Context and Taxonomy • The objective of query classification (QC) with context is to classify a user query qT into a ranked list of K categories cT1, cT2, ..., cTK among Nc categories {c1, c2, …, cNc}, given the context of qT. • A target taxonomy Υ is a tree of categories in which {c1, c2, …, cNc} are the leaf nodes.

  12. Modeling Query Context by CRF • A linear-chain CRF models the conditional probability of the label sequence given the query sequence: p(c | q) = (1/Z(q)) · exp( Σt Σk λk fk(ct-1, ct, q, t) ), where q represents q1q2…qt, c represents the corresponding category labels c1c2…ct, the fk are feature functions with weights λk, and Z(q) is the normalization factor.
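As a rough illustration of this factorization, the sketch below scores a label sequence with the standard linear-chain CRF form; the feature functions and weights are hypothetical placeholders, and the brute-force normalization merely stands in for the forward-backward computation used in real CRF implementations.

```python
import math
from itertools import product
from typing import Callable, List, Sequence, Tuple

# A feature function f_k(c_prev, c_t, q, t) -> float, paired with its weight lambda_k.
FeatureFn = Callable[[str, str, Sequence[str], int], float]

def log_score(labels: Sequence[str], queries: Sequence[str],
              features: List[Tuple[float, FeatureFn]]) -> float:
    """Unnormalized log-score: sum_t sum_k lambda_k * f_k(c_{t-1}, c_t, q, t)."""
    total = 0.0
    for t in range(len(queries)):
        prev = labels[t - 1] if t > 0 else "<START>"
        total += sum(lam * f(prev, labels[t], queries, t) for lam, f in features)
    return total

def p_labels_given_queries(labels, queries, label_set, features):
    """p(c | q) via brute-force normalization over all label sequences
    (exponential; shown only to make Z(q) explicit)."""
    z = sum(math.exp(log_score(seq, queries, features))
            for seq in product(label_set, repeat=len(queries)))
    return math.exp(log_score(labels, queries, features)) / z

# Toy example with two hand-written feature functions.
features = [
    (1.0, lambda cp, c, q, t: 1.0 if "hotel" in q[t] and c == "Travel" else 0.0),
    (0.5, lambda cp, c, q, t: 1.0 if cp == c else 0.0),  # adjacent-label agreement
]
queries = ["paris hotel", "cheap flights"]
print(p_labels_given_queries(("Travel", "Travel"), queries,
                             ["Travel", "Shopping"], features))
```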

  13. Why CRF? • The two main advantages of CRF are: • 1) It can incorporate general feature functions to model the relation between observations and unobserved states; • 2) It needs no prior knowledge of the type of the conditional distribution. • Given 1), we can incorporate external Web knowledge. • Given 2), we need no assumptions about the form of p(c|q).

  14. Features of CRF • When we use a CRF to model query context, one of the most important steps is choosing effective feature functions. • We should consider: • Relevance between queries and category labels, to leverage local information of queries; • Relevance between adjacent labels, to leverage contextual information.

  15. Relevance between Queries and Category Labels • Term occurrence • The terms of qt are obvious features supporting ct. • Due to the limited size of the training data, many useful terms indicating category information may not be covered. • General label confidence • Leverage an external Web directory such as Google Directory: Conf(ct, qt) = Mct,qt / M, where M is the number of returned results and Mct,qt is the number of returned results with label ct after mapping (sketched below).
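A small sketch of the general label confidence Conf(ct, qt) = Mct,qt / M, assuming the (mapped) labels of the directory's returned results for qt are already available as a list:

```python
from collections import Counter
from typing import List

def general_label_confidence(result_labels: List[str], category: str) -> float:
    """Conf(c_t, q_t) = M_{c_t,q_t} / M, where result_labels are the
    target-taxonomy labels of the M results returned by the Web directory
    for q_t (after mapping), and category is the candidate label c_t."""
    m = len(result_labels)
    return Counter(result_labels)[category] / m if m else 0.0

# e.g. 6 of 10 mapped directory results for a query carry the candidate label.
labels = ["Living Things/Animals"] * 6 + ["Shopping/Autos"] * 4
print(general_label_confidence(labels, "Living Things/Animals"))  # 0.6
```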

  16. Relevance between Queries and Category Labels • Click-aware label confidence • Combines the click information with the knowledge of an external Web directory. • CConf(ct, ut) can be calculated by multiple approaches. • Here, we use the VSM to calculate the cosine similarity between the term vectors of ct and ut (see the sketch below).
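A minimal VSM sketch of one possible CConf(ct, ut): the cosine similarity between the term vector of the category and that of the clicked page, here with raw term frequencies (the exact term weighting is not specified on the slide).

```python
import math
from collections import Counter
from typing import List

def cosine_similarity(terms_a: List[str], terms_b: List[str]) -> float:
    """Cosine similarity between two bags of terms (raw term-frequency vectors)."""
    va, vb = Counter(terms_a), Counter(terms_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(x * x for x in va.values())) * \
           math.sqrt(sum(x * x for x in vb.values()))
    return dot / norm if norm else 0.0

# CConf(c_t, u_t): compare category description terms with clicked-page terms.
category_terms = "travel guide hotel flight vacation".split()
clicked_page_terms = "paris travel guide hotel booking".split()
print(round(cosine_similarity(category_terms, clicked_page_terms), 3))
```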

  17. Relevance between Adjacent Labels • Direct relevance between adjacent labels • Occurrence of the adjacent label pair <ct-1, ct> • The learned weight implies how likely the two labels are to co-occur. • Taxonomy-based relevance between adjacent labels • Limited by the sampling approach and the size of the training data, some reasonable adjacent label pairs may occur rarely or not at all. • We therefore consider indirect relevance between adjacent labels through the taxonomy, as sketched below.
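One concrete way to realize taxonomy-based relevance is to score an adjacent label pair by how close the two categories are in the taxonomy tree; the scoring function below is a hypothetical illustration of that idea, not the paper's exact formula.

```python
from typing import Dict, List

def ancestors(taxonomy: Dict[str, str], node: str) -> List[str]:
    """The chain from a node up to the root, following child -> parent links."""
    chain = [node]
    while node in taxonomy:
        node = taxonomy[node]
        chain.append(node)
    return chain

def taxonomy_relevance(taxonomy: Dict[str, str], c_prev: str, c_cur: str) -> float:
    """Score an adjacent label pair by tree distance: 1 / (1 + distance),
    so labels under the same parent score higher than unrelated ones."""
    a, b = ancestors(taxonomy, c_prev), ancestors(taxonomy, c_cur)
    common = next(x for x in a if x in b)        # lowest common ancestor
    distance = a.index(common) + b.index(common)
    return 1.0 / (1.0 + distance)

# Toy taxonomy as a child -> parent map.
taxonomy = {"Hotels": "Travel", "Flights": "Travel", "Autos": "Shopping",
            "Travel": "Root", "Shopping": "Root"}
print(taxonomy_relevance(taxonomy, "Hotels", "Flights"))  # siblings: 1/3
print(taxonomy_relevance(taxonomy, "Hotels", "Autos"))    # distant:  1/5
```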

  18. Experiment • Data set: • 10,000 randomly selected sessions from one day's search log of a commercial search engine. • Three labelers first label all possible categories from the KDDCUP'05 taxonomy for each unique query in the training data.

  19. Examples of multiple-category queries • A large ratio of multiple-category queries indicates the difficulty of QC without context.

  20. Label Sessions • The three human labelers are then asked to cross-label each session of the data set with a sequence of level-2 category labels. • For each query, a labeler gives the most appropriate category label by considering: • The query itself; • The query context; • The clicked URLs of the query.

  21. Tested Approaches • Baselines: • Non-context-aware baseline: the Bridging Classifier (BC) proposed by Shen et al. • Naïve context-aware baseline: the Collaborating Classifier (CC), which combines a test query with its previous query and classifies the combination with BC. • CRFs: • CRF-B: CRF with basic features (term occurrence, general label confidence, and direct relevance between adjacent labels) • CRF-B-C: CRF with basic features + click-aware label confidence • CRF-B-C-T: CRF with basic features + click-aware label confidence + taxonomy-based relevance

  22. Evaluation Metrics • Given a test session q1q2…qT, we let qT be the test query and let the queries q1q2…qT-1 and the corresponding clicked URL sets U1U2…UT-1 be the query context. • For qT, we evaluate a tested approach by: • Precision (P): δ(cT ∈ CT,K) / K • Recall (R): δ(cT ∈ CT,K) • F1 score (F1): 2·P·R / (P + R), where cT is the ground-truth label, CT,K is the set of the top K predicted labels, and δ(·) is a Boolean function indicating whether its argument is true (=1) or false (=0). A transcription of these metrics is sketched below.
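A direct transcription of these per-query metrics, assuming a single ground-truth label cT per test query:

```python
from typing import List, Tuple

def evaluate(ground_truth: str, top_k: List[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one test query, following the slide:
    P = delta(c_T in C_{T,K}) / K,  R = delta(c_T in C_{T,K}),  F1 = 2PR/(P+R)."""
    hit = 1.0 if ground_truth in top_k else 0.0
    p = hit / len(top_k)
    r = hit
    f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    return p, r, f1

print(evaluate("Travel", ["Travel", "Shopping", "Sports"]))  # (0.333..., 1.0, 0.5)
```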

  23. Overall Results • The naïve context-aware baseline consistently outperforms the non-context-aware baseline. • The CRFs consistently outperform both baselines. • CRF-B-C-T > CRF-B-C > CRF-B: click information and taxonomy-based relevance are both useful.

  24. Case Study • The context is about travel, and the user clicks a travel guide Web page. • Our approach gives the most appropriate label in the first position.

  25. Efficiency of Our Approach • Offline training: • Each iteration takes about 300 ms. • The time cost of training a CRF is acceptable. • Online cost: • Calculating features • Label confidence

  26. Conclusion and Future Work • In this paper, we propose a novel approach to query classification that models query context with CRFs. • Experiments on a real search log clearly show that our approach outperforms both a non-context-aware baseline and a naïve context-aware baseline. • The current approach cannot leverage contextual information for queries at the beginning of a session, which motivates our future work on leveraging more contextual information beyond session boundaries.

  27. Thanks
