
A Language Modeling Approach for Temporal Information Needs

Klaus Berberich, Srikanta Bedathur, Omar Alonso, and Gerhard Weikum. Max-Planck Institute for Informatics, Saarbrücken, Germany.




  1. A Language Modeling Approach for Temporal Information Needs • Klaus Berberich, Srikanta Bedathur, Omar Alonso, and Gerhard Weikum • Max-Planck Institute for Informatics, Saarbrücken, Germany

  2. Motivation • Documents contain temporal information in the form of temporal expressions

  3. Motivation • Users have temporal information needs • Query: prime minister united kingdom 2000 • PROBLEM: Traditional information retrieval systems do not exploit the temporal content in documents • Inherent uncertainty: temporal expressions are more than common terms • APPROACH: Integrate the temporal dimension into a language-model-based retrieval framework

  4. Language Modeling for Information Retrieval • Language Model: A statistical model to generate text • Language Modeling: The task of estimating the statistical parameters of a language model • Language Modeling for IR: Problem of estimating the likelihood that a query and a document could have been generated by the same language model

  5. Document Model • Document d = {dtext, dtemp} • dtext: a bag of textual terms • dtemp: a bag of temporal expressions • A temporal expression is considered as the quadruple T = (tbl, tbu, tel, teu) • Lower and upper bounds for the begin time (tbl, tbu) and the end time (tel, teu) • Example: “in 1998” = (1998/01/01, 1998/12/31, 1998/01/01, 1998/12/31) • T can refer to any time interval [b, e] within these bounds, with the constraint b ≤ e
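The quadruple on this slide maps naturally onto a small record type. The following is only a minimal sketch: the class name `TemporalExpression`, the helper `day`, the method `can_refer_to`, and the choice of day granularity (dates stored as day ordinals) are assumptions made for illustration and are not taken from the slides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TemporalExpression:
    """Quadruple (tbl, tbu, tel, teu), stored as day ordinals (assumed granularity)."""
    tbl: int  # lower bound of the begin time
    tbu: int  # upper bound of the begin time
    tel: int  # lower bound of the end time
    teu: int  # upper bound of the end time

    def can_refer_to(self, b: int, e: int) -> bool:
        """True if [b, e] respects both pairs of bounds and b <= e."""
        return self.tbl <= b <= self.tbu and self.tel <= e <= self.teu and b <= e


def day(year: int, month: int, dom: int) -> int:
    """Hypothetical helper: convert a calendar date to a day ordinal."""
    return date(year, month, dom).toordinal()


# "in 1998": begin and end may each fall anywhere within that year.
in_1998 = TemporalExpression(day(1998, 1, 1), day(1998, 12, 31),
                             day(1998, 1, 1), day(1998, 12, 31))
```

With this representation, `in_1998.can_refer_to(day(1998, 3, 1), day(1998, 6, 30))` holds, while an interval whose begin lies after its end is rejected.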

  6. Query Model • Query q = {qtext, qtemp} • qtext: set of textual terms • qtemp: set of temporal expressions • Example: prime minister united kingdom 2000 • Two modes, depending on how temporal expressions from the query input are treated • Inclusive mode: qtext = {prime, minister, united, kingdom, 2000} • Exclusive mode: qtext = {prime, minister, united, kingdom}
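The two modes can be sketched as a tiny query parser. This is an illustration only: the function name `parse_query` is hypothetical, and recognizing temporal expressions with a four-digit-year pattern is a deliberate simplification, since the slides do not say how temporal expressions are detected.

```python
import re

# Assumption: only standalone four-digit years count as temporal expressions.
YEAR = re.compile(r"^(1[0-9]{3}|20[0-9]{2})$")


def parse_query(query: str, mode: str = "inclusive"):
    """Split a keyword query into (qtext, qtemp) for the given mode."""
    terms = query.lower().split()
    qtemp = [t for t in terms if YEAR.match(t)]
    if mode == "inclusive":
        # Temporal expressions also stay in the textual part.
        qtext = list(terms)
    elif mode == "exclusive":
        # Temporal expressions are removed from the textual part.
        qtext = [t for t in terms if not YEAR.match(t)]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return qtext, qtemp
```

For the example query, both modes yield the same temporal part and differ only in whether `2000` remains among the textual terms.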

  7. Query-Likelihood Approach • Idea: Assign higher relevance to a document if it contains more temporal expressions that more closely match the temporal expressions from the user's query • Assume that qtext and qtemp are produced independently • Assume that query temporal expressions occur independently
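The formulas from this slide are not preserved in the transcript. Under the two independence assumptions listed, and additionally assuming that the textual part of the query depends only on dtext and the temporal part only on dtemp, the query likelihood would factor as follows (a reconstruction, not a quotation from the slides):

```latex
P(q \mid d) = P(q_{text} \mid d_{text}) \cdot P(q_{temp} \mid d_{temp}),
\qquad
P(q_{temp} \mid d_{temp}) = \prod_{Q \in q_{temp}} P(Q \mid d_{temp})
```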

  8. Generation of temporal expressions from document d • A temporal expression T is drawn uniformly at random from the temporal expressions in the document • Zero probability problem • If one of the query temporal expressions has zero probability of being generated from the document, the probability of generating the whole query from this document is zero
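A short sketch makes the zero probability problem concrete. It assumes that drawing T uniformly at random means averaging P(Q | T) over the temporal expressions in the document; the function names are hypothetical, and the exact-match `p_q_given_t` is only a stand-in for whichever generative model is plugged in.

```python
from math import prod


def p_q_given_t(q, t):
    """Stand-in generative model (assumption): T can only generate itself."""
    return 1.0 if q == t else 0.0


def p_q_given_doc(q, dtemp, p_qt=p_q_given_t):
    """Average P(Q | T) over T drawn uniformly at random from dtemp."""
    if not dtemp:
        return 0.0
    return sum(p_qt(q, t) for t in dtemp) / len(dtemp)


def p_qtemp_given_doc(qtemp, dtemp, p_qt=p_q_given_t):
    """Product over query temporal expressions (independence assumption)."""
    return prod(p_q_given_doc(q, dtemp, p_qt) for q in qtemp)


dtemp = ["1998", "1998", "2000"]
print(p_q_given_doc("1998", dtemp))               # two of three expressions match
print(p_qtemp_given_doc(["1998", "2001"], dtemp))  # "2001" never occurs: product is 0.0
```

A single unmatched query expression drives the whole product to zero, which is what motivates smoothing.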

  9. Jelinek-Mercer smoothing • Linear interpolation of the maximum likelihood model with the collection model, using a coefficient λ • The temporal part of the document collection is treated as a single document • λ is a tunable mixing parameter
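The interpolation can be sketched as below. Two things are assumptions rather than facts from the slide: the function names, and the convention that λ weights the collection model while (1 − λ) weights the document model (the transcript does not show which side λ is attached to). The exact-match model is again only a stand-in.

```python
def exact_match(q, t):
    """Stand-in for P(Q | T) (assumption): 1 iff the expressions are identical."""
    return 1.0 if q == t else 0.0


def average_likelihood(q, expressions, p_qt):
    """Mean of P(Q | T) over a bag of temporal expressions."""
    if not expressions:
        return 0.0
    return sum(p_qt(q, t) for t in expressions) / len(expressions)


def smoothed_p_q_given_doc(q, dtemp, collection_temp, lam=0.25, p_qt=exact_match):
    """Jelinek-Mercer smoothing: mix document and collection estimates.

    Assumed convention: lam weights the collection model,
    which is treated as one large document.
    """
    p_doc = average_likelihood(q, dtemp, p_qt)
    p_coll = average_likelihood(q, collection_temp, p_qt)
    return (1.0 - lam) * p_doc + lam * p_coll


collection = ["1998", "2001", "2001", "2000"]
# "2001" is absent from the document but present in the collection,
# so the smoothed probability is no longer zero.
print(smoothed_p_q_given_doc("2001", ["1998"], collection))
```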

  10. Requirements for the Generative Model • Specificity • A query temporal expression is more likely to be generated from a temporal expression that closely matches it • Q : “from the 1960s until the 1980s” • T: “in the second half of the 20th century” • T’: “in the 20th century”

  11. Requirements for the Generative Model • Coverage • A larger overlap with the query temporal expression is preferred • Q: “in the summer of 1999” • T: “in the first half of 1999” • T’: “in the second half of 1999”

  12. Requirements for the Generative Model • Maximality • P(Q|T) should be maximal for T = Q • Probability of generating a query temporal expression from a temporal expression matching it exactly must be the highest

  13. Uncertainty-ignorant Language Model • A temporal expression can only generate itself • P(Q | T) assumes the value 1 iff T = Q • Misses the fact that a temporal expression T and a query temporal expression Q may refer to the same time interval even though T ≠ Q

  14. Uncertainty-aware Language Model • Approach assumes equal likelihood for each time interval [qb, qe] that Q can refer to • Generating a time interval [qb, qe] from temporal expression T: the indicator takes the value 1 iff T can refer to [qb, qe]

  15. Uncertainty-aware Language Model • Q and T inherently uncertain • Model assumes equal likelihood for all possible time intervals that Q and T can refer to
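One way to read the uncertainty-aware model is that P(Q | T) equals the number of time intervals both Q and T can refer to, divided by |Q| · |T|. This closed form is a reconstruction from the equal-likelihood assumptions, not a formula shown in the transcript. The brute-force sketch below uses years as integer time points (an assumed granularity) and hypothetical function names, and checks the specificity example from the requirements slides.

```python
def intervals(expr):
    """All time intervals [b, e] a quadruple (tbl, tbu, tel, teu) can refer to."""
    tbl, tbu, tel, teu = expr
    return {(b, e)
            for b in range(tbl, tbu + 1)
            for e in range(tel, teu + 1)
            if b <= e}


def p_q_given_t(q, t):
    """Assumed form: |T ∩ Q| / (|T| * |Q|), by explicit enumeration."""
    q_ints, t_ints = intervals(q), intervals(t)
    if not q_ints or not t_ints:
        return 0.0
    return len(q_ints & t_ints) / (len(q_ints) * len(t_ints))


# Specificity example, at year granularity:
Q = (1960, 1969, 1980, 1989)         # "from the 1960s until the 1980s"
T = (1950, 1999, 1950, 1999)         # "in the second half of the 20th century"
T_PRIME = (1900, 1999, 1900, 1999)   # "in the 20th century"

assert p_q_given_t(Q, T) > p_q_given_t(Q, T_PRIME)  # the more specific T wins
assert p_q_given_t(Q, Q) >= p_q_given_t(Q, T)       # consistent with maximality
```

Enumeration is fine at this scale but grows quadratically with the width of the bounds, which is why a more efficient computation is needed in practice.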

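  16. Efficient Computation • Enumerating all time intervals that T and Q can refer to before computing P(Q | T) is not practical • For a temporal expression T = (tbl, tbu, tel, teu) whose latest begin point is no later than its earliest end point, |T| can be computed in closed form • In this case every begin point b is compatible with every end point e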

  17. Otherwise, when the begin and end ranges overlap, compute |T| as a sum over begin points • For each fixed begin point b, the sum captures only the end points that are compatible with b
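The exact formulas for |T| are not preserved in the transcript. The sketch below is a reconstruction under the assumption of discrete time points: when tbu ≤ tel every begin point is compatible with every end point and the count is a simple product, otherwise the compatible end points are summed per begin point. The function names are hypothetical, and a brute-force counter is included only as a cross-check.

```python
def count_intervals(tbl, tbu, tel, teu):
    """Number of intervals [b, e] with tbl <= b <= tbu, tel <= e <= teu, b <= e."""
    if tbl > tbu or tel > teu:
        return 0  # empty bounds: nothing to refer to
    if tbu <= tel:
        # Every begin point is compatible with every end point.
        return (tbu - tbl + 1) * (teu - tel + 1)
    # Otherwise, for each begin point b, count only end points e >= max(b, tel).
    return sum(max(0, teu - max(b, tel) + 1) for b in range(tbl, tbu + 1))


def count_intervals_brute_force(tbl, tbu, tel, teu):
    """Reference implementation by explicit enumeration."""
    return sum(1
               for b in range(tbl, tbu + 1)
               for e in range(tel, teu + 1)
               if b <= e)


# "in 1998" with days numbered 1..365: begin and end ranges coincide.
print(count_intervals(1, 365, 1, 365))
```

The first branch runs in constant time and the second in time linear in the number of begin points, instead of enumerating every begin and end pair.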

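  18. Query Temporal Expression • For computing P(Q | T), only the time intervals that both Q and T can refer to matter • Their number can be computed by considering the overlap of the begin and end bounds of Q and T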
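The transcript omits the expressions on this slide. One plausible reconstruction, offered as an assumption, is that the intervals shared by T and Q are exactly those of the quadruple obtained by taking the maximum of the lower bounds and the minimum of the upper bounds. The sketch below uses hypothetical function names and month granularity (with summer taken as June to August, also an assumption) to check the coverage example from the requirements slides.

```python
def count_intervals(tbl, tbu, tel, teu):
    """Number of intervals [b, e] within the bounds with b <= e."""
    if tbl > tbu or tel > teu:
        return 0
    if tbu <= tel:
        return (tbu - tbl + 1) * (teu - tel + 1)
    return sum(max(0, teu - max(b, tel) + 1) for b in range(tbl, tbu + 1))


def intersect(t, q):
    """Quadruple describing the intervals both T and Q can refer to (assumed form)."""
    return (max(t[0], q[0]), min(t[1], q[1]), max(t[2], q[2]), min(t[3], q[3]))


def p_q_given_t(q, t):
    """Assumed form of the uncertainty-aware model: |T ∩ Q| / (|T| * |Q|)."""
    size_q, size_t = count_intervals(*q), count_intervals(*t)
    if size_q == 0 or size_t == 0:
        return 0.0
    return count_intervals(*intersect(t, q)) / (size_q * size_t)


# Coverage example, months of 1999 numbered 1..12:
Q = (6, 8, 6, 8)            # "in the summer of 1999" (June to August, assumed)
T = (1, 6, 1, 6)            # "in the first half of 1999"
T_PRIME = (7, 12, 7, 12)    # "in the second half of 1999"

# The second half overlaps more of the summer, so it is preferred.
assert p_q_given_t(Q, T_PRIME) > p_q_given_t(Q, T)
```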

  19. Experimental Evaluation • Lm( ) : Unigram language model with Jelinek-Mercer smoothing • LmT-IN( , ) : Uncertainty-ignorant method using inclusive mode • LmT-EX( , ) : Uncertainty-ignorant method using exclusive mode • LmtU-IN( , ) : Uncertainty-aware method using inclusive mode • LmtU-EX( , ) : Uncertainty-aware method using exclusive mode

  20. Queries Categorized by Topic and Temporal Granularity • New York Times Annotated Corpus

  21. Future Work • Current focus: Temporal information needs disclosed by an explicit temporal expression in the user’s query • Future: Dealing with queries that have implicit temporal intent • Query: “bill clinton arkansas” alludes to Bill Clinton’s time as Governor of Arkansas (1979 to 1981 and 1983 to 1992)

  22. Extra Slides • Input: boston july 4 2002 • Inclusive mode: qtext = {boston, july, 4, 2002} • Exclusive mode: qtext = {boston}

  23. How does Bing classify queries as temporal? • New patent: Time-Shift, how Bing changes results based on temporal events • If the frequency of searches for certain queries increases around certain dates, this can indicate a temporal nature • Increased mentions in blogs and microblogging services, and increased updates in online encyclopedias, can indicate a temporal nature • A sharp increase in the click-through rate, abandonment, or reformulation of a query may indicate that the meaning of the query has shifted temporally • To find out whether a search query is still temporal, Bing might sometimes show alternative results to a number of searchers to see how they interact with the results

  24. Query-Likelihood Approach • Ponte and Croft's model • Each document has an associated language model • The query is viewed as the outcome of a random process • Documents are ranked according to the likelihood that the query would be generated by the language model estimated for each document • Unigram language model with Jelinek-Mercer smoothing
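As a rough illustration of this ranking scheme, the sketch below scores a document by the log-likelihood of the query under a unigram model smoothed with the collection model. The function name, the toy documents, the default λ = 0.25, and the convention that λ weights the collection model are all assumptions made for the example.

```python
from collections import Counter
from math import log


def query_log_likelihood(query_terms, doc_terms, collection_terms, lam=0.25):
    """Log P(q | d) under a unigram model with Jelinek-Mercer smoothing.

    Assumed convention: lam weights the collection model.
    """
    doc_tf, coll_tf = Counter(doc_terms), Counter(collection_terms)
    doc_len, coll_len = len(doc_terms), len(collection_terms)
    total = 0.0
    for term in query_terms:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_coll = coll_tf[term] / coll_len if coll_len else 0.0
        p = (1.0 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen even in the collection
        total += log(p)
    return total


d1 = "prime minister blair".split()
d2 = "london weather report".split()
collection = d1 + d2
query = ["prime", "minister"]

# Rank documents by descending query log-likelihood.
ranking = sorted([("d1", d1), ("d2", d2)],
                 key=lambda item: query_log_likelihood(query, item[1], collection),
                 reverse=True)
print([name for name, _ in ranking])
```

Working in log space avoids numerical underflow when queries contain many terms; the ranking is unchanged because the logarithm is monotone.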
