A Language Modeling Approach for Temporal Information Needs. Klaus Berberich, Srikanta Bedathur, Omar Alonso, and Gerhard Weikum. Max-Planck Institute for Informatics, Saarbrücken, Germany.
Motivation • Documents contain temporal information in the form of temporal expressions
Motivation • Users have temporal information needs • Query: Prime Minister United Kingdom 2000 • PROBLEM: Traditional information retrieval systems do not exploit the temporal content in documents • Temporal expressions are inherently uncertain and are more than common terms • APPROACH: Integrate the temporal dimension into a language-model-based retrieval framework
Language Modeling for Information Retrieval • Language Model: A statistical model to generate text • Language Modeling: The task of estimating the statistical parameters of a language model • Language Modeling for IR: Problem of estimating the likelihood that a query and a document could have been generated by the same language model
Document Model • Document d = {dtext, dtemp} • dtext: a bag of textual terms • dtemp: a bag of temporal expressions • A temporal expression is considered as the quadruple T = (tbl, tbu, tel, teu) • Lower and upper bounds for the begin and end time • Example: “in 1998” = (1998/01/01, 1998/12/31, 1998/01/01, 1998/12/31) • T can refer to any time interval [b, e] with tbl ≤ b ≤ tbu and tel ≤ e ≤ teu, subject to the constraint b ≤ e
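The quadruple representation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `TemporalExpression` and the method `can_refer_to` are hypothetical, and day granularity via `datetime.date` is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TemporalExpression:
    """Hypothetical container for T = (tbl, tbu, tel, teu)."""
    tbl: date  # lower bound of the begin time
    tbu: date  # upper bound of the begin time
    tel: date  # lower bound of the end time
    teu: date  # upper bound of the end time

    def can_refer_to(self, b: date, e: date) -> bool:
        # [b, e] is admissible if both boundaries lie within their bounds
        # and the interval is well formed (b <= e).
        return b <= e and self.tbl <= b <= self.tbu and self.tel <= e <= self.teu


# "in 1998": begin and end may each fall anywhere within the year 1998.
in_1998 = TemporalExpression(
    date(1998, 1, 1), date(1998, 12, 31), date(1998, 1, 1), date(1998, 12, 31)
)
```

Under this encoding, `in_1998.can_refer_to(date(1998, 3, 1), date(1998, 6, 30))` holds, while an interval that starts in 1997 or that ends before it begins is rejected.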
Query Model • Query q = {qtext, qtemp} • qtext: set of textual terms • qtemp: set of temporal expressions • Example: prime minister united kingdom 2000 • Two modes, depending on how temporal expressions in the query input are treated • Inclusive mode: qtext = {prime, minister, united, kingdom, 2000} • Exclusive mode: qtext = {prime, minister, united, kingdom} • In both modes, qtemp = {2000}
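The difference between the two modes can be sketched as follows. The function `split_query` and its argument `temporal_tokens` are hypothetical: a real system would use a temporal tagger to decide which tokens belong to a temporal expression, whereas this sketch simply assumes the caller already knows.

```python
def split_query(tokens, temporal_tokens, mode):
    """Split a tokenized query into (qtext, qtemp).

    tokens          -- list of query tokens
    temporal_tokens -- set of tokens assumed to belong to temporal expressions
    mode            -- "inclusive" or "exclusive"
    """
    q_temp = [t for t in tokens if t in temporal_tokens]
    if mode == "inclusive":
        # Temporal tokens also remain part of the textual query.
        q_text = list(tokens)
    elif mode == "exclusive":
        # Temporal tokens are removed from the textual query.
        q_text = [t for t in tokens if t not in temporal_tokens]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return q_text, q_temp


query = "prime minister united kingdom 2000".split()
inclusive = split_query(query, {"2000"}, "inclusive")
exclusive = split_query(query, {"2000"}, "exclusive")
```

In both modes the temporal part is `["2000"]`; only the textual part differs.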
Query-Likelihood Approach • Idea: Assign higher relevance to a document if it contains more temporal expressions that closely match the temporal expressions in the user's query • Assume that qtext and qtemp are produced independently: P(q | d) = P(qtext | dtext) · P(qtemp | dtemp) • Assume that query temporal expressions occur independently: P(qtemp | dtemp) = ∏_{Q ∈ qtemp} P(Q | dtemp)
Generation of temporal expressions from document d • A temporal expression T is drawn uniformly at random from the temporal expressions in the document: P(Q | dtemp) = (1 / |dtemp|) · Σ_{T ∈ dtemp} P(Q | T) • Zero-probability problem: if one of the query temporal expressions has zero probability of being generated from the document, the probability of generating the whole query from this document is zero
Jelinek-Mercer smoothing • Linear interpolation of the maximum-likelihood document model with the collection model, using a coefficient λ • The temporal part of the document collection is treated as a single document • λ is a tunable mixing parameter
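The temporal query likelihood with Jelinek-Mercer smoothing can be sketched as below. All names are hypothetical, and two things are assumptions rather than facts from the slides: `lam` weights the document model (with `1 - lam` on the collection model), and temporal expressions are plain tuples. The generative model P(Q | T) is passed in as a function; an exact-match model is used here only for the demonstration.

```python
def p_q_given_bag(Q, bag, p_q_given_t):
    # T is drawn uniformly at random from the bag of temporal expressions.
    if not bag:
        return 0.0
    return sum(p_q_given_t(Q, T) for T in bag) / len(bag)


def temporal_query_likelihood(q_temp, d_temp, collection_temp, p_q_given_t, lam=0.75):
    """P(qtemp | dtemp) with Jelinek-Mercer smoothing.

    Assumption: lam weights the document model, (1 - lam) the collection model.
    """
    score = 1.0
    for Q in q_temp:  # query temporal expressions are generated independently
        p_doc = p_q_given_bag(Q, d_temp, p_q_given_t)
        p_col = p_q_given_bag(Q, collection_temp, p_q_given_t)
        score *= lam * p_doc + (1.0 - lam) * p_col
    return score


def exact_match(Q, T):
    # Demo-only generative model: T can only generate itself.
    return 1.0 if Q == T else 0.0


d1 = [(1998, 1998, 1998, 1998)]   # mentions only 1998
d2 = [(2000, 2000, 2000, 2000)]   # mentions only 2000
collection = d1 + d2              # collection treated as one large document
Q = (2000, 2000, 2000, 2000)

unsmoothed_d1 = temporal_query_likelihood([Q], d1, collection, exact_match, lam=1.0)
smoothed_d1 = temporal_query_likelihood([Q], d1, collection, exact_match, lam=0.75)
smoothed_d2 = temporal_query_likelihood([Q], d2, collection, exact_match, lam=0.75)
```

With `lam=1.0` (no smoothing) document `d1` scores zero, which is the zero-probability problem; with smoothing it receives a small non-zero score and still ranks below `d2`.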
Requirements for the Generative Model • Specificity • A query temporal expression is more likely to be generated from a temporal expression that closely matches it • Q : “from the 1960s until the 1980s” • T: “in the second half of the 20th century” • T’: “in the 20th century”
Requirements for the Generative Model • Coverage • A larger overlap with the query temporal expression is preferred • Q: “in the summer of 1999” • T: “in the first half of 1999” • T’: “in the second half of 1999”
Requirements for the Generative Model • Maximality • P(Q|T) should be maximal for T = Q • Probability of generating a query temporal expression from a temporal expression matching it exactly must be the highest
Uncertainty-ignorant Language Model • A temporal expression can only generate itself: P(Q | T) = 1 iff T = Q, and 0 otherwise • Misses the fact that a temporal expression T and a query temporal expression Q may refer to the same time interval even though T ≠ Q
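Uncertainty-aware Language Model • Approach assumes equal likelihood for each time interval [qb, qe] that Q can refer to: P(Q | T) = (1 / |Q|) · Σ_{[qb, qe] ∈ Q} P([qb, qe] | T) • Generating a time interval [qb, qe] from temporal expression T: P([qb, qe] | T) = (1 / |T|) · 1([qb, qe] ∈ T), where the indicator has value 1 iff T can refer to [qb, qe]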
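Uncertainty-aware Language Model • Q and T are inherently uncertain • Model assumes equal likelihood for all possible time intervals that Q and T can refer to, which gives P(Q | T) = |T ∩ Q| / (|T| · |Q|), where |X| is the number of time intervals X can refer to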
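A brute-force sketch of the uncertainty-aware model makes the definition concrete. It enumerates every admissible interval, so it is only suitable for small examples. The function names are hypothetical, and the encoding is an assumption: time points are integer years, and the three example expressions from the Specificity slide are mapped to year bounds as shown in the comments.

```python
def intervals(X):
    """All time intervals (b, e) that X = (bl, bu, el, eu) can refer to."""
    bl, bu, el, eu = X
    return {(b, e)
            for b in range(bl, bu + 1)
            for e in range(el, eu + 1)
            if b <= e}


def p_uncertainty_aware(Q, T):
    """P(Q | T) = |T ∩ Q| / (|T| · |Q|), computed by enumeration."""
    q_int, t_int = intervals(Q), intervals(T)
    if not q_int or not t_int:
        return 0.0
    return len(q_int & t_int) / (len(q_int) * len(t_int))


# Assumed year-granularity encodings of the examples:
Q = (1960, 1969, 1980, 1989)        # "from the 1960s until the 1980s"
T = (1950, 1999, 1950, 1999)        # "in the second half of the 20th century"
T_prime = (1900, 1999, 1900, 1999)  # "in the 20th century"

p_T = p_uncertainty_aware(Q, T)
p_T_prime = p_uncertainty_aware(Q, T_prime)
p_exact = p_uncertainty_aware(Q, Q)
```

Under these assumptions `p_exact > p_T > p_T_prime`, which is the behaviour the maximality and specificity requirements ask for.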
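Efficient Computation • Enumerating all time intervals that T and Q can refer to before computing P(Q | T) is not practical • For a temporal expression T = (tbl, tbu, tel, teu): if tbu ≤ tel, |T| can be computed as |T| = (tbu − tbl + 1) · (teu − tel + 1) • In this case every begin point b is compatible with every end point e
If tbu > tel, compute |T| as |T| = Σ_{b = tbl}^{tbu} (teu − max(b, tel) + 1) • Each term counts only the end points that are compatible with a fixed begin point b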
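Query Temporal Expression • P(Q | T) can be computed without enumerating each time interval • |T ∩ Q| can be computed by considering T ∩ Q = (max(tbl, qbl), min(tbu, qbu), max(tel, qel), min(teu, qeu)) as a temporal expression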
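The counting shortcuts above can be sketched without enumerating intervals. This is an illustrative sketch under stated assumptions: time points are integers at a fixed granularity with inclusive bounds, and the names `size`, `intersect`, and `p_q_given_t` are hypothetical. The second case still loops over begin points, but it never materializes (begin, end) pairs.

```python
def size(X):
    """Number of intervals [b, e] with b <= e that X = (bl, bu, el, eu) covers."""
    bl, bu, el, eu = X
    if bl > bu or el > eu:
        return 0  # empty bounds, e.g. from an empty intersection
    if bu <= el:
        # Every begin point is compatible with every end point.
        return (bu - bl + 1) * (eu - el + 1)
    # Otherwise count, per begin point b, the end points e >= max(b, el).
    return sum(max(0, eu - max(b, el) + 1) for b in range(bl, bu + 1))


def intersect(T, Q):
    """Quadruple describing the intervals that both T and Q can refer to."""
    tbl, tbu, tel, teu = T
    qbl, qbu, qel, qeu = Q
    return (max(tbl, qbl), min(tbu, qbu), max(tel, qel), min(teu, qeu))


def p_q_given_t(Q, T):
    """P(Q | T) = |T ∩ Q| / (|T| · |Q|)."""
    denom = size(T) * size(Q)
    return size(intersect(T, Q)) / denom if denom else 0.0


Q = (1960, 1969, 1980, 1989)   # assumed encoding of "from the 1960s until the 1980s"
T = (1950, 1999, 1950, 1999)   # assumed encoding of "second half of the 20th century"
p = p_q_given_t(Q, T)
```

For these inputs `size(Q)` is 100 and `size(T)` is 1275, so `p` equals 1/1275, the same value that explicit enumeration of all intervals would give.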
Experimental Evaluation • Lm( ): Unigram language model with Jelinek-Mercer smoothing • LmT-IN( , ): Uncertainty-ignorant method using inclusive mode • LmT-EX( , ): Uncertainty-ignorant method using exclusive mode • LmTU-IN( , ): Uncertainty-aware method using inclusive mode • LmTU-EX( , ): Uncertainty-aware method using exclusive mode
Queries Categorized by Topic and Temporal Granularity • New York Times Annotated Corpus
Future Work • Current focus: Temporal information needs disclosed by an explicit temporal expression in the user's query • Future: Dealing with queries that have implicit temporal intent • Query: “bill clinton arkansas” alludes to Bill Clinton's time as Governor of Arkansas, from 1979 to 1981 and again from 1983 to 1992
Extra Slides • Input: boston july 4 2002 • Inclusive mode: qtext = {boston, july, 4, 2002} • Exclusive mode: qtext = {boston}
How does Bing classify queries as temporal? New patent: Time-Shift: how Bing changes results based on temporal events • If searches for certain queries become more frequent around certain dates, this can indicate a temporal nature. • Increased mentions in blogs and microblogging services, and more frequent updates in online encyclopedias, can indicate a temporal nature. • A sharp increase in click-through rate, abandonment, or reformulation of a query may indicate that the meaning of the query has shifted temporally. • To find out whether a search query is still temporal, Bing might sometimes show alternative results to a number of searchers to see how they interact with the results.
Query-Likelihood Approach • Ponte and Croft's model • Each document has an associated language model • The query is viewed as the outcome of a random process • Documents are ranked according to the likelihood that the query would be generated by the language model estimated for each document • Unigram language model with Jelinek-Mercer smoothing
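The textual query-likelihood ranking described above can be sketched as follows. This is a minimal sketch, not the system evaluated in the slides: the function name is hypothetical, documents are plain token lists, and `lam` is assumed to weight the document model (with `1 - lam` on the collection model).

```python
import math
from collections import Counter


def log_query_likelihood(query, doc, collection, lam=0.5):
    """log P(query | doc) under a unigram model with Jelinek-Mercer smoothing.

    Assumption: lam weights the document model, (1 - lam) the collection model.
    """
    doc_tf, col_tf = Counter(doc), Counter(collection)
    log_p = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc) if doc else 0.0
        p_col = col_tf[term] / len(collection) if collection else 0.0
        p = lam * p_doc + (1.0 - lam) * p_col
        if p == 0.0:
            return float("-inf")  # term unseen even in the collection
        log_p += math.log(p)
    return log_p


d1 = ["prime", "minister", "blair"]
d2 = ["football", "match"]
collection = d1 + d2  # collection model built from all document tokens
query = ["prime", "minister"]

score_d1 = log_query_likelihood(query, d1, collection)
score_d2 = log_query_likelihood(query, d2, collection)
```

Document `d1` contains both query terms and therefore scores higher; `d2` contains neither, yet smoothing keeps its score finite so that it can still be ranked.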