
Beyond Bags of Words: A Markov Random Field Model for Information Retrieval






Presentation Transcript


  1. Beyond Bags of Words: A Markov Random Field Model for Information Retrieval Don Metzler

  2. Bag of Words Representation 71 the 31 garden 22 and 19 of 19 to 19 house 18 in 18 white 12 trees 12 first 11 a 11 president 8 for 7 as 7 gardens 7 rose 7 tour 6 on 6 was 6 east 6 tours 5 planting 5 he 5 is 5 grounds 5 that 5 gardener 4 history 4 text-decoration 4 john 4 kennedy 4 april 4 been 4 today 4 with 4 none 4 adams 4 spring 4 at 4 had 3 mrs 3 lawn … …

  3. Bag of Words Representation

  4. Bag of Words Models

  5. Binary Independence Model

  6. Unigram Language Models • Language modeling was first used in speech recognition to model speech generation • In IR, language models are models of text generation • Typical scenario • Estimate a language model for every document • Rank documents by the likelihood that the document generated the query • Documents are modeled as multinomial distributions over a fixed vocabulary

  7. Unigram Language Models Query: Q = (q1, q2, q3) Document: D • Ranking via Query Likelihood: P(Q | θD) = P(q1 | θD) ∙ P(q2 | θD) ∙ P(q3 | θD) • Estimation: each P(q | θD) is estimated from the frequency of q in D under the multinomial document model θD
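The query-likelihood product on slide 7 can be sketched in code. This is a minimal illustration, not the talk's implementation; the Dirichlet smoothing (with assumed parameter mu) and the toy texts are my additions, since the slide leaves the estimator unspecified:

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection, mu=2000.0):
    """Log of P(Q | theta_D) = product over query terms of P(q | theta_D).

    P(q | theta_D) is estimated from term frequencies in the document,
    Dirichlet-smoothed with collection statistics (an assumed, standard
    smoothing choice; the slide does not specify one).
    """
    doc_tf, coll_tf = Counter(doc), Counter(collection)
    score = 0.0
    for q in query:
        p_coll = coll_tf[q] / len(collection)           # background model P(q | C)
        p_q = (doc_tf[q] + mu * p_coll) / (len(doc) + mu)
        if p_q == 0.0:                                  # term unseen everywhere
            return float("-inf")
        score += math.log(p_q)
    return score
```

Documents are then ranked by this score; smoothing keeps a document from scoring zero merely because one query term is absent.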

  8. Bag of Words Models • Pros • Simple model • Easy to implement • Decent results • Cons • Too simple • Unrealistic assumptions • Inability to explicitly model term dependencies

  9. Beyond Bags of Words

  10. Tree Dependence Model • Method of approximating a complex joint distribution • Compute EMIM between every pair of terms • Build maximum spanning tree • Tree encodes first order dependencies (figure: maximum spanning tree over terms A, B, C, D, E)
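The EMIM-plus-maximum-spanning-tree construction above can be sketched as follows. The document-level co-occurrence estimator and the toy collection are assumptions, since the slide does not fix either:

```python
import math
from itertools import combinations

def emim(docs, a, b):
    """Expected mutual information between terms a and b over binary
    document-level occurrence (a simple EMIM variant; the talk does not
    pin down the exact estimator)."""
    n = len(docs)
    mi = 0.0
    for xa in (True, False):
        for xb in (True, False):
            n_ab = sum(1 for d in docs if ((a in d) == xa) and ((b in d) == xb))
            n_a = sum(1 for d in docs if (a in d) == xa)
            n_b = sum(1 for d in docs if (b in d) == xb)
            if n_ab:
                mi += (n_ab / n) * math.log((n_ab * n) / (n_a * n_b))
    return mi

def max_spanning_tree(vocab, weight):
    """Prim's algorithm, always taking the heaviest crossing edge,
    yields the MAXIMUM spanning tree."""
    in_tree, edges = {vocab[0]}, []
    while len(in_tree) < len(vocab):
        u, v = max(((u, v) for u in in_tree for v in vocab if v not in in_tree),
                   key=lambda e: weight[frozenset(e)])
        in_tree.add(v)
        edges.append((u, v))
    return edges

docs = [set("white house garden".split()),
        set("white house tour".split()),
        set("rose garden tour".split())]
vocab = ["white", "house", "garden", "tour"]
w = {frozenset(p): emim(docs, *p) for p in combinations(vocab, 2)}
tree = max_spanning_tree(vocab, w)
```

The resulting tree keeps only the strongest pairwise dependencies, which is exactly the first-order approximation the slide describes.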

  11. n-Gram Language Models

  12. n-Gram Language Models Query: Q = (q1, q2, q3) Document: D • Ranking via Query Likelihood: P(Q | θD) = P(q1 | θD) ∙ P(q2 | q1, θD) ∙ P(q3 | q2, θD) • Estimation: conditional probabilities are estimated from n-gram frequencies in D
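The bigram chain on slide 12 might look like this in code; the linear interpolation with the unigram model and its weight lam are assumed smoothing choices, not from the slide:

```python
import math
from collections import Counter

def bigram_query_likelihood(query, doc, lam=0.5):
    """Chain-rule score P(q1)*P(q2|q1)*... under a bigram document model,
    linearly interpolated with the unigram model for smoothing (the
    interpolation weight lam is an assumption)."""
    unigrams = Counter(doc)
    bigrams = Counter(zip(doc, doc[1:]))
    score = math.log(max(unigrams[query[0]] / len(doc), 1e-12))
    for prev, cur in zip(query, query[1:]):
        p_uni = unigrams[cur] / len(doc)
        p_bi = bigrams[(prev, cur)] / unigrams[prev] if unigrams[prev] else 0.0
        score += math.log(max(lam * p_bi + (1 - lam) * p_uni, 1e-12))
    return score
```

Unlike the unigram model, this rewards documents where the query terms appear adjacently and in order.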

  13. Dependency Models • Pros • More realistic assumptions • Improved effectiveness • Cons • Less efficient • Limited notion of dependence • Not well understood

  14. State of the Art IR

  15. Desiderata • Our desired model should be able to… • Support standard IR tasks (ranking, query expansion, etc.) • Easily model dependencies between terms • Handle textual and non-textual features • Consistently and significantly improve effectiveness over bag of words models and existing dependence models • Proposed solution: Markov random fields

  16. Markov Random Fields • MRFs provide a general, robust way of modeling a joint distribution • The anatomy of an MRF • Graph G • vertices represent random variables • edges encode dependence semantics • Potentials over the cliques of G • Non-negative functions over clique configurations • Measure ‘compatibility’

  17. Markov Random Fields for IR

  18. Modeling Dependencies

  19. Parameter Tying • In theory, a potential function can be associated with every clique in a graph • Typical solution is to define potentials over maximal cliques of G • Need more fine-grained control over our potentials • Use clique sets • Set of cliques that share a parameter and potential function • We identified 7 clique sets that are relevant to IR tasks

  20. Clique Sets • Single term document/query cliques: TD = cliques w/ one query term + D: ψ(domestic, D), ψ(adoption, D), ψ(laws, D) • Ordered terms document/query cliques: OD = cliques w/ two or more contiguous query terms + D: ψ(domestic, adoption, D), ψ(adoption, laws, D), ψ(domestic, adoption, laws, D) • Unordered terms document/query cliques: UD = cliques w/ two or more query terms (in any order) + D: ψ(domestic, adoption, D), ψ(adoption, laws, D), ψ(domestic, laws, D), ψ(domestic, adoption, laws, D)

  21. Clique Sets • Single term query cliques: TQ = cliques w/ one query term: ψ(domestic), ψ(adoption), ψ(laws) • Ordered terms query cliques: OQ = cliques w/ two or more contiguous query terms: ψ(domestic, adoption), ψ(adoption, laws), ψ(domestic, adoption, laws) • Unordered terms query cliques: UQ = cliques w/ two or more query terms (in any order): ψ(domestic, adoption), ψ(adoption, laws), ψ(domestic, laws), ψ(domestic, adoption, laws) • Document clique: D = singleton clique w/ D: ψ(D)
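The kinds of document evidence behind the ordered and unordered clique sets can be sketched as simple match counters. The window size and the toy document are assumptions; single-term (T) evidence is just an ordinary term count:

```python
def ordered_matches(doc, phrase):
    """Count exact-phrase (contiguous, in-order) occurrences, the kind of
    evidence the ordered clique sets (O) capture."""
    n = len(phrase)
    return sum(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))

def unordered_matches(doc, terms, window=8):
    """Count windows of `window` terms containing all of `terms` in any
    order, the evidence the unordered clique sets (U) capture.  The
    window size is an assumed parameter."""
    return sum(all(t in doc[i:i + window] for t in terms)
               for i in range(max(len(doc) - window, 0) + 1))
```

A potential over a clique like (domestic, adoption, D) would then be a function of such counts for the clique's terms in D.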

  22. Features

  23. Parameter Estimation • Given a set of relevance judgments R, we want the maximum a posteriori estimate: • What is P( Λ | R )? P( R | Λ ) and P( Λ )? • Depends on how model is being evaluated! • Want P( Λ | R ) to be peaked around the parameter setting that maximizes the metric we are interested in
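One simple way to make the estimate peak at the metric-maximizing setting is to search the parameter space against the evaluation metric directly. This coordinate-wise grid search is a sketch under that idea, not the talk's actual estimation procedure:

```python
def coordinate_ascent(score_metric, dims, steps=(0.0, 0.25, 0.5, 0.75, 1.0), passes=3):
    """Directly maximize an evaluation metric over the parameter vector
    Lambda, one coordinate at a time over a fixed grid (assumed grid and
    pass count)."""
    lam = [1.0 / dims] * dims
    for _ in range(passes):
        for i in range(dims):
            lam[i] = max(steps,
                         key=lambda v: score_metric(lam[:i] + [v] + lam[i + 1:]))
    return lam
```

With a real system, score_metric would run retrieval under the given weights and return, say, mean average precision over the judged queries.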

  24. Parameter Estimation

  25. Evaluation Metric Surfaces

  26. Feature Induction

  27. Query Expansion

  28. Query Expansion

  29. Query Expansion Example Original query: hubble telescope achievements

  30. Query Expansion Example Original query: hubble telescope achievements

  31. Applications

  32. Application: Ad Hoc Retrieval

  33. Ad Hoc Results

  34. Application: Web Search

  35. Application: XML Retrieval

  36. Example XML Document <PLAY> <TITLE>The Tragedy of Romeo and Juliet</TITLE> <ACT><TITLE>ACT I</TITLE> <PROLOGUE><TITLE>PROLOGUE</TITLE> <SPEECH> <SPEAKER>NARRATOR</SPEAKER> <LINE>Two households, both alike in dignity,</LINE> <LINE>In fair Verona, where we lay our scene,</LINE> <LINE>From ancient grudge break to new mutiny,</LINE> <LINE>Where civil blood makes civil hands unclean.</LINE> <LINE>From forth the fatal loins of these two foes</LINE> <LINE>A pair of star-cross’d lovers take their life;</LINE>… </SPEECH> </PROLOGUE> <SCENE><TITLE>SCENE I. Verona. A public place.</TITLE> <SPEECH> <SPEAKER>SAMPSON</SPEAKER> <LINE>Gregory, o’ my word, we’ll not carry coals.</LINE> … </PLAY>

  37. Content and Structure Queries • NEXI query language • derived from XPath • allows mixture of content and structure • //scene[about(., poison juliet)] : Return scene tags that are about poison and juliet. • //*[about(., plague both houses)] : Return elements (of any type) about plague both houses. • //scene[about(., juliet chamber)]//speech[about(.//line, jewel)] : Return speech tags about jewel where the scene is about juliet chamber.

  38. Application: Text Classification

  39. Conclusions
