Information extraction from Queries

This presentation covers information extraction from search queries using probabilistic query modelling, with expectation propagation (EP) message passing for inference within a single query model and an assumed-density filtering (ADF) single pass through the queries. The data are 10 months of Live Search query logs containing 100 million unique queries with associated counts; preliminary experiments use small subsets, e.g. 50,000 unique queries related to actors, cars, and national parks. Proposed future improvements include class- and attribute-dependent templates, a garbage class to absorb noise, disambiguation and synonym handling, a part-of-speech tagger, and combination with standard hand-crafted entity extraction techniques.


Presentation Transcript


  1. Information extraction from Queries
     Ed Snelson, Joaquin Quiñonero Candela, Ralf Herbrich, Thore Graepel

  2. Information extraction from queries

  3. Templates

  4. Probabilistic query modelling

  5. Key details
     • EP message passing for inference within single query model
     • ADF single pass through queries
     • Sparse messages within query
     • Bootstrap from initial seed sets of instances/attributes
     • Directed processing of queries based on current top beliefs
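
The bootstrapping idea in the key details (start from seed instances, learn templates, use the templates to find new instances) can be illustrated with a minimal sketch. This is not the authors' method: it replaces the probabilistic model and EP/ADF inference with plain count-based ranking, and every name here (`SLOT`, `extract_templates`, `match_instances`, `bootstrap`) and the toy query counts are assumptions made up for illustration.

```python
from collections import Counter

SLOT = "[X]"  # hypothetical placeholder marking where an instance appeared


def extract_templates(query_counts, instances):
    """Count templates obtained by blanking out a known instance in each query."""
    templates = Counter()
    for query, count in query_counts.items():
        padded = f" {query} "  # pad so that only whole words can match
        for inst in instances:
            needle = f" {inst} "
            if needle in padded:
                template = padded.replace(needle, f" {SLOT} ", 1).strip()
                if template != SLOT:  # skip queries that are just the instance
                    templates[template] += count
    return templates


def match_instances(query_counts, templates, known):
    """Count candidate instances that fill the slot of a given template."""
    candidates = Counter()
    for query, count in query_counts.items():
        for template in templates:
            prefix, _, suffix = template.partition(SLOT)
            fits = (
                len(query) > len(prefix) + len(suffix)
                and query.startswith(prefix)
                and query.endswith(suffix)
            )
            if fits:
                filler = query[len(prefix):len(query) - len(suffix)].strip()
                if filler and filler not in known:
                    candidates[filler] += count
    return candidates


def bootstrap(query_counts, seeds, rounds=2, top_k=2):
    """Alternate between learning templates and harvesting new instances."""
    instances = set(seeds)
    for _ in range(rounds):
        templates = extract_templates(query_counts, instances)
        top_templates = [t for t, _ in templates.most_common(top_k)]
        candidates = match_instances(query_counts, top_templates, instances)
        instances.update(c for c, _ in candidates.most_common(top_k))
    return instances


# Toy data (invented): unique queries with associated counts.
toy_query_counts = {
    "tom hanks movies": 120,
    "tom hanks age": 80,
    "brad pitt movies": 95,
    "brad pitt age": 60,
    "julia roberts age": 40,
    "cheap flights": 300,
}

print(sorted(bootstrap(toy_query_counts, {"tom hanks"})))
# → ['brad pitt', 'julia roberts', 'tom hanks']
```

Ranking by raw counts is the simplest stand-in for the "current top beliefs" mentioned in the slide; a probabilistic model would instead carry uncertainty about each template and instance, which a hard `top_k` cut-off does not capture.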

  6. Data
     • 10 months, Live Search query logs
     • 100 million unique queries, with associated counts
     • Preliminary experiments on small specific subsets, e.g. 50,000 unique queries related to actors, cars and national parks

  7. Seed lists

  8. Actors

  9. Cars

  10. National Parks

  11. Templates

  12. Future improvements
     • Class/attribute-dependent templates
     • A garbage class to deal with “noise”
     • Reducing sensitivity to order of processing initial queries
     • Disambiguation, synonyms etc.
     • Use of part-of-speech tagger
     • Combination with standard hand-crafted entity extraction techniques
