Challenges in Web Search

Amit Singhal

Web Search
• Crawl, Index, Search
• Crawl and Index
  • freshness
  • coverage (page selection, deep web)
• Search
  • adversarial IR, trust
  • evaluation
  • partitioning the query space

Crawl and Index
• Freshness
  • pages are deleted, created, and changed
  • How to keep the index fresh?
• Coverage
  • which 2.5B pages to index?
  • a lot of useful information is stored in databases
  • How to index “hidden” content?
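The freshness question can be framed as scheduling: with a fixed crawl budget, which pages should be re-fetched first? The sketch below shows one common approach, not necessarily the one the talk had in mind. It assumes each page changes according to a Poisson process with a per-page rate estimated from past crawls (the `Page` class and its `change_rate` field are hypothetical), and it re-crawls the pages most likely to have changed since their last fetch. Newly created and deleted pages are out of scope here; they need separate discovery and removal logic.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    change_rate: float   # hypothetical: estimated changes per day, learned from past crawls
    last_crawled: float  # time of the last fetch, in days


def staleness_probability(page: Page, now: float) -> float:
    """P(page changed since its last crawl), assuming Poisson-distributed changes."""
    elapsed = max(0.0, now - page.last_crawled)
    return 1.0 - math.exp(-page.change_rate * elapsed)


def pick_recrawl_batch(pages: list[Page], now: float, budget: int) -> list[str]:
    """Return URLs of the `budget` pages most likely to be stale, most stale first."""
    most_stale = heapq.nlargest(budget, pages, key=lambda p: staleness_probability(p, now))
    return [p.url for p in most_stale]
```

With `pages = [Page("a", 1.0, 9.0), Page("b", 0.01, 0.0), Page("c", 0.5, 5.0)]` and `now=10.0`, a budget of 2 selects `["c", "a"]`: page `b` has waited the longest, but it changes so rarely that it is probably still fresh.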

Search
• Adversarial IR
  • all useful signals are spammed

Search
• Trust
  • How much can we trust a site?
  • an article hosted at BBC is much more trustworthy than the same article hosted at yet-another-news-company.com
  • How trustworthy is a site, and how to use this information in ranking?
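One simple way to use site trust in ranking is to treat it as a query-independent prior and blend it with the query-dependent relevance score. The sketch below is an illustration under assumptions, not a known production method: the `site_trust` table, the linear blend, and the `weight` and `default_trust` values are all hypothetical.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Lower-cased host name of a URL, or "" if it has none."""
    return urlsplit(url).hostname or ""


def rank_with_trust(
    results: list[tuple[str, float]],
    site_trust: dict[str, float],
    weight: float = 0.3,         # hypothetical blend weight
    default_trust: float = 0.5,  # hypothetical prior for unknown hosts
) -> list[tuple[str, float]]:
    """Re-rank (url, relevance) pairs by blending relevance with host trust.

    Relevance and trust are assumed to lie in [0, 1]; the final score is
    (1 - weight) * relevance + weight * trust.
    """
    def blended(item: tuple[str, float]) -> float:
        url, relevance = item
        trust = site_trust.get(host_of(url), default_trust)
        return (1 - weight) * relevance + weight * trust

    return sorted(results, key=blended, reverse=True)
```

If the same article appears at `https://www.bbc.co.uk/news/story` and `https://yet-another-news-company.com/story`, both with relevance 0.8, and the hypothetical trust table gives the hosts 0.9 and 0.2, the blended scores are about 0.83 and 0.62, so the BBC copy ranks first. A multiplicative prior would work too; the linear form just keeps the two signals easy to inspect.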

Search
• Evaluation
  • the collection changes continuously
    • relevant pages become non-relevant, and vice versa
    • can’t easily freeze a copy
      • relevance is a function of rendering
      • need all images, all redirects, CSS, …
    • linkage characteristics change over time
  • query space is huge (over 150M queries/day)
    • most popular query: 0.037% of all queries; 10th most popular: 0.011%
    • need a very large query set, which is expensive
  • How to evaluate given a changing collection and a very large query space?
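The query-distribution figures can be turned into counts. Taking 150M as the daily volume, the most popular query accounts for about 55,500 queries/day (0.037%) and the 10th for about 16,500 (0.011%), so the ten most popular queries together make up at most 0.37% of traffic. An evaluation set that reflects real usage therefore has to be drawn from the long tail. The sketch below assumes a hypothetical `query_log` mapping each query to its frequency and samples from it in proportion to traffic; it is one possible sampling scheme, not the method described in the talk.

```python
import random

DAILY_QUERIES = 150_000_000  # the slide's "over 150M/day", used here as a lower bound


def daily_count(share_percent: float) -> int:
    """Approximate queries per day for a query holding `share_percent`% of traffic."""
    return round(DAILY_QUERIES * share_percent / 100)


def sample_eval_queries(query_log: dict[str, int], k: int, seed: int = 0) -> list[str]:
    """Draw k queries (with replacement), each with probability proportional to its frequency."""
    rng = random.Random(seed)
    queries = list(query_log)
    weights = [query_log[q] for q in queries]
    return rng.choices(queries, weights=weights, k=k)
```

Fixing the seed keeps the sample reproducible, which matters when two rankers must be compared on the same queries.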

Search
• Ranking in a huge query space
  • specific methods work well for specific query types
    • e.g., strong proximity helps for people’s names
  • identify the query type and use type-specific ranking algorithms
  • How to partition the query space into meaningful and useful partitions?
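The "identify the query type, then rank with a type-specific method" idea can be sketched for the people-name example. Here proximity is measured as the shortest token window containing every query term, which is one common definition chosen as an assumption. The first-name lexicon `KNOWN_FIRST_NAMES`, the toy `query_type` rule, and the 0.8/0.2 weights are hypothetical placeholders for a real query classifier and tuned parameters; document tokens are assumed to be lower-cased.

```python
from collections import Counter
from typing import Optional

KNOWN_FIRST_NAMES = {"amit", "arthur"}  # hypothetical stand-in for a real name lexicon


def min_window_span(doc_tokens: list[str], terms: set[str]) -> Optional[int]:
    """Length of the shortest token window containing every term, or None if absent."""
    if not terms:
        return None
    counts: Counter = Counter()
    covered, left = 0, 0
    best: Optional[int] = None
    for right, tok in enumerate(doc_tokens):
        if tok in terms:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink from the left while the window still contains every term.
        while covered == len(terms):
            span = right - left + 1
            if best is None or span < best:
                best = span
            dropped = doc_tokens[left]
            if dropped in terms:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    covered -= 1
            left += 1
    return best


def query_type(query: str) -> str:
    """Toy classifier: 2-3 tokens starting with a known first name => person name."""
    tokens = query.lower().split()
    if len(tokens) in (2, 3) and tokens[0] in KNOWN_FIRST_NAMES:
        return "person_name"
    return "general"


def type_aware_score(query: str, doc_tokens: list[str], base_score: float) -> float:
    """Blend a base relevance score with proximity, weighting proximity by query type."""
    terms = set(query.lower().split())
    span = min_window_span(doc_tokens, terms)
    proximity = 0.0 if span is None else len(terms) / span  # 1.0 when terms are adjacent
    weight = 0.8 if query_type(query) == "person_name" else 0.2  # hypothetical weights
    return (1 - weight) * base_score + weight * proximity
```

For the query "Amit Singhal", a document containing "amit singhal" side by side gets proximity 1.0, while one where the two words are eight tokens apart gets 0.25; the person-name weight makes that gap dominate the final score, whereas a general query would lean mostly on `base_score`.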

Web Search
• How to keep the index fresh?
• How to index “hidden” content?
• How trustworthy is a site, and how to use this information in ranking?
• How to evaluate given a changing collection and a very large query space?
• How to partition the query space into meaningful and useful partitions?

“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” Sir Arthur Conan Doyle (1859-1930)
