Toward Next Generation Search: Business, Product, Science, Infrastructure, and Talent

William I. Chang, Chief Scientist, Baidu.com (wchang@baidu.com)

Presentation Transcript


  1. Toward Next Generation Search: Business, Product, Science, Infrastructure, and Talent
     William I. Chang, Chief Scientist, Baidu.com
     wchang@baidu.com

  2. Outline
     • Synopsis of Web search evolution
     • Themes and principles
     • Challenges and opportunities
     • Possible steps toward the next generation of search

  3. History: First Generation (~1996-2000)
     Three Laws of Search (Infoseek, ~1996-8):
     1. Phrases are more basic than single words
     2. Confidence in the source is more important than the content
     3. “Facet” is more powerful than the relational model
     Fundamental Theorem of Search: the search space can be factored, where each dimension is a taxonomy
     • NLP
       • Billions of terms, proper names
       • Lexical analysis and special cases: capitalization, contractions, acronyms, possessives, etc.
       • Word stemming, phrase stemming
       • Phrase extraction and query rewriting, e.g. home run record
     • Leveraging user input and community recommendation
       • Query suggestion by log mining
       • Selection and ranking using link analysis and anchor-text indexing
     • Birth of Adversarial Information Retrieval (anti-spam)

  4. History: Second Generation (~2001-present)
     “The Internet is a place where one can always find someone to help answer any question or get anything done.”
     • Web Oracle model (proposed in 1998)
     • Online communities: BBS, mailing lists, eGroups, Usenet News…
     • FAQ documents on the Internet
     • FAQ Finder & Builder as a community killer app
     • Intelligent search
     • User-generated content: blogs, MySpace, YouTube…
     • Tagging
     • Communities around knowledge: Wikipedia, Baidupedia…
     • Question-answering communities: Naver, Yahoo! Answers, Sina iAsk, Baidu iKnow…
     • People search: LinkedIn, Facebook…

  5. History: Third Generation
     “The Internet is a matching network.”
     • Personalized search results
       • Based on locale, personal profile, search and browse history
       • Personal ranking function, source selection, keyword filtering
       • Personal search agent: spider, summarizer, Q&A agent
     • Integration of search and recommendation (pull and push)
       • Subscription through automatic personalization
       • Content, media, events, products and services…
       • Matching things with people, etc.
       • Shopping assistant, information integrator
     • Predictive recommendation with feedback
       • Always-on and environment-aware
       • Do you like this? Make it more (or less) custom, please.
       • Is your taste like mine? How to evaluate the evaluator?

  6. History: Summary
     • First Generation
       • Searching for information or content using NLP techniques
       • Based on community recommendation of content or keywords
       • Little or no personalization
     • Second Generation
       • Aimed at resolving problems or finding people, entertainment
       • Centered around community-created content
       • Group customization
     • Third Generation
       • More integrated into people’s daily lives and needs
       • Predictive, locale- and environment-aware
       • True personalization

  7. Principles
     • Phrases are the conceptual units
       • Accurate name extraction and matching
       • Query rewriting & suggestion, “no quotes”, typos are OK
       • Understanding user needs, semantic match, machine translation
     • Confidence in the source
       • Leverage community recommendation to filter content
       • Tagging, blogging, SMS forwarding
       • Community-created content is more interesting
     • People helping people
       • Answer any question or get anything done
     • The Internet is more and more a part of people’s everyday lives
       • Ubiquitous, always on, environment-aware
       • Universal messaging/delivery of content, better integration
       • On-the-spot advice, e.g. personalized shopping

  8. Challenges for Search
     • The search ranking function is an incredibly complex, possibly non-decomposable multi-objective optimization problem:
       • Recall-precision tradeoff
       • Weighting of multiple terms
       • Textual quality and specificity, information richness
       • Unique and original content
       • Popularity vs authority
       • Freshness, timeliness
       • User needs, domain-specific queries
     • A search engine is a database of massive size that needs to be continually refreshed and updated in near real time, with high QoS requirements (response time, uptime) and throughput/efficiency requirements (search itself is free).
     • Search has to be built around user behavior analysis at massive scale in order to respond to the constantly changing WWW environment. This has to be automated, self-adaptive, and near-real-time.
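The recall-precision tradeoff named on this slide can be made concrete with a small calculation. The sketch below is illustrative only: the function name, the ranked list, and the relevance judgments are made-up assumptions, not data from the talk. It shows that returning more results tends to raise recall while lowering precision.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall of the top-k results (hypothetical helper)."""
    top = ranked[:k]
    hits = sum(1 for doc in top if doc in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data (assumed for illustration): five ranked results, three of them relevant.
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d5"}

for k in (1, 3, 5):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

With this toy ranking, the shortest result list is the most precise and the longest one has the highest recall, which is the tension a ranking function has to balance.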

  9. Challenges for a Search Service
     • (China) Each year, data size doubles and user base doubles (2x2=4), placing financial strains on service providers. Data centers and electricity are scarce resources.
     • Many distributed systems in operation, but they need to be flexible and reconfigurable, without sacrificing efficiency (much).
     • How to beneficially direct traffic between search and other services? What types of advertising will users accept? How to be context-sensitive and user-sensitive?
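The "2x2=4" figure compounds quickly. As a rough sketch, assuming serving load scales with data size times user base (the reading the slide's arithmetic suggests) and that both keep doubling every year, the relative load can be projected as follows. The function name and the multi-year horizon are assumptions for illustration.

```python
def relative_load(years, data_growth=2.0, user_growth=2.0):
    """Load relative to today, assuming load is proportional to data size x user base."""
    return (data_growth * user_growth) ** years


for year in range(4):
    print(f"year {year}: {relative_load(year):.0f}x today's load")
```

Under these assumptions the load is 4x after one year and 64x after three, which is why efficiency and reconfigurability matter so much in the second bullet.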

  10. Challenges for Society and the Internet
     • The WWW as a social network has become balkanized. We need new “people” search engines that let people find and help other people, yet protect privacy and reputation.
     • (China) The emergence of a nascent commerce infrastructure poses huge challenges. Commerce platforms need to support safe transactions, advertising and brand marketing, and need to seamlessly integrate online and offline services.
     • Education, government, media, and the Internet as agents for social engineering?

  11. Toward Next Generation Search: Business
     • Transparency of advertising effectiveness vs secrecy of matching algorithm
     • Ad targeting and audience segmentation
     • Convergence of different forms of online advertising: search, display, contextual, behavioral
     • Convergence of online and traditional advertising: brand marketing, local advertising (classifieds, yellow pages), direct marketing
     • Integration of online and offline services
     • Ubiquitous, mobile applications

  12. Toward Next Generation Search: Product
     • Ease of use: AND vs OR, “soft” AND, synonyms and concept search
     • Query term suggestion (does it hurt?)
     • Community Q&A: mining FAQs, routing to experts, “Wiki-Answers”
     • Factoid extraction
     • Open platform to accommodate topic/user/task-specific search engines
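The slide does not define “soft” AND. One common reading, assumed here, is a match rule that sits between strict AND and OR: a document qualifies if it contains at least a given fraction of the query terms, and documents matching more terms rank higher. The function name, the threshold parameter, and the toy documents below are all hypothetical.

```python
def soft_and(query_terms, docs, min_fraction=0.6):
    """Return (doc_id, matched_term_count) pairs, best match first.

    min_fraction=1.0 behaves like strict AND; a value just above 0 behaves like OR.
    """
    terms = {t.lower() for t in query_terms}
    results = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        matched = len(terms & words)
        if terms and matched / len(terms) >= min_fraction:
            results.append((doc_id, matched))
    # Sort by number of matched terms (descending), then by doc id for stability.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


# Toy collection (assumed for illustration).
docs = {
    "a": "home run record history",
    "b": "home run derby",
    "c": "world record sprint",
    "d": "garden tips",
}
query = ["home", "run", "record"]

print(soft_and(query, docs, min_fraction=1.0))  # strict AND
print(soft_and(query, docs, min_fraction=0.6))  # soft AND
print(soft_and(query, docs, min_fraction=0.3))  # close to OR
```

The appeal for ease of use is that users do not have to choose an operator: the strict-AND results still come first, and partial matches fill in behind them.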

  13. Toward Next Generation Search: Science
     • Relevance vs user satisfaction
     • Session behavior and modeling: term additions
     • Result diversity; avoidance of “abandonment”
     • How to evaluate the efficiency of incremental information discovery by a search engine
     • TF*IDF revisited
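For reference, the baseline that “TF*IDF revisited” refers to can be sketched in a few lines. This is the textbook weighting (raw term frequency times log of inverse document frequency), not whatever revision the speaker had in mind; the function name, whitespace tokenization, and toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {doc_id: {term: tf * log(N / df)}} using raw counts and natural log."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))

    scores = {}
    for doc_id, words in tokenized.items():
        tf = Counter(words)
        scores[doc_id] = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    return scores


# Toy collection (assumed for illustration).
corpus = {"a": "search engine search", "b": "search ranking"}
weights = tf_idf(corpus)
print(weights)
```

A term that appears in every document, such as "search" here, gets weight zero no matter how often it occurs, while terms unique to one document are weighted up.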

  14. Toward Next Generation Search: Infrastructure
     • Flash-memory SSD: fast read, slow write
     • Data analysis platform
     • Development platform
     • Internal and external use of P2P technologies
     • Search engine as a platform

  15. Toward Next Generation Search: Talent
     • Recruitment
     • Talent development

  16. In Conclusion
     Increasingly leverage the collective intelligence of users and communities, in a manner that is self-adaptive, scalable, and (near) real-time, in order to support ubiquitous, integrated online and offline services.

  17. Thank you
     wchang@baidu.com
