Smarter Search Engines

Presentation Transcript


  1. Smarter Search Engines Using Personalization to Improve Search Results Eugene Cushman Dan Murphy George Stuart Advised by Professor Mark Claypool

  2. The Problem • There are billions of web pages on the Internet • They vary greatly in quality • Growth is Exponential • Search engines must adapt to keep up

  3. Existing Systems • Google • Layered Architecture • PageRank™ • GroupLens • Applied to USENET • Different domain space • Uses collaborative filtering

  4. Personalization • “Qualitative” rankings • Example: “Good Low-Fat Dessert Recipes” • Example: “Theories of dinosaur extinction” • Contrast with specific, factual searches • Example: “The batting lineup for the Boston Red Sox on October 28, 1986” • Exploratory versus “narrow-band” searches

  5. Collaborative Filtering • Uses aggregate data to predict user preference • User A likes Foo • User B trusts User A’s preference • User B can be predicted to prefer Foo • (extremely simplified) • Algorithms • Pearson Correlation Coefficient
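The prediction step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Foible's actual code: the function names, the sample ratings, and the mean-centered weighting used in `predict` are all assumptions, chosen because that weighting is a common neighborhood-based formulation.

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items that both users have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # not enough overlap to say anything
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a user who rates everything the same carries no signal
    return cov / sqrt(var_x * var_y)

def predict(target, others, item):
    """Predict target's rating of item from correlated users (assumed scheme)."""
    target_mean = sum(target.values()) / len(target)
    num = den = 0.0
    for other in others:
        if item not in other:
            continue
        weight = pearson(target, other)
        other_mean = sum(other.values()) / len(other)
        num += weight * (other[item] - other_mean)
        den += abs(weight)
    return target_mean if den == 0 else target_mean + num / den

# Hypothetical ratings on a 1-5 scale: User A has rated "foo", User B has not.
user_a = {"page1": 5, "page2": 3, "page3": 4, "foo": 5}
user_b = {"page1": 4, "page2": 2, "page3": 3}

similarity = pearson(user_b, user_a)
predicted = predict(user_b, [user_a], "foo")
print(similarity, predicted)
```

Here User B's past ratings track User A's closely, so User A's above-average rating of "foo" pulls User B's predicted rating above User B's own average.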

  6. Foible: the best of both worlds • Foible integrates disparate technologies to provide a powerful web-searching experience • Search Engine Indexing • Collaborative Filtering • Results in demonstrable improvement in search results

  7. Foible Architecture • Spider • Analyzer • Cache • Collaborative Engine • Search Engine • Web Interface

  8. Web Spider • Parallelized depth-first crawl of the web • Creates lists of nodes by parsing HTML, looking for links • Starts with a link-heavy “seed node” • Custom seed node incorporating search results on “dinosaurs” from Yahoo, Google, and others • Foible Statistics • Over 27,000 web pages crawled • In excess of 500 MB of web data cached • Total database size of 1 GB • 7.269 million rows in Word Frequency table
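A depth-first crawl from a seed node can be sketched as follows. This is a single-threaded illustration under stated assumptions, not the parallelized Foible spider: `LinkParser`, `crawl`, and the `FAKE_WEB` dictionary are hypothetical, and the `fetch` callable stands in for real HTTP access so the example runs without a network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collects absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def crawl(seed, fetch, max_pages=100):
    """Depth-first crawl from seed; fetch(url) returns HTML text or None."""
    visited = []
    seen = {seed}
    stack = [seed]
    while stack and len(visited) < max_pages:
        url = stack.pop()
        html = fetch(url)
        if html is None:
            continue  # unreachable page, skip it
        visited.append(url)
        parser = LinkParser(url)
        parser.feed(html)
        # Push in reverse so the first link on the page is explored first.
        for link in reversed(parser.links):
            if link not in seen:
                seen.add(link)
                stack.append(link)
    return visited

# Hypothetical in-memory "web" standing in for real pages.
FAKE_WEB = {
    "http://seed.example/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://seed.example/a": '<a href="/c">C</a>',
    "http://seed.example/b": '<a href="/a">A again</a>',
    "http://seed.example/c": "<p>No links here</p>",
}

order = crawl("http://seed.example/", FAKE_WEB.get)
print(order)
```

The explicit stack is what makes the traversal depth-first: page `/c`, found via `/a`, is visited before the sibling page `/b`.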

  9. Analyzer • Parses HTML to derive descriptive attributes of each web page • Document Size, Number of Sentences • Reading Level (Fog, Flesch-Kincaid) • Number of Images • Content-to-HTML ratio • Number of Links • Precomputes word-frequency tables
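The attributes listed on this slide could be computed roughly as below. This is a sketch, not the Foible analyzer: `PageStats`, `analyze`, and the sample page are hypothetical, the syllable counter is a crude vowel-group heuristic, and the reading-level lines use the standard Flesch-Kincaid grade and Gunning fog formulas with "complex word" simplified to three or more estimated syllables.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class PageStats(HTMLParser):
    """Gathers visible text plus image and link counts from one page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.images = 0
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag == "a":
            self.links += 1

    def handle_data(self, data):
        self.text_parts.append(data)

def count_syllables(word):
    # Heuristic: one syllable per run of vowels, at least one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def analyze(html):
    stats = PageStats()
    stats.feed(html)
    text = " ".join(stats.text_parts)
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(1, len(words))
    n_sentences = max(1, len(sentences))
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return {
        "size_bytes": len(html.encode("utf-8")),
        "sentences": len(sentences),
        "images": stats.images,
        "links": stats.links,
        "content_to_html": len(text.strip()) / max(1, len(html)),
        "flesch_kincaid": 0.39 * n_words / n_sentences
                          + 11.8 * syllables / n_words - 15.59,
        "fog": 0.4 * (n_words / n_sentences + 100 * complex_words / n_words),
        "word_freq": Counter(w.lower() for w in words),
    }

# Hypothetical sample page.
page = ('<html><body><img src="t.png"><p>Dinosaurs were big. '
        '<a href="/x">Dinosaurs</a> died out.</p></body></html>')
report = analyze(page)
print(report["sentences"], report["images"], report["links"])
print(report["word_freq"].most_common(1))
```

Computing the word-frequency `Counter` once per page at analysis time matches the slide's point about precomputation: query-time ranking then only needs lookups.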

  10. Collaborative Searching • Three components of search algorithm • Word Frequency • Profile Correlation • Recommender System • Computes ranking of all pages • Returns results to user
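The slides do not say how the three components are combined into a single ranking. One plausible approach, shown purely as an assumption, is a weighted sum of per-page scores; the function `rank_pages`, the weights, and the sample scores below are all hypothetical.

```python
def rank_pages(pages, weights=(0.5, 0.25, 0.25)):
    """Order URLs by a weighted sum of three scores in [0, 1].

    pages maps url -> (word_frequency, profile_correlation, recommender).
    The weighted-sum combination is an assumption, not Foible's method.
    """
    w_freq, w_profile, w_rec = weights
    scored = {
        url: w_freq * freq + w_profile * profile + w_rec * rec
        for url, (freq, profile, rec) in pages.items()
    }
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical component scores for three candidate pages.
candidates = {
    "http://example.org/a": (0.9, 0.1, 0.2),
    "http://example.org/b": (0.6, 0.8, 0.9),
    "http://example.org/c": (0.3, 0.4, 0.1),
}

ranking = rank_pages(candidates)
control = rank_pages(candidates, weights=(1.0, 0.0, 0.0))
print(ranking)
print(control)
```

Setting the two collaborative weights to zero reduces the ranking to word frequency alone, which gives a baseline ordering to compare the combined ranking against.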

  11. User Study • Approximately 50 users • 20 completed the study in its entirety • Consisted of 5 searches • Predefined broad topics • Users provided explicit feedback • Search results presented in two-column format • Enhanced Collaborative Results • Control – Word Frequency Only

  12. User Study Data 1

  13. User Study Data 2

  14. Results and Conclusion • Users unanimously prefer collaborative ratings to non-collaborative • Smarter searches produced pages ranked in better order according to study • Introducing collaborative filtering into traditional search engine technology results in better search results!
