
Design of a Click-tracking Network for Full-text Search Engine

Design of a Click-tracking Network for Full-text Search Engine. Group 5: Yuan Hu, Yu Ge, Youwen Gong, Zenghui Qiu and Miao Liu. Outline: Introduction, Objective, Project diagram, Web Crawling, Indexing schema, Ranking strategies, PageRank Algorithms, Neural Network


Presentation Transcript


  1. Design of a Click-tracking Network for Full-text Search Engine Group 5: Yuan Hu, Yu Ge, Youwen Gong, Zenghui Qiu and Miao Liu

  2. Outline • Introduction • Objective • Project diagram • Web Crawling • Indexing schema • Ranking strategies • PageRank Algorithms • Neural Network • Content-Based Ranking • Software and Reference

  3. Introduction
     • Full-text Search Engine
       • search on key words
       • rank results
     • What is in a Search Engine?
       • Crawling
       • Indexing
       • Ranking results of query

  4. Objective • Design a full-text search engine • Rank search results in different ways

  5. Project Diagram [Diagram components: Website, Crawling, Text & URLs, Indexing, Database, Content-Based Ranking, Click-Tracking Network, PageRank Algorithms, Query Function, Ranked results]

  6. Web Crawling
     Main page: http://en.wikipedia.org/wiki/Machine_learning
     Depth 1: crawl all the URL links on the main page
       http://en.wikipedia.org/wiki/Machine_learning#Decision_tree_learning ……
     Depth 2: crawl all the URL links found at depth 1
       http://en.wikipedia.org/wiki/Decision_tree_learning#Information_gain ……
     # Implemented with the Python urllib2 module and the BeautifulSoup API
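The depth-limited crawl described on this slide can be sketched as follows. This is a minimal illustration, not the group's code: it uses the Python 3 standard library (`html.parser`, `urllib.parse`) instead of urllib2 and BeautifulSoup, strips `#fragment` parts so the same page is not queued twice, and replaces real HTTP requests with a hypothetical in-memory `FAKE_WEB` dictionary so that it runs offline. The names `LinkParser`, `extract_links`, `crawl` and `FAKE_WEB` are assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute, fragment-free URLs linked from the given HTML."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def crawl(start_url, fetch, max_depth=2):
    """Breadth-first crawl; returns {url: depth} for every URL discovered."""
    depth_of = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if depth_of[url] >= max_depth:
            continue  # do not follow links beyond the depth limit
        html = fetch(url)
        if html is None:
            continue  # page could not be fetched
        for link in extract_links(url, html):
            if link not in depth_of:
                depth_of[link] = depth_of[url] + 1
                queue.append(link)
    return depth_of


# Hypothetical in-memory "web" standing in for real HTTP requests.
FAKE_WEB = {
    "http://example.org/a": '<a href="/b">b</a> <a href="/c#top">c</a>',
    "http://example.org/b": '<a href="/d">d</a>',
    "http://example.org/c": '<a href="/a">a</a>',
    "http://example.org/d": '<a href="/e">e</a>',
}

depths = crawl("http://example.org/a", FAKE_WEB.get, max_depth=2)
print(depths)
```

Passing the fetch function as a parameter keeps the traversal logic separate from the network layer, so a real downloader can be swapped in without changing `crawl`.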

  7. URL Main Page URL Depth 1 LINK LINK URL Depth 2

  8. Schema for Basic Index # Implemented with SQLite
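The schema itself appears only as an image in the original slides. As a rough sketch of what a basic SQLite index for this kind of engine might look like, the example below uses three tables in the style of the Segaran book listed in the references; the table and column names (`urllist`, `wordlist`, `wordlocation`) and the helpers `get_id` and `add_to_index` are assumptions, not the group's confirmed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE urllist (url TEXT);
    CREATE TABLE wordlist (word TEXT);
    CREATE TABLE wordlocation (urlid INTEGER, wordid INTEGER, location INTEGER);
    CREATE INDEX wordidx ON wordlist (word);
    CREATE INDEX urlidx ON urllist (url);
    CREATE INDEX wordurlidx ON wordlocation (wordid);
""")


def get_id(table, column, value):
    """Return the rowid holding value, inserting a new row if needed."""
    row = conn.execute(
        f"SELECT rowid FROM {table} WHERE {column} = ?", (value,)
    ).fetchone()
    if row:
        return row[0]
    cursor = conn.execute(f"INSERT INTO {table} ({column}) VALUES (?)", (value,))
    return cursor.lastrowid


def add_to_index(url, text):
    """Record the position of every word of text under the given URL."""
    urlid = get_id("urllist", "url", url)
    for location, word in enumerate(text.lower().split()):
        wordid = get_id("wordlist", "word", word)
        conn.execute(
            "INSERT INTO wordlocation VALUES (?, ?, ?)", (urlid, wordid, location)
        )


add_to_index("http://example.org/ml", "machine learning uses decision tree learning")

rows = conn.execute("""
    SELECT w.word, l.location
    FROM wordlocation l JOIN wordlist w ON w.rowid = l.wordid
    ORDER BY l.location
""").fetchall()
print(rows)
```

Storing word positions rather than only word counts is what later allows ranking by document location and word distance.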

  9. Results for Multiple-word Query [Figure labels: words combination, word location, same url_id, query function] Note: the url_ids returned are not ranked.
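One way to obtain, for a multiple-word query, the word locations on pages that share the same url_id is a self-join on the word-location table. The sketch below assumes a hypothetical `wordlocation(urlid, wordid, location)` table and a helper named `match_rows`; both are illustrative and not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wordlocation (urlid INTEGER, wordid INTEGER, location INTEGER)"
)
# Hypothetical index contents: word 1 = "decision", word 2 = "tree".
conn.executemany(
    "INSERT INTO wordlocation VALUES (?, ?, ?)",
    [
        (1, 1, 0), (1, 2, 1),  # URL 1 contains both words
        (2, 1, 4),             # URL 2 contains only word 1
        (3, 2, 7), (3, 1, 9),  # URL 3 contains both words
    ],
)


def match_rows(wordids):
    """Return (urlid, loc_word0, loc_word1, ...) for URLs containing all words."""
    fields = ["w0.urlid"]
    tables = []
    clauses = []
    for i in range(len(wordids)):
        fields.append(f"w{i}.location")
        tables.append(f"wordlocation w{i}")
        if i > 0:
            # every joined copy of the table must refer to the same URL
            clauses.append(f"w{i - 1}.urlid = w{i}.urlid")
        clauses.append(f"w{i}.wordid = ?")
    sql = (
        f"SELECT {', '.join(fields)} FROM {', '.join(tables)} "
        f"WHERE {' AND '.join(clauses)}"
    )
    return conn.execute(sql, wordids).fetchall()


rows = match_rows([1, 2])
print(sorted(rows))
```

The rows come back in no particular order, which matches the slide's remark that the returned url_ids are not yet ranked.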

  10. PageRank Algorithm • Developed by Larry Page at Stanford U. in 1996. • Measures how important a page is. • The importance of a page is calculated from all the other pages that link to it. http://www.rasch.org/rmt/rmt232a.htm

  11. How to Calculate PR
      PR(A) = (1 - d) + d * ( PR(B)/L(B) + …… + PR(D)/L(D) + …… )
      • d: damping factor, 0 < d < 1; here d = 0.85.
      • PR(B), ……, PR(D), ……: PageRank value of each webpage linking to page A.
      • L(B), ……, L(D), ……: the number of links going out of page B, ……, D, ……

  12. Example
      PR(A) = 0.15 + 0.85 * ( PR(B)/links(B) + PR(C)/links(C) + PR(D)/links(D) )
            = 0.15 + 0.85 * ( 0.5/4 + 0.7/4 + 0.2/1 )
            = 0.15 + 0.85 * ( 0.125 + 0.175 + 0.2 )
            = 0.15 + 0.85 * 0.5
            = 0.575
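The worked example above can be checked with a few lines of Python. The function name `page_rank_of` and its argument layout are assumptions made for this illustration.

```python
def page_rank_of(inbound, d=0.85):
    """PageRank of one page.

    inbound: list of (PR value, number of outgoing links) pairs,
             one pair for each page that links to the target page.
    d:       damping factor.
    """
    return (1 - d) + d * sum(pr / links for pr, links in inbound)


# Pages B, C and D from the example: (PR, outgoing links).
pr_a = page_rank_of([(0.5, 4), (0.7, 4), (0.2, 1)])
print(round(pr_a, 3))
```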

  13. How to Update the PR Value • If we don't know what the PR values should be to begin with, just assign an initial PR value to every page. • Update the values over 20 iterations. http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm
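A minimal sketch of this iterative update, assuming every page starts at 1.0 and all pages are updated together from the previous iteration's values (the slides do not say whether the group updated values in place). The link graph `GRAPH` and the function name `compute_pagerank` are hypothetical.

```python
def compute_pagerank(links, d=0.85, iterations=20, initial=1.0):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    ranks = {page: initial for page in pages}
    out_count = {page: len(links.get(page, [])) for page in pages}

    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # pages that link to the current page
            inbound = [src for src, targets in links.items() if page in targets]
            new_ranks[page] = (1 - d) + d * sum(
                ranks[src] / out_count[src] for src in inbound
            )
        ranks = new_ranks
    return ranks


# Hypothetical link graph: A links to B and C, B to C, C to A, D to C.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = compute_pagerank(GRAPH)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

Page D has no inbound links, so its value settles at 1 - d = 0.15 regardless of the initial guess, which illustrates why the starting values matter little after enough iterations.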

  14. Results for PageRank [Figure: PageRank values]

  15. Neural Network Why? • It can make reasonable guesses about results for queries it has never seen before. Click-tracking • The weights are updated based on the search results that the user clicked.

  16. Neural Network • Step 1: Setting Up the Database • Step 2: Feeding Forward Activation • Step 3: Training with Backpropagation How does the neural network work? [Figure legend] Solid line: strong connection. Bold text: active node.

  17. Step 1: Setting Up the ANN Database • Create a table for the hidden layer (red box) • Create two tables for the connections (green boxes)
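The three tables on this slide could be set up as in the sketch below. The table names (`hiddennode`, `wordhidden`, `hiddenurl`), the column names and the default strengths (-0.2 for word-to-hidden links, 0 for hidden-to-URL links) follow the style of the Segaran book listed in the references and are assumptions here, since the slide shows the tables only as an image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hiddennode (create_key TEXT);                          -- hidden layer
    CREATE TABLE wordhidden (fromid INTEGER, toid INTEGER, strength REAL);  -- word -> hidden
    CREATE TABLE hiddenurl (fromid INTEGER, toid INTEGER, strength REAL);   -- hidden -> URL
""")

# Assumed defaults for connections that have not been stored yet.
DEFAULT_STRENGTH = {"wordhidden": -0.2, "hiddenurl": 0.0}


def get_strength(fromid, toid, table):
    """Return the stored connection strength, or the table's default."""
    row = conn.execute(
        f"SELECT strength FROM {table} WHERE fromid = ? AND toid = ?",
        (fromid, toid),
    ).fetchone()
    return row[0] if row else DEFAULT_STRENGTH[table]


def set_strength(fromid, toid, table, strength):
    """Insert the connection, or update it if it already exists."""
    row = conn.execute(
        f"SELECT rowid FROM {table} WHERE fromid = ? AND toid = ?",
        (fromid, toid),
    ).fetchone()
    if row is None:
        conn.execute(
            f"INSERT INTO {table} (fromid, toid, strength) VALUES (?, ?, ?)",
            (fromid, toid, strength),
        )
    else:
        conn.execute(
            f"UPDATE {table} SET strength = ? WHERE rowid = ?", (strength, row[0])
        )


before = get_strength(101, 1, "wordhidden")
set_strength(101, 1, "wordhidden", 0.5)
set_strength(101, 1, "wordhidden", 0.7)
after = get_strength(101, 1, "wordhidden")
print(before, after)
```

Returning a default for missing rows means the network does not need a stored connection for every possible word and URL pair.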

  18. Step 2: Feeding Forward Activation • Objective: activate the ANN. • Take words as inputs • Activate the links in the network • Give outputs for URLs • Activation: hyperbolic tangent function (x-axis of the plot: total input to the node)
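A small feed-forward pass with tanh activation might look like this. The weight matrices `W_IN` and `W_OUT` hold made-up values for two query words, two hidden nodes and three URLs; they and the function name `feed_forward` are assumptions for illustration only.

```python
import math


def feed_forward(word_inputs, w_in, w_out):
    """Propagate word activations through the hidden layer to the URLs.

    w_in[i][j]:  strength from word i to hidden node j
    w_out[j][k]: strength from hidden node j to URL k
    """
    hidden = [
        math.tanh(sum(a * w_in[i][j] for i, a in enumerate(word_inputs)))
        for j in range(len(w_in[0]))
    ]
    outputs = [
        math.tanh(sum(h * w_out[j][k] for j, h in enumerate(hidden)))
        for k in range(len(w_out[0]))
    ]
    return hidden, outputs


# Hypothetical weights: 2 words -> 2 hidden nodes -> 3 URLs.
W_IN = [[0.5, -0.2], [0.3, 0.8]]
W_OUT = [[1.0, 0.0, -1.0], [0.2, 0.1, 0.4]]

hidden, outputs = feed_forward([1.0, 1.0], W_IN, W_OUT)
print([round(o, 3) for o in outputs])
```

Because tanh maps any total input into the range (-1, 1), each URL output can be read as a bounded relevance score.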

  19. Step 3: Training with Backpropagation • Train the network every time someone performs a search and chooses one of the links • The same algorithm covered in class. • Learning rate = 0.5
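The training step can be sketched as standard backpropagation for a single hidden layer with tanh activation and the learning rate of 0.5 given on the slide. The starting weights, the target encoding (1 for the clicked URL, 0 for the others) and the function names are assumptions for this example.

```python
import math

LEARNING_RATE = 0.5  # value given on the slide


def dtanh(y):
    """Derivative of tanh, expressed through the node's output y."""
    return 1.0 - y * y


def feed_forward(inputs, w_in, w_out):
    hidden = [
        math.tanh(sum(a * w_in[i][j] for i, a in enumerate(inputs)))
        for j in range(len(w_in[0]))
    ]
    outputs = [
        math.tanh(sum(h * w_out[j][k] for j, h in enumerate(hidden)))
        for k in range(len(w_out[0]))
    ]
    return hidden, outputs


def train(inputs, targets, w_in, w_out, rate=LEARNING_RATE):
    """One backpropagation step; w_in and w_out are modified in place."""
    hidden, outputs = feed_forward(inputs, w_in, w_out)
    # error at each output node, scaled by the slope of tanh
    output_deltas = [dtanh(o) * (t - o) for o, t in zip(outputs, targets)]
    # error propagated back to each hidden node (uses the old w_out)
    hidden_deltas = [
        dtanh(h) * sum(output_deltas[k] * w_out[j][k] for k in range(len(outputs)))
        for j, h in enumerate(hidden)
    ]
    for j, h in enumerate(hidden):
        for k, delta in enumerate(output_deltas):
            w_out[j][k] += rate * delta * h
    for i, a in enumerate(inputs):
        for j, delta in enumerate(hidden_deltas):
            w_in[i][j] += rate * delta * a


# Hypothetical network: 2 query words, 2 hidden nodes, 3 candidate URLs.
w_in = [[0.1, 0.1], [0.1, 0.1]]
w_out = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
query = [1.0, 1.0]
clicked = [0.0, 1.0, 0.0]  # the user clicked the second URL

_, before = feed_forward(query, w_in, w_out)
for _ in range(30):
    train(query, clicked, w_in, w_out)
_, after = feed_forward(query, w_in, w_out)
print([round(o, 3) for o in after])
```

After repeated clicks on the same result, the output for the clicked URL rises while the outputs for the other URLs stay near zero.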

  20. Results for Neural Network Step 1: [Figure labels: from ID, to ID, strength, hidden node] Step 2: [Figure labels: input, URL, relevance of URL] Step 3: Training with one query

  21. Results for Neural Network (contd.) Step 3: Training with more queries

  22. Content-Based Ranking Basic idea: calculate a score based only on the query and the content of the page • Word frequency • Document location • Word distance
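The three content-based metrics listed above can be sketched as follows, assuming each match row has the form `(urlid, location_of_word_1, location_of_word_2, ...)`. The scoring and normalization details are assumptions in the style of the Segaran book listed in the references, not the group's confirmed formulas, and `ROWS` is made-up data.

```python
TINY = 1e-5  # guards against division by zero


def normalize(scores, small_is_better=False):
    """Scale scores so that the best URL gets 1.0."""
    if small_is_better:
        best = max(TINY, min(scores.values()))
        return {url: best / max(TINY, s) for url, s in scores.items()}
    best = max(TINY, max(scores.values()))
    return {url: s / best for url, s in scores.items()}


def frequency_score(rows):
    """Word frequency: more matching rows for a URL is better."""
    counts = {}
    for row in rows:
        counts[row[0]] = counts.get(row[0], 0) + 1
    return normalize(counts)


def location_score(rows):
    """Document location: query words near the top of the page are better."""
    best = {}
    for row in rows:
        total = sum(row[1:])
        best[row[0]] = min(total, best.get(row[0], total))
    return normalize(best, small_is_better=True)


def distance_score(rows):
    """Word distance: query words close to each other are better."""
    best = {}
    for row in rows:
        dist = sum(abs(row[i] - row[i - 1]) for i in range(2, len(row)))
        best[row[0]] = min(dist, best.get(row[0], dist))
    return normalize(best, small_is_better=True)


# Hypothetical matches for a two-word query: (urlid, loc_word1, loc_word2).
ROWS = [(1, 0, 1), (1, 5, 1), (3, 9, 7)]

freq = frequency_score(ROWS)
loc = location_score(ROWS)
dist = distance_score(ROWS)
print(freq, loc, dist)
```

Normalizing every metric to the same 0 to 1 range makes it possible to combine them later with a weighted sum.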

  23. Software and Reference • Ubuntu 11.04 • Python 2.7.3 • SQLite • Programming Collective Intelligence, Toby Segaran • SQLite Tutorial, ZetCode • Dive into Python, Mark Pilgrim

  24. Thank you.
