Random Walking on the World Wide Web Project Presentation



  1. Random Walking on the World Wide Web Project Presentation. Team members: Levin Boris, Laserson Itamar. Instructor: Gurevich Maxim

  2. Introduction • Statistics about web pages are very important • Use a random sample of web pages to approximate: search engine coverage; domain name distribution (.com, .org, .edu); average number of links per page; average page length • The goal: develop a cheap method to sample uniformly from the Web
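As a minimal sketch of how such statistics might be approximated from a sample, the snippet below computes a domain-suffix distribution and an average link count. The sample data, `tld_distribution`, and `mean` are hypothetical names introduced here for illustration; the estimates are only meaningful if the sample is close to uniform.

```python
from collections import Counter
from urllib.parse import urlparse


def tld_distribution(urls):
    """Fraction of sampled pages per top-level domain suffix (com, org, ...)."""
    suffixes = [urlparse(u).hostname.rsplit(".", 1)[-1] for u in urls]
    counts = Counter(suffixes)
    total = len(urls)
    return {tld: n / total for tld, n in counts.items()}


def mean(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical sample of (url, number of links on the page) pairs.
sample = [
    ("http://example.com/a", 12),
    ("http://example.org/b", 3),
    ("http://example.edu/c", 7),
    ("http://www.example.com/d", 10),
]

dist = tld_distribution([url for url, _ in sample])
avg_links = mean([links for _, links in sample])
```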

  3. Random Walker • A random walk on a graph provides a sample of nodes • If the graph is undirected and regular, the sample is uniform • Problem: the Web is neither undirected nor regular • Solution: incrementally create an undirected regular graph with the same nodes as the Web • Perform the walk on this graph
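The uniformity claim on this slide follows from a standard fact about random walks on undirected graphs. A brief derivation, where the symbols n (number of nodes), E (edge set), and d (common degree) are notation introduced here for illustration:

```latex
% Simple random walk on a connected undirected graph G = (V, E):
% the stationary distribution is proportional to the node degree.
\[
  \pi(v) = \frac{\deg(v)}{\sum_{u \in V} \deg(u)} = \frac{\deg(v)}{2|E|}
\]
% If G is d-regular with n = |V| nodes, then 2|E| = nd, so
\[
  \pi(v) = \frac{d}{nd} = \frac{1}{n} \quad \text{for every } v \in V .
\]
```

On a non-regular graph the same walk favors high-degree pages, which is the bias the regularized graph is meant to remove.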

  4. WebWalker • Follow a random out-link or a random in-link at each step • Use weighted self loops to even out pages' degrees: w(v) = degmax - deg(v) • [Figure: example graph with nodes including amazon.com and netscape.com, annotated with numeric labels]

  5. WebWalker • A random walk on a connected undirected regular graph converges to a uniform stationary distribution. • Pseudocode: WebWalker(v): - Spend an expected degmax/deg(v) steps at v - Pick a random link incident to v (either v → u or u → v) - Call WebWalker(u)
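The pseudocode above can be sketched in Python roughly as follows. This is a simplified illustration, not the project's implementation: it assumes the whole link graph is known in advance (the actual walker discovers it incrementally), and the toy edge list and the function names `build_undirected`, `webwalker_step`, and `webwalker` are hypothetical.

```python
import random
from collections import Counter


def build_undirected(directed_edges):
    """Treat every hyperlink u -> v as an undirected edge, so both
    out-links and in-links are incident to a node."""
    neighbors = {}
    for u, v in directed_edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return {node: sorted(adj) for node, adj in neighbors.items()}


def webwalker_step(v, neighbors, deg_max, rng):
    """Take the self loop with probability (deg_max - deg(v)) / deg_max,
    otherwise follow a random incident link. The expected number of
    steps spent at v is therefore deg_max / deg(v)."""
    deg = len(neighbors[v])
    if rng.random() < (deg_max - deg) / deg_max:
        return v
    return rng.choice(neighbors[v])


def webwalker(start, neighbors, steps, rng):
    """Run the walk and count how often each node is visited."""
    deg_max = max(len(adj) for adj in neighbors.values())
    visits = Counter()
    v = start
    for _ in range(steps):
        v = webwalker_step(v, neighbors, deg_max, rng)
        visits[v] += 1
    return visits


# Hypothetical toy web graph with five pages and uneven degrees.
graph = build_undirected(
    [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]
)
steps = 100_000
visits = webwalker("a", graph, steps, random.Random(0))
frequencies = {node: visits[node] / steps for node in graph}
```

With the self loops in place, each of the five pages should be visited with frequency close to 1/5 even though their degrees range from 1 to 3.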

  6. MD and MH Algorithms • Maximum-Degree (MD): The algorithm adds self loops to nodes, causing the random walk to stay longer at these web pages (nodes) and thereby fixing the bias in the trial distribution. • Metropolis-Hastings (MH): The algorithm gives preference to smaller documents by reducing the probability of stepping to large documents. This fixes the bias caused by large documents with a large number of phrases.
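A minimal sketch of the two correction rules on a generic undirected graph, assuming a uniform target distribution and using node degree as a stand-in for document size. The toy graph and the names `md_step`, `mh_step`, and `empirical_distribution` are hypothetical; the search-based walks from the earlier project operate on documents and queries rather than on an explicit adjacency list.

```python
import random
from collections import Counter


def md_step(x, neighbors, deg_max, rng):
    """Maximum-Degree: leave x with probability deg(x) / deg_max,
    otherwise take a self loop and stay at x."""
    if rng.random() < len(neighbors[x]) / deg_max:
        return rng.choice(neighbors[x])
    return x


def mh_step(x, neighbors, rng):
    """Metropolis-Hastings with a uniform target: propose a random
    neighbor y and accept with probability min(1, deg(x) / deg(y)),
    so steps toward larger (higher-degree) nodes are taken less often."""
    y = rng.choice(neighbors[x])
    if rng.random() < min(1.0, len(neighbors[x]) / len(neighbors[y])):
        return y
    return x


def empirical_distribution(step, start, steps, rng):
    """Run a walk with the given step rule and return visit frequencies."""
    counts = Counter()
    x = start
    for _ in range(steps):
        x = step(x, rng)
        counts[x] += 1
    return {node: n / steps for node, n in counts.items()}


# Hypothetical toy graph with degrees between 1 and 3.
neighbors = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a", "e"],
    "e": ["d"],
}
deg_max = max(len(adj) for adj in neighbors.values())
rng = random.Random(1)

md = empirical_distribution(
    lambda x, r: md_step(x, neighbors, deg_max, r), "a", 100_000, rng
)
mh = empirical_distribution(
    lambda x, r: mh_step(x, neighbors, r), "a", 100_000, rng
)
```

Both rules should yield visit frequencies near 1/5 per node here. MD needs an upper bound on the degree, while MH only compares the degrees of the current and proposed nodes.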

  7. Project description • Implement the WebWalker algorithm • Design a simulation framework • Compare the results to the search-based random walks from our previous project • Analyze and display the results

  8. Software – Class Diagram (utility)

  9. Software – Class Diagram (1)

  10. Software – Class Diagram (2)

  11. Software – Class Diagram (Result Analyzer)

  12. Designing the Simulation Framework • Planning a series of simulations testing different parameters of the algorithms • Considering bottlenecks such as the Yahoo daily query limit and hard-disk space • Measuring the effect of each parameter on the algorithm • Running the simulations in the software lab on several computers at a time

  13. Analysis Criteria • Similarity • Unique Hosts Visited • Final Similarity • Convergence
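The slides do not define these criteria precisely. As one plausible reading of "Unique Hosts Visited", the sketch below tracks how many distinct hostnames a walk has reached after each step. The function name `unique_hosts_over_time` and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def unique_hosts_over_time(visited_urls):
    """Return, for each step of the walk, the number of distinct
    hostnames seen so far."""
    seen = set()
    curve = []
    for url in visited_urls:
        host = urlparse(url).hostname
        if host is not None:
            seen.add(host)
        curve.append(len(seen))
    return curve


# Hypothetical walk trace: five visited URLs across three hosts.
walk = [
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/",
    "http://a.example/1",
    "http://c.example/x",
]
curve = unique_hosts_over_time(walk)
```

A curve that keeps rising suggests the walk is still reaching new parts of the graph, while a flat curve suggests it is stuck within a few hosts.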

  14. Results – Similarity

  15. Results – Unique hosts visited

  16. Results – Convergence

  17. Results – SE vs. WW