
Predicting Content Change On The Web

Predicting Content Change On The Web. By: Hitesh Sonpure. Guided by: Prof. M. Wanjari. Outline: Introduction; Related Work; Main Focus; Problem Formulation and Targets; Foundational Methodologies and Algorithms; Experimental Setup and Results; Application; Conclusions; Further Plans.


Presentation Transcript


  1. Predicting Content Change On The Web BY : HITESH SONPURE GUIDED BY : PROF. M. WANJARI

  2. OUTLINE • Introduction • Related Work • Main Focus • Problem Formulation and Targets • Foundational Methodologies and Algorithms • Experimental Setup And Result • Application • Conclusions • Further plans

  3. INTRODUCTION • The ability to predict key types of changes can be used in a variety of settings. • In particular, the content of a page enables better prediction of its change. • Pages that are related to the page being predicted may also change in similar ways.

  4. Related Work • Incremental web crawling setting: the decision to recrawl a web page is linked to the probability of its change. • User-centric utility: a utility weight is assigned to each page. • Several works use the past change frequency and change recency of a page.

  5. Related Work • Prediction based on content-based features. • Type of correlation structure at the website level, estimated by using a sample of web pages from a website. • Later work extends this idea by clustering pages based on static and dynamic content features.

  6. Focus 1. Predict significant changes rather than any change to a web page. 2. Develop a wide array of dynamic content-based features that may be useful for the more general temporal mining case beyond crawling. The overall aim is to predict dynamic content change on the web so that a variety of retrieval and web-related components can be improved.

  7. Focus 3. Explore a wide variety of methods to identify related pages, including content, web graph distance, and temporal content similarity. 4. Derive a novel expert prediction framework that effectively leverages information from related pages without the need for sampling from the current time slice.

  8. PROBLEM FORMULATION AND TARGETS where o ∈ O at time • Types of Web Page Change 1. Whether the page o ∈ O changes significantly. 2. Whether the change in page o ∈ O corresponds to a change from non-relevant previous content to relevant current content. 3. Whether there is a new outlink from a page o ∈ O.
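
The first and third change types above can be illustrated with a small sketch. The slides do not say how "significant" is measured, so this example assumes a Dice similarity over the word sets of two snapshots and a hypothetical threshold of 0.9; the function names are illustrative only and are not taken from the presentation.

```python
from typing import Set


def dice_similarity(prev_text: str, curr_text: str) -> float:
    """Dice coefficient between the word sets of two page snapshots."""
    prev_words = set(prev_text.lower().split())
    curr_words = set(curr_text.lower().split())
    if not prev_words and not curr_words:
        return 1.0  # two empty snapshots are treated as identical
    overlap = len(prev_words & curr_words)
    return 2.0 * overlap / (len(prev_words) + len(curr_words))


def changed_significantly(prev_text: str, curr_text: str,
                          threshold: float = 0.9) -> bool:
    """Change type 1: similarity drops below the (assumed) threshold."""
    return dice_similarity(prev_text, curr_text) < threshold


def new_outlinks(prev_links: Set[str], curr_links: Set[str]) -> Set[str]:
    """Change type 3: links present now that were absent in the previous snapshot."""
    return curr_links - prev_links
```

A small wording tweak leaves the label at "no significant change", while a rewritten page flips it. The threshold controls how much rewriting is tolerated before a change counts.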

  9. (Continued) • Information Settings 1. 1D setting 2. 2D setting 3. 3D setting

  10. (Continued) • Information Observability 1. Partially observed 2. Fully observed

  11. LEARNING ALGORITHMS • BASELINE ALGORITHM Prediction is based on the probability that the page changes significantly, i.e. p(h(o_i, t_j) = 1 | h(o_i, t_k) ∈ E), where t_k < t_j and (t_j − t_k) ≤ l. • SINGLE EXPERT ALGORITHM Represents each page with a set of features. • MULTIPLE EXPERT ALGORITHM Considers both the page's own features and the features of other pages.

  12. EXPERIMENTAL SETUP AND RESULTS
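
As a rough illustration of the baseline, the probability can be estimated as the fraction of significant changes among the page's last l observations. This is a minimal sketch under that assumption; the slides give only the conditional probability, not the estimator. `history`, `window`, and `prior` are hypothetical names, with `history[t]` standing for h(o_i, t) as a 0/1 label.

```python
from typing import Sequence


def baseline_change_probability(history: Sequence[int], t_j: int,
                                window: int, prior: float = 0.5) -> float:
    """Estimate p(h(o_i, t_j) = 1) from labels at times t_k with
    t_k < t_j and t_j - t_k <= window.

    `prior` is an assumed fallback for pages with no usable history.
    """
    start = max(0, t_j - window)
    recent = history[start:t_j]  # labels for t_k in [t_j - window, t_j - 1]
    if not recent:
        return prior
    return sum(recent) / len(recent)
```

For a page with history `[0, 1, 1, 0, 1]`, predicting time step 5 with a window of 3 uses the labels `[1, 0, 1]`. The single-expert and multiple-expert algorithms replace this frequency estimate with models over page features.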

  13. APPLICATION • Application to crawling: maximising freshness

  14. CONCLUSIONS • Tackled the problem of predicting significant content change. • Shed light on how and why content changes on the web and how it can be predicted. • Adding page content improves prediction compared with simple frequency-based prediction. • Adding information from the content of related pages improves further over using the page's content alone.

  15. FURTHER PLANS • To carry out the prediction analysis in a real-time scenario.
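
One simple way to connect change prediction to crawling is to spend a fixed recrawl budget on the pages most likely to have changed. The sketch below assumes such a greedy top-k policy, which is a simplification and not necessarily the policy evaluated in the presentation; `change_prob` and `budget` are hypothetical names.

```python
import heapq
from typing import Dict, List


def select_pages_to_recrawl(change_prob: Dict[str, float],
                            budget: int) -> List[str]:
    """Return up to `budget` page URLs, highest predicted change probability first."""
    if budget <= 0:
        return []
    # Iterating a dict yields its keys; the key function ranks them by probability.
    return heapq.nlargest(budget, change_prob, key=change_prob.get)
```

With predictions `{"a": 0.1, "b": 0.9, "c": 0.5}` and a budget of 2, the crawler would revisit `b` and then `c`. Better predictions mean fewer recrawls wasted on unchanged pages.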

  16. REFERENCES • E. Adar, J. Teevan, S. Dumais, and J. Elsas. The web changes everything: Understanding the dynamics of web content. In Proc. of WSDM, 2009. • J. Cho and H. Garcia-Molina. The evolution of the web and implications for an incremental crawler. In Proc. of VLDB, 2000. • J. Cho and H. Garcia-Molina. Estimating frequency of change. TOIT, 3(3):256–290, 2003.

  17. REFERENCES • D. Fetterly, M. Manasse, M. Najork, and J. L. Wiener. A large-scale study of the evolution of web pages. In Proc. of WWW, 2003. • Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. JMLR, 4:933–969, 2003.

  18. REFERENCES • L. Getoor and L. Mihalkova. Exploiting statistical and relational information on the web and in social media. In Proc. of WSDM, 2011.

  19. THANK YOU !
