Using PageRank and Naïve Bayes Models for extracting data on good web page design

Archit Baweja, Daniel Moyer, Doug Traher

Presentation Transcript


  1. Using PageRank and Naïve Bayes Models for extracting data on good web page design
     Archit Baweja, Daniel Moyer, Doug Traher

  2. Introduction
     • The web is no longer a set of interconnected, text-only web pages.
     • Presentation of content is equally important.
     • Web 2.0
     • Good page design is subjective.
     • Re-phrasing the problem:
        • Can PageRank be used to classify web pages for a given category?
        • Present solutions require a domain expert.
        • What if we could extract domain knowledge using stochastic techniques?
        • The World Wide Web is our pool of domain knowledge.

  3. Related Work
     • Studies on inferences from PageRank or similar ranking algorithms
        • People’s attention in the blogosphere [4]
        • LinkedIn connections to find powerful people [5]
        • CodeRank for software metrics [6]
     • Visual impact of political websites on user trust
        • Reed et al. analyze the visual impact of political websites [3]
        • McKnight et al. on the impact of initial consumer trust on intentions to interact with a website [7]

  4. Background
     • Google’s PageRank [1]
     • Naïve Bayes model [2]
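Slide 4 only names PageRank [1]. For reference, a minimal power-iteration sketch of its usual formulation, PR(p) = (1 - d)/N + d · Σ PR(q)/L(q), summed over the pages q that link to p, where N is the number of pages and L(q) is the number of out-links of q. The damping factor d = 0.85, the uniform redistribution of rank held by pages with no out-links, and the toy graph are assumptions for illustration; the presentation itself took its rankings from Google rather than computing PageRank directly.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Power iteration for PageRank on a small link graph.

    `links` maps each page to the pages it links to. Pages with no
    out-links (dangling nodes) spread their rank uniformly over all
    pages, which is one common convention (an assumption here).
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed to every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        # Each page splits its damped rank evenly among its out-links.
        for src, targets in links.items():
            if targets:
                share = d * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        # Stop once the total (L1) change between iterations is negligible.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page graph: c has the most in-links, d has none.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))  # prints ['c', 'a', 'b', 'd']
```

Page d, which nothing links to, keeps only the (1 - d)/N baseline of 0.0375, and the ranks always sum to 1 because each iteration redistributes exactly the rank it started with.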

  5. Approach
     • Use the Google API to sort web pages for a given subject.
     • Extract web page features.
     • Train a Naïve Bayes classifier.
     • The classifier helps answer questions such as:
        • Does a given web page belong to a given class?
        • What features should my web page have to belong to a class of web pages?
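A minimal sketch of the classifier step, assuming each page is reduced to a set of binary (present/absent) features and that there are two hypothetical class labels, "high" and "low" (for example, pages Google ranks near the top versus further down for a subject). It implements Bernoulli Naïve Bayes with Laplace smoothing, one standard variant; the presentation does not say which variant the authors used. `predict` addresses the first question on the slide, and `indicative_features`, a hypothetical helper, addresses the second by ranking features by the log-ratio of their class-conditional probabilities.

```python
import math
from collections import defaultdict


class BernoulliNaiveBayes:
    """Naive Bayes over binary page features, with Laplace smoothing.

    Each training example is the set of feature names present on a page
    (the feature names and class labels used below are hypothetical).
    """

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing pseudo-count

    def fit(self, pages: list[set[str]], labels: list[str]) -> "BernoulliNaiveBayes":
        self.features = sorted(set().union(*pages))
        self.classes = sorted(set(labels))
        class_count = defaultdict(int)
        feature_count = defaultdict(lambda: defaultdict(int))
        for feats, label in zip(pages, labels):
            class_count[label] += 1
            for f in feats:
                feature_count[label][f] += 1
        n = len(labels)
        self.log_prior = {c: math.log(class_count[c] / n) for c in self.classes}
        # Smoothed P(feature present | class), so no estimate is exactly 0 or 1.
        self.p_present = {
            c: {f: (feature_count[c][f] + self.alpha) / (class_count[c] + 2 * self.alpha)
                for f in self.features}
            for c in self.classes
        }
        return self

    def log_posterior(self, feats: set[str]) -> dict[str, float]:
        """Unnormalized log P(class | features) under the independence assumption."""
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for f in self.features:  # features never seen in training are ignored
                p = self.p_present[c][f]
                s += math.log(p if f in feats else 1.0 - p)
            scores[c] = s
        return scores

    def predict(self, feats: set[str]) -> str:
        """Does this page belong to a class? Return the most probable class."""
        scores = self.log_posterior(feats)
        return max(scores, key=scores.get)

    def indicative_features(self, target: str, other: str) -> list[tuple[str, float]]:
        """Features ranked by log P(f | target) - log P(f | other), most favorable first."""
        ratios = [(f, math.log(self.p_present[target][f] / self.p_present[other][f]))
                  for f in self.features]
        return sorted(ratios, key=lambda t: t[1], reverse=True)


# Hypothetical training data: color features of pages in each class.
pages = [{"red", "blue", "white"}, {"red", "white"}, {"blue", "white"},
         {"green", "orange"}, {"green"}, {"orange", "white"}]
labels = ["high", "high", "high", "low", "low", "low"]
model = BernoulliNaiveBayes().fit(pages, labels)
print(model.predict({"red", "white"}))  # prints high
```

Working in log space avoids underflow when many small probabilities are multiplied, and smoothing keeps a single unseen feature from zeroing out a class.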

  6. Evaluation
     • Experiment details
        • We applied our approach to classifying political websites.
        • Colors were the basis of the classification feature set.
     • Reasons
        • Colors fit well with the Naïve Bayes model of classification.
        • Colors are an easy parameter for political websites in the United States of America.
     • Results discussion
        • Promising.
        • See final report.
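The slides do not describe how colors were extracted from pages. One possible sketch, assuming colors are read from hex codes (#rgb or #rrggbb) in a page's HTML/CSS and bucketed into a handful of coarse names; the bucket names and the hue, lightness, and saturation thresholds are hypothetical choices, not the authors'. The resulting set of buckets can serve as the binary present/absent features that a Naïve Bayes classifier works with.

```python
import colorsys
import re

# Crude match for CSS hex colors such as #c00 or #1a2b3c. Named colors and
# rgb() values are ignored, and fragment links like "#abc" may be caught too.
HEX_COLOR = re.compile(r"#([0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def coarse_color(hex_digits: str) -> str:
    """Map hex digits (without '#') to a hypothetical coarse color bucket."""
    if len(hex_digits) == 3:
        hex_digits = "".join(ch * 2 for ch in hex_digits)
    r, g, b = (int(hex_digits[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    # Very light, very dark, or unsaturated colors are treated as neutrals.
    if l > 0.9:
        return "white"
    if l < 0.1:
        return "black"
    if s < 0.15:
        return "gray"
    hue = h * 360  # colorsys returns hue in [0, 1)
    if hue < 20 or hue >= 330:
        return "red"
    if hue < 70:
        return "yellow"  # oranges also land here in this coarse scheme
    if hue < 170:
        return "green"
    if hue < 260:
        return "blue"
    return "purple"


def color_features(page_source: str) -> set[str]:
    """Return the set of coarse color buckets used in a page's HTML/CSS."""
    return {coarse_color(m.group(1)) for m in HEX_COLOR.finditer(page_source)}


html = "<style>body{background:#fff;color:#000} .a{color:#cc0000} .b{color:#0000CC}</style>"
print(sorted(color_features(html)))  # prints ['black', 'blue', 'red', 'white']
```

Bucketing trades precision for robustness: two pages using slightly different shades of red produce the same feature, which keeps the feature space small enough for a simple model.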

  7. Conclusions
     • The approach is viable.
     • It provides a basis for extracting domain knowledge when there is a lack of domain experts.
     • Limitations of the technologies used
        • Rankings are inferred by Google, not by a PageRank implementation we control.
        • The conditional independence assumption of the Naïve Bayes model.
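The Naïve Bayes limitation refers to the model's standard assumption that features are conditionally independent given the class [2]. For color features this means, for example, that using red is treated as independent of using blue once the class is known, which may not hold for pages with deliberate color schemes. In the usual notation, with C a class and f_1, ..., f_n the page features:

```latex
P(f_1, \dots, f_n \mid C) = \prod_{i=1}^{n} P(f_i \mid C)
\quad\Longrightarrow\quad
P(C \mid f_1, \dots, f_n) \propto P(C) \prod_{i=1}^{n} P(f_i \mid C),
\qquad
\hat{C} = \arg\max_{c} \, P(c) \prod_{i=1}^{n} P(f_i \mid c)
```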

  8. Future Work
     • More experiments are needed.
     • Newer web page features: images, Flash, widgets.
     • Experiment with other classification techniques (Bayesian networks, neural networks, Markov decision processes).
     • Use a more controlled PageRank implementation.
     • Apply the approach to other fields.
     • Basis for extracting domain knowledge when there is a lack of domain experts.

  9. Questions?

  10. References
     [1] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web, 1999.
     [2] S. J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach (2nd Edition). Prentice Hall, December 2002.
     [3] K. N. Reed and D. P. Groth. Looking good on the web: evaluating the visual impact of political websites. In CHI '08 Extended Abstracts on Human Factors in Computing Systems, pages 3753-3758, New York, NY, USA, 2008. ACM.
     [4] L. Kirchhoff, A. Bruns, and T. Nicolai. Investigating the impact of the blogosphere: Using PageRank to determine the distribution of attention. Association of Internet Researchers, 2007.
     [5] F. van Puffelen. Using PageRank to determine the most powerful people on LinkedIn. http://frank.vanpuffelen.net/2008/07/using-pagerank-to-determine-most.html
     [6] B. Neate, W. Irwin, and N. Churcher. CodeRank: A new family of software metrics. In Australian Software Engineering Conference, April 2006.
     [7] D. H. McKnight, V. Choudhury, and C. Kacmar. The impact of initial consumer trust on intentions to interact with a website: A trust building model. Journal of Strategic Information Systems, 2002.
