1 / 25

Towards Adaptive Websites: A Conceptual Framework and Case Study Mike Perkowitz Oren Etzioni

Towards Adaptive Websites: A Conceptual Framework and Case Study Mike Perkowitz Oren Etzioni. Presented By Ben Childs James Hunter Sergey Petrov Issam Souilah. Department of Computer Science, November 2005. Agenda. Introduction Paper Outline Motivation

wayde
Télécharger la présentation

Towards Adaptive Websites: A Conceptual Framework and Case Study Mike Perkowitz Oren Etzioni

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Towards Adaptive Websites: A Conceptual Framework and Case StudyMike Perkowitz Oren Etzioni Presented By Ben Childs James Hunter Sergey Petrov Issam Souilah Department of Computer Science, November 2005

  2. Agenda • Introduction • Paper Outline • Motivation • What are Adaptive Websites? • Approaches to Adaptation • The Index Page Synthesis Use Case • The PageGather Algorithm • Description of The Algorithm • Experimental Method • Time Complexity • Comparison with Related Algorithms • The IndexFinder Algorithm • Conceptual Cluster Mining • Experiments • Implementations • Conclusions • Related Work • Summary • Resources

  3. Paper Outline • Published 1999 (latest version 2001) • Explores adaptive web sites • Describes the design space of adaptive web sites • Considers a case study: index page synthesis • Presents two algorithms: • PageGather: a statistical cluster mining algorithm • IndexFinder: a conceptual cluster mining algorithm

  4. Motivation • Designing a complex web site so it readily yields its information is tricky, because: • Different visitors have distinct goals • Same user may seek different information at different times • Many sites outgrow their original design, accumulating links and pages in unlikely places • A site may be designed for a particular use, but may be used in unanticipated ways in practice • Too often, web sites are fossils cast in HTML, while web navigation is dynamic, time-dependent, and idiosyncratic

  5. What Are Adaptive Websites? • Adaptive websites are sites that automatically improve their organization and presentation by learning from visitor access patterns • They mine the data buried in web server logs to produce more easily navigable websites • To demonstrate the feasibility of adaptive websites, the index page synthesis use case is considered

  6. Approaches to Adaptation • Aim is to make a website “better”, so we need a clear quality measure • Quality measure as a function of variables: • How often users find what they are looking for • How many clicks users have to make to get to their goal • How much time users spend reading link text and scrolling through pages • Two approaches to adaptation: • Content-based : organizes and presents pages based on their content. • Access-based : uses the way past visitors have interacted with the site to guide how information is structured. • Content-based and access-based adaptations are complementary and may be used together

  7. The Index Page Synthesis Case Study(1) • Page synthesis is the automatic creation of web pages • An index page is a page consisting of links to a set of pages that cover a particular topic • Index page synthesis problem: given a web site and a visitor access log, create new index pages containing collections of links to related but currently unlinked pages

  8. The Index Page Synthesis Case Study (2) The Index Page Synthesis Problem: • What are the contents (i.e. hyperlinks) of the index page? • How are the hyperlinks on the page ordered? • How are the hyperlinks labeled? • What is the title of the page? Does it correspond to a coherent concept? • Is it appropriate to add the page to the site? If so, where?

  9. Solutions • 2 Algorithms have been suggested by the authors of the paper • PageGather • IndexFinder

  10. The PageGather Algorithm • The PageGather algorithm is a statistical cluster mining algorithm • Clustering algorithms take a collection of objects as their input and produce a partition of the collection • Cluster mining is a variation on traditional clustering that may place a single object in multiple overlapping clusters • PageGather uses cluster mining to find collections of related pages at a website

  11. Description of PageGather • Process the access log into visits • Compute the co-occurrence frequencies between pages and create a similarity matrix • Create the graph corresponding to the matrix, and find maximal cliques (or connected components) in the graph • Rank the clusters found, and choose which to output • Eliminate overlap among the clusters • Present it to the webmaster for evaluation

  12. Experimental Method • Experiments draw on data collected from three distinct collections of web pages • The effectiveness of index page synthesis is based on three factors: • Impact: How many people use the new pages and how often • Benefit: How much effort is saved by those who visit the pages • Recall: How much information sought by the user was actually found

  13. Time Complexity • What is the running time of PageGather? • Let L be the number of page views in the log and N the number of pages at the site • Step (1) requires O(L log L) time: page views must be sorted by origin and time • Step (2) requires O(L + N2) time: must process the log and create a matrix of size O(N2) • In step (3) we may find either connected components (linear in the size of the graph) or cliques (exponential in general, but since size of discovered clusters is bound to k, this step is a polynomial of degree k)

  14. Comparison with related algorithms • PageGather significantly outperforms other statistical clustering algorithms, but is not as well as human-authored clusters

  15. The IndexFinder Algorithm • PageGather relies on a statistical approach to discovering candidate link sets; its candidates do not correspond precisely to intuitive concepts, whereas human-authored index pages do • An algorithm that finds only candidate link sets that are conceptually coherent is desired • IndexFinder is a key extension to PageGather that guarantees that only sets corresponding to topics are generated

  16. IndexFinder - Problem Definition • Given: • A data collection D (e.g., a set of pages at a web site) • A pairwise similarity measure m defined over D (e.g., page co-occurrence frequencies derived from access logs) • A conceptual language L for describing elements of D (e.g., conjunctions of descriptive features) • A description in L of each object in D • Output all subsets c of D such that: • c is highly cohesive with respect to m (e.g., the average pairwise similarity of the objects in c exceeds some threshold) • c corresponds to a concept expressible in L

  17. IndexFinder – Previous work • Relevant previous work of three types: • Statistical approaches (e.g., PageGather): useful for finding cohesive sets in large collections of data, but make no attempt to ensure that their results correspond to an intuitive concept • Conceptual clustering algorithms (e.g., Fisher’s COBWEB [4]): partition a data collection into clusters of similar objects. Moreover, objects are described in a conceptual descriptive language • Concept learning algorithms: aim to find a conceptual description of a set of objects from a data collection (note that data needs to be classified in advance)

  18. Experiments • Experiments show that IndexFinder outperforms both PageGather and COBWEB and is close to the performance of the human-authored index pages

  19. Implementations • And More: • Use both user’s path and model to guess what pages they are interested in seeing e.g., AVANTI Project [1] • Automatic user categorization • Hybrid approach • Footprints [2] uses the metaphor of travellers creating footpaths in the grass over time • Using meta-information e.g., XML, Apple’s Meta-Content Format, STRUDEL [3] • Client-side customization

  20. Conclusions (1) • PageGather and IndexFinder outperform traditional methods including: the Apriori data mining algorithm, standard clustering algorithms and the COBWEB conceptual clustering algorithm • PageGather and IndexFinder are instances of novel, domain-independent approaches to unsupervised data mining • Extensions and applications to these approaches outside the domain of adaptive websites can be found

  21. Conclusions (2) • Future work may focus on the automatic placement of new index pages at the website • Automatically suggesting names for the new pages, and deciding where in the site they should be located • Index page synthesis itself is a step towards the long-term goal of change in view: adaptive websites that automatically suggest re-organisations of their contents based on visitor access patterns

  22. Related Work • By the authors: • Mainly updates to the original paper (most recent one in 2001) • By others: • Adaplix [5] : A system that extends HTML by introducing conditional statements and an inductive logic programming component to learn the user's browsing preferences • WebWatcher [6]: A “tour guide” of the web. It accompanies the user from page to page, highlighting hyperlinks that it believes will be of interest

  23. Summary • We have covered: • Adaptive Websites • The Index Page Synthesis Use Case • The PageGather Algorithm • The IndexFinder Algorithm • Implementations • Related Work

  24. Any Questions?

  25. Resources • [1] J. Fink, A Kobsa, and A. Nill. User-oriented Adaptivity and Adaptability in the AVANTI Project. In Designing for the Web: Empirical Studies, Microsoft Usability Group, Redmond (WA)., 1996. • [2] A. Wexelblat and P. Maes. Footprints: History-rich web browsing. In Proc. Conf. Computer-Assisted Information Retrieval (RIAO), pages 75-84, 1997. • [3] M. Fernandez, D. Florescu, J. Kang, A. Levy, and D. Suciu. System Demonstration - Strudel: A Web-site Management System. In ACM SIGMOD Conference on Management of Data, 1997. • [4] D. Fisher. Knowledge Acquisition Via Incremental Conceptual Clustering. Machine Learning, 2:139-172, 1987 • [5]Nico Jacobs. Adaplix: Towards Adaptive Websites. In P. De Bra and L. Hardman, editors, Proceedings van de Informatiewetenschap'99 Conferentie, pages 22--28. Eindhoven University of Technology, November 1999 • [6] URL :http://www.cs.cmu.edu/~webwatcher, accessed on 22 November 2005

More Related