
WWW Search and Navigation




  1. WWW Search and Navigation Mark Levene SCIS, Birkbeck College University of London www.dcs.bbk.ac.uk/~mark/

  2. Talk Overview • Hypertext and the navigation problem • NavigationZone’s solution • Problems being researched • A Demonstration

  3. Hypertext and Navigation • Long history • Bush 1945, memex – trail blazing • Nelson 1965, Xanadu - network of documents • Problem of “getting lost in hyperspace” • Navigation aids • Bookmarks • History • Overview diagrams • Recommendations

  4. State-of-the-Art Navigation Aids • Novel User-Interfaces to visualise web sites • Clustering (e.g. Self-Organising Maps) • Web data mining – finding user patterns • Semi-automated navigation, BestTrail algorithm – motivation to follow …

  5. Typical corporate search

  6. A typical search scenario • (1) Submit a query to a search engine • Is it too broad / too specific? • Does it capture my information needs? • (2) Select a URL from the result set • Have I made the right choice? • (3) Start manual navigation • Where am I? Where have I come from? Where am I going to? • (4) Go to (1) to reformulate the query

  7. Content centric approach [diagram with nodes labelled a, b, c, d, e and a * marker; not reproduced in this transcript]

  8. Problems with standard Search • Page level relevance scoring • sensitive to query terms • No look ahead • ‘click and discover’ • No context • results are totally isolated • No navigation support • Users are left on their own to find their way

  9. Possible solutions (information retrieval) • Improve basic IR • Link analysis, e.g. PageRank and HITS • Meta data tagging • Keywords and taxonomies (semantic web) • Natural language • Q&A, sentence analysis, synonyms
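
As an illustration of the link analysis mentioned on this slide, here is a minimal sketch of PageRank computed by power iteration. It is not taken from the talk: the function name `pagerank`, the damping value 0.85, the iteration count, and the four-page `toy_links` graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of linked pages."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page site, for illustration only.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
```

In this toy graph page "c" collects links from three pages and ends up with the highest score, while "d", which nothing links to, ends up with the lowest.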

  10. Possible solutions (information seeking) • Suggestion engines • Link and content generation • Categories and directories • Explicit manual construction • Automatic classification • Machine learning techniques

  11. Are these feasible? • Re-architecting corporate information infrastructure is extremely expensive • Sophisticated approaches are not always intuitive and are yet to be proven • Same problem every couple of years • Mergers and acquisitions

  12. There is, actually, a better way! • Treat sequences of pages, or trails, as first-class citizens for search • Consider the topology of the area in which you are searching • Employ navigational aids

  13. Context centric approach [diagram with nodes labelled a, b, c, d, e and * markers; not reproduced in this transcript]

  14. The information value of a trail is higher than the sum of its parts!

  15. Our approach • Provide information retrieval of the highest quality and, in addition: • Find out what is beyond the most relevant pages by ‘exploring the area’ • Present users with precise and relevant trails • Provide navigation assistance within the UI
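
To make the idea of returning trails rather than single pages concrete, here is a rough sketch that enumerates short trails from a starting page and ranks them. It is not the BestTrail algorithm, whose details are not given in this transcript. The function `best_trails`, the additive score, the length limit, and the toy site are all assumptions; the additive score is deliberately crude, since the talk argues that a trail is worth more than the sum of its pages.

```python
def best_trails(links, relevance, start, max_length=3, top_k=3):
    """Return the top_k trails from `start`, each with at most `max_length` pages.

    A trail's score is the sum of the relevance of its pages; pages are
    never revisited within a trail.
    """
    results = []
    stack = [[start]]
    while stack:
        trail = stack.pop()
        score = sum(relevance.get(page, 0.0) for page in trail)
        results.append((score, trail))
        if len(trail) < max_length:
            for nxt in links.get(trail[-1], []):
                if nxt not in trail:  # avoid cycles
                    stack.append(trail + [nxt])
    # Highest score first; ties go to shorter trails, then alphabetical order.
    results.sort(key=lambda item: (-item[0], len(item[1]), item[1]))
    return results[:top_k]


# Hypothetical site structure and per-page relevance scores for one query.
toy_site = {
    "home": ["courses", "research"],
    "courses": ["term-dates"],
    "research": ["staff"],
}
toy_relevance = {"home": 0.1, "courses": 0.4, "term-dates": 0.9,
                 "research": 0.2, "staff": 0.1}
top = best_trails(toy_site, toy_relevance, "home")
```

Here the highest-scoring trail runs from "home" through "courses" to "term-dates", which a page-level ranking would have shown only as three unrelated results.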

  16. NavZone user interface

  17. NavZone Usability Study • First Monday paper • Task – find answers to 5 types of questions • Fact Finding – What are the term dates? • Judgement – Is CSIS a “good” place to do research? • Fact Comparison – Which train station is closest to the college? • Judgement Comparison – Is the research in deptA better than that in deptB? • General Navigational – How do you get to the checkout?

  18. NavZone vs. Google and Compass • % of subjects with 4+ questions correct • 59% Google • 75% Compass • 83% NavZone

  19. Average # clicks to complete task • 44 Google • 40 Compass • 27 NavZone • NavZone is bandwidth “green”!

  20. Average time taken per task (min) • 18 Compass • 17 Google • 13 NavZone • Wilcoxon Test - Statistically Significant

  21. The main ingredients [architecture diagram, not reproduced in this transcript; component labels: user interface, BestTrail, crawler, robot, parser (HTML, XML, PDF, PostScript, Word, other), generic format, indexer, inverted file, postprocessor, web graph, trail engine]
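
One of the components named on this slide is the inverted file. The sketch below shows what such a structure does in its simplest form: it maps each term to the documents containing it and answers boolean AND queries. The function names `build_inverted_file` and `search`, the whitespace tokenisation, and the three toy documents are assumptions for illustration, not details of the NavZone indexer.

```python
from collections import defaultdict


def build_inverted_file(documents):
    """Map each lower-cased term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the ids of documents that contain every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical page texts, for illustration only.
toy_docs = {
    "p1": "term dates for courses",
    "p2": "research staff",
    "p3": "courses and research",
}
toy_index = build_inverted_file(toy_docs)
hits = search(toy_index, "courses research")
```

A production index would also store term positions and weights for relevance scoring; this sketch only records which documents contain which terms.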

  22. Under Development • Alternative User-Interfaces • Seamless integration with relational databases and file systems • Data mining and personalisation • Mobile/PDA support

  23. Open Problem • How do we make use of statistical regularities that are present in the web to improve search and navigation? • See Levene et al., “A stochastic model for the evolution of the web”, Condensed Matter Archive, cond-mat/0110016, 2001 • Many distributions related to the web graph follow a power law
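
A distribution follows a power law when the frequency of a value k is proportional to k raised to the power -gamma, which appears as a straight line on a log-log plot. The sketch below estimates gamma from that slope by least squares. The function name `fit_power_law_exponent` and the synthetic data are assumptions for illustration, and a log-log regression is only a rough diagnostic rather than a rigorous estimator on real web data.

```python
import math


def fit_power_law_exponent(frequency):
    """Estimate gamma in f(k) ~ k**(-gamma) from a dict mapping value k -> frequency f."""
    points = [(math.log(k), math.log(f))
              for k, f in frequency.items() if k > 0 and f > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    # The log-log slope is -gamma, so negate it.
    return -sxy / sxx


# Synthetic frequencies generated from an exact power law with gamma = 2.
synthetic = {k: 1000.0 * k ** -2.0 for k in range(1, 51)}
gamma = fit_power_law_exponent(synthetic)
```

On this noise-free synthetic input the estimate recovers the exponent used to generate it; measured distributions such as page in-degrees would be noisy, especially in the tail.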
