Quicklink Selection for Efficient Website Navigation

Quicklink Selection for Navigational Query Results • Deepayan Chakrabarti (deepay@yahoo-inc.com) • Ravi Kumar (ravikuma@yahoo-inc.com) • Kunal Punera (kpunera@yahoo-inc.com)

What are quicklinks? • [Figure labels: Result Website, Quicklinks]

Quicklinks • Quicklinks (QLs) = URLs within the search result website • Enable fast navigation to important parts of the website • Which URLs should be QLs? • [Figure labels: Result Website, Quicklinks]

Quicklink Selection • Some obvious strategies don’t work very well • Top clicked URLs in search engine • URL may have low relevance in the QL context: • lib.utexas.edu/maps is popular for searches on “maps” and not for searches on “Univ. of Texas” • URL may be too specific: • automobiles.honda.com/civic-hybrid/exterior-photos.aspx for honda.com • URL popularity may be time-sensitive: • nytimes.com/election-guide/2008/ for nytimes.com

Quicklink Selection • Some obvious strategies don’t work very well • Top clicked URLs in search engine • Top visited URLs in toolbar data • May not relate to search activity: e.g., for nytimes.com • #3 is nytimes.com/mem/emailthis.html • #6 is nytimes.com/auth/login • #8 is nytimes.com/gst/regi.html

Quicklink Selection • Some obvious strategies don’t work very well • Top clicked URLs in search engine • Top visited URLs in toolbar data • Top URLs from analysis of hyperlink graph • Ignores preferences of search users • Toolbar data is more representative • Heavily tagged URLs (e.g., del.icio.us/digg) • Low coverage: Too few websites

Quicklink Selection • Need a combined approach • Search logs • Toolbar data • Web-server logs • Website hyperlink graph • User tags • [Callout: This paper]

Related Work • Sitemap generation [Perkowitz+/00] • Detection of hard-to-find URLs [Srikant+/01] • Improving website navigability [Doerr+/07] • Mining Web usage patterns [Buchner/99, Cadez+/03] • BrowseRank [Liu+/08] • Post-search browsing behavior [Bilenko+/08] • We focus on QLs in the context of search

Outline • Motivation and Related Work • Problem Formulation • Proposed Solution • Experiments • Conclusions

Problem Formulation • Which k URLs should be QLs? “The greatest good for the greatest number” • QLs save clicks • Maximize the total number of clicks saved using at most k QLs • But when exactly is a click “saved”?

Problem Formulation • When does a QL get clicked by the user? • [Figure: graph of click trails (toolbar data) rooted at nasa.gov, with nodes including “Hubble telescope” and “Photos”; one node is picked as a QL]

Problem Formulation • [Figure: graph of click trails (toolbar data) rooted at nasa.gov, with nodes including “Hubble telescope” and “Photos”; one node is picked as a QL] • Assumption: The user recognizes if SearchResult → QL → Destination

Problem Formulation • [Figure: graph of click trails (toolbar data) rooted at nasa.gov; one node is picked as a QL; some trails are marked “saves 1 click each”] • Assumption: The user recognizes if SearchResult → QL → Destination

Problem Formulation • [Figure: graph of click trails (toolbar data) rooted at nasa.gov; one node is picked as a QL; trails are marked “saves 1 click each”, “saves 2 clicks each”, or “saves 0”] • Total savings = 1*3 + 2*2 = 7 clicks • Assumption: The user recognizes if SearchResult → QL → Destination
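
The savings count on this slide can be sketched in Python. This is a minimal sketch, assuming each trail is a tuple of URLs that starts at the search result, and that a QL at position i on a trail saves i clicks (the user jumps straight to it from the results page). The page names and trail counts below are hypothetical, chosen only so that 3 trails save 1 click and 2 trails save 2 clicks, as in the slide's total of 7.

```python
from typing import List, Tuple

Trail = Tuple[str, ...]

# Hypothetical click trails (trail, number of times observed).
TRAILS: List[Tuple[Trail, int]] = [
    (("nasa.gov", "hubble", "photos"), 2),
    (("nasa.gov", "hubble", "history"), 1),
    (("nasa.gov", "missions", "hubble"), 2),
    (("nasa.gov", "news"), 1),
    (("nasa.gov", "kids"), 1),
]


def clicks_saved(trail: Trail, ql: str) -> int:
    """Clicks saved on one trail: the QL's position on the trail, or 0 if absent."""
    return trail.index(ql) if ql in trail else 0


def total_savings(trails: List[Tuple[Trail, int]], ql: str) -> int:
    """Total clicks saved over all trails when a single URL is shown as a QL."""
    return sum(count * clicks_saved(trail, ql) for trail, count in trails)


print(total_savings(TRAILS, "hubble"))
```

Trails that never pass through the chosen QL contribute 0, which matches the “saves 0” labels in the figure.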

Problem Formulation • However… • Unknown pages might become QLs • [Figure: lyrics.com linking to pages A, B, C, …, Z] • These could become the “best” QLs

Problem Formulation • However… • Unknown pages might become QLs • Automatic-redirect pages might become QLs: • nytimes.com forces logging in • aaa.com forces zipcode entry • We need QLs that are “noticeable” in a search context

Problem Formulation • How can we estimate noticeability? • Via search click-logs • Noticeability of a URL u: α(u) • [Formula: α(u) as a function of the fraction of search clicks for u on the website, with a tuning parameter (≈ 2)] • User notices a useful QL with probability α(u)

Problem Formulation • [Figure: click-trail graph rooted at nasa.gov with two QLs, QL1 and QL2; trails through neither QL save 0] • Savings per group of trails (# trails × prob × # clicks): • 2 × α1 × 2 • 1 × α1 × 1 • 2 × (1-α2)α1 × 1 • 2 × α2 × 2 • Total = 5α1 + 4α2 + 2(1-α2)α1 • Assumption: The user picks the best QL that he/she notices

Problem Formulation • [Figure: click-trail graph rooted at nasa.gov with two QLs, QL1 and QL2; trails through neither QL save 0] • Savings per group of trails (# trails × prob × # clicks): • 2 × α1 × 2 • 1 × α1 × 1 • 2 × (1-α2)α1 × 1 • 2 × α2 × 2 • Total = 5α1 + 4α2 + 2(1-α2)α1 • If only QL1 is perfectly noticeable (α1=1, α2=0): Total = 7 clicks (as if 1 QL only) • If both QLs are perfectly noticeable (α1=1, α2=1): Total = 9 clicks
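
The expected-savings total can be checked with a short Python sketch. It assumes that QLs are noticed independently with probability α(u), that a QL at position i on a trail saves i clicks, and that the user takes the noticed QL with the largest saving; the independence assumption is consistent with the (1-α2)α1 term but is not stated on the slide. Page names and trail counts are hypothetical, with "hubble" standing in for QL1 and "photos" for QL2.

```python
from typing import Dict, List, Tuple

Trail = Tuple[str, ...]

# Hypothetical click trails (trail, number of times observed).
TRAILS: List[Tuple[Trail, int]] = [
    (("nasa.gov", "hubble", "photos"), 2),   # QL1 saves 1, QL2 saves 2
    (("nasa.gov", "hubble", "history"), 1),  # QL1 saves 1
    (("nasa.gov", "missions", "hubble"), 2), # QL1 saves 2
    (("nasa.gov", "news"), 1),               # saves 0
]


def expected_trail_savings(trail: Trail, alpha: Dict[str, float]) -> float:
    """Expected clicks saved on one trail when the user picks the best noticed QL."""
    # (clicks saved, noticeability) for every QL on the trail, best saving first.
    on_trail = sorted(
        ((trail.index(u), a) for u, a in alpha.items() if u in trail),
        reverse=True,
    )
    expected, p_no_better_noticed = 0.0, 1.0
    for saved, a in on_trail:
        expected += saved * a * p_no_better_noticed
        p_no_better_noticed *= 1.0 - a
    return expected


def expected_total(trails: List[Tuple[Trail, int]], alpha: Dict[str, float]) -> float:
    """Expected clicks saved over all trails for the QL set given by alpha's keys."""
    return sum(count * expected_trail_savings(trail, alpha) for trail, count in trails)


print(expected_total(TRAILS, {"hubble": 1.0, "photos": 0.0}))
print(expected_total(TRAILS, {"hubble": 1.0, "photos": 1.0}))
```

With α1 = 0.5 and α2 = 0.25 the closed form gives 5(0.5) + 4(0.25) + 2(0.75)(0.5) = 4.25, which the sketch reproduces.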

Problem Formulation • Which k URLs should be QLs? • Maximize the expected number of clicks saved using at most k QLs • while incorporating “noticeability”

Algorithms • Maximize expected number of saved clicks using k QLs → NP-hard • Theorem: This objective is non-decreasing submodular • Non-negative • Adding QLs never hurts • “Diminishing returns”: for S ⊆ S′ and a URL u, marginal improvement from adding u to superset S′ ≤ marginal improvement from adding u to set S

Algorithms • Greedy algorithm: Iteratively pick QLs that increase the number of saved clicks the most • Within a factor (1-1/e) of OPT [Nemhauser+/’78]
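
The greedy step can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: it assumes the expected-savings objective with independent noticeability, a QL at position i on a trail saving i clicks, and hypothetical trails and α values (all set to 1 here for simplicity).

```python
from typing import Dict, List, Tuple

Trail = Tuple[str, ...]

# Hypothetical click trails (trail, number of times observed).
TRAILS: List[Tuple[Trail, int]] = [
    (("nasa.gov", "hubble", "photos"), 2),
    (("nasa.gov", "hubble", "history"), 1),
    (("nasa.gov", "missions", "hubble"), 2),
    (("nasa.gov", "news"), 1),
    (("nasa.gov", "kids"), 1),
]
# Hypothetical noticeability values for every candidate URL.
ALPHA: Dict[str, float] = {
    u: 1.0 for u in ["hubble", "photos", "history", "missions", "news", "kids"]
}


def expected_savings(trails, qls: List[str], alpha: Dict[str, float]) -> float:
    """Expected clicks saved when the user takes the best QL they notice."""
    total = 0.0
    for trail, count in trails:
        on_trail = sorted(
            ((trail.index(u), alpha[u]) for u in qls if u in trail), reverse=True
        )
        p_no_better_noticed = 1.0
        for saved, a in on_trail:
            total += count * saved * a * p_no_better_noticed
            p_no_better_noticed *= 1.0 - a
    return total


def greedy_quicklinks(trails, alpha: Dict[str, float], k: int) -> List[str]:
    """Pick up to k QLs, each time adding the URL with the largest marginal gain."""
    chosen: List[str] = []
    current = 0.0
    for _ in range(k):
        best_url, best_gain = None, 0.0
        for url in sorted(alpha):  # sorted for deterministic tie-breaking
            if url in chosen:
                continue
            gain = expected_savings(trails, chosen + [url], alpha) - current
            if gain > best_gain:
                best_url, best_gain = url, gain
        if best_url is None:  # no remaining URL improves the objective
            break
        chosen.append(best_url)
        current += best_gain
    return chosen


print(greedy_quicklinks(TRAILS, ALPHA, 2))
```

Each round re-evaluates the full objective for every candidate, which is simple but costly; because the objective is submodular, marginal gains only shrink as the set grows, so lazy evaluation is a common speed-up.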

Algorithms • However… • Inhomogeneous results: QLs for ea.com are • fifa08.ea.com • battlefield.ea.com • 6 webpages deep inside thesim2.ea.com • [Callout: Two games made by EA] • Redundant results: QLs for senate.gov include • obama.senate.gov • obama.senate.gov/about • obama.senate.gov/contact • obama.senate.gov/votes • [Callout: Parent URL makes the child URLs redundant]

Algorithms • Both can be specified as pairwise constraints on URLs allowed to belong to a QL set • Pairwise-constrained QL selection is NP-hard • Two-step process: • Heuristically find a large subset of trails that form a tree • Enforce constraints on tree • Dynamic program → optimal on tree

Experiments • Baseline Methods • TopClicked: • URL score = # search clicks on URL • TopVisited: • URL score = # occurrences on toolbar trails • PageRank: • Build a weighted graph on URLs, where weight(i,j) = # trails using the i→j edge • URL score = PageRank on this graph
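
Two of these baselines can be sketched directly from trail data. This is a rough sketch under stated assumptions: the trails are hypothetical, the damping factor 0.85 and the uniform handling of dangling nodes are conventional choices not given on the slide, and TopClicked is omitted because it needs search click logs rather than trails.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Trail = Tuple[str, ...]

# Hypothetical toolbar trails (trail, number of times observed).
TRAILS: List[Tuple[Trail, int]] = [
    (("nasa.gov", "hubble", "photos"), 2),
    (("nasa.gov", "hubble", "history"), 1),
    (("nasa.gov", "missions", "hubble"), 2),
    (("nasa.gov", "news"), 1),
    (("nasa.gov", "kids"), 1),
]


def top_visited(trails) -> Counter:
    """TopVisited score: number of occurrences of each URL on the trails."""
    counts: Counter = Counter()
    for trail, n in trails:
        for url in trail:
            counts[url] += n
    return counts


def edge_weights(trails) -> Dict[Tuple[str, str], float]:
    """weight(i, j) = number of trails that use the i -> j edge."""
    weights: Dict[Tuple[str, str], float] = defaultdict(float)
    for trail, n in trails:
        for src, dst in zip(trail, trail[1:]):
            weights[(src, dst)] += n
    return weights


def pagerank(weights, damping: float = 0.85, iterations: int = 100) -> Dict[str, float]:
    """Weighted PageRank by power iteration; dangling mass is spread uniformly."""
    nodes = sorted({u for edge in weights for u in edge})
    out_total: Dict[str, float] = defaultdict(float)
    for (src, _), w in weights.items():
        out_total[src] += w
    rank = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if out_total[u] == 0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {u: base for u in nodes}
        for (src, dst), w in weights.items():
            new_rank[dst] += damping * rank[src] * w / out_total[src]
        rank = new_rank
    return rank


visits = top_visited(TRAILS)
ranks = pagerank(edge_weights(TRAILS))
print(visits.most_common(3))
print(sorted(ranks, key=ranks.get, reverse=True)[:3])
```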

Experiments • Live Traffic dataset • Computed CTRs on QLs currently displayed by Yahoo! (1043 website subset) • Measure: • Pick two equal-sized subsets of QLs • Use sum-of-scores and sum-of-CTRs to predict the better subset • Measure how often the predictions match
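
The agreement measure can be illustrated with a small Python sketch. It assumes that every pair of distinct equal-sized subsets is compared and that a tie counts as agreement only when both sums tie; the slide does not say how pairs are sampled or how ties are handled. The scores and CTRs below are made up.

```python
from itertools import combinations
from typing import Dict

# Hypothetical method scores and live-traffic CTRs for four displayed QLs.
SCORES: Dict[str, float] = {"a": 4, "b": 3, "c": 2, "d": 1}
CTRS: Dict[str, float] = {"a": 0.4, "b": 0.1, "c": 0.3, "d": 0.2}


def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def agreement(scores: Dict[str, float], ctrs: Dict[str, float], size: int) -> float:
    """Fraction of subset pairs where sum-of-scores and sum-of-CTRs pick the same winner."""
    subsets = list(combinations(sorted(scores), size))
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two distinct subsets of the given size")
    agree = 0
    for first, second in pairs:
        by_score = sign(sum(scores[u] for u in first) - sum(scores[u] for u in second))
        by_ctr = sign(sum(ctrs[u] for u in first) - sum(ctrs[u] for u in second))
        agree += by_score == by_ctr
    return agree / len(pairs)


print(agreement(SCORES, CTRS, 1))
```

For subsets of size 1, the pairs (b, c) and (b, d) disagree here, so 4 of the 6 pairs agree.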

Experiments: Live Traffic Data • [Figure: fraction of subset-pairs where predictions agree with live traffic, plotted against subset size] • QL-ALG > TopVisited > PageRank > TopClicked

Experiments • Tree-structured trails • Most dropped trails are very short • Tree-structured trails improve accuracy • [Figure: Distribution of dropped trails; x-axis: length of trail (1 to 10000); y-axis: number of trails dropped (0 to 100)] • [Figure: Live Traffic prediction quality comparison]

Conclusions • Proposed a formulation for the QL selection problem • Both toolbar and search logs are used intuitively • Proposed two algorithms: • Greedy: (1-1/e)-optimal • Tree-structured: empirically better • Improvement of 22% over competing baselines
