Enhancing Parametric Search with Continuous Querying for Interactive Exploration
Explore a new search paradigm that merges searching and browsing to enable interactive exploration. The talk covers problems with the current paradigm, an architecture overview, implementation details, rendering methods, and additional features such as ranking and fuzzy restrictions.
Searching E-Commerce Data J. Shafer, R. Agrawal : WWW-9
Outline • Current parametric-search paradigm • New paradigm • Implementation details
Current Paradigm • User enters search criteria into HTML form • User’s query is received by web-server and submitted to a server-side database (usually as SQL) • Result set is returned to user as an HTML page (or a series of pages) • Examples: Schwab, Expedia
Problems • Users often don’t know exactly what they are looking for • Unfamiliar with domain and/or database • Unable to compose precise queries • Too many/few results: try different query • Each query change must be sent back to the server and evaluated (often as a new query)
Source of Problems • Database query technology is targeted at reporting rather than exploration • The query itself is the goal • The results are “interesting” regardless of their size • In user exploration, the end goal is typically to find one or two particular records of interest • Search is a process, not a single operation • An individual query is simply a means to an end
What is needed? • Combine searching with browsing • Replace submit/response metaphor with “continuous” querying that allows interactive exploration
Observations • Data must be cached on client side • There is always a notion of state • There is only one mouse • User can only see those records currently displayed on screen
Architecture Overview • [Diagram: client and server sides. Components labeled Eureka, ListRenderer, DataPump, DataGroup, DataColumn #1 ... DataColumn #N, Servlet, and Database; connections labeled HTTP and JDBC]
Example Dataset: Used Cars • Columns: Make, Distance • [Diagram: table with Make and Distance columns, a rid label, and the DataPump]
Numeric DataColumns • Example column: Distance • [Diagram: the column's Data array and RID List, with a rid label]
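The slide only names the parts of a numeric column (Data, RID List, rid). One plausible reading, offered as a minimal sketch rather than Eureka's actual code, is that the Data array is indexed by record id and the RID List holds the same record ids ordered by value. The class name `NumericColumnSketch` and its fields are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of a numeric DataColumn. Assumption: data[rid] holds the
// value of record rid, and ridList holds the rids ordered by ascending value.
final class NumericColumnSketch {
    final double[] data;   // data[rid] = value of that record
    final int[] ridList;   // record ids sorted by ascending value

    NumericColumnSketch(double[] data) {
        final double[] values = data.clone();
        this.data = values;
        Integer[] boxed = new Integer[values.length];
        for (int i = 0; i < boxed.length; i++) boxed[i] = i;
        // Arrays.sort on objects is stable, so equal values keep ascending rid order.
        Arrays.sort(boxed, Comparator.comparingDouble((Integer rid) -> values[rid]));
        this.ridList = new int[boxed.length];
        for (int i = 0; i < boxed.length; i++) this.ridList[i] = boxed[i];
    }
}
```

With such a layout, sorting the display by Distance amounts to walking `ridList` in order, and a range restriction maps to a contiguous slice of it.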
Categorical DataColumns • Example column: Make • [Diagram: the column's Data array and RID List, with a rid label] • hashtable entries: value: Ford, count: 3, index: 4 • value: Honda, count: 4, index: 7 • value: Chrysler, count: 2, index: 2 • value: BMW, count: 2, index: 0
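The hashtable entries on this slide are consistent with a RID List sorted by value, where `index` is the position at which a value's run starts and `count` is the length of that run (BMW at 0-1, Chrysler at 2-3, Ford at 4-6, Honda at 7-10). The sketch below builds such a table under that assumption; `CategoricalColumnSketch` and `Entry` are hypothetical names, not Eureka's.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a categorical DataColumn. Assumption: the RID list is
// sorted by value, so every distinct value occupies one contiguous run.
final class CategoricalColumnSketch {
    static final class Entry {
        final int count;   // number of records carrying this value
        final int index;   // start of the value's run in ridList
        Entry(int count, int index) { this.count = count; this.index = index; }
    }

    final String[] data;                              // data[rid] = value of that record
    final int[] ridList;                              // record ids sorted by value
    final Map<String, Entry> table = new HashMap<>(); // value -> (count, index)

    CategoricalColumnSketch(String[] data) {
        final String[] values = data.clone();
        this.data = values;
        Integer[] boxed = new Integer[values.length];
        for (int i = 0; i < boxed.length; i++) boxed[i] = i;
        Arrays.sort(boxed, Comparator.comparing((Integer rid) -> values[rid]));
        this.ridList = new int[boxed.length];
        for (int i = 0; i < boxed.length; i++) this.ridList[i] = boxed[i];

        // Walk the sorted list once and record where each run starts and how long it is.
        int start = 0;
        while (start < ridList.length) {
            String value = values[ridList[start]];
            int end = start;
            while (end < ridList.length && values[ridList[end]].equals(value)) end++;
            table.put(value, new Entry(end - start, start));
            start = end;
        }
    }
}
```

Feeding it eleven records with 3 Fords, 4 Hondas, 2 Chryslers, and 2 BMWs (the record order is made up) reproduces the count and index pairs shown on the slide.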
ListRenderer • Only paint as many rows as fit on the canvas • Repaint canvas whenever: • scrollbar position changes • sort order changes • records appear/disappear from query results • Restrictions array indicates whether or not a particular row should be painted (count != 0)
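The painting rule above can be sketched as a single pass over the current sort order. This is an illustration under stated assumptions, not the talk's implementation: it assumes each Restrictions entry counts how many restrictions currently exclude a record, so a row is painted only when its count is zero (the slide's "(count != 0)" does not spell out which way the test goes). `ListRendererSketch`, `rowsToPaint`, and the parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pick the record ids that should be painted on the canvas.
final class ListRendererSketch {
    /**
     * @param ridList      record ids in the current sort order
     * @param restrictions restrictions[rid] = number of restrictions excluding rid (assumed)
     * @param scrollPos    number of qualifying rows scrolled past
     * @param rowsThatFit  number of rows the canvas can show
     */
    static List<Integer> rowsToPaint(int[] ridList, int[] restrictions,
                                     int scrollPos, int rowsThatFit) {
        List<Integer> rows = new ArrayList<>();
        int skipped = 0;
        for (int i = 0; i < ridList.length && rows.size() < rowsThatFit; i++) {
            int rid = ridList[i];
            if (restrictions[rid] != 0) continue;               // filtered out of the result
            if (skipped < scrollPos) { skipped++; continue; }   // above the visible window
            rows.add(rid);
        }
        return rows;
    }
}
```

Because the loop stops as soon as the canvas is full, the cost of a repaint depends on the window size and the rows skipped, not on the total number of records.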
Numeric Restrictions: max(distance) = 100 • [Diagram: Restrictions array, RID List, and Data, with lowerIndex and upperIndex markers]
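A restriction such as max(distance) = 100 could be applied as follows. This is a sketch under assumptions: the RID List is sorted by value, the restriction is tracked as a boundary position in that list, and each Restrictions entry counts the restrictions excluding a record. Only the upper bound is shown; a lower bound would be symmetric. `NumericRestrictionSketch` and `setMax` are hypothetical names.

```java
// Hypothetical sketch of a numeric range restriction over a value-sorted RID list.
final class NumericRestrictionSketch {
    final double[] data;        // data[rid] = value of that record
    final int[] ridList;        // record ids sorted by ascending value
    final int[] restrictions;   // restrictions[rid] = number of restrictions excluding rid
    int upperIndex;             // first position in ridList excluded by the max bound

    NumericRestrictionSketch(double[] data, int[] ridList, int[] restrictions) {
        this.data = data;
        this.ridList = ridList;
        this.restrictions = restrictions;
        this.upperIndex = ridList.length;   // no bound yet, nothing excluded
    }

    void setMax(double max) {
        // Binary search for the first position whose value exceeds max.
        int lo = 0, hi = ridList.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (data[ridList[mid]] <= max) lo = mid + 1; else hi = mid;
        }
        int newUpper = lo;
        // Bound tightened: records between the new and old boundary become excluded.
        for (int i = newUpper; i < upperIndex; i++) restrictions[ridList[i]]++;
        // Bound loosened: records between the old and new boundary are re-admitted.
        for (int i = upperIndex; i < newUpper; i++) restrictions[ridList[i]]--;
        upperIndex = newUpper;
    }
}
```

The design point is that moving the bound touches only the records between the old and new boundary, which is what makes per-keystroke or per-slider-tick updates cheap on the client.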
Categorical Restrictions: Make != { Ford } • [Diagram: Restrictions array, RID List, and Data] • hashtable entries: value: Ford, count: 3, index: 4 • value: Honda, count: 4, index: 7 • value: Chrysler, count: 2, index: 2 • value: BMW, count: 2, index: 0
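For a restriction such as Make != { Ford }, the hashtable entry gives the run of record ids to update directly. The sketch below assumes the same conventions as the slide's entries (a value's records sit at positions index through index + count - 1 of the RID List) and that each Restrictions entry counts the restrictions excluding a record. `CategoricalRestrictionSketch`, `Entry`, `exclude`, and `include` are hypothetical names.

```java
import java.util.Map;

// Hypothetical sketch of excluding or re-admitting one categorical value.
final class CategoricalRestrictionSketch {
    static final class Entry {
        final int count;   // number of records carrying the value
        final int index;   // start of the value's run in the RID list
        Entry(int count, int index) { this.count = count; this.index = index; }
    }

    // Exclude every record carrying the given value (e.g. Make != { Ford }).
    static void exclude(String value, Map<String, Entry> table,
                        int[] ridList, int[] restrictions) {
        adjust(value, table, ridList, restrictions, +1);
    }

    // Undo a previous exclusion of the given value.
    static void include(String value, Map<String, Entry> table,
                        int[] ridList, int[] restrictions) {
        adjust(value, table, ridList, restrictions, -1);
    }

    private static void adjust(String value, Map<String, Entry> table,
                               int[] ridList, int[] restrictions, int delta) {
        Entry e = table.get(value);
        if (e == null) return;   // value not present in the column
        for (int i = e.index; i < e.index + e.count; i++) {
            restrictions[ridList[i]] += delta;
        }
    }
}
```

Excluding Ford with the slide's entry (count 3, index 4) touches exactly three records, regardless of how many records the column holds.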
Rendering the List (sorted by Distance) • [Diagram: Restrictions array, RID List, and Data, with the ListRenderer and a rid label]
Rendering the List (sorted by Make) • [Diagram: Restrictions array, RID List, and Data, with the ListRenderer and a rid label] • hashtable entries: value: Ford, count: 3, index: 4 • value: Honda, count: 4, index: 7 • value: Chrysler, count: 2, index: 2 • value: BMW, count: 2, index: 0
Additional features • Restriction by example • Ranking • Fuzzy restrictions
Final Comments • Used Eureka in several situations with very positive feedback • Used Eureka on datasets with 100K records with no visible deterioration in performance • Performance is excellent, even in Java