User Interaction

CS511 Presentation: Dec-02-2005. Jesus Alvarez, Jianwen Lai, James Lin, Long Vu.

Presentation Transcript


  1. User Interaction • CS511 Presentation: Dec-02-2005 • Jesus Alvarez, Jianwen Lai, James Lin, Long Vu

  2. Road Map • Introduction & Motivation • Application scenarios for Online Query Processing • CONTROL online query processing • Implementation • Study of CONTROL • Summary • Future work

  3. Road Map • Introduction & Motivation • Traditional user interaction • Drawbacks of Current Tech • Batch vs. Online Processing • CONTROL project introduction • Application scenarios for Online Query Processing • CONTROL online query processing • Implementation • Study of CONTROL • Summary • Future work

  4. Introduction • Data Analysis in OLAP • To know so much and have control over nothing • Data storage increases fast • Complex process involving multiple time-consuming steps, flawed human-computer interaction • Common Properties of Data Analysis • Performance: time-consuming • Non-trivial techniques • Multi-step process

  5. Traditional User Interaction • Properties • Black box: user inputs -> wait -> outputs • Batch process: long waiting time • Static charts: no changes in processing • Tools and Approaches • User-driven SQL and OLAP systems • Machine-automated data mining • Iteration of analysis with natural human mode of interaction • Hybrid approaches • Algorithms • Optimized completion time

  6. Trends in User Interaction with Data Visualization • Dynamic, online visualizations and user interaction • Chart Types • Level of User Interaction with Visualization • Size and Complexity of Data Structures Represented by Visualization • OLAP with Style Report (SR) • Traditional OLAP software is separated from reporting • SR’s OLAP analysis is integrated, built right inside the SR reporting engine • Minimum IT support

  7. Data Visualization Life-Cycle Stages • Three life-cycle stages • Maturing • Evolving • Emerging

  8. Sample User-Interactive OLAP Interface: Step 1 • Drill down by clicking the blue text to move to the next slide

  9. Sample User-Interactive OLAP Interface: Step 2

  10. Drawbacks of Current Technology • Only exact answers are available • A losing proposition as data volume grows • Hardware improvements not sufficient • Interactive systems fail on massive data • E.g., spreadsheet programs (64K-row limit) • DBMS not interactive • No user feedback or control (“back to the 60’s”) • Long processing times • Fundamental mismatch with preferred modes of HCI • OLAP: a partial solution • Can’t handle large data sets

  11. Goals for Online Processing • New “greedy” performance regime • Need data defined on-the-fly • Therefore need FEEDBACK and CONTROL • [Figure: answer quality (up to 100%) versus Time, with curves labeled Online and Traditional]

  12. Batch vs. On-Line Processing • Batch Processing • Gives 100% accurate answers, but users must wait for the entire query to finish • On-Line Processing • Gives progressively refined answers as the query runs • Allows users to control processing • Applications of On-Line Processing • Large, ad-hoc queries in domains where approximate answers are acceptable (“big picture”)

  13. CONTROL • Continuous Output and Navigation Technology with Refinement On Line • Online Query Processing: Interactive behavior for SQL aggregation queries. • CLOUDS: On-Line Interactive Visualization and Exploration. • Potter's Wheel: An interactive framework for data analysis, cleaning and transformation. • CARMA: Continuous Association Rule Mining Algorithm • At UC Berkeley, led by Prof. Hellerstein: 8 people, 12 papers, 6 of them about Online Query Processing

  14. CONTROL • Interactive Data Analysis • Online mode: the user can control the system at all times • A crystal ball: “see into” online processing and use that information to change the processing • Balanced performance goals • Minimize uneventful “dead time” between updates for the user • Simultaneously maximize the rate at which partial/approximate answers approach the correct answer

  15. Data Warehouse • The World's Biggest Data Warehouses
      Rank | Organization | Data warehouse size (GB) | Vendor
      1 | France Telecom | 29,232 | Oracle
      2 | AT&T Labs | 26,269 | AT&T
      3 | SBC | 24,805 | Teradata
      4 | Anonymous | 16,191 | IBM
      5 | Amazon.com | 13,001 | Oracle
      6 | Kmart | 12,592 | Teradata
      7 | Claria | 12,100 | Oracle
      8 | Health Insurance Review Agency | 11,942 | Sybase
      9 | FedEx | 9,981 | Teradata
      10 | Vodafone | 9,108 | Teradata
      • Vendors • Oracle - 100 terabytes, the world’s largest DW, on Sep. 14, 2005 – Yahoo! • NCR Teradata • Sybase • AT&T • Netezza • IBM Informix - NOT really! • Microsoft - NOT at all! • Source: Winter's TopTen Program

  16. Online Query Processing in Informix • Online Query Processing • Issue a SQL query with online processing ability • See results immediately • Adjust processing as the query runs • Algorithm • For a stream of output • Cases • Aggregation query • Enumeration query

  17. Road Map • Introduction & Motivation • Application scenarios for Online Query Processing • Aggregation • Visualization • Enumeration • Data mining • CONTROL online query processing • Implementation • Study of CONTROL • Summary • Future work

  18. Online Aggregation • SELECT AVG(GPA) FROM Colleges GROUP BY College • Additional features: speed up, slow down, terminate
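A minimal sketch of how a running per-group AVG with an error bound could be maintained as tuples arrive. The class name `RunningGroupAvg`, the normal-approximation interval with z = 1.96, and the assumption that tuples arrive in random order are illustrative choices, not details taken from the CONTROL implementation.

```python
import math
from collections import defaultdict


class RunningGroupAvg:
    """Running AVG per group with an approximate confidence half-width.

    Hypothetical helper: assumes tuples arrive in random order, so the
    tuples seen so far behave like a random sample of each group.
    """

    def __init__(self, z=1.96):
        self.z = z                       # ~95% under a normal approximation
        self.n = defaultdict(int)        # tuples seen per group
        self.mean = defaultdict(float)   # running mean per group
        self.m2 = defaultdict(float)     # sum of squared deviations per group

    def add(self, group, value):
        # Welford's update keeps mean and variance in one pass.
        self.n[group] += 1
        delta = value - self.mean[group]
        self.mean[group] += delta / self.n[group]
        self.m2[group] += delta * (value - self.mean[group])

    def estimate(self, group):
        """Return (running average, half-width of the interval)."""
        n = self.n[group]
        if n < 2:
            return self.mean[group], float("inf")
        variance = self.m2[group] / (n - 1)
        return self.mean[group], self.z * math.sqrt(variance / n)


if __name__ == "__main__":
    agg = RunningGroupAvg()
    for college, gpa in [("Eng", 3.0), ("Eng", 4.0), ("Arts", 3.5)]:
        agg.add(college, gpa)
        print(college, agg.estimate(college))
```

The interval shrinks as more tuples of a group are seen, which is what lets a user stop the query once the estimate is tight enough.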

  19. On-Line Visualization: CLOUDS • CLOUDS displays an approximation of an image based on the data while the data is being fetched • [Figure panels: Conventional Algorithm, CLOUDS Algorithm, CLOUDS (with Index)] • Note that CLOUDS predicts the high density of cities in the Midwest

  20. Online Enumeration • Potter’s Wheel • Scalable spreadsheet • A fraction of data is materialized in GUI • Scrolling = preference for data delivery • Permit “fuzzy” querying • Interactive data cleaning • Online structure and discrepancy (error) detection

  21. Scalable Spreadsheets

  22. Visual Transformation Shot • Changes are applied to columns while the query is processed online

  23. Data mining • Visualize immediate results in mining process • Display results of data mining operations: roll up, drill down, dice, etc • Decrease time and space complexity: user may change their query based on immediate results

  24. Road Map • Introduction & Motivation • Application scenarios for Online Query Processing • CONTROL online query processing • Sampling (one relation) • Re-ordering (one relation) • Ripple join (multiple relations) • Implementation • Study of CONTROL • Summary • Future work

  25. Sampling • Motivation • In most scenarios, users like to see representative results ASAP • It is suitable for aggregation: AVG, SUM, etc. • Granularity of sample • Instance-level (row-level): high I/O cost • Block-level (page-level): high variability from clustering • Type of sample • Often simple random sample • Especially for on-the-fly • With/without replacement usually not critical • Data structure from which to sample • Files or relational tables • Indexes (B+ trees, etc.)
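The granularity trade-off above can be sketched as follows. The table is modeled as a list of pages (each a list of rows); the function names and the `pages_read` counter are hypothetical and only stand in for I/O cost.

```python
import random


def row_level_sample(pages, n, rng):
    """Pick n rows uniformly at random; report how many pages were touched."""
    positions = [(p, i) for p, page in enumerate(pages) for i in range(len(page))]
    picked = rng.sample(positions, n)
    rows = [pages[p][i] for p, i in picked]
    pages_read = len({p for p, _ in picked})   # proxy for I/O cost
    return rows, pages_read


def block_level_sample(pages, n_pages, rng):
    """Pick whole pages at random; cheap I/O, but rows on a page may be similar."""
    picked = rng.sample(range(len(pages)), n_pages)
    rows = [row for p in picked for row in pages[p]]
    return rows, n_pages


if __name__ == "__main__":
    # A clustered table: page i holds the values 10*i .. 10*i + 9.
    table = [[10 * i + j for j in range(10)] for i in range(100)]
    rng = random.Random(0)
    print(row_level_sample(table, 20, rng)[1], "pages read for 20 rows (row-level)")
    print(block_level_sample(table, 2, rng)[1], "pages read for 20 rows (block-level)")
```

With a clustered table, the block-level sample reads far fewer pages for the same number of rows, but its rows come from only a few clusters, which is the variability the slide refers to.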

  26. Sampling (cont’d) • In certain scenarios, random sampling amounts to simply scanning a table • Informix: guarantees random delivery by storing tables in random order • Drawbacks: • Every scan of a table generates samples in the same order • Storing tables in random order makes the RDBMS harder to manage over time

  27. Online Reordering • [Figure: pipeline diagram with relations R, S, T and stages join, transmit, disk, produce, reorder, process, consume] • Deliver “interesting” items first • User can get satisfactory results first • “Interesting” determined on the fly: users can change their preferences by selecting new items • Exploit the rate gap between produce and process/consume
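A simplified sketch of the reordering idea: a buffer sits between a fast producer and a slower consumer and hands out the buffered item whose group currently has the highest user preference. The `ReorderBuffer` class and its strict-priority policy are assumptions made for illustration; the slides do not describe the actual delivery policy.

```python
from collections import defaultdict, deque


class ReorderBuffer:
    """Hypothetical reorder operator: buffer by group, deliver by preference."""

    def __init__(self, preferences):
        self.preferences = dict(preferences)   # group -> weight, changeable on the fly
        self.buffers = defaultdict(deque)      # group -> items produced but not yet consumed

    def put(self, group, item):
        # Called by the producer, which runs ahead of the consumer.
        self.buffers[group].append(item)

    def set_preference(self, group, weight):
        # Called when the user selects a new "interesting" group.
        self.preferences[group] = weight

    def get(self):
        # Called by the consumer: take from the most preferred non-empty group.
        candidates = [g for g, items in self.buffers.items() if items]
        if not candidates:
            return None
        best = max(candidates, key=lambda g: self.preferences.get(g, 0.0))
        return best, self.buffers[best].popleft()


if __name__ == "__main__":
    buf = ReorderBuffer({"A": 1.0, "B": 5.0})
    for group, item in [("A", 1), ("B", 2), ("A", 3)]:
        buf.put(group, item)
    print(buf.get())              # group B is preferred at this point
    buf.set_preference("A", 10.0)
    print(buf.get())              # the user now prefers group A
```

The buffer only helps because the producer is faster than the consumer; that rate gap is what gives the operator a choice of items to deliver.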

  28. Ripple join algorithm • Online requirements • Never wait for anything to finish • Display immediate results • Interact with users • Joins and pipelining • Sort-merge join is blocking • Hybrid hash join generates output, but only for a fraction of its input • Nested-loop join is pipelined but slow (especially when the inner relation is huge)

  29. Ripple join algorithm • [Figure: Traditional Nested Loops, showing the order in which tuples of R and S are combined]

  30. Ripple join algorithm • Designed for online performance goals • Completely pipelined • Adapt to data characteristics • Symmetric join • Simplest version • Read new tuples s from S and r from R • Join r and s • Join r with old S tuples • Join s with old R tuples
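The simplest version listed above can be sketched as a generator. This is an in-memory illustration under the assumption that both inputs are plain iterables; the function name and the `predicate` parameter are hypothetical, and handling of inputs of unequal length is an addition not spelled out on the slide.

```python
_END = object()  # sentinel marking an exhausted input


def ripple_join(R, S, predicate):
    """Yield matching (r, s) pairs, reading one new tuple from each input per step."""
    old_r, old_s = [], []
    r_iter, s_iter = iter(R), iter(S)
    while True:
        r = next(r_iter, _END)
        s = next(s_iter, _END)
        if r is _END and s is _END:
            return
        # Join the new r with the new s.
        if r is not _END and s is not _END and predicate(r, s):
            yield (r, s)
        # Join the new r with old S tuples.
        if r is not _END:
            for s_old in old_s:
                if predicate(r, s_old):
                    yield (r, s_old)
        # Join the new s with old R tuples.
        if s is not _END:
            for r_old in old_r:
                if predicate(r_old, s):
                    yield (r_old, s)
        # The new tuples become "old" for the next step.
        if r is not _END:
            old_r.append(r)
        if s is not _END:
            old_s.append(s)


if __name__ == "__main__":
    for pair in ripple_join([1, 2, 3], [2, 3, 4, 1], lambda r, s: r == s):
        print(pair)
```

Every pair of tuples is examined exactly once, and results start flowing after the first step, so a running aggregate over the join output can be refreshed continuously.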

  31. Block Ripple Joins (Size = 2) • [Figure: grid over R and S showing pairs joined block by block]

  32. Rectangular Ripple Join • [Figure: grid over R and S showing pairs joined at different rates per relation]
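The block and rectangular variants pictured on these two slides can be sketched with one generator that reads a fixed number of new tuples from each input per step. The parameter names `beta_r` and `beta_s` are hypothetical; equal values greater than one correspond to a block ripple join, unequal values to a rectangular one.

```python
from itertools import islice


def rectangular_ripple_join(R, S, predicate, beta_r=1, beta_s=1):
    """Ripple join that reads beta_r tuples of R and beta_s tuples of S per step."""
    old_r, old_s = [], []
    r_iter, s_iter = iter(R), iter(S)
    while True:
        new_r = list(islice(r_iter, beta_r))
        new_s = list(islice(s_iter, beta_s))
        if not new_r and not new_s:
            return
        # New R tuples against every S tuple seen so far, including the new ones.
        for r in new_r:
            for s in old_s + new_s:
                if predicate(r, s):
                    yield (r, s)
        # New S tuples against old R tuples only (new-with-new was handled above).
        for s in new_s:
            for r in old_r:
                if predicate(r, s):
                    yield (r, s)
        old_r.extend(new_r)
        old_s.extend(new_s)


if __name__ == "__main__":
    # Block ripple join with block size 2.
    for pair in rectangular_ripple_join("abcd", "abxy", lambda r, s: r == s, 2, 2):
        print(pair)
```

Choosing different rates lets the join draw more tuples from the relation that contributes more to the uncertainty of the running estimate.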

  33. Related Work on Online QP • Morgenstein’s PhD, Berkeley ’80 • Online Association Rules • Ng, et al.’s CAP, SIGMOD ’98 • Hidber’s CARMA, SIGMOD ’99 • Implications for deductive DB semantics • Monotone aggregation in LDL++, Zaniolo and Wang • Online agg with subqueries • Tan, et al. VLDB ’99 • Dynamic Pipeline Scheduling • Urhan/Franklin VLDB ’01 • Pipelining Hash Joins • Raschid, Wilschut/Apers, Tukwila, Xjoin • Relation to semi-naive evaluation • Anytime Algorithms • Zilberstein, Russell, et al.

  34. Road Map • Introduction & Motivation • Application scenarios for Online Query Processing • CONTROL online query processing • Implementation • Study of CONTROL • Summary • Future work

  35. Implementation • Client-server interfaces • APIs must support interactivity between client and server • Output must go beyond relational results • Users should be able to provide inputs while the query is running • Output API • Make cursor available instantly • Allow multiple tuples for the same data (convergence to the confidence level) • Input API • User can speed up, slow down or pause groups of records • Pacing • Consider fast producer (server), slow consumer (client) • Skip factor k: the number of tuples after which estimates are updated
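A small sketch of pacing with a skip factor: the server folds in every tuple but sends an updated estimate to the client only after every k-th tuple. The function name `paced_estimates` and the extra final update at end of input are assumptions for illustration.

```python
def paced_estimates(values, k):
    """Yield (tuples_seen, running_average) after every k-th tuple."""
    if k < 1:
        raise ValueError("skip factor k must be at least 1")
    total = 0.0
    count = 0
    for count, value in enumerate(values, start=1):
        total += value
        if count % k == 0:
            yield count, total / count
    # Send one last update if the input ended between two scheduled updates.
    if count and count % k != 0:
        yield count, total / count


if __name__ == "__main__":
    for seen, avg in paced_estimates([1, 2, 3, 4, 5], k=2):
        print(seen, avg)
```

A larger k means fewer updates for a slow client to render, at the cost of a longer gap between refreshes.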

  36. Implementation • Online Query Operators • Informix is divided into a physical storage manager (RSAM) and a logical query optimizer/execution engine • Online operators are implemented above RSAM to reduce complexity • Online Reorder • Index Stride would be more efficient if it read a block of tuples, but the RSAM interface is tuple-at-a-time • Ripple Join • Multiple scans assume tuples arrive in the same order • A cache is implemented to replay inputs in the same order • Online queries end quickly, so the cache should not exceed memory

  37. Implementation • Constructing Online Query Plans • Minimize time to “reasonable” accuracy, not time to completion • Access Method Selection • Sequential scan vs. Index Stride • If there is a clustered index on the GROUP BY column, use Index Stride; otherwise use a sequential scan with online reorder • Join Ordering • Ripple joins are symmetric, so only the aspect ratio needs tuning • Find a query plan as an RDBMS would, then insert reorder operators and convert some joins to ripple joins

  38. Implementation

  39. Implementation • Beyond simple select-project-join queries • ORDER BY, HAVING • Implemented on the client instead of the server • The client must filter groups using the HAVING clause, because during the online query a group may or may not meet the clause at any given time • SELECT college, AVG(grade) FROM enroll GROUP BY college HAVING AVG(grade) > 3.0; • Subqueries and other expensive predicates • The current CONTROL implementation does not address online processing with subqueries or user-defined functions
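A sketch of why HAVING is re-evaluated on the client: as running averages move, a group can enter or leave the result between refreshes. The function name `online_having` and the sample rows are hypothetical; the threshold mirrors the query on this slide.

```python
from collections import defaultdict


def online_having(rows, threshold=3.0):
    """After each (college, grade) row, yield the groups whose running AVG passes."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for college, grade in rows:
        sums[college] += grade
        counts[college] += 1
        # The client re-applies HAVING AVG(grade) > threshold to the current estimates.
        averages = {c: sums[c] / counts[c] for c in sums}
        yield {c: avg for c, avg in averages.items() if avg > threshold}


if __name__ == "__main__":
    stream = [("Eng", 3.5), ("Eng", 2.0), ("Arts", 3.2), ("Eng", 3.9)]
    for snapshot in online_having(stream):
        print(sorted(snapshot))
```

In this made-up stream the group "Eng" qualifies, drops out, and later qualifies again, so the server cannot discard it early without risking a wrong final answer.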

  40. Road Map • Introduction & Motivation • Application scenarios for Online Query Processing • CONTROL online query processing • Implementation • Study of CONTROL • Summary • Future work

  41. Study of CONTROL in Action • Test CONTROL algorithms on sample real-life scenarios • Use “chronons” (time units) to avoid Informix comparisons • Scenario: analyst needs the average price of all orders • SELECT ONLINE AVG(o.totalprice), CONFIDENCE_AVG(o.totalprice) FROM order o; • With a 276 MB table, CONTROL produced results with 2% confidence within 12 chronons. A non-online version would take 1490 chronons (124X more!)

  42. Road Map • Introduction & Motivation • Application scenarios for Online Query Processing • CONTROL online query processing • Implementation • Study of CONTROL • Summary • Future work

  43. Summary • Data analysis is time-consuming, involves non-trivial techniques, and is a complex multi-step process • Traditional user interaction is a black box, a batch process, and static • The trend is toward more dynamic, online visualizations and user interaction • CONTROL algorithms are suitable for Online Query Processing

  44. Summary • Some grouping and aggregation predicates are better processed at the client instead of the server • Test results show online queries can produce useful results 75 to 600 times faster than relational queries • Online reordering allows users to change query priorities “on the fly” • Many open issues remain for future work

  45. Road Map • Introduction & Motivation • Application scenarios for Online Query Processing • CONTROL online query processing • Implementation • Study of CONTROL • Summary • Future work

  46. Future Work • Extending an SQL engine to provide online query processing for complex queries • The work is pertinent outside the specific domain of online query processing: APIs, new access methods, composing algorithms into large query plans • Interface: online enumeration and visualization

  47. Looking Forward: Adaptive Systems • [Figure: cycle of Observe Environment, Make Decision, Act] • Observation/Decision • Already critically important in today’s systems

  48. Thanks for listening • Some slides taken from Prof. Hellerstein