CONTROL:

CONTROL: Continuous Output and Navigation Technology with Refinement On-Line. CONTROL group: Joe Hellerstein, Ron Avnur, Christian Hidber, Bruce Lo, Chris Olston, Vijayshankar Raman, Tali Roth, Kirk Wylie, UC Berkeley.


Presentation Transcript


  1. CONTROL: Continuous Output and Navigation Technology with Refinement On-Line
     CONTROL group: Joe Hellerstein, Ron Avnur, Christian Hidber, Bruce Lo, Chris Olston, Vijayshankar Raman, Tali Roth, Kirk Wylie, UC Berkeley

  2. Batch vs. On-Line Processing
     • Batch Processing
       • Gives 100% accurate answers, but users must wait for the entire query to finish...
     • On-Line Processing
       • Gives progressively refining answers as the query runs!
       • Allows users to control processing.
     • Applications of On-Line Processing
       • Large, ad-hoc queries in domains where approximate answers are acceptable ("big picture")

  3. Demo Outline
     • On-Line Aggregation
       • Refining estimates
       • Statistics give confidence
     • User Control
       • The user can speed up the processing of certain groups
       • The user can stop the processing at any time
     • On-Line Visualization
       • Displays an approximation of an image based on data while the data is being fetched
       • Shows the estimated density and distribution of data
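The "refining estimates" and "statistics give confidence" items can be illustrated with a small sketch. It is not the CONTROL implementation: it assumes the tuples arrive in random order, uses a plain central-limit-theorem interval with no finite-population correction, and the names `online_avg` and `report_every` are made up for this example.

```python
import math
import random
from statistics import NormalDist


def online_avg(values, confidence=0.95, report_every=1000):
    """Yield (n, running_mean, half_width) while consuming randomly ordered values.

    Illustrative only: the interval is a normal approximation and assumes
    the input order is a random permutation of the table.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # Welford's running sum of squared deviations
        if n > 1 and n % report_every == 0:
            std_err = math.sqrt(m2 / (n - 1) / n)
            yield n, mean, z * std_err


random.seed(0)
data = [random.gauss(50.0, 10.0) for _ in range(20000)]
random.shuffle(data)  # stand-in for randomly delivered data
estimates = list(online_avg(data))
```

Because `online_avg` is a generator, a caller can stop consuming it as soon as the interval is narrow enough, which mirrors the "stop the processing at any time" control.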

  4. On-Line Agg.: Query Processing
     • New Access Methods
       • Randomly delivered data
       • Index Striding
         • We can take advantage of B-Trees to access the groups
       • Heap Striding
         • More generally, on-line permutation
     • Non-blocking Join Algorithms
       • Ripple Join Family
         • RIPL = Rectangles of Increasing Perimeter Length
         • Join progressively larger samples of two tables

  5. Access Methods for On-Line Agg.
     [Figure: a heap file in stored order (AAABABACDCDAAA...) is turned into fair sample output (ABCDABCDABCD...)]
     • Index Stride
       • Round-robin through the groups to get a fair sample
       • Works with an index on the grouping column
     • Heap Stride (On-Line Permutation)
       • Reorder tuples on the fly to get a fair sample
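The round-robin idea behind index striding can be sketched as follows. This is an assumption-laden illustration, not the system's access method: an in-memory dictionary stands in for the index on the grouping column, and the `weights` parameter is a hypothetical way to model the user speeding up certain groups.

```python
from collections import defaultdict


def index_stride(rows, key, weights=None):
    """Yield rows round-robin across groups so every group is sampled fairly.

    `weights` (hypothetical) maps a group to how many rows it gets per round.
    """
    weights = weights or {}
    groups = defaultdict(list)  # stands in for an index on the grouping column
    for row in rows:
        groups[key(row)].append(row)
    cursors = {g: iter(rs) for g, rs in sorted(groups.items())}
    while cursors:
        for g in list(cursors):
            for _ in range(weights.get(g, 1)):
                try:
                    yield next(cursors[g])
                except StopIteration:
                    del cursors[g]  # group exhausted, drop it from the rotation
                    break


heap_order = "AAABABACDCDAAA"
fair = "".join(index_stride(heap_order, key=lambda r: r))
boosted = "".join(index_stride(heap_order, key=lambda r: r, weights={"B": 2}))
```

With the heap order from the slide, `fair` starts with `ABCDABCD` and only then drains the remaining rows of the large group A.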

  6. Multi-Table On-Line Aggregation
     [Figure: two panels, "Traditional" and "Ripple", each with axes R and S]
     • Progressively refining join: Ripple Join
       • Ever-larger rectangles in R × S
       • Comes in naive, block, and hash flavors
     • Benefits:
       • sample from both relations simultaneously
       • gives better statistical confidences much faster
       • intimate relationship between delivery and estimation
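A minimal sketch of the "ever-larger rectangles" idea, in the spirit of the naive flavor with a square aspect ratio. It assumes both inputs are already in random order, leaves out the estimation step entirely, and the function name `ripple_join` and its signature are illustrative rather than taken from the slides.

```python
def ripple_join(r, s, predicate):
    """Yield (step, r_tuple, s_tuple) matches as the sampled rectangle grows.

    At step k one new tuple is drawn from each input and joined against
    everything seen so far, so each (r, s) pair is examined exactly once.
    """
    for k in range(max(len(r), len(s))):
        if k < len(r):
            # New R tuple against all S tuples seen so far, including s[k].
            for j in range(min(k + 1, len(s))):
                if predicate(r[k], s[j]):
                    yield k, r[k], s[j]
        if k < len(s):
            # New S tuple against the R tuples seen in earlier steps.
            for i in range(min(k, len(r))):
                if predicate(r[i], s[k]):
                    yield k, r[i], s[k]


r_rows = [1, 2, 3, 2]
s_rows = [2, 3, 2]
matches = list(ripple_join(r_rows, s_rows, lambda a, b: a == b))
```

Matches found at early steps come from a small sample of both inputs at once, which is what lets a running estimate be reported long before either input has been fully read.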

  7. On-Line Aggregation User Interface
     [Figure: screenshot of the interface, with callouts "Estimates for Each Group", "User Controls", and "Graph of Estimates w/Confidence Intervals"]

  8. On-Line Visualization: CLOUDS
     • CLOUDS displays an approximation of an image based on data while the data is being fetched
     [Figure: three panels, "Conventional Algorithm", "CLOUDS Algorithm", and "CLOUDS (with Index)"]
     • Note that CLOUDS predicts the high density of cities in the Midwest

  9. Quantifying the benefit of CLOUDS
     • CLOUDS gives a better approximate image faster than the conventional algorithm
     [Figure: plot of Error against Time (seconds), with curves for Conventional and CLOUDS]
