Presentation Transcript

  1. Dynamic Sample Selection for Approximate Query Processing. Brian Babcock, Surajit Chaudhuri & Gautam Das. Presented by: Mariam John, CSE 6392, 02/14/2006

  2. Contents • Introduction • Dynamic Sample Selection • Policies for Sample Selection • Small Group Sampling • Pre-Processing Phase • Summary

  3. Why do we do Approximate Query Processing? • Multi-gigabyte data repositories • Data Analysis Application • Data mining • Decision Support Analysis • Fast query response time • Acceptability of inexact query response

  4. Problem • Constructing an optimal sample that represents the underlying data well. • Uniform sampling • Non-uniform sampling

  5. Non-uniform sampling • Purpose is to produce more accurate results across a particular set of queries. • For the queries it is biased toward, produces more accurate approximate results than uniform sampling. • Optimal bias differs from query to query.
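
The weakness of uniform sampling on skewed data can be illustrated with a small simulation. This is a minimal sketch, not taken from the slides: the data set, the 1% sampling rate, and the function names `uniform_sample` and `estimate_group_counts` are all assumptions chosen for illustration.

```python
import random
from collections import Counter

def uniform_sample(rows, rate, seed=0):
    """Keep each row independently with probability `rate` (hypothetical helper)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < rate]

def estimate_group_counts(sample, rate):
    """Estimate per-group COUNT(*) by scaling sample counts up by 1/rate."""
    return {group: n / rate for group, n in Counter(sample).items()}

# Assumed toy data: one large group and one very small group.
rows = ["large"] * 99_000 + ["small"] * 10
true_counts = Counter(rows)

sample = uniform_sample(rows, rate=0.01)
estimates = estimate_group_counts(sample, rate=0.01)

# The large group is estimated well. The small group is either missing
# from the sample entirely or, if even one of its rows is drawn,
# estimated as at least 100 rows when the true count is 10.
for group in true_counts:
    print(group, true_counts[group], estimates.get(group, 0.0))
```

A sample biased toward the small group would fix this query, but a different query would need a different bias, which is the motivation for keeping several samples.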

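  6. Dynamic Sample Selection [Figure with two panels. "Standard Sampling": a single sample is drawn from the data. "Dynamic Sample Selection": several samples are drawn from the data, and the choice among them is left open ("?").]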

  7. Dynamic Sample Selection • Pre-Processing Phase [Figure; labels: Query Workload, Data, Select Strata, Build Samples, Sample Data, Metadata.]

  8. Dynamic Sample Selection • Runtime Phase [Figure; labels: Query, Choose Samples, Rewrite Query, Sample Data, Metadata.]

  9. Dynamic Sample Selection • How to identify the set of biased samples to be created? • Occurs during the pre-processing phase • How to determine which of the various samples to use to answer a query? • Occurs during the runtime phase • The simplest and most efficient strategy is to let the syntax of the incoming query guide the choice of samples.

  10. Small Group Sampling • A specific dynamic sample selection technique that targets aggregate queries with “group-by’s”. • Small group sampling approach: • Overall sample: a uniform sample that serves the large groups. • Small group tables: one or more sample tables for the smaller groups.

  11. Small group Sampling • Set of small groups depends on: • grouping columns • selection predicates

  12. Small Group Sampling Idea behind Small Group Sampling: • Determine for which values in each column to create small group tables. • Create small group tables for each column of a table along with the overall sample. • During runtime, choose a subset of sample tables to answer a query most accurately. • Query is rewritten to run against the sample tables instead of the base tables.
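
The runtime steps can be sketched as a function that looks only at the query's group-by columns, picks the matching small group tables, and emits a rewritten query. This is a minimal sketch under stated assumptions: the table names `sg_<column>` and `overall_sample`, the `mask` column, the bit assignment in `column_bits`, and the restriction to `COUNT(*)` queries are hypothetical, not taken from the slides.

```python
def rewrite_count_query(group_by, column_bits, rate):
    """Rewrite a COUNT(*) GROUP BY query to run against sample tables.

    group_by    -- grouping columns of the incoming query
    column_bits -- assumed metadata: column -> bit of its small group table
    rate        -- assumed sampling rate of the overall sample
    """
    cols = ", ".join(group_by)
    # Syntax-guided choice: use a small group table for every grouping
    # column that has one.
    chosen = [c for c in group_by if c in column_bits]

    parts, seen = [], 0
    for c in chosen:
        # Small group tables hold all matching rows, so no scaling.
        # The mask filter skips rows already counted in an earlier table.
        parts.append(
            f"SELECT {cols}, COUNT(*) AS cnt FROM sg_{c} "
            f"WHERE (mask & {seen}) = 0 GROUP BY {cols}"
        )
        seen |= column_bits[c]
    # The overall sample is scaled by 1/rate and skips rows that belong
    # to any small group table used above.
    parts.append(
        f"SELECT {cols}, COUNT(*) / {rate} AS cnt FROM overall_sample "
        f"WHERE (mask & {seen}) = 0 GROUP BY {cols}"
    )
    return chosen, "\nUNION ALL\n".join(parts)

# Hypothetical metadata: only `state` and `product` have small group tables.
chosen, sql = rewrite_count_query(
    ["state", "region"], {"state": 1, "product": 2}, rate=0.5
)
print(sql)
```

Because the choice depends only on the query text and the metadata, it costs almost nothing at runtime.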

  13. Pre-processing Phase • For every column, identify the rare values within it and create small group tables. • Pre-processing phase produces three outputs: • Overall sample table • Small group tables • Metadata table
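
The pre-processing step can be sketched as follows. The slides do not say how "rare" is defined, so this sketch assumes a threshold: the rarest values of a column are taken until they would cover more than a fraction `small_fraction` of the rows. That threshold, the sampling rate, and all function names are assumptions for illustration.

```python
import random
from collections import Counter

def rare_values(rows, column, small_fraction):
    """Rarest values of `column` that together cover at most
    `small_fraction` of the rows (assumed definition of 'rare')."""
    counts = Counter(row[column] for row in rows)
    budget = small_fraction * len(rows)
    rare, covered = set(), 0
    for value, n in sorted(counts.items(), key=lambda kv: kv[1]):
        if covered + n > budget:
            break
        rare.add(value)
        covered += n
    return rare

def preprocess(rows, columns, small_fraction=0.1, rate=0.5, seed=0):
    """Return the three outputs: overall sample, small group tables, metadata."""
    rng = random.Random(seed)
    # Metadata: which values of each column are stored in its small group table.
    metadata = {c: rare_values(rows, c, small_fraction) for c in columns}
    # One small group table per column, holding every row with a rare value.
    small_group_tables = {
        c: [row for row in rows if row[c] in metadata[c]] for c in columns
    }
    # Overall sample: a uniform sample of the whole table.
    overall_sample = [row for row in rows if rng.random() < rate]
    return overall_sample, small_group_tables, metadata

# Assumed toy table of 20 rows.
states = ["CA"] * 12 + ["NY"] * 7 + ["WY"]
products = ["tv"] * 10 + ["radio"] * 9 + ["fax"]
rows = [{"state": s, "product": p} for s, p in zip(states, products)]

overall_sample, small_group_tables, metadata = preprocess(rows, ["state", "product"])
print(metadata)
```

In this toy table the last row has both a rare state and a rare product, so it lands in both small group tables.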

  14. Pre-processing phase • Rows can appear in multiple sample tables. • Bitmask field is used to identify the set of sample tables to which a row was added. • Avoids double counting of rows assigned to multiple sample tables.
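
One way the bitmask can prevent double counting is sketched below. It assumes one bit per small group table and a `mask` field on every sample row. The bit layout, the row format, and the function `estimate_count` are hypothetical, and the example is limited to a plain count.

```python
STATE_BIT = 1 << 0    # row is in the small group table for `state`
PRODUCT_BIT = 1 << 1  # row is in the small group table for `product`

def estimate_count(overall_sample, small_tables, used_bits, rate):
    """Estimate COUNT(*) from the chosen sample tables without double counting."""
    total = 0.0
    seen_bits = 0
    for bit in used_bits:
        for row in small_tables[bit]:
            # Skip rows already counted through an earlier small group table.
            if row["mask"] & seen_bits == 0:
                total += 1
        seen_bits |= bit
    for row in overall_sample:
        # Skip sampled rows that any used small group table already covers.
        # The remaining rows are scaled up by 1/rate.
        if row["mask"] & seen_bits == 0:
            total += 1 / rate
    return total

# Assumed toy data: 9 base rows, of which `a` is rare in both columns.
a = {"mask": STATE_BIT | PRODUCT_BIT}
b = {"mask": STATE_BIT}
c = {"mask": PRODUCT_BIT}
common = [{"mask": 0} for _ in range(6)]

small_tables = {STATE_BIT: [a, b], PRODUCT_BIT: [a, c]}
overall_sample = [a] + common[:3]  # pretend a 50% sample drew these rows

estimate = estimate_count(
    overall_sample, small_tables, [STATE_BIT, PRODUCT_BIT], rate=0.5
)
print(estimate)  # 9.0, matching the 9 base rows
```

Without the mask checks, row `a` would be counted three times: once in each small group table and once, scaled, in the overall sample.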

  15. Summary • Dynamic Sample Selection • Takes advantage of available disk space • Creates multiple biased sample tables during the pre-processing phase • Picks the best samples at runtime for query processing. • Small Group Sampling • The idea is to treat large and small groups differently • Creates an overall sample table for the large groups and a small group table for the rare values in each column.