Dynamic Sample Selection for Efficient Approximate Query Processing

Dynamic Sample Selection for Approximate Query Processing. Brian Babcock, Surajit Chaudhuri & Gautam Das. Presented by: Mariam John, CSE 6392, 02/14/2006

Contents • Introduction • Dynamic Sample Selection • Policies for Sample Selection • Small Group Sampling • Pre-Processing Phase • Summary

Why do we do Approximate Query Processing? • Multi-gigabyte data repositories • Data Analysis Application • Data mining • Decision Support Analysis • Fast query response time • Acceptability of inexact query response

Problem • Constructing an optimal sample that well represents the underlying data. • Uniform sampling • Non-uniform sampling

Non-uniform sampling • Purpose is to produce more accurate results across a particular set of queries. • Produces more accurate approximations than uniform sampling for the queries it targets. • Optimal bias differs from query to query.
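
To make the trade-off concrete, here is a minimal Python sketch of answering a group-by count from a uniform sample. The function names, the toy data, and the 1% sampling rate are illustrative assumptions, not taken from the slides. Each sampled count is scaled by 1/rate; a group with only a handful of rows is likely to be missing from the sample altogether, which is the case a biased sample is meant to cover.

```python
import random
from collections import Counter

def uniform_sample(rows, rate, seed=0):
    """Keep each row independently with probability `rate` (hypothetical helper)."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < rate]

def estimate_group_counts(sample, rate):
    """Scale each sampled group's count by 1/rate to estimate the true count."""
    counts = Counter(sample)
    return {group: count / rate for group, count in counts.items()}

# Toy data (assumption): one large group and one very small group.
rows = ["big"] * 10_000 + ["tiny"] * 3
sample = uniform_sample(rows, rate=0.01)
estimates = estimate_group_counts(sample, rate=0.01)

# The large group is estimated reasonably well; the small group usually
# has no sampled rows at all and then does not appear in `estimates`.
print(estimates)
```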

Dynamic Sample Selection [Figure: Standard Sampling versus Dynamic Sample Selection; a single sample of the data versus several candidate samples of the data]

Dynamic Sample Selection • Pre-Processing Phase [Figure: pre-processing pipeline; labels: Query Workload, Select Strata, Build Sample, Sample Data, Metadata]

Dynamic Sample Selection • Runtime Phase [Figure: runtime pipeline; labels: Query, Choose Samples, Rewrite Query, Sample Data, Metadata]

Dynamic Sample Selection • How to identify the set of biased samples to be created? • Occurs during pre-processing phase • How to determine which of the various samples to use to answer a query? • Occurs during runtime phase • Simplest and most efficient strategy is when choice of samples is guided by the syntax of incoming query.
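
As a rough illustration of syntax-guided selection, the sketch below picks sample tables by looking only at the columns named in a query's GROUP BY clause. The metadata layout (a dict from column name to sample table name) and all table names are assumptions made for this example.

```python
def choose_samples(group_by_columns, metadata, overall="overall_sample"):
    """Pick sample tables from the query's GROUP BY columns alone.

    `metadata` maps a column name to the biased sample table built for it
    (a hypothetical layout). The overall sample is always included.
    """
    chosen = [metadata[col] for col in group_by_columns if col in metadata]
    return chosen + [overall]

# Hypothetical metadata: only the "state" column has a biased sample.
metadata = {"state": "sg_state"}
tables = choose_samples(["state", "product"], metadata)
print(tables)
```

No data is scanned to make the choice, which is why this kind of strategy is cheap at runtime.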

Small Group Sampling • A specific dynamic sample selection technique that targets aggregate queries with group-bys. • Small group sampling approach: • Overall sample: uniform sampling, used for the large groups. • Small group tables: one or more sample tables for the smaller groups.

Small group Sampling • Set of small groups depends on: • grouping columns • selection predicates

Small Group Sampling Idea behind Small Group Sampling: • Determine for which values in each column to create small group tables. • Create small group tables for each column of a table along with the overall sample. • During runtime, choose a subset of sample tables to answer a query most accurately. • Query is rewritten to run against the sample tables instead of the base tables.
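
The runtime idea can be sketched in memory for a single grouping column. This is a simplified illustration under assumed names and data, not the rewritten SQL itself: counts from the small group table are exact, while counts from the overall sample are scaled by 1/rate, skipping rows that the small group table already covers.

```python
from collections import Counter

def approximate_group_counts(small_group_rows, overall_sample_rows, rare, rate, col):
    """Estimate COUNT(*) GROUP BY `col` from two sample tables (hypothetical helper)."""
    # The small group table holds every row with a rare value: exact counts.
    result = dict(Counter(row[col] for row in small_group_rows))
    # Overall sample: ignore rows with rare values (already counted above),
    # then scale up to estimate counts over the full table.
    sampled = Counter(
        row[col] for row in overall_sample_rows if row[col] not in rare
    )
    for value, count in sampled.items():
        result[value] = count / rate
    return result

# Toy inputs (assumptions): a 25% overall sample and two rare states.
rare = {"VT", "WY"}
small_group = [{"state": "VT"}, {"state": "VT"}, {"state": "WY"}]
overall = [{"state": "CA"}] * 60 + [{"state": "TX"}] * 39 + [{"state": "VT"}]
answer = approximate_group_counts(small_group, overall, rare, rate=0.25, col="state")
print(answer)
```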

Pre-processing Phase • For every column, identify the rare values within it and create small group tables. • Pre-processing phase produces three outputs: • Overall sample table • Small group tables • Metadata table
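
One way to sketch this phase in Python: treat a value as rare if it is left over once the most frequent values cover a chosen share of the rows. The threshold parameter `small_group_fraction`, the `sg_` table naming, and the dict-based metadata are assumptions for illustration. Building the overall sample is omitted here.

```python
from collections import Counter

def rare_values(column_values, small_group_fraction):
    """Values left over once the most frequent values cover at least
    (1 - small_group_fraction) of the rows (hypothetical rule)."""
    counts = Counter(column_values)
    threshold = len(column_values) * (1 - small_group_fraction)
    covered = 0
    common = set()
    for value, count in counts.most_common():
        if covered >= threshold:
            break
        common.add(value)
        covered += count
    return set(counts) - common

def build_small_group_tables(rows, columns, small_group_fraction):
    """Return (small group tables, metadata) for the given columns."""
    tables, metadata = {}, {}
    for col in columns:
        rare = rare_values([row[col] for row in rows], small_group_fraction)
        if rare:  # columns without rare values get no small group table
            name = f"sg_{col}"
            tables[name] = [row for row in rows if row[col] in rare]
            metadata[col] = name
    return tables, metadata

# Toy base table (assumption): 20 rows, skewed "state", balanced "product".
states = ["CA"] * 12 + ["TX"] * 5 + ["VT"] * 2 + ["WY"]
products = ["A", "B"] * 10
rows = [{"state": s, "product": p} for s, p in zip(states, products)]
tables, metadata = build_small_group_tables(rows, ["state", "product"], 0.25)
print(metadata, len(tables["sg_state"]))
```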

Pre-processing phase • Rows can appear in multiple sample tables. • Bitmask field is used to identify the set of sample tables to which a row was added. • Avoids double counting of rows assigned to multiple sample tables.
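
The bitmask check can be sketched as follows, assuming one bit per small group table and a `bitmask` field stored on every sampled row; the bit assignments, row layout, and helper name are hypothetical. Tables are processed in a fixed order, and a row is skipped whenever its bitmask shows it belongs to a table that was already processed.

```python
STATE_BIT = 1 << 0    # row is in the small group table for "state" (assumed)
PRODUCT_BIT = 1 << 1  # row is in the small group table for "product" (assumed)

def filter_duplicates(tables_in_order, overall_sample):
    """Keep each row only in the first chosen table that contains it.

    `tables_in_order` is a list of (bit, rows) pairs for the small group
    tables used by the query; the overall sample is filtered last.
    """
    seen_bits = 0
    kept_tables = []
    for bit, rows in tables_in_order:
        kept_tables.append([r for r in rows if (r["bitmask"] & seen_bits) == 0])
        seen_bits |= bit
    kept_overall = [r for r in overall_sample if (r["bitmask"] & seen_bits) == 0]
    return kept_tables, kept_overall

# Toy rows (assumption): row 2 is rare in both columns and was also sampled.
r1 = {"id": 1, "bitmask": STATE_BIT}
r2 = {"id": 2, "bitmask": STATE_BIT | PRODUCT_BIT}
r3 = {"id": 3, "bitmask": PRODUCT_BIT}
r4 = {"id": 4, "bitmask": 0}
kept_tables, kept_overall = filter_duplicates(
    [(STATE_BIT, [r1, r2]), (PRODUCT_BIT, [r2, r3])],
    overall_sample=[r2, r4],
)
ids = sorted(r["id"] for rows in kept_tables + [kept_overall] for r in rows)
print(ids)
```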

Summary • Dynamic Sample Selection • Takes advantage of available disk space • Creates multiple biased sample tables during the pre-processing phase • Picks the best samples at runtime for query processing. • Small Group Sampling • Notion is to treat large and small groups differently • Creates an overall sample table for large groups and a number of small group tables for the rare values in each column.
