Advanced Techniques in Database Query Optimization: Search Space and Cost Models
This tutorial explores advanced concepts in database query optimization, focusing on the structure of search spaces and effective evaluation functions. It explains how to optimize queries by exploring candidates and employing methods like semantic rewriting and subquery flattening. The content also covers cost models, assumptions regarding data distribution, and factors affecting selectivity. Key topics include dynamic programming optimization, histogram techniques, and the implications of relational algebra equivalences on query plans.
Advanced Techniques in Database Query Optimization: Search Space and Cost Models
E N D
Presentation Transcript
CS3223 Tutorial 8 Fan Qi fan.qi@nus.edu.sg Database Lab 1, com1 #01-08
Query Optimization • Similar to other optimization • Optimize by search • Two main factor: • Search Space: how many candidates • Evaluation function : how to judge which candidate is good Search Space Query Plan
Query Optimization • Search Space Enumeration • Relational Algebra Equivalences • Query rewrite: • View rewrite • Semantic rewrite • Subqueryflatterning
Query Optimization • Evaluation function: • A.K.A cost model • Evaluate query plan based on statistical information • Assumptions: • Independence, Uniform distribution, inclusion assumption • Reduction Factor • Selectivity • Histogram: • Equiwidth • Equidepth • Number of Rows • Per Column Distinct Value • Per Column Value Distribution • Combined Column Value Distribution • Available Index • ...
Query Optimization • What if we know both ? • Dynamic programming optimization • Optimal Substructure If Q is optimal, all its subquery are optimal B A D C
Question 1 (A) Cost of ? First Join: Second Join:
Question 1 (B) Cost of ? 137 Cost of ? 180 Cost of ? 97
Question 1 (C) Implies ? A A U A B
Question 2 • Equiwidth histogram with 3 buckets • 9 distinct value • Each bucket has 3 distinct values
Question 3 Table Scan data pages Clustered B+ tree (age, sal) : leaf page Height: Access Methods: Hash Index on (eid) : page (uniform distribution) Unclustered B+ tree (sal) : leaf page Height: Hash Index on (age) : pages (uniform distribution)
Question 3 (a) Table Scan tuples Clustered B+ tree (age, sal) Access Methods: Hash Index on (eid) Not Applicable Unclustered B+ tree (sal) 20,000 Not Applicable Hash Index on (age)
Question 3 (b) Table Scan tuples Clustered B+ tree (age, sal) Access Methods: Hash Index on (eid) Not Applicable Unclustered B+ tree (sal) Not Applicable Hash Index on (age) 2,000 + 20,000
Question 3 (c) Table Scan tuples Clustered B+ tree (age, sal) Access Methods: Hash Index on (eid) Not Applicable Unclustered B+ tree (sal) Not Applicable Hash Index on (age) Not Applicable
Question 3 (d) Table Scan tuple Clustered B+ tree (age, sal) Access Methods: Hash Index on (eid) 2,000 + 1 Unclustered B+ tree (sal) Not Applicable Hash Index on (age) Not Applicable
Question 3 (e) Table Scan tuple Clustered B+ tree (age, sal) Access Methods: Hash Index on (eid) Not Applicable Unclustered B+ tree (sal) Hash Index on (age) Not Applicable
Question 3 (f) Table Scan tuple Clustered B+ tree (age, sal) Access Methods: Hash Index on (eid) Not Applicable Unclustered B+ tree (sal) Hash Index on (age) 2,000 + 20,000
Question 3 (g) Table Scan tuple Clustered B+ tree (age, sal) Access Methods: Hash Index on (eid) Not Applicable Unclustered B+ tree (sal) Hash Index on (age) Not Applicable
Question 3 (h) Table Scan tuples Clustered B+ tree (age, sal) Access Methods: Hash Index on (eid) Not Applicable Unclustered B+ tree (sal) Not Applicable Hash Index on (age)
Thank you! Happy Good Friday & Easter !