Query Optimization Allison Griffin
Importance of Optimization • Time is money • Queries are faster • Helps everyone who uses the server • Solution to speed lies in the algorithm • Different performance improvements with different database engines and schemas
Brief History • Before 1970’s: Dark days, manual optimization • Late 70’s to mid 80’s: • Birth of relational data model and declarative SQL • Optimization is job of system • System R-beginning work on join order optimization • Dynamic Programming: Heuristic Optimizers • Mid 80’s to early 90’s: • Extensible query optimization (Exodus) • Mid 90’s to late 90’s: • Materialized Views
Volcano Extensible Query Optimizer Generator • General purpose cost based query optimizer, based on equivalence rules in algebra • Equivalences: join associativity, select push down, aggregate push down • Extensible: new operations and equivalences can be easily added • Developed by Graefe and McKenna 1993
Materialized Views • Can materialize (pre-compute and store) views to speed up queries • Incremental maintenance • when database is updated, propagate updates to materialized view without complete re-computation • Deciding when to use materialized views • even if query does not refer to materialized view, optimizer can figure out it can be used
Deciding What to Materialize • Maintenance cost and query cost • Workload depends on what is materialized: • queries and update transactions • weights for each component of workload • Goal: find set of views that gives minimum cost if materialized, subject to space constraints
What we already know… • Query optimizer analyzes set of query execution plans and gives optimal (least cost) • Heavily dependent on optimizer’s estimate for number of rows that will result at each step of QEP • Estimates rely on statistics typically stored in histograms
Recent Approaches to Improve Statistics • Paper “Distinct-Value Synopses for Multiset Operations” by Kevin Beyer, Rainer Gemulla, Peter J. Haas, Berthold Reinwalk, and Yannis Sismanis, 2007 • IBM’s LEO (Learning Empirical Results in Query Optimization), 2001
Summary of Paper Results • Addresses the problem of efficient estimate of number of distinct values of an attribute • Builds on leveraging of randomized algorithms • Claim to have unbiased estimator for distinct values with lower mean squared error • Past attempts tend to by higher than the actual number so they have come up with way to cut that number down to be more reasonable
Distinct-Value Estimation • Propose summary structure (synopsis) for a relation • Synopsis can be used to estimate number of DVs in the partition • Synopses can be combined to create synopses for compound partitions created from base partitions using multiset union, intersection or difference operations • Updates can be performed on compound partitions by using synopses from base relations
LEO - Learning Emperical Results in Query Optimization • Autonomic feedback loops that create a self-tuning database query optimizer • Self-validates and adjusts to improve query optimization and execution without requiring user interaction to repair incorrect statistics or cardinality estimates • Reduces the total cost of owning database management systems by simplifying database administration
How it works • Monitors queries as they execute • Compares the optimizer’s estimates with actuals at each step in a QEP • Then computes adjustments to its estimates that may be used during future optimizations of similar queries • Moreover, estimation errors can also trigger re-optimization of a query in mid-execution.
Challenges in Research of LEO • (1) ensuring stability and convergence of the autonomic system • (2) guaranteeing consistency of the overall optimizer's model upon refinements
Results • Reduction of query execution time by orders of magnitude at negligible additional run-time cost • Reduced administration time • Fewer problem queries • Overall improved query performance with increased robustness and predictability of query response times
Bibliography • “LEO-Learning Empirical Results in Query Optimization.” IBM. <http://domino.watson.ibm.com/comm/research.nsf/pages/r.datamgmt.innovation.html>. • “Optimizing for Query Speed”. SQL. <http://www.devshed.com/c/a/MySQL/Optimizing-for-Query-Speed/1/ • “Optimizing Database Queries”. IBM. <http://www.stevengould.org/portfolio/developerWorks/efficientPHP/wa-effphp/wa-effphp-4-1.html>. • “Optimize Queries Theory in Practice”. <http://www.serverwatch.com/tutorials/article.php/2175621/How-to-Optimize-Queries-Theory-an-Practice.htm>. • Beyer, Kevin, Gemulla, Rainer, Haas, Peter J., Reinwald, Berthold, Sismani, Yannis. “Distinct-Value Synopses for Multiset Operations”. Communications of the ACM. Vol. 52. October 2009. • Chaudhuri, Surajit. “Technical Perspective: Relational Query Optimization-Data Management Meets Statistical Estimation”. Communications of the ACM. Vol. 52. October 2009.