Query Optimization

  1. Query Optimization • Allison Griffin

  2. Importance of Optimization • Time is money • Queries run faster • Faster queries help everyone who uses the server • The key to speed lies in the algorithm, i.e., the execution plan that is chosen • Performance gains differ across database engines and schemas

  3. Brief History • Before the 1970s: dark days, manual optimization • Late 1970s to mid-1980s: • Birth of the relational data model and declarative SQL • Optimization becomes the job of the system • System R: early work on join-order optimization • Dynamic-programming and heuristic optimizers • Mid-1980s to early 1990s: • Extensible query optimization (Exodus) • Mid-1990s to late 1990s: • Materialized views
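The join-order search that began with System R can be sketched as dynamic programming over subsets of relations. This is a minimal illustration, not System R's code. It assumes hypothetical row counts and join selectivities, independent join predicates, and a cost equal to the sum of intermediate-result sizes. It also considers only left-deep plans, the classic heuristic restriction that keeps the search space small.

```python
from itertools import combinations


def best_left_deep_order(card, sel):
    """System R-style dynamic programming over subsets of relations, left-deep plans only.

    card: relation name -> estimated row count (hypothetical statistics)
    sel:  frozenset({a, b}) -> join selectivity; a missing pair means a cross product
    Cost model (an assumption): sum of the estimated sizes of all intermediate results.
    """
    rels = sorted(card)

    def size(subset):
        # Estimated rows after joining every relation in `subset`,
        # assuming the join predicates are independent.
        rows = 1.0
        for r in subset:
            rows *= card[r]
        for a, b in combinations(sorted(subset), 2):
            rows *= sel.get(frozenset((a, b)), 1.0)
        return rows

    # best[s] = (cost, join order) of the cheapest left-deep plan joining exactly the set s
    best = {frozenset([r]): (0.0, (r,)) for r in rels}
    for n in range(2, len(rels) + 1):
        for combo in combinations(rels, n):
            s = frozenset(combo)
            for last in combo:  # try each relation as the one joined last
                prev_cost, prev_order = best[s - {last}]
                cost = prev_cost + size(s)
                if s not in best or cost < best[s][0]:
                    best[s] = (cost, prev_order + (last,))
    return best[frozenset(rels)]


# Hypothetical statistics: A and C share no join predicate, so A x C is a cross product.
card = {"A": 1000, "B": 100, "C": 10}
sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
cost, order = best_left_deep_order(card, sel)
print(order, cost)  # ('C', 'B', 'A') 1100.0
```

Each subset's best plan is computed once and reused by every larger subset. That reuse is what makes the search dynamic programming rather than enumeration of all n! orders.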

  4. Volcano Extensible Query Optimizer Generator • General-purpose, cost-based query optimizer built on algebraic equivalence rules • Equivalences: join associativity, selection push-down, aggregate push-down • Extensible: new operations and equivalences can be added easily • Developed by Graefe and McKenna, 1993
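To make "equivalence rules" concrete, here is a toy transformation-based search. It assumes only two rules (join commutativity and associativity), encodes join trees as nested tuples, and uses a hypothetical size-based cost model. It is not Volcano itself, which generates an optimizer from a rule set and memoizes equivalent subexpressions. Both of those are omitted here.

```python
from itertools import combinations


def rewrites(expr):
    """Yield every expression reachable from `expr` by one rule application, anywhere in the tree.

    A leaf is a relation name (str); a join is a 2-tuple (left, right).
    """
    if isinstance(expr, str):
        return
    left, right = expr
    yield (right, left)                  # commutativity: L join R  ->  R join L
    if not isinstance(left, str):        # associativity: (A join B) join C  ->  A join (B join C)
        a, b = left
        yield (a, (b, right))
    for new_left in rewrites(left):      # apply the rules inside the left input
        yield (new_left, right)
    for new_right in rewrites(right):    # apply the rules inside the right input
        yield (left, new_right)


def equivalent_plans(expr):
    """Closure of `expr` under the rules: every join tree reachable by rewriting."""
    seen, todo = {expr}, [expr]
    while todo:
        for nxt in rewrites(todo.pop()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def relations(expr):
    return frozenset([expr]) if isinstance(expr, str) else relations(expr[0]) | relations(expr[1])


def cost(expr, card, sel):
    """Hypothetical cost model: sum of estimated intermediate-result sizes."""
    if isinstance(expr, str):
        return 0.0
    rels = relations(expr)
    rows = 1.0
    for r in rels:
        rows *= card[r]
    for pair in combinations(sorted(rels), 2):
        rows *= sel.get(frozenset(pair), 1.0)
    return rows + cost(expr[0], card, sel) + cost(expr[1], card, sel)


card = {"A": 1000, "B": 100, "C": 10}                                 # hypothetical row counts
sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}     # hypothetical selectivities
plans = equivalent_plans((("A", "B"), "C"))
best = min(plans, key=lambda p: cost(p, card, sel))
print(len(plans), cost(best, card, sel))  # 12 1100.0
```

Starting from one join tree, the two rules reach all 12 join trees over three relations. The cost model then picks a tree that joins B and C first. Adding a rule (for example, selection push-down) only means adding another `yield` to `rewrites`, which is the extensibility idea in miniature.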

  5. Materialized Views • Views can be materialized (pre-computed and stored) to speed up queries • Incremental maintenance: when the database is updated, propagate the changes to the materialized view without recomputing it from scratch • Deciding when to use materialized views: even if a query does not refer to a materialized view, the optimizer can recognize that the view can be used to answer it
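Incremental maintenance is easiest to see on an aggregate view. The sketch below assumes a hypothetical view `SELECT key, SUM(val), COUNT(*) FROM base GROUP BY key`, stored as a Python dict. Each inserted or deleted base row is applied as a delta instead of recomputing the view.

```python
class SumCountView:
    """Materialized view of a hypothetical query:
        SELECT key, SUM(val), COUNT(*) FROM base GROUP BY key
    stored as {key: (sum, count)} and maintained from deltas instead of recomputation.
    """

    def __init__(self, rows):
        self.groups = {}
        for key, val in rows:  # the only full pass over the base data
            self._apply(key, val, +1)

    def _apply(self, key, val, sign):
        total, count = self.groups.get(key, (0, 0))
        total, count = total + sign * val, count + sign
        if count == 0:
            self.groups.pop(key, None)  # last row of the group deleted: drop the group
        else:
            self.groups[key] = (total, count)

    def on_insert(self, key, val):
        self._apply(key, val, +1)

    def on_delete(self, key, val):
        self._apply(key, val, -1)


view = SumCountView([("eng", 100), ("eng", 50), ("ops", 70)])
view.on_insert("ops", 30)   # propagate one inserted base row
view.on_delete("eng", 100)  # propagate one deleted base row
print(view.groups)  # {'eng': (50, 1), 'ops': (100, 2)}
```

SUM and COUNT can be maintained from the deltas alone. MIN or MAX cannot always be maintained that way: deleting the current minimum or maximum requires looking at the base data again.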

  6. Deciding What to Materialize • Trade-off between maintenance cost and query cost • The cost of the workload depends on what is materialized; the workload consists of: • queries and update transactions • a weight for each component of the workload • Goal: find the set of views that, if materialized, gives the minimum total cost, subject to space constraints
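The selection goal can be written as a small search problem under a simple, assumed cost model. Each candidate view has a size and a maintenance cost, already weighted by update frequency. Each query has a weight, a cost when answered from the base tables, and a cheaper cost for each view that can answer it. All names and numbers are hypothetical. The sketch enumerates every subset, which is exact but exponential; published approaches commonly use greedy heuristics instead.

```python
from itertools import combinations


def choose_views(views, queries, space):
    """Exhaustively pick the set of views with the lowest total workload cost within `space`.

    views:   name -> (size, maintenance cost, already weighted by update frequency)
    queries: list of (weight, cost from base tables, {view name: cost when answered from it})
    Exponential in the number of candidate views; fine only for small examples.
    """
    names = sorted(views)
    best_set, best_cost = frozenset(), None
    for n in range(len(names) + 1):
        for subset in combinations(names, n):
            if sum(views[v][0] for v in subset) > space:
                continue  # violates the space constraint
            query_cost = sum(
                weight * min([base] + [c for v, c in alternatives.items() if v in subset])
                for weight, base, alternatives in queries
            )
            maintenance_cost = sum(views[v][1] for v in subset)
            total = query_cost + maintenance_cost
            if best_cost is None or total < best_cost:
                best_set, best_cost = frozenset(subset), total
    return best_set, best_cost


# Hypothetical candidate views and workload.
views = {"v_sales_by_day": (40, 50), "v_sales_by_region": (30, 20), "v_big": (100, 10)}
queries = [
    (10, 100, {"v_sales_by_day": 5, "v_big": 2}),
    (5, 80, {"v_sales_by_region": 4, "v_big": 3}),
]
chosen, total = choose_views(views, queries, space=80)
print(sorted(chosen), total)  # ['v_sales_by_day', 'v_sales_by_region'] 140
```

The view that would help both queries most (`v_big`) is excluded because it does not fit the space budget. This is the kind of trade-off the space constraint introduces.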

  7. What we already know… • The query optimizer considers a set of candidate query execution plans (QEPs) and chooses the optimal (least-cost) one • This choice depends heavily on the optimizer’s estimate of the number of rows produced at each step of the QEP • These estimates rely on statistics, typically stored as histograms
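As an illustration of histogram-based estimation, the sketch below builds an equi-width histogram and estimates how many rows satisfy a predicate `col < v`. It assumes values are spread uniformly within each bucket. Equi-width buckets are used only for simplicity; many systems use equi-depth (quantile) histograms instead. The data are made up.

```python
def build_equi_width(values, lo, hi, buckets):
    """Equi-width histogram over [lo, hi): returns (bucket width, per-bucket row counts)."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for x in values:
        i = min(max(int((x - lo) / width), 0), buckets - 1)  # clamp stray values into end buckets
        counts[i] += 1
    return width, counts


def estimate_less_than(lo, width, counts, v):
    """Estimated rows satisfying `col < v`, assuming values are uniform within each bucket."""
    est = 0.0
    for i, c in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        if v >= b_hi:
            est += c                        # the whole bucket qualifies
        elif v > b_lo:
            est += c * (v - b_lo) / width   # linear interpolation inside the bucket
    return est


# Uniform hypothetical column: values 0..99, one row each.
w_u, c_u = build_equi_width(list(range(100)), 0, 100, 10)
print(estimate_less_than(0, w_u, c_u, 25))  # 25.0 (true count: 25)

# Skewed hypothetical column: 900 rows with value 5, plus values 0..99.
w_s, c_s = build_equi_width([5] * 900 + list(range(100)), 0, 100, 10)
print(estimate_less_than(0, w_s, c_s, 5))   # 455.0 (true count: 5)
```

The second call shows how the estimates can go badly wrong. The 900 repeated values share a bucket with 0..9, so interpolation predicts 455 rows where only 5 qualify, and a plan chosen from that number may be far from optimal.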

  8. Recent Approaches to Improving Statistics • Paper “Distinct-Value Synopses for Multiset Operations” by Kevin Beyer, Rainer Gemulla, Peter J. Haas, Berthold Reinwald, and Yannis Sismanis, 2007 • IBM’s LEO (Learning Empirical Results in Query Optimization), 2001

  9. Summary of Paper Results • Addresses the problem of efficiently estimating the number of distinct values of an attribute • Builds on randomized algorithms • Claims an unbiased estimator of the number of distinct values, with lower mean squared error than earlier estimators • Earlier estimates tend to be higher than the actual number, so the authors propose a way to correct them downward to a more reasonable value
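The slide does not spell out the estimator. As a hedged illustration, here is the "k minimum values" (KMV) synopsis that the paper analyzes. Hash every value to a pseudo-uniform number in (0, 1), keep the k smallest distinct hashes, and estimate the distinct count as (k - 1)/U_(k), where U_(k) is the k-th smallest hash. Using k/U_(k) instead tends to overestimate, which matches the bias the slide mentions. The SHA-256-based hash and the function names are illustrative choices, not taken from the paper.

```python
import hashlib
import heapq


def h(value):
    """Hash a value to a pseudo-uniform number in the open interval (0, 1)."""
    n = int.from_bytes(hashlib.sha256(repr(value).encode()).digest()[:8], "big") >> 11  # 53 bits
    return (n + 1) / (2**53 + 1)


def kmv_synopsis(values, k):
    """One pass over the data, keeping only the k smallest distinct hash values."""
    heap, kept = [], set()  # heap stores negated hashes, so -heap[0] is the largest one kept
    for v in values:
        x = h(v)
        if x in kept:
            continue  # repeated value: its hash is already in the synopsis
        if len(heap) < k:
            heapq.heappush(heap, -x)
            kept.add(x)
        elif x < -heap[0]:
            kept.discard(-heapq.heapreplace(heap, -x))  # evict the current largest hash
            kept.add(x)
    return sorted(kept)


def estimate_distinct(synopsis, k):
    """KMV distinct-value estimate; assumes k >= 2."""
    if len(synopsis) < k:
        return float(len(synopsis))   # fewer than k distinct values: the count is exact
    return (k - 1) / synopsis[k - 1]  # unbiased; k / U_(k) would overestimate on average


print(estimate_distinct(kmv_synopsis(range(100_000), 1024), 1024))  # close to 100000
```

The synopsis stays at k numbers no matter how large the data are. The relative error shrinks roughly like 1/sqrt(k), so k trades memory for accuracy.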

  10. Distinct-Value Estimation • Proposes a summary structure (synopsis) for each partition of a relation • A synopsis can be used to estimate the number of distinct values (DVs) in its partition • Synopses of base partitions can be combined to create synopses for compound partitions built from them with multiset union, intersection, or difference • Updates to compound partitions can be handled by using the synopses of the base partitions
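Combining synopses can be sketched the same way, assuming KMV synopses of size k. The k smallest hashes across two synopses form a synopsis of the union. The fraction of those hashes that appear in both input synopses then estimates how much of the union lies in the intersection. Multiset difference and updates need the per-value counters of the paper's augmented synopsis, which this sketch leaves out. The hash and function names are illustrative.

```python
import hashlib


def h(value):
    """Hash a value to a pseudo-uniform number in (0, 1)."""
    n = int.from_bytes(hashlib.sha256(repr(value).encode()).digest()[:8], "big") >> 11  # 53 bits
    return (n + 1) / (2**53 + 1)


def kmv(values, k):
    """KMV synopsis of one partition: its k smallest distinct hash values, sorted."""
    return sorted({h(v) for v in values})[:k]


def combine(syn_a, syn_b, k):
    """k smallest hashes across both synopses; this is a KMV synopsis of A union B."""
    return sorted(set(syn_a) | set(syn_b))[:k]


def estimate_union(syn_a, syn_b, k):
    u = combine(syn_a, syn_b, k)
    if len(u) < k:
        return float(len(u))  # both partitions are small, so the synopses hold every hash
    return (k - 1) / u[k - 1]


def estimate_intersection(syn_a, syn_b, k):
    u = combine(syn_a, syn_b, k)
    in_both = set(syn_a) & set(syn_b)
    if len(u) < k:
        return float(len(in_both))  # exact in the small case
    k_both = sum(1 for x in u if x in in_both)  # union sample that also lies in A and in B
    return (k_both / k) * (k - 1) / u[k - 1]


k = 2048
syn_a = kmv(range(0, 60_000), k)       # hypothetical partition A: 60,000 distinct values
syn_b = kmv(range(30_000, 90_000), k)  # hypothetical partition B, overlapping A by 30,000
print(estimate_union(syn_a, syn_b, k), estimate_intersection(syn_a, syn_b, k))  # near 90000, 30000
```

The combination needs only the two synopses, never the base data. This is what allows estimates for compound partitions without rescanning them.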

  11. LEO: Learning Empirical Results in Query Optimization • Autonomic feedback loops that create a self-tuning database query optimizer • Self-validates and adjusts to improve query optimization and execution, without requiring user interaction to repair incorrect statistics or cardinality estimates • Reduces the total cost of ownership of database management systems by simplifying database administration

  12. How it works • Monitors queries as they execute • Compares the optimizer’s estimates with the actual values at each step of a QEP • Computes adjustments to its estimates that can be used in future optimizations of similar queries • Estimation errors can also trigger re-optimization of a query in mid-execution
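The monitor, compare, and adjust loop can be sketched as a table of correction factors keyed by predicate. This is a hypothetical illustration, not IBM's implementation. The predicate keys, the single multiplicative factor per predicate, and the 10x re-optimization threshold are all assumptions.

```python
class FeedbackLearner:
    """Hypothetical LEO-style learning loop (not IBM's implementation): it remembers how far
    the optimizer's estimate was from the actual row count for a predicate, and corrects
    later estimates for the same predicate by that factor."""

    def __init__(self):
        self.factors = {}  # predicate key -> actual_rows / estimated_rows from the last run

    def observe(self, predicate, estimated_rows, actual_rows):
        """Monitor and compare: record actual vs. estimated rows after a plan step runs."""
        if estimated_rows > 0:
            self.factors[predicate] = actual_rows / estimated_rows

    def adjusted_estimate(self, predicate, estimated_rows):
        """Adjust: apply the learned factor when optimizing a similar query later."""
        return estimated_rows * self.factors.get(predicate, 1.0)

    @staticmethod
    def should_reoptimize(estimated_rows, rows_so_far, threshold=10.0):
        """Flag mid-execution re-optimization once actuals exceed the estimate `threshold` times."""
        return rows_so_far > threshold * max(estimated_rows, 1.0)


leo = FeedbackLearner()
key = ("orders", "status = 'open'")     # hypothetical table and predicate
print(leo.adjusted_estimate(key, 1000))  # 1000.0 (nothing learned yet)
leo.observe(key, estimated_rows=1000, actual_rows=50)
print(leo.adjusted_estimate(key, 2000))  # 100.0
```

Overwriting the factor after every run is the simplest possible rule. It also shows why stability is a concern: a single unusual execution can swing every later estimate for that predicate.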

  13. Research Challenges for LEO • (1) ensuring stability and convergence of the autonomic system • (2) guaranteeing consistency of the optimizer’s overall model as its estimates are refined

  14. Results • Reduction of query execution time by orders of magnitude at negligible additional run-time cost • Reduced administration time • Fewer problem queries • Overall improved query performance with increased robustness and predictability of query response times

  15. Bibliography • “LEO: Learning Empirical Results in Query Optimization.” IBM. • “Optimizing for Query Speed.” SQL. • “Optimizing Database Queries.” IBM. • “Optimize Queries Theory in Practice.” • Beyer, Kevin, Rainer Gemulla, Peter J. Haas, Berthold Reinwald, and Yannis Sismanis. “Distinct-Value Synopses for Multiset Operations.” Communications of the ACM, Vol. 52, October 2009. • Chaudhuri, Surajit. “Technical Perspective: Relational Query Optimization - Data Management Meets Statistical Estimation.” Communications of the ACM, Vol. 52, October 2009.