1 / 29

Multi-Query Optimization

Multi-Query Optimization. Prasan Roy Indian Institute of Technology - Bombay. Overview. Multi-Query Optimization: What? Problem statement Multi-Query Optimization: Why? Application scenarios Multi-Query Optimization: How? A cost-based practical approach

MikeCarlo
Télécharger la présentation

Multi-Query Optimization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multi-Query Optimization Prasan Roy Indian Institute of Technology - Bombay

  2. Overview • Multi-Query Optimization: What? • Problem statement • Multi-Query Optimization: Why? • Application scenarios • Multi-Query Optimization: How? • A cost-based practical approach • Prototyping Multi-Query Optimization • On MS SQL-Server at Microsoft • Research prototype at IIT-Bombay

  3. Multi-Query Optimization: What? Exploit common subexpressions (CSEs) in query optimization • Consider DAG execution plans in addition to tree execution plans

  4. Example B A D C C B Best Plan for A JOIN B JOIN C Best Plan for B JOIN C JOIN D

  5. Example (contd) Alternative: D A C B Common Subexpression

  6. Multi-Query Optimization: Why? • Queries on views, nested queries, … • Overlapping query batches generated by applications • Update expressions for materialized views • Query invocations with different parameters • . . . Practical solutions needed!

  7. Multi-Query Optimization: How? • Set up the search space • Identify the common subexpressions • Explore the search space efficiently • Find the best way to exploit the common subexpressions

  8. Problems • Materializing and sharing a CSE not necessarily cheaper • Mutually exclusive alternatives (A JOIN B JOIN C) (B JOIN C JOIN D) (C JOIN D JOIN E) What to share: (B JOIN C) or (C JOIN D) ? Huge search space!

  9. Earlier Work: Practical Solutions As early as 1976 • Preprocess query before optimization [Hall, IBM-JRD76] As late as 1998 • Postprocess optimized plans [Subramanium and Venkataraman, SIGMOD98] Query optimizer is not aware!

  10. Earlier Work: Theoretical Studies [Sellis, TODS88], [Cosar et al., CIKM93], [Shim et al., DKE94],... • Set of queries {Q1, Q2, …, Qn} • For each query Qi, set of execution plans {Pi1, Pi2, …, Pim} • Pij is a set of tasks from a common pool Pick a plan for each query such that the cost of tasks in the union is minimized Not integrated with existing optimizers, no practical study

  11. Microsoft Experience with Paul Larson, Microsoft Research

  12. Prototyping MQO on SQL-Server Add multi-query optimization capability to SQL-Server • Well integrated with the existing optimization framework • another optimization level • minimal changes, minimal extra lines of code • First cut: exhaustive • How slow can it be? • A working prototype by the summer-end

  13. Op (AND node) B A C D What (almost) already exists in the SQL-Server Optimizer • AND/ORQuery-DAG representation of plan space Group (OR node)

  14. What actually exists in the SQL-Server Optimizer • Relations cloned for each use B1 B2 A C1 C2 D

  15. Preprocessing Step: Query-DAG Unification • Performed in a bottom-up traversal       B1 B2 A C1 C2 D

  16. Common Subexpression Identification • Unified nodes are CSEs Common Subexpression B A C D

  17. Exploring the Search Space: A Naïve Algorithm • For each set S of common subexpressions • materialize each node in S • MatCost(S) = sum of materialization costs of the nodes in S • invoke optimizer to find the best plan for the root and for each node S • CompCost(S) = sum of costs of above plans • Cost(S) = MatCost(S) + CompCost(S) • Pick S with the minimum Cost

  18. Doing Better: Incremental Reoptimization Goal: best plan for Si  best plan for Sj • Observation • Best plans change for only the ancestors of nodes in Si XOR Sj • Algorithm: • Propagate changed costs in bottom-up topological order from nodes in Si XOR Sj • Update min-cost plan at each node visited • Do not propagate further up if min-cost plan remains unchanged at a node Work done at IIT-Bombay

  19. Incremental Optimization: Example • Si =  min-cost B A C D

  20. Incremental Optimization: Example • Si =  Sj = {(B JOIN C)}   Now materialized  Previous min-cost New min-cost B A C D

  21. Current Status • A first-cut implementation working • Lines of C++ code added: 1500 approx.

  22. Future Work • Performance tuning and smarter data structures needed • Ways to restrict enumeration taking DAG structure into account

  23. Research at IIT-Bombay: Heuristics for MQO with S. Sudarshan, S. Seshadri

  24. A Greedy Heuristic • Pick nodes for materialization one at a time, in “benefit” order Benefit(n) = reduction in cost on materialization of n Benefit computation is expensive

  25. Monotonicity Assumption • Benefit of a node does not increase due to materialization of other nodes Exploited to avoid some benefit computations Optimization costs decrease by 90%

  26. A Postpass Heuristic: Volcano-SH • No change in Volcano best plan computation • Cost-based materialization of nodes in best Volcano plan Implementation easy Low overhead Optimizer is not aware

  27. A Volcano Variant: Volcano-RU • Volcano best plan search aware of best plans for earlier queries • Cost based materialization of best plan nodes that are used by later queries Implementation easy Low overhead Local decisions, plan quality sensitive to query sequence

  28. Experimental Conclusion • Greedy • Expensive, but practical • Overheads typically offset by plan quality • especially for expensive “canned” queries • Almost linear scaleup with query batch size • typically, only the width of the Query DAG affected • Volcano-RU • Mostly better than Volcano-SH, same overhead • Negligible overhead over Volcano • recommended for cheap but complex queries

  29. Conclusion • Multi-query optimization is needed • Multi-query optimization is practical! • Multi-query optimization is an easy next step for DAG-based optimizers

More Related