1 / 31

Reducing Order Enforcement Cost in Complex Query Plans

Reducing Order Enforcement Cost in Complex Query Plans. Ravindra Guravannavar and S. Sudarshan (To appear in ICDE 2007). Background. Sort-based query processing algorithms Sort-merge Join (also Union/Intersection) Sort-based grouping and duplicate elimination Explicit “order by”

Télécharger la présentation

Reducing Order Enforcement Cost in Complex Query Plans

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Reducing Order Enforcement Cost in Complex Query Plans Ravindra Guravannavar and S. Sudarshan (To appear in ICDE 2007)

  2. Background • Sort-based query processing algorithms • Sort-merge Join (also Union/Intersection) • Sort-based grouping and duplicate elimination • Explicit “order by” • Notion of “Interesting Sort Orders” (System-R) • Find and remember the best plan for each sort order that may be useful • Optimization goal in Volcano : (expr, sort-order)

  3. Group By {a2,a4,a5,…} R.a1=S.a1 and R.a2=S.a2 … R.an=S.an G S R The Problem • Interesting orders can be too many! • Factorial in number of attributes involved • Plan cost can vary substantially with the choice of interesting order • Clustering and covering indices • Other operators in the input sub-expressions • Possibility of partial sorting

  4. Motivation • Joins in data integration and decision support involve large number of attributes • Increasing use of covering indices • Several alternative sort orders • Partial sorting • Query patterns • Attributes common to multiple operators • Known techniques • Work only for unary operators like group-by

  5. Outline of the Talk • Partial sorting • Changes to external sort • Optimizer changes to handle partial sort orders • Interesting orders for a join tree : A special case • Problem is NP-Hard • A 2-approximation for the special case • The general problem • Notion of favorable orders • Plan generation using favorable orders • Post-optimization phase • Experimental results

  6. Exploiting Partial Sort Orders • Sort on (a1, a2) given (a1) • Standard external-sort • Cost is independent of input sort order • Replacement-selection • Produces single run but incurs I/O • Both methods break the pipeline – first o/p tuple after reading all i/p R.a1=S.a1 and R.a2=S.a2 (a1) (a1,a2) () (a1,a2) R S C. Index on (R.a1)

  7. A Minor Change to External Sorting • Multiple “partial sort segments” • Hold only one segment at any given time • When a new segment starts • Sort the current segment and output • No run generation I/O if each segment fits in memory • Early output (good for Top-K) • Reduced comparisons • O(n log n/k) Vs. O(n log n), k = # segments

  8. Optimizer Changes to Handle Partial Sort Orders • Cost Model for Partial Sort: • Let the input order be o1 • Required (output) order be o2 • Let os=Longest common prefix between o1 and o2 • Let or=o2 – os (i.e, os + or = o2) • A(o) = Attribute set of order o • Є : Empty (no) sort order • coe(e, o1,o2) = D(e, A(os)) X coe(e’, Є, or), • where e’=sp(e) and p equates A(os) to a constant.

  9. Optimizer Changes to Handle Partial Sort Orders • Cost Model for Partial Sort: • coe(e, o1,o2) = D(e, A(os)) X coe(e’, Є, or), • where e’=sp(e) and p equates A(os) to a constant. o2=(a,c) o1=(a,b) e os=(a), or=(c), e’=s(a=k)(e)

  10. Flexible Order Requirements • Most operators have interest in any order on the attributes involved • Merge-Join, Merge-Union, Group By, Duplicate Elimination • Binary operators demand the same order from inputs G {a1, a2} {a1,a2,a3,a4} {a3,a5,a6} {a4,a7}

  11. A special case: All relations/intermediate results of the same size All attribute cardinalities same We try to maximize the length of common prefixes Maximize SLCP(pi, pj) Reduction from graph layout problem SUM-CUT Optimal algorithm for paths and 2-approximation for binary trees Finding Optimal is NP-Hard

  12. s2 Sn-1 s1 s3 sn A 2-Approximation Algorithm • Optimal algorithm for paths • OPT(i,j) = max {OPT(i,k) + OPT(k+1,j) + c(i,j)}, i ≤ k < j • 2-Approximation for binary trees Even levels Odd levels - OPT ≤ OPT-EVEN + OPT-ODD - Take the one with higher benefit

  13. General Case • Logical plan space for inputs not expanded (i.e, Join order not fixed) • Varying sizes of relations and intermediate results • All orders on base relations do not have the same cost (due to clustering and covering indices)

  14. Overview of the Approach • Identify a small set of favorable orders • Orders that are relatively inexpensive • Should not require expanding the input plan space • Plan generation (Phase-1) • Deduce the interesting orders from the favorable orders • Try each of the interesting order, retain the best • Plan refinement (Phase-2) • Use the 2-approximation algorithm and refine the sort orders further

  15. Favorable Orders • Benefit of an order: benefit(o, e) = cbp(e, Є) + coe (e, Є, o) – cpb(e,o) Positive benefit The order can be obtained at cost less than the full sort of unordered result (e.g., the clustering order) • Favorable orders: ford(e)={ o : benefit(o,e) > 0 } • Can be a huge set • E.g., Every order having the clustering order as its prefix is a favorable order.

  16. E E Minimal Favorable Orders • A favorable order o that satisfies: • o’ ≤ o s.t. cbp(e, o’) + coe(e, o’, o) = cbp(e,o) • o” s.t. o ≤ o” and cbp(e, o”) = cbp(e,o) E.g., Relation R with clustering index on (a1,a2) • (a1,a2) is a minimal favorable order • (a1 ), (a1,a2,a3) are not • ford-min(e): Set of all minimal favorable orders for expression e • For base relations size of ford-min limited to the number of covering indices

  17. Computing Favorable Orders: Issues • Defined in terms of cost of best plan • Need them before optimizing input sub-expressions • Even ford-min can get prohibitively large for join, group-by expressions ford-min contains every permutation of the join attributes J1 J2 S R

  18. U U U Heuristics for Computing ford-min • e=R : {o: o is clustering or covering index order} • e=sp(e1) : {o: o ford-min(e1)} • e=PL(e1) :{o: o’ ford-min(e1) and o=o’ ^ L} • Pa,b(e1), ford-min(e1)={(a,c,b)}  ford-min(e)={(a)} • e=e1 e2 : • Let T=ford-min(e1) U ford-min(e2) • T U {o: o’ T and o=((o’ ^ S) permute(S – A(o’ ^ S)))

  19. Heuristics for Computing ford-min S={a,b,c,d} T = {(a,b,e), (b), (a)} ford-min={(a,b,e),(b)} ford-min={(a)}

  20. Plan Generation (Phase-1) • Form the set I of interesting orders to try • Collect input favorable orders and rqd. o/p order • Take LCP with the set of join attributes • Extend the orders (arbitrarily) to include remaining attributes • For each order o in I, generate optimization sub-goals for input sub-expressions

  21. Plan Refinement (Phase-2) • Identify the suffix that can be freely reordered • Use the 2-approximation algorithm to reorder the suffix {a,e,h} (a,b,c,h) (a,h,b,c) {a,e,h} {a,d,h} (a,h,e) (a,e,h) (a,d,h) (a,h,d) R4 (a) R3 (a) R2 (a) R1 (a)

  22. Experiments • Benefits of exploiting partial sort orders • Evaluate the plans produced by our optimizer extensions

  23. Experiment 1 SELECT suppkey, partkey FROM lineitem ORDER BY suppkey, partkey; (suppkey)  (suppkey, partkey)

  24. Experiment 2 R(c1,c2,c3), 10 M records, (c1)(c1,c2), card(c1)=10,000

  25. Experiment 3

  26. SELECT ps_suppkey, ps_partkey, ps_availqty, sum(l_quantity) AS total_required FROM partsupp, lineitem WHERE ps_suppkey=l_suppkey AND ps_partkey=l_partkey AND l_linestatus='O' GROUP BY ps_partkey, ps_suppkey, ps_availqty, HAVING sum(l_quantity) > ps_availqty ORDER BY ps_partkey; Parts running out of stock: Experiment 4

  27. Experiment 4 - Plans Merge-Join Plan on SYS1 and SYS2 Plan Generated by PYRO-O

  28. Experiment 4 & 5 - Timings

  29. Experiments with Variants of PYRO PYRO : Baseline PYRO PYRO-O-: No partial sort PYRO-P : Postgres Heuristic PYRO-O : Our Approach PYRO-E : Exhaustive

  30. Optimization Overheads

  31. Questions?

More Related