Eddies: Continuously Adaptive Query Processing

Eddies: Continuously Adaptive Query Processing. Based on a SIGMOD 2000 paper and talk by Avnur and Hellerstein.


Presentation Transcript


  1. Eddies: Continuously Adaptive Query Processing Based on a SIGMOD 2000 paper and talk by Avnur and Hellerstein.

  2. State of the Art in Query Optimization • Given: • Database state and statistics known a priori • One (short) user query to process • Query may be run only once • Query Processing: • Decide a priori on a (static) query plan • Run query using this one plan • Also: • Possibly update statistics sometimes (in steady state)

  3. Adaptive Systems: General Flavor Repeat: • Observe (model) environment • Use observation to choose behavior • Take action

  4. Adaptivity in Current DBs • Limited and coarse-grained Repeat: • Observe (model) environment • runstats (once per week!!): model changes in data • Use observation to choose behavior • query optimization: fixes a single static query plan • Take action • query execution: blindly follow plan

  5. Query Optimization • Adaptivity at a per-week frequency! • Not suited for volatile environments

  6. A Networking Problem!? • Networks do dataflow! • Significant history of adaptive techniques • E.g. TCP congestion control • E.g. routing • But traditionally much lower function • Ship bitstreams • Minimal, fixed code • Lately, moving up the food chain? • app-level routing • active networks

  7. Varying … • Computing resources • Data flows unpredictably from sources • Code performs unpredictably along flows • Continuous volatility due to many decentralized systems • Data Characteristics • Distributions • Burstiness • User preferences • What to get fast • How much data

  8. Toward Continuous Adaptivity • Need much more frequent adaptivity • Goal: adapt per tuple of each relation?? • The traditional runstats-optimize-execute loop is far too coarse-grained • So, continuously perform all 3 functions, at runtime • Aim for adaptivity over best-case performance (as the latter never exists for long)

  9. Road Map • Adaptive Query Processing • Intra-join adaptivity • Synchronization Barriers • Moments of Symmetry • Eddies • Encapsulated, adaptive dataflow

  10. Adaptable Operators and Plans • Moments of symmetry = query processing stage during which pipelined query operators or inputs can be easily reordered (with no or minimal state management) • Synchronization barriers = require inputs from different sources to be coordinated and possibly restricted to the rate of the slower input • We need “good” operators.

  11. Adaptable Joins, Issue 1 • Synchronization Barrier: merge join • Right input frozen, waiting for left • Can’t adapt while waiting for barrier! • So, favor joins that have: • no barriers or infrequent barriers • at worst, adaptable barriers

  12. Adaptable Joins, Issue 2 • Would like to reorder in-flight (pipelined) joins • Base case: swap inputs to a join? • Moment of symmetry: • inputs can be swapped with no/little state management • Aim for frequent moments of symmetry ⇒ more frequent adaptivity

  13. Adaptable Joins, Issue 2 • Moments of Symmetry • Suppose you can adapt an in-flight query plan • How would you do it? • Base case: reorder inputs of a single join • Nested loops join

  14. Adaptable Joins, Issue 2 • Moments of Symmetry • Suppose you can adapt an in-flight query plan • How would you do it? • Base case: reorder inputs of a single join • Nested loops join • Cleaner if you wait until the end of the inner loop

  15. Adaptable Joins, Issue 2 • Moments of Symmetry • Suppose you can adapt an in-flight query plan • How would you do it? • Base case: reorder inputs of a single join • Nested loops join • Cleaner if you wait until the end of the inner loop • Hybrid Hash • Reorder while “building”?

  16. Moments of Symmetry, cont. • Moment of Symmetry: • Can swap join inputs w/o state modification • Nested Loops join: end of each inner loop • Hybrid Hash join: never • Sort-Merge join: essentially always • More frequent moments of symmetry ⇒ more frequent adaptivity
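The nested-loops case can be made concrete with a small sketch. It assumes both inputs are in-memory lists; the function name `nested_loops_with_swap` and the parameter `swap_after` are hypothetical and chosen only for illustration. The point it tries to show is that swapping outer and inner at the end of an inner loop needs only one piece of state, namely how many outer tuples have already been fully joined.

```python
def nested_loops_with_swap(R, S, match, swap_after):
    """Join R and S, swapping outer and inner after `swap_after` inner loops.

    Illustrative sketch only: R and S are lists, `match` is a predicate.
    """
    out = []
    # Phase 1: R is the outer input. The end of each inner loop over S
    # is a moment of symmetry, because r has then met every s.
    done = min(swap_after, len(R))
    for r in R[:done]:
        for s in S:
            if match(r, s):
                out.append((r, s))
    # Phase 2: the inputs are swapped, so S is now the outer input.
    # The only carried-over state is `done`: R tuples before that
    # position are finished and are skipped.
    for s in S:
        for r in R[done:]:
            if match(r, s):
                out.append((r, s))
    return out
```

Whatever value `swap_after` takes, the result should contain the same pairs as a plain nested-loops join, only in a different order.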

  17. Joins for Adaptivity • Pipelined hash join (hash ripple or XJoin) • No synchronization barriers • Continuous symmetry • Good for equi-join • Simple (or block) ripple join • Synchronization barriers at “corners” • Moments of symmetry at “corners” • Good for non-equi-join • When is there symmetry? At corners, i.e., for each “new” tuple, once it has been processed using the given operator’s state
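A pipelined (symmetric) hash join can be sketched roughly as follows. This is a minimal in-memory illustration, not the hash ripple or XJoin implementation; the class name `SymmetricHashJoin` and its `insert` method are assumptions made for the example. Each arriving tuple is first stored in its own side's hash table and then probes the other side's table, so neither input ever waits for the other.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Equi-join with one hash table per input (illustrative sketch)."""

    def __init__(self):
        self.tables = {"R": defaultdict(list), "S": defaultdict(list)}

    def insert(self, side, key, row):
        """Add `row` arriving on `side` ("R" or "S"); return new matches."""
        other = "S" if side == "R" else "R"
        # Build: remember the tuple so later arrivals on the other side see it.
        self.tables[side][key].append(row)
        # Probe: join against everything seen so far on the other side.
        matches = self.tables[other].get(key, [])
        if side == "R":
            return [(row, m) for m in matches]
        return [(m, row) for m in matches]
```

Because both sides are handled identically, the arrival order of R and S tuples can change at any time without any state being moved, which is what "continuous symmetry" refers to.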

  18. Beyond Binary Joins • Think of swapping “inners” • Can be done at a global moment of symmetry • Intuition: like an n-ary join • Except that each pair can be joined by a different algorithm! • So… • Need to introduce n-ary joins to a query engine

  19. Need well-behaved join algorithms • Pipelining • Avoid synch barriers • Frequent moments of symmetry

  20. Continuous Adaptivity Goal: Eddies • Avoid need for traditional cost estimation • Avoid generation of a ‘good’ query plan

  21. Continuous Adaptivity: Eddies • A pipelining n-ary tuple-routing iterator (just like join or sort) • works well with ops that have frequent moments of symmetry

  22. Continuous Adaptivity: Eddies • Adjusts flow adaptively • Tuples flow in different orders • Visit each op once before output
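The "visit each op once before output" rule can be sketched with a per-tuple set of completed operators. This is a simplified illustration under several assumptions: the operators are plain selection predicates, tuples are handled one at a time rather than interleaved, and the names `run_eddy` and `choose` are hypothetical. The routing policy is pluggable; the default here is uniformly random.

```python
import random


def run_eddy(tuples, operators, choose=None, seed=0):
    """Route each tuple through every operator once, in a per-tuple order.

    operators: dict mapping a name to a predicate (True keeps the tuple).
    choose: routing policy, given the list of operators still pending.
    """
    rng = random.Random(seed)
    choose = choose or (lambda pending: rng.choice(pending))
    output = []
    for t in tuples:
        done = set()      # operators this tuple has already visited
        alive = True
        while alive and len(done) < len(operators):
            pending = [name for name in operators if name not in done]
            name = choose(pending)             # the routing decision
            alive = bool(operators[name](t))   # False means dropped
            done.add(name)
        if alive:
            output.append(t)   # visited every operator, so emit it
    return output
```

For selections the set of output tuples does not depend on the routing policy; only the amount of work done along the way does.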

  23. Routing: Eddies • Naïve routing policy: • All ops fetch from eddy as fast as possible • Previously-seen tuples precede new tuples

  24. Schedule: Grab when Ready? • Two expensive selections s1 and s2 • Selectivity(s1)=Selectivity(s2)=50% • Cost(s2) = 5. • Vary Cost(s1). • What do we expect? • Does it make a difference at all?

  25. Cost Factor? • Two expensive selections, 50% selectivity • Cost(s2) = 5. Vary cost of s1. • Favors faster operation

  26. But is it Enough? • Given two expensive selections: • Cost same, say cost(s1)=cost(s2)=5 • Selectivity(s2) = 50%. • Vary selectivity of s1. • Does that make a difference?

  27. Selectivity-based? • Two expensive selections, cost 5 • Selectivity(s2) = 50%. Vary selectivity of s1.

  28. Schedule: Selectivity-based? • Conclude: Heavy tuple shedder early on is good.
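The reasoning behind both experiments can be checked with a short expected-cost calculation. This is a back-of-the-envelope sketch that assumes independent selections with fixed cost and selectivity; the function name `expected_cost` and the selectivity value 0.2 used below are hypothetical, not numbers from the slides.

```python
def expected_cost(order):
    """Expected per-tuple cost of applying selections in the given order.

    order: list of (cost, selectivity) pairs, where selectivity is the
    fraction of tuples that pass the selection.
    """
    total = 0.0
    surviving = 1.0   # fraction of tuples that reach the next operator
    for cost, selectivity in order:
        total += surviving * cost
        surviving *= selectivity
    return total


# Equal costs (5 each), s2 passes 50%, s1 assumed to pass 20%.
s1, s2 = (5, 0.2), (5, 0.5)
s1_first = expected_cost([s1, s2])
s2_first = expected_cost([s2, s1])
```

With these assumed numbers, running s1 first costs 5 + 0.2 × 5 = 6.0 per tuple, while running s2 first costs 5 + 0.5 × 5 = 7.5, which matches the conclusion that the heavier tuple shedder should run early. The same function covers the cost experiment: with both selectivities at 50%, an s1 of cost 1 before an s2 of cost 5 gives 3.5, against 5.5 for the reverse order.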

  29. How to choose? • If we knew all selectivities and all costs (and they were static), maybe we could pick the best overall “schedule” here. • Otherwise, we need a cheap means to observe their changes • And, we need a means to react in a simple manner based on those perceived changes

  30. An Aside: How to choose? • A machine learning problem? • Each agent pays off differently • Explore or exploit? • Heuristics? • Sometimes want to randomly choose one • Usually want to go with the best • If probabilities are stationary, dampen exploration over time
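One standard way to express these heuristics is an epsilon-greedy rule with a decaying exploration rate. The slides do not name this technique; it is swapped in here purely as an illustration, and the function names `epsilon` and `pick` and the constants `start` and `decay` are assumptions.

```python
import random


def epsilon(step, start=0.5, decay=0.01):
    """Exploration probability that shrinks as more choices are made."""
    return start / (1.0 + decay * step)


def pick(estimates, step, rng):
    """Choose an operator given payoff estimates (higher is better).

    estimates: dict mapping operator name to its estimated payoff.
    rng: a random.Random instance.
    """
    if rng.random() < epsilon(step):
        return rng.choice(sorted(estimates))     # explore: random pick
    return max(estimates, key=estimates.get)     # exploit: current best
```

Dampening exploration like this only makes sense if payoffs are stationary. In a volatile environment the rate would need to stay bounded away from zero so that changes are still noticed.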

  31. Eddies with Lottery Scheduling • Operator gets 1 ticket when it takes a tuple • Favor operators that run fast (low cost) • Operator loses a ticket when it returns a tuple • Favor operators that drop tuples (low selectivity) • Winner? • Large number of tickets == measure of goodness • Lottery Scheduling: • When two operators vie for the same tuple, hold a lottery • Never let any operator go to zero tickets • Support occasional random “exploration”
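The ticket rules above can be sketched as a small router plus a toy simulation. This is a minimal illustration, not the paper's implementation: the class `LotteryRouter`, the function `simulate`, and the floor of one ticket per operator are assumptions made to keep the example short. Operators are modeled as selections that pass a tuple with a fixed probability.

```python
import random


class LotteryRouter:
    """Ticket-weighted choice among eligible operators (sketch)."""

    def __init__(self, names, seed=0):
        self.tickets = {n: 1 for n in names}   # nobody starts at zero
        self.rng = random.Random(seed)

    def choose(self, eligible):
        # Hold a lottery: more tickets means a higher chance of winning.
        weights = [self.tickets[n] for n in eligible]
        return self.rng.choices(eligible, weights=weights, k=1)[0]

    def took_tuple(self, name):
        self.tickets[name] += 1                # credit on consuming a tuple

    def returned_tuple(self, name):
        # Debit on returning a tuple, but never drop below one ticket so
        # every operator keeps a chance of being explored.
        self.tickets[name] = max(1, self.tickets[name] - 1)


def simulate(selectivity, n_tuples=2000, seed=0):
    """Route tuples through selections; return the final ticket counts."""
    rng = random.Random(seed + 1)
    router = LotteryRouter(list(selectivity), seed)
    for _ in range(n_tuples):
        pending = list(selectivity)
        while pending:
            op = router.choose(pending)
            router.took_tuple(op)
            pending.remove(op)
            if rng.random() >= selectivity[op]:
                break                          # tuple dropped by op
            router.returned_tuple(op)          # tuple goes back to the eddy
    return router.tickets
```

In a run such as `simulate({"s1": 0.1, "s2": 0.9})`, the selection that drops most tuples (`s1`) keeps most of the tickets it earns and so tends to win later lotteries, which is the behavior the slide describes. This sketch ignores operator cost, which in the real system shows up through how quickly an operator comes back for more tuples.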

  32. Lottery-Based Eddy • Two expensive selections, cost 5 • Selectivity(s2) = 50%. Vary selectivity of s1.

  33. Summary • Eddies: Continuously Adaptive Dataflow • Suited for volatile performance environments • Changes in operator/machine performance • Changes in selectivities (e.g. with sorted inputs) • Changes in data delivery • Currently adapts join order • Competitive methods to adapt access & join methods? • Requires well-behaved join algorithms • Pipelining • Avoid synch barriers • Frequent moments of symmetry • The end of the runstats/optimizer/executor boundary! • At best, System R is good for “hints” on initial ticket distribution
