Download
adaptive query processing with eddies n.
Skip this Video
Loading SlideShow in 5 Seconds..
Adaptive Query Processing with Eddies PowerPoint Presentation
Download Presentation
Adaptive Query Processing with Eddies

Adaptive Query Processing with Eddies

139 Vues Download Presentation
Télécharger la présentation

Adaptive Query Processing with Eddies

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Adaptive Query Processing with Eddies Amol Deshpande University of Maryland

  2. Roadmap • Adaptive Query Processing: Motivation • Eddies [AH’00] • STAIRs [DH’04] and SteMs [RDH’03] • Experimental Study • Implementation in PostgreSQL [Des’03] • Continuous queries [MSHR’02] (very briefly) • Open problems

  3. Query Processing in Database Systems Declarative Query Database System Results We will focus on traditional select-project-join queries

  4. Query Processing: Example select * from students, enrolled, courses where students.name = enrolled.name and enrolled.course = courses.course Database System Students Enrolled Courses

  5. Students Enrolled Enrolled Courses Query Processing: Example select * from students, enrolled, courses where students.name = enrolled.name and enrolled.course = courses.course Courses Students Enrolled

  6. ES EC CE SE Example Query: Execution Plans SEC SEC CE SE S C Students Courses C S E E Courses Enrolled Students Enrolled A Query Execution Plan An alternate Execution Plan

  7. EC SE Cost-based Query Optimization Estimate cost of each plan and choose the best SEC Cost = g(|SE|, |C|, R) Input sizes + SE C Cost = f(|S|, |E|, R) Courses = S E Students Enrolled Runtime Parameters Cost (Plan) A Query Execution Plan

  8. Cost-based Query Optimization Results Query Optimizer Query Executor Compiled Query Plan Declarative Query Disk(s)

  9. Network Cost-based Query Optimization Results Query Optimizer Query Executor Compiled Query Plan Declarative Query Wide area data sources: e.g. remote tables, web data sources Disk(s)

  10. Network Disk(s) Cost-based Query Optimization Results Query Optimizer Query Executor Compiled Query Plan Declarative Query Streaming data e.g. Stock tickers Network logs Sensor networks

  11. EC Erroneous estimation of intermediate result sizes Input sizes may not be available SE Estimation Errors Cost = g(|SE|, |C|, R) SEC SE C Courses S E Students Enrolled A Query Execution Plan

  12. EC Effect on the cost function may be unpredictable SE Estimation Errors Cost = g(|SE|, |C|, R) SEC SE Unknown runtime parameters C Courses S E Students Enrolled A Query Execution Plan

  13. How to solve this problem ? • More sophisticated estimation techniques • Sophisticated summary structures • e.g. MHists [PI’97], Wavelets [VWI’98] • Feedback loop in the optimization process • e.g. [SLMK’01, BC’02] • Adaptive query processing • Can’t always build and maintain synopses • Runtime environments can be very unpredictable • So…adapt query plans mid-way during execution

  14. Eddies: Extreme Adaptivity • Telegraph & TelegraphCQ (at UC Berkeley) • Eddies [AH’00] • SteMs [RDH’03] • Continuous queries [MSHR’02, CF’02, C+’03, K+’03] • Implementation in PostgreSQL [Des04] • Fault-tolerance and load balancing [SHB’04] • STAIRs [DH’04] • Other work • Distributed eddies, Content-based Routing [BB’05] per tuple static plans inter- operator late binding intra- operator Dynamic QEP, Parametric, Competitive Query Scrambling, MidQuery Re-opt Traditional DBMS XJoin, DPHJ Convergent QP Eddies

  15. Roadmap • Adaptive Query Processing: Motivation • Eddies [AH’00] • STAIRs [DH’04] and SteMs [RDH’03] • Experimental Study • Implementation in PostgreSQL [Des’03] • Continuous queries [MSHR’02] (very briefly) • Open problems

  16. Eddies [AH’00] select * from S where pred1(S) and pred2(S) Plans considered by the optimizer pred1(S) pred2(S) S Output pred2(S) pred1(S) S Output Decision made apriori based on statistics Sort by (1-s)/c, where s = selectivity, c = cost Once this decision is made, all tuples are processed using the same order

  17. pred2(S) Eddy Output S pred1(S) Eddies [AH’00] select * from S where pred1(S) and pred2(S) Executing the query using an Eddy • An eddy operator • Intercepts tuples from source(s) and output tuples from operators • Query executed by routing tuples between the operators • Uses feedback from the operators to route Change routing ==> Change query execution plan used

  18. Per-tuple State select * from S where pred1(S) and pred2(S) Executing the query using an Eddy pred2(S) Eddy Output S Two Bitmaps Ready bits - which operators can a tuple be routed to next Done bits - which operators has a tuple already been through pred1(S) Example: Ready(t1) = [1, 1] - can be routed to either Done(t1) = [0, 0] - not done either Example: Ready(t2) = [1, 0] - can be routed to pred1 Done(t2) = [0, 1] - done pred2 For selection queries, ready is a bit-complement of done

  19. Eddies: Routing Policy • Choosing which operator to route a given tuple to • The brain of the eddy Pred2 is more selective Send here 99% of the time Send to the other operator 1% of the time Lottery Scheduling [Avnur 00] Simplified Description 1. Maintain for each operator: tuples sent tuples returned cost per tuple 2. Choose (roughly) based on the above 3. Explore by randomly sending tuples in the wrong orders sent = 100 received = 2 pred2(S) Eddy Output S pred1(S) sent = 30 received = 20

  20. Students Enrolled Enrolled Courses A Join Query select * from students, enrolled, courses where students.name = enrolled.name and enrolled.course = courses.course Courses Students Enrolled

  21. Output SE EC SE EC Eddy S Output E C C S E Eddies [AH’00] Query execution using an eddy A traditional query plan A key difference: Tuples can’t be arbitrarily routed to any operator E.g. S tuples can’t be routed to E Join C Use ready bits to identify this

  22. Eddies w/ Joins • Traditional join operators typically consume one relation entirely and then start reading the second relation • E.g. hash join operator builds a hash table on one relation first, and then reads in the other relation • This is problematic for eddies • An eddy needs to see tuples from different relations in order to make its routing decisions • Also, if the inner relations are pre-decided, not much options left for adapting the join order • [Avnur, Hellerstein 00] discusses this issue in detail for traditional join operators

  23. SE HashTable E.Name HashTable S.Name Symmetric Hash Join • We will use a new join operator called symmetric hash join operator • Also called doubly pipelined • Other variants include ripple joins, Xjoins (disk-based) When a new S tuple arrives: It is built into S.name hashtable Probed into E.name hash table to find matches with already arrived E tuples Matches are immediately output Symmetric Operation !! S E

  24. SE EC HashTable E.Name HashTable S.Name HashTable C.Course HashTable E.Course Query Execution using Eddies Probe to find matches Insert with key hash(joe) Eddy S E Output C No matches; Eddy processes the next tuple

  25. SE EC HashTable E.Name HashTable S.Name HashTable C.Course HashTable E.Course Query Execution using Eddies Probe Insert Eddy S E Output C

  26. SE EC HashTable E.Name HashTable S.Name HashTable C.Course HashTable E.Course Query Execution using Eddies Probe Eddy S E Output C Probe

  27. Per-tuple State • Here also we need to keep track of what operators a tuple has already been through • Again use • Ready bits - operators that can be applied next • Done bits - operators that have already been applied • Unlike selections, these are not bit-complements of each other

  28. SE EC HashTable E.Name HashTable S.Name HashTable C.Course HashTable E.Course Per-tuple State Eddy S E Output C

  29. SE EC HashTable E.Name HashTable S.Name HashTable C.Course HashTable E.Course Per-tuple State Eddy S E Output C

  30. SE EC HashTable E.Name HashTable S.Name HashTable C.Course HashTable E.Course Per-tuple State Eddy S E Output C

  31. Can we talk about what exactly the eddy did during the execution ? Yes ! Execution Postmortem

  32. Execution Postmortem Output Output EC E S SE Courses CE Students Students Enrolled Courses Enrolled Eddy executes different query execution plans for different parts of data

  33. Can we talk about what exactly the eddy did during the execution ? Yes ! Eddy executes different plans for different parts of data This is where the adaptivity comes from Execution Postmortem

  34. Routing policy • Lottery scheduling unfortunately doesn’t work well with joins • Just because a join operator does not return tuples right now doesn’t mean it won’t return more tuples later • In other words, a join operator is state-ful • Selection operators are state-less

  35. |S E| |EC| ES EC CE SE Example: Delayed Data Sources SETUP: >> Execution plan 1 Execution plan 2 SEC SEC CE SE S C C E S E Cost (Plan 1) > Cost (Plan 2)

  36. |S E| |EC| Example: Delayed Data Sources SETUP: E and Carrive early; Sis delayed >> S E C time

  37. |S E| |EC| SE HashTable E.Name HashTable S.Name Eddy S E Output C HashTable C.Course HashTable E.Course Eddy decides to route E to EC EC SETUP: E and Carrive early; Sis delayed >> sent and received suggested (so far) that S Join E is better option for E tuples S0 S S E S0 S –S0 E C time C S0E SE (S –S0)E Eddy learns the correct sizes Too Late !!

  38. State got embedded as a result of earlier routing decisions |S E| |EC| SE HashTable E.Name HashTable S.Name EC Eddy S E Output C HashTable C.Course HashTable E.Course SE EC SETUP: E and Carrive early; Sis delayed >> S E C C SE S E Execution Plan Used Query is executed using the worse plan. Too Late !!

  39. Joins and Lottery Scheduling • Lottery scheduling doesn’t work well with joins • Not clear how any routing policy can work without reasonable knowledge of future • Whatever the current state in the join operators, an adversary can send tuples to make it look very bad • Two possible solutions: • Allow manipulation of state (STAIRs) [DH’04] • Don’t embed state in the operators (SteMs) [RDH’03]

  40. Roadmap • Adaptive Query Processing: Motivation • Eddies [AH’00] • STAIRs [DH’04] and SteMs [RDH’03] • Experimental Study • Implementation in PostgreSQL [Des’03] • Continuous queries [MSHR’02] (very briefly) • Open problems

  41. STAIRs [DH’04] • Expose join state to the eddy • Provide state management primitives • That guarantee correctness of execution • That can be used to manipulate embedded state in the operators • Also allow support for cyclic queries etc

  42. New Operator: STAIR SE HashTable E.Name HashTable S.Name Eddy S Output E C HashTable C.Course HashTable E.Course EC

  43. S.Name STAIR HashTable E.Name STAIR HashTable Eddy S Output E C HashTable HashTable E.Course STAIR C.Course STAIR New Operator: STAIR Storage, Transformation and Access for Intermediate Results

  44. S.Name STAIR Build into S.Name STAIR HashTable E.Name STAIR HashTable Eddy S Output E C HashTable HashTable E.Course STAIR C.Course STAIR Query execution using STAIRS Similar to using Join Operators Probe into E.Name STAIR s1 s1 s1 s1

  45. STAIR: Operations • Build (insert): • Insert the given tuple into the STAIR • Probe (lookup): • Find matching tuples for the given tuple • State Management Operations: • Demotion • Promotion

  46. e1 e1 e2c1 e2 s1e1 e2c1 e2 s1e1 State Management Primitive: Demotion Replace a tuple in a STAIR with a projection of that tuple S.Name STAIR HashTable E.Name STAIR s1 Demoting e2c1 toe2 HashTable e1 e2 e2c1 Eddy S E Output C HashTable e2 s1e1 HashTable c1 E.Course STAIR Can be thought of as undoing work C.Course STAIR

  47. Promotinge1 using EC e1 e1 e1c1 e1 e1c1 State Management Primitive: Promotion Replace a tuple in a STAIR with the result of joining it with other tuples S.Name STAIR • Two arguments: • A tuple • A join to be used to promote this tuple HashTable E.Name STAIR s1 HashTable e1 e1c1 e2c1 Eddy S E Output C HashTable e2 s1e1 HashTable c1 e1 E.Course STAIR Can be thought of as precomputation of work C.Course STAIR

  48. STAIRs: Correctness • Theorem: For any sequence of applications of the state management operations, STAIRs will produce the correct query output. • STAIRs will produce every result tuple • There will be no spurious duplicates

  49. Lifting Burden of History: Delayed Data Sources

  50. |S E| |EC| SE HashTable E.Name HashTable S.Name Eddy S E Output C HashTable C.Course HashTable E.Course Eddy decides to route E to EC EC SETUP: E and Carrive early; Sis delayed >> S0 S E S0 E C time C S0E Eddy learns the correct selectivities