1 / 87

Adaptive Query Processing with Eddies

Adaptive Query Processing with Eddies. Amol Deshpande University of Maryland. Roadmap. Adaptive Query Processing: Motivation Eddies [AH’00] STAIRs [DH’04] and SteMs [RDH’03] Experimental Study Implementation in PostgreSQL [Des’03] Continuous queries [MSHR’02] (very briefly) Open problems.

herman
Télécharger la présentation

Adaptive Query Processing with Eddies

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Adaptive Query Processing with Eddies Amol Deshpande University of Maryland

  2. Roadmap • Adaptive Query Processing: Motivation • Eddies [AH’00] • STAIRs [DH’04] and SteMs [RDH’03] • Experimental Study • Implementation in PostgreSQL [Des’03] • Continuous queries [MSHR’02] (very briefly) • Open problems

  3. Query Processing in Database Systems Declarative Query Database System Results

  4. Query Processing: Example select * from students, enrolled, courses where students.name = enrolled.name and enrolled.course = courses.course Database System Students Enrolled Courses

  5. Students Enrolled Enrolled Courses Query Processing: Example select * from students, enrolled, courses where students.name = enrolled.name and enrolled.course = courses.course Courses Students Enrolled

  6. ES EC CE SE Example Query: Execution Plans SEC SEC CE SE S C Students Courses C S E E Courses Enrolled Students Enrolled A Query Execution Plan An alternate Execution Plan

  7. EC SE Cost-based Query Optimization Estimate cost of each plan and choose the best SEC Cost = g(|SE|, |C|, R) Input sizes + SE C Cost = f(|S|, |E|, R) Courses = S E Students Enrolled Runtime Parameters Cost (Plan) A Query Execution Plan

  8. Cost-based Query Optimization Results Query Optimizer Query Executor Compiled Query Plan Declarative Query Disk(s)

  9. Network Cost-based Query Optimization Results Query Optimizer Query Executor Compiled Query Plan Declarative Query Wide area data sources: e.g. remote tables, web data sources Disk(s)

  10. Network Disk(s) Cost-based Query Optimization Results Query Optimizer Query Executor Compiled Query Plan Declarative Query Streaming data e.g. Stock tickers Network logs Sensor networks

  11. EC Erroneous estimation of intermediate result sizes Input sizes may not be available SE Estimation Errors Cost = g(|SE|, |C|, R) SEC SE C Courses S E Students Enrolled A Query Execution Plan

  12. EC Effect on the cost function may be unpredictable SE Estimation Errors Cost = g(|SE|, |C|, R) SEC SE Unknown runtime parameters C Courses S E Students Enrolled A Query Execution Plan

  13. How to solve this problem ? • More sophisticated estimation techniques • Sophisticated summary structures • e.g. MHists [PI’97], Wavelets [VWI’98] • Feedback loop in the optimization process • e.g. [SLMK’01, BC’02] • Adaptive query processing • Can’t always build and maintain synopses • Runtime environments can be very unpredictable • So…adapt query plans mid-way during execution

  14. Eddies: Extreme Adaptivity • Telegraph & TelegraphCQ (at UC Berkeley) • Eddies [AH’00] • SteMs [RDH’03] • Continuous queries [MSHR’02, CF’02, C+’03, K+’03] • Implementation in PostgreSQL [Des04] • Fault-tolerance and load balancing [SHB’04] • STAIRs [DH’03] • Other work • Distributed eddies, Content-based Routing [BB’05] per tuple static plans inter- operator late binding intra- operator Dynamic QEP, Parametric, Competitive Query Scrambling, MidQuery Re-opt Traditional DBMS XJoin, DPHJ Convergent QP Eddies

  15. Roadmap • Adaptive Query Processing: Motivation • Eddies [AH’00] • STAIRs [DH’04] and SteMs [RDH’03] • Experimental Study • Implementation in PostgreSQL [Des’03] • Continuous queries [MSHR’02] (very briefly) • Open problems

  16. Eddies [AH’00] select * from S where pred1(S) and pred2(S) Plans considered by the optimizer pred1(S) pred2(S) S Output pred2(S) pred1(S) S Output Decision made apriori based on statistics Sort by (1-s)/c, where s = selectivity, c = cost

  17. pred2(S) Eddy Output S pred1(S) Eddies [AH’00] select * from S where pred1(S) and pred2(S) Executing the query using an Eddy • An eddy operator • Intercepts tuples from source(s) and output tuples from operators • Query executed by routing tuples between the operators • Uses feedback from the operators to route Change routing ==> Change query execution plan used

  18. Per-tuple State select * from S where pred1(S) and pred2(S) Executing the query using an Eddy pred2(S) Eddy Output S Two Bitmaps Ready bits - which operators can a tuple be routed to next Done bits - which operators has a tuple already been through pred1(S) Example: Ready(t1) = [1, 1] - can be routed to either Done(t1) = [0, 0] - not done either Example: Ready(t2) = [1, 0] - can be routed to pred1 Done(t2) = [0, 1] - done pred2 For selection queries, ready is a bit-complement of done

  19. Eddies: Routing Policy • Choosing which operator to route a given tuple to • The brain of the eddy Send here 99% of the time Send to the other operator 1% of the time Lottery Scheduling [Avnur 00] Simplified Description 1. Maintain for each operator: tuples sent tuples returned cost per tuple 2. Choose (roughly) based on the above 3. Explore by randomly sending tuples in the wrong orders sent = 100 received = 2 pred2(S) Eddy Output S pred1(S) sent = 10 received = 20

  20. Students Enrolled Enrolled Courses A Join Query select * from students, enrolled, courses where students.name = enrolled.name and enrolled.course = courses.course Courses Students Enrolled

  21. Output SE EC SE EC Eddy S Output E C C S E Eddies [AH’00] Query execution using an eddy A traditional query plan A key difference: Tuples can’t be arbitrarily routed to any operator E.g. S tuples can’t be routed to E Join C Use ready bits to identify this

  22. SE EC HashTable E.Name HashTable S.Name HashTable C.Course HashTable E.Course Query Execution using Eddies Probe to find matches Insert with key hash(joe) Eddy S E Output C No matches; Eddy processes the next tuple

  23. SE EC HashTable E.Name HashTable S.Name HashTable C.Course HashTable E.Course Query Execution using Eddies Probe Insert Eddy S E Output C

  24. SE EC HashTable E.Name HashTable S.Name HashTable C.Course HashTable E.Course Query Execution using Eddies Probe Eddy S E Output C Probe

  25. SE EC HashTable E.Name HashTable S.Name HashTable C.Course HashTable E.Course Per-tuple State Eddy S E Output C

  26. SE EC HashTable E.Name HashTable S.Name HashTable C.Course HashTable E.Course Per-tuple State Eddy S E Output C

  27. SE EC HashTable E.Name HashTable S.Name HashTable C.Course HashTable E.Course Per-tuple State Eddy S E Output C

  28. Eddies: Postmortem Output Output EC E S SE Courses CE Students Students Enrolled Courses Enrolled Eddy executes different query execution plans for different parts of data

  29. Joins and Lottery Scheduling • Lottery scheduling doesn’t work well with joins

  30. |S E| |EC| ES EC CE SE Example: Delayed Data Sources SETUP: >> Execution plan 1 Execution plan 2 SEC SEC CE SE S C C E S E Cost (Plan 1) > Cost (Plan 2)

  31. |S E| |EC| Example: Delayed Data Sources SETUP: E and Carrive early; Sis delayed >> S E C time

  32. |S E| |EC| SE HashTable E.Name HashTable S.Name Eddy S E Output C HashTable C.Course HashTable E.Course Eddy decides to route E to EC EC SETUP: E and Carrive early; Sis delayed >> S0 sent and received suggest S Join E is better option S S E S0 S –S0 E C time C S0E SE (S –S0)E Eddy learns the correct sizes Too Late !!

  33. State got embedded as a result of earlier routing decisions |S E| |EC| SE HashTable E.Name HashTable S.Name EC Eddy S E Output C HashTable C.Course HashTable E.Course SE EC SETUP: E and Carrive early; Sis delayed >> S E C C SE S E Execution Plan Used Query is executed using the worse plan. Too Late !!

  34. Joins and Lottery Scheduling • Lottery scheduling doesn’t work well with joins • Not clear how any routing policy can work without reasonable knowledge of future • Whatever the current state in the join operators, an adversary can send tuples to make it look very bad • Two possible solutions: • Allow manipulation of state (STAIRs) [DH’04] • Don’t embed state in the operators (SteMs) [RDH’03]

  35. Roadmap • Adaptive Query Processing: Motivation • Eddies [AH’00] • STAIRs [DH’04] and SteMs [RDH’03] • Experimental Study • Implementation in PostgreSQL [Des’03] • Continuous queries [MSHR’02] (very briefly) • Open problems

  36. STAIRs [DH’04] • Expose join state to the eddy • Provide state management primitives • That guarantee correctness of execution • That can be used to manipulate embedded state in the operators • Also allow support for cyclic queries etc

  37. New Operator: STAIR SE HashTable E.Name HashTable S.Name Eddy S Output E C HashTable C.Course HashTable E.Course EC

  38. S.Name STAIR HashTable E.Name STAIR HashTable Eddy S Output E C HashTable HashTable E.Course STAIR C.Course STAIR New Operator: STAIR Storage, Transformation and Access for Intermediate Results

  39. S.Name STAIR Build into S.Name STAIR HashTable E.Name STAIR HashTable Eddy S Output E C HashTable HashTable E.Course STAIR C.Course STAIR Query execution using STAIRS Similar to using Join Operators Probe into E.Name STAIR s1 s1 s1 s1

  40. STAIR: Operations • Build (insert): • Insert the given tuple into the STAIR • Probe (lookup): • Find matching tuples for the given tuple • State Management Operations: • Demotion • Promotion

  41. e1 e1 e2c1 e2 s1e1 e2c1 e2 s1e1 State Management Primitive: Demotion Replace a tuple in a STAIR with a projection of that tuple S.Name STAIR HashTable E.Name STAIR s1 Demoting e2c1 toe2 HashTable e1 e2 e2c1 Eddy S E Output C HashTable e2 s1e1 HashTable c1 E.Course STAIR Can be thought of as undoing work C.Course STAIR

  42. Promotinge1 using EC e1 e1 e1c1 e1 e1c1 State Management Primitive: Promotion Replace a tuple in a STAIR with the result of joining it with other tuples S.Name STAIR • Two arguments: • A tuple • A join to be used to promote this tuple HashTable E.Name STAIR s1 HashTable e1 e1c1 e2c1 Eddy S E Output C HashTable e2 s1e1 HashTable c1 e1 E.Course STAIR Can be thought of as precomputation of work C.Course STAIR

  43. STAIRs: Correctness • Theorem: For any sequence of applications of the state management operations, STAIRs will produce the correct query output. • STAIRs will produce every result tuple • There will be no spurious duplicates

  44. Lifting Burden of History: Delayed Data Sources

  45. |S E| |EC| SE HashTable E.Name HashTable S.Name Eddy S E Output C HashTable C.Course HashTable E.Course Eddy decides to route E to EC EC SETUP: E and Carrive early; Sis delayed >> S0 S E S0 E C time C S0E Eddy learns the correct selectivities

  46. S.Name STAIR HashTable |S E| |EC| S0 E E E HashTable E E E C C C S0E Eddy decides to route E to EC E.Course STAIR SETUP: E and Carrive early; Sis delayed >> S0 E.Name STAIR HashTable S E E Eddy S C E Output C time E HashTable C Eddy decides to migrateE Eddy learns the correct selectivities By promoting E using EC C.Course STAIR

  47. |S E| |EC| HashTable E C S0E E.Course STAIR SETUP: E and Carrive early; Sis delayed >> S.Name STAIR HashTable S S0 E.Name STAIR HashTable S S –S0 S –S0 E Eddy S C (S –S0) E C E Output C time E HashTable C C.Course STAIR

  48. S E EC S – S0 C EC SE E S0 E C HashTable E C SE E.Course STAIR S.Name STAIR HashTable S E.Name STAIR HashTable UNION Eddy S E Output C E HashTable C Most of the data is processed using the correct plan C.Course STAIR

  49. Further Motivating Adaptive State Management • Eager pre-computation for faster response times • Query scrambling [UFA’98] • Partial results [RH’02] • Selective caching of intermediate results • Continuous queries over streams • Cyclic queries • Adapting the join spanning tree used

  50. Making State Migration Decisions • Another policy question • Optimal migration decisions • Requires knowledge of future selectivities and the sizes of relations

More Related