
Parallel Querying with Non-Dedicated Nodes


Presentation Transcript


  1. Parallel Querying with Non-Dedicated Nodes Vijayshankar Raman, Wei Han, Inderpal Narang IBM Almaden Research Center

  2. Properties of a relational database • Ease of schema evolution • Declarative querying • Transparent scalability, which does not quite work in practice Parallel Querying with Non-Dedicated Computers

  3. Today: partitioning is the basis for parallelism • Static partitioning (on the base tables) • Dynamic partitioning via exchange operators • Claim: partitioning does not handle non-dedicated nodes well [Figure: partitions L1, L2, L3 / O1, O2, O3 / Sa, Sb, Sc of the base tables spread across nodes]

  4. Problems of partitioning • Hard to scale incrementally: data must be re-partitioned, and disk and CPU must be scaled together; the DBA must ensure partition-CPU affinity • Homogeneity assumptions: the same plan runs on each node, and identical software is needed on all nodes • Susceptible to load variations, node failures, stalls, …: response time is dictated by the speed of the slowest processor • Bad for transient compute resources: e.g., we want the ability to interrupt query work for higher-priority local work [Figure: initial partitioning and exchange in a partitioned plan]

  5. GOAL: a more graceful scale-out solution • Sacrifice partitioning for scalability • Avoid initial partitioning • No exchange • New means of work allocation in the absence of partitioning • Handles heterogeneity and load variations better • Two design features • Data In The Network (DITN): shared files on high-speed networks (e.g., SAN) • Intra-fragment parallelism: send SQL fragments to heterogeneous join processors; each performs the same join over a different subset of the cross-product space • Easy fault-tolerance • Can use heterogeneous nodes -- whatever is available at the time

  6. Outline • Motivation • DITN design • Experimental Results • Summary

  7. DITN Architecture • Find idle co-processors P1, P2, P3, P4, P5, P6 • Prepare O, L, C • Logically divide O×L×C into work units Wi • In parallel, run SQL queries for Wi at Pi • Property: SPJAG(O×L×C) = AG(∪i SPJAG(Wi)) • Restrictions (we return to these at the end) • Pi cannot use indexes at the information integrator • Isolation issues
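The logical division of O×L×C into work units can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grid-cell representation, the `rowid` column, and the SQL fragment shape are all assumptions made for the example.

```python
# Sketch (not the paper's actual code): logically divide the cross-product
# space O x L x C into work units, each a grid cell of row ranges, and
# emit one independent SQL fragment per unit.
from itertools import product

def row_ranges(table_size, cuts):
    """Split [0, table_size) into `cuts` contiguous ranges."""
    step = table_size // cuts
    bounds = [i * step for i in range(cuts)] + [table_size]
    return [(bounds[i], bounds[i + 1]) for i in range(cuts)]

def work_units(sizes, cuts):
    """sizes/cuts: dicts keyed by table name -> row count / number of cuts.
    Returns one (table -> (lo, hi)) mapping per cell of the grid."""
    grids = {t: row_ranges(sizes[t], cuts[t]) for t in sizes}
    tables = sorted(sizes)
    return [dict(zip(tables, cell))
            for cell in product(*(grids[t] for t in tables))]

def fragment_sql(unit):
    """Hypothetical SQL for one work unit, restricting each table to its
    row range via an assumed `rowid` column."""
    preds = " AND ".join(f"{t}.rowid >= {lo} AND {t}.rowid < {hi}"
                         for t, (lo, hi) in sorted(unit.items()))
    return "SELECT ... FROM O, L, C WHERE <join predicates> AND " + preds

units = work_units({"O": 1500, "L": 6000, "C": 150}, {"O": 2, "L": 5, "C": 1})
# 2 * 5 * 1 = 10 work units; together they cover O x L x C exactly once.
```

Because every fragment is an ordinary SQL query over row-range subsets, any node that can run SQL over the shared files can take a unit, which is what makes heterogeneous co-processors usable.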

  8. Why Data in the Network • Observation: network bandwidth >> query-operator bandwidth • Network bandwidth: Gbps (SAN/LAN); scan: 10-100 Mbps; sort: about 10 Mbps • The interconnect transfers data faster than query operators can process it • But exploiting this fast interconnect via SQL is tricky • E.g., an ODBC scan is 10x slower than a local scan • Instead, keep temp files in a shared storage system (e.g., SAN-FS) • Allows exploitation of the full network bandwidth • Immediate benefits • Fast data transfer • The DBMS doesn't have to worry about disks, I/O parallelism, parallel scans, etc. • Independent scaling of CPU and I/O

  9. Work Allocation without Partitioning • For each join, we now have to join the off-diagonal rectangles also • Minimize response time = max(RT of each work unit) = max over i,j of JoinCost(|Li|, |Oj|) • How to optimize the work allocation? • ≈ cut the join hyper-rectangle into n pieces so as to minimize the maximum perimeter • Simplification: assume the join is cut into a grid • Choices: number of cuts on each table, size of each cut, allocation of work units to processors

  10. Allocation to homogeneous processors • Theorem: for monotonic JoinCost, RT is minimized when each cut (on a table) is of the same size • So allocation is done into rectangles of size |T1|/p1, |T2|/p2, …, |Tn|/pn • Theorem: for symmetric JoinCost, RT is minimized when |T1|/p1 = |T2|/p2 = … = |Tn|/pn • E.g., with 10 processors, cut Lineitem into 5 parts and Orders into 2 • Note: cutting each table into the same number of partitions (as is usually done) is sub-optimal
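The slide's Lineitem/Orders example can be checked mechanically. This is a sketch under an assumed cost model (JoinCost as the sum of input sizes, i.e. scan-dominated, which is both monotonic and symmetric) and assumed TPC-H scale-factor-1 row counts; neither comes from the slides.

```python
# Enumerate the ways to split p processors as p1 * p2 and pick the split
# that minimizes the largest work unit, JoinCost(|T1|/p1, |T2|/p2).
# Assumed cost model: sum of input sizes (monotonic and symmetric).
def best_split(p, size1, size2, cost=lambda a, b: a + b):
    splits = [(p1, p // p1) for p1 in range(1, p + 1) if p % p1 == 0]
    return min(splits, key=lambda s: cost(size1 / s[0], size2 / s[1]))

# TPC-H scale factor 1 (assumed): Lineitem ~6M rows, Orders ~1.5M rows.
print(best_split(10, 6_000_000, 1_500_000))  # -> (5, 2): L into 5, O into 2
```

Note that the symmetric split (cutting both tables into the same number of pieces) is not even expressible here for p = 10, and the nearest equal split (p1 = p2) would be worse; the optimum tracks the size ratio of the tables, as the theorem predicts.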

  11. Allocation to heterogeneous co-processors • Response time of the query: RT = max(RT of each work unit) • Choose the size of each work unit, and the allocation of work units to co-processors, so as to minimize RT • Like a bin-packing problem • Solve for the number of cuts on each table, assuming homogeneity • Then solve a linear program to find the optimal size of each cut • Some approximations are needed to avoid an integer program (see paper)
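The paper solves a linear program; the special case of cutting a single table for heterogeneous nodes has a closed form that conveys the idea. This is a simplified sketch, not the paper's LP: the speed figures are hypothetical, and "speed" is modeled as rows processed per second.

```python
# Simplified single-table sketch: giving processor i a slice of size
# |T| * speed_i / sum(speeds) equalizes every node's finish time at
# |T| / sum(speeds), which minimizes the max (the response time).
def proportional_slices(table_size, speeds):
    total = sum(speeds)
    return [table_size * s / total for s in speeds]

speeds = [4.0, 2.0, 1.0, 1.0]            # hypothetical rows/sec per node
slices = proportional_slices(8_000_000, speeds)
finish = [w / s for w, s in zip(slices, speeds)]
# All finish times are equal: 8_000_000 / 8.0 = 1_000_000 time units.
```

The full problem is harder because work units are rectangles over several tables at once and units must be packed onto nodes, which is where the bin-packing flavor and the LP relaxation come in.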

  12. Failure/Stall Resiliency by Work-Unit Reassignment • Without tuple shipping between plans, failure handling is easy • If co-processors A, B, C have finished by time X, and co-processor D has not finished by time X(1+f) • Take D's work unit and assign it to the fastest among A, B, C, say A • When either D or A returns, close the cursor on the other • Can generalize to a work-stealing scheme • E.g., with 10 co-processors, initially assign each 1/20th of the cross-product space • When a co-processor returns with a result, assign it more work • Tradeoff: finer work allocation means more flexible work-stealing, BUT more redundant work
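The work-stealing generalization can be illustrated with a toy event-driven simulation. This is an assumption-laden sketch (equal-cost units, speeds as a simple rate, no reassignment timeout), not the paper's scheme: it only shows why over-decomposing the space lets fast nodes absorb more work.

```python
# Toy simulation: allocate more work units than co-processors; whenever a
# node finishes a unit it pulls the next pending one, so faster nodes end
# up doing proportionally more of the cross-product space.
import heapq
from collections import deque

def simulate(unit_cost, speeds, n_units):
    pending = deque(range(n_units))
    done = {i: [] for i in range(len(speeds))}
    events = []                       # (finish_time, node, unit)
    for node, speed in enumerate(speeds):
        if pending:
            heapq.heappush(events, (unit_cost / speed, node, pending.popleft()))
    clock = 0.0
    while events:
        clock, node, u = heapq.heappop(events)
        done[node].append(u)          # node finished unit u at `clock`
        if pending:                   # steal the next pending unit
            heapq.heappush(events,
                           (clock + unit_cost / speeds[node], node, pending.popleft()))
    return clock, done

# 3 nodes, one 4x faster; 12 equal-cost units.
rt, done = simulate(unit_cost=1.0, speeds=[4.0, 1.0, 1.0], n_units=12)
# The fast node processes 8 of the 12 units; response time is 2.0.
```

With one unit per node instead, response time would be gated at 1.0 time unit for the slow nodes versus 0.25 for the fast one, i.e. most of the fast node's capacity would be wasted; finer units trade that waste against the redundant-work risk the slide notes.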

  13. Analysis: what do we lose by not partitioning? • Say a join of L×O×C (TPC-H) with 12 processors: 12 = p1·p2·p3 • RT without partitioning ≈ JoinCost(|L|/p1, |O|/p2, |C|/p3) • RT with partitioning ≈ JoinCost(|L|/(p1p2p3), |O|/(p1p2p3), |C|/(p1p2p3)) • At p1=6, p2=2, p3=1, the loss in CPU speedup is JoinCost(|L|/6, |O|/2, |C|) ≈ 2·JoinCost(|L|/12, |O|/12, |C|/12) • Note: I/O speedup is unaffected • Optimization: selective clustering can close the gap with partitioning further • Sort the largest tables of the join, e.g., L and O, on their join column • Now the loss is JoinCost(|L|/12, |O|/12, |C|) / JoinCost(|L|/12, |O|/12, |C|/12) • This still avoids exchange, so heterogeneous, non-dedicated nodes can be used, but it causes problems with isolation
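A back-of-the-envelope check of this analysis, under assumptions not in the slides: a crude linear cost model JoinCost(a, b, c) = a + b + c and TPC-H scale-factor-1 row counts. Under this model the loss factor comes out near 3x rather than the slide's ~2x (the paper's actual JoinCost presumably differs), but the qualitative point survives: the loss is a small constant factor, not the 12x that naively losing all partitioning would suggest.

```python
# Assumed linear (scan-dominated) cost model and assumed TPC-H SF1 sizes.
L, O, C = 6_000_000, 1_500_000, 150_000
p1, p2, p3 = 6, 2, 1                      # 12 = p1 * p2 * p3

rt_no_partition = L / p1 + O / p2 + C / p3          # JoinCost(|L|/6, |O|/2, |C|)
rt_partitioned = (L + O + C) / (p1 * p2 * p3)       # JoinCost of a 1/12 slice
loss = rt_no_partition / rt_partitioned             # small constant, not 12x
```

Repeating the computation with the selective-clustering term JoinCost(|L|/12, |O|/12, |C|) shrinks the numerator to (L + O)/12 + C, which is how sorting the two large tables closes most of the remaining gap.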

  14. Lightweight Join Processor • Work allocation via query fragments means co-processors can be heterogeneous • Need not have a full DBMS; a join processor is enough • E.g., a screen saver for join processing • We use a trimmed-down version of Apache Derby • Parses CSV files • Predicates, projections, sort-merge joins, aggregates, group by
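To make "lightweight" concrete, here is a minimal sketch of the kind of processor the slide describes: parse CSV text, then sort-merge equi-join two inputs on one column. It is illustrative only; the authors used a trimmed-down Apache Derby, and the column names below are ordinary TPC-H ones used as sample data.

```python
# Minimal CSV sort-merge join: sort both inputs on the join key, then
# merge, emitting all matching pairs per key value.
import csv, io

def parse_csv(text, key):
    rows = list(csv.DictReader(io.StringIO(text)))
    return sorted(rows, key=lambda r: r[key])        # sort phase

def merge_join(left, lkey, right, rkey):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][lkey], right[j][rkey]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            j0 = j                                   # emit all pairs for lk
            while j < len(right) and right[j][rkey] == lk:
                out.append({**left[i], **right[j]})
                j += 1
            i += 1
            # rewind if the next left row repeats the same key
            j = j0 if i < len(left) and left[i][lkey] == lk else j
    return out

orders = parse_csv("o_orderkey,o_orderpriority\n1,HIGH\n2,LOW\n", "o_orderkey")
items = parse_csv("l_orderkey,l_qty\n1,5\n1,7\n3,2\n", "l_orderkey")
result = merge_join(orders, "o_orderkey", items, "l_orderkey")
# Two joined rows, both for order key "1".
```

A node running something this small can still execute a DITN work unit end to end, which is the point: the fragment interface asks only for scan, predicate, join, and aggregate, not a full DBMS.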

  15. Outline • Motivation • DITN design • Experimental Results • Summary

  16. Performance degradation due to not partitioning • At 10 nodes on S×O×L×C×N×R, DITN is about 2.1x slower than PBP (work allocation: L/5, O/2, S, C, N, R) • DITN2PART has very little slowdown, but needs total clustering • The slowdown oscillates due to the discreteness of work allocation [Charts: O ⋈ L and S ⋈ O ⋈ L ⋈ C ⋈ N ⋈ R]

  17. Failure/Stall Resiliency by Work-Unit Reassignment • Orders ⋈ Lineitem, group by o_orderpriority, 5 co-processors • Impose high load on one co-processor as soon as the query begins • At 60% load (50% wait), DITN times out and switches to an alternative node [Chart: PBP vs. DITN2PART]

  18. Importance of Asymmetric Allocation • Initially 2 fast nodes; then add 4 slow nodes • With symmetric allocation, adding slow nodes can slow down the system [Chart: contrast between DITN-symmetric and DITN-asymmetric]

  19. Danger of Tying a Partition to a CPU • Repeated execution of O ⋈ L • Impose 75% CPU load on one of the 5 co-processors during the 3rd iteration • PBP continues to use this slow node throughout • DITN switches to another node after two iterations

  20. Related Work • Parallel query processing: Gamma, XPRS, many commercial systems • Mostly shared-nothing • Shared-disk: IBM Sysplex • Queries done via tuple shipping between co-processors • Oracle: shared disk, but hash joins done via partitioning (static/dynamic) • Mariposa: similar query-fragment-level work allocation • Load balancing: Exchange, Flux, River, skew avoidance in hash joins • Fault-tolerant exchange (Flux) • Polar*, OGSA-DQP • Distributed eddies • Query execution on P2P systems

  21. Summary and Future Work • Partitioning-based parallelism does not handle non-dedicated nodes • Proposal: avoid partitioning • Share data via the storage system • Intra-fragment parallelism instead of exchange • Careful work allocation to optimize response time • Promising initial results: only a 2x slowdown with 10 nodes • Open questions • Index scans: want shared reads without latching • Isolation: DITN uses uncommitted read; DITN2PART is read-only • Scaling to large numbers of nodes • Multi-query optimization to reuse shared temp tables

  22. Backup Slides
