1 / 80

CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design

CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design. Hector Garcia-Molina. Chapter 5 Ozsu & Valduriez. Distributed DB Design. Top-down approach: - have DB… - how to split and allocate the sites

adora
Télécharger la présentation

CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 347: Parallel and DistributedData ManagementNotes02: Distributed DB Design Hector Garcia-Molina Notes 02

  2. Chapter 5 Ozsu & Valduriez Distributed DB Design Top-down approach: - have DB… - how to split and allocate the sites Multi-DBs (or bottom-up): no design issues! Notes 02

  3. Two issues in DDB design: • Fragmentation • Allocation Note: issues not independent, but will cover separately Notes 02

  4. Example Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select * from E from E where loc=Sa where loc=Sb and… and ... Notes 02

  5. Motivation: Two sites: Sa, Sb Qa  Qb Sa Sb Example Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select * from E from E where loc=Sa where loc=Sb and… and ... Notes 02

  6. It does not take a rocket scientist to figure out fragmentation... Notes 02

  7. 5 Joe Sa 10 # NM Loc Sal E 7 Sally Sb 25 8 Tom Sa 15 .. .. F # NM Loc Sal # NM Loc Sal 5 Joe Sa 10 7 Sally Sb 25 .. 8 Tom Sa 15 .. At Sb At Sa Notes 02

  8. F = { F1, F2 } F1 =  loc=SaE F2 =  loc=SbE Notes 02

  9. F = { F1, F2 } F1 =  loc=SaE F2 =  loc=SbE  called primary horizontal fragmentation Notes 02

  10. Fragmentation • Horizontal Primary depends on local attributes R Derived depends on foreign relation • Vertical R Notes 02

  11. Fragmentation • Horizontal Primary depends on local attributes R Derived depends on foreign relation • Vertical R Fragmentation also called Sharding Notes 02

  12. Three common horizontal partitioning techniques • Round robin • Hash partitioning • Range partitioning Notes 02

  13. Round robin R D0 D1 D2 t1 t1 t2 t2 t3 t3 t4 t4 ... t5 • Evenly distributes data • Good for scanning full relation • Not good for point or range queries Notes 02

  14. Hash partitioning R D0 D1 D2 t1h(k1)=2 t1 t2h(k2)=0 t2 t3h(k3)=0 t3 t4h(k4)=1 t4 ... • Good for point queries on key; also for joins • Not good for range queries; point queries not on key • If hash function good, even distribution Notes 02

  15. Range partitioning R D0 D1 D2 t1: A=5 t1 t2: A=8 t2 t3: A=2 t3 t4: A=3 t4 ... • Good for some range queries on A • Need to select good vector: else unbalance  data skew  execution skew partitioning vector 4 7 V0 V1 Notes 02

  16. Which are good fragmentations? Example: F = { F1, F2 } F1 = sal<10 E F2 = sal>20 E Notes 02

  17. Which are good fragmentations? Example: F = { F1, F2 } F1 = sal<10 E F2 = sal>20 E  Problem: Some tuples lost! Notes 02

  18. Which are good fragmentations? Second example: F = { F3, F4 } F3 = sal<10 E F4 = sal>5 E Notes 02

  19. Which are good fragmentations? Second example: F = { F3, F4 } F3 = sal<10 E F4 = sal>5 E  Tuples with 5 < sal < 10 are duplicated... Notes 02

  20. Prefer to deal with replication explicitly Example: F = { F5, F6, F7 } F5 = sal  5 E F6 = 5< sal <10 EF7 = sal  10 E  Then replicate F6 if convenient (part of allocation problem) Notes 02

  21. Desired properties for horizontal fragmentation R F ={ F1, F2, … } (1) Completeness t  R,  Fi  Fsuch that t Fi Notes 02

  22. (2) Disjointness t  Fi,  Fj such that tFj, i  j, Fi, FjF (3) Reconstruction - ignore Notes 02

  23. How do we get completeness and disjointness? (1) Check it “manually”! e.g., F1 =  sal<10 E ; F2 =  sal10 E Notes 02

  24. How do we get completeness and disjointness? (2) “Automatically” generate fragments with these properties Desired simple predicates  Fragments Notes 02

  25. Example of generation • Say queries use predicates: A<10, A>5, Loc = SA, Loc = SB • Next: - generate “minterm” predicates - eliminate useless ones Notes 02

  26. Minterm predicates (part I) (1) A<10  A>5  Loc=SA Loc=SB (2) A<10  A>5  Loc=SA ¬(Loc=SB) (3) A<10  A>5  ¬(Loc=SA)  Loc=SB (4) A<10  A>5  ¬(Loc=SA)  ¬(Loc=SB) (5) A<10  ¬(A>5)  Loc=SA Loc=SB (6) A<10  ¬(A>5)  Loc=SA ¬(Loc=SB) (7) A<10  ¬(A>5)  ¬(Loc=SA)  Loc=SB (8) A<10  ¬(A>5)  ¬(Loc=SA)  ¬(Loc=SB) Notes 02

  27. Minterm predicates (part I) (1) A<10  A>5  Loc=SA Loc=SB (2) A<10  A>5  Loc=SA ¬(Loc=SB) (3) A<10  A>5  ¬(Loc=SA)  Loc=SB (4) A<10  A>5  ¬(Loc=SA)  ¬(Loc=SB) (5) A<10  ¬(A>5)  Loc=SA Loc=SB (6) A<10  ¬(A>5)  Loc=SA ¬(Loc=SB) (7) A<10  ¬(A>5)  ¬(Loc=SA)  Loc=SB (8) A<10  ¬(A>5)  ¬(Loc=SA)  ¬(Loc=SB) Notes 02

  28. 5 < A < 10 A  5 Minterm predicates (part I) (1) A<10  A>5  Loc=SA Loc=SB (2) A<10  A>5  Loc=SA ¬(Loc=SB) (3) A<10  A>5  ¬(Loc=SA)  Loc=SB (4) A<10  A>5  ¬(Loc=SA)  ¬(Loc=SB) (5) A<10  ¬(A>5)  Loc=SA Loc=SB (6) A<10  ¬(A>5)  Loc=SA ¬(Loc=SB) (7) A<10  ¬(A>5)  ¬(Loc=SA)  Loc=SB (8) A<10  ¬(A>5)  ¬(Loc=SA)  ¬(Loc=SB) Notes 02

  29. Minterm predicates (part II) (9) ¬(A<10)  A>5  Loc=SA Loc=SB (10) ¬(A<10)  A>5  Loc=SA¬(Loc=SB) (11) ¬(A<10)  A>5 ¬(Loc=SA)  Loc=SB (12) ¬(A<10)  A>5 ¬(Loc=SA) ¬(Loc=SB) (13) ¬(A<10) ¬(A>5)  Loc=SA Loc=SB (14) ¬(A<10) ¬(A>5)  Loc=SA¬(Loc=SB) (15) ¬(A<10) ¬(A>5) ¬(Loc=SA)  Loc=SB (16) ¬(A<10) ¬(A>5) ¬(Loc=SA) ¬(Loc=SB) Notes 02

  30. A  10 Minterm predicates (part II) (9) ¬(A<10)  A>5  Loc=SA Loc=SB (10) ¬(A<10)  A>5  Loc=SA¬(Loc=SB) (11) ¬(A<10)  A>5 ¬(Loc=SA)  Loc=SB (12) ¬(A<10)  A>5 ¬(Loc=SA) ¬(Loc=SB) (13) ¬(A<10) ¬(A>5)  Loc=SA Loc=SB (14) ¬(A<10) ¬(A>5)  Loc=SA¬(Loc=SB) (15) ¬(A<10) ¬(A>5) ¬(Loc=SA)  Loc=SB (16) ¬(A<10) ¬(A>5) ¬(Loc=SA) ¬(Loc=SB) Notes 02

  31. Final fragments: F2: 5 < A < 10 Loc=SA F3: 5 < A < 10  Loc=SB F6: A  5  Loc=SA F7: A  5 Loc=SB F10: A  10  Loc=SA F11: A  10 Loc=SB Notes 02

  32. Note: elimination of useless fragments depends on application semantics: e.g.: if LOC could be  SA,  SB, we need to add fragments F4: 5 <A <10 Loc  SA Loc  SB F8: A  5 Loc  SA Loc  SB F12: A  10  Loc  SA Loc  SB Notes 02

  33. Why does this work? Predicates: p1  p2  p3  p4 p1  p2  p3  ¬p4 ¬ p1  ¬p2  ¬p3  ¬p4 ... Notes 02

  34. (1) Completeness: Take t  R pi(t) must be T or F! Say p1(t) =T p2(t) = T p3(t) =F p4(t) =F Then t is in fragment with predicate p1  p2  ¬p3  ¬p4 Notes 02

  35. (2) Disjointness Say t  Fragment p1 p2  ¬p3  ¬p4 Then: p1(t) = T, p2(t) = T, p3(t) = F, p4(t)= F  t cannot be in any other fragment! Notes 02

  36. Summary • Given simple predicates Pr= { p1, p2,.. pm } minterm predicates are M={m | m = pk*, 1  k m } where pk* is pk or is ¬ pk pkPr • Fragments m R for all m  M are • complete and disjoint Notes 02

  37. Another Desired Fragmentation Property:Match Access Patterns data A try to place in same fragment frequently accessed together data B data C Notes 02

  38. Return to example: E(#, NM, LOC, SAL,…) Common queries: Qa: select * Qb: select * from E from E where LOC=Sa where LOC=Sb and … and ... Notes 02

  39. Three choices: (1) Pr = { } F1 ={ E } (2) Pr = {LOC=Sa, LOC=Sb} F2={  loc=Sa E,  loc=Sb E } (3) Pr = {LOC=Sa, LOC=Sb, Sal<10} F3={ loc=Sa  sal<10 E, loc=Sa  sal10 E, loc=Sb  sal<10E, loc=Sb  sal10 E } Notes 02

  40. In other words: Qa: Select … loc = Sa ... Loc=Sa  sal < 10 Qb: Select … loc = Sb ... Loc=Sa  sal  10 F3 F1 F2 Loc=Sb  sal < 10 Loc=Sb  sal  10 Notes 02

  41. F2 is good… (not F1 , F3 ) In other words: Qa: Select … loc = Sa ... Loc=Sa  sal < 10 Qb: Select … loc = Sb ... Loc=Sa  sal  10 F3 F1 F2 Loc=Sb  sal < 10 Loc=Sb  sal  10 Notes 02

  42. Derived horizontal fragmentation Example: E(#, NM, SAL, LOC)F={ E1, E2} by LOC J(#, DES,…) Common query for project: [Given employee name, list projects (s)he works in] Notes 02

  43. E2 E1 (at Sa) (at Sb) J Notes 02

  44. J1 = J E1 J2 = J E2 E2 E1 (at Sa) (at Sb) J2 J1 Notes 02

  45. Derived horizontal fragmentation F could be primary or derived R, F = { F1, F2, ... Fn}  S, D = {D1, D2, …Dn} where Di =S Fi Convention: R is owner S is member Notes 02

  46. Checking completeness and disjointness of derived fragmentation Example: Say J is:  But no #= 33in E1 nor in E2! This J tuple will not be in J1 nor J2 Fragmentation not complete Notes 02

  47. To get completeness Need to enforce referential integrity constraint: join attr(#) of member relation  joint attr(#) of owner relation Notes 02

  48. Example: E2 E1 Fragmentation is not disjoint! J J1 J2 Notes 02

  49. To get disjointness Join attribute(#) should be key of owner relation Notes 02

  50. Summary: horizontal fragmentation • Type: primary, derived • Properties: completeness, disjointness Notes 02

More Related