
Locality Sensitive Distributed Computing



  1. Locality Sensitive Distributed Computing David Peleg, Weizmann Institute

  2. Structure of mini-course • Basics of distributed network algorithms • Locality-preserving network representations • Constructions and applications

  3. Part 1: Basic distributed algorithms • Model • Broadcast • Tree constructions • Synchronizers • Coloring, MIS

  4. The distributed network model Point-to-point communication network

  5. The distributed network model Described by undirected weighted graph G(V,E,w): V = {v1,…,vn} - Processors (network sites); E - bidirectional communication links

  6. The distributed network model w: E → R+ edge weight function representing transmission costs (usually satisfies the triangle inequality) Unique processor ID's: ID: V → S, S = {s1,s2,…} ordered set of integers

  7. Communication Processor v has deg(v,G) ports (external connection points) Edge e represents pair ((u,i),(v,j)) = link connecting u's port i to v's port j
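
To make the model concrete, here is a minimal sketch (in Python; the class and method names are illustrative, not from the course) of a point-to-point network represented as an undirected weighted graph with per-node port numbering:

```python
# Minimal sketch of the network model: an undirected weighted graph with
# unique node IDs and per-node port numbering. Names (Network, add_link, ...)
# are illustrative, not from the course.
class Network:
    def __init__(self):
        self.ports = {}    # node ID -> list of (neighbor ID, neighbor's port)
        self.weight = {}   # frozenset({u, v}) -> transmission cost w(e)

    def add_node(self, v):
        self.ports.setdefault(v, [])

    def add_link(self, u, v, w=1.0):
        """Connect u's next free port to v's next free port with weight w."""
        self.add_node(u)
        self.add_node(v)
        i, j = len(self.ports[u]), len(self.ports[v])
        self.ports[u].append((v, j))   # edge e = ((u,i),(v,j))
        self.ports[v].append((u, i))
        self.weight[frozenset((u, v))] = w

    def deg(self, v):
        return len(self.ports[v])

# Example: a 4-cycle; each node ends up with deg = 2 ports
net = Network()
for a, b in [(1, 2), (2, 3), (3, 4), (4, 1)]:
    net.add_link(a, b)
print(net.deg(1))  # 2
```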

  8. Communication • Message transmission from u to neighbor v: • u loads M onto port i • v receives M in input buffer of port j

  9. Assumption: At most one message can occupy a communication link at any given time (Link is available for next transmission only after previous message is removed from input buffer by receiving processor) Communication Allowable message size = O(log n) bits (messages carry a fixed number of vertex ID's, e.g., sender and destination)

  10. Issues unique to distributed computing There are several inherent differences between the distributed and the traditional centralized-sequential computational models

  11. Communication • In centralized setting: Issue nonexistent • In distributed setting: Communication • has its limits (in speed and capacity) • does not come “for free” ⇒ should be treated as a computational resource, such as time or memory (often the dominating consideration)

  12. Communication as a scarce resource One common model: LOCAL Assumes local processing comes for free (Algorithm pays only for communication)

  13. Incomplete knowledge In centralized-sequential setting: Processor knows everything (inputs, intermediate results, etc.) In distributed setting: Processors have very partial picture

  14. Partial topological knowledge Model of anonymous networks: identical nodes, no ID's, no topology knowledge Intermediate models: estimates for network diameter, # of nodes, etc.; unique identifiers; neighbor knowledge

  15. Partial topological knowledge (cont) Permissive models: Topological knowledge of large regions, or even entire network Structured models: Known sub-structure, e.g., spanning tree / subgraph / hierarchical partition / routing service available

  16. Other knowledge deficiencies • know only local portion of the input • do not know who else participates • do not know current stage of other participants

  17. Coping with failures In centralized setting: Straightforward - Upon abnormal termination or system crash: Locate source of failure, fix it and go on. In distributed setting: Complication - When one component fails, others continue Ambitious goal: ensure protocol runs correctly despite occasional failures at some machines (including “confusion-causing failures”, e.g., failed processors sending corrupted messages)

  18. Timing and synchrony Fully synchronous network: • All link delays are bounded • Each processor keeps local clock • Local pulses satisfy the following property: Message sent from v to neighbor u at pulse p of v arrives at u before u's pulse p+1 Think of entire system as driven by a global clock

  19. Timing and synchrony Machine cycle of processors - composed of 3 steps: • Send msgs to (some) neighbors • Wait to receive msgs from neighbors • Perform some local computation
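
A hedged sketch of this machine cycle as a synchronous round simulator (the function names and the simulator structure are assumptions for illustration; the course defines the model, not an implementation):

```python
# Sketch of a synchronous round simulator: in each pulse every processor
# first emits messages, then all messages sent at pulse p are delivered
# before pulse p+1, then every processor runs its local computation.
def run_synchronous(nodes, send, compute, num_pulses):
    """
    nodes:   iterable of node IDs
    send:    send(v, pulse) -> list of (destination, message)
    compute: compute(v, pulse, received) -> None (updates local state)
    """
    for pulse in range(num_pulses):
        inbox = {v: [] for v in nodes}
        # Step 1: every node sends messages to (some) neighbors
        for v in nodes:
            for dest, msg in send(v, pulse):
                inbox[dest].append((v, msg))
        # Steps 2-3: receive, then perform local computation
        for v in nodes:
            compute(v, pulse, inbox[v])

# Trivial usage: every node reports how many messages it got each pulse.
nodes = [1, 2, 3]
neighbors = {1: [2], 2: [1, 3], 3: [2]}
run_synchronous(
    nodes,
    send=lambda v, p: [(u, "hello") for u in neighbors[v]],
    compute=lambda v, p, msgs: print(f"pulse {p}: node {v} got {len(msgs)} msgs"),
    num_pulses=2,
)
```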

  20. Asynchronous model • Algorithms are event-driven : • No access to global clock • Messages sent from processor to neighbor arrive within finite but unpredictable time

  21. Asynchronous model ⇒ Clock can't tell if message is coming or not: perhaps “the message is still on its way” Impossible to rely on ordering of events (might reverse due to different message transmission speeds)

  22. Nondeterminism Asynchronous computations are inherently nondeterministic (even when protocols do not use randomization)

  23. Nondeterminism Reason: Message arrival order may differ from one execution to another (e.g., due to other events concurrently occurring in the system – queues, failures) ⇒ Run same algorithm twice on same inputs - get different outputs / “scenarios”

  24. Nondeterminism

  25. Complexity measures • Traditional (time, memory) • New (messages, communication)

  26. Time For synchronous algorithm P: Time(P) = (worst case) # pulses during execution For asynchronous algorithm P ? (Even a single message can incur arbitrary delay ! )

  27. Time For asynchronous algorithm P: Time(P) = (worst-case) # time units from start to end of execution, assuming each message incurs delay ≤ 1 time unit (*)

  28. Note: • Assumption (*) is used only for performance evaluation, not for correctness. • (*) does not restrict set of possible scenarios – any execution can be “normalized” to fit this constraint • “Worst-case” means all possible inputs and all possible scenarios over each input Time

  29. Memory Mem(P) = (worst-case) # memory bits used throughout the network MaxMem(P) = maximum local memory

  30. Message complexity Basic message = O(log n) bits Longer messages cost proportionally to length Sending basic message over edge costs 1 Message(P) = (worst case) # basic messages sent during execution

  31. Distance definitions Length of path (e1,...,es) = s dist(u,w,G) = length of shortest u - w path in G Diameter: Diam(G) = max{dist(u,v,G) : u,v ∈ V}

  32. Distance definitions (cont) Radius: Rad(v,G) = max{dist(v,w,G) : w ∈ V} Rad(G) = min{Rad(v,G) : v ∈ V} A center of G: vertex v s.t. Rad(v,G) = Rad(G) Observe: Rad(G) ≤ Diam(G) ≤ 2·Rad(G)
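
These definitions are easy to check on small examples; a brief sketch (Python, with an adjacency-dict graph representation assumed for illustration) computing dist, Rad and Diam by BFS, illustrating Rad(G) ≤ Diam(G) ≤ 2·Rad(G):

```python
from collections import deque

# Sketch: distances, radius and diameter of an unweighted graph given as an
# adjacency dict {v: [neighbors]}.
def dist_from(v, adj):
    d = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in d:
                d[w] = d[u] + 1
                q.append(w)
    return d  # d[w] = dist(v, w, G)

def rad_and_diam(adj):
    ecc = {v: max(dist_from(v, adj).values()) for v in adj}  # ecc[v] = Rad(v, G)
    rad = min(ecc.values())
    diam = max(ecc.values())
    assert rad <= diam <= 2 * rad
    return rad, diam

# Example: path on 4 vertices -> Rad = 2, Diam = 3
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(rad_and_diam(adj))  # (2, 3)
```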

  33. Broadcast Goal: Disseminate message M originated at source r0 to all vertices in the network

  34. Basic lower bounds • Thm: • For every broadcast algorithm B: • Message(B) ≥ n-1, • Time(B) ≥ Rad(r0,G) = Ω(Diam(G))

  35. Tree broadcast • Algorithm Tcast(r0,T) • Use spanning tree T of G rooted at r0 • Root broadcasts M to all its children • Each node v getting M, forwards it to children
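
A minimal sketch of Tcast as a centralized simulation (the children-dict representation of T and the message/pulse counters are assumptions for illustration), showing that it uses exactly n-1 messages and Depth(T) pulses:

```python
# Sketch of Algorithm Tcast(r0, T): the root sends M to its children and each
# node that receives M forwards it to its own children. T is given as a
# children dict {v: [children of v]}.
def tcast(r0, children, M):
    messages = 0
    delivered = {r0: M}
    frontier = [r0]      # nodes that received M at the current pulse
    pulse = 0
    while frontier:
        nxt = []
        for v in frontier:
            for c in children.get(v, []):
                delivered[c] = M   # v forwards M to child c
                messages += 1
                nxt.append(c)
        if nxt:
            pulse += 1
        frontier = nxt
    return delivered, messages, pulse

# Example: star rooted at 0 -> 3 messages (n-1), depth 1
delivered, msgs, time = tcast(0, {0: [1, 2, 3]}, "M")
print(msgs, time)  # 3 1
```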

  36. Tree broadcast (cont) Assume: Spanning tree known to all nodes (Q: what does it mean in distributed context?)

  37. Tree broadcast (cont) • Claim: For spanning tree T rooted at r0: • Message(Tcast) = n-1 • Time(Tcast) = Depth(T)

  38. Tcast on BFS tree BFS (Breadth-First Search) tree = Shortest-paths tree: The level of each v in T is dist(r0,v,G)

  39. Tcast (cont) Corollary: • For BFS tree T w.r.t. r0: • Message(Tcast) = n-1 • Time(Tcast) ≤ Diam(G) • (Optimal in both) But what if there is no spanning tree?

  40. The flooding algorithm • Algorithm Flood(r0) • Source sends M on each outgoing link • For other vertex v: • On receiving M first time over edge e: • store in buffer; forward on every edge ≠ e • On receiving M again (over other edges): • discard it and do nothing
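
A sketch of Flood(r0) simulated pulse by pulse in the synchronous model (Python, adjacency-dict representation assumed); recording the neighbor from which M first arrived also yields the tree discussed on the following slides:

```python
# Sketch of Algorithm Flood(r0), synchronous simulation: every node forwards
# M the first time it receives it, on every edge except the one it arrived on,
# and discards later copies. parent[v] = node from which v first received M.
def flood(r0, adj, M):
    parent = {r0: None}
    messages = 0
    frontier = [r0]
    pulse = 0
    while frontier:
        nxt = []
        for v in frontier:
            for u in adj[v]:
                if u == parent[v]:
                    continue           # don't send back over the arrival edge
                messages += 1
                if u not in parent:    # first receipt: store, forward next pulse
                    parent[u] = v
                    nxt.append(u)
                # otherwise u discards the duplicate copy and does nothing
        if nxt:
            pulse += 1
        frontier = nxt
    return parent, messages, pulse

# Example: 4-cycle flooded from node 1 -> 5 messages, 2 pulses
adj = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
print(flood(1, adj, "M"))
```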

  41. Flooding - correctness • Lemma: • Alg. Flood yields correct broadcast • Time(Flood) = Θ(Rad(r0,G)) = Θ(Diam(G)) • Message(Flood) = Θ(|E|) • in both synchronous and asynchronous model • Proof: • Message complexity: Each edge delivers M at most once in each direction

  42. Neighborhoods Γl(v) = l-neighborhood of v = vertices at distance l or less from v (figure: Γ0(v) ⊆ Γ1(v) ⊆ Γ2(v))
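
A small sketch computing Γl(v) by a BFS cut off at depth l (same adjacency-dict convention as in the earlier sketches, assumed for illustration):

```python
from collections import deque

# Sketch: the l-neighborhood of v = all vertices at distance <= l from v,
# computed by a BFS that stops exploring past depth l.
def neighborhood(v, l, adj):
    d = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        if d[u] == l:
            continue                 # don't explore past distance l
        for w in adj[u]:
            if w not in d:
                d[w] = d[u] + 1
                q.append(w)
    return set(d)

# Example: path 1-2-3-4, 1-neighborhood of node 2
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(neighborhood(2, 1, adj))  # {1, 2, 3}
```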

  43. Time complexity Verify (by induction on t) that: After t time units, M has already reached every vertex at distance ≤ t from r0 (= every vertex in the t-neighborhood Γt(r0)) Note: In asynchronous model, M may have reached additional vertices (messages may travel faster)

  44. Time complexity • Note: Algorithm Flood implicitly constructs a • directed spanning tree T rooted at r0, • defined as follows: • The parent of each v in T • is the node from which v received M • for the first time Lemma: In the synchronous model, T is a BFS tree w.r.t. r0, with depth Rad(r0,G)

  45. Flood time Note: In the asynchronous model, T may be deeper (up to n-1) Note: Time is still O(Diam(G)) even in this case!

  46. Broadcast with echo Goal: Verify successful completion of broadcast Method: Collect acknowledgements on a spanning tree T

  47. Broadcast with echo • Converge(Ack) process - code for v • Upon getting M do: • For v leaf in T: • - Send up an Ack message to parent • For v non-leaf: • - Collect Ack messages from all children • - Send Ack message to parent
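
A hedged sketch of the Converge(Ack) process as a recursive simulation on the rooted tree (children-dict representation assumed for illustration), returning the number of Ack messages and a pulse count proportional to Depth(T):

```python
# Sketch of Converge(Ack) on a rooted spanning tree T given as a children dict:
# a leaf acks its parent immediately after getting M; an internal node acks its
# parent only after collecting Acks from all of its children.
def converge_ack(v, children, is_root=False):
    kids = children.get(v, [])
    if not kids:
        # Leaf: sends one Ack to its parent (none if it is the whole tree)
        return (0 if is_root else 1), 1
    msgs, depth = 0, 0
    for c in kids:
        m, d = converge_ack(c, children)
        msgs += m
        depth = max(depth, d)          # wait for the slowest child subtree
    # v sends its own Ack upward one pulse after the last child's Ack arrives
    return msgs + (0 if is_root else 1), depth + 1

# Example: star rooted at 0 with leaves 1, 2, 3 -> n-1 = 3 Acks, 2 pulses
print(converge_ack(0, {0: [1, 2, 3]}, is_root=True))  # (3, 2)
```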

  48. Collecting Ack’s

  49. Semantics of Ack from v “Joint ack” for entire subtree Tv rooted at v, signifying that each vertex in Tv received M ⇒ r0 receives Ack from all its children only after all vertices received M • Claim: On tree T, • Message(Converge(Ack)) = O(n) • Time(Converge(Ack)) = O(Depth(T))

  50. Tree selection Tree broadcast alg: Take same tree used for broadcast. Time / message complexities grow by const factor. Flooding alg: Use tree T defined by broadcast. Synch. model: BFS tree - complexities double. Asynch. model: no guarantee.
