## Locality Sensitive Distributed Computing


**Locality Sensitive Distributed Computing**
David Peleg, Weizmann Institute

**Structure of mini-course**
• Basics of distributed network algorithms
• Locality-preserving network representations
• Constructions and applications

**Part 1: Basic distributed algorithms**
• Model
• Broadcast
• Tree constructions
• Synchronizers
• Coloring, MIS

**The distributed network model**
A point-to-point communication network, described by an undirected weighted graph G(V,E,w):
• V = {v1,…,vn}: the processors (network sites)
• E: bidirectional communication links
• w: E → R+: an edge weight function representing transmission costs (usually satisfying the triangle inequality)
Unique processor IDs: ID : V → S, where S = {s1,s2,…} is an ordered set of integers.

**Communication**
Processor v has deg(v,G) ports (external connection points).
Edge e represents a pair ((u,i),(v,j)): the link connecting u's port i to v's port j.
Message transmission from u to neighbor v:
• u loads M onto port i
• v receives M in the input buffer of port j

**Assumption:**
At most one message can occupy a communication link at any given time (the link is available for the next transmission only after the previous message has been removed from the input buffer by the receiving processor).
Allowable message size = O(log n) bits (messages carry a fixed number of vertex IDs, e.g., sender and destination).

**Issues unique to distributed computing**
There are several inherent differences between the distributed and the traditional centralized-sequential computational models.

**Communication**
• In the centralized setting: the issue is nonexistent.
• In the distributed setting: communication has its limits (in speed and capacity) and does not come "for free"; it should be treated as a computational resource, like time or memory (often it is the dominating consideration).

**Communication as a scarce resource**
One common model: LOCAL. It assumes that local processing comes for free (the algorithm pays only for communication).

**Incomplete knowledge**
In
the centralized-sequential setting, the processor knows everything (inputs, intermediate results, etc.). In the distributed setting, processors have a very partial picture.

**Partial topological knowledge**
Model of anonymous networks: identical nodes, no IDs, no topology knowledge.
Intermediate models: estimates of the network diameter, the number of nodes, etc.; unique identifiers; neighbor knowledge.

**Partial topological knowledge (cont)**
Permissive models: topological knowledge of large regions, or even of the entire network.
Structured models: a known sub-structure, e.g., a spanning tree / subgraph / hierarchical partition / routing service is available.

**Other knowledge deficiencies**
• Processors know only the local portion of the input
• They do not know who else participates
• They do not know the current stage of other participants

**Coping with failures**
In the centralized setting: straightforward. Upon abnormal termination or a system crash: locate the source of the failure, fix it, and go on.
In the distributed setting: a complication. When one component fails, the others continue.
Ambitious goal: ensure the protocol runs correctly despite occasional failures at some machines (including "confusion-causing failures", e.g., failed processors sending corrupted messages).

**Timing and synchrony**
Fully synchronous network:
• All link delays are bounded
• Each processor keeps a local clock
• Local pulses satisfy the following property: a message sent from v to neighbor u at pulse p of v arrives at u before its pulse p+1
Think of the entire system as driven by a global clock.
Machine cycle of processors, composed of 3 steps:
• Send msgs to (some) neighbors
• Wait to receive msgs from neighbors
• Perform some local computation

**Asynchronous model**
• Algorithms are event-driven
• No access to a global clock
• Messages sent from a processor to a neighbor arrive within a finite but unpredictable time
A clock can't tell whether a message is coming or not: perhaps "the message is still on its way".
It is impossible to rely on the ordering of events (it might reverse due to different
message transmission speeds).

**Nondeterminism**
Asynchronous computations are inherently nondeterministic (even when protocols do not use randomization).
Reason: the message arrival order may differ from one execution to another (e.g., due to other events occurring concurrently in the system: queues, failures). Running the same algorithm twice on the same inputs may yield different outputs / "scenarios".

**Complexity measures**
• Traditional (time, memory)
• New (messages, communication)

**Time**
For a synchronous algorithm P:
Time(P) = (worst-case) # pulses during the execution.
For an asynchronous algorithm P? (Even a single message can incur an arbitrary delay!)
For an asynchronous algorithm P:
Time(P) = (worst-case) # time units from the start to the end of the execution, assuming each message incurs a delay ≤ 1 time unit (*).

**Note:**
• Assumption (*) is used only for performance evaluation, not for correctness.
• (*) does not restrict the set of possible scenarios: any execution can be "normalized" to fit this constraint.
• "Worst-case" means over all possible inputs and all possible scenarios for each input.

**Memory**
Mem(P) = (worst-case) # memory bits used throughout the network.
MaxMem(P) = the maximum local memory.

**Message complexity**
A basic message = O(log n) bits; longer messages cost proportionally to their length.
Sending a basic message over an edge costs 1.
Message(P) = (worst-case) # basic messages sent during the execution.

**Distance definitions**
Length of a path (e1,…,es) = s.
dist(u,w,G) = the length of a shortest u-w path in G.
Diameter: Diam(G) = max{dist(u,v,G) : u,v ∈ V}.

**Distance definitions (cont)**
Radius: Rad(v,G) = max{dist(v,w,G) : w ∈ V}, and Rad(G) = min{Rad(v,G) : v ∈ V}.
A center of G: a vertex v such that
Rad(v,G) = Rad(G).
Observe: Rad(G) ≤ Diam(G) ≤ 2·Rad(G).

**Broadcast**
Goal: disseminate a message M, originated at a source r0, to all vertices in the network.

**Basic lower bounds**
Thm: For every broadcast algorithm B:
• Message(B) ≥ n-1
• Time(B) ≥ Rad(r0,G) = Ω(Diam(G))

**Tree broadcast**
Algorithm Tcast(r0,T):
• Use a spanning tree T of G rooted at r0
• The root broadcasts M to all its children
• Each node v that gets M forwards it to its children
Assume: the spanning tree is known to all nodes (Q: what does that mean in a distributed context?)
Claim: For a spanning tree T rooted at r0:
• Message(Tcast) = n-1
• Time(Tcast) = Depth(T)

**Tcast on BFS tree**
A BFS (breadth-first search) tree = a shortest-paths tree: the level of each v in T is dist(r0,v,G).
Corollary: For a BFS tree T w.r.t. r0:
• Message(Tcast) = n-1
• Time(Tcast) ≤ Diam(G)
(Optimal in both.)
But what if there is no spanning tree?

**The flooding algorithm**
Algorithm Flood(r0):
• The source sends M on each outgoing link
• Every other vertex v, on receiving M for the first time over an edge e: stores it in a buffer and forwards it on every edge ≠ e
• On receiving M again (over other edges): discards it and does nothing

**Flooding - correctness**
Lemma: Algorithm
Flood yields a correct broadcast, with
• Time(Flood) = Θ(Rad(r0,G)) = Θ(Diam(G))
• Message(Flood) = Θ(|E|)
in both the synchronous and the asynchronous model.
Proof (message complexity): each edge delivers M at most once in each direction.

**Neighborhoods**
Γl(v) = the l-neighborhood of v = the set of vertices at distance l or less from v (figure: nested neighborhoods Γ0(v) ⊆ Γ1(v) ⊆ Γ2(v)).

**Time complexity**
Verify (by induction on t) that after t time units, M has already reached every vertex at distance ≤ t from r0 (= every vertex in the t-neighborhood Γt(r0)).
Note: In the asynchronous model, M may have reached additional vertices (messages may travel faster).
Note: Algorithm Flood implicitly constructs a directed spanning tree T rooted at r0, defined as follows: the parent of each v in T is the node from which v received M for the first time.
Lemma: In the synchronous model, T is a BFS tree w.r.t. r0, with depth Rad(r0,G).

**Flood time**
Note: In the asynchronous model, T may be deeper (of depth up to n-1).
Note: The time is still O(Diam(G)) even in this case!

**Broadcast with echo**
Goal: verify the successful completion of the broadcast.
Method: collect acknowledgements on a spanning tree T.
Converge(Ack) process (code for v), upon getting M:
• For v a leaf in T: send an Ack message up to the parent
• For v a non-leaf: collect Ack messages from all children, then send an Ack message to the parent

**Semantics of Ack from v**
A "joint ack" for the entire subtree Tv rooted at v, signifying that every vertex in Tv has received M. Hence r0 receives Acks from all its children only after all vertices have received M.
Claim: On a tree T,
• Message(Converge(Ack)) = O(n)
• Time(Converge(Ack)) = O(Depth(T))

**Tree selection**
Tree broadcast alg: take the same tree used for the broadcast; the time / message complexities grow by a constant factor.
Flooding alg: use the tree T defined by the broadcast.
Synchronous model: a BFS tree; the complexities double.
Asynchronous model: no guarantee.
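The distance definitions above can be sanity-checked with a short simulation. A minimal sketch in Python (not part of the original slides; function names are invented, and edges are treated as unit-weight, i.e., BFS distances): compute Rad(G) and Diam(G) by running BFS from every vertex, and check the observation Rad(G) ≤ Diam(G) ≤ 2·Rad(G).

```python
from collections import deque

def bfs_dist(adj, src):
    """Unweighted shortest-path distances from src; adj: vertex -> neighbor list."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def radius_and_diameter(adj):
    """Rad(G) and Diam(G) via the eccentricity Rad(v,G) of every vertex."""
    ecc = {v: max(bfs_dist(adj, v).values()) for v in adj}   # Rad(v,G)
    return min(ecc.values()), max(ecc.values())              # Rad(G), Diam(G)

# A path on 4 vertices: the two middle vertices are the centers
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rad, diam = radius_and_diameter(adj)
assert (rad, diam) == (2, 3)
assert rad <= diam <= 2 * rad   # Rad(G) <= Diam(G) <= 2 Rad(G)
```

Running BFS from every vertex costs O(n·|E|) time, which is fine for a sanity check but is a centralized computation, not a distributed one.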
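A round-based simulation of Flood(r0) makes the synchronous claims concrete: the parent pointers form a BFS tree, the level at which each vertex first receives M equals its distance from r0, and each edge carries M at most once per direction, so at most 2|E| basic messages are sent. This is an illustrative sketch, not the slides' formal model; the function and variable names are invented here.

```python
def flood(adj, r0):
    """Synchronous round-based simulation of Flood(r0).

    adj: dict mapping each vertex to its list of neighbors.
    Returns (parent, level, msgs): the implicit spanning tree, the round in
    which each vertex first received M, and the total messages sent.
    """
    parent, level = {r0: None}, {r0: 0}
    frontier, msgs, t = [r0], 0, 0
    while frontier:
        t += 1
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v == parent[u]:
                    continue           # don't send M back on the edge it arrived on
                msgs += 1              # u forwards M over edge (u,v)
                if v not in parent:    # v receives M for the first time:
                    parent[v] = u      # edge (u,v) joins the implicit tree, and
                    level[v] = t       # level(v) = dist(r0,v) in the synchronous model
                    nxt.append(v)
        frontier = nxt                 # vertices that just got M forward it next round
    return parent, level, msgs

# 4-cycle: every vertex is reached at its BFS distance from r0 = 0
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
parent, level, msgs = flood(adj, 0)
assert level == {0: 0, 1: 1, 3: 1, 2: 2}
assert msgs <= 2 * 4   # Message(Flood) = Theta(|E|): at most one M per direction per edge
```

In an asynchronous execution the parent pointers need not form a BFS tree (a longer path may deliver M first), which is exactly the caveat in the Flood time slide.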
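The broadcast-with-echo bounds can likewise be checked by recursing over the tree. A hedged sketch (names invented): each child edge carries M down once and one Ack up, giving 2(n-1) messages, and the echo completes after about 2·Depth(T) rounds in the synchronous view.

```python
def broadcast_with_echo(children, root):
    """Broadcast M down a rooted tree T and converge Acks back up.

    children: dict mapping each node to its list of children in T.
    Returns (msgs, time): 2(n-1) messages and 2*Depth(T) rounds, counting
    one round per tree edge traversed in each direction.
    """
    def go(v):
        if not children[v]:
            return 0, 0                       # a leaf Acks immediately upon getting M
        sub = [go(c) for c in children[v]]
        msgs = sum(m for m, _ in sub) + 2 * len(children[v])  # M down + Ack up per child edge
        time = 2 + max(t for _, t in sub)     # forward M, await the deepest subtree, receive Ack
        return msgs, time
    return go(root)

# A path r -> a -> b (depth 2): 2(n-1) = 4 messages, 2*Depth(T) = 4 rounds
children = {"r": ["a"], "a": ["b"], "b": []}
msgs, time = broadcast_with_echo(children, "r")
assert (msgs, time) == (4, 4)
```

This matches the claim on the slides: the Ack from the root's last child arrives only after every vertex of T has received M, at cost O(n) messages and O(Depth(T)) time on top of the broadcast itself.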
