Locality Sensitive Distributed Computing Exercise Set 2
David Peleg, Weizmann Institute

Presentation Transcript

  1. Locality Sensitive Distributed Computing Exercise Set 2 David Peleg, Weizmann Institute

  2. Basic partition construction algorithm Simple distributed implementation for Algorithm BasicPart Single “thread” of computation (single locus of activity at any given moment)

  3. Basic partition construction algorithm Components: • ClusterCons: procedure for constructing a cluster around a chosen center v • NextCtr: procedure for selecting the next center v around which to grow a cluster • RepEdge: procedure for selecting a representative inter-cluster edge between any two adjacent clusters

  4. Cluster construction procedure ClusterCons Goal: Invoked at center v, construct cluster and BFS tree (rooted at v) spanning it Tool: Variant of Dijkstra's algorithm.

  5. Recall: Dijkstra’s BFS algorithm phase p+1:

  6. Main changes to Algorithm DistDijk 1. Ignoring covered vertices: the global BFS algorithm sends exploration msgs to all neighbors save those known to be in the tree; the new variant also ignores vertices known to belong to previously constructed clusters. 2. Bounding depth: the BFS tree is grown to limited depth, adding new layers tentatively, based on the halting condition |Γ(S)| < |S|·n^{1/k}
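
  As a rough illustration of this growth rule, here is a minimal sequential sketch in Python (the function name grow_cluster and the adjacency-dict graph representation are illustrative assumptions, not part of the slides' distributed implementation):

      # Sequential sketch of the layer-growing rule: grow a BFS ball around the
      # chosen center, layer by layer, skipping already-covered vertices, and stop
      # as soon as adding the next layer would violate |Gamma(S)| >= |S| * n^(1/k).
      def grow_cluster(graph, center, covered, n, k):
          """graph: dict vertex -> set of neighbors; covered: vertices already
          assigned to previously constructed clusters.
          Returns (cluster, rejected_layer)."""
          cluster = {center}
          frontier = {center}
          while True:
              layer = set()
              for v in frontier:
                  layer |= graph[v]
              layer -= cluster | covered          # ignore covered vertices
              if len(cluster) + len(layer) < len(cluster) * n ** (1.0 / k):
                  return cluster, layer           # halting condition: reject layer
              cluster |= layer                    # confirm layer, next phase
              frontier = layer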

  7. Distributed Implementation • Before deciding to expand tree T by adding a newly discovered layer L: count the # of vertices in L by a convergecast process: • Leaf w ∈ T: set Z_w = # of new children in L • Internal vertex: add and upcast the counts.

  8. Distributed Implementation • Root: compare the final count Z_v to the total # of vertices in T (known from the previous phase). • If the ratio is ≥ n^{1/k}, then broadcast the next Pulse msg (confirm the new layer and start the next phase) • Otherwise, broadcast the message Reject (reject the new layer, complete the current cluster) • The final broadcast step has 2 more goals: mark the cluster by a unique name (e.g., the ID of the root), and inform all vertices of the new cluster name
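
  A sketch of this per-phase decision, assuming the BFS tree is stored as a children dictionary and that new_children maps each tree vertex to its tentatively discovered children (both names are illustrative):

      # Convergecast of slide 7: each leaf reports how many new children it
      # discovered; internal vertices add their own count to their subtree's.
      def count_new_layer(children, new_children, v):
          z = len(new_children.get(v, ()))
          for c in children.get(v, ()):
              z += count_new_layer(children, new_children, c)
          return z

      # Root decision of slide 8: accept the layer iff the enlarged cluster still
      # grows by a factor of at least n^(1/k); this is equivalent to the halting
      # condition |Gamma(S)| < |S| * n^(1/k) of slide 6.
      def root_decision(tree_size, layer_count, n, k):
          if tree_size + layer_count >= tree_size * n ** (1.0 / k):
              return "Pulse"    # confirm new layer, start next phase
          return "Reject"       # reject new layer, complete current cluster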

  9. Distributed Implementation (cont) This information is used to define cluster borders, i.e., once the cluster is complete, each vertex in it informs all neighbors of its new residence. ⇒ Nodes of the cluster under construction know which neighbors already belong to existing clusters.

  10. Center selection procedure NextCtr Fact: the algorithm's “center of activity” is always located at the currently constructed cluster C. Idea: select as center for the next cluster some vertex v adjacent to C (= a vertex v from the rejected layer). Implementation: via a convergecast process (leaf: pick an arbitrary neighbor from the rejected layer and upcast it to the parent; internal node: upcast an arbitrary candidate).
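
  A minimal sketch of this convergecast, assuming each cluster vertex already knows which of its neighbors lie in the rejected layer (the dictionary names are hypothetical):

      # NextCtr convergecast: leaves propose an arbitrary rejected-layer neighbor,
      # internal vertices upcast an arbitrary candidate (their own or a child's).
      def pick_next_center(children, rejected_neighbors, v):
          candidates = [pick_next_center(children, rejected_neighbors, c)
                        for c in children.get(v, ())]
          candidates = [c for c in candidates if c is not None]
          own = next(iter(rejected_neighbors.get(v, ())), None)
          if own is not None:
              candidates.append(own)
          return candidates[0] if candidates else None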

  11. Center selection procedure (NextCtr) Problem: what if the rejected layer is empty? (It might still be that the entire process is not yet complete: there may be some yet-unclustered nodes elsewhere in G.)

  12. Center selection procedure (NextCtr) Solution: traverse the graph (using the cluster construction procedure within a global search procedure).

  13. Distributed Implementation • Use a DFS algorithm for traversing the tree of constructed clusters. • Start at the originator vertex r0 and invoke ClusterCons to construct the first cluster. • Whenever the rejected layer is nonempty, choose one rejected vertex as the next cluster center. • Each cluster center marks a parent cluster in the cluster DFS tree, namely, the cluster from which it was selected.

  14. Distributed Implementation (cont) • DFS algorithm (cont): • Once the search cannot progress forward (the rejected layer is empty): the DFS backtracks to the previous cluster and looks for a new center among neighboring nodes. • If no such neighbors are available, the DFS process continues backtracking on the cluster DFS tree.
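
  Putting the pieces together, here is a sequential sketch of the DFS over the cluster tree, reusing the grow_cluster sketch above (basic_part is an illustrative name; the real algorithm runs distributedly with a single locus of activity):

      # Global search of slides 13-14: grow a cluster, step forward to a rejected
      # vertex if one exists, otherwise backtrack along the cluster DFS path.
      # The cluster on top of the stack when a center is chosen is that center's
      # parent cluster in the cluster DFS tree.
      def basic_part(graph, r0, n, k):
          covered, clusters, stack = set(), [], []
          center = r0
          while center is not None:
              cluster, rejected = grow_cluster(graph, center, covered, n, k)
              covered |= cluster
              clusters.append(cluster)
              stack.append(cluster)                 # current DFS path of clusters
              center = next(iter(rejected), None)   # forward step
              while center is None and stack:
                  # Backtrack: look for an unclustered neighbor of the cluster on
                  # top of the DFS stack; pop that cluster if none remains.
                  frontier = {u for v in stack[-1] for u in graph[v]} - covered
                  center = next(iter(frontier), None)
                  if center is None:
                      stack.pop()
          return clusters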

  15. Inter-cluster edge selection RepEdge Goal: select one representative inter-cluster edge between every two adjacent clusters C and C'. E(C,C') = the edges connecting C and C' (known to the endpoints in C, as C vertices know the cluster-residence of each neighbor).

  16. Inter-cluster edge selection RepEdge ⇒ A representative edge can be selected by a convergecast process on all edges of E(C,C'). Requirement: C and C' must select the same edge. Solution: use a unique ordering of the edges and pick the minimum E(C,C') edge. Q: How can a unique edge order be defined from unique ID's?

  17. Inter-cluster edge selection (RepEdge) E.g., define the ID-weight of an edge e=(v,w), where ID(v) < ID(w), as the pair ⟨ID(v), ID(w)⟩, and order ID-weights lexicographically; this ensures distinct weights and allows consistent selection of inter-cluster edges
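
  A tiny sketch of this selection rule (both adjacent clusters evaluate it locally on their copies of E(C,C'); the function names are illustrative):

      # ID-weight of an edge: the pair <min ID, max ID>, compared lexicographically,
      # so the two clusters adjacent via E(C,C') agree on the same minimum edge.
      def id_weight(edge):
          v, w = edge
          return (min(v, w), max(v, w))

      def representative_edge(inter_cluster_edges):
          """inter_cluster_edges: iterable of (ID, ID) pairs forming E(C,C')."""
          return min(inter_cluster_edges, key=id_weight)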

  18. Inter-cluster edge selection (RepEdge) • Problem: cluster C must carry out the selection process for every adjacent cluster C' individually. • Solution: inform each C vertex of the identities of all clusters adjacent to C by convergecast + broadcast, and pipeline the individual selection processes.

  19. Analysis (C_1, C_2, ..., C_p) = the clusters constructed by the algorithm. For cluster C_i: E_i = the edges with at least one endpoint in C_i; n_i = |C_i|, m_i = |E_i|, r_i = Rad(C_i).

  20. Analysis (cont) ClusterCons: the depth-bounded Dijkstra procedure constructs C_i and its BFS tree in O(r_i^2) time and O(n_i·r_i + m_i) messages. ⇒ Time(ClusterCons) = ∑_i O(r_i^2) ≤ ∑_i O(r_i·k) ≤ k·∑_i O(n_i) = O(kn). Q: Prove an O(n) bound.

  21. Analysis (cont) C_i and its BFS tree cost O(r_i^2) time and O(n_i·r_i + m_i) messages. ⇒ Comm(ClusterCons) = ∑_i O(n_i·r_i + m_i). Each edge occurs in ≤ 2 distinct sets E_i, hence Comm(ClusterCons) = O(nk + |E|).

  22. Analysis (NextCtr) The DFS process on the cluster tree is more expensive than plain DFS: visiting cluster C_i and deciding the next step requires O(r_i) time and O(n_i) comm.

  23. Analysis (NextCtr) • The DFS visits the clusters in the cluster tree O(p) times. • The entire DFS process (not counting Procedure ClusterCons invocations) requires: • Time(NextCtr) = O(pk) = O(nk) • Comm(NextCtr) = O(pn) = O(n^2)

  24. Analysis (RepEdge) s_i = # of neighboring clusters surrounding C_i. Convergecasting the ID of a neighboring cluster C' in C_i costs O(r_i) time and O(n_i) messages. For all s_i neighboring clusters: O(s_i + r_i) time (pipelining) and O(s_i·n_i) messages.

  25. Analysis (RepEdge) The pipelined inter-cluster edge selection is similar. As s_i ≤ n, we get Time(RepEdge) = max_i O(s_i + r_i) = O(n) and Comm(RepEdge) = ∑_i O(s_i·n_i) = O(n^2).

  26. Analysis Thm: Distributed Algorithm BasicPart requires Time = O(nk) and Comm = O(n^2).

  27. Sparse spanners Example: the m-dimensional hypercube H_m=(V_m,E_m), V_m={0,1}^m, E_m = {(x,y) | x and y differ in exactly one bit}; |V_m|=2^m, |E_m|=m·2^{m-1}, diameter m. Ex: Prove that for every m ≥ 0, the m-cube has a 3-spanner with # of edges ≤ 7·2^m.
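
  The exercise itself is left open; the following small harness (a sketch with illustrative function names) merely builds H_m and checks whether a candidate edge set is a 3-spanner within the required edge budget, which may help when experimenting:

      from collections import deque

      # Edges of H_m: pairs of m-bit labels differing in exactly one bit.
      def hypercube_edges(m):
          return [(x, x ^ (1 << i))
                  for x in range(2 ** m) for i in range(m)
                  if x < x ^ (1 << i)]

      def bfs_dist(adj, src):
          dist = {src: 0}
          q = deque([src])
          while q:
              v = q.popleft()
              for u in adj[v]:
                  if u not in dist:
                      dist[u] = dist[v] + 1
                      q.append(u)
          return dist

      # It suffices to check stretch <= 3 over the hypercube's edges, since the
      # bound for arbitrary pairs follows by summing along shortest paths.
      def is_3_spanner(m, spanner_edges):
          adj = {v: set() for v in range(2 ** m)}
          for v, w in spanner_edges:
              adj[v].add(w)
              adj[w].add(v)
          ok = all(bfs_dist(adj, v).get(w, float("inf")) <= 3
                   for v, w in hypercube_edges(m))
          return ok and len(spanner_edges) <= 7 * 2 ** m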

  28. Regional Matchings A locality-sensitive tool for distributed match-making.

  29. Distributed match-making A paradigm for establishing client-server connections in a distributed system (via specified rendezvous locations in the network). Ads of server v: written in the locations Write(v). Client u: reads ads in the locations Read(u).

  30. Regional Matchings Requirement: “read” and “write” sets must intersect: for every v,u ∈ V, Write(v) ∩ Read(u) ≠ ∅. (Client u must find an ad of server v.)

  31. Regional Matchings (cont) Distance considerations are taken into account: client u must find an ad of server v only if they are sufficiently close. l-regional matching: “read” and “write” sets RW = { Read(v), Write(v) | v ∈ V } s.t. for every v,u ∈ V, dist(u,v) ≤ l ⇒ Write(v) ∩ Read(u) ≠ ∅.
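
  A small checker for this requirement, as a sketch (read, write, and dist are assumed to be dictionaries; all names are illustrative):

      # l-regional matching test: whenever dist(u, v) <= l, the read set of u
      # must intersect the write set of v.
      def is_regional_matching(read, write, dist, l):
          return all(write[v] & read[u]
                     for u in read for v in write
                     if dist[u][v] <= l)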

  32. Regional Matchings (cont) Degree parameters: Dwrite(RW) = max_{v∈V} |Write(v)|, Dread(RW) = max_{v∈V} |Read(v)|

  33. Regional Matchings (cont) Radius parameters: Strwrite(RW) = max_{u,v∈V} { dist(u,v) | u ∈ Write(v) } / l, Strread(RW) = max_{u,v∈V} { dist(u,v) | u ∈ Read(v) } / l
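
  These four parameters can be computed directly from their definitions; a sketch under the same assumed dictionary representation as above:

      # Degree and radius (stretch) parameters of a regional matching RW.
      def matching_parameters(read, write, dist, l):
          d_read = max(len(read[v]) for v in read)
          d_write = max(len(write[v]) for v in write)
          str_read = max(dist[u][v] for v in read for u in read[v]) / l
          str_write = max(dist[u][v] for v in write for u in write[v]) / l
          return d_read, d_write, str_read, str_write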

  34. Regional matching construction [Given a graph G and k,l ≥ 1, construct a regional matching RW_{l,k}] • Set S = the l-neighborhood cover {Γ_l(v) | v ∈ V}.

  35. Regional matching construction Build a coarsening cover T as in the Max-Deg-Cover Thm.

  36. Regional matching construction Select a center vertex r_0(T) in each cluster T ∈ T.

  37. Regional matching construction Select for every v a cluster T_v ∈ T s.t. Γ_l(v) ⊆ T_v.

  38. Regional matching construction Set Read(v) = {r_0(T) | v ∈ T} and Write(v) = {r_0(T_v)}. (In the figure's example: Read(v) = {r_1, r_2, r_3}, Write(v) = {r_1}.)
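
  A sketch of the whole construction (slides 34-38), assuming the coarsening cover and its cluster centers are already computed; gamma_l[v] stands for Γ_l(v) and all names are illustrative:

      # Read(v) = centers of all clusters containing v;
      # Write(v) = center of one cluster T_v fully containing Gamma_l(v).
      def build_regional_matching(vertices, cover, centers, gamma_l):
          """cover: list of clusters (sets of vertices); centers[i] = r_0 of cover[i];
          gamma_l[v]: the set of vertices within distance l of v."""
          read, write = {}, {}
          for v in vertices:
              read[v] = {centers[i] for i, T in enumerate(cover) if v in T}
              # The coarsening cover guarantees such a cluster exists.
              i_v = next(i for i, T in enumerate(cover) if gamma_l[v] <= T)
              write[v] = {centers[i_v]}
          return read, write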

  39. Analysis Claim: The resulting RW_{l,k} is an l-regional matching. Proof: Consider u,v such that dist(u,v) ≤ l. Let T_v be the cluster s.t. Write(v) = {r_0(T_v)}.

  40. Analysis (cont) By definition, u ∈ Γ_l(v). Also Γ_l(v) ⊆ T_v ⇒ u ∈ T_v ⇒ r_0(T_v) ∈ Read(u) ⇒ Read(u) ∩ Write(v) ≠ ∅.

  41. Analysis (cont) Thm: For every graph G(V,E,w) and l,k ≥ 1, there is an l-regional matching RW_{l,k} with Dread(RW_{l,k}) ≤ 2k·n^{1/k}, Dwrite(RW_{l,k}) = 1, Strread(RW_{l,k}) ≤ 2k+1, Strwrite(RW_{l,k}) ≤ 2k+1.

  42. Analysis (cont) Taking k = log n we get • Corollary: For every graph G(V,E,w) and l ≥ 1, there is an l-regional matching RW_l with • Dread(RW_l) = O(log n) • Dwrite(RW_l) = 1 • Strread(RW_l) = O(log n) • Strwrite(RW_l) = O(log n)
