
UBI529


Presentation Transcript


  1. UBI529 3. Distributed Graph Algorithms

  2. 2.4 Distributed Path Traversals • Distributed BFS Algorithms • Distributed DFS Algorithms

  3. Bellman-Ford BFS Tree • Algorithm : Use a variant of the flooding algorithm. Each node and each message store an integer which corresponds to the distance from the root. The root stores 0, every other node initially ∞. The root starts the flooding algorithm by sending a message “1” to all neighbors. • A node u with integer x receives a message “y” from a neighbor v: if y < x then node u stores y (instead of x) and sends “y+1” to all neighbors (except v).

  4. Distributed Bellman-Ford BFS Algorithm • 1. Initially, the root sets L(r0) = 0 and all other vertices set L(v) = ∞. • 2. The root sends out the message Layer(0) to all its neighbors. • 3. A vertex v which gets a Layer(d) message from a neighbor w does: If d + 1 < L(v) then parent(v) = w; L(v) = d + 1; send Layer(d + 1) to all neighbors except w. • Time complexity: O(D). • Message complexity: O(n|E|).
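A round-based Python simulation may make the message flow concrete. This is a minimal sketch, not part of the original slides: the example graph adj, the function name bfs_tree, and the synchronous round scheduler are all illustrative assumptions.

import math

# Round-based simulation of the distributed Bellman-Ford BFS tree.
def bfs_tree(adj, root):
    layer = {v: math.inf for v in adj}   # L(v), initially infinity
    parent = {v: None for v in adj}
    layer[root] = 0
    # A triple (w, v, d) models the message Layer(d) sent from w to v.
    inbox = [(root, v, 0) for v in adj[root]]
    while inbox:
        nxt = []
        for w, v, d in inbox:            # v receives Layer(d) from w
            if d + 1 < layer[v]:
                parent[v] = w
                layer[v] = d + 1
                # forward Layer(d + 1) to all neighbors except the sender
                nxt += [(v, u, d + 1) for u in adj[v] if u != w]
        inbox = nxt                      # one synchronous round per iteration
    return layer, parent

adj = {'r': ['a', 'b'], 'a': ['r', 'b', 'c'], 'b': ['r', 'a'], 'c': ['a']}
print(bfs_tree(adj, 'r'))                # layers: r=0, a=1, b=1, c=2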

  5. Analysis • Analysis of Algorithm: The time complexity of Algorithm 3.10 is O(D), the message complexity is O(n|E|), where D is the diameter of the graph. • Proof: We can prove the time complexity by induction. We claim that a node at distance d from the root has received a message “d” by time d. The root knows by time 0 that it is the root. A node v at distance d has a neighbor u at distance d-1. Node u by induction sends a message “d” to v at time d-1 or before, which is then received by v at time d or before. • Message complexity: A node can reduce its integer at most n-1 times; each of these times it sends a message to all its neighbors. If all nodes do this, we have O(n|E|) messages.

  6. Remarks • There are graphs and executions that produce O(n|E|) messages. • How does the algorithm terminate? • Algorithm 3.8 has the better message complexity; Algorithm 3.10 has the better time complexity. The currently best known algorithm has message complexity O(|E| + n log³ n) and time complexity O(D log³ n). • How do we find the root?!? Leader election in an arbitrary graph: the FloodMax algorithm. Termination? Idea: each node that believes itself to be the “max” builds a spanning tree… (More for example in Chapter 15 of Nancy Lynch, “Distributed Algorithms”)

  7. Distributed DFS • Distributed DFS algorithm: There is a single message called the token. • 1. Start exploration (visit) at root r. • 2. When v is visited for the first time: • 2.1 Inform all neighbors of v that v has been visited. • 2.2 Wait for acknowledgment from all neighbors. • 2.3 Resume the DFS process. • The above algorithm ensures that only tree edges are traversed. Hence the time complexity is O(n). • The message complexity is O(|E|).
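A sequential Python sketch of this token walk, counting messages, might look as follows. The graph, the recursive visit helper, and the exact message accounting (two messages per notify/ack pair, two per token traversal of a tree edge) are illustrative assumptions.

def distributed_dfs(adj, root):
    visited, messages = set(), 0

    def visit(v):
        nonlocal messages
        visited.add(v)
        messages += 2 * len(adj[v])       # steps 2.1-2.2: notify + ack per neighbor
        for u in adj[v]:
            if u not in visited:          # v learned u's status from the acks
                messages += 2             # token sent to u and eventually returned
                visit(u)

    visit(root)
    return visited, messages

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(distributed_dfs(adj, 0))            # only tree edges carry the token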

  8. 2.5 Matching • Matching • Vertex Covers

  9. Matching • Def. 2.4.1 : A matching M in a graph G is a set of non-loop edges with no shared endpoints. The vertices incident to M are saturated (matched) by M and the others are unsaturated (unmatched). A perfect matching covers all vertices of the graph (all are saturated). • Rem. : Odd-order complete graphs have no perfect matching: K2n+1 has no perfect matching. • K5 and its matching M :

  10. Maximal and Maximum Matching • Def. 2.4.2 : A maximal matching in a graph G is a matching that cannot be enlarged by adding an edge. • Def. 2.4.3 : A maximum matching in a graph G is a matching of maximum size among all matchings.
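A short sketch can illustrate the gap between the two definitions; the function name and the edge list are illustrative assumptions.

def maximal_matching(edges):
    # One greedy pass: take an edge whenever both endpoints are still free.
    matched, M = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            M.append((u, v))
            matched.update((u, v))
    return M

# On the path a-b-c-d, scanning the middle edge first yields {(b, c)}:
# maximal (no edge can be added), yet smaller than the maximum matching
# {(a, b), (c, d)}.
print(maximal_matching([('b', 'c'), ('a', 'b'), ('c', 'd')]))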

  11. Vertex Cover • Def. 2.4.6 : A vertex cover of a graph G is a set P ⊆ V(G) such that each edge e of G has at least one endpoint in P. • Theorem 2.4.2 (König, 1931) : If G is a bipartite graph, then the maximum size of a matching in G equals the minimum size of a vertex cover. • Ex. : Matching (bold lines) and vertex cover (squares) differ by one for an odd cycle.

  12. 2.6 Independent Sets and Dominating Sets

  13. Independent Sets • Def. 2.5.1 : An independent set, stable set, or coclique is a set of vertices in a graph G, no two of which are adjacent. That is, it is a set V of vertices such that for every two vertices in V, there is no edge connecting the two. • Def. 2.5.2 : A maximum independent set is a largest independent set for a given graph. The problem of finding such a set is called the maximum independent set problem and is an NP-hard problem. • Rem. : It is very unlikely that an efficient algorithm for finding a maximum independent set of a graph exists.

  14. Independent Sets • Rem. : If a graph has an independent set of size k, then its complement (the graph on the same vertex set, but complementary edge set) has a clique of size k. • Rem. : The problem of deciding if a graph has an independent set of a particular size is the independent set problem. It is computationally equivalent to deciding if a graph has a clique of a particular size. • Rem. : The decision version of Independent Set (and consequently, Clique) is known to be NP-complete. The problem of finding a maximum independent set, however, is NP-hard.
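The complement remark is easy to check mechanically. The following small sketch (illustrative names and an illustrative 4-vertex path) verifies that a set that is independent in G appears as a clique in the complement.

from itertools import combinations

def is_independent(edges, S):
    # No pair of vertices of S may be joined by an edge.
    return all((u, v) not in edges and (v, u) not in edges
               for u, v in combinations(S, 2))

def complement(nodes, edges):
    return {(u, v) for u, v in combinations(sorted(nodes), 2)
            if (u, v) not in edges and (v, u) not in edges}

nodes, edges = {1, 2, 3, 4}, {(1, 2), (2, 3), (3, 4)}   # the path 1-2-3-4
S = {1, 3}
comp = complement(nodes, edges)
print(is_independent(edges, S))   # True: S is independent in G
print(is_independent(comp, S))    # False: S is a clique in the complement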

  15. A Distributed Algorithm to find MIS • Same as the Distributed DFS Algorithm (revisited) : • 1. Start exploration (visit) at root r. • 2. When v is visited for the first time: • 2.1 Inform all neighbors of v that v has been visited. • 2.2 Wait for acknowledgment from all neighbors. • 2.3 Resume the DFS process. • Whenever the token reaches an unmarked vertex, it adds it to MIS and marks its neighbors as excluded.
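A sequential sketch of this token rule follows; the graph and names are illustrative assumptions, and the token walk is simulated by recursion.

def token_mis(adj, root):
    mis, excluded, seen = set(), set(), set()

    def visit(v):                         # the token arrives at v
        seen.add(v)
        if v not in excluded:             # unmarked: join the MIS
            mis.add(v)
            excluded.update(adj[v])       # its neighbors are excluded forever
        for u in adj[v]:
            if u not in seen:
                visit(u)                  # forward the token along the DFS

    visit(root)
    return mis

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}   # a 4-cycle
print(token_mis(adj, 0))                  # {0, 3}: independent and maximal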

  16. Dominating Sets • Def. 2.5.4 : A dominating set for a graph G = (V, E) is a subset V′ of V such that every vertex not in V′ is joined to at least one member of V′ by some edge. The domination number γ(G) is the number of vertices in the smallest dominating set for G. • Def. 2.5.5 : The connected dominating set problem is to find a minimum-size subset S of vertices such that the subgraph induced by S is connected and S is a dominating set. This problem is NP-hard. • Def. 2.5.6 : A total dominating set is a set of vertices such that all vertices in the graph (including the vertices in the dominating set themselves) have a neighbor in the dominating set. • Def. 2.5.7 : An independent dominating set is a set of vertices that form a dominating set and are independent.

  17. Dominating Sets and Independent Sets • Rem. : Dominating sets are closely related to independent sets: a maximal independent set in a graph is necessarily a minimal dominating set. However, dominating sets need not be independent. • Rem. : The dominating set problem concerns testing whether γ(G) ≤ K for a given input K; it is NP-complete (Garey and Johnson 1979). Another NP-complete problem involving domination is the domatic number problem, in which one must partition the vertices of a graph into a given number of dominating sets; the maximum number of sets in any such partition is the domatic number of the graph. • Rem. : If S is a connected dominating set, one can form a spanning tree of G in which S forms the set of non-leaf nodes of the tree; conversely, if T is any spanning tree in a graph with more than two vertices, the non-leaf nodes of T form a connected dominating set. Therefore, finding minimum connected dominating sets is equivalent to finding spanning trees with the maximum possible number of leaves.

  18. A Greedy Central Algorithm to find a DS • 1. First, select the vertex (or vertices) with the most neighbors, i.e., of largest degree. If that makes a dominating set, stop; you are done. • 2. Otherwise, also choose the vertices with the next largest degree, and check whether you are done. • 3. Keep doing this until you have found a dominating set, then stop.
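A literal Python reading of these three steps; the graph encoding and the function name are illustrative assumptions.

def greedy_ds(adj):
    ds, dominated = set(), set()
    # Steps 1-3: take vertices from highest to lowest degree until done.
    for v in sorted(adj, key=lambda u: len(adj[u]), reverse=True):
        if len(dominated) == len(adj):    # already a dominating set: stop
            break
        ds.add(v)
        dominated.add(v)
        dominated.update(adj[v])
    return ds

adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3]}
print(greedy_ds(adj))   # {0, 3}: every vertex is in the set or next to it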

  19. Guha and Khuller Algorithm to find a CDS, 1997 • Idea : Grow a tree T starting from the vertex with the highest degree; at each step, scan a node by adding all of its edges and neighbors to T. At the end, all non-leaf nodes are in the CDS. • 1. Initially mark all vertices WHITE. • 2. Mark the highest-degree vertex BLACK and scan it. • 3. While there are WHITE nodes: • 4. Pick a GRAY vertex v and color it BLACK. • 5. Color all the WHITE neighbors of v GRAY and add them to T. • The BLACK nodes will form the CDS. • Which node to pick? The node with the most WHITE neighbors. This unfortunately does not always give optimal results.
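A centralized sketch of this growth procedure using the "most WHITE neighbors" pick. The color constants, tie-breaking, and the assumption of a connected input graph are all illustrative choices, not fixed by the slides.

WHITE, GRAY, BLACK = 0, 1, 2

def guha_khuller_cds(adj):
    color = {v: WHITE for v in adj}
    start = max(adj, key=lambda v: len(adj[v]))     # highest-degree vertex
    color[start] = BLACK
    for u in adj[start]:
        color[u] = GRAY
    while any(c == WHITE for c in color.values()):
        # scan the GRAY node with the most WHITE neighbors (ties arbitrary)
        v = max((x for x in adj if color[x] == GRAY),
                key=lambda x: sum(color[u] == WHITE for u in adj[x]))
        color[v] = BLACK
        for u in adj[v]:
            if color[u] == WHITE:
                color[u] = GRAY
    return {v for v in adj if color[v] == BLACK}    # BLACK nodes form the CDS

adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5], 5: [4]}
print(guha_khuller_cds(adj))                        # {0, 3, 4}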

  20. Example where the scanning rule fails • The optimal CDS is any path (size 4) from u to v. The algorithm chooses u and colors N(u) gray. Then it chooses a node from N(u) and scans it, making it black. If u has degree d, the algorithm ends in d+2 steps. The algorithm picks the CDS shown in black.

  21. Guha and Khuller Algorithm: Modification • Define a new scanning rule for a pair of adjacent vertices u and v. • Let u be gray and v be white. • Scanning means first making u black (this makes v and some other nodes gray), then coloring v black, which makes more nodes gray. • The total number of nodes colored gray is the yield of the scan step. • Rule : At each step, either a single vertex or a pair of vertices is scanned, depending on whichever gives the higher yield. • This simple modification yields a much better approximation to the optimal DS (proof omitted). Ex. : Try the modified GKA on the previous diagram. NOTE : GKA is a central algorithm.

  22. A Distributed DS Algorithm • All nodes are initially unmarked. • 1. Each node exchanges neighbor sets with its neighbors. • 2. Mark any node if it has two neighbors that are not directly connected. • If the original graph is connected, the resulting set of marked nodes is a dominating set. • The resulting set of marked nodes is connected. • The shortest path between any two nodes does not include any nonmarked nodes. • The dominating set is not minimal.
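The marking step translates directly into a few lines of Python; sets stand in for the exchanged neighbor lists, and the 4-cycle is an illustrative input.

from itertools import combinations

def marking(adj):
    marked = set()
    for v in adj:
        # mark v if some two of its neighbors are not directly connected
        if any(w not in adj[u] for u, w in combinations(adj[v], 2)):
            marked.add(v)
    return marked

ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}   # a 4-cycle
print(marking(ring))   # {0, 1, 2, 3}: dominating and connected, not minimal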

  23. Pruning Heuristics • After constructing a nonminimal dominating set, it can be reduced: • Unmark a node v if its closed neighborhood is included in the closed neighborhood of a marked neighbor u and v has the smaller identifier. • Unmark a node v if its neighborhood is included in the union of the neighborhoods of two marked neighbors u and w and v has the smallest identifier of the three. • The algorithm only requires O(Δ²) time to exchange neighborhood sets and constant time to reduce the set.

  24. Wu and Li CDS Algorithm, 2001 • Idea : The Wu and Li CDS algorithm is a step-wise operational distributed algorithm, in which every node has to wait for the others in lock step. • Initially each vertex marks itself WHITE, indicating that it is not dominated yet. In the first phase, a vertex marks itself BLACK if any two of its neighbors are not connected to each other directly. In the second phase, a BLACK-marked vertex v changes its color to WHITE if either of the following conditions is met: • 1. ∃ u ∈ N(v) marked BLACK such that N[v] ⊆ N[u] and id(v) < id(u); • 2. ∃ u, w ∈ N(v), both marked BLACK, such that N(v) ⊆ N(u) ∪ N(w) and id(v) = min{id(v), id(u), id(w)}.

  25. 2.7 Clustering • Figure: Outbreak of cholera deaths in London in the 1850s. Reference: Nina Mishra, HP Labs.

  26. Clustering • Clustering. Given a set U of n objects labeled p1, …, pn (e.g., photos, documents, micro-organisms), classify them into coherent groups. • Distance function. Numeric value specifying the "closeness" of two objects (e.g., for images, the number of corresponding pixels whose intensities differ by some threshold). • Fundamental problem. Divide into clusters so that points in different clusters are far apart. • Applications: Routing in mobile ad hoc networks. Identifying patterns in gene expression. Document categorization for web search. Similarity searching in medical image databases. Skycat: cluster 10⁹ sky objects into stars, quasars, galaxies.

  27. Clustering of Maximum Spacing • k-clustering. Divide objects into k non-empty groups. • Distance function. Assume it satisfies several natural properties: • d(pi, pj) = 0 iff pi = pj (identity of indiscernibles) • d(pi, pj) ≥ 0 (nonnegativity) • d(pi, pj) = d(pj, pi) (symmetry) • Spacing. Minimum distance between any pair of points in different clusters. • Clustering of maximum spacing. Given an integer k, find a k-clustering of maximum spacing. • (Figure: the spacing of a clustering with k = 4.)

  28. Greedy Clustering Algorithm • Single-link k-clustering algorithm: • Form a graph on the vertex set U, corresponding to n clusters. • Find the closest pair of objects such that each object is in a different cluster, and add an edge between them. • Repeat n-k times, until there are exactly k clusters. • Key observation. This procedure is precisely Kruskal's algorithm (except we stop when there are k connected components). • Remark. Equivalent to finding an MST and deleting the k-1 most expensive edges.
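A compact sketch of single-link k-clustering as truncated Kruskal; the union-find helper, point names, and the distance table are illustrative assumptions.

def k_clustering(points, dist, k):
    parent = {p: p for p in points}

    def find(p):                           # union-find with path halving
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    edges = sorted((dist[p][q], p, q) for i, p in enumerate(points)
                   for q in points[i + 1:])
    merges = 0
    for d, p, q in edges:                  # Kruskal, stopped early
        rp, rq = find(p), find(q)
        if rp != rq:
            parent[rp] = rq
            merges += 1
            if merges == len(points) - k:  # exactly k components remain
                break
    clusters = {}
    for p in points:
        clusters.setdefault(find(p), []).append(p)
    return list(clusters.values())

points = ['a', 'b', 'c', 'd']
dist = {'a': {'b': 1, 'c': 5, 'd': 6}, 'b': {'c': 5, 'd': 6}, 'c': {'d': 1}}
print(k_clustering(points, dist, 2))       # [['a', 'b'], ['c', 'd']]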

  29. Greedy Clustering Algorithm: Analysis • Theorem. Let C* denote the clustering C*1, …, C*k formed by deleting the k-1 most expensive edges of an MST. C* is a k-clustering of maximum spacing. • Pf. Let C denote some other clustering C1, …, Ck. • The spacing of C* is the length d* of the (k-1)st most expensive edge. • Let pi, pj be in the same cluster in C*, say C*r, but in different clusters in C, say Cs and Ct. • Some edge (p, q) on the pi-pj path in C*r spans two different clusters in C. • All edges on the pi-pj path have length ≤ d*, since Kruskal chose them. • The spacing of C is at most d*, since p and q are in different clusters. ▪

  30. A Distributed Clustering Algorithm • 1. Each node determines its local ranking property and exchanges it with its neighbors. • 2. A node can become a clusterhead if it has the highest (or lowest) rank among all its undecided neighbors. • 3. It changes its state and announces it to all of its neighbors. • 4. Nodes that hear about a clusterhead next to them switch to cluster member and announce this to their neighbors. • Similar to the Leader Election problem (we will see this in Part II).

  31. 2.8 Connectivity

  32. Connectivity Definitions • Def. 2.8.1 : A vertex cut of a graph G is a set S ⊆ V(G) such that G-S has more than one component. The connectivity of G, κ(G), is the minimum size of a vertex set S such that G-S is disconnected or has only one vertex. A graph is k-connected if its connectivity is at least k. • Rem. : Connectivity of the complete graph: κ(Kn) = n-1.

  33. Sequential Connectivity Algorithms (revisited) • Alg. 2.8.1 [Connectivity] : • Idea : Determining whether a graph is connected (or not) can be done efficiently with a BFS or DFS from any vertex. If the tree generated has the same number of vertices as the graph, then it is connected. • Alg. 2.8.2 [Strong Connectivity] : Can determine if G is strongly connected in O(m + n) time. • Pf. • Pick any node s. • Run BFS from s in G. • Run BFS from s in Grev (G with all edges reversed). • Return true iff all nodes are reached in both BFS executions.
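Alg. 2.8.2 in Python; the adjacency-dict encoding of the digraph and the helper names are illustrative assumptions.

from collections import deque

def reachable(adj, s):                     # BFS from s
    seen, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                q.append(v)
    return seen

def strongly_connected(adj):
    rev = {u: [] for u in adj}             # build Grev
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)
    s = next(iter(adj))                    # pick any node s
    return len(reachable(adj, s)) == len(adj) == len(reachable(rev, s))

print(strongly_connected({0: [1], 1: [2], 2: [0]}))   # True: a directed cycle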

  34. 2.9 Distributed Routing Algorithms • Routing • Distributed Bellman-Ford Algorithm • Chandy-Misra Algorithm • Link State Algorithm

  35. Routing Problem • A fundamental problem of computer networks. • Unicast routing (or simply routing) is the process of determining a “good” path or route to send data from the source to the destination. • Typically, a good path is one that has the least cost.

  36. Routing (figure borrowed from Cisco documentation, http://www.cisco.com)

  37. Routing: Shortest Path • Most shortest path algorithms are adaptations of the classic Bellman-Ford algorithm, which computes shortest paths if there is no cycle of negative weight. • Let D(j) = shortest distance of j from initiator 0; thus D(0) = 0. The edge weights can represent latency or distance or some other appropriate parameter like power. • (Figure: initiator 0 sends (w(0,m), 0) to neighbor m and (w(0,j), 0) to neighbor j; node j forwards (w(0,j)+w(j,k), j) to its neighbor k.) • Classical algorithms: Bellman-Ford and Dijkstra's algorithm are found in most algorithm books. What is the difference between an (ordinary) graph algorithm and a distributed graph algorithm?

  38. Revisiting Bellman-Ford: basic idea • Computes the shortest distance to all nodes from an initiator node; the parent pointers help the packets navigate to the initiator. Consider a static topology. Process 0 sends (w(0,i), 0) to each neighbor i. • {program for process i} • do message = (S, k) ∧ S < D(i) --> if parent ≠ k -> parent := k fi; D(i) := S; send (D(i)+w(i,j), i) to each neighbor j ≠ parent • [] message = (S, k) ∧ S ≥ D(i) --> skip • od

  39. Chandy-Misra Distributed Shortest Path Algorithm • program shortest path {for process i > 0} • define D, S : distance {S = value of distance in message} • parent : process; • deficit : integer; • N(i) : set of successors of process i; • {each message has format (distance, sender)} • initially D = ∞, parent = i, deficit = 0; • {for process 0} • send (w(0,i), 0) to each neighbor i; • deficit := |N(0)|; • do deficit > 0 ∧ ack --> deficit := deficit - 1 od • {deficit = 0 signals termination}

  40. Chandy-Misra Distributed Shortest Path Algorithm • {for process i > 0} • do message = (S, k) ∧ S < D --> if deficit > 0 ∧ parent ≠ i -> send ack to parent fi; parent := k; D := S; send (D + w(i,j), i) to each neighbor j ≠ parent; deficit := deficit + |N(i)| - 1 • [] message = (S, k) ∧ S ≥ D --> send ack to sender • [] ack --> deficit := deficit - 1 • [] deficit = 0 ∧ parent ≠ i --> send ack to parent • od • (Figure: an example weighted graph.) • Combines shortest path computation with termination detection. Termination is detected when the initiator receives an ack from each neighbor.

  41. What happens when the topology changes? Distance Vector Routing • The distance vector D for each node i contains N elements D[i,0], D[i,1], D[i,2], …; initialize these to ∞. {Here, D[i,j] is the distance from node i to node j.} • Each node j periodically sends its distance vector to its immediate neighbors. • Every neighbor i of j, after receiving the broadcasts from its neighbors, updates its distance vector as follows: ∀ k ≠ i: D[i,k] = min over neighbors j of (w[i,j] + D[j,k]). • Used in RIP, IGRP, etc.
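A synchronous-round sketch of this update rule. The weighted line graph 1-2-3 and the convergence loop are illustrative assumptions; real protocols exchange vectors asynchronously and periodically.

import math

def distance_vector(w):                    # w[i][j] = weight of edge (i, j)
    nodes = list(w)
    D = {i: {k: (0 if i == k else math.inf) for k in nodes} for i in nodes}
    changed = True
    while changed:                         # repeat rounds until nothing improves
        changed = False
        for i in nodes:
            for k in nodes:
                if k != i:
                    best = min(w[i][j] + D[j][k] for j in w[i])
                    if best < D[i][k]:
                        D[i][k], changed = best, True
    return D

w = {1: {2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1}}   # the line 1-2-3
print(distance_vector(w)[1][3])               # 2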

  42. Counting to Infinity • Observe what can happen when the link (2,3) fails, under the update rule ∀ k ≠ i: D[i,k] = min over neighbors j of (w[i,j] + D[j,k]). • Node 1 thinks d(1,3) = 2. Node 2 thinks d(2,3) = d(1,3)+1 = 3. Node 1 then thinks d(1,3) = d(2,3)+1 = 4, and so on. So it will take forever for the distances to stabilize. • A partial remedy is the split-horizon method, which prevents 1 from sending the advertisement about d(1,3) to 2, since its first hop is node 2. • Distance vector routing is suitable for smaller networks. A larger volume of data is disseminated, but only to immediate neighbors. It has poor convergence properties.

  43. Distance Vector Algorithm • Compute the shortest (least-cost) path between s and all other nodes in a given undirected graph G = (V, E, c) with real-valued positive edge weights. • Each node x maintains: • 1. a distance label a(x), which is the currently known shortest distance from s to x; • 2. a variable p(x), which contains the identity of the previous node on the currently known shortest path from s to x. • Initially, a(s) = 0, a(x) = ∞, and p(x) is undefined for all x ≠ s. • When the algorithm terminates, a(x) = d(s, x), where d(s, x) is the shortest path distance between s and x, and p(x) holds the neighbor of x on the shortest path from x to s.

  44. DBF Algorithm • The DBF algorithm consists of two basic rules: • Update rule: Suppose x with a label a(x) receives a(z) from a neighbor z. If a(z) + c(z, x) < a(x), then it updates a(x) to a(z) + c(z, x) and sets p(x) to z. Otherwise a(x) and p(x) are not changed. • Send rule: Let a(x) be a new label adopted by x. Then x sends a(x) to all its neighbors. • The algorithm will work correctly in an asynchronous model as well.
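The two rules fit in a few lines when simulated with a FIFO message queue. The edge-cost table, the names, and the FIFO scheduler are assumptions; FIFO delivery models just one possible asynchronous execution.

from collections import deque
import math

def dbf(w, s):                             # w[x][y] = c(x, y), positive
    a = {x: math.inf for x in w}           # distance labels a(x)
    p = {x: None for x in w}               # predecessors p(x)
    a[s] = 0
    q = deque((s, z) for z in w[s])        # send rule: s announces a(s)
    while q:
        z, x = q.popleft()                 # x receives a(z) from neighbor z
        if a[z] + w[z][x] < a[x]:          # update rule
            a[x], p[x] = a[z] + w[z][x], z
            q.extend((x, y) for y in w[x]) # send rule: announce the new label
    return a, p

w = {'s': {'a': 4, 'b': 1}, 'a': {'s': 4, 'b': 2}, 'b': {'s': 1, 'a': 2}}
print(dbf(w, 's')[0])                      # {'s': 0, 'a': 3, 'b': 1}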

  45. Computing All-Pairs Shortest Paths • Generalized to compute the shortest path between all pairs of nodes by maintaining in each node x a distance label a_y(x) for every y ∈ V. This is called the distance vector of x. • 1. Each node stores its own distance vector and the distance vectors of each of its neighbors. • 2. Whenever something changes, say the weight of any of its incident edges or its distance vector, the node sends its distance vector to all of its neighbors. • The receiving nodes then update their own distance vectors according to the update rule.

  46. Link State Routing Algorithm • A link state (LS) algorithm knows the global network topology and edge costs. • 1. Each node broadcasts its identity number and the costs of its incident edges to all other nodes in the network using a broadcast algorithm, e.g., flooding. • 2. Each node can then run the local link state algorithm and compute the same set of shortest paths as the other nodes. A well-known LS algorithm is Dijkstra's algorithm for computing least-cost paths. • The message complexity and time complexity of the algorithm are determined by the broadcast algorithm. • If broadcast is done by flooding, the message complexity is O(|E|²). • The time complexity is O(|E|D).
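Once a node has the full topology, the local computation is ordinary Dijkstra. A minimal heap-based sketch follows; the topology encoding and names are illustrative assumptions.

import heapq

def dijkstra(w, s):                        # w[u][v] = cost of edge (u, v)
    dist, pq = {s: 0}, [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                       # stale queue entry
        for v, c in w[u].items():
            if d + c < dist.get(v, float('inf')):
                dist[v] = d + c
                heapq.heappush(pq, (d + c, v))
    return dist

w = {'s': {'a': 2, 'b': 5}, 'a': {'s': 2, 'b': 1}, 'b': {'s': 5, 'a': 1}}
print(dijkstra(w, 's'))                    # {'s': 0, 'a': 2, 'b': 3}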

  47. Link State Routing • Each node i periodically broadcasts the weights of all edges (i,j) incident on it (this is the link state) to all its neighbors. The mechanism for dissemination is flooding. • This helps each node eventually compute the topology of the network, and independently determine the shortest path to any destination node. • A smaller volume of data is disseminated, but over the entire network. Used in OSPF.

  48. Link State Routing • Each link state packet has a sequence number seq that determines the order in which the packets were generated. • When a node crashes, all packets stored in it are lost. After it is repaired, new packets start with seq = 0, so these new packets may be discarded in favor of the old packets! • The problem is resolved using a TTL.
