Greedy Approximation Algorithms for finding Dense Components in a Graph
Paper by Moses Charikar. Presentation by Paul Horn.
Overview
• Differing definitions of density
• The problem
• Undirected Case
  • Linear Programming
  • Network Flows
  • Approximation
• Directed Case
  • Linear Programming
  • Approximation
Defining Density
• The natural definition of density relates the number of edges to the number of possible edges. In other words, given G(V,E), the density is $|E| / \binom{|V|}{2}$.
Problems with Density
• This simple definition of density does not make sense when looking for a densest subgraph: any two vertices joined by an edge already have density 1, so the problem reduces to finding a maximum clique.
Redefining Density
• Instead we define the density of a subgraph by its average degree: $f(S) = |E(S)| / |S|$, which is half the average degree of the subgraph induced by S.
• This definition of density is appropriate for sparse graphs.
• This definition is, however, inappropriate for Erdős-Rényi random graphs.
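To make the contrast between the two definitions concrete, here is a minimal Python sketch. The function names and the toy graph are illustrative assumptions, and the second function takes the density to be f(S) = |E(S)|/|S| (half the average degree, the form used in Charikar's paper).

```python
def edge_fraction_density(nodes, edges):
    """Edges present divided by edges possible in the induced subgraph."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    m = sum(1 for u, v in edges if u in nodes and v in nodes)
    return m / (n * (n - 1) / 2)


def average_degree_density(nodes, edges):
    """f(S) = |E(S)| / |S|, i.e. half the average degree of the induced subgraph."""
    nodes = set(nodes)
    if not nodes:
        return 0.0
    m = sum(1 for u, v in edges if u in nodes and v in nodes)
    return m / len(nodes)


# Hypothetical toy graph: a triangle {0, 1, 2} with a pendant vertex 3.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
```

On this toy graph a single edge and the triangle both score 1 under the first definition, while f scores the single edge 0.5 and the triangle 1.0, so only the second definition distinguishes them.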
Density of a Directed Graph
• Introduced by Kannan and Vinay.
• Given a digraph G(V,E), consider vertex subsets S, T and let E(S,T) be the set of directed edges from S to T. Then the density of the sets S and T is $d(S,T) = \frac{|E(S,T)|}{\sqrt{|S|\,|T|}}$.
• The density of the graph G is $d(G) = \max_{S,T \subseteq V} d(S,T)$.
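As a small illustration of the directed definition, the following Python sketch evaluates d(S,T) = |E(S,T)| / sqrt(|S||T|) for given sets. The function name and the example graph (two hubs pointing at three authorities) are assumptions made for illustration.

```python
from math import sqrt


def directed_density(S, T, edges):
    """d(S, T) = |E(S, T)| / sqrt(|S| * |T|) for directed edges (u, v)."""
    S, T = set(S), set(T)
    if not S or not T:
        return 0.0
    e_st = sum(1 for u, v in edges if u in S and v in T)
    return e_st / sqrt(len(S) * len(T))


# Hypothetical example: hubs 0 and 1 each link to authorities 2, 3 and 4.
hub_edges = [(h, a) for h in (0, 1) for a in (2, 3, 4)]
```

Here `directed_density({0, 1}, {2, 3, 4}, hub_edges)` counts six edges over sqrt(2 * 3), while taking S = T = all five vertices counts the same six edges over sqrt(25), which is lower.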
The problem
• Known exact algorithms for finding a maximum density subgraph of a graph are cubic or slower.
• For large graphs, such as the webgraph or even any sizable chunk of it, this is too slow.
Linear programming
• In the undirected case, an exact solution can be found by solving a linear program.
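For reference, the LP for the undirected case can be written as below, in the form given in Charikar's paper. The variable names are notation introduced here: $x_{ij}$ is associated with edge ij and $y_i$ with vertex i.

```latex
\begin{aligned}
\max \quad & \sum_{ij \in E} x_{ij} \\
\text{s.t.} \quad & x_{ij} \le y_i, \quad x_{ij} \le y_j && \forall\, ij \in E \\
& \sum_{i \in V} y_i \le 1 \\
& x_{ij} \ge 0, \quad y_i \ge 0
\end{aligned}
```

To see why the optimum is at least the maximum density, take any S and set $y_i = 1/|S|$ for $i \in S$ and $x_{ij} = 1/|S|$ for every edge inside S (all other variables 0). This is feasible and has objective value $|E(S)|/|S| = f(S)$.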
Go with the flow?
• A flow-based algorithm for finding a maximum density subgraph exists: Finding a Maximum Density Subgraph, by A.V. Goldberg.
• It creates a digraph from the undirected graph and uses flows to partition the graph.
• It requires O(log n) executions of a max-flow algorithm.
Getting Greedy…
• Since the density of a subgraph S is its average degree, nodes of lowest degree are least likely to be a part of the densest subgraph.
• Algorithm: Repeatedly remove the lowest-degree vertex, and return the densest of the subgraphs seen along the way.
• Runs in O(|V| + |E|) time.
• Theorem: The algorithm is a 2-approximation for the maximum of f(S).
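The greedy peeling step can be sketched in Python as follows. This is a minimal sketch, not the paper's implementation: the function name, the edge-list input, and the bucket structure used to find the lowest-degree vertex are assumptions, and the density is taken to be f(S) = |E(S)|/|S|.

```python
from collections import defaultdict


def greedy_densest_subgraph(edges):
    """Peel lowest-degree vertices; return (best_set, best_density)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops in this sketch
            adj[u].add(v)
            adj[v].add(u)
    if not adj:
        return set(), 0.0

    deg = {v: len(adj[v]) for v in adj}
    # buckets[d] holds the remaining vertices whose current degree is d
    buckets = [set() for _ in range(max(deg.values()) + 1)]
    for v, d in deg.items():
        buckets[d].add(v)

    m, n = sum(deg.values()) // 2, len(adj)
    best_density, best_step = m / n, 0
    removed, order = set(), []
    d = 0
    for step in range(1, len(adj)):
        while not buckets[d]:  # advance to the smallest non-empty bucket
            d += 1
        v = buckets[d].pop()
        removed.add(v)
        order.append(v)
        for w in adj[v]:
            if w not in removed:  # move each live neighbour down one bucket
                buckets[deg[w]].discard(w)
                deg[w] -= 1
                buckets[deg[w]].add(w)
        m -= d  # v had exactly d live neighbours
        n -= 1
        d = max(d - 1, 0)  # the minimum degree can drop by at most one
        if m / n > best_density:
            best_density, best_step = m / n, step

    return set(adj) - set(order[:best_step]), best_density


# Hypothetical example: a 4-clique on {0, 1, 2, 3} with a path 3-4-5 attached.
clique_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
```

On the example, the sketch peels vertices 5 and 4 first and then reports the 4-clique, whose density 6/4 exceeds the 8/6 of the whole graph.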
Directing our Insight
• Finding the maximum d(S,T) is harder, as we need to maximize over all pairs of vertex subsets S and T.
• For an exact solution, we can generalize our LP, using |S|/|T| = c as a parameter, to give a new program LP(c).
• The exact optimum can be found by solving $O(n^2)$ linear programs, one per possible value of c.
LP(c)
• A solution to LP(c) corresponds to the densest sets S, T such that |S|/|T| = c for a given value of c.
• Therefore $d(G) = \max_c \mathrm{OPT}(LP(c))$.
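For reference, LP(c) can be written as below, in the form given in Charikar's paper. The variable names are notation introduced here: $x_{ij}$ for each directed edge ij, $s_i$ for membership of i in S, and $t_j$ for membership of j in T.

```latex
\begin{aligned}
\max \quad & \sum_{ij \in E} x_{ij} \\
\text{s.t.} \quad & x_{ij} \le s_i, \quad x_{ij} \le t_j && \forall\, ij \in E \\
& \sum_{i \in V} s_i \le \sqrt{c} \\
& \sum_{j \in V} t_j \le \frac{1}{\sqrt{c}} \\
& x_{ij} \ge 0, \quad s_i \ge 0, \quad t_j \ge 0
\end{aligned}
```

As a check, for sets with $|S|/|T| = c$, setting $s_i = \sqrt{c}/|S|$ for $i \in S$ and $t_j = 1/(\sqrt{c}\,|T|)$ for $j \in T$ makes both values equal to $1/\sqrt{|S|\,|T|}$. Giving each edge of $E(S,T)$ that value yields the objective $|E(S,T)|/\sqrt{|S|\,|T|} = d(S,T)$.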
Approximate this.
• Idea: Maintain two sets, S and T. At each iteration, remove either the vertex of lowest ‘degree’ in S or the vertex of lowest ‘degree’ in T, based on a certain rule.
• We define the degree of a vertex x in S to be |E({x}, T)| and the degree of a vertex y in T to be |E(S,{y})|.
• Our rule is based on the same idea of c = |S|/|T| that we found in the linear program, so each pass finds an S and T that maximize the density for that particular c.
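One pass of this procedure for a fixed c can be sketched in Python as below. The removal rule used here (remove from S when sqrt(c) times the lowest S-degree is at most the lowest T-degree divided by sqrt(c)) follows the rule as given in Charikar's paper; the function name and input format are assumptions. For clarity the sketch rescans for the minimum at every step, so it is quadratic rather than O(m+n).

```python
from math import sqrt


def greedy_directed_for_ratio(edges, nodes, c):
    """One greedy pass for a fixed c; returns (density, S, T) of the best pair seen."""
    nodes = list(nodes)
    out_adj = {v: set() for v in nodes}
    in_adj = {v: set() for v in nodes}
    for u, v in set(edges):
        out_adj[u].add(v)
        in_adj[v].add(u)

    S, T = set(nodes), set(nodes)
    out_deg = {v: len(out_adj[v]) for v in nodes}  # |E({v}, T)|
    in_deg = {v: len(in_adj[v]) for v in nodes}    # |E(S, {v})|
    e = sum(out_deg.values())                      # |E(S, T)|
    best = (e / sqrt(len(S) * len(T)), set(S), set(T)) if nodes else (0.0, set(), set())

    while S and T:
        i = min(S, key=out_deg.get)  # lowest-degree vertex in S
        j = min(T, key=in_deg.get)   # lowest-degree vertex in T
        if sqrt(c) * out_deg[i] <= in_deg[j] / sqrt(c):
            S.remove(i)
            e -= out_deg[i]
            for w in out_adj[i]:
                if w in T:
                    in_deg[w] -= 1
        else:
            T.remove(j)
            e -= in_deg[j]
            for w in in_adj[j]:
                if w in S:
                    out_deg[w] -= 1
        if S and T:
            d = e / sqrt(len(S) * len(T))
            if d > best[0]:
                best = (d, set(S), set(T))
    return best


# Hypothetical example: hubs 0, 1 link to authorities 2, 3, 4; vertex 5 links to hub 0.
web_edges = [(h, a) for h in (0, 1) for a in (2, 3, 4)] + [(5, 0)]
```

Calling `greedy_directed_for_ratio(web_edges, range(6), 2 / 3)` on the example recovers S = {0, 1} and T = {2, 3, 4}, the pair whose size ratio matches c.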
Analyzing our Approximation
• When run over all values of c, this algorithm gives us a 2-approximation of d(G).
• There are, however, roughly $n^2$ possible values of c.
• Each iteration can run in O(m+n) time.
• Therefore running through all possible values becomes prohibitive.
• A $(2+\epsilon)$-approximation is possible in $O(\log n / \epsilon)$ iterations of the algorithm.
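The count of candidate values of c can be illustrated with a short Python sketch that enumerates the distinct ratios |S|/|T| for set sizes between 1 and n. The function name is hypothetical, and the sketch only lists the ratios; it does not run the greedy pass for each one.

```python
from fractions import Fraction


def candidate_ratios(n):
    """All distinct values a/b with 1 <= a, b <= n, in increasing order."""
    return sorted({Fraction(a, b) for a in range(1, n + 1) for b in range(1, n + 1)})
```

For n = 3 this yields 7 distinct ratios out of the 9 (a, b) pairs, since 1/1, 2/2 and 3/3 coincide. The number of distinct ratios is at most n².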
Generalizations, and notes
• While there is a flow-based algorithm for finding a maximum density subgraph of an undirected graph, none is known for a digraph.
• Both cases can be generalized to weighted graphs; however, the linear running time of the algorithm no longer holds.
• Using Fibonacci heaps, it can run in O(m + n log n) (in the directed case, for a single value of c).
Wrapping Up
• Finding dense subgraphs is important in areas such as clustering.
• The Kannan and Vinay definition of density is motivated by the idea of hubs and authorities.
• With large graphs (such as any sizable chunk of the webgraph), solving the $n^2$ LPs to find the exact densest subgraph is unrealistic.
Wrapping Up: The Sequel
• Therefore, the paper
  • Provides LP solutions to both the directed and undirected cases
  • Provides a linear-time approximation algorithm for undirected graphs
  • Generalizes the algorithm to directed graphs, finding sets S and T given |S|/|T| = c.
  • Observes that this is a 2-approximation when run over all values of c, and that a $(2+\epsilon)$-approximation is possible in $O(\log n / \epsilon)$ iterations.
Future Work
• A flow-based algorithm for the directed case.
• The definition of density which we used does not require S and T to be disjoint. How does this requirement affect the algorithm and its complexity?
• An n-approximation of d(G) can provide an O(n)-approximation of d’(G).