## Overlapping Community Search


**Overlapping Community Search**

Wanyun Cui

**Problem**

What is community search? Given a graph G(V,E) and a query vertex v, find the community that v belongs to in G. This was done in my recent research <Local Search of Communities in Large Graphs>.

What is overlapping community search (OCS)? Given a graph G(V,E) whose communities may overlap, and a query vertex v, find all the communities that contain v. In a network with overlapping communities, different communities may share common vertices.

**Motivations**

- Many real graphs have overlapping communities:
  - A large fraction of proteins belong to several protein complexes simultaneously [1].
  - Online social networks are made of highly overlapping cohesive communities [2].
- Applications of overlapping community search:
  - Computing the centrality of a given vertex [3].
  - Finding the communities of a protein, based on its interactions.
- Previous research focuses on overlapping community detection; none addresses overlapping community search.

**Challenges**

- Computational intractability
  - NP-complete, as will be proved.
- Defining the overlapping community search problem well remains a challenge.
  - It is difficult to precisely model the complex semantics of real overlapping communities.
- Avoiding trivial results
  - For example, when the definition is too restrictive, no meaningful communities may be found.
  - It is difficult to distinguish between two overlapping communities and one large community with two components.
- Scaling to real large graphs with millions of nodes
  - Current overlapping community detection can only handle graphs with tens of thousands of nodes.
  - Why can current community search not be extended to solve OCS?

**Related Works**

<A Multi-Resolution Approach to Learning with Overlapping Communities> KDD 2010 Workshop

**Related Works**

<Detecting the overlapping and hierarchical community structure in complex networks> New Journal of Physics

**Related Works**

<Uncovering the overlapping community structure of complex networks in nature and society> Nature 2005

A k-clique-community is a union of all k-cliques (complete subgraphs of size k) that can be reached from each other through a series of adjacent k-cliques, where adjacency means sharing k − 1 nodes.

**Related Works**

<Detect overlapping and hierarchical community structure in networks>

**Problem Definition**

Given an undirected graph G(V,E) and a query vertex $v \in V$, find all the overlapping communities that contain v.
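
The Problem Definition, read together with the Nature 2005 k-clique-community definition quoted in the related work, can be illustrated with a small brute-force sketch. It is not the algorithm developed in these slides: the helper names `build_graph`, `cliques_containing` and `communities_of` are hypothetical, vertices are assumed hashable, and k ≥ 2 is assumed. It enumerates the k-cliques that contain v, then walks the clique graph depth-first, generating each adjacent k-clique by dropping one vertex and adding a common neighbour of the remaining k − 1 vertices. Frozensets serve as order-independent keys for the set of visited cliques.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]
Clique = FrozenSet[Hashable]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Undirected adjacency sets from an edge list (hypothetical helper)."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def cliques_containing(graph: Graph, v: Hashable, k: int) -> Set[Clique]:
    """Enumerate every k-clique containing v by growing cliques inside N(v)."""
    found: Set[Clique] = set()

    def extend(clique: Set[Hashable], candidates: Set[Hashable]) -> None:
        if len(clique) == k:
            found.add(frozenset(clique))
            return
        if len(clique) + len(candidates) < k:
            return  # too few candidates left to reach size k
        for u in list(candidates):
            extend(clique | {u}, candidates & graph[u])
            candidates = candidates - {u}  # exclude u from later branches (no duplicates)

    extend({v}, set(graph.get(v, ())))
    return found


def communities_of(graph: Graph, v: Hashable, k: int) -> List[Set[Hashable]]:
    """All k-clique communities that contain v (brute-force sketch)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    seen: Set[Clique] = set()
    communities: List[Set[Hashable]] = []
    for start in cliques_containing(graph, v, k):
        if start in seen:
            continue  # already absorbed into an earlier community
        seen.add(start)
        stack = [start]
        members: Set[Hashable] = set()
        while stack:  # DFS over the graph of adjacent k-cliques
            clique = stack.pop()
            members |= clique
            for u in clique:
                rest = clique - {u}  # a (k-1)-subset, still a frozenset
                # vertices adjacent to every vertex of `rest`, outside the clique
                common = set.intersection(*(graph[x] for x in rest)) - clique
                for w in common:
                    neighbour = rest | {w}  # shares k-1 vertices with `clique`
                    if neighbour not in seen:
                        seen.add(neighbour)
                        stack.append(neighbour)
        communities.append(members)
    return communities
```

On a toy graph with a 4-clique {0,1,2,3} and two triangles {0,4,5} and {4,5,6} sharing the edge 4-5, `communities_of(G, 0, 3)` returns the two overlapping communities {0,1,2,3} and {0,4,5,6} (in arbitrary order), which share only vertex 0.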

A community is based on k-cliques: it is the union of all k-cliques that are reachable from each other through a series of adjacent k-cliques, where:

- a k-clique is a complete graph with k vertices;
- two k-cliques are adjacent if they share k − 1 vertices;
- a k-clique $C_t$ is reachable from another k-clique $C_s$ if there exists a k-clique path $C_1 = C_s, C_2, \ldots, C_j = C_t$ such that each $C_i$ is adjacent to $C_{i+1}$.

**Why do we use k-clique based communities?**

Meaningful communities:

- C1: The definition cannot be too restrictive. In real cases we set k = 3 or 4, which is less restrictive.
- C2: It should be based on the density of links.
- C3: It should allow overlaps, and strictly distinguish different communities.
- C4: It can be implemented using local information.

**Different Community Measures**

- M1: $F(V) = \min\{\deg_{G[V]}(v) \mid v \in V\}$. <The community-search problem and how to plan a successful cocktail party> KDD 2010
- M2: $F(V) = \frac{1}{|V|}\sum_{v \in V} \deg_{G[V]}(v)$. <Greedy approximation algorithms for finding dense components in a graph> APPROX 2000
- M3: $f_G = \frac{k_{in}^G}{(k_{in}^G + k_{out}^G)^{\alpha}}$, where $k_{in}^G$ and $k_{out}^G$ are the total internal and external degrees of the nodes, and $\alpha$ is a parameter. <Detecting the overlapping and hierarchical community structure in complex networks>

**Why do we use k-clique based communities?**

The k-clique community is meaningful in many real networks (see Nature 2005):

- Scientist collaboration network
- South Florida Free Association norms list
- Protein interaction network

**Hardness of the problem**

It is an NP-complete problem. It can be reduced to the maximal clique problem.

**Naïve Approach**

- Step 1: Initially $V_C = \{v\}$.
- Step 2: For each unvisited vertex $u \in V_C$:
  - mark u as visited;
  - find all k-cliques that contain u;
  - add the vertices of those k-cliques into $V_C$.
- Step 3: Calculate the adjacency matrix of the resulting set of k-cliques.
- Step 4: Combine all known k-cliques by union-find.
- Step 5: Return all the k-clique chains that contain v.

**Optimization 1**

- In step 2, find k-cliques for each (k-1)-size subset instead of for each vertex.
- Example
  - Suppose v = a and k = 4.
  - At first we find a k-clique {a,b,c,d}.
  - If we try to find the k-cliques that
    contain b, we will find {b,f,g,h}.
  - According to the definition of adjacency, {b,f,g,h} is meaningless here: it shares only b with {a,b,c,d}, not k − 1 = 3 vertices.

**Optimization 1**

- Step 2 becomes:
  - Let $S_{kc}$ denote the k-cliques found so far; initially $S_{kc} = \emptyset$.
  - Find all the k-cliques that contain v, and add them to $S_{kc}$.
  - For each unvisited k-clique $kc_i \in S_{kc}$, find all the k-cliques adjacent to $kc_i$ and add them to $S_{kc}$.
- Steps 3 and 4 are no longer needed, because the adjacency relationships among these k-cliques are already known from step 2.
- Complexity analysis

**Property of the naïve solution**

- Order-independent

**Optimization 2**

- For each k-clique $kc_i$, finding all the k-cliques adjacent to it is costly. How can we reduce this time?
- DFS vs. BFS?
- We use DFS order to expand the k-cliques.
- When using DFS order, any two adjacent k-cliques in the searching clique sequences.

**Optimization 2**

- A new data structure:
  - Suppose the current k-clique is $kc_i$.
  - Make k lists, $l_1, \ldots, l_k$.
  - $l_j$ contains all the vertices that have j edges to the vertices of the current k-clique.

**Optimization 2**

- Example
  - Current k-clique: {a,b,c,d}
  - Current state:
    - `4 |-> null`
    - `3 |-> e -> null`
    - `2 |-> null`
    - `1 |-> f -> g -> h -> null`
  - Enumerate the vertex u that will be replaced.
  - Delete u and maintain the data structure.
  - The remaining vertices, together with any vertex in the list $l_{k-1}$, form a k-clique.

**Optimization 3**

- Brute-force checking of whether a k-clique already exists in $S_{kc}$ is costly.
- Use a hash function to determine whether a k-clique exists in $S_{kc}$.
- Desired hash function:
  - Order-independent
- Another benefit of DFS: it reduces the hashing time from O(k) to O(1).
- Hash function example

**Problems and Future Work**

- What if we change the definition of adjacency?
- How many k-cliques are produced before step 3?
  - Worst-case analysis
  - Real network case
- More optimizations for the algorithm?
- Theoretical research direction?
- Bootstrapping

**Reference**

- [1] Gavin, A. C. et al. Functional organization of the yeast proteome by systematic analysis of protein complexes.
  Nature 415, 141–147 (2002).
- [2] Lei Tang, Xufei Wang, Huan Liu and Lei Wang. A Multi-Resolution Approach to Learning with Overlapping Communities. In Workshop on Social Media Analytics, KDD 2010.
- [3] Martin G. Everett and Stephen P. Borgatti. Analyzing clique overlap. CONNECTIONS, 21(1):49–61, 1998.

**Motivation**

- Real networks show community structure.
- What is the underlying mechanism accounting for the emergence of communities?
- How can we generate a network with community structure following these principles?

**Basic Idea: Distance based**

- New vertices are randomly located in an n-dimensional space.
- The probability of a link from v to u is based on the distance between v and u.
- The probability that v belongs to a given community is also based on the distance between v and the center of that community.

**Experiment**

- In 1-dimensional space, vertices are well clustered: the modularity of the graph is larger than 0.3.
- Higher-dimensional spaces yield lower modularity.
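
The distance-based idea can be prototyped as a small growth model. This is a sketch under explicit assumptions, not the model behind the reported experiment: vertices are placed uniformly in the unit cube $[0,1]^{dim}$, both the link probability and the community-membership probability are assumed to decay as $\exp(-\text{distance}/\text{scale})$, and `generate`, `modularity`, `scale` and `num_communities` are hypothetical names. Modularity is Newman's $Q = \sum_c \left[L_c/m - (D_c/2m)^2\right]$, evaluated against the generated labels.

```python
import math
import random
from collections import defaultdict
from typing import List, Tuple


def generate(n: int, num_communities: int, dim: int,
             scale: float = 0.05, seed: int = 0) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Grow a graph one vertex at a time in the unit cube [0, 1]^dim.

    ASSUMPTION: link probability and community choice both decay as
    exp(-distance / scale); the slides do not specify the functional form.
    """
    rng = random.Random(seed)
    centers = [[rng.random() for _ in range(dim)] for _ in range(num_communities)]
    positions: List[List[float]] = []
    labels: List[int] = []
    edges: List[Tuple[int, int]] = []
    for v in range(n):
        pos = [rng.random() for _ in range(dim)]  # new vertex at a random location
        dists = [math.dist(pos, c) for c in centers]
        nearest = min(dists)
        # subtracting the minimum keeps the nearest center's weight at 1 (no underflow)
        weights = [math.exp(-(d - nearest) / scale) for d in dists]
        labels.append(rng.choices(range(num_communities), weights=weights)[0])
        for u, other in enumerate(positions):  # try to link to every earlier vertex
            if rng.random() < math.exp(-math.dist(pos, other) / scale):
                edges.append((u, v))
        positions.append(pos)
    return edges, labels


def modularity(edges: List[Tuple[int, int]], labels: List[int]) -> float:
    """Newman modularity of a fixed labelling: sum_c [L_c / m - (D_c / 2m)^2]."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # D_c: total degree of the vertices in c
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

Comparing `modularity(*generate(500, 5, d))` for d = 1, 2, 3 is one way to probe the observation that higher-dimensional spaces yield lower modularity; the sketch itself makes no claim about the resulting numbers.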
