1 / 26

Statistical properties of network community structure

Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research. Statistical properties of network community structure. Network communities. Communities: Sets of nodes with lots of connections inside and few to outside (the rest of the network) Assumption:

cade
Télécharger la présentation

Statistical properties of network community structure

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research Statistical properties of network community structure

  2. Network communities • Communities: • Sets of nodes with lots of connections inside and few to outside (the rest of the network) • Assumption: • Networks are (hierarchically) composed of communities (modules) Communities, clusters, groups, modules

  3. Community score (quality) • How community like is a set of nodes? • Want a measure that corresponds to intuition • Conductance (normalized cut): • Φ(S) = # edges cut / # edges inside • SmallΦ(S) corresponds to more community-like sets of nodes

  4. Community score (quality) • Score: Φ(S) = # edges cut / # edges inside

  5. Community score (quality) Bad community Φ=5/7 = 0.7 • Score: Φ(S) = # edges cut / # edges inside

  6. Community score (quality) Bad community Φ=5/7 = 0.7 Better community Φ=2/5 = 0.4 • Score: Φ(S) = # edges cut / # edges inside

  7. Community score (quality) Bad community Φ=5/7 = 0.7 Best community Φ=2/8 = 0.25 Better community Φ=2/5 = 0.4 • Score: Φ(S) = # edges cut / # edges inside

  8. Network Community Profile Plot • We define: Network community profile (NCP) plot Plot the score of best community of size k • Search over all subsets of size k and find best: Φ(k=5) = 0.25 • NCP plot is intractable to compute • Use approximation algorithm

  9. NCP plot: Small Social Network • Dolphin social network • Two communities of dolphins NCP plot Network

  10. NCP plot: Zachary’s karate club • Zachary’s university karate club social network • During the study club split into 2 • The split (squares vs. circles) corresponds to cut B Network NCP plot

  11. NCP plot: Network Science • Collaborations between scientists in Networks Network NCP plot

  12. Geometric and Hierarchical graphs Geometric (grid-like) network – Small social networks – Geometric and – Hierarchical network have downward NCP plot Hierarchical network

  13. Our work: Large networks • Previously researchers examined community structure of smallnetworks (~100 nodes) • We examined more than 70 different large social and information networks Large real-world networks look completely different!

  14. Example of our findings • Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)

  15. NCP: LiveJournal (N=5M, E=42M) Better and better communities Worse and worse communities Community score Best community has 100 nodes Community size

  16. Explanation: Downward part • Whiskersare responsible for downward slope of NCP plot NCP plot Largest whisker Whiskeris a set of nodes connected to the network by a single edge

  17. Explanation: Upward part • Each new edge inside the community costs more Φ=2/1 = 2 NCP plot Φ=8/3 = 2.6 Φ=64/11 = 5.8 Each node has twice as many children

  18. Suggested Network Structure Denser and denser core of the network Core contains 60% node and 80% edges Whiskers are responsible for good communities Network structure: Core-periphery, jellyfish, octopus

  19. Caveat: Bag of whiskers What if we allow cuts that give disconnected communities? Cut all whiskers and compose communities out of them

  20. Caveat: Bag of whiskers We get better community scores when composing disconnected sets of whiskers Community score Connected communities Bag of whiskers Community size

  21. Comparison to a rewired network Rewired network: random network with same degree distribution

  22. What is a good model? • What is a good model that explains such network structure? • None of the existing models work Flat and Down Flat Down and Flat Geometric Pref. Attachment Small World Pref. attachment

  23. Forest Fire model works • Forest Fire: connections spread like a fire • New node joins the network • Selects a seed node • Connects to some of its neighbors • Continue recursively As community grows it blends into the core of the network

  24. Forest Fire NCP plot rewired network Bag of whiskers

  25. Conclusion and connections • Whiskers: • Largest whisker has ~100 nodes • Independent of network size • Dunbar number: a person can maintain social relationship to 150 people • Bond vs. identity communites • Core: • Newman et al. analyzed 400k node product network • Largest community has 50% nodes • Community was labeled “miscelaneous”

  26. Conclusion and connections • NCP plot is a way to analyze network community structure • Our results agree with previous work on small networks • Butlarge networks are fundamentally different • Large networks have core-periphery structure • Small well isolated communities blend into the core of the networks as they grow

More Related