
Large networks, clusters and Kronecker products

Jure Leskovec (jure@cs.stanford.edu) Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos Faloutsos (CMU), Michael Mahoney (Stanford), Kevin Lang (Yahoo), Anirban Dasgupta (Yahoo).


Presentation Transcript


  1. Large networks, clusters and Kronecker products • Jure Leskovec (jure@cs.stanford.edu), Computer Science Department, Cornell University / Stanford University • Joint work with: Jon Kleinberg (Cornell), Christos Faloutsos (CMU), Michael Mahoney (Stanford), Kevin Lang (Yahoo), Anirban Dasgupta (Yahoo)

  2. Rich data: Networks • Large on-line computing applications have detailed records of human activity: • On-line communities: Facebook (120 million) • Communication: Instant Messenger (~1 billion) • News and social media: Blogging (250 million) • We model the data as a network (an interaction graph) • We can observe and study phenomena at scales not possible before [Figure: communication network]

  3. Small vs. large networks • Community (cluster) structure of networks • What is the structure of the network? How can we model it? [Figures: collaborations in NetSci (N=380); a tiny part of a large social network]

  4. [w/ Mahoney, Lang, Dasgupta, WWW ’08] How expressed are communities? • How community-like is a set of nodes S? • Idea: Use approximation algorithms for NP-hard graph partitioning problems as experimental probes of network structure. • Score: conductance (normalized cut) Φ(S) • Small Φ(S) means a more community-like set of nodes [Figure: node sets S and S’]
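The conductance score on slide 4 can be illustrated with a small sketch. The slide's own formula is not preserved in this transcript, so the code assumes one common definition, Φ(S) = cut(S) / min(vol(S), vol(V \ S)), where cut(S) counts edges leaving S and vol is the sum of degrees. The function name `conductance` and the toy graph are hypothetical.

```python
def conductance(adj, S):
    """Conductance of node set S in an undirected graph.

    adj maps each node to the set of its neighbours (assumed symmetric).
    Assumed definition: cut(S) / min(vol(S), vol(rest of the graph)).
    """
    S = set(S)
    # Edges with exactly one endpoint in S
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    # Volume = sum of degrees on each side of the cut
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    denom = min(vol_S, vol_rest)
    return cut / denom if denom > 0 else float("inf")


# Toy graph (illustrative only): two triangles joined by the edge 2-3
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

phi_triangle = conductance(adj, {0, 1, 2})  # one triangle: few edges leave it
phi_mixed = conductance(adj, {1, 2, 3})     # straddles both triangles
print(phi_triangle, phi_mixed)
```

Under this definition the triangle scores lower than the straddling set, matching the slide's reading that a small Φ(S) marks a more community-like set.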

  5. [w/ Mahoney, Lang, Dasgupta, WWW ’08] Network Community Profile Plot • We define the network community profile (NCP) plot: plot the score Φ(k) of the best community of size k [Figure: example graph with the best communities for k=5, Φ(5)=0.25, and k=7, Φ(7)=0.18; axes: community size, log k vs. conductance, log Φ(k)]
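A minimal sketch of the NCP idea from slide 5: for each size k, take the lowest conductance over all node sets of that size. This brute-force enumeration is only feasible on tiny graphs and is not the method used in the talk, which relies on approximation algorithms (slide 4). The conductance definition, the function names and the toy graph are assumptions.

```python
from itertools import combinations


def conductance(adj, S):
    """Assumed definition: cut(S) / min(vol(S), vol(rest of the graph))."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    denom = min(vol_S, vol_rest)
    return cut / denom if denom > 0 else float("inf")


def ncp(adj, max_k):
    """Brute-force profile: k -> lowest conductance over all sets of size k."""
    nodes = sorted(adj)
    return {
        k: min(conductance(adj, subset) for subset in combinations(nodes, k))
        for k in range(1, max_k + 1)
    }


# Toy graph (illustrative only): two triangles joined by the edge 2-3
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

profile = ncp(adj, 3)
for k, phi in profile.items():
    print(k, phi)
```

On this toy graph the profile decreases up to k=3, where a whole triangle forms the best community.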

  6. [w/ Mahoney, Lang, Dasgupta, WWW ’08] NCP plot: Network Science • Collaborations between scientists in Networks [Newman, 2005] [Figure: NCP plot; axes: community size, log k vs. conductance, log Φ(k)]

  7. [w/ Mahoney, Lang, Dasgupta, WWW ’08] NCP plot: Large network • Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)

  8. [w/ Mahoney, Lang, Dasgupta, WWW ’08] More NCP plots of networks

  9. [w/ Mahoney, Lang, Dasgupta, WWW ’08] NCP: LiveJournal (n=5M, e=42M) • Best community has ~100 nodes [Figure: NCP plot with annotations "Better and better communities" and "Communities get worse and worse"; axes: k (community size) vs. Φ(k) (conductance)]

  10. [w/ Mahoney, Lang, Dasgupta, WWW ’08] Community size is bounded! • Each dot is a different network • Practically constant!

  11. Structure of large networks • Core-periphery (jellyfish, octopus) • Denser and denser core of the network • Core contains ~60% of the nodes and ~80% of the edges • Small good communities • So, what’s a good model?

  12. [w/ Chakrabarti-Kleinberg-Faloutsos, PKDD ’05] Kronecker product: Definition • The Kronecker product of an N x M matrix A and a K x L matrix B is the N*K x M*L block matrix C = A ⊗ B whose (i,j)-th block is a_ij·B • We define the Kronecker product of two graphs as the Kronecker product of their adjacency matrices
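The definition on slide 12 can be sketched in a few lines of plain Python. The helper name `kron` and the example matrices are hypothetical; matrices are represented as lists of rows.

```python
def kron(A, B):
    """Kronecker product of matrices A (N x M) and B (K x L), as lists of rows.

    Entry (i, j) of the result is A[i // K][j // L] * B[i % K][j % L],
    i.e. block (i // K, j // L) is that entry of A times the whole of B.
    """
    K, L = len(B), len(B[0])
    n_rows, n_cols = len(A) * K, len(A[0]) * L
    return [
        [A[i // K][j // L] * B[i % K][j % L] for j in range(n_cols)]
        for i in range(n_rows)
    ]


A = [[1, 1],
     [0, 1]]        # 2 x 2
B = [[1, 0, 1],
     [1, 1, 1]]     # 2 x 3
C = kron(A, B)      # (2*2) x (2*3) = 4 x 6

for row in C:
    print(row)
```

The bottom-left block of C is all zeros because the corresponding entry of A is 0. In practice, `numpy.kron` computes the same product.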

  13. [w/ Chakrabarti-Kleinberg-Faloutsos, PKDD ’05] Kronecker graphs • Kronecker graph: a growing sequence of graphs obtained by iterating the Kronecker product of an initiator graph G1 with itself • Each Kronecker multiplication multiplies the number of nodes by the size of the initiator, so the graph grows exponentially with the number of iterations • One can easily use multiple initiator matrices (G1’, G1’’, G1’’’) that can be of different sizes
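To make the growth on slide 13 concrete, the sketch below iterates the Kronecker product of a small initiator with itself and records the number of nodes and the number of nonzero entries. The initiator used here (a 3-node path with self-loops) and the helper names are assumptions chosen for illustration.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    K, L = len(B), len(B[0])
    return [
        [A[i // K][j // L] * B[i % K][j % L] for j in range(len(A[0]) * L)]
        for i in range(len(A) * K)
    ]


def kron_power(G1, k):
    """k-th Kronecker power of G1: G1 multiplied with itself k - 1 times."""
    G = G1
    for _ in range(k - 1):
        G = kron(G, G1)
    return G


# Hypothetical initiator: 3-node path with self-loops (7 nonzero entries)
G1 = [[1, 1, 0],
      [1, 1, 1],
      [0, 1, 1]]

sizes = [len(kron_power(G1, k)) for k in (1, 2, 3)]
entry_counts = [sum(map(sum, kron_power(G1, k))) for k in (1, 2, 3)]
print(sizes)         # number of nodes after each power
print(entry_counts)  # number of nonzero adjacency entries after each power
```

Both quantities are powers of the corresponding initiator quantities: the node count is 3^k and the number of nonzero entries is 7^k.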

  14. [w/ Chakrabarti, Kleinberg, Faloutsos, PKDD ’05] Kronecker graphs • Starting intuition: recursion and self-similarity • Kronecker graphs mimic real networks: • Theorem: power-law degree distribution, densification, shrinking/stabilizing diameter, spectral properties [Figure: initiator (3x3) and its Kronecker powers (9x9), (27x27); entries are edge probabilities pij]

  15. Various Kronecker initiator matrices

  16. Kronecker graphs: Interpretation • Initiator matrix G1 is a similarity matrix • Node u is described with k binary attributes: u1, u2, …, uk • Probability of a link between nodes u, v: P(u,v) = ∏_{i=1..k} G1[ui, vi] • Example: with G1 = [a b; c d] (rows indexed by ui ∈ {0,1}, columns by vi ∈ {0,1}), u = (0,1,1,0) and v = (1,1,0,1) give P(u,v) = b∙d∙c∙b • Given a real graph, how do we estimate the initiator G1?
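The link-probability formula on slide 16 can be checked against the slide's own example. In the sketch below, the numeric values assigned to a, b, c, d and the function name `link_probability` are hypothetical; only the attribute vectors u and v and the expected product b∙d∙c∙b come from the slide.

```python
from math import prod


def link_probability(G1, u, v):
    """P(u, v) = product over attributes i of G1[u_i][v_i]."""
    return prod(G1[ui][vi] for ui, vi in zip(u, v))


# Hypothetical values for the 2 x 2 initiator [[a, b], [c, d]]
a, b, c, d = 0.9, 0.6, 0.4, 0.2
G1 = [[a, b],
      [c, d]]

u = (0, 1, 1, 0)
v = (1, 1, 0, 1)
p = link_probability(G1, u, v)

# Attribute pairs (0,1), (1,1), (1,0), (0,1) select b, d, c, b in turn
print(p, b * d * c * b)
```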

  17. Estimating Kronecker graphs • Want to generate realistic networks: given a real network, generate a synthetic network and compare graph properties, e.g., degree distribution • How to estimate the initiator matrix G1: • Method of moments [Owen ’09]: compare counts of subgraphs and solve • Maximum likelihood [Leskovec & Faloutsos, ’07]: arg max over G1 of P(G | G1), where G is the given real network • SVD [Van Loan & Pitsianis ’93]: can solve using SVD
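As a rough illustration of the maximum-likelihood bullet on slide 17, the sketch below evaluates log P(G | G1) for a candidate initiator by treating every entry of the k-th Kronecker power as an independent edge probability. It assumes a fixed node labeling, which the full estimation problem does not have, and it is not the estimation procedure from the cited paper. The observed graph, both candidate initiators and all function names are hypothetical.

```python
from math import log


def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    K, L = len(B), len(B[0])
    return [
        [A[i // K][j // L] * B[i % K][j % L] for j in range(len(A[0]) * L)]
        for i in range(len(A) * K)
    ]


def log_likelihood(adjacency, G1, k):
    """log P(G | G1) with the node labeling held fixed (an assumption).

    Each entry of the k-th Kronecker power of G1 is used as an independent
    edge probability; all entries of G1 must lie strictly between 0 and 1.
    """
    P = G1
    for _ in range(k - 1):
        P = kron(P, G1)
    total = 0.0
    for row_a, row_p in zip(adjacency, P):
        for a_uv, p_uv in zip(row_a, row_p):
            total += log(p_uv) if a_uv else log(1.0 - p_uv)
    return total


# Hypothetical observed graph on 4 nodes: node 0 links to itself and to 1, 2
observed = [[1, 1, 1, 0],
            [1, 0, 0, 0],
            [1, 0, 0, 0],
            [0, 0, 0, 0]]

ll_core_periphery = log_likelihood(observed, [[0.9, 0.5], [0.5, 0.1]], k=2)
ll_uniform = log_likelihood(observed, [[0.5, 0.5], [0.5, 0.5]], k=2)
print(ll_core_periphery, ll_uniform)
```

For this toy graph the core-periphery candidate gets the higher log-likelihood, so a maximum-likelihood search restricted to these two candidates would prefer it.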

  18. [w/ Dasgupta-Lang-Mahoney, WWW ’08] Kronecker & Network structure • What do estimated parameters tell us about the network structure? [Figure: initiator entries shown as edge densities: a edges, b edges, c edges, d edges]

  19. [w/ Dasgupta-Lang-Mahoney, WWW ’08] Kronecker & Network structure • What do estimated parameters tell us about the network structure? • Core-periphery (jellyfish, octopus) [Figure: core with 0.9 edges, periphery with 0.1 edges, and two 0.5-edge connections between core and periphery]
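One way to see why the values on slide 19 suggest a core-periphery structure is to compute expected out-degrees under the product formula from slide 16: summing P(u,v) over all v gives the product of the initiator row sums selected by u's attributes. The sketch below assumes the slide's values are arranged as G1 = [[0.9, 0.5], [0.5, 0.1]], with attribute value 0 standing for "core"; that arrangement, the choice k = 3 and the function name are assumptions.

```python
from itertools import product
from math import prod

# Assumed arrangement of the slide's values: 0.9 core-core,
# 0.5 core-periphery in each direction, 0.1 periphery-periphery
G1 = [[0.9, 0.5],
      [0.5, 0.1]]


def expected_out_degree(G1, u):
    """Sum over all v of prod_i G1[u_i][v_i] = prod_i (sum of row u_i of G1)."""
    return prod(sum(G1[ui]) for ui in u)


k = 3  # number of binary attributes per node (2**k nodes in total)
degrees = {u: expected_out_degree(G1, u) for u in product((0, 1), repeat=k)}

# Print nodes from highest to lowest expected out-degree
for u, deg in sorted(degrees.items(), key=lambda item: -item[1]):
    print(u, round(deg, 3))
```

Nodes whose attributes are mostly 0 have the highest expected degree and form the dense core, while nodes with mostly 1s are sparsely connected, which is consistent with the jellyfish picture on the slide.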

  20. Small vs. large networks • Small and large networks are very different: • Scientific collaborations (N=397, E=914): G1 = … • Collaboration network (N=4,158, E=13,422): G1 = …

  21. Conclusion • Computational tools as probes into the structure of large networks • Community structure of large networks: • Core-periphery structure • Scale to natural community size: Dunbar number • Model: Kronecker graphs • Analytically tractable: provable properties • Can efficiently estimate parameters from data • Implications: • No large clusters: no/little hierarchical structure • Can’t be well embedded – no underlying geometry

  22. Reflections • Why are networks the way they are? • Only recently have basic properties been observed on a large scale • Confirms social science intuitions; calls others into question • What are good tractable network models? • Builds intuition and understanding • Benefits of working with large data • Observe structures not visible at smaller scales

  23. jure@cs.stanford.edu http://cs.stanford.edu/~jure

  24. References • Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, by J. Leskovec, J. Kleinberg, C. Faloutsos, KDD 2005 • Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, by J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, PKDD 2005 • Scalable Modeling of Real Graphs using Kronecker Multiplication, by J. Leskovec, C. Faloutsos, ICML 2007 • Statistical Properties of Community Structure in Large Social and Information Networks, by J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney, WWW 2008 • Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters, by J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney, arXiv 2008
