CSE 522 – Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica Mohammad Mahdian
This lecture Probabilistic generative models for social networks (in particular web graph)
Why look for generative models? • Designing and testing algorithms for the web • E.g.: • Compressing the web graph • Designing crawling strategies • Search algorithms on P2P networks • … • Explaining why the web has certain properties • For example, the central limit theorem tells us why we often see the Gaussian distribution in practice. • Is there a similar explanation for the power law distribution? • Predicting what “might” happen in the future • E.g.: An AIDS epidemic? An Internet blackout? Residential segregation?
Characteristics of a good model • Simple • Plausible • Exhibits the observed properties • Power law • Small world • Locally dense, globally sparse
Power law distribution • From last lecture: power laws everywhere! • Income distribution (Pareto 1896) • Word frequencies (Estoup 1916, Zipf 1932) • City population (Auerbach 1913, Zipf 1949) • Scientific productivity (Lotka 1926) • Internet graph degree dist (FFF 1999) • Web graph degree dist (BKMRRSTW 2000) • Dist. of file sizes • … • Why?
Models and explanations for power law • Optimization (“power law is the best design”) • Mandelbrot 1953: Zipf’s law is the most efficient design. • Carlson & Doyle 1999, Fabrikant et al. 2002 (HOT) • Monkeys typing randomly • Miller 1957: even a monkey typing randomly can generate a power law. • Multiplicative processes & Log-normal dist. • Gibrat 1930, Champernowne 1953, Gabaix 1999 • Preferential growth (“the rich get richer”) • Simon 1955, Yule 1925
Log-normal distribution • X is log-normal if ln X is normally distributed. • Central limit theorem, applied to logarithms: the product of many independent positive random variables is approximately log-normal.
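The log-normal claim can be checked numerically. The sketch below is only an illustration: the factor distribution (uniform on [0.5, 1.5]), the number of factors, the sample count, and the helper names `product_of_factors` and `skewness` are all assumptions chosen for the example. It compares the skewness of the product X with the skewness of ln X; the latter should be close to 0 if ln X is approximately Gaussian.

```python
import math
import random
import statistics


def product_of_factors(num_factors, rng):
    """Multiply num_factors independent factors, each uniform on [0.5, 1.5]."""
    x = 1.0
    for _ in range(num_factors):
        x *= rng.uniform(0.5, 1.5)
    return x


def skewness(values):
    """Sample skewness: mean of cubed standardized values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [product_of_factors(200, rng) for _ in range(5000)]
logs = [math.log(s) for s in samples]

# X itself is heavily right-skewed; ln X is a sum of i.i.d. terms,
# so its skewness should be near zero.
print("skewness of X:    ", skewness(samples))
print("skewness of ln X: ", skewness(logs))
```

Taking logarithms turns the product into a sum, which is the whole argument: the ordinary central limit theorem applies to ln X.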
Multiplicative process and power law • Multiplicative processes can sometimes generate a power law instead of a log-normal: • Multiplicative process with a minimum • Champernowne 1953, Gabaix 1999 • Random stopping time • Montroll and Shlesinger 1982, 1983
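A toy version of the "multiplicative process with a minimum" can be simulated directly. This is a sketch under stated assumptions, not Champernowne's or Gabaix's actual model: each value is multiplied by 2 with probability 1/3 and by 1/2 with probability 2/3, and is never allowed to drop below `x_min = 1`. For this particular choice, log2 of the value is a reflected random walk whose stationary distribution is geometric with ratio 1/2, so the tail should satisfy P(X >= x) = 1/x for x a power of two. The function name `simulate` and all parameters are hypothetical.

```python
import random


def simulate(num_walkers, num_steps, rng, x_min=1.0):
    """Run independent multiplicative processes with a lower barrier x_min."""
    values = [x_min] * num_walkers
    for _ in range(num_steps):
        for i in range(num_walkers):
            # Assumed factor distribution: x2 w.p. 1/3, x1/2 w.p. 2/3.
            factor = 2.0 if rng.random() < 1 / 3 else 0.5
            # The barrier keeps the value from falling below x_min.
            values[i] = max(x_min, values[i] * factor)
    return values


rng = random.Random(1)  # fixed seed so the run is repeatable
values = simulate(num_walkers=10_000, num_steps=200, rng=rng)

# Compare the empirical tail P(X >= x) with the power law 1/x.
for x in (1, 2, 4, 8, 16, 32):
    tail = sum(v >= x for v in values) / len(values)
    print(f"x = {x:2d}   empirical tail = {tail:.4f}   1/x = {1 / x:.4f}")
```

Without the `max(x_min, ...)` barrier the same process would drift toward zero and its logarithm would spread out like a Gaussian, which is the log-normal case.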
Preferential growth • The system “grows”. • The probability of a new member joining a group is proportional to its current size. • Simon 1955, Yule 1925 (for biological systems) • Barabasi and Albert 1999: preferential attachment for web graph
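The growth rule can be sketched in a few lines. The slide states only the proportional-to-size rule; the probability `new_group_prob` of starting a brand-new group at each step is an assumption borrowed from Simon-style models, and the function name `preferential_growth` is hypothetical. The trick used here is to keep one list entry per member: picking a uniformly random member and joining that member's group selects a group with probability proportional to its size.

```python
import random
from collections import Counter


def preferential_growth(num_steps, new_group_prob, rng):
    """Grow groups one member at a time; return a Counter {group_id: size}."""
    members = [0]      # members[k] is the group id of the k-th member
    num_groups = 1
    for _ in range(num_steps):
        if rng.random() < new_group_prob:
            # Assumed rule: occasionally a new group of size 1 is founded.
            members.append(num_groups)
            num_groups += 1
        else:
            # Copy the group of a uniformly random existing member,
            # i.e. choose a group with probability proportional to its size.
            members.append(rng.choice(members))
    return Counter(members)


rng = random.Random(2)  # fixed seed so the run is repeatable
sizes = preferential_growth(num_steps=100_000, new_group_prob=0.1, rng=rng)
size_hist = Counter(sizes.values())  # size_hist[s] = number of groups of size s

print("number of groups:", len(sizes))
print("largest group:   ", max(sizes.values()))
for s in (1, 2, 4, 8, 16, 32):
    print(f"groups of size {s:2d}: {size_hist[s]}")
```

Most groups stay tiny while a few early groups become very large, which is the "rich get richer" effect the slide refers to.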
Random graph models • Erdős-Rényi random graphs G(n,p) • n vertices, there is an edge between each pair independently with probability p. • G(n,p) at a glance: • Average degree np. Binomial degree dist. • p < 1/n: union of small, simple connected components. • p > 1/n: a “giant” complex component emerges (still many small connected components). • p > ln(n)/n: connected with high probability.
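The three regimes can be observed on a small instance. The following is a minimal sketch using only the standard library; the values n = 1000 and the three choices of p (0.5/n, 2/n, and 2 ln(n)/n) are assumptions picked to sit clearly on either side of the thresholds, and the helper names `gnp` and `component_sizes` are hypothetical.

```python
import itertools
import math
import random


def gnp(n, p, rng):
    """Sample G(n,p) as adjacency lists: each pair is an edge w.p. p."""
    adj = [[] for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].append(v)
            adj[v].append(u)
    return adj


def component_sizes(adj):
    """Return connected-component sizes, largest first (iterative DFS)."""
    seen = [False] * len(adj)
    sizes = []
    for start in range(len(adj)):
        if seen[start]:
            continue
        seen[start] = True
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        sizes.append(size)
    return sorted(sizes, reverse=True)


rng = random.Random(3)  # fixed seed so the run is repeatable
n = 1000
regimes = {
    "p = 0.5/n": 0.5 / n,
    "p = 2/n": 2 / n,
    "p = 2 ln(n)/n": 2 * math.log(n) / n,
}
largest = {}
for label, p in regimes.items():
    sizes = component_sizes(gnp(n, p, rng))
    largest[label] = sizes[0]
    print(f"{label:14s} components = {len(sizes):4d}   largest = {sizes[0]}")
```

The quadratic loop over all pairs is fine at this size; for large n one would sample edges more cleverly, which this sketch does not attempt.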
The ACL model • Proposed by Aiello, Chung, and Lu, 2000. • Fix a degree sequence d (e.g., power law). • Put d_i copies of the i’th vertex. • Pick a random matching. • Contract the d_i copies of the i’th vertex. • Essentially a variant of G(n,p), with the degree distribution explicitly enforced.
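The copy, match, contract steps above can be sketched as follows. The function name `random_graph_with_degrees` and the example degree sequence are hypothetical. Shuffling the list of copies and pairing consecutive entries yields a uniformly random perfect matching; contracting is implicit because each copy is stored as its vertex id. The result is a multigraph, so self-loops and parallel edges may occur.

```python
import random
from collections import Counter


def random_graph_with_degrees(degrees, rng):
    """Return an edge list of a random multigraph with the given degrees."""
    if sum(degrees) % 2 != 0:
        raise ValueError("the degree sum must be even to form a matching")
    # Step 1: put d_i copies of vertex i (each copy labelled by i).
    copies = [v for v, d in enumerate(degrees) for _ in range(d)]
    # Step 2: a uniformly random perfect matching on the copies.
    rng.shuffle(copies)
    # Step 3: contraction is implicit, since copies carry their vertex id.
    return [(copies[i], copies[i + 1]) for i in range(0, len(copies), 2)]


rng = random.Random(4)  # fixed seed so the run is repeatable
degrees = [5, 3, 3, 2, 2, 1, 1, 1, 1, 1]  # example sequence, sum is 20
edges = random_graph_with_degrees(degrees, rng)

# Verify the enforced degrees (a self-loop contributes 2 to its vertex).
realized = Counter()
for u, v in edges:
    realized[u] += 1
    realized[v] += 1

print("edges:", edges)
print("realized degrees:", [realized[v] for v in range(len(degrees))])
```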
Preferential attachment • Start with a graph with one node. • Vertices arrive one by one. • When a vertex arrives, it connects itself to one (m, in general) of the previous vertices, with probability proportional to their degrees.
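The process can be simulated with a list that holds one entry per edge endpoint, so that a uniform draw from the list picks a vertex with probability proportional to its degree. Two details are assumptions, since the slide leaves them open: the initial node is given a self-loop so that its degree is nonzero, and for m > 1 the m targets are drawn independently (with replacement) using the degrees before the new vertex's own edges are added, so parallel edges are possible. The function name `preferential_attachment` is hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(n, m, rng):
    """Return (edges, degree Counter) for an n-vertex attachment process."""
    # Assumption: vertex 0 starts with a self-loop, giving it degree 2.
    edges = [(0, 0)]
    endpoints = [0, 0]  # each vertex appears once per unit of degree
    for v in range(1, n):
        # Degree-proportional choice of m targets among earlier vertices.
        targets = [rng.choice(endpoints) for _ in range(m)]
        for t in targets:
            edges.append((v, t))
            endpoints.extend((v, t))
    return edges, Counter(endpoints)


rng = random.Random(5)  # fixed seed so the run is repeatable
n, m = 50_000, 1
edges, degree = preferential_attachment(n, m, rng)
degree_hist = Counter(degree.values())

print("maximum degree:", max(degree.values()))
for k in (1, 2, 4, 8, 16, 32):
    print(f"fraction of vertices with degree {k:2d}: {degree_hist[k] / n:.5f}")
```

Plotting these fractions against k on log-log axes is the usual way to eyeball the heavy tail; the sketch only prints a few values.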
Preferential attachment • Heuristic analysis (Barabási-Albert): degree distribution follows a power law with exponent -3. • Theorem (Bollobás, Riordan, Spencer, Tusnády). For d < n^(1/16), the fraction of vertices that have degree d is almost surely around