On the Eigenvalue Power Law. Milena Mihail (Georgia Tech) and Christos Papadimitriou (U.C. Berkeley). Network and application studies need properties and models of Internet graphs and Internet traffic. Shift of networking paradigm: open, decentralized, dynamic.
On the Eigenvalue Power Law. Milena Mihail (Georgia Tech) & Christos Papadimitriou (U.C. Berkeley)
Internet Measurement and Models • Network and application studies need properties and models of Internet graphs & Internet traffic. • Shift of networking paradigm: open, decentralized, dynamic. • Intense measurement efforts. • Intense modeling efforts. [Figure labels: Routers, WWW, P2P]
Internet & WWW Graphs • Internet: routers exchanging traffic. • WWW: Web pages and hyperlinks. • 10K – 300K nodes, average degree ~ 3. [Figure: Web pages (http://www.XXX.com, http://www.YYY.com, http://www.ZZZ.edu, ...) joined by hyperlinks]
Real Internet Graphs • Average degree = constant. • Degrees not sharply concentrated around their mean. • A few degrees VERY LARGE. [Figure source: CAIDA, http://www.caida.org]
Degree-Frequency Power Law • WWW measurement: Kumar et al 99. • Internet measurement: Faloutsos et al 99. • E[d] = const., but no sharp concentration. [Plot: frequency vs. degree]
Degree-Frequency Power Law • Models by Kumar et al 00, Bollobás et al 01, Fabrikant et al 02. • Erdős-Rényi: sharp concentration. • Power-law graphs: E[d] = const., but no sharp concentration. [Plot: frequency vs. degree]
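The contrast on this slide can be sketched numerically. The snippet below is only an illustration under stated assumptions: it samples degrees independently rather than building graphs, and the sample size, the average degree of 3, and the exponent 2.5 are arbitrary choices, not values from the talk.

```python
import numpy as np

def erdos_renyi_degrees(n, avg_degree, rng):
    """Marginal vertex degrees in G(n, p) with p = avg_degree / (n - 1)."""
    return rng.binomial(n - 1, avg_degree / (n - 1), size=n)

def power_law_degrees(n, exponent, rng):
    """Degrees with P(d) proportional to d**(-exponent) for d = 1, 2, ..."""
    return rng.zipf(exponent, size=n)

rng = np.random.default_rng(0)
n = 100_000  # assumed sample size, for illustration only
er = erdos_renyi_degrees(n, avg_degree=3.0, rng=rng)
pl = power_law_degrees(n, exponent=2.5, rng=rng)

# Both samples have a small constant mean, but only the power-law
# sample contains a few very large degrees.
for name, degrees in (("Erdos-Renyi", er), ("power law", pl)):
    print(f"{name:12s} mean {degrees.mean():5.2f}  max {degrees.max()}")
```

The Erdős-Rényi maximum stays within a small multiple of the mean, while the power-law maximum is orders of magnitude above it.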
Rank-Degree Power Law • Internet measurement: Faloutsos et al 99. [Plot: degree vs. rank; highest-ranked nodes labeled UUNET, Sprint, C&WUSA, AT&T, BBN]
Eigenvalue Power Law • Internet measurement: Faloutsos et al 99. [Plot: eigenvalue vs. rank]
This Paper: Large Degrees & Eigenvalues [Plot: degrees and eigenvalues vs. rank; highest-ranked nodes labeled UUNET, Sprint, C&WUSA, AT&T, BBN]
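Principal Eigenvector of a Star • For a star whose center has degree d, the principal eigenvector has entry sqrt(d) at the center and 1 at each leaf; the principal eigenvalue is sqrt(d).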
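A quick numerical check of the star picture, as a minimal sketch: the helper name `star_adjacency` and the choice d = 9 are illustrative and do not come from the talk.

```python
import numpy as np

def star_adjacency(d):
    """Adjacency matrix of a star: vertex 0 is the center, vertices 1..d are leaves."""
    a = np.zeros((d + 1, d + 1))
    a[0, 1:] = 1.0
    a[1:, 0] = 1.0
    return a

d = 9  # illustrative degree
eigenvalues, eigenvectors = np.linalg.eigh(star_adjacency(d))  # ascending order

top_value = eigenvalues[-1]
top_vector = eigenvectors[:, -1]
top_vector = top_vector / top_vector[1]  # rescale so that leaf entries equal 1

print(top_value)      # principal eigenvalue, expected sqrt(d)
print(top_vector[0])  # center entry, expected sqrt(d)
```

The remaining eigenvalues of the star are -sqrt(d) and 0 (with multiplicity d - 1), so the principal eigenvalue is determined by the degree alone.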
Large Degrees [Figure: graph with labels 2, 4, 3]
Large Eigenvalues [Figure: the same graph with labels 2, 4, 3]
Main Result of the Paper • The largest eigenvalues of the adjacency matrix of a graph whose large degrees are power law distributed (Zipf) are also power law distributed. • Explains Internet measurements. • Negative implications for the spectral filtering method in information retrieval.
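A short worked step may help connect the two power laws. It assumes, as the star picture suggests, that the i-th largest eigenvalue tracks the square root of the i-th largest degree; the constant c and exponent α are notation introduced here for illustration.

```latex
d_i \approx c \, i^{-\alpha}
\quad\Longrightarrow\quad
\lambda_i \approx \sqrt{d_i} \approx \sqrt{c}\; i^{-\alpha/2}
```

Under that assumption, the eigenvalue-rank plot on log-log axes is again a straight line, with half the slope of the degree-rank plot.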
Random Graph Model • Let [formula missing]. • Connectivity analyzed by Chung & Lu '01.
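The model formula did not survive on this slide. As a hedged illustration only, the sketch below assumes the expected-degree random graph associated with Chung & Lu, in which vertices i and j are joined independently with probability d_i d_j / sum_k d_k. The graph size, the four Zipf-like hub degrees, and all function names are assumptions made for this example.

```python
import numpy as np

def sample_expected_degree_graph(expected_degrees, rng):
    """Sample a symmetric 0/1 adjacency matrix with
    P(edge i-j) = min(1, d_i * d_j / sum(d)) (assumed model)."""
    d = np.asarray(expected_degrees, dtype=float)
    p = np.minimum(1.0, np.outer(d, d) / d.sum())
    upper = np.triu(rng.random(p.shape) < p, k=1)  # k=1 excludes self-loops
    return (upper | upper.T).astype(float)

rng = np.random.default_rng(0)
n, hubs = 2000, 4                 # assumed sizes
expected = np.ones(n)             # most vertices: expected degree 1
expected[:hubs] = 80.0 / np.arange(1, hubs + 1)  # Zipf-like large degrees

adjacency = sample_expected_degree_graph(expected, rng)
top_degrees = np.sort(adjacency.sum(axis=1))[::-1][:hubs]
top_eigenvalues = np.linalg.eigvalsh(adjacency)[::-1][:hubs]

for deg, lam in zip(top_degrees, top_eigenvalues):
    print(f"degree {deg:4.0f}  sqrt(degree) {np.sqrt(deg):5.2f}  eigenvalue {lam:5.2f}")
```

In this regime the largest eigenvalues should sit close to the square roots of the largest realized degrees, which is the behavior the talk attributes to power-law graphs.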
Theorem: For large enough [formula missing], with probability at least [formula missing], [formula missing].
Proof: Step 1: Decomposition • The edges are split into the parts LL, RR, and LR. • LR is split further into Vertex Disjoint Stars and LR-extra.
Proof: Step 2: Vertex Disjoint Stars • The degree of each of the Vertex Disjoint Stars is sharply concentrated around its mean d_i. • Hence its principal eigenvalue is sharply concentrated around sqrt(d_i).
Proof: Step 3: LL, RR, LR-extra • LL has [formula missing] edges. • LR-extra has max degree [formula missing]. • RR has max degree [formula missing].
Proof: Step 4: Matrix Perturbation Theory • Vertex Disjoint Stars have principal eigenvalues [formula missing]. • All other parts have max eigenvalue [formula missing]. • QED
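One standard perturbation bound that fits this step is Weyl's inequality: for symmetric A and E, each sorted eigenvalue of A + E differs from the corresponding eigenvalue of A by at most the spectral norm of E. Whether the paper uses exactly this bound is an assumption here. The sketch below perturbs three disjoint stars with sparse random extra edges; the star degrees and the noise density are illustrative.

```python
import numpy as np

def disjoint_stars(degrees):
    """Block-diagonal adjacency matrix of vertex-disjoint stars."""
    n = sum(d + 1 for d in degrees)
    a = np.zeros((n, n))
    start = 0
    for d in degrees:
        a[start, start + 1:start + d + 1] = 1.0  # center -> leaves
        a[start + 1:start + d + 1, start] = 1.0  # leaves -> center
        start += d + 1
    return a

rng = np.random.default_rng(1)
stars = disjoint_stars([64, 36, 16])  # principal eigenvalues 8, 6, 4
n = stars.shape[0]

# Sparse symmetric "everything else" part (stands in for LL, RR, LR-extra).
noise = np.triu(rng.random((n, n)) < 0.01, k=1).astype(float)
noise = noise + noise.T
noise[stars > 0] = 0.0  # do not double an existing star edge

before = np.linalg.eigvalsh(stars)[::-1]
after = np.linalg.eigvalsh(stars + noise)[::-1]
max_shift = np.max(np.abs(after - before))
noise_norm = np.linalg.norm(noise, 2)  # spectral norm of the perturbation

print(before[:3], after[:3])
print(max_shift, "<=", noise_norm)
```

As long as the extra part has a small maximum eigenvalue, the top eigenvalues stay close to those of the stars.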
Implication for Information Retrieval • Term-Norm Distribution Problem: spectral filtering, without preprocessing, reveals only the large degrees. • Local information. • No “latent semantics”.
Implication for Information Retrieval • Term-Norm Distribution Problem: application-specific preprocessing (normalization of degrees) reveals clusters. • WWW: related to searching, Kleinberg 97. • IR, collaborative filtering, … • Internet: related to congestion, Gkantsidis et al 02. • Open: formalize “preprocessing”.
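The talk leaves "preprocessing" unformalized. As one possible reading only, the sketch below uses the symmetric degree normalization D^(-1/2) A D^(-1/2), a common choice in spectral methods, and applies it to three disjoint stars. It shows only that this normalization removes the degree signal from the top of the spectrum; it does not demonstrate cluster recovery. The function names and star degrees are illustrative.

```python
import numpy as np

def disjoint_stars(degrees):
    """Block-diagonal adjacency matrix of vertex-disjoint stars."""
    n = sum(d + 1 for d in degrees)
    a = np.zeros((n, n))
    start = 0
    for d in degrees:
        a[start, start + 1:start + d + 1] = 1.0
        a[start + 1:start + d + 1, start] = 1.0
        start += d + 1
    return a

def normalize_by_degree(a):
    """Return D^(-1/2) A D^(-1/2); assumes every vertex has degree >= 1."""
    scale = 1.0 / np.sqrt(a.sum(axis=1))
    return a * scale[:, None] * scale[None, :]

a = disjoint_stars([64, 36, 16])
raw_top = np.linalg.eigvalsh(a)[::-1][:3]
normalized_top = np.linalg.eigvalsh(normalize_by_degree(a))[::-1][:3]

print(raw_top)         # sqrt of the star degrees
print(normalized_top)  # every star contributes the same top eigenvalue
```

Without normalization the top eigenvalues simply list the square roots of the largest degrees. After normalization every star looks the same, so any structure visible in the spectrum has to come from something other than degree.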