Generative Models for the Web Graph

José Rolim

Aim • Reproduce emergent properties: • Distribution of site sizes • Connectivity of the Web • Power law distributions • Small world properties

Classical Model: Random Graphs • Erdős-Rényi graph G(n,p) • n: number of nodes • p: probability of connection between any two nodes • pc: threshold probability • p < pc: many disconnected components • p = pc: a large connected component emerges • p = 1: a complete graph
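The G(n,p) construction above can be sketched in a few lines. This is a minimal illustration, not code from the slides: the function names `erdos_renyi` and `component_sizes` and the `seed` parameter are assumptions introduced here.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Return adjacency sets for a G(n, p) random graph (hypothetical helper)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        # Each pair of nodes is linked independently with probability p.
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def component_sizes(adj):
    """Sizes of the connected components, largest first (iterative DFS)."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)
```

Calling `component_sizes(erdos_renyi(n, p))` for increasing p is one way to watch the graph move from many small components toward a single large one.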

Limitations • As a model of the web graph: • Constant number of nodes • Same connection probability for all vertices • etc.

Web Page Growth Model • Sites have short-term (daily) size fluctuations proportional to their size • Assume an overall growth rate a such that: • $S(t+1) = a(1 + vb)S(t)$ • S(t) = number of pages of site s at time t • v = ±1: a Bernoulli variable taking each value with probability 0.5 • b = absolute rate of daily fluctuations

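Web Page Growth • Hence: $S(T) = a^T S(0) \prod_{i=0}^{T-1} (1 + v_i b)$, or: • $\log S(T) = T \log a + \log S(0) + \sum_{i=0}^{T-1} \log(1 + v_i b)$, where $\sum_{i=0}^{T-1} \log(1 + v_i b) = l \log(1+b) + (T-l) \log(1-b)$ • l = number of positive fluctuations • Therefore: l is binomial, so log S(T) is approximately normal for large T and S(T) has a lognormal distribution; mixed over sites with different fluctuation rates b, site sizes follow a power law: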
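The growth recurrence and its logarithmic form can be checked numerically. The sketch below is illustrative only: the function names and the parameter values in the usage note are assumptions, not values from the slides.

```python
import math
import random

def simulate_site(s0, a, b, T, rng):
    """Apply S(t+1) = a * (1 + v*b) * S(t) for T steps.

    Returns the final size S(T) and l, the number of positive fluctuations.
    """
    size, positives = s0, 0
    for _ in range(T):
        v = rng.choice((1, -1))  # v = +1 or -1, each with probability 0.5
        if v == 1:
            positives += 1
        size *= a * (1 + v * b)
    return size, positives

def log_size_closed_form(s0, a, b, T, positives):
    """log S(T) = T log a + log S(0) + l log(1+b) + (T-l) log(1-b)."""
    return (T * math.log(a) + math.log(s0)
            + positives * math.log(1 + b)
            + (T - positives) * math.log(1 - b))
```

For example, `simulate_site(10.0, 1.01, 0.1, 200, random.Random(0))` returns a size whose logarithm should agree with `log_size_closed_form` up to floating-point rounding.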

Web Page Growth • Probability P(s) that a site has s pages: • $P(s) = \sum_i P(s \mid b_i) P(b_i) = \sum_i c_i / s^{g_i} \approx c / s^{g}$ • Power law • g has been experimentally estimated for the web at between 1.6 and 2.0

Small world models • Properties: • Sparse • Cliquishness • Small Diameter • Two models • Edge-reassigning small world network • Edge addition small world network

Edge reassigning model • Evolution starts with a ring of n nodes, each connected to its d nearest neighbors • Each edge is then randomly reassigned to a distant node with probability p, in round-robin fashion • See example page 10 with n=10 and d=4
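A minimal sketch of the edge reassigning procedure follows. It assumes d is even and n > d, and it picks the new endpoint uniformly among all nodes not yet adjacent to the source node (the slides only say "distant nodes"). The names `ring_lattice` and `rewire` are introduced here for illustration.

```python
import random

def ring_lattice(n, d):
    """Ring of n nodes, each linked to its d nearest neighbors (d even, n > d)."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for k in range(1, d // 2 + 1):
            v = (u + k) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj

def rewire(n, d, p, seed=None):
    """Reassign each lattice edge with probability p, round robin by distance."""
    rng = random.Random(seed)
    adj = ring_lattice(n, d)
    for k in range(1, d // 2 + 1):      # nearest links first, then the next ring
        for u in range(n):
            v = (u + k) % n
            if rng.random() >= p:
                continue                 # keep the original edge (u, v)
            # Assumption: any node that is not u and not already a neighbor.
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if candidates:
                w = rng.choice(candidates)
                adj[u].discard(v)
                adj[v].discard(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj
```

With the slide's example sizes, `rewire(10, 4, p)` keeps the total of 20 edges for any p, since every reassignment removes one edge and adds one.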

Edge addition model • Additional edges are added at random to the original ring, giving an expected p·d·n/2 new edges • p: probability of adding an edge • See example page 13 • Criticism of small world models: • No addition or deletion of pages • No deletion of links
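The edge addition variant can be sketched the same way. The reading used here, one Bernoulli(p) trial per original ring edge, is an assumption chosen to match the expected count p·d·n/2; because the sketch skips a shortcut that would duplicate an existing edge, the realized count can fall slightly below that. The names `ring_lattice` and `add_shortcuts` are illustrative.

```python
import random

def ring_lattice(n, d):
    """Ring of n nodes, each linked to its d nearest neighbors (d even, n > d)."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for k in range(1, d // 2 + 1):
            v = (u + k) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj

def add_shortcuts(n, d, p, seed=None):
    """Keep the ring intact and add random extra edges; return (graph, count)."""
    rng = random.Random(seed)
    adj = ring_lattice(n, d)
    added = 0
    for _ in range(n * d // 2):          # one trial per original ring edge
        if rng.random() < p:
            u, w = rng.sample(range(n), 2)   # two distinct random nodes
            if w not in adj[u]:              # skip duplicates: graph stays simple
                adj[u].add(w)
                adj[w].add(u)
                added += 1
    return adj, added
```

Unlike the reassigning model, no ring edge is ever removed here, so the graph stays connected for every p.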

Rich get richer • Preferential attachment model • Start with a null graph with no nodes • At each time step add a new node and connect it to m existing nodes selected randomly with probability proportional to their degree • See example page 16
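A minimal sketch of preferential attachment is given below. The slides do not say what happens while fewer than m nodes exist or while every degree is zero; the sketch assumes the new node then links to as many distinct existing nodes as are available, chosen uniformly when no edge exists yet. The function name and the `endpoints` list are illustrative, and m ≥ 1 is assumed.

```python
import random

def preferential_attachment(n, m, seed=None):
    """Grow an undirected graph to n nodes, m links per new node (m >= 1)."""
    rng = random.Random(seed)
    adj = {}
    endpoints = []   # node v appears here deg(v) times, so a uniform pick
                     # from this list is a degree-proportional pick
    for new in range(n):
        existing = list(adj)
        k = min(m, len(existing))        # assumption: early nodes get fewer links
        targets = set()
        while len(targets) < k:
            if endpoints:
                targets.add(rng.choice(endpoints))
            else:
                targets.add(rng.choice(existing))   # no edges yet: uniform pick
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend((new, t))
    return adj
```

Storing each node once per incident edge avoids recomputing degree weights at every step; the set `targets` ensures the m chosen nodes are distinct.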

Important measures • Average diameter • Cliquishness (measures the average density of local connections): • Take a node v with degree d • Its d neighbors can have at most max = d(d-1)/2 links among themselves • Let $c_v$ = actual number of links among the neighbors / max • $C = \frac{1}{|V|} \sum_v c_v$
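The cliquishness measure can be computed directly from adjacency sets. In this sketch, nodes with fewer than two neighbors are given $c_v = 0$, which is a convention assumed here because the slides leave that case undefined; the function names and the example graph are illustrative.

```python
from itertools import combinations

def local_cliquishness(adj, v):
    """c_v = actual links among v's neighbors / d(d-1)/2."""
    neighbors = adj[v]
    d = len(neighbors)
    if d < 2:
        return 0.0                       # assumed convention for degree 0 or 1
    max_links = d * (d - 1) // 2
    actual = sum(1 for x, y in combinations(neighbors, 2) if y in adj[x])
    return actual / max_links

def cliquishness(adj):
    """C = average of c_v over all nodes."""
    return sum(local_cliquishness(adj, v) for v in adj) / len(adj)

# Triangle 0-1-2 with a pendant node 3 attached to node 0.
example = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

For `example`, node 0 has one link among its three neighbors (1/3), nodes 1 and 2 each score 1, and node 3 scores 0, so `cliquishness(example)` is 7/12, roughly 0.583.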

Remarks on rich get richer • Reproduces the power law of the number of links • E.g., the probability that a page i has degree $d_i$ is $A / d_i^{c}$ • A is proportional to the square of the network's average connectivity (degree) • c is a constant • c was found to be 2.9 empirically and 3 theoretically

Criticism of Rich Get Richer • Does not allow reconnection of existing edges • Addition of new edges takes place only when new nodes are added

Copy models • At each time step a node is added • With probability p, a new edge is created from this node to a randomly chosen node • With probability 1-p, we choose a node at random, then one of its out-edges uniformly at random, and link the new node to the node that this edge enters
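The single-edge copy step can be sketched as follows. The slides do not cover the case where the randomly chosen node has no out-edges (which is true of the very first node); the sketch assumes the new node then links to that chosen node directly. The function name `copy_model` and the requirement n ≥ 1 are assumptions introduced here.

```python
import random

def copy_model(n, p, seed=None):
    """Directed graph as {node: [out-neighbors]}; one out-edge per new node."""
    rng = random.Random(seed)
    out = {0: []}                         # first node: nothing to link to yet
    for new in range(1, n):
        chosen = rng.randrange(new)       # uniformly random existing node
        if rng.random() < p or not out[chosen]:
            # Random link (also the assumed fallback when there is
            # no out-edge to copy).
            target = chosen
        else:
            # Copy step: follow one of the chosen node's out-edges and
            # link to the node that edge enters.
            target = rng.choice(out[chosen])
        out[new] = [target]
    return out
```

Because the copy step follows an existing edge, a node is reached with a probability that grows with its in-degree, which is the sense in which the model behaves like "rich get richer".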

Remarks • Why is it called "copy"? • There are more elaborate models which allow the addition of more than one edge at each step • It is also a sort of "rich get richer"

Applications • Distributed search algorithms • Subgraph patterns and communities • Robustness and vulnerability • PageRank algorithms
