
Generative Models for the Web Graph

José Rolim. Generative Models for the Web Graph. Aim: reproduce emergent properties: distribution of site sizes, connectivity of the Web, power-law distributions, small-world properties. Classical model: random graphs. Erdős–Rényi graph G(n,p): n = number of nodes, p = probability of connection.


Presentation Transcript


  1. José Rolim Generative Models for the Web Graph

  2. Aim • Reproduce emergent properties: • Distribution of site sizes • Connectivity of the Web • Power-law distributions • Small-world properties

  3. Classical Model: Random Graphs • Erdős–Rényi graph G(n,p) • n: number of nodes • p: probability of connection • p_c: threshold probability • p < p_c: many small disconnected components • p > p_c: a giant connected component emerges • p = 1: a complete graph
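
The threshold behaviour can be checked with a small simulation. This is a minimal sketch: it samples G(n,p) directly and measures the largest connected component with breadth-first search. It uses the known fact that for G(n,p) the threshold is p_c = 1/n; the function names and the parameter values in the usage block are illustrative choices, not taken from the slides.

```python
import random
from collections import deque
from typing import List, Optional, Set


def erdos_renyi(n: int, p: float, seed: Optional[int] = None) -> List[Set[int]]:
    """Sample G(n, p): each of the n(n-1)/2 possible edges appears independently with probability p."""
    rng = random.Random(seed)
    adj: List[Set[int]] = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def largest_component_size(adj: List[Set[int]]) -> int:
    """Size of the largest connected component, found by breadth-first search."""
    seen = [False] * len(adj)
    best = 0
    for start in range(len(adj)):
        if seen[start]:
            continue
        seen[start] = True
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        best = max(best, size)
    return best


if __name__ == "__main__":
    n = 1000
    for c in (0.5, 1.0, 3.0):  # p = c / n: below, at and above p_c = 1/n
        g = erdos_renyi(n, c / n, seed=1)
        print(f"p = {c}/n: largest component has {largest_component_size(g)} of {n} nodes")
```

Running it should show the largest component jumping from a small fraction of the nodes to most of them as p crosses 1/n.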

  4. Limitations • As a model of the web graph, random graphs have: • A constant number of nodes • The same connection probability for every pair of vertices • etc.

  5. Web Page Growth Model • Sites have short-term (daily) size fluctuations proportional to their size • Assume an overall growth rate a such that: • $S(t+1) = a(1 + v b)\,S(t)$ • S(t) = number of pages of site s at time t • v = ±1: a Bernoulli variable with probability 0.5 • b = absolute rate of daily fluctuations
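
The growth rule can be simulated step by step. This is a minimal sketch under two assumptions not stated on the slide: site sizes are treated as real numbers rather than integer page counts, and 0 ≤ b < 1 so sizes stay positive. The function name `grow_site` is hypothetical.

```python
import random
from typing import List, Optional


def grow_site(s0: float, a: float, b: float, steps: int,
              seed: Optional[int] = None) -> List[float]:
    """Simulate S(t+1) = a * (1 + v*b) * S(t) for `steps` days.

    v is +1 or -1 with probability 0.5 each, drawn afresh every day.
    Sizes are kept as real numbers (assumption); 0 <= b < 1 keeps them positive.
    """
    if not 0 <= b < 1:
        raise ValueError("b must satisfy 0 <= b < 1")
    rng = random.Random(seed)
    sizes = [s0]
    for _ in range(steps):
        v = 1 if rng.random() < 0.5 else -1
        sizes.append(a * (1 + v * b) * sizes[-1])
    return sizes


if __name__ == "__main__":
    trajectory = grow_site(s0=100, a=1.01, b=0.1, steps=30, seed=7)
    print([round(s, 1) for s in trajectory])
```

With b = 0 the trajectory reduces to pure exponential growth, S(T) = a^T S(0); each nonzero b multiplies every day's growth by either (1 + b) or (1 - b).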

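  6. Web Page Growth • Hence: $S(T) = a^{T} S(0) \prod_{t=0}^{T-1} (1 + v_t b)$, where $v_t = \pm 1$ is the fluctuation at step t, or: • $\log S(T) = T \log a + \log S(0) + \sum_{t=0}^{T-1} \log(1 + v_t b)$, with $\sum_{t=0}^{T-1} \log(1 + v_t b) = l \log(1+b) + (T-l) \log(1-b)$ • l = number of positive fluctuations • Therefore, since l is binomially distributed, log S(T) is approximately normal for large T, so S(T) has a lognormal distribution or follows a power law.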
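
The closed form and the lognormal claim can both be checked numerically. This sketch simulates many independent sites, confirms that the simulated log S(T) equals T log a + log S(0) + l log(1+b) + (T-l) log(1-b), and compares the sample mean and standard deviation with those implied by l ~ Binomial(T, 1/2). The function names and parameter values are illustrative assumptions.

```python
import math
import random
import statistics
from typing import Tuple


def closed_form_log_size(s0: float, a: float, b: float, T: int, l: int) -> float:
    """log S(T) = T log a + log S(0) + l log(1+b) + (T-l) log(1-b)."""
    return (T * math.log(a) + math.log(s0)
            + l * math.log(1 + b) + (T - l) * math.log(1 - b))


def simulate_log_size(s0: float, a: float, b: float, T: int,
                      rng: random.Random) -> Tuple[float, int]:
    """Run T steps of S(t+1) = a(1 + v b) S(t); return (log S(T), l)."""
    log_s, l = math.log(s0), 0
    for _ in range(T):
        v = 1 if rng.random() < 0.5 else -1
        if v == 1:
            l += 1
        log_s += math.log(a * (1 + v * b))
    return log_s, l


def predicted_mean_sd(s0: float, a: float, b: float, T: int) -> Tuple[float, float]:
    """Mean and standard deviation of log S(T) when l ~ Binomial(T, 1/2)."""
    step = math.log(1 + b) - math.log(1 - b)  # change in log S(T) per extra positive day
    mean = T * math.log(a) + math.log(s0) + T * math.log(1 - b) + (T / 2) * step
    sd = math.sqrt(T / 4) * step
    return mean, sd


if __name__ == "__main__":
    rng = random.Random(0)
    s0, a, b, T = 10.0, 1.001, 0.05, 400
    samples = []
    for _ in range(2000):
        log_s, l = simulate_log_size(s0, a, b, T, rng)
        assert abs(log_s - closed_form_log_size(s0, a, b, T, l)) < 1e-9
        samples.append(log_s)
    print("simulated mean, sd:", statistics.mean(samples), statistics.stdev(samples))
    print("predicted mean, sd:", predicted_mean_sd(s0, a, b, T))
```

Because log S(T) is an affine function of the binomial count l, a histogram of the simulated values should look approximately normal, which is the lognormal statement for S(T).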

  7. Web Page Growth • Probability P(s) that a site has s pages: • $P(s) = \sum_i P(s \mid b_i)\,P(b_i) = \sum_i c_i / s^{\gamma_i} = c / s^{\gamma}$ • Power law • γ has been experimentally estimated for the Web to lie between 1.6 and 2.0
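
The slide does not say how γ was measured. As one standard way to estimate such an exponent (swapped in here for illustration, not the method behind the 1.6 to 2.0 figures), the sketch below uses the maximum-likelihood estimator for a continuous power law with a known lower cutoff s_min, γ̂ = 1 + n / Σ ln(s_i / s_min), and checks it on synthetic data. Treating sizes as continuous and s_min as known are assumptions.

```python
import math
import random
from typing import List


def sample_power_law(n: int, gamma: float, s_min: float,
                     rng: random.Random) -> List[float]:
    """Draw n values from the continuous density p(s) ~ s^(-gamma) on s >= s_min.

    Inverse-transform sampling: P(S > s) = (s / s_min)^(1 - gamma), gamma > 1.
    """
    return [s_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]


def estimate_gamma(samples: List[float], s_min: float) -> float:
    """Maximum-likelihood exponent of a continuous power law with known cutoff s_min."""
    tail = [s for s in samples if s >= s_min]
    return 1.0 + len(tail) / sum(math.log(s / s_min) for s in tail)


if __name__ == "__main__":
    rng = random.Random(42)
    for true_gamma in (1.6, 2.0):
        data = sample_power_law(50_000, true_gamma, 1.0, rng)
        print(true_gamma, round(estimate_gamma(data, 1.0), 3))
```

On real site-size data the choice of s_min matters a great deal; here it is fixed by construction.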

  8. Small world models • Properties: • Sparse • Cliquishness • Small Diameter • Two models • Edge-reassigning small world network • Edge addition small world network

  9. Edge-reassigning model • The evolution starts with a ring of n nodes, each connected to its d nearest neighbors • Then each edge is randomly reassigned to a distant node with probability p, in round-robin fashion • See the example on page 10 with n = 10 and d = 4
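
A minimal sketch of the edge-reassigning construction follows. Assumptions not fixed by the slide: d is even (d/2 neighbours on each side), "round robin" is read as sweeping the ring once per edge length k = 1..d/2, and a reassigned edge keeps its endpoint u and moves its other end to a uniformly random node, avoiding self-loops and duplicate edges. The function names are hypothetical.

```python
import random
from typing import List, Optional, Set


def ring_lattice(n: int, d: int) -> List[Set[int]]:
    """Ring of n nodes, each joined to its d nearest neighbours (d/2 on each side)."""
    if d % 2 or not 0 < d < n:
        raise ValueError("d must be even with 0 < d < n")
    adj: List[Set[int]] = [set() for _ in range(n)]
    for u in range(n):
        for k in range(1, d // 2 + 1):
            v = (u + k) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj


def edge_reassigning(n: int, d: int, p: float,
                     seed: Optional[int] = None) -> List[Set[int]]:
    """Reassign each ring edge (u, u+k) with probability p, sweeping the ring in order."""
    rng = random.Random(seed)
    adj = ring_lattice(n, d)
    for k in range(1, d // 2 + 1):   # lap k handles the ring edges of length k
        for u in range(n):           # round robin around the ring
            v = (u + k) % n
            if rng.random() < p:
                # Candidate new endpoints: no self-loop, no duplicate edge.
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[u].remove(v)
                    adj[v].remove(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj


if __name__ == "__main__":
    g = edge_reassigning(10, 4, 0.2, seed=0)  # same n and d as the slide's example
    for u, nbrs in enumerate(g):
        print(u, sorted(nbrs))
```

Reassignment moves edges without creating or destroying any, so the graph keeps exactly n·d/2 edges for every p.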

  10. Edge-addition model • Additional edges are added randomly to the original ring, giving an expected number of p·d·n/2 new edges • p: probability of adding an edge • See the example on page 13 • Criticisms of the small-world models: • No new pages and no deletion of pages • No deletion of links
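
The expected count p·d·n/2 follows because the ring has n·d/2 edges and each one independently yields a new edge with probability p. The sketch below makes one trial per ring edge; the assumption that each shortcut starts at the ring edge's node u and ends at a uniformly random non-neighbour is illustrative, as are the function names.

```python
import random
from typing import List, Optional, Set, Tuple


def ring_lattice(n: int, d: int) -> List[Set[int]]:
    """Ring of n nodes, each joined to its d nearest neighbours (d/2 on each side)."""
    if d % 2 or not 0 < d < n:
        raise ValueError("d must be even with 0 < d < n")
    adj: List[Set[int]] = [set() for _ in range(n)]
    for u in range(n):
        for k in range(1, d // 2 + 1):
            v = (u + k) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj


def edge_addition(n: int, d: int, p: float,
                  seed: Optional[int] = None) -> Tuple[List[Set[int]], int]:
    """Keep every ring edge and, for each of the n*d/2 ring edges, add a new edge with probability p."""
    rng = random.Random(seed)
    adj = ring_lattice(n, d)
    added = 0
    for u in range(n):
        for _ in range(d // 2):      # one trial per ring edge (u, u+k), k = 1..d/2
            if rng.random() < p:
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[u].add(w)
                    adj[w].add(u)
                    added += 1
    return adj, added


if __name__ == "__main__":
    n, d, p = 200, 4, 0.1
    counts = [edge_addition(n, d, p, seed=s)[1] for s in range(20)]
    print("mean new edges:", sum(counts) / len(counts), "expected:", p * d * n / 2)
```

Unlike reassignment, this variant never removes a ring edge, so the ring structure stays intact.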

  11. Rich get richer • Preferential attachment model • Start with a null graph with no nodes • At each time step, add a new node and connect it to m nodes selected randomly with probability proportional to their degree • See the example on page 16
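
A common way to implement degree-proportional selection is to keep a list in which every node appears once per incident edge, so a uniform pick from that list is a degree-proportional pick. This sketch uses that trick. One assumption departs from the slide: because no node can be chosen in proportion to degree while all degrees are zero, it starts from a complete graph on m + 1 nodes instead of a null graph. Redrawing until m distinct targets are found is also a simplification.

```python
import random
from collections import Counter
from typing import List, Optional, Tuple


def preferential_attachment(n: int, m: int,
                            seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Grow a graph to n nodes; each new node links to m distinct existing nodes
    chosen with probability proportional to their degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # Assumed seed graph: complete graph on m + 1 nodes (all degrees nonzero).
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each node appears once per incident edge, so uniform choice here is degree-proportional.
    endpoints = [x for edge in edges for x in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:          # redraw until m distinct targets
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


if __name__ == "__main__":
    degree = Counter(x for edge in preferential_attachment(10_000, 2, seed=0) for x in edge)
    histogram = Counter(degree.values())
    for k in (2, 4, 8, 16, 32):
        print(k, histogram.get(k, 0))
```

On a log-log scale the printed counts should fall roughly along a straight line, the power law in the number of links.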

  12. Important measures • Average diameter • Cliquishness (measures the average density of local connections): • Take a node v with degree d • Its d neighbors have at most d(d-1)/2 links among them • Let $c_v$ = (actual number of links among the neighbors) / (maximum number of links) • $C = \frac{1}{|V|} \sum_{v} c_v$
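
Both measures can be computed directly on an adjacency-set graph. In this sketch, "average diameter" is interpreted as the mean shortest-path length over all pairs of mutually reachable nodes (computed with breadth-first search), and nodes with fewer than two neighbors are assigned $c_v = 0$; both are assumptions, since the slide leaves them open.

```python
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def local_cliquishness(adj: Graph, v: Hashable) -> float:
    """c_v = (links among v's neighbours) / (d(d-1)/2)."""
    nbrs = list(adj[v])
    d = len(nbrs)
    if d < 2:
        return 0.0  # assumed convention: c_v = 0 when d < 2
    links = sum(1 for i in range(d) for j in range(i + 1, d) if nbrs[j] in adj[nbrs[i]])
    return links / (d * (d - 1) / 2)


def cliquishness(adj: Graph) -> float:
    """C = average of c_v over all nodes."""
    return sum(local_cliquishness(adj, v) for v in adj) / len(adj)


def average_path_length(adj: Graph) -> float:
    """Mean BFS distance over all ordered pairs of distinct, mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant node 3 attached to node 2.
    g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(cliquishness(g), average_path_length(g))
```

For the example graph, nodes 0 and 1 have $c_v = 1$, node 2 has 1 link among its 3 neighbors ($c_v = 1/3$), and node 3 has $c_v = 0$, so C = 7/12; the average shortest-path length is 16/12 = 4/3.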

  13. Remarks on rich get richer • Reproduces the power law in the number of links, e.g. the probability that a page i has degree $d_i$ is $A / d_i^{c}$ • A is proportional to $m^2$, the square of the number of links added with each new node • c is a constant • c was found empirically to be 2.9 and theoretically 3
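
As a worked check on the constant A, assume the standard mean-field result for the preferential-attachment model, in which each new node brings m edges, every degree is at least m, and $P(d) \approx 2m^2 / d^3$ (so c = 3 and $A = 2m^2$). In the continuum approximation this choice of A is exactly what normalizes the distribution, which is why A scales as $m^2$:

$$
\int_{m}^{\infty} \frac{2m^{2}}{x^{3}}\,dx
  = 2m^{2}\left[-\frac{1}{2x^{2}}\right]_{m}^{\infty}
  = 2m^{2}\cdot\frac{1}{2m^{2}}
  = 1
$$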

  14. Criticism of Rich Get Richer • Existing edges cannot be reconnected • New edges are added only when new nodes are added

  15. Copy models • At each time step a node is added • With probability p, a new edge is created between this node and a randomly chosen node • With probability 1-p, we choose a node at random, pick one of its out-edges uniformly, and link the new node to the node that this edge points to
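
The two branches of the copy model can be sketched on a directed graph stored as out-edge lists. Assumptions: the process starts from two nodes pointing at each other (the slide gives no seed graph), each new node adds exactly one out-edge, and the function names are hypothetical.

```python
import random
from typing import List, Optional


def copy_model(n: int, p: float, seed: Optional[int] = None) -> List[List[int]]:
    """Grow a directed graph to n nodes; out[u] lists the targets of u's out-edges.

    Each new node adds one edge: with probability p to a uniformly random
    existing node, otherwise to the target of a uniformly chosen out-edge
    of a uniformly chosen existing node.
    """
    if n < 2:
        raise ValueError("need n >= 2")
    rng = random.Random(seed)
    out: List[List[int]] = [[1], [0]]  # assumed seed: nodes 0 and 1 point at each other
    for new in range(2, n):
        if rng.random() < p:
            target = rng.randrange(new)           # any existing node, uniformly
        else:
            chosen = rng.randrange(new)           # a random existing node...
            target = rng.choice(out[chosen])      # ...and one of its out-edges, copied
        out.append([target])
    return out


def in_degrees(out: List[List[int]]) -> List[int]:
    """Number of edges pointing at each node."""
    deg = [0] * len(out)
    for targets in out:
        for t in targets:
            deg[t] += 1
    return deg


if __name__ == "__main__":
    graph = copy_model(10_000, 0.2, seed=0)
    print("largest in-degrees:", sorted(in_degrees(graph), reverse=True)[:5])
```

With p = 0 the new node only ever copies existing targets, so every edge ends up pointing at one of the two seed nodes; small positive p lets new targets enter while the copying step keeps favouring nodes that are already heavily linked.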

  16. Remarks • Why is it called a copy model? With probability 1-p, the new node copies the destination of an existing node's link • More elaborate models allow more than one edge to be added at each time step • It is also a form of "rich get richer"

  17. Applications • Distributed search algorithms • Subgraph patterns and communities • Robustness and vulnerability • PageRank algorithms
