
A Random-Surfer Web-Graph Model



Presentation Transcript


  1. A Random-Surfer Web-Graph Model (Joint work with Avrim Blum & Hubert Chan) Mugizi Rwebangira

  2. The Web as a Graph: Consider the World Wide Web as a graph, with web pages as nodes and hyperlinks between pages as edges. [Figure: pages index.html, links.html, resume.html and http://cnn.com as nodes joined by hyperlinks.]

  3. Studying the Web: Since the Web emerged, there has been a lot of interest in: • Empirically studying properties of the Web Graph. • Modeling the Web Graph mathematically. Benefits of Generative Models: • Simulation – when real data is scarce. • Extrapolation – how will the graph change? • Understanding – inspire further research on real data.

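  4. Power Law: $f(x) \sim g(x)$ if $\lim_{x \to \infty} f(x)/g(x) = 1$, e.g. $(x+1) \sim (x+2)$. The distribution of a random variable $X$ follows a power law if $\Pr[X=k] \sim C k^{-\alpha}$. Example: $\Pr[X=k] = C k^{-2}$ with $C = 6/\pi^2$, so that the probabilities sum to 1 (since $\sum_{k \ge 1} k^{-2} = \pi^2/6$).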

  5. Power Law: $\Pr[X=k] \propto k^{-2}$

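  6. Power Law: $\Pr[X=k] \sim C k^{-\alpha}$ gives $\log \Pr[X=k] \sim \log C - \alpha \log k$. For $\Pr[X=k] = C k^{-2}$: $\log \Pr[X=k] = \log C - 2 \log k$, a straight line of slope $-2$ on a log-log plot.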
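On a log-log plot the exponent can be read off as a slope. A minimal Python sketch, assuming exact probabilities $C k^{-2}$ for k = 1..100 (a hypothetical range) and an ordinary least-squares fit; `loglog_slope` is an illustrative helper, not code from the talk:

```python
import math

def loglog_slope(ks, probs):
    """Least-squares slope of log(prob) against log(k)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(q) for q in probs]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

C = 6 / math.pi ** 2                 # normalizes sum_{k>=1} k^-2 = pi^2 / 6
ks = list(range(1, 101))
probs = [C * k ** -2 for k in ks]
print(round(loglog_slope(ks, probs), 6))  # -2.0
```

On real (sampled) degree data the points are noisy in the tail, so the fitted slope is only an estimate of $-\alpha$.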

  7. Power Law: Log-Log plot

  8. Power Law (contd.): More general definition: $\Pr[X \ge k] \sim C k^{-\alpha}$. Particularly useful if $X$ takes on real values. Sometimes referred to as “heavy-tailed” or “scale-free.”
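The two definitions differ in the exponent. If $\Pr[X=k] \sim C k^{-\alpha}$ with $\alpha > 1$, summing the tail gives (a standard calculation, sketched here):

$$
\Pr[X \ge k] = \sum_{j \ge k} \Pr[X = j] \sim C \sum_{j \ge k} j^{-\alpha} \sim C \int_k^{\infty} x^{-\alpha}\,dx = \frac{C}{\alpha - 1}\, k^{-(\alpha - 1)}
$$

So the tail exponent is one smaller: for the $k^{-2}$ example, $\Pr[X \ge k]$ decays like $k^{-1}$.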

  9. Power Laws in Degree Distribution: Let G be a graph. Let $X_k$ be the proportion of nodes with degree k in G. Then if $X_k \sim C k^{-\alpha}$, we say that G has a power-law degree distribution.
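Computing $X_k$ from a list of node degrees is a one-liner with `collections.Counter`; a minimal sketch (the degree list in the example is made up):

```python
from collections import Counter

def degree_distribution(degrees):
    """Return {k: X_k}, the fraction of nodes with degree k."""
    n = len(degrees)
    counts = Counter(degrees)
    return {k: c / n for k, c in sorted(counts.items())}

print(degree_distribution([1, 1, 1, 3]))  # {1: 0.75, 3: 0.25}
```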

  10. Properties of the Web Graph: A power-law degree distribution has been observed in a wide variety of graphs, including citation networks, social networks and protein-protein interaction networks, among others. It has also been observed in the Web Graph. [Barabási & Albert]

  11. Outline • Background/Previous Work • Motivation • Models • Theoretical results • Experimental results • Conclusions

  12. Classic Random Graph Models • In the G(n,p) random graph model: • There are n nodes. • There is an edge between any two nodes with probability p, independently. • Proposed by Erdős and Rényi in the 1960s.
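A minimal sketch of sampling G(n,p) by flipping one biased coin per pair of nodes (O(n²) time, fine for small n; `gnp` is a hypothetical helper name):

```python
import itertools
import random

def gnp(n, p, seed=None):
    """G(n, p): each of the n*(n-1)/2 possible edges appears independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p]

edges = gnp(1000, 0.01, seed=1)
print(len(edges))
```

With n = 1000 and p = 0.01 the edge count should land near p·n(n−1)/2 = 4995.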

  13. Online G(n,p): In this model, each new node makes k connections to existing nodes chosen uniformly at random. For this talk we will focus on k = 1, so the graph will be a tree.
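A sketch of the online variant, assuming node 0 starts alone and that the k targets are distinct (the slide does not say whether repeats are allowed); with k = 1 every new node adds exactly one edge to an older node, which is why the result is a tree. `online_gnp` is a hypothetical name:

```python
import random

def online_gnp(n, k=1, seed=None):
    """Grow a graph node by node; each new node links to k existing nodes chosen uniformly at random."""
    rng = random.Random(seed)
    edges = []
    for t in range(1, n):                           # node 0 starts alone; existing nodes are 0..t-1
        targets = rng.sample(range(t), min(k, t))   # assumption: distinct targets, fewer if not enough exist
        edges.extend((t, u) for u in targets)
    return edges

tree = online_gnp(10, k=1, seed=0)
print(tree)
```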

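  14. Online G(n,p) [Figure: the tree at T=1 to T=4; at T=3 the new node attaches to each of the 2 existing nodes with probability ½, and at T=4 to each of the 3 existing nodes with probability ⅓.]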

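  15. Properties of Online G(n,p) • $\mathbb{E}[\text{degree of first node}] = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots + \frac{1}{n} = \Theta(\log n)$ • $\mathbb{E}[\text{max degree}] = \Theta(\log n)$ • $X_k$ = proportion of nodes with degree k: $\mathbb{E}[X_k] = \Theta(2^{-k})$. NOT A POWER LAW: the decay is exponential in k!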
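The $\Theta(2^{-k})$ behaviour is easy to check empirically. This Python sketch grows one k = 1 online tree with n = 100,000 nodes (slide 16's size, but a single run rather than 100) and prints each empirical $X_k$ next to $2^{-k}$; it also evaluates the harmonic sum for the first node, which in this 0-based indexing ends at 1/(n−1). The helper name is hypothetical:

```python
import math
import random
from collections import Counter

def random_recursive_tree_degrees(n, rng):
    """Online G(n,p) with k = 1: node t (t >= 1) links to a uniformly random earlier node."""
    deg = [0] * n
    for t in range(1, n):
        u = rng.randrange(t)
        deg[u] += 1
        deg[t] += 1
    return deg

rng = random.Random(42)
n = 100_000
deg = random_recursive_tree_degrees(n, rng)
counts = Counter(deg)
for k in range(1, 6):
    print(k, counts[k] / n, 0.5 ** k)   # empirical X_k next to 2^-k

# Exact expectation for the first node: node t links to it with probability 1/t.
h = sum(1 / t for t in range(1, n))
print(h, math.log(n))
```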

  16. Online G(n,p) (n = 100,000, average of 100 runs)

  17. Preferential Attachment: In the Preferential Attachment model, each new node connects to an existing node with probability proportional to its degree. [Barabási & Albert]

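  18. Preferential Attachment (Degree = in-degree + out-degree) [Figure: at T=2 the two nodes have degrees 3 and 1; at T=3 the new node attaches to them with probabilities ¾ and ¼; at T=4 the three nodes have degrees 4, 1 and 1.]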
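A common way to sample this process is to keep a list with one entry per edge endpoint: picking a uniform entry picks a node with probability proportional to its degree. A sketch, assuming (as the degrees in the figure suggest) that the first node carries a self-loop; `preferential_attachment` is a hypothetical name:

```python
import random
from collections import Counter

def preferential_attachment(n, seed=None):
    """Start from node 0 with a self-loop; each new node links to one existing node
    chosen with probability proportional to its degree (in-degree + out-degree)."""
    rng = random.Random(seed)
    endpoints = [0, 0]                  # the self-loop contributes 2 to node 0's degree
    for t in range(1, n):
        target = rng.choice(endpoints)  # uniform endpoint => probability proportional to degree
        endpoints.extend((t, target))
    return Counter(endpoints)           # node -> degree

deg = preferential_attachment(100_000, seed=7)
x1 = sum(1 for d in deg.values() if d == 1) / len(deg)
print(x1)
```

A mean-field calculation for this variant predicts degree fractions of about 2/3, 1/6 and 1/15 for degrees 1, 2 and 3, decaying like $4/k^3$; the empirical fractions from a large run should land close to these.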

  19. Preferential Attachment: $\mathbb{E}[\text{degree of 1st node}] = \Theta(\sqrt{n})$. Preferential Attachment gives a power-law degree distribution. [Mitzenmacher, Cooper & Frieze 03, KRRSTU00]
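Where the $\sqrt{n}$ comes from, assuming the self-loop start of slide 18 (with t nodes the total degree is 2t, so a node of degree $d_t(u)$ is hit with probability $d_t(u)/2t$): a sketch.

$$
\mathbb{E}\bigl[d_{t+1}(u) \mid G_t\bigr] = \Bigl(1 + \frac{1}{2t}\Bigr) d_t(u)
\;\Longrightarrow\;
\mathbb{E}[d_n(u)] = d_{t_u}(u) \prod_{s=t_u}^{n-1} \Bigl(1 + \frac{1}{2s}\Bigr) = \Theta\bigl(\sqrt{n / t_u}\bigr)
$$

For the first node ($t_u = 1$, $d_1 = 2$) this is $\Theta(\sqrt{n})$; the first step gives $\mathbb{E}[d_2] = 2 \cdot \tfrac{3}{2} = 3$, the T=2 degree in slide 18.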

  20. Preferential Attachment

  21. Other Models: Kumar et al. proposed the “copying model.” [KRRSTU00] Leskovec et al. propose a “forest fire” model, which has some similarities to this work. [LKF05]

  22. Outline • Background/Previous Work • Motivation • Models • Theoretical results • Experimental results • Conclusions

  23. Motivating Questions • Why would a new node connect to nodes of high degree? • Are high-degree nodes more attractive? • Or are there other explanations? • How does a new node find out which nodes have high degree?

  24. Motivating Questions: Motivating Observation: • Suppose each page has a small probability p of being interesting. • Suppose a user does an (undirected) random walk until they find an interesting page. • If p is small, then this is approximately the same as preferential attachment. • What about other processes and directed graphs?
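A sketch of this walk on a hypothetical four-node undirected graph (a triangle with one pendant node). For small p the walk runs long enough to approach the random walk's stationary distribution, which on a connected, non-bipartite undirected graph is proportional to degree, so the landing frequencies should come out close to deg(u)/Σdeg. `coin_flip_walk` is an illustrative name:

```python
import random
from collections import Counter

def coin_flip_walk(adj, p, rng):
    """Start at a uniformly random node; at each step stop with probability p,
    otherwise move to a uniformly random neighbor. Returns the node where we stop."""
    u = rng.randrange(len(adj))
    while rng.random() >= p:        # not interesting yet: keep walking
        u = rng.choice(adj[u])
    return u

# Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to 2.
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
rng = random.Random(0)
trials = 10_000
hits = Counter(coin_flip_walk(adj, 0.02, rng) for _ in range(trials))
total_degree = sum(len(nbrs) for nbrs in adj)
for u in range(len(adj)):
    print(u, hits[u] / trials, len(adj[u]) / total_degree)
```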

  25. Outline • Background/Previous Work • Motivation • Models • Theoretical results • Experimental results • Conclusions

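  26. Directed 1-step Random Surfer, p=.5: Start with a single node with a self-loop (T=1). Then, at each step: • Choose a node uniformly at random. • With probability p, connect to it. • With probability (1-p), connect to its neighbor. [Figure: growth at T=1, T=2, T=3; at T=3 the new node attaches to the first node with probability (½)(½) + (½)(½) + (½)(½) = ¾ and to the second with probability ¼.]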

  27. Directed 1-step Random Surfer: It turns out this model is a mixture of connecting to nodes uniformly at random and preferential attachment. It has a power-law degree distribution. But taking only one step is not very natural. What about doing a real random walk?
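The mixture can be made exact. If the chosen node is kept with probability p and otherwise replaced by its out-neighbor, node x is hit with probability $p/t + (1-p)\,\mathrm{indeg}(x)/t$, where in-degree counts the root's self-loop (so in-degrees sum to t). A Python sketch with hypothetical helper names, storing the tree as a parent array:

```python
import random

def one_step_probs(parent, p):
    """Exact attachment probabilities for the directed 1-step random surfer.
    parent[x] is x's unique out-neighbor; the root points to itself (self-loop)."""
    t = len(parent)
    in_deg = [0] * t
    for x in range(t):
        in_deg[parent[x]] += 1            # the root's self-loop counts once
    # With prob p: the uniformly chosen node itself; with prob 1-p: its out-neighbor.
    return [p / t + (1 - p) * in_deg[x] / t for x in range(t)]

def one_step_surfer(n, p, seed=None):
    """Grow an n-node tree with the 1-step rule; returns the parent array."""
    rng = random.Random(seed)
    parent = [0]                          # node 0 with a self-loop
    for t in range(1, n):
        u = rng.randrange(t)              # uniformly random existing node
        parent.append(u if rng.random() < p else parent[u])
    return parent

print(one_step_probs([0, 0], 0.5))  # [0.75, 0.25]
```

The call reproduces the ¾ and ¼ of slide 26. Since every node has out-degree 1, in-degree equals degree minus one, so the second term is preferential attachment on (degree − 1).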

  28. Directed Coin Flipping Model: 1. Pick a node uniformly at random. 2. Flip a coin of bias p. If HEADS, connect to the current node; else walk to a neighbor and flip again. [Figure: the NEW NODE's walk from a RANDOM STARTING NODE A: coin toss 1: TAILS (at node A); coin toss 2: TAILS (at node B); coin toss 3: HEADS (at node C), so the new node connects to C.]

  29. Directed Coin Flipping Model • At time 1, we start with a single node with a self-loop. • At time t, we choose a node u uniformly at random. • We then flip a coin of bias p. • If the coin comes up heads, we connect to the current node. • Else we walk to a random neighbor and go back to step 3 (the coin flip). “Each page has equal probability p of being interesting to us.”
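A sketch of the directed coin-flipping process with k = 1, again storing each node's single out-edge in a parent array (the root points to itself, so a walk that reaches it stays there until HEADS). Function names are illustrative; p must be positive for the walk to stop:

```python
import random

def coin_flip_target(parent, p, rng):
    """Directed coin-flipping walk: start at a uniformly random node, then flip a
    coin of bias p; HEADS stops at the current node, TAILS follows the out-edge."""
    u = rng.randrange(len(parent))
    while rng.random() >= p:   # TAILS (probability 1 - p)
        u = parent[u]          # k = 1: the only out-neighbor; the root loops to itself
    return u

def directed_coin_flipping(n, p, seed=None):
    """Grow an n-node tree; returns the parent array (parent[0] == 0 is the self-loop)."""
    rng = random.Random(seed)
    parent = [0]               # time 1: a single node with a self-loop
    for _ in range(1, n):
        parent.append(coin_flip_target(parent, p, rng))
    return parent

parent = directed_coin_flipping(10_000, 0.5, seed=1)
```

With two nodes and p = 1/2, the new node attaches to the first node with probability ½ + ½·½ = ¾, the same as in the 1-step model.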

  30. Outline • Background/Previous Work • Motivation • Models • Theoretical results • Experimental results • Conclusions

  31. Is Directed Coin Flipping Power-Law? We don’t know, but we do have some partial results.

  32. Virtual Degree: Definitions: • Let $l_i(u)$ be the number of level-$i$ descendants of node u. • $l_1(u)$ = # of children • $l_2(u)$ = # of grandchildren, etc. Let $\beta = (\beta_1, \beta_2, \ldots)$ be a sequence of real numbers with $\beta_1 = 1$. Then $v(u) = 1 + \beta_1 l_1(u) + \beta_2 l_2(u) + \beta_3 l_3(u) + \cdots$. We’ll call $v(u)$ the “virtual degree of u with respect to $\beta$.”

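  33. Virtual Degree [Figure: a node u with 2 children and 4 grandchildren.] $v(u) = 1 + \beta_1 (2) + \beta_2 (4) + \beta_3 (0) + \beta_4 (0) + \cdots$, where 2 is the # of children and 4 the # of grandchildren.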
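A sketch of computing v(u) from child lists, truncating β after a few terms (reasonable because the $\beta_i$ of slide 35 decay exponentially). It assumes the root's self-loop is left out of its child list; `virtual_degree` is a hypothetical name. The example uses the shape of the tree above with the p = ¾ values of $\beta$, giving $v = 1 + 2 + \tfrac{3}{4} \cdot 4 = 6$:

```python
def virtual_degree(children, u, beta):
    """v(u) = 1 + beta_1*l_1(u) + beta_2*l_2(u) + ..., where l_i(u) is the number of
    level-i descendants of u. beta[0] holds beta_1, beta[1] holds beta_2, and so on."""
    v = 1.0
    level = [u]
    for b in beta:
        level = [c for x in level for c in children[x]]  # next generation of descendants
        if not level:
            break
        v += b * len(level)
    return v

# Node 0 has 2 children (1, 2) and 4 grandchildren (3, 4, 5, 6).
children = [[1, 2], [3, 4], [5, 6], [], [], [], []]
print(virtual_degree(children, 0, [1, 0.75, 0.5]))  # 6.0
```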

  34. Virtual Degree: Easy observation: If we set $\beta_i = (1-p)^i$, then the expected increase in deg(u) is proportional to v(u): Expected increase in deg(u) $= \frac{p}{t} + \frac{(1-p)\,p\,l_1(u)}{t} + \frac{(1-p)^2 p\,l_2(u)}{t} + \cdots = \frac{p}{t}\,v(u)$.

  35. Virtual Degree • Theorem: There always exist $\beta_i$ such that • for $i \ge 1$, $|\beta_i| \le 1$; • as $i \to \infty$, $\beta_i \to 0$ exponentially; • the expected increase in v(u) is proportional to v(u). Recurrence: $\beta_1 = 1$, $\beta_2 = p$, $\beta_{i+1} = \beta_i - (1-p)\,\beta_{i-1}$. E.g., for p = ¾, $\beta_i$ = 1, 3/4, 1/2, 5/16, 3/16, 7/64, …; for p = ½, $\beta_i$ = 1, 1/2, 0, −1/4, −1/4, −1/8, 0, 1/16, …
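The recurrence is easy to evaluate exactly with Python's `fractions.Fraction`; this sketch (`beta_sequence` is a hypothetical name) reproduces both example sequences:

```python
from fractions import Fraction

def beta_sequence(p, m):
    """First m terms of beta_1 = 1, beta_2 = p, beta_{i+1} = beta_i - (1 - p) * beta_{i-1}."""
    p = Fraction(p)
    betas = [Fraction(1), p]
    while len(betas) < m:
        betas.append(betas[-1] - (1 - p) * betas[-2])
    return betas[:m]

print([str(b) for b in beta_sequence(Fraction(3, 4), 6)])  # ['1', '3/4', '1/2', '5/16', '3/16', '7/64']
print([str(b) for b in beta_sequence(Fraction(1, 2), 8)])  # ['1', '1/2', '0', '-1/4', '-1/4', '-1/8', '0', '1/16']
```

The characteristic equation $x^2 - x + (1-p) = 0$ has roots of modulus below 1 for $0 < p < 1$, which is where the exponential decay comes from.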

  36. Virtual Degree, continued: Let $v_t(u)$ be the virtual degree of node u at time t and $t_u$ be the time when node u first appears. Theorem: For any node u and time $t \ge t_u$, $\mathbb{E}[v_t(u)] = \Theta((t/t_u)^p)$. So the expected virtual degrees follow a power law.
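Where the exponent p comes from, as a sketch: for a non-root node u, repeating the calculation of slide 34 level by level shows that the recurrence of slide 35 makes the proportionality constant $p/t$ (t = number of existing nodes). Taking expectations and iterating from $v_{t_u}(u) = 1$:

$$
\mathbb{E}\bigl[v_{t+1}(u) \mid G_t\bigr] = \Bigl(1 + \frac{p}{t}\Bigr) v_t(u)
\;\Longrightarrow\;
\mathbb{E}[v_t(u)] = \prod_{s=t_u}^{t-1} \Bigl(1 + \frac{p}{s}\Bigr),
\qquad
\ln \prod_{s=t_u}^{t-1} \Bigl(1 + \frac{p}{s}\Bigr) = p \ln \frac{t}{t_u} + O(1)
$$

so $\mathbb{E}[v_t(u)] = \Theta((t/t_u)^p)$.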

  37. Actual Degree: We can also obtain lower bounds on the expected values of the actual degrees: Theorem: For any node u and time $t \ge t_u$, $\mathbb{E}[\deg_t(u)] = \Omega\bigl((t/t_u)^{p(1-p)}\bigr)$.
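One way to see where the exponent $p(1-p)$ comes from (a sketch, not necessarily the authors' proof): for a non-root node u, $\deg_t(u) = 1 + l_1(u)$ (one out-edge plus one in-edge per child), and the expected increase from slide 34 keeps only its first two terms:

$$
\mathbb{E}\bigl[\Delta \deg(u) \mid G_t\bigr] = \frac{p}{t}\Bigl(1 + (1-p)\,l_1(u) + (1-p)^2 l_2(u) + \cdots\Bigr) \ge \frac{p(1-p)}{t}\bigl(1 + l_1(u)\bigr) = \frac{p(1-p)}{t}\deg_t(u)
$$

$$
\mathbb{E}[\deg_t(u)] \ge \prod_{s=t_u}^{t-1} \Bigl(1 + \frac{p(1-p)}{s}\Bigr) = \Omega\bigl((t/t_u)^{p(1-p)}\bigr)
$$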

  38. Outline • Background/Previous Work • Motivation • Models • Theoretical results • Experimental results • Conclusions

  39. Experiments • Random graphs of n = 100,000 nodes. • Statistics averaged over 100 runs. • k = 1 (every node has out-degree 1).

  40. Online Erdős-Rényi

  41. Directed 1-Step Random Surfer, p=3/4

  42. Directed 1-Step Random Surfer, p=1/2

  43. Directed 1-Step Random Surfer, p=1/4

  44. Directed Coin Flipping, p=1/2

  45. Directed Coin Flipping, p=1/4

  46. Undirected Coin Flipping, p=1/2

  47. Undirected Coin Flipping, p=0.05

  48. Outline • Background/Previous Work • Motivation • Models • Theoretical results • Experimental results • Conclusions

  49. Conclusions: Directed random-walk models appear to generate power laws (and we have partial theoretical results). Power laws can emerge naturally, even if all nodes have the same intrinsic “attractiveness.”

  50. Open questions • Can we prove that the degrees in the directed coin-flipping model do indeed follow a power law? • Can we analyze the degree distribution of the undirected coin-flipping model with p = 1/2? • Suppose page i has “interestingness” $p_i$. Can we analyze the degree as a function of t, i and $p_i$?
