A Random-Surfer Web-Graph Model

Avrim Blum, Hubert Chan, Mugizi Rwebangira
Carnegie Mellon University

The Web as a Graph
Consider the World Wide Web as a graph, with web pages as nodes and links between pages as edges.
(Figure: example pages links.html, index.html, resume.html and http://cnn.com, joined by links.)
Experiments suggest that the degree distribution of the Web graph follows a power law [FFF99].

Power Law
The distribution of a quantity X follows a power law if $\Pr(X = k) = C k^{-\alpha}$.
Taking the logarithm of both sides: $\log \Pr(X = k) = \log C - \alpha \log k$.
Thus a log-log plot of a power-law distribution is a straight line with slope $-\alpha$.
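The straight-line claim can be checked numerically. The sketch below evaluates $C k^{-\alpha}$ exactly for a range of k and fits a least-squares line to the log-log points; the values of C and alpha, the range of k, and the helper name loglog_slope are illustrative assumptions, not taken from the slides.

```python
import math

def loglog_slope(ks, probs):
    """Least-squares slope of log(prob) plotted against log(k)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(q) for q in probs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Illustrative constants (assumptions); the values are not normalized
# to sum to 1, which only shifts the line and does not change its slope.
C, alpha = 0.5, 2.5
ks = list(range(1, 101))
probs = [C * k ** -alpha for k in ks]

slope = loglog_slope(ks, probs)  # equals -alpha up to rounding error
```

On measured degree data the points only approximate a line, so the fitted slope is an estimate of $-\alpha$ rather than an exact value.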

Previous Work
Barabási and Albert proposed the Preferential Attachment model [BA99]: each new node connects to the existing nodes with a probability proportional to their degree.
Preferential Attachment is known to give a power-law degree distribution [Mitzenmacher, Cooper & Frieze 03, KRRSTU00].
Other proposed models include the “copying model” [KRRSTU00].

Motivating Questions
• Why would a new node connect to nodes of high degree?
• Are high-degree nodes more attractive?
• Or are there other explanations?
• How does a new node find out which nodes have high degree?
• Motivating Observation:
• Suppose each page has a small probability p of being interesting.
• Suppose a user does an (undirected) random walk until they find an interesting page.
• If p is small, the walk is long and stops at a node with probability roughly proportional to its degree, so this is the same as preferential attachment.
• What about other processes and directed graphs?
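The motivating observation rests on a standard fact: a long random walk on a connected undirected graph is found at each node with probability proportional to that node's degree. The sketch below checks this on a small hypothetical graph. The graph, the name walk_distribution, and the use of a lazy walk (staying put with probability 1/2, which does not change the limiting distribution) are assumptions made for illustration.

```python
def walk_distribution(adj, steps):
    """Distribution of a lazy random walk after `steps` steps, from a uniform start."""
    n = len(adj)
    dist = [1.0 / n] * n
    for _ in range(steps):
        nxt = [0.0] * n
        for u, mass in enumerate(dist):
            nxt[u] += mass / 2                      # stay put with probability 1/2
            for v in adj[u]:
                nxt[v] += mass / 2 / len(adj[u])    # else move to a uniform neighbor
        dist = nxt
    return dist

# Hypothetical undirected graph given as adjacency lists: a triangle 0-1-2
# with a pendant node 3 attached to node 0.
adj = [[1, 2, 3], [0, 2], [0, 1], [0]]

dist = walk_distribution(adj, 500)
total_degree = sum(len(neighbors) for neighbors in adj)
degree_share = [len(neighbors) / total_degree for neighbors in adj]
# dist and degree_share agree up to rounding error.
```

A new node that attaches to wherever such a long walk stops therefore picks its target roughly in proportion to degree.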

Directed 1-Step Random Surfer
At time 1, we start with a single node with a self-loop.
At time t, a node is chosen uniformly at random from the existing nodes. With probability p the new node connects to this node; with probability 1-p it connects to a random out-neighbor of that node.
(Extension: repeat the process k times for each new node to get out-degree k.)
Note: This model is just another way of stating the directed preferential attachment model.
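The process described above can be sketched in a few lines for the out-degree-1 case (k = 1). This is a minimal illustration, not the authors' code: the function names, the 0-based node numbering, and the representation of the graph as a list out, where out[v] is the single out-neighbor of v, are all assumptions.

```python
import random

def one_step_random_surfer(n, p, seed=None):
    """Grow an n-node graph with out-degree 1; returns out[v] = out-neighbor of v."""
    rng = random.Random(seed)
    out = [0]                            # node 0 starts with a self-loop
    for new in range(1, n):
        start = rng.randrange(new)       # uniform over the existing nodes
        if rng.random() < p:
            target = start               # connect to the chosen node itself
        else:
            target = out[start]          # connect to its (only) out-neighbor
        out.append(target)
    return out

def in_degrees(out):
    """In-degree of every node (node 0's self-loop counts as one in-link)."""
    deg = [0] * len(out)
    for target in out:
        deg[target] += 1
    return deg

out = one_step_random_surfer(1000, 0.75, seed=1)
deg = in_degrees(out)
```

With p = 0 every new node follows an out-link that leads to node 0, so the graph is a star; with p = 1 the targets are uniformly random.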

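Directed 1-step Random Surfer, p=.5
(Figure: worked example showing the graph at T=1 through T=4 with the new node's connection probabilities. At T=3 the new node reaches the first node with probability (½)(½) + (½)(½) + (½)(½) = ¾ and the second node with probability ¼. At T=4 the pictured sum is (½)(⅓) + (½)(⅓) + (½)(⅓) + (½)(⅓).)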

Directed Coin Flipping Model
1. At time 1, we start with a single node with a self-loop.
2. At time t, we choose a node uniformly at random.
3. We then flip a coin that comes up heads with probability p.
4. If the coin comes up heads, we connect the new node to the current node.
5. Otherwise we walk to a random neighbor and go to step 3.
Intuition: “each page has equal probability p of being interesting to us.”
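The steps of the coin-flipping model can be sketched as follows for out-degree 1. As assumptions for illustration: the walk follows out-links (so "a random neighbor" is the current node's single out-neighbor), nodes are numbered from 0, and the function name and the out list representation are hypothetical.

```python
import random

def directed_coin_flipping(n, p, seed=None):
    """Grow an n-node graph with out-degree 1; returns out[v] = out-neighbor of v."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1], otherwise the walk may never stop")
    rng = random.Random(seed)
    out = [0]                            # node 0 starts with a self-loop
    for new in range(1, n):
        current = rng.randrange(new)     # uniform random starting node
        while rng.random() >= p:         # tails: follow the out-link and flip again
            current = out[current]
        out.append(current)              # heads: link the new node to the current node
    return out

out = directed_coin_flipping(2000, 0.5, seed=7)
```

The only difference from the 1-step surfer is the loop: the walk may take any number of steps before a head ends it. Because node 0 has a self-loop, a walk that reaches node 0 stays there until the coin comes up heads.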

(Figure: a new node picks a random starting node. The walk sees coin tosses 1. tail, 2. tail, 3. head, and the new node connects to the node reached when the head comes up.)

Is Directed Coin-Flipping Power-lawed?
We don’t know, but we do have some partial results.
Note: unlike for undirected graphs, the case p → 0 is not so interesting, since then you just get a star.

Virtual Degree
Definition: Let $l_i(u)$ be the number of level-$i$ descendants of $u$. Let $\beta_i$ ($i \ge 1$) be a sequence of real numbers with $\beta_1 = 1$. Then $v(u) = 1 + \sum_{i \ge 1} \beta_i \, l_i(u)$.

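Virtual Degree
$v(u) = 1 + \beta_1 l_1(u) + \beta_2 l_2(u) + \beta_3 l_3(u) + \beta_4 l_4(u) + \dots$
(Figure: an example node u with 2 level-1 descendants and 4 level-2 descendants.)
For the pictured node: $v(u) = 1 + \beta_1 (2) + \beta_2 (4) + \beta_3 (0) + \beta_4 (0) + \dots$
Easy observation: If we set $\beta_i = (1-p)^i$ then the expected increase in degree(u) is proportional to $v(u)$.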
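The virtual degree can be computed directly from its definition. The sketch below assumes an out-degree-1 graph stored as a list out (out[v] is the out-neighbor of v, with the root pointing to itself and every other node pointing to an earlier node), and it reads "level-i descendant of u" as a node whose directed path to u has length i. The function names are hypothetical.

```python
def level_counts(out, u):
    """[l_1(u), l_2(u), ...]: number of nodes at each distance i >= 1 above u."""
    children = {}
    for v, target in enumerate(out):
        if v != target:                  # ignore the root's self-loop
            children.setdefault(target, []).append(v)
    counts = []
    frontier = [u]
    while True:
        frontier = [c for v in frontier for c in children.get(v, [])]
        if not frontier:
            return counts
        counts.append(len(frontier))

def virtual_degree(out, u, beta):
    """v(u) = 1 + sum over i of beta(i) * l_i(u), with beta a function of the level i."""
    return 1 + sum(beta(i) * l for i, l in enumerate(level_counts(out, u), start=1))

# Hypothetical tree shaped like the slide's example: the root has
# 2 level-1 descendants and 4 level-2 descendants.
out = [0, 0, 0, 1, 1, 2, 2]
p = 0.5
v_root = virtual_degree(out, 0, lambda i: (1 - p) ** i)  # 1 + 0.5*2 + 0.25*4
```

Passing beta as a function keeps the same code usable for any choice of the sequence, including $\beta_i = (1-p)^i$ from the easy observation.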

Virtual Degree
• Theorem: There always exist $\beta_i$ such that
• For $i \ge 1$, $|\beta_i| \le 1$.
• As $i \to \infty$, $\beta_i \to 0$ exponentially.
• The expected increase in $v(u)$ is proportional to $v(u)$.
Recurrence: $\beta_1 = 1$, $\beta_2 = p$, $\beta_{i+1} = \beta_i - (1-p)\,\beta_{i-1}$.
E.g., for $p = 3/4$, $\beta_i$ = 1, 3/4, 1/2, 5/16, 3/16, 7/64, ...
for $p = 1/2$, $\beta_i$ = 1, 1/2, 0, -1/4, -1/4, -1/8, 0, 1/16, ...
Let $v_t(u)$ be the virtual degree of node u at time t and $t_u$ be the time when node u first appears.
Theorem: For any node u and time $t \ge t_u$, $E[v_t(u)] = \Theta((t/t_u)^p)$.
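The recurrence $\beta_1 = 1$, $\beta_2 = p$, $\beta_{i+1} = \beta_i - (1-p)\,\beta_{i-1}$ can be evaluated exactly with rational arithmetic, which reproduces the two example sequences listed on the slide. The function name beta_sequence is an assumption for illustration.

```python
from fractions import Fraction

def beta_sequence(p, count):
    """First `count` terms of beta_1 = 1, beta_2 = p, beta_{i+1} = beta_i - (1-p)*beta_{i-1}."""
    p = Fraction(p)
    betas = [Fraction(1), p]
    while len(betas) < count:
        betas.append(betas[-1] - (1 - p) * betas[-2])
    return betas[:count]

three_quarters = beta_sequence(Fraction(3, 4), 6)
one_half = beta_sequence(Fraction(1, 2), 8)
# three_quarters: 1, 3/4, 1/2, 5/16, 3/16, 7/64
# one_half:       1, 1/2, 0, -1/4, -1/4, -1/8, 0, 1/16
```

For p = 1/2 the terms change sign, which is why the theorem bounds $|\beta_i|$ rather than $\beta_i$.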

Virtual Degree, contd
We also have some weak concentration bounds on the virtual degrees. Unfortunately they are not strong enough: if they could be strengthened, we would have a proof that the virtual degrees (not just their expectations) follow a power law.

Actual Degree
We can also obtain lower bounds on the actual degrees.
Theorem: For any node u and time $t \ge t_u$, $E[l_1(u)] \ge \Omega((t/t_u)^{p(1-p)})$.

Experiments
• Random graphs of n = 100,000 nodes.
• Statistics are averaged over 100 runs.
• k = 1 (every node has out-degree 1).
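An experiment of this shape can be sketched as follows: generate several graphs, count how many nodes have each in-degree, and average the counts over the runs. This is an assumed setup, not the authors' experimental code. The uniform-attachment baseline, the function names, and the small n and run count in the usage lines are illustrative; the slides use n = 100,000 and 100 runs.

```python
import random
from collections import Counter

def uniform_random_graph(n, rng):
    """Baseline: each new node links to one uniformly random earlier node."""
    out = [0]                                   # node 0 starts with a self-loop
    for new in range(1, n):
        out.append(rng.randrange(new))
    return out

def average_degree_histogram(make_graph, n, runs, seed=0):
    """Average, over `runs` graphs, the number of nodes having each in-degree."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(runs):
        out = make_graph(n, rng)
        indeg = Counter(out)                    # in-degree of every node with an in-link
        hist = Counter(indeg.values())          # how many nodes have each in-degree
        hist[0] += n - len(indeg)               # nodes that nobody links to
        totals.update(hist)
    return {d: c / runs for d, c in sorted(totals.items())}

hist = average_degree_histogram(uniform_random_graph, 1000, 5, seed=1)
```

Any generator with the same (n, rng) signature can be passed in place of the baseline, and plotting the resulting histogram on log-log axes gives the kind of picture used to judge whether a model looks power-lawed.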

Plots (images not reproduced in this transcript) were shown for the following models:
• Uniform random connections
• Directed 1-Step Random Surfer, p=3/4
• Directed Coin Flipping, p=1/2
• Directed Coin Flipping, p=1/4
• Undirected Coin Flipping, p=1/2
• Undirected Coin Flipping, p=0.05

Conclusions
• Directed random walk models appear to generate power laws (supported by experiments and partial theoretical results).
• Power laws can emerge naturally even if all nodes have the same intrinsic “attractiveness” (and even in the absence of a “role model” as in the copying model).

Open Questions
• Can we prove that the degrees in the directed coin-flipping model indeed follow a power law?
• Can we analyze the degree distribution for the undirected coin-flipping model with p = 1/2?
• Suppose page i has “interestingness” $p_i$. Can we analyze the degree as a function of t, i and $p_i$?

Questions?
