
Pagerank


crevan

Presentation Transcript


  1. Pagerank

  2. Today • Axiomatic formulation of pagerank • Efficient pagerank computation • Efficient PPR computation

  3. Voting theory • Consider a democracy where people submit preference lists over candidates. • A voting rule (or social welfare function) outputs a global ordering of candidates for every set of preference lists. • Examples: majority voting, Borda count, …

  4. Majority voting • Pair up alternatives and eliminate one • Pathology: • X > Y > Z; Y > Z > X; Z > X > Y • Final winner depends on which order the pairs are considered in
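The order dependence on this slide can be checked directly. The following is a minimal sketch, assuming one voter per listed preference order and a sequential agenda in which the winner of each pairwise contest meets the next candidate; the function names are illustrative only.

```python
from itertools import permutations

# One voter per preference order from the slide, best candidate first.
ballots = [("X", "Y", "Z"), ("Y", "Z", "X"), ("Z", "X", "Y")]

def pairwise_winner(a, b, ballots):
    """Return whichever of a, b a strict majority of ballots ranks higher."""
    votes_for_a = sum(1 for ballot in ballots if ballot.index(a) < ballot.index(b))
    return a if votes_for_a > len(ballots) / 2 else b

def sequential_winner(agenda, ballots):
    """Run pairwise contests in agenda order; the winner meets the next candidate."""
    champion = agenda[0]
    for challenger in agenda[1:]:
        champion = pairwise_winner(champion, challenger, ballots)
    return champion

# Try every possible order in which the pairs could be considered.
winners = {agenda: sequential_winner(agenda, ballots) for agenda in permutations("XYZ")}
for agenda, winner in winners.items():
    print(" vs ".join(agenda), "->", winner)
```

Because X beats Y, Y beats Z, and Z beats X in pairwise contests, each of the three candidates wins under some agenda.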

  5. Positional voting • Assign scores to each position, aggregate scores for each candidate • Borda count: with k candidates, the candidate at position i gets score k – i • Plurality voting: only the topmost choice gets a score • Pathology: • Suppose 2 people prefer GodFather > CitizenKane • 3 people prefer CitizenKane > GodFather, so CitizenKane wins • If PulpFiction is introduced, • 2 people rank GodFather > PulpFiction > CitizenKane • 3 people rank CitizenKane > GodFather > PulpFiction • GodFather now wins under Borda count, although nobody changed their relative ranking of GodFather and CitizenKane
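The Borda arithmetic behind this pathology can be worked through in a few lines. This is a small sketch, assuming k is the number of candidates on each ballot and positions are counted from 1; `borda_scores` is an illustrative helper name.

```python
def borda_scores(ballots):
    """ballots: list of (voter_count, ranking) pairs, ranking listed best first.

    With k candidates, the candidate at position i (1-based) scores k - i.
    """
    scores = {}
    for count, ranking in ballots:
        k = len(ranking)
        for position, candidate in enumerate(ranking, start=1):
            scores[candidate] = scores.get(candidate, 0) + count * (k - position)
    return scores

before = borda_scores([
    (2, ["GodFather", "CitizenKane"]),
    (3, ["CitizenKane", "GodFather"]),
])
after = borda_scores([
    (2, ["GodFather", "PulpFiction", "CitizenKane"]),
    (3, ["CitizenKane", "GodFather", "PulpFiction"]),
])
print(before)  # {'GodFather': 2, 'CitizenKane': 3}
print(after)   # {'GodFather': 7, 'PulpFiction': 2, 'CitizenKane': 6}
```

Introducing PulpFiction flips the winner from CitizenKane (3 to 2) to GodFather (7 to 6), which is a violation of independence of irrelevant alternatives.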

  6. Voting Axioms • Unanimity: If everyone prefers the candidate x to y, then the global ordering also ranks x above y. • Independence of irrelevant alternatives (IIA): For any two candidates x and y, changes in people’s rankings of candidates other than x and y should not affect the relative position of x and y in the global ordering.

  7. Arrow’s (im)possibility theorem • Theorem [Arrow, 1951]: With three or more candidates, the only voting rule satisfying unanimity and IIA is dictatorship. • Extensions • Similar results hold for social choice functions where a single candidate (winner) must be chosen [Gibbard, 1973; Satterthwaite, 1975] • Majority rule arises naturally when we restrict the preference domain of people (i.e., impose rules on how they can rank candidates).

  8. Pagerank • Defined as stationary distribution • x = Mx • M : strongly connected (i.e. incorporates teleportation) • Nodes are both voters and candidates • each link is a vote • votes are “transitive” • votes by a node are not a complete preference order

  9. Pagerank Axioms • Isomorphism: The ranking procedure should be independent of the names of the nodes. • Self edge: Adding self loops should not harm a node and should not affect other nodes. • Vote by committee: Importance a gives to b and c by voting shouldn’t change if a votes via committee. • Collapsing: If two nodes vote similarly, and are linked to by disjoint sets of nodes, the ranking does not change when they are collapsed to one node. • Proxy: There is an equal distribution of importance.

  10. Pagerank • Theorem: • Pagerank satisfies all these axioms • Pagerank is the unique measure that satisfies the above axioms. • To ponder: • Is there a simpler set of axioms? • If we relax any one of the axioms, is there any other “natural” measure that fits? • Is there any such formulation for other importance measures, e.g. HITS? • Other such axiomatic formulations: • Clustering [Kleinberg] • Trust-based recommendation systems [Anderson, Chayes, et al.] • Collaborative filtering systems [Pennock et al.]

  11. Efficient Pagerank computation

  12. Power Iteration • x = 0; repeat until convergence • Suppose graph data = set of disk-resident edges • (source, sink, value) • main complexity: number of passes • maintain only x(t) and x(t+1) • Power iteration makes updates to x(t+1) based on x(t)
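A minimal sketch of this loop follows, under several assumptions that are not on the slide: the update is x(t+1) = (1 - c) M x(t) + c u with a uniform teleport vector u, each edge's value is the probability of stepping from source to sink, every node has at least one out-edge, and convergence is measured by the L1 change between passes. The in-memory list stands in for a disk-resident edge file.

```python
def pagerank_power_iteration(edges, nodes, c=0.15, tol=1e-10, max_passes=1000):
    """edges: re-iterable collection of (source, sink, value) triples.

    value is assumed to be the probability of stepping from source to sink,
    so the values leaving each source sum to 1. Returns (x, passes).
    """
    teleport = {u: 1.0 / len(nodes) for u in nodes}  # assumed uniform
    x = {u: 0.0 for u in nodes}                      # x = 0, as on the slide
    passes = 0
    for passes in range(1, max_passes + 1):
        # x(t+1) starts from the teleportation mass ...
        x_next = {u: c * teleport[u] for u in nodes}
        # ... and one sequential pass over the edges adds the walk mass,
        # reading only x(t) and writing only x(t+1).
        for source, sink, value in edges:
            x_next[sink] += (1.0 - c) * value * x[source]
        change = sum(abs(x_next[u] - x[u]) for u in nodes)
        x = x_next
        if change < tol:
            break
    return x, passes

# Hypothetical three-node graph.
nodes = ["a", "b", "c"]
edges = [("a", "b", 1.0), ("b", "a", 0.5), ("b", "c", 0.5), ("c", "a", 1.0)]
ranks, passes = pagerank_power_iteration(edges, nodes)
print(ranks, "after", passes, "passes")
```

Only two vectors are held in memory, and the cost is dominated by the number of passes over the edges, which is the quantity the slide singles out.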

  13. Tons of work in making PR efficient • Can we reuse previous solutions? • x(t), x(t-1), …, x(t-4) • Can we make the Markov chain mix faster? • Looking at this as a linear system and applying techniques, e.g. Gauss-Seidel • Exploiting the structure of the web • Intra-host links, dangling nodes • When does the ranking stabilize?

  14. Efficient updates (McSherry’05) • Basic intuition: • Can we continuously make updates when streaming over the edges? • Can we make updates in the same pass “count”? • Does it converge? Is it efficient?

  15. Efficient updates • Basic intuition: • Can we continuously make updates when streaming over the edges? • Can we make updates in the same pass “count”? • Does it converge? Is it efficient? • Let residual vector • New algorithm:

  16. Efficient updates: with some care • When faced with edge(s) {(u, v1, c1), ..(u,vk, ck)} pretend update vector and update and • Why does this converge? • Converges if z is chosen carefully. (r(u) = constant in our discussion) • special case: z(u) = y(u) gives power iteration bounds

  17. Different optimizations • Choosing z(u) • z(u) = y(u) • measures “change” to “work” ratio • Group edges from the same host together • Many edges on the web are intra-host • For each pass, load this block into memory and do multiple passes over it

  18. Result

  19. Efficient PPR computation

  20. Personalized PageRank • = PPR with respect to vector v is the fixed point of • Intuition: • On average every 1/c steps, the random walk resets to node v • Naively, for all n different PPR vectors: • O(n^2) storage • each needs O(n+m) time to compute
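One standard way to write this fixed point is given below. The notation is an assumption rather than the slide's own: A is the column-stochastic link-following matrix without teleportation, c is the reset probability, and v is a probability vector over nodes.

```latex
p_v = (1 - c)\, A\, p_v + c\, v
\qquad\Longleftrightarrow\qquad
p_v = c \sum_{k \ge 0} (1 - c)^k A^k v
```

Under this form the number of steps between resets is geometrically distributed with mean 1/c, which matches the intuition on the slide. The series form also shows that p_v weights walks of length k by c(1 - c)^k.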

  21. Computing PPR faster • Linearity property • Decomposition theorem (Jeh, Widom’03) • each p(u) can be expressed in terms of p(v) of the neighbors
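Both properties can be stated compactly. The following is a sketch in assumed notation: p_v is the PPR vector for preference vector v, c is the reset probability, e_u is the indicator vector of node u, p_u abbreviates p_{e_u}, and Out(u) is the set of out-neighbours of u with equal weight on each out-link.

```latex
% Linearity: PPR is linear in the preference vector
p_{\alpha v_1 + \beta v_2} = \alpha\, p_{v_1} + \beta\, p_{v_2}

% Decomposition (Jeh, Widom'03): a node's PPR vector in terms of its out-neighbours'
p_u = c\, e_u + \frac{1 - c}{|\mathrm{Out}(u)|} \sum_{w \in \mathrm{Out}(u)} p_w
```

Linearity means any PPR vector is a weighted combination of the n single-node vectors p_u, and the decomposition gives a recurrence that relates those n vectors to one another.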

  22. Dynamic Programming • k-th step gives PPR over paths of length k • can also get a corresponding bound on the k-th step error • Can run it up to any desired precision
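A minimal sketch of such a dynamic program, assuming the recurrence p(u) = c·e_u + (1 - c)/|Out(u)| · Σ p(w) over out-neighbours w, unweighted out-links, no dangling nodes, and all vectors starting at zero. Under these assumptions each vector after k steps has total mass 1 - (1 - c)^k, so the L1 error is (1 - c)^k. The function name and the graph are illustrative.

```python
def ppr_dynamic_programming(out_links, c=0.15, steps=50):
    """out_links: dict node -> list of out-neighbours (each list non-empty).

    Returns dict u -> sparse dict approximating the PPR vector p(u).
    """
    p = {u: {} for u in out_links}  # step 0: all-zero vectors
    for _ in range(steps):
        p_next = {}
        for u, neighbours in out_links.items():
            vec = {u: c}  # c * e_u: the walk resets before taking a step
            share = (1.0 - c) / len(neighbours)
            for w in neighbours:
                # Add the previous-step vector of each out-neighbour.
                for node, mass in p[w].items():
                    vec[node] = vec.get(node, 0.0) + share * mass
            p_next[u] = vec
        p = p_next
    return p

# Hypothetical three-node graph.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
k = 10
approx = ppr_dynamic_programming(graph, c=0.15, steps=k)
for u, vec in approx.items():
    print(u, round(sum(vec.values()), 6), "missing mass:", round(0.85 ** k, 6))
```

Choosing the number of steps so that (1 - c)^k falls below a target gives any desired precision, at the cost of storing n vectors that can each fill in to n entries.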

  23. Cutting down space • What about approximation? • Option 1: Rounded DP: • ignore entries in PPR vector < • each vector now has non-zero entries • Option 2: Sketched DP • use sketches to keep counts • “Count-min” sketch • Crucial step is to argue that errors do not accrue
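For reference, a count-min sketch can be written in a few lines. This is a generic sketch of the data structure only; how the sketched DP wires it into the PPR vectors is not spelled out on the slide. The width, depth, and SHA-256-based hashing are illustrative assumptions.

```python
import hashlib

class CountMinSketch:
    """depth rows of width counters.

    With non-negative updates, an estimate never falls below the true total
    for a key; hash collisions can only inflate it.
    """

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, row, key):
        # Assumed hashing scheme: salt a SHA-256 digest with the row index.
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, amount=1):
        # Increment one counter per row.
        for row in range(self.depth):
            self.table[row][self._column(row, key)] += amount

    def estimate(self, key):
        # The least-collided counter gives the tightest upper bound.
        return min(self.table[row][self._column(row, key)]
                   for row in range(self.depth))

sketch = CountMinSketch(width=16, depth=3)
true_counts = {}
for i in range(200):
    key = f"node{i % 50}"
    amount = i % 7 + 1
    sketch.add(key, amount)
    true_counts[key] = true_counts.get(key, 0) + amount

print(sketch.estimate("node0"), ">=", true_counts["node0"])
```

The table uses width × depth counters regardless of how many distinct keys are inserted, which is the space saving; the one-sided error is what any argument about errors not accruing has to control.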

  24. Interesting questions • What if the edges arrive incrementally? • Captures changes in edge weights due to interactions • [Bahmani, Chowdhury, Goel’11] • Random walks in small memory + small number of passes over data? • later in course • applications in community finding • PPR as a distance metric • can we efficiently search using this?
