Rank Aggregation Methods for the Web

Presentation Transcript


  1. Rank Aggregation Methods for the Web CS728 Lecture 11

  2. Web Page Ranking Methods Reviewed • PageRank – global link analysis • Indegree – local link analysis • HITS – topic-based link analysis • Voting – NNN and Correlation • Graph distance from seed • URL length and depth • Text-based methods (e.g., tf*idf)

  3. Rank Aggregation • [Figure: several input rankings of the candidates A–F, some of them partial, are combined into one “consensus” ranking of all candidates.]

  4. Notations for Ranking • Given a universe U and an ordered list τ of a subset S of U: τ = [x1 ≥ x2 ≥ … ≥ xd], xi ∈ S • τ(i): position (rank) of element i • |τ|: number of elements in τ • Full list: τ contains all the elements of U • Partial list: τ ranks only some of the elements of U • Top-d list: all d ranked elements are above all unranked elements • Question: when are two orderings similar? Can you give a distance measure?

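  5. Measuring Distance Between Orderings • σ, τ: two full lists; σ(i): rank of candidate i in σ • Spearman’s footrule distance: sum over all candidates of the absolute difference in rank, F(σ,τ) = Σi |σ(i) − τ(i)| • Kendall tau distance: the number of pairwise disagreements between the two lists, K(σ,τ) = |{(i,j) : σ(i) < σ(j) but τ(i) > τ(j)}|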

  6. Example of Ordered-List Distance • S = {A,B,C,D,E} • σ, τ: two full lists, listed from rank 1 to rank 5: σ = [A, C, E, D, B], τ = [C, A, B, D, E] • Spearman’s footrule distance (terms for A, B, C, D, E): F(σ,τ) = 1 + 2 + 1 + 0 + 2 = 6 • Kendall tau distance: K(σ,τ) = |{(A,C), (B,D), (B,E), (D,E)}| = 4
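The two distances can be checked on the slide's example with a short sketch. The function names are illustrative, and both lists are assumed to be full lists over the same candidates.

```python
from itertools import combinations

def positions(ranking):
    """Map each candidate to its 1-based rank in a full list."""
    return {c: i for i, c in enumerate(ranking, start=1)}

def footrule(sigma, tau):
    """Spearman's footrule: sum of absolute rank differences."""
    ps, pt = positions(sigma), positions(tau)
    return sum(abs(ps[c] - pt[c]) for c in ps)

def kendall_tau(sigma, tau):
    """Kendall tau: number of candidate pairs ordered differently."""
    ps, pt = positions(sigma), positions(tau)
    return sum(
        1
        for a, b in combinations(sorted(ps), 2)
        if (ps[a] - ps[b]) * (pt[a] - pt[b]) < 0
    )

# The two lists from the example slide, top rank first.
sigma = ["A", "C", "E", "D", "B"]
tau = ["C", "A", "B", "D", "E"]

print(footrule(sigma, tau))     # 6
print(kendall_tau(sigma, tau))  # 4
```

The values also satisfy K ≤ F ≤ 2K (4 ≤ 6 ≤ 8).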

  7. Optimal ranking aggregation • Optimality depends on the distance measure we use. • Optimizing with the Kendall tau distance gives the Kemeny optimal aggregation. • It can be shown to satisfy neutrality and consistency, important properties of rank aggregation functions. • Useful but computationally hard: Kemeny optimal aggregation is NP-hard. • We will show that footrule-optimal aggregation is in P.

  8. Two properties relate K and F • For any full lists σ, τ: K(σ,τ) ≤ F(σ,τ) ≤ 2 K(σ,τ) • So footrule-optimal aggregation gives a 2-approximation to Kemeny optimality: if σ is the Kemeny optimal aggregation of full lists τ1,…,τk and σ’ is the footrule-optimal aggregation, then K(σ’, τ1,…,τk) ≤ 2 K(σ, τ1,…,τk)
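The 2-approximation follows by chaining the two inequalities with the optimality of σ’ for the footrule. A sketch of the step, assuming the distance to a collection of lists is the sum (or average) of the distances to each list:

```latex
\begin{align*}
K(\sigma', \tau_1,\dots,\tau_k)
  &\le F(\sigma', \tau_1,\dots,\tau_k)
     && \text{since } K \le F \text{ for each } \tau_i \\
  &\le F(\sigma, \tau_1,\dots,\tau_k)
     && \text{since } \sigma' \text{ minimizes the footrule distance} \\
  &\le 2\,K(\sigma, \tau_1,\dots,\tau_k)
     && \text{since } F \le 2K \text{ for each } \tau_i
\end{align*}
```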

  9. Condorcet Criteria and Spam Filters • Condorcet Criterion: an element of S that beats every other element in pairwise simple majority voting should be ranked first. • Extended Condorcet Criterion (XCC): if most voters prefer candidate a to candidate b (i.e., the number of voters i with τi(a) < τi(b) is at least n/2), then the aggregate ranking σ should also prefer a to b (i.e., σ(a) < σ(b)). • XCC is effective in “spam-fighting” and is therefore good to use in meta-search.

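  10. XCC: Not always realizable • [Figure: voter preferences whose pairwise majorities cannot all be respected by one ordering; the ordering σ(a) < σ(b) < σ(c) is marked “Not realizable”.]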
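A minimal sketch of why the pairwise criterion can fail. The three voter lists below are the classic cyclic example, chosen here for illustration; they are an assumption and are not recovered from the slide's figure.

```python
from itertools import permutations

# Hypothetical voters with cyclic preferences (top rank first).
voters = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]

def majority_prefers(x, y, voters):
    """True if a strict majority of voters rank x above y."""
    wins = sum(1 for v in voters if v.index(x) < v.index(y))
    return 2 * wins > len(voters)

def respects_all_majorities(order, voters):
    """True if no pair in `order` contradicts a pairwise majority."""
    return all(
        not majority_prefers(y, x, voters)
        for i, x in enumerate(order)
        for y in order[i + 1:]
    )

# Majorities prefer a to b, b to c, and c to a: a cycle.
print(majority_prefers("a", "b", voters))
print(majority_prefers("b", "c", voters))
print(majority_prefers("c", "a", voters))

# No ordering of the three candidates respects all three majorities.
realizable = [
    order for order in permutations("abc")
    if respects_all_majorities(list(order), voters)
]
print(realizable)
```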

  11. Voting Theory: Desired Properties • Given a set of candidates and voter preferences τ1,…,τn, seek an algorithm that produces an aggregate ranking σ of the candidates and satisfies a set of desired properties. • Which combinations of properties are realizable? • 1) Independence from Irrelevant Alternatives: the relative order of a and b in σ should depend only on the relative order of a and b in τ1,…,τn. • Ex: if τi = (a b c) changes to (a c b), the relative order of a and b in σ should not change.

  12. Desired Properties: • 2) Neutrality: no candidate should be favored over others. • If two candidates switch positions in τ1,…,τn, they should also switch positions in σ. • 3) Anonymity: no voter should be favored over others. • If two voters switch their orderings, σ should remain the same.

  13. Desired Properties: • 4) Monotonicity: if a voter improves the ranking of a candidate, that candidate's ranking in σ can only improve. • 5) Consistency: if the voters are split into two disjoint sets, S and T, and both the aggregation of the voters in S and the aggregation of the voters in T prefer a to b, then the aggregation of all voters should also prefer a to b.

  14. Desired Properties • 6) No Dictatorship: there is no voter i whose list is always the output, i.e., f(τ1,…,τn) ≠ τi for some input. • 7) Unanimity (a.k.a. Pareto optimality): if all voters prefer candidate a to candidate b (i.e., τi(a) < τi(b) for all i), then σ should also prefer a to b (i.e., σ(a) < σ(b)).

  15. Desired Properties • 8) Democracy: satisfies the extended Condorcet criterion (XCC). • Always works for m = 2 candidates. • Not always realizable for m ≥ 3. • Theorem [May, 1952]: For m = 2, Democracy is the only rank aggregation function that is monotone, neutral, and anonymous.

  16. Arrow’s Impossibility Theorem [Arrow, 1951] • Theorem: If m ≥ 3, then the only rank aggregation function that is unanimous and independent from irrelevant alternatives is dictatorship. • Arrow won the Nobel Prize in economics (1972).

  17. Borda’s method (1781) • Easy and intuitive; several “score-based” variants • Violates independence from irrelevant alternatives • Bi(c) = the number of candidates ranked below c in τi • B(c) = Σi Bi(c); candidates are sorted in decreasing order of B(c) • Example with four voters (lists abbreviated): voter 1: C3 C1 … C7 C8 C10; voter 2: C7 C1 … C8 C3 C10; voter 3: C3 C2 … C7 C10 C9; voter 4: C3 C8 … C1 C15 C10 • Bi(C8) = 1, 2, 0, 13 for i = 1, 2, 3, 4
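Borda's method on full lists can be sketched as follows. The three small input lists are made up for illustration, and breaking ties by candidate name is an assumption the slide does not specify.

```python
def borda(rankings):
    """Borda count for full lists (top rank first).

    B_i(c) is the number of candidates ranked below c by voter i;
    B(c) is the sum of B_i(c) over all voters.
    """
    n = len(rankings[0])
    scores = {c: 0 for c in rankings[0]}
    for ranking in rankings:
        for pos, c in enumerate(ranking):
            scores[c] += n - 1 - pos  # candidates below c in this list
    # Sort by decreasing score; ties broken by name (an assumption).
    order = sorted(scores, key=lambda c: (-scores[c], c))
    return order, scores

# Hypothetical input: three voters, three candidates.
rankings = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
order, scores = borda(rankings)
print(order)   # ['A', 'B', 'C']
print(scores)  # {'A': 5, 'B': 3, 'C': 1}
```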

  18. Partial lists • Handle partial lists by dividing the excess (unused) scores equally among all unranked candidates. • Example: 100 candidates, of which 70 are ranked (scores 100 down to 31) => the unused scores 1,…,30 sum to 465, so each of the 30 unranked candidates gets 465/30 = 15.5.
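A minimal sketch of the partial-list rule, under the reading that the unused scores are pooled and split equally. It follows the slide's scoring convention, in which the top candidate among n gets score n; the function name and candidate labels are hypothetical.

```python
def partial_borda_scores(ranked, universe):
    """Scores for one voter who ranks only part of the universe.

    Ranked candidates get n, n-1, ... from the top down; the unused
    scores are split equally among the unranked candidates.
    """
    n = len(universe)
    scores = {c: n - pos for pos, c in enumerate(ranked)}
    unranked = [c for c in universe if c not in scores]
    if unranked:
        leftover = sum(range(1, len(unranked) + 1))  # scores 1..(n - d)
        share = leftover / len(unranked)
        for c in unranked:
            scores[c] = share
    return scores

# The slide's sizes: 100 candidates, 70 of them ranked.
universe = [f"c{i}" for i in range(1, 101)]
ranked = universe[:70]
scores = partial_borda_scores(ranked, universe)
print(scores["c1"], scores["c70"], scores["c71"])  # 100 31 15.5
```

The total score handed out by the voter stays the same as for a full list (1 + 2 + … + 100 = 5050).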

  19. Footrule optimal aggregation • Footrule optimal aggregation can be computed in polynomial time. • It is a good approximation (within a factor of 2) of Kemeny optimal aggregation. • Proof: via minimum-cost perfect matching
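The matching argument can be sketched in code. Candidates form one side of a bipartite graph and positions the other, and the edge (c, p) costs Σi |τi(c) − p|; a minimum-cost perfect matching then gives a footrule-optimal full list. The sketch below uses a textbook O(n³) Hungarian-style assignment routine, which is one of several ways to solve the matching, and the input lists are made up. It assumes full lists over the same candidates.

```python
from itertools import permutations

def min_cost_assignment(cost):
    """Hungarian algorithm; returns the column assigned to each row."""
    n = len(cost)
    inf = float("inf")
    u, v = [0] * (n + 1), [0] * (n + 1)
    p, way = [0] * (n + 1), [0] * (n + 1)  # p[j]: row matched to col j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    row_to_col = [0] * n
    for j in range(1, n + 1):
        row_to_col[p[j] - 1] = j - 1
    return row_to_col

def total_footrule(order, rankings):
    """Sum of footrule distances from `order` to every input list."""
    return sum(abs(i - r.index(c)) for r in rankings for i, c in enumerate(order))

def footrule_optimal(rankings):
    """Footrule-optimal aggregation of full lists via matching."""
    candidates = sorted(rankings[0])
    n = len(candidates)
    pos = [{c: i for i, c in enumerate(r)} for r in rankings]
    cost = [[sum(abs(p[c] - slot) for p in pos) for slot in range(n)]
            for c in candidates]
    order = [None] * n
    for ci, slot in enumerate(min_cost_assignment(cost)):
        order[slot] = candidates[ci]
    return order

# Hypothetical input lists; brute force is used only as a check.
rankings = [list("ABCD"), list("BACD"), list("ADBC")]
result = footrule_optimal(rankings)
best = min(total_footrule(list(p), rankings) for p in permutations("ABCD"))
print(result, total_footrule(result, rankings), best)
```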

  20. Markov Chain method for rank aggregation. • States=candidates • Transitions depend on the preference orders given by voters • Basic idea: probabilistically switch to a “better candidate” • Rank candidates based on stationary probabilities!

  21. Markov chain advantages • Handles partial lists and top-d lists by using the available comparisons to infer new ones • Handles uneven comparisons and list lengths • Computational efficiency: O(NK) preprocessing, then O(K) per step for about O(N) steps

  22. Four ways to build the transition matrix • The current state is candidate a. • MC1: Choose uniformly from the multiset of all candidates that were ranked at least as high as a by some voter. – Probability to stay at a: ~ average rank of a. • MC2: Choose a voter i uniformly at random and pick uniformly at random from among the candidates that the i-th voter ranked at least as high as a. • MC3: Choose a voter i uniformly at random and pick a candidate b uniformly at random. If the i-th voter ranked b higher than a, go to b. Otherwise, stay at a. • MC4: Choose a candidate b uniformly at random. If most voters ranked b higher than a, go to b. Otherwise, stay at a. – Rank of a ~ number of “pairwise contests” a wins.
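A minimal sketch of MC4 for full lists. The small uniform teleport probability is an added assumption so that power iteration converges to a unique stationary distribution; it is not part of the slide's definition. Function names and the input lists are illustrative.

```python
def mc4_transition_matrix(rankings):
    """MC4: from a, pick b uniformly; move if a majority ranks b above a."""
    candidates = sorted(rankings[0])
    n = len(candidates)
    pos = [{c: i for i, c in enumerate(r)} for r in rankings]
    P = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(candidates):
        for j, b in enumerate(candidates):
            if i != j:
                b_wins = sum(1 for p in pos if p[b] < p[a])
                if 2 * b_wins > len(rankings):
                    P[i][j] = 1.0 / n
        P[i][i] = 1.0 - sum(P[i])  # otherwise stay at a
    return candidates, P

def stationary(P, teleport=0.05, iterations=1000):
    """Power iteration with uniform teleportation (an assumption)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        new = [teleport / n] * n
        for i in range(n):
            for j in range(n):
                new[j] += (1 - teleport) * pi[i] * P[i][j]
        pi = new
    return pi

# Hypothetical input: three voters, three candidates.
rankings = [["A", "B", "C"], ["A", "B", "C"], ["B", "A", "C"]]
candidates, P = mc4_transition_matrix(rankings)
pi = stationary(P)
ranking = [c for _, c in sorted(zip(pi, candidates), reverse=True)]
print(ranking)  # ['A', 'B', 'C']
```

Candidates with higher stationary probability are ranked higher, since the chain keeps moving toward candidates that win pairwise contests.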

  23. A locally Kemeny optimal aggregation is a relaxation of Kemeny optimality • A locally Kemeny optimal aggregation satisfies the extended Condorcet criterion (XCC) and can be computed in O(kn log n) time; worst case O(n²). • Many existing aggregation methods do not satisfy XCC. => Given τ1,…,τk, use your favorite aggregation method to obtain a full list μ, then apply local Kemenization to μ with respect to τ1,…,τk.

  24. Local Kemenization is a procedure to get a locally Kemeny optimal aggregation. • A local Kemenization of a full list μ with respect to τ1,…,τk: compute a locally Kemeny optimal aggregation of τ1,…,τk that is maximally consistent with μ. • This approach: (1) preserves the strengths of the initial aggregation μ; (2) ranks non-spam above spam; (3) gives a result that disagrees with μ on any pair (i, j) only if a majority of the τ’s endorse this disagreement; (4) for every d, 1 ≤ d ≤ |μ|, the restriction of the output to the top d elements of μ is a local Kemenization of the top d elements of μ.

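  25. How do we perform local Kemenization? • Local Kemenization example • Input lists: τ1 = A B F E C D, τ2 = B C A E F D, τ3 = A C F D E B, τ4 = B F D C A E, τ5 = C A B F E D • Initial aggregation: μ = B A D C E F • [Figure: the elements of μ are inserted one at a time; each new element moves up past its predecessor while a majority of the τ’s disagree with their current order.] • A vs. B: A>B: 3, A<B: 2, so A moves above B • B vs. D: B>D: 4, B<D: 1, so D stays below B • Intermediate lists: A B D, then A B C D • Result: A B C F E D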
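The insertion procedure can be sketched as below: each element of μ is appended at the bottom and bubbles up while a strict majority of the input lists prefer it to the element just above. The lists are the ones read from the slide's example (taking the sixth list as μ, which is an interpretation of the figure), and all lists are assumed to be full.

```python
def majority_prefers(x, y, lists):
    """True if a strict majority of the lists rank x above y."""
    wins = sum(1 for t in lists if t.index(x) < t.index(y))
    return 2 * wins > len(lists)

def local_kemenization(mu, lists):
    """Insert the elements of mu in order, bubbling each one up while
    a majority of the input lists prefer it to its predecessor."""
    result = []
    for x in mu:
        result.append(x)
        i = len(result) - 1
        while i > 0 and majority_prefers(x, result[i - 1], lists):
            result[i - 1], result[i] = result[i], result[i - 1]
            i -= 1
    return result

# Lists as read from the example slide (top rank first).
taus = [list("ABFECD"), list("BCAEFD"), list("ACFDEB"),
        list("BFDCAE"), list("CABFED")]
mu = list("BADCEF")

print("".join(local_kemenization(mu, taus)))  # ABCFED
```

Because an element stops as soon as one comparison fails, the output changes μ only where a majority of the lists support the change.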

  26. Experiments: meta-search • K = Kendall distance • SF = scaled footrule distance • IF = induced footrule distance • LK = local Kemenization
