
Kronecker Graphs

eryk

Presentation Transcript


  1. Kronecker Graphs

  2. The Kronecker Graph Model (R-MAT) • Start with a parameter matrix A • For n vertices, take repeated Kronecker products of A with itself until the matrix is n × n • Normalize the entries

  3. Generating Edges • One Method • Calculate the whole Kronecker matrix • Sample each edge independently with probability given by its entry • Another Method • Treat the parameters as probabilities • Flip coins for each edge
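
A minimal sketch of the coin-flipping method, assuming a 2 × 2 parameter matrix whose entries sum to 1 and a graph with 2^k vertices. The function names and the parameter values are illustrative, not taken from the slides. Each level of the recursion picks one quadrant, which fixes one bit of the source index and one bit of the target index, so the full Kronecker matrix is never built.

```python
import random

def rmat_edge(params, k, rng):
    """Pick one edge of a 2^k-vertex graph by choosing a quadrant k times.

    params is a 2x2 matrix of probabilities summing to 1 (assumption).
    """
    cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
    weights = [params[r][c] for r, c in cells]
    src = dst = 0
    for _ in range(k):
        r, c = rng.choices(cells, weights=weights)[0]
        src = 2 * src + r   # append one bit to the source index
        dst = 2 * dst + c   # append one bit to the target index
    return src, dst

def rmat_graph(params, k, num_edges, seed=0):
    """Sample num_edges edges independently (duplicates are possible)."""
    rng = random.Random(seed)
    return [rmat_edge(params, k, rng) for _ in range(num_edges)]

# Illustrative parameter values only.
A = [[0.57, 0.19], [0.19, 0.05]]
edges = rmat_graph(A, k=4, num_edges=20)
```

Because every edge is drawn independently, the work splits naturally across processes, which is the "parallel and distributed" advantage the next slide lists.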

  4. Features • Pro • Fast to generate: parallel and distributed • Few parameters to fit • Self-similarity • Con • Doesn’t have a power-law degree distribution • Parameters aren’t intuitive • May not be connected • Used in the Graph500 benchmark [Seshadri, Kolda, Pinar]

  5. Variance of Real Graphs [Moreno, Kirschner, Neville, Vishwanathan]

  6. Web Search and Ranking

  7. Web Search • Information Retrieval: given a query such as “Hugh Laurie”, find all documents that mention those words

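  8. Web Ranking Before 1998 • Use tf-idf (roughly) • Term frequency – inverse document frequency: $\text{tf-idf}(t, d) \approx \frac{\#\text{ of occurrences of } t \text{ in } d}{\#\text{ of occurrences of } t \text{ in } C}$, where $t$ is a term, $d$ a document, and $C$ the corpus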
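
As a rough sketch of the ratio on this slide (occurrences of a term in a document divided by its occurrences in the whole corpus), the snippet below scores three toy documents. The function name, the whitespace tokenization, and the sample corpus are assumptions made for illustration. Standard tf-idf uses a logarithmic inverse document frequency instead of this plain ratio.

```python
def rough_tf_idf(term, doc, corpus):
    """Occurrences of term in doc divided by its occurrences in the corpus."""
    in_doc = doc.lower().split().count(term)
    in_corpus = sum(d.lower().split().count(term) for d in corpus)
    return in_doc / in_corpus if in_corpus else 0.0

# Toy corpus, invented for this example.
corpus = [
    "hugh laurie plays house",
    "the house on the hill",
    "laurie wrote the gun seller",
]
scores = [rough_tf_idf("house", d, corpus) for d in corpus]
# scores == [0.5, 0.5, 0.0]
```

The first two documents tie even though only one of them is about the actor, which is the weakness of text-only ranking that the following slides pick up.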

  9. Results • It was bad • The best results for a topic may not mention the topic explicitly a lot

  10. What are we missing? • Traditional IR only has the text to work with • We have an information network • The hyperlinks are created by intelligent, rational beings!

  11. 1998 – HITS (J. Kleinberg) • What if we ranked documents by in-links? The power law distribution on in-degree will get us every time.

  12. HITS • Idea: Different pages and different links play different roles • Some pages are AUTHORITIES • Some pages are HUBS

  13. Hubs • What is a good hub? A page is a good hub if it points to many authorities.

  14. Authorities • What is a good authority? A page is a good authority if many hub pages point to it. How can we find good hubs and good authorities?

  15. HITS • Everyone starts with a hub-score of 1 and an authority-score of 1 • A-update: For each page p, auth(p) is the sum of the hub-scores of the pages that point to p. • H-update: For each page p, hub(p) is the sum of the auth-scores of the pages p points to.

  16. Formally • M is the adjacency matrix ($M_{ij} = 1$ if page i points to page j), h the hub-scores and a the auth-scores • A-update: $a = M^T h$ • H-update: $h = M a$ • How many iterations should we do? • Calculated on the subgraph that corresponds to the query at hand
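
The A-update and H-update can be sketched as follows. This is a minimal illustration, not the original implementation: the function name, the fixed iteration count, and the example graph are hypothetical, and the rescaling step after each round is an assumption added so the scores stay bounded (the slides do not say how scores are normalized).

```python
def hits(adj, iterations=20):
    """adj[i][j] == 1 when page i points to page j."""
    n = len(adj)
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iterations):
        # A-update: sum the hub-scores of the pages that point to p.
        auth = [sum(hub[q] for q in range(n) if adj[q][p]) for p in range(n)]
        # H-update: sum the auth-scores of the pages p points to.
        hub = [sum(auth[q] for q in range(n) if adj[p][q]) for p in range(n)]
        # Rescale so each score vector sums to 1 (assumption).
        a_total = sum(auth) or 1.0
        h_total = sum(hub) or 1.0
        auth = [x / a_total for x in auth]
        hub = [x / h_total for x in hub]
    return hub, auth

# Pages 0 and 1 act as hubs; pages 2, 3 and 4 only receive links.
adj = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
hub, auth = hits(adj)
```

Pages 2 and 3 end up with higher authority than page 4 because both hubs point to them, and page 1 is the better hub because it points to one more authority.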

  17. Where does HITS fail? • Assumes a bipartite clique structure to the web • Doesn’t allow more general forms of endorsement

  18. PageRank – try 1 • Instead of h and a scores, just one score. PR-update(p) = sum of normalized PR score of each page that points to p

  19. Where does this fail? Hint: The web graph is directed.

  20. Actual PageRank • Make the graph strongly connected by adding epsilon weight links between all pages. • Let A be the normalized adjacency matrix

  21. Calculating with the Power Method • Start with an initial score vector x • Calculate Ax • Add ε to every entry • Normalize and repeat • Repeat this for a chosen number of iterations
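
A minimal sketch of these power-method steps, assuming the graph is given as out-link lists and that "normalized" means each page splits its score evenly among its out-links. The function name, the value of epsilon, the iteration count, and the four-page example are illustrative choices, not values from the slides.

```python
def pagerank(links, epsilon=0.15, iterations=50):
    """links[p] lists the pages that p points to."""
    n = len(links)
    pr = [1.0 / n] * n                      # start with uniform scores
    for _ in range(iterations):
        nxt = [0.0] * n
        for p, outs in enumerate(links):
            for q in outs:
                nxt[q] += pr[p] / len(outs)  # multiply by the normalized adjacency matrix
        nxt = [x + epsilon for x in nxt]     # add epsilon to every entry
        total = sum(nxt)
        pr = [x / total for x in nxt]        # normalize so the scores sum to 1
    return pr

# Page 3 has no in-links; page 2 is pointed to by pages 0, 1 and 3.
links = [[1, 2], [2], [0], [2]]
pr = pagerank(links)
```

In this example page 2 receives the highest score and page 3, which nobody links to, receives only the epsilon share.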

  22. The Random Surfer Model • What natural process can justify PageRank? • How can we model how people might use the web?

  23. The Random Surfer • Starts at some page on the web • With probability (1-ε), selects a random link on the page and follows it • With probability ε, gets bored and jumps to some new random web page.

  24. The Random Surfer • The PageRank vector is the probability that you will visit each website in this process
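
The surfer process can also be simulated directly. The sketch below is a Monte Carlo illustration under assumptions: the function name, the jump probability of 0.15, the step count, and the example graph are all hypothetical, and a surfer on a page with no out-links is assumed to jump to a random page.

```python
import random

def random_surfer(links, epsilon=0.15, steps=200_000, seed=0):
    """Return the fraction of steps spent on each page.

    links[p] lists the pages that p points to.
    """
    rng = random.Random(seed)
    n = len(links)
    visits = [0] * n
    page = rng.randrange(n)                  # start at some page
    for _ in range(steps):
        visits[page] += 1
        if links[page] and rng.random() >= epsilon:
            page = rng.choice(links[page])   # follow a random link
        else:
            page = rng.randrange(n)          # bored (or stuck): jump anywhere
    return [v / steps for v in visits]

links = [[1, 2], [2], [0], [2]]
freq = random_surfer(links)
```

The visit frequencies approximate the PageRank vector: the page with no in-links is reached only by random jumps and is visited least.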

  25. Random Walks on Graphs [Figure: example graph with transition probabilities 1/3, 1, 1/3 and 1/2 on its edges]

  26. Stationary Distributions • What does this process converge to? • Connection between eigenvectors and stationary distributions. Why is the top eigenvalue always 1?

  27. Mixing Time • How long does it take to converge? • Why does PageRank converge quickly?

  28. Undirected Graphs • The stationary distribution is proportional to the degree
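
A short derivation of this claim, assuming the standard simple random walk in which each step moves to a uniformly random neighbor. The symbols ($m$ for the number of edges, $d_v$ for the degree of $v$, $P$ for the transition matrix, $\pi$ for the candidate distribution) are notation introduced here, not taken from the slides.

```latex
% Undirected graph with m edges; d_v is the degree of vertex v.
% Simple random walk: P_{uv} = 1/d_u if u and v are adjacent, 0 otherwise.
\[
  \pi_v = \frac{d_v}{2m}, \qquad
  (\pi P)_v = \sum_{u \sim v} \pi_u P_{uv}
            = \sum_{u \sim v} \frac{d_u}{2m} \cdot \frac{1}{d_u}
            = \frac{d_v}{2m} = \pi_v .
\]
% The entries sum to 1 because \sum_v d_v = 2m.
```

So $\pi$ is unchanged by one step of the walk, which is exactly what it means to be stationary.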

  29. Spectral Analysis for HITS

  30. Applications and Extensions

  31. Personalized PageRank • What if the surfer didn’t jump randomly? • s can be any distribution over the pages
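
A minimal sketch of personalized PageRank, assuming the surfer follows a random out-link with probability 1 - epsilon and otherwise jumps to a page drawn from s. The function name, epsilon value, iteration count, and example graph are illustrative, and sending the score of a page with no out-links back through s is an assumption.

```python
def personalized_pagerank(links, s, epsilon=0.15, iterations=100):
    """links[p] lists the pages p points to; s is the jump distribution."""
    n = len(links)
    pr = list(s)
    for _ in range(iterations):
        nxt = [epsilon * s[q] for q in range(n)]      # jump according to s
        for p, outs in enumerate(links):
            if outs:
                share = (1 - epsilon) * pr[p] / len(outs)
                for q in outs:
                    nxt[q] += share                   # follow a random link
            else:
                for q in range(n):                    # no out-links: jump via s (assumption)
                    nxt[q] += (1 - epsilon) * pr[p] * s[q]
        pr = nxt
    return pr

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
links = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
s = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]                    # always jump back to page 0
ppr = personalized_pagerank(links, s)
```

Most of the score stays in the triangle containing page 0, which is why personalized PageRank is useful for local community detection.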

  32. Uses of Personalized PageRank • Creating personalized search results • Topic-sensitive PageRank • Local community detection • Can you compute it more efficiently than PageRank?

  33. The Intentional Surfer • Click data is collected by • Google/Bing Toolbar • Cookies from ad websites • Can use this to get better estimates for click-through rates of each link • Modifies our transition probabilities to improve PageRank

  34. Search Engine Optimization • Designing your page with the ranking function in mind • Co-evolves with search engines • Obvious Tricks • Make a collection of websites to point to you • Buy old webpages • Include text in background color font • Paying others to link to you

  35. Link Spam Detection [Figure: a spam region within the web graph]

  36. Connection to HITS • If you link to a lot of spam sites, you are probably also spam. (Hub) • If you are linked to by lots of spam sites, you are probably why that spam collection was built. (Authority) • Start with seed sites with Hub, Authority scores of 1.

  37. Trust Propagation • Given some information (i trusts j) or (i does not trust j), how can we model trust in a network?

  38. Types of Trust Propagation • Direct Propagation • Transpose Propagation • Co-citation • Trust Coupling [Figure: node diagrams over i, j, k and m illustrating each type]
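
One common way to express these four propagation types is as matrix operators built from a trust matrix B, where B[i][j] = 1 means i trusts j. The sketch below assumes that formulation (direct: B, transpose: Bᵀ, co-citation: BᵀB, trust coupling: BBᵀ). The slides do not give formulas, so the operators, helper names, and example graph here are assumptions for illustration.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

# B[i][j] == 1 when i trusts j.
# Edges: 0 trusts 1, 0 trusts 2, 1 trusts 3, 3 trusts 2.
B = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
Bt = transpose(B)

operators = {
    "direct": B,                   # i trusts j, j trusts k  =>  i trusts k
    "transpose": Bt,               # i trusts j              =>  j trusts i
    "co-citation": matmul(Bt, B),  # links nodes trusted by a common node
    "coupling": matmul(B, Bt),     # links nodes that trust a common node
}

# One step of propagation: combine existing trust with each operator.
propagated = {name: matmul(B, op) for name, op in operators.items()}
```

For example, `propagated["direct"][0][3]` is 1 because 0 trusts 1 and 1 trusts 3, and `operators["coupling"][0][3]` is 1 because 0 and 3 both trust 2.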

  39. Distrust Propagation • Trust Only • 1-Step Distrust • Propagated Distrust

  40. Propagating Trust and Distrust • Eigenvalue Propagation • Weighted Linear Combination How do you round this matrix to give trust/distrust?

  41. Experiments • Epinions ‘web-of-trust’ • 841,372 edges labeled + or -. Try all combinations of trust and distrust propagation. What is the best model?

  42. Project Proposals • Email by 9/26 to: isabelle@eecs.berkeley.edu, anirban.dasgupta+cs294@gmail.com
