
Egalitarian engines?

Egalitarian engines? S. Fortunato, A. Flammini, F. Menczer & A. Vespignani. Outline: search engines; the Google revolution: PageRank; popularity bias; the feared scenario: Googlearchy!; empirical test: Googlocracy?; the importance of query topics; outlook.



Presentation Transcript


  1. Egalitarian engines? S. Fortunato, A. Flammini, F. Menczer & A. Vespignani

  2. Outline • Search engines • The Google revolution: PageRank • Popularity bias • The feared scenario: Googlearchy! • Empirical test: Googlocracy? • The importance of query topics • Outlook

  3. Search Engines “A search engine is a program designed to help find information stored on a computer system such as the World Wide Web …” (Wikipedia) First search engine: Archie (1990, Internet). First WWW search engine: Wandex (1993).

  4. Timeline

  5. The revolution Web pages → nodes; hyperlinks → edges. Invented by S. Brin and L. Page (1998). Novelty: for the first time, a search engine ranks pages according to their relevance in the graph topology of the Web!

  6. Degree distribution of the Web Graph

  7. PageRank It is the prestige measure used by Google to rank Web pages. p(i) ~ probability that a user browsing the Web by clicking from one page to another (i.e. by following hyperlinks) visits page i.
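
The random-surfer picture above can be sketched as a power iteration on a toy graph. This is a minimal illustration, not Google's implementation: the `damping` factor (the surfer occasionally jumps to a random page), the uniform handling of pages without out-links, and the four-page `toy_web` are all assumptions added here and are not on the slide.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate p(i) for a graph given as {page: [pages it links to]}.

    `damping` is an assumed parameter: with probability 1 - damping the
    surfer jumps to a uniformly random page instead of following a link.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for q in targets:
                # A page passes its rank on in equal shares to its targets.
                new_rank[q] += damping * rank[p] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-page Web: "c" has the most incoming links.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph the page with the most incoming links, `"c"`, ends up with the largest value, and the values sum to 1, as expected for a visit probability.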

  8. Theoretical/empirical result: the PageRank of a page is approximately proportional to the number of incoming links of the page (link popularity or in-degree)

  9. Google recipe: Web pages are ranked according to their in-degree. Other factors play a role in the ranking, but PageRank is the only factor that treats Web pages like points of a graph, regardless of their semantic features. How attractive are Web pages for users? Traffic

  10. Traffic is related to the frequency of visits of Web pages by users. Operational definition: the traffic t to a page is the fraction of all clicks that go to that page within some period. Question: how does the traffic t grow with the link popularity (in-degree) k of a page?

  11. Null model: in a world where people navigate the Web only by browsing, the traffic t to a page is just the probability of visiting that page during this process → PageRank ~ in-degree. Null model prediction → t ~ k. In the real Web, navigation by searching is replacing navigation by browsing. What are the consequences for the relation between t and k? Do search engines introduce a popularity bias?

  12. There are three possible scenarios: • t ~ k → no bias; • t ~ k^α with α > 1 → googlearchy; • t ~ k^α with α < 1 → googlocracy.
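
One way to tell the three scenarios apart is to estimate the exponent α in t ~ k^α as the slope of log t against log k. The sketch below does this with an ordinary least-squares fit. The function names, the `tolerance` band around α = 1, and the synthetic data are illustrative assumptions, not the authors' fitting procedure.

```python
import math


def fit_exponent(in_degrees, traffic):
    """Least-squares slope of log(t) against log(k), an estimate of alpha."""
    xs = [math.log(k) for k in in_degrees]
    ys = [math.log(t) for t in traffic]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def classify(alpha, tolerance=0.05):
    """Map an exponent to a scenario; `tolerance` is an arbitrary assumed band."""
    if alpha > 1 + tolerance:
        return "googlearchy"
    if alpha < 1 - tolerance:
        return "googlocracy"
    return "no bias"


# Synthetic, made-up data: exact power laws with exponents 1.5 and 0.5.
ks = [1, 10, 100, 1000]
superlinear = [1e-6 * k ** 1.5 for k in ks]
sublinear = [1e-6 * k ** 0.5 for k in ks]
```

On real data the points scatter around the line, so the fitted slope would need an error estimate before it is compared with 1.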

  13. The feared scenario: Googlearchy

  14. Search Dominant Model All users discover and navigate the Web by submitting queries to search engines and looking at the results. Two empirical ingredients: • Distribution of clicks on the hits of a hit list; • Relation between the rank of a hit in a hit list and its PageRank/in-degree.

  15. The fraction of clicks on a hit is our traffic t. Hits are identified by their rank r in the list.

  16. By ordering all Web pages in decreasing order of in-degree, a page with in-degree k gets a rank determined by the cumulative degree distribution. Googlearchy: search engines boost the popularity of the most popular pages much faster than simply surfing the Web!
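
The step from rank to in-degree can be sketched under two assumptions that are not spelled out in this transcript: clicks decay as a power law of rank with exponent β, and the in-degree distribution is a power law with exponent γ. The symbols β, γ and N (the number of pages) are introduced here for illustration only.

```latex
\begin{align}
t &\sim r^{-\beta}, \\
r(k) &\simeq N \int_{k}^{\infty} P(k')\,dk' \sim k^{-(\gamma-1)}
  \quad \text{for } P(k) \sim k^{-\gamma},\ \gamma > 1, \\
t &\sim k^{\beta(\gamma-1)} .
\end{align}
```

Under these assumptions the googlearchy scenario corresponds to β(γ − 1) > 1, that is, traffic growing faster than linearly with in-degree.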

  17. Empirical test of popularity bias • 28,124 sites • Traffic from Alexa • In-degree from Google and Yahoo • Analysis repeated after 2 months

  18. Data vs. Models Googlocracy?

  19. What are we missing? L: at hit list level; G: overall. The two relations cannot be combined!

  20. The importance of query topics Hit lists depend on the interests of the users and can be of various sizes. In particular, very specific queries lead to small hit lists, which often contain less popular Web sites/pages. Similarly, it is unlikely that small hit lists contain very popular sites/pages.

  21. Hit list size distribution

  22. Our model • “Artificial” Web with N pages, labeled from 1 to N; • At each step, a hit list is created such that: 1) all pages have the same probability to appear in the hit list; 2) the size of the hit list is taken from the empirical distribution; • For each hit list, clicks are distributed among the hits according to the empirical distribution; • After a sufficient number of hit lists has been created, we check how many clicks went to a page with label/rank r.
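
The procedure above can be sketched as a short simulation. The two empirical distributions are not given in this transcript, so the sketch uses placeholders: `sizes` and `size_weights` stand in for the hit list size distribution, and a power law in list position with exponent `click_exponent` stands in for the click distribution. It also assumes that hits inside a list are shown in order of their global label/rank. All names and numbers are hypothetical.

```python
import random


def simulate_clicks(n_pages, n_lists, sizes, size_weights, click_exponent, seed=0):
    """Return clicks[r], the total click share collected by the page labeled r."""
    rng = random.Random(seed)
    clicks = [0.0] * (n_pages + 1)  # index 0 unused; label r doubles as global rank
    for _ in range(n_lists):
        # Placeholder for drawing the list size from the empirical distribution.
        size = min(rng.choices(sizes, weights=size_weights)[0], n_pages)
        # Every page is equally likely to appear; hits are sorted by global rank.
        hits = sorted(rng.sample(range(1, n_pages + 1), size))
        # Placeholder click distribution: weight decays with position in the list.
        weights = [(pos + 1) ** -click_exponent for pos in range(size)]
        total = sum(weights)
        for page, weight in zip(hits, weights):
            clicks[page] += weight / total  # each hit list hands out one unit of clicks
    return clicks


# Made-up parameters, chosen only to make the example run quickly.
clicks = simulate_clicks(
    n_pages=200,
    n_lists=2000,
    sizes=[1, 5, 20, 100],
    size_weights=[0.4, 0.3, 0.2, 0.1],
    click_exponent=1.5,
)
```

Plotting `clicks[r]` against r on log-log axes would give the model curve to set against the data; with real inputs, the two placeholder distributions would be replaced by the measured ones.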

  23. Data vs “Semantically Correct” model

  24. Conclusions • The use of search engines partially mitigates the rich-get-richer nature of the Web, giving new sites an increased chance of being discovered (compared to surfing alone), as long as they are about specific topics that match the interests of users. • The combination of (i) how search engines index and rank results, (ii) what queries users submit, and (iii) how users view the results, leads to an egalitarian effect (“Googlocracy”).

  25. Reactions

  26. Looks scientific, but actually biased, and not right! A research floats "The Egalitarian Effects of Search Engines" Being on good terms with Google and googling people, including blogger and blogging people, still did not stop me from thinking: streets are much better than the rough roads of the past centuries, too! That is what I thought after reading the full text of the research paper. I do not think the survey methods were right, though they looked very scientific. I have made an experiment with Google page ranking; here is a look at what "egalitarian" Google is like in reality:
