
Getting to know the Web

Getting to know the Web. How big is the web and how do you measure it? How many people use the web? How many use search engines? What is the shape of the web? How hard is it to go from one page to another? How do people search for information? Can we categorize web searchers?


Presentation Transcript


  1. Getting to know the Web • How big is the web and how do you measure it? • How many people use the web? • How many use search engines? • What is the shape of the web? • How hard is it to go from one page to another? • How do people search for information? • Can we categorize web searchers? • Differences b/w web search & Information Retrieval. • Differences between global and local search. • Differences between search and navigation.

  2. How big is the web? • Number of accessible web pages – May 2005 estimate: 11.5 Billion pages Most recent estimates? ________ • The deep (or hidden or invisible) web “contains 400-550 times more information” (Are they serious?) • Coverage (i.e. the proportion of the web indexed) is crucial for search engines. Today, ____________ pages are indexed

  3. How do you measure the size of the web? • Capture-recapture method • SE1 = # of pages indexed by search engine 1. • QSE2 = # of pages returned by search engine 2 for typical queries. • OVR = # of pages returned by both search engines for typical queries. • Estimate: SE1 / WWW = OVR / QSE2 => WWW = (SE1 x QSE2) / OVR [Figure: diagram labeled WWW, SE1, QSE2, OVR] Lawrence & Giles: Searching the WWW
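The estimate on this slide is a one-line calculation. The sketch below is only an illustration of that formula: the function name and the input numbers are hypothetical, not figures from Lawrence & Giles.

```python
def estimate_web_size(se1_indexed: int, qse2_returned: int, overlap: int) -> float:
    """Capture-recapture estimate from the slide: WWW = (SE1 x QSE2) / OVR."""
    if overlap <= 0:
        # With no overlap the ratio OVR / QSE2 is zero and no estimate exists.
        raise ValueError("overlap must be positive to form an estimate")
    return se1_indexed * qse2_returned / overlap


# Hypothetical inputs, chosen only to show the arithmetic.
estimate = estimate_web_size(se1_indexed=100_000_000, qse2_returned=2_000, overlap=500)
print(f"Estimated number of pages: {estimate:,.0f}")
```

A small overlap relative to QSE2 means search engine 1 covers only a small share of the web, so the estimate grows accordingly.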

  4. Relative Size from Overlap: A ∩ B • Sample URLs randomly from A • Check if contained in B, and vice versa • A ∩ B = (1/2) * Size A • A ∩ B = (1/6) * Size B • (1/2) * Size A = (1/6) * Size B ∴ Size A / Size B = (1/6) / (1/2) = 1/3 • Each test involves: (i) Sampling (ii) Checking (Assume for now that we can do them reliably)
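The sampling-and-checking test can be sketched as follows. This is a toy version: the two "indexes" are in-memory sets of made-up URLs, whereas sampling from and checking against a real search engine is much harder (the slide explicitly sets that problem aside). All names here are assumptions made for illustration.

```python
import random


def fraction_contained(sample_from, other, sample_size, rng):
    """Sample URLs from one index; return the fraction also found in the other."""
    sample = rng.sample(sorted(sample_from), sample_size)
    return sum(url in other for url in sample) / sample_size


def relative_size(index_a, index_b, sample_size, rng):
    """Estimate Size A / Size B from the two overlap fractions.

    If A ∩ B = fa * Size A and A ∩ B = fb * Size B, then Size A / Size B = fb / fa.
    """
    frac_a_in_b = fraction_contained(index_a, index_b, sample_size, rng)
    frac_b_in_a = fraction_contained(index_b, index_a, sample_size, rng)
    if frac_a_in_b == 0:
        raise ValueError("no sampled URL from A was found in B")
    return frac_b_in_a / frac_a_in_b


# Toy indexes: A has 300 pages, B has 900, and they share 150,
# so half of A and one sixth of B lie in the overlap, as on the slide.
index_a = {f"http://example.org/page{i}" for i in range(300)}
index_b = {f"http://example.org/page{i}" for i in range(150, 1050)}

ratio = relative_size(index_a, index_b, sample_size=200, rng=random.Random(0))
print(f"Estimated Size A / Size B: {ratio:.2f} (true value is 1/3)")
```

Because the fractions come from random samples, the printed ratio only approximates 1/3; larger samples tighten the estimate.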

  5. How many people use the web? SEs? • Over 10% of the world’s population were online as of 2004. Today? ________ • Number of broadband users is growing (over 50% of connected Americans use broadband). • Search engine share as of June 2004: • Google (41.6%), Yahoo! (31.5%), MSN (27.4%), AOL (13.6%), Ask Jeeves (7%) Today? _______ • 200 million hits per day to Google (mid 2004). Today? ___

  6. What is the shape of the web? “Map of the Internet” (1998)

  7. Consider Web sites • Look at paths and strongly connected components

  8. What is the shape of the web? Bow-tie shape of the web Broder et al.: Graph structure in the web (2000)
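To make the bow-tie concrete, here is a minimal sketch that splits a small directed link graph into its largest strongly connected component (the knot of the bow-tie), the pages that can reach it (IN), and the pages reachable from it (OUT). The graph, the function names, and the "OTHER" bucket (which lumps together tendrils, tubes and disconnected pages) are assumptions for illustration. The brute-force search over every start node suits toy graphs only and is not the method used in the paper.

```python
from collections import deque


def reachable(graph, start):
    """Return the set of nodes reachable from start by following links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Return the graph with every link reversed; it contains every node as a key."""
    rev = {}
    for src, targets in graph.items():
        rev.setdefault(src, [])
        for dst in targets:
            rev.setdefault(dst, []).append(src)
    return rev


def bow_tie(graph):
    """Split nodes into SCC, IN, OUT and OTHER around the largest SCC."""
    rev = reverse(graph)
    nodes = set(rev)
    if not nodes:
        return {"SCC": set(), "IN": set(), "OUT": set(), "OTHER": set()}
    best = None
    for node in sorted(nodes):
        fwd = reachable(graph, node)   # pages this node can reach
        bwd = reachable(rev, node)     # pages that can reach this node
        scc = fwd & bwd                # mutually reachable pages
        if best is None or len(scc) > len(best[0]):
            best = (scc, fwd, bwd)
    core, fwd, bwd = best
    in_set, out_set = bwd - core, fwd - core
    return {"SCC": core, "IN": in_set, "OUT": out_set,
            "OTHER": nodes - core - in_set - out_set}


# Hypothetical link graph: a -> b -> c -> a is the core.
links = {"i": ["a"], "a": ["b"], "b": ["c"], "c": ["a", "o"], "o": [], "t": ["u"]}
for part, pages in bow_tie(links).items():
    print(part, sorted(pages))
```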

  9. But Why is it a Bowtie? • Maybe it is a teapot? A daisy? A cauliflower? • It is a collection of Bowties, because it could not be anything else • Proof by construction

  10. Bowtie Web: Proof by Construction • Start by considering one link per page • Pseudo-trees appear
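One way to see why one link per page yields pseudo-trees: in a finite set of pages where each page links to exactly one page in the set, following links from any page must eventually repeat a page, so every page leads into a single cycle. The sketch below illustrates that observation under these assumptions; the page names and the `next_page` mapping are hypothetical, and this is not necessarily the construction the slides have in mind.

```python
def find_cycle(next_page, start):
    """Follow the single outgoing link from start until a page repeats.

    Returns the pages on the cycle that the walk ends in, in visiting order.
    Assumes every link target is itself a key of next_page.
    """
    order = []       # pages in the order visited
    position = {}    # page -> index in order
    page = start
    while page not in position:
        position[page] = len(order)
        order.append(page)
        page = next_page[page]
    # The first repeated page marks where the cycle begins.
    return order[position[page]:]


# Hypothetical pages, each with exactly one outgoing link.
next_page = {"a": "b", "b": "c", "c": "b", "d": "a", "e": "e"}
for start in sorted(next_page):
    print(start, "leads into cycle", find_cycle(next_page, start))
```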

  11. The second link creates a Bowtie

  12. How hard is it to go from one page to another? • Over 75% of the time there is no directed path from one random web page to another. • When a directed path exists, its average length is 16 clicks. • When an undirected path exists, its average length is 7 clicks. • A short average path between pairs of nodes is characteristic of a small-world network. Kleinberg: The small-world phenomenon (we will revisit later)
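The difference between directed and undirected paths can be illustrated with a breadth-first search over a tiny, made-up link graph. The page names and helper functions below are assumptions for illustration; the 16-click and 7-click averages on the slide come from measurements of the real web, not from this toy.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Return the fewest clicks from source to target, or None if no path exists."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def undirected(graph):
    """Return a copy of the graph in which every link can be followed both ways."""
    und = {}
    for src, targets in graph.items():
        und.setdefault(src, set())
        for dst in targets:
            und[src].add(dst)
            und.setdefault(dst, set()).add(src)
    return und


# Hypothetical links: a -> b -> c, and d -> c.
links = {"a": ["b"], "b": ["c"], "d": ["c"]}

print(shortest_path_length(links, "a", "c"))              # directed path exists
print(shortest_path_length(links, "a", "d"))              # no directed path
print(shortest_path_length(undirected(links), "a", "d"))  # undirected path exists
```

Page d is unreachable from a by clicking links, yet the two are only a few hops apart once link direction is ignored, which is the pattern the slide describes at web scale.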

  13. How do people search for information? • Direct navigation • Enter the URL directly into the browser. • Navigation within a directory • Use a web portal as an entry point to the web. • Information seeking on the web is problematic and more users are turning to search engines. Broder: A taxonomy of web search

  14. Can we categorize web searchers? Broder: A taxonomy of web search • Informational ____ % • acquire some information about a topic from web pages. • Navigational ____ % • find a site to start navigation from. • Transactional ____ % • perform some activity mediated by a web site. Think of your own searches. Do you agree? How did Broder find these categories? How did he measure the percentages?

  15. Web search vs. Info Retrieval • The scale of web search is way beyond traditional information retrieval. • The web is very dynamic. • The web contains an enormous amount of duplication. • The quality of web pages is not uniform. • The range of topics on the web is open. • The web is globally distributed. • Users' typical habits are different (short queries, inspect only top-10 pages). • The web is hypertextual.

  16. Differences b/w global & local search • Local search engines on web sites have a bad reputation. • Users often use a web search engine such as Google or Yahoo! to find information on web sites, rather than the local web site search engine. • Many companies do not invest in local search. • Content management is a problem. • Language may be a problem. • Information needs on web sites may be different.

  17. Differences b/w search & navigation • Search – • employing a search engine to find information. • Navigation (or surfing) – • employing a link-following strategy to find information. • The web encourages a combination of search, navigation and browsing.
