Dive into the dimensions of the web - measuring its vastness, exploring user behaviors, and understanding the intricacies of search vs. navigation. Learn about the web's structure, user habits, search engine dynamics, and more.
Getting to know the Web • How big is the web and how do you measure it? • How many people use the web? • How many use search engines? • What is the shape of the web? • How hard is it to go from one page to another? • How do people search for information? • Can we categorize web searchers? • Differences between web search and information retrieval. • Differences between global and local search. • Differences between search and navigation.
How big is the web? • Number of accessible web pages – May 2005 estimate: 11.5 Billion pages Most recent estimates? ________ • The deep (or hidden or invisible) web “contains 400-550 times more information” (Are they serious?) • Coverage (i.e. the proportion of the web indexed) is crucial for search engines. Today, ____________ pages are indexed
How do you measure the size of the web? • Capture-recapture method • SE1 = # of pages indexed by search engine 1. • QSE2 = # of pages returned by search engine 2 for typical queries. • OVR = # of pages returned by both search engines for typical queries. • Estimate: SE1 / WWW = OVR / QSE2 => WWW = (SE1 x QSE2) / OVR [Figure: diagram labeled WWW, OVR, SE1, QSE2] Lawrence & Giles: Searching the WWW
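The estimate on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `estimate_web_size` and the input numbers are made up for the example and do not come from Lawrence & Giles.

```python
def estimate_web_size(se1: int, qse2: int, ovr: int) -> float:
    """Capture-recapture estimate: WWW = (SE1 x QSE2) / OVR.

    se1  -- pages indexed by search engine 1
    qse2 -- pages returned by search engine 2 for the test queries
    ovr  -- pages returned by both engines for the test queries
    """
    if ovr <= 0:
        # With no overlap the ratio OVR / QSE2 is zero and the estimate is undefined.
        raise ValueError("overlap must be positive to form an estimate")
    return se1 * qse2 / ovr


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
estimate = estimate_web_size(se1=1_000_000, qse2=500, ovr=100)
print(f"Estimated web size: {estimate:,.0f} pages")
```

Here search engine 1 covers 100 of the 500 sampled pages, i.e. one fifth, so the web is estimated at five times the size of its index.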
Relative Size from Overlap • Sample URLs randomly from A • Check if contained in B and vice versa • A ∩ B = (1/2) * Size A • A ∩ B = (1/6) * Size B • (1/2) * Size A = (1/6) * Size B ∴ Size A / Size B = (1/6) / (1/2) = 1/3 • Each test involves: (i) Sampling (ii) Checking (Assume for now that we can do them reliably)
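The same ratio can be reproduced on toy data. This is a minimal sketch under stated assumptions: the two "indexes" are small integer sets invented for the example, and the whole index is checked instead of a random sample so that the fractions come out exactly as on the slide.

```python
from fractions import Fraction


def fraction_contained(sample, other) -> Fraction:
    """Fraction of the sampled URLs that are also present in the other index."""
    hits = sum(1 for url in sample if url in other)
    return Fraction(hits, len(sample))


def relative_size(frac_a_in_b: Fraction, frac_b_in_a: Fraction) -> Fraction:
    """Size A / Size B, derived from frac_a_in_b * |A| = |A ∩ B| = frac_b_in_a * |B|."""
    return frac_b_in_a / frac_a_in_b


# Hypothetical indexes: integers stand in for URLs.
index_a = set(range(0, 300))      # 300 pages
index_b = set(range(150, 1050))   # 900 pages, 150 of them shared with A

frac_a_in_b = fraction_contained(index_a, index_b)   # 1/2 of A is in B
frac_b_in_a = fraction_contained(index_b, index_a)   # 1/6 of B is in A
ratio = relative_size(frac_a_in_b, frac_b_in_a)
print(ratio)
```

In practice only a random sample of each index would be checked, so the two fractions, and hence the ratio, would carry sampling error.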
How many people use the web? SEs? • Over 10% of the world’s population were online as of 2004. Today? ________ • Number of broadband users is growing (over 50% of connected Americans use broadband). • Search engine share as of June 2004: • Google (41.6%), Yahoo! (31.5%), MSN (27.4%), AOL (13.6%), Ask Jeeves (7%) Today? _______ • 200 million hits per day to Google (mid 2004). Today? ___
What is the shape of the web? “Map of the Internet” (1998)
Consider Web sites • Look at paths and strongly connected components
What is the shape of the web? Bow-tie shape of the web Broder et al.: Graph structure of the web (2000)
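One way to see where the bow-tie comes from is to split a small directed graph by reachability from a page in the core. The sketch below is a simplification, not the procedure used by Broder et al.: the five-page graph is invented, the core page is assumed to be known in advance, and everything outside SCC, IN and OUT is lumped into a single OTHER group.

```python
from collections import deque


def reachable(graph, start):
    """Set of nodes reachable from start by following directed links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Same graph with every link reversed."""
    rev = {}
    for src, targets in graph.items():
        rev.setdefault(src, [])
        for dst in targets:
            rev.setdefault(dst, []).append(src)
    return rev


def bow_tie(graph, core_node):
    """Split nodes into SCC, IN, OUT and OTHER relative to core_node's component."""
    forward = reachable(graph, core_node)             # pages the core can reach
    backward = reachable(reverse(graph), core_node)   # pages that can reach the core
    scc = forward & backward
    nodes = set(graph) | {dst for targets in graph.values() for dst in targets}
    return {
        "SCC": scc,
        "IN": backward - scc,
        "OUT": forward - scc,
        "OTHER": nodes - forward - backward,
    }


# Hypothetical five-page web: a links into the core {b, c}, which links out to d.
toy_web = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": [], "e": []}
parts = bow_tie(toy_web, core_node="b")
print(parts)
```

Pages in SCC can all reach each other, IN pages can reach the core but not be reached from it, and OUT pages can be reached from the core but cannot get back.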
But why is it a Bowtie? • Maybe it is a teapot, a daisy? A cauliflower? • It is a collection of Bowties, because it could not be anything else • Proof by construction
Bowtie Web: Proof by Construction • Start by considering one link per page • Pseudo-trees appear
How hard is it to go from one page to another? • Over 75% of the time there is no directed path from one random web page to another. • When a directed path exists, its average length is 16 clicks. • When an undirected path exists, its average length is 7 clicks. • A short average path between pairs of nodes is characteristic of a small-world network. Kleinberg: The small-world phenomenon (we will revisit later)
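The difference between directed and undirected paths can be illustrated with a breadth-first search on a toy graph. This is a sketch only: the three-page graph and the function names are invented for the example and say nothing about the real measurements.

```python
from collections import deque


def shortest_path_length(graph, src, dst):
    """Number of clicks on a shortest path from src to dst, or None if no path exists."""
    distance = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return distance[node]
        for nxt in graph.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return None


def as_undirected(graph):
    """Copy of the graph in which every link can be followed in both directions."""
    undirected = {node: set(targets) for node, targets in graph.items()}
    for src, targets in graph.items():
        for dst in targets:
            undirected.setdefault(dst, set()).add(src)
    return undirected


# Hypothetical pages: a and c both link to b, and b links nowhere.
links = {"a": ["b"], "b": [], "c": ["b"]}

directed = shortest_path_length(links, "a", "c")
undirected = shortest_path_length(as_undirected(links), "a", "c")
print(directed, undirected)
```

Following links forward, a can never reach c, but once links may also be traversed backward the two pages are two steps apart through b.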
How do people search for information? • Direct navigation • Enter the URL directly into the browser. • Navigation within a directory • Use a web portal as an entry point to the web. • Information seeking on the web is problematic and more users are turning to search engines. Broder: A taxonomy of web search
Can we categorize web searchers? Broder: A taxonomy of web search • Informational ____ % • acquire some information about a topic from web pages. • Navigational ____ % • find a site to start navigation from. • Transactional ____ % • perform some activity mediated by a web site. Think of your own searches. Do you agree? How did Broder find these categories? How did he measure the percentages?
Web search vs. Info Retrieval • The scale of web search is way beyond traditional information retrieval. • The web is very dynamic. • The web contains an enormous amount of duplication. • The quality of web pages is not uniform. • The range of topics on the web is open. • The web is globally distributed. • Users' typical habits are different (short queries, inspect only top-10 pages). • The web is hypertextual.
Differences b/w global & local search • Local search engines on web sites have a bad reputation. • Users often use a web search engine such as Google or Yahoo! to find information on web sites, rather than the local web site search engine. • Many companies do not invest in local search. • Content management is a problem. • Language may be a problem. • Information needs on web sites may be different.
Differences b/w search & navigation • Search – • employing a search engine to find information. • Navigation (or surfing) – • employing a link-following strategy to find information. • The web encourages a combination of search, navigation and browsing.