The “Deep Web”

ISC 110 Final Project Kaila Ryan - 12/12/2013 The “Deep Web”

What is the “Deep Web”?
• Web content that is hidden behind an HTML form and generally cannot be indexed by search engines (Madhavan et al., 2008).
• Largely made up of web-connected databases (Wright, 2009), such as:
  • Shopping catalogs
  • Scientific research data
  • Public transport information, etc.
• Requires “valid input values” to access (Madhavan et al., 2008): in other words, a query or a similar form of typed input.
• Web crawlers are not yet sophisticated enough to automate the formulation of relevant queries, so they cannot reach this data.
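As a rough illustration of what “hidden behind an HTML form” means in practice, the sketch below scans a page for form fields that expect typed input. The sample page, the `FormFinder` class, and the `query_inputs` helper are hypothetical names made up for this example; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collects the names of text-like inputs found inside <form> elements."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.form_inputs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form:
            # An input with no type attribute defaults to a text field.
            if attrs.get("type", "text") in ("text", "search"):
                self.form_inputs.append(attrs.get("name", ""))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def query_inputs(html):
    """Return the names of the typed-input fields found in the page's forms."""
    finder = FormFinder()
    finder.feed(html)
    return finder.form_inputs


# Hypothetical page: one ordinary link plus a catalog search form.
SAMPLE_PAGE = """
<html><body>
  <a href="/about">About</a>
  <form action="/search" method="get">
    <input type="text" name="q">
    <input type="submit" value="Search">
  </form>
</body></html>
"""

print(query_inputs(SAMPLE_PAGE))  # ['q']
```

A crawler can follow the `/about` link on its own, but whatever sits behind the `q` field only appears once someone supplies a value for it.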

A bit about search engines...
• Most modern search engines use automated “web crawler” programs to index websites.
• Crawlers follow a “trail” of links from webpage to webpage, indexing each new page they find so that it becomes searchable and part of the “surface web” (Wright, 2009).
• Because of how they function, traditional crawling methods fail to index some documents, such as databases, which require specific queries to access the information they contain:
  • It is impossible (or at least inefficient and impractical) to use every possible query on every database found.
  • Figuring out how to narrow the possible queries down to relevant terminology has been challenging.
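The link-following behavior described above can be sketched with a tiny in-memory “web”. Everything here is an assumption made for illustration: the `LINKS` table, the page paths, and the `crawl` function are invented, and a real crawler would fetch pages over the network rather than read a dictionary.

```python
from collections import deque

# Hypothetical miniature web: each page maps to the pages it links to.
# "/catalog?item=42" exists, but no page links to it; it is only
# reachable by submitting a query to the form on "/search".
LINKS = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],
    "/catalog?item=42": [],
}


def crawl(start, links):
    """Follow links breadth-first and return pages in the order indexed."""
    indexed = []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        indexed.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed


surface = crawl("/", LINKS)
deep = sorted(set(LINKS) - set(surface))

print(surface)  # ['/', '/about', '/search']
print(deep)     # ['/catalog?item=42']
```

The catalog page is never indexed because no trail of links leads to it, which is the sense in which link-following alone leaves database content in the Deep Web.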

Finding the Deep Web:
• No single, exhaustive method of locating this data is available yet.
• Many competing theories and projects are working toward functioning Deep Web crawlers and search engines.
• Primary methods of locating Deep Web content at present:
  • Directories, like “The Hidden Wiki” (requires Tor browser)
  • Referral by current users of a particular site/service/database
• Many in the field of Information Science are focused on developing technology capable of “surfacing” Deep Web content, through new methods of locating and querying databases and indexing the results of these queries.
  • Google has a team dedicated specifically to this task.
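To make “querying databases and indexing the results” more concrete, here is a deliberately simplified sketch of one possible probing loop: try a seed word, index whatever records come back, and reuse words from those records as the next queries. It is not the method of any particular project. The `RECORDS` data, `submit_form`, and `surface` are hypothetical stand-ins for a real form-backed database.

```python
# Hypothetical database hidden behind a search form: record id -> text.
RECORDS = {
    1: "bus timetable downtown station",
    2: "train timetable airport station",
    3: "airport shuttle fares",
    4: "museum opening hours",
}


def submit_form(query):
    """Stand-in for submitting the form; returns records containing the word."""
    return {rid: text for rid, text in RECORDS.items() if query in text.split()}


def surface(seed_words, max_queries=10):
    """Probe the form with candidate words and index whatever comes back."""
    index = {}                  # record id -> text, i.e. the surfaced content
    pending = list(seed_words)  # candidate queries still to try
    tried = set()
    while pending and len(tried) < max_queries:
        word = pending.pop(0)
        if word in tried:
            continue
        tried.add(word)
        for rid, text in submit_form(word).items():
            if rid not in index:
                index[rid] = text
                # Words seen in new results become further candidate queries.
                pending.extend(w for w in text.split() if w not in tried)
    return index


found = surface(["timetable"])
print(sorted(found))  # [1, 2, 3]
```

Starting from the single word “timetable”, the loop reaches three of the four records. The museum record shares no vocabulary with the others, so it stays hidden, which hints at why choosing relevant queries is the hard part.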

The Deep Web's value:
• You may be asking yourself, “Why should we bother surfacing the 'Deep Web'? What is it worth to us?”
• The ability to automate database querying and indexing opens up the potential for automated cross-referencing of otherwise unconnected databases.
  • Invaluable to the field of medical and scientific research.
  • An important step in the movement toward a semantic web.
  • Could potentially be used to answer complex questions for which all of the information is available but is either not unified or not easily accessible (“What is the cheapest way to get from X to Y at 9am on a Sunday?”).
• In general, the ability to discover a wealth of knowledge that is already freely available but hidden: up to 96% of the Web may be considered the Deep Web.

Sources
• Bergman, M. K. (2001, Sept 24). The deep web: Surfacing hidden value. Deep Content. Retrieved from http://grids.ucs.indiana.edu/courses/xinformatics/searchindik/ deepwebwhitepaper.pdf
• He, B., Patel, M., Zhang, Z., & Chang, K. C.-C. (2007). Accessing the deep web. Communications of the ACM, 50(5), 94-101. DOI: 10.1145/1230819.1241670. http://doi.acm.org/10.1145/1230819.1241670
• Madhavan, J., Ko, D., Kot, Ł., Ganapathy, V., Rasmussen, A., & Halevy, A. (2008). Google's Deep Web crawl. Proceedings of the VLDB Endowment, 1(2), 1241-1252.
• Wright, A. (2009, Feb 23). Exploring a 'deep web' that Google can't grasp. The New York Times. Retrieved from http://cob.jmu.edu/williamson/mktg470/reading/search/2009/Exploring a ‘Deep Web’ That Google Can’t Grasp.pdf
