DEEP, DEEP, INVISIBLE WEB: WHAT’S IN IT AND HOW TO ACCESS IT Presented by Gayle O’Connor March 20, 2015
Why Knowledge of Search Techniques is Important • More than 975,262,468 sites • 55 million Facebook status updates daily • 500 million tweets per day • 276 million domains registered globally • 204 million emails sent per minute
Netcraft provide internet security services including anti-fraud and anti-phishing services, application testing and PCI scanning. We also analyse many aspects of the internet, including the market share of web servers, operating systems, hosting providers and SSL certificate authorities.
Netcraft Statistics In the February 2015 survey Netcraft received responses from 883,419,935 sites and 5,135,229 web-facing computers.
Search Engines Basic Searching • Keywords describe your topic • Most search engines will look for each word separately • Knowledge of Boolean searching critical for success (or is it?)
Search Engines Relevancy • All search engines use different relevancy formulas that rank the keywords in your search • Important to understand the differences • Look to the help files or RTFM!
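To illustrate why different relevancy formulas matter, here is a minimal sketch that ranks the same two pages with two toy formulas. Both formulas, the page texts, and the names `raw_count` and `length_normalized` are made up for illustration; no real search engine is claimed to work this way.

```python
# Two invented sample pages and a sample query.
QUERY = "louisiana state bar"
PAGES = {
    "A": "louisiana state bar association",
    "B": "state parks and state museums and state fairs in louisiana near a bar",
}

def raw_count(text, query):
    """Toy formula 1: total number of times the query words occur."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())

def length_normalized(text, query):
    """Toy formula 2: occurrences divided by page length in words."""
    return raw_count(text, query) / len(text.split())

def rank(score):
    """Order page names from highest to lowest score."""
    return sorted(PAGES, key=lambda name: score(PAGES[name], QUERY), reverse=True)

by_count = rank(raw_count)            # favors the long page that repeats "state"
by_density = rank(length_normalized)  # favors the short, focused page
print(by_count, by_density)
```

The same query and the same pages produce opposite orderings, which is one reason result lists differ from engine to engine.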
Search Engines Relationships Importance of knowing how search engines acquire the data
Sample Search – “Louisiana State Bar” • Yahoo – 180,000 results • Bing – 184,000 results • Google – 157,000 results
Boolean Searching • Boolean Searching is a search that allows the user to enter logical expressions to obtain a result • And • Or • Not • Combined • Truncation/Stemming • Wild Card • Exact Phrase • Advanced Search- Google
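The And, Or, Not, and truncation operators listed above can be sketched with set operations. This is a rough illustration over a made-up four-document collection; the `DOCS` data and the `matching` helper are hypothetical, and real search engines have their own syntax and rules.

```python
# Toy collection: document id -> text (invented sample data).
DOCS = {
    1: "patent law in Japan",
    2: "trademark registration in the United States",
    3: "patents and trademarks in Japan",
    4: "copyright law in Europe",
}

def words(text):
    """Lowercase a text and split it into a set of words."""
    return set(text.lower().split())

def matching(term):
    """Return ids of documents containing `term`; a trailing * truncates."""
    if term.endswith("*"):
        stem = term[:-1].lower()
        return {i for i, t in DOCS.items()
                if any(w.startswith(stem) for w in words(t))}
    return {i for i, t in DOCS.items() if term.lower() in words(t)}

and_hits = matching("patent") & matching("japan")     # AND: both terms present
or_hits = matching("patent") | matching("trademark")  # OR: either term present
not_hits = matching("japan") - matching("law")        # NOT: exclude a term
stem_hits = matching("patent*")                       # truncation: patent, patents
print(and_hits, or_hits, not_hits, stem_hits)
```

AND narrows a search, OR widens it, NOT removes unwanted matches, and truncation picks up word variants such as plurals.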
Boolean Searching 6.7 Exact Phrase Searching • A phrase search looks for matches to multiple words, in the same order you typed them. • Enclose the entire phrase in quotation marks. It’s that simple! “intellectual property in Japan”
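As a small sketch of the difference between a phrase search and a plain keyword search, the example below checks whether the words appear consecutively and in order. The two sample texts and both function names are invented for illustration.

```python
def contains_phrase(text, phrase):
    """True when the words of `phrase` appear in `text` consecutively, in order."""
    haystack = text.lower().split()
    needle = phrase.lower().split()
    n = len(needle)
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))

def contains_all_words(text, phrase):
    """True when every word of `phrase` appears somewhere in `text`, in any order."""
    return set(phrase.lower().split()) <= set(text.lower().split())

PHRASE = "intellectual property in Japan"
text_a = "A guide to intellectual property in Japan for startups"
text_b = "Property prices in Japan attract intellectual debate"

phrase_a = contains_phrase(text_a, PHRASE)      # words adjacent and in order
phrase_b = contains_phrase(text_b, PHRASE)      # same words, but scattered
keywords_b = contains_all_words(text_b, PHRASE)  # a keyword search still matches
print(phrase_a, phrase_b, keywords_b)
```

The second text contains every keyword yet is about something else entirely; the quotation marks are what rule it out.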
Hidden, Invisible or Deep Web It’s almost impossible to measure the size of the Deep Web. While some early estimates put the size of the Deep Web at 4,000-5,000 times larger than surface web, the changing dynamic of how information is accessed and presented means that the Deep Web is growing exponentially and at a rate that defies quantification. (Source: Bright Planet)
Surface vs Deep Web • Surface Web: The Indexed Web contains at least 4.54 billion pages (Thursday, 26 February, 2015). Source: WorldWideWebSize.com • Deep Web: 550 billion actual documents, including information such as searchable databases, books, texts, articles, images, etc. Source: Bright Planet
Example of Deep Web Sites • Patents and Trademark registrations in the USPTO • Delta Airlines fare search or flight status look up
11.0 Accessing the Deep Web • Clusty: a meta search engine, meaning it combines results from a variety of different sources, filtering out duplicates and sifting the best content that you might not have seen otherwise to the top of the search results. • Digital Librarian: a librarian's choice of the best of the Web. • Underground search engines: search engines designed to dig deep into the hidden, or invisible, part of the Web not easily accessed by general search queries. • SurfWax: gives you the option to grab results from multiple search engines at the same time. You can also create SearchSets, your own personalized sets (lists) of sources that you save and use over and over. SurfWax is a good tool for delving into the Invisible Web since it retrieves information you won't be able to find with other search engines.
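The meta search idea described above (combine several result lists, filter out duplicates, move the best content to the top) can be sketched as follows. This is only an illustration: the engine result lists and URLs are invented, and the reciprocal-rank scoring is an assumption, not how Clusty or SurfWax actually rank.

```python
from collections import defaultdict

def normalize(url):
    """Treat URLs as duplicates when they differ only by scheme, case, or a
    trailing slash (a simplification; real URL paths can be case-sensitive)."""
    url = url.lower().rstrip("/")
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    return url

def merge_results(result_lists):
    """Merge ranked lists; each appearance adds 1/rank to a page's score."""
    scores = defaultdict(float)
    for results in result_lists:
        seen = set()
        for rank, url in enumerate(results, start=1):
            key = normalize(url)
            if key in seen:  # ignore repeats within one engine's list
                continue
            seen.add(key)
            scores[key] += 1.0 / rank
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda k: (-scores[k], k))

# Hypothetical results from two engines for the same query.
engine_a = ["http://example.org/deep-web", "http://example.com/search-tips",
            "http://example.net/boolean"]
engine_b = ["https://example.com/search-tips/", "http://example.org/deep-web",
            "http://example.edu/library"]

merged = merge_results([engine_a, engine_b])
print(merged)
```

Six raw results collapse to four unique pages, and the pages both engines agree on rise to the top.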
Articles • Search.com • MagPortal • Go Articles
Blogs A blog is a user-generated website where entries are made in journal style and displayed in reverse chronological order. Here are six search engines that can help you find a blog you might be interested in following: • AllTop • Ask.com • Technorati • Google • LJSeek • Ice Rocket Blog Search
Business Publications • Bizjournals: Division of American City Business Journals, the nation's largest publisher of metropolitan business newspapers. Includes Web sites for each of the company's 41 journals. “intellectual property” • Bpubs.com: Directory-based Internet search engine that strives to cover the topic of Business Publications. Developed with the “Business User” in mind. “intellectual property”
Demographic Information • US Census Bureau • Social Explorer • Department of Labor
Discussions • WebLens: Web forums. Chat. Find out what people are saying, with tools for finding online discussion groups, message boards, and online communities on any topic. • Google Groups: Free groups and mailing list service from Google, including Usenet groups from 1981. • MSDN: Forum focusing on help for developers in writing applications using Microsoft products and technologies. • Omgili: Searches millions of discussions over tens of thousands of forums around the web.
Meta Search • ZapMeta • SurfWax • Ixquick • Clusty • DogPile • IceRocket
News • Topix.net • NewsVine.com • News Search Portal • NewsLibrary.com
RSS • RSS readers are aggregators
RSS Feeds • RSS delivers information and research directly to you with no spam! Getting started with RSS: • Bloglines: Make your own personalized news page tailored to your unique interests from an index of tens of millions of live internet content feeds, including articles, blogs, images and audio. • My Yahoo: Customizable web page with news, stock quotes, weather, and many other features. • FeedDemon: Easy-to-use interface makes it a snap to stay informed with the latest news and information. • Firefox Browser: Choose from over a thousand useful add-ons that enhance Firefox. It’s easy to personalize Firefox to make it your own.
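For a sense of what a feed reader does behind the scenes, here is a minimal sketch that reads headlines out of an RSS 2.0 document using Python's standard library. The feed content, its URLs, and the `read_feed` helper are invented for illustration; the readers listed above add fetching, scheduling, and display on top of this step.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed kept inline so no network access is needed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Legal News</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""

def read_feed(xml_text):
    """Return the channel title and a list of (headline, link) pairs."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return channel.findtext("title"), items

feed_title, headlines = read_feed(SAMPLE_FEED)
print(feed_title)
for headline, link in headlines:
    print("-", headline, link)
```

An aggregator repeats this for every subscribed feed and shows only the items it has not shown before.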
BOTS • From "robot": in short, a bot is a software tool for digging through data. You give a bot directions and it brings back answers. • Wikipedia: Bots are software applications that run automated tasks over the Internet (e.g., web spidering).
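Web spidering, the example task mentioned above, can be sketched in a few lines: read a page, collect its links, and follow each one once. The four-page `SITE` below is an invented in-memory stand-in for a website, and `LinkCollector` and `crawl` are hypothetical names; a real bot would fetch pages over the network and respect each site's robots.txt.

```python
from collections import deque
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Made-up site: path -> HTML of that page.
SITE = {
    "/home": '<a href="/about">About</a> <a href="/news">News</a>',
    "/about": '<a href="/home">Home</a>',
    "/news": '<a href="/home">Home</a> <a href="/archive">Archive</a>',
    "/archive": "<p>No links here</p>",
}

def crawl(start):
    """Visit pages breadth-first, following links and skipping repeats."""
    visited, queue = [], deque([start])
    while queue:
        page = queue.popleft()
        if page in visited or page not in SITE:
            continue
        visited.append(page)
        parser = LinkCollector()
        parser.feed(SITE[page])
        queue.extend(parser.links)
    return visited

order = crawl("/home")
print(order)
```

Note that "/archive" is found only because "/news" links to it; pages that nothing links to are never reached this way, which is one reason Deep Web content stays out of ordinary search indexes.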
Any Questions?
Gayle M. O’Connor • GMO Marketing • Independent Marketing Specialist, Particularly to the World of Legal Vendors… Let me help you grow your business! • Phone: 206.356.7688 • eMail: firstname.lastname@example.org • Twitter: @gaylemoconnor • LinkedIn: https://www.linkedin.com/in/gayleoconnor • Google+: https://plus.google.com/+GayleOConnor/posts