Stop Searching and Start FINDING: Strategies for Effective Web Research, a presentation by Ken Wiseman & IMSA ken@wisemantech.com
Our goals today ... • Discover the biggest mistakes made by most Internet users: • Typing search terms in the wrong box. • Using the wrong tool at the wrong time. • Talk about the differences between directories and search engines (and when to use each.) • Learn some advanced Google searching techniques. • DO ALL OF THIS IN ENGLISH!
To Start Put the right request in the right place
Put Web addresses in the address box (this is the URL stuff that begins with http://)
Put search terms (the stuff you are looking for) in the search box
The Biggest Mistake Thinking that search tools are card catalogs of the web
9 billion pages reside on the Web (4/02) • No search tool indexes all of the Web. • The largest, Google, indexes only about a third of the total (3 billion pages). • Each engine indexes a different set of Web pages.
Search Engine Size - Search Engine Showdown - 12/02
The Second Biggest Mistake Using the wrong tool at the wrong time
Three questions • Where would you find the telephone number or address of the Woodfield theatre? • A telephone book • Where would you find the definition of the word “pestilence”? • A dictionary (or in your school yearbook) • Where would you find the name of the war that the Treaty of Westphalia ended? • An encyclopedia
What would happen if you tried to look up the definition of the word “pestilence” in the telephone book?
YAHOO ISN’T A SEARCH ENGINE! ... it is a directory. (But this may be changing.)
Directories • Usually human-compiled guides to the web, where sites are organized by category • Major directories: • MSN • Yahoo • Netscape ODP
What directories are good for ... • “What is the Web page address for some company, organization, or entity?” (or “who makes product X?”) • “Where can I find a list of Web pages that focus on a particular, ‘universal’ topic?” • In other words, directories are GREAT for “telephone book” searches.
What directories AREN’T good for ... • Directories are horrible for “encyclopedia” or “dictionary” searches. • The only exception is if the topic is so universal that the directories have no choice but to link to a page or two that discuss that topic (and even then the selection will be slim.)
Search Engines have three parts: • A spider (also called a "crawler" or a "bot") that goes to every page or representative pages on every Web site that wants to be searchable and reads it, using hypertext links on each page to discover and read a site's other pages.
Search Engines have three parts: • A program that creates a huge index (sometimes called a "catalog") from the pages that have been read. • A program that receives your search request, compares it to the entries in the index, and returns results to you.
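The three parts described above can be sketched in a few lines of Python. This is only an illustration, not how any real search engine is built: the `PAGES` dictionary stands in for what a spider would have fetched, and the example.com addresses and page texts are made up for the demonstration.

```python
from collections import defaultdict

# Part 1 (spider), simulated: hypothetical pages that a crawler has already read.
PAGES = {
    "http://example.com/a": "pirates of the caribbean is a disney ride",
    "http://example.com/b": "apple makes computers",
    "http://example.com/c": "disney fantasyland rides",
}

def build_index(pages):
    """Part 2: map each lowercased word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Part 3: return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "disney")))
print(sorted(search(index, "Disney Fantasyland")))
```

The request is answered from the index, not from the live Web, which is why a page the spider never read cannot show up in the results.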
Directories vs Search Engines • Directories are human-compiled and have a small number of pages in their databases (usually in the low millions) • Search engines are machine-compiled and have a HUGE number of pages in their databases (usually in the hundreds of millions or even the billions)
The Second Biggest Mistake -- Restated Using a directory as if it was a search engine ... and then not understanding why you can’t find anything!
Top search sites, January 2002 (courtesy Jupiter Media Metrix): MSN, Yahoo, Google, AOL, Ask Jeeves, LookSmart, Infospace, Overture, Netscape, AltaVista
Which ones are directories? MSN, Yahoo, Google, AOL, Ask Jeeves, LookSmart, Infospace, Overture, Netscape ODP, AltaVista
Why Use a Search Engine? 6 Billion+
Secondary results • Most directories use a search engine as a backup (Yahoo and Netscape use Google, almost everyone else uses Inktomi) • Why add the extra step?
How the sites stack up • Most directories (like MSN and AOL) link to 2 or 3 million pages. • Most search engines (like AlltheWeb and Google) link to billions of pages. -- Courtesy searchenginewatch.com
Why do people predominantly use directories when search engines have more stuff? Because no one ever takes the time to teach us how to use a search engine!
The Third Biggest Mistake Not knowing how to use directories or search engines to actually FIND stuff
Search engine rule #1 Be specific ... because if you aren’t specific, you’ll end up with a bunch of garbage!
Preparing to Search • Formulation of the research question • Identification of important concepts within the question • Identification of search terms to describe those concepts • Consideration of synonyms and variations of those terms • Take a look at Vivisimo clustered results for help • Preparation of the search logic
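One way to picture the last step, preparing the search logic, is a small Python helper. It is a sketch under assumptions: the function name `build_query` and the sample concepts are invented here, and the quote and OR syntax it produces is the Google syntax covered later in this presentation.

```python
def build_query(concepts):
    """Turn a list of concepts (each a list of synonyms) into one query string.

    Synonyms are joined with OR inside parentheses; multi-word terms are quoted.
    The groups are separated by spaces, which the search engine treats as AND.
    """
    groups = []
    for synonyms in concepts:
        terms = [f'"{term}"' if " " in term else term for term in synonyms]
        if len(terms) == 1:
            groups.append(terms[0])
        else:
            groups.append("(" + " OR ".join(terms) + ")")
    return " ".join(groups)

# Two concepts, each with one synonym or variation.
concepts = [["pestilence", "plague"], ["definition", "meaning"]]
print(build_query(concepts))
```

Writing the concepts down as lists before typing anything makes it easier to see which synonyms are still missing.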
Search engine rule #2 Use quotes to search for phrases. “ken wiseman”
Use quotes for phrases • To search for phrases, just put your phrase in quotes. • For example, disney fantasyland “pirates of the caribbean” • This would show you all the pages in Google’s index that contain the word disney AND the word fantasyland AND the phrase pirates of the caribbean • By the way, while this search is technically OK, my choice of keywords contains a (deliberate) factual mistake. Can you spot it?
Arr, She Blows! • Pirates of the Caribbean isn’t in Fantasyland, it’s in Adventureland in Orlando and New Orleans Square in Anaheim. • So searching for disney AND fantasyland AND “pirates of the caribbean” probably isn’t a good idea.
Search engine rule #3 Use the + sign to require. +apple +computer
Search engine math: + and AND • Apple & Computer • Returns only pages with both of these terms on them • Limits your search
Search engine rule #4 Use the - sign to exclude. apple -computer
Search engine math: - and NOT • Women NOT History • Returns only pages that contain the first term but not the second • Limits your search
Boolean OR • Sometimes the default AND gets in the way. That’s where OR comes in. • The Boolean operator OR is always in caps and goes between keywords. • For example, an improvement over our earlier search would be disney fantasyland OR “pirates of the caribbean” • This would show you all the pages in Google’s index that contain the word disney AND either the word fantasyland OR the phrase pirates of the caribbean (or both).
Three Ways to OR at Google • Just type OR between keywords disney fantasyland OR “pirates of the caribbean” • Put your OR statement in parentheses disney (fantasyland OR “pirates of the caribbean”) • Use the | (“pipe”) character in place of the word OR disney (fantasyland | “pirates of the caribbean”) • All three methods yield the exact same results.
Search engine math: OR • Women OR History • Returns every page with either of these terms on them • Broadens your search
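The "search engine math" slides map directly onto set operations, which a short Python sketch can make concrete. The page numbers below are hypothetical; they simply stand for pages that contain each word.

```python
# Hypothetical IDs of pages that contain each word.
women = {1, 2, 3, 4}
history = {3, 4, 5}

both = women & history        # AND (or +): pages with both terms
either = women | history      # OR: pages with at least one of the terms
only_women = women - history  # NOT (or -): pages with the first term only

print("AND:", sorted(both))
print("OR: ", sorted(either))
print("NOT:", sorted(only_women))
```

AND and NOT can never return more pages than the first term alone, while OR can never return fewer, which is why the first two limit a search and the third broadens it.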
OR, She Blows! • Just remember, Google’s Boolean default is AND • Sometimes the default AND gets in the way. That’s where OR comes in.
How Insensitive! • Google is not case sensitive. • So, the following searches all yield exactly the same results: disney fantasyland pirates Disney Fantasyland Pirates DISNEY FANTASYLAND PIRATES DiSnEy FaNtAsYlAnD pIrAtEs
Search engine rule #5 Combine symbols as often as possible (see rule #1). +"Martha Washington" -george +revolution
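To see how the combined symbols work together, here is a rough Python sketch that reads a query the way the rules describe. It is an assumption-laden toy, not Google's actual logic: the function names are invented, terms are matched as plain substrings, and the sample sentences are made up.

```python
import shlex

def parse(query):
    """Split a query into (required, excluded) terms; quotes keep a phrase whole."""
    required, excluded = [], []
    for token in shlex.split(query):
        if token.startswith("-"):
            excluded.append(token[1:].lower())
        else:
            # A leading + is optional because AND is the default.
            required.append(token.lstrip("+").lower())
    return required, excluded

def matches(text, query):
    """True if the text has every required term and none of the excluded ones."""
    required, excluded = parse(query)
    text = text.lower()  # searches are not case sensitive
    return all(term in text for term in required) and not any(
        term in text for term in excluded
    )

query = '+"Martha Washington" -george +revolution'
print(parse(query))
print(matches("Martha Washington during the Revolution", query))
print(matches("George and Martha Washington in the Revolution", query))
```

Each added symbol removes more unwanted pages, which is the point of rule #1.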
The five rules • Be specific ... because if you aren’t specific, you’ll end up with a bunch of garbage! • Use quotes to search for phrases. • Use the + sign to require. • Use the - sign to exclude. • Combine symbols as often as possible (see rule #1). • Don’t forget OR
Did You Know… • That large chunks of the Web are invisible to most search engines? • That no one has a good handle on the magnitude of the invisible Web?* • That much of the invisible Web is of great value to educators & students?
So What? • Would you intentionally exclude large chunks of the Library of Congress’ 12 million documents from your searches? • How about the US Census Bureau? • How about health and medical databases? • Many newspapers?
What Today’s Search Tools Can and Cannot Find • These limits are not specific to any one search tool. • Search tools were created to handle flat HTML pages. • When confronted with a search box, a search tool is stopped unless it has specific instructions on how to handle that input box. • Dynamically created Web pages have unusual URLs.
What Today’s Search Tools Can and Cannot find • The LII page for Automobile (http://lii.org/search/file/automobiles) is in Google; • The LII page for Motorcycles (http://lii.org/search?title=Motorcycles; query=Motorcycles; searchtype=subject) is not. Do you see why NOT??
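The difference between the two LII addresses can be checked mechanically. The Python sketch below uses the standard library's `urlsplit` to look for a query string (the part after the `?`); the function name `looks_dynamic` is made up, and treating any query string as a sign of a dynamically created page is a simplifying assumption, not a rule that every search tool follows.

```python
from urllib.parse import urlsplit

def looks_dynamic(url):
    """Heuristic: a URL with a query string points to a page built on request."""
    return bool(urlsplit(url).query)

urls = [
    "http://lii.org/search/file/automobiles",
    "http://lii.org/search?title=Motorcycles; query=Motorcycles; searchtype=subject",
]
for url in urls:
    label = "dynamic" if looks_dynamic(url) else "static"
    print(label, url)
```

The Motorcycles address only exists as the answer to a search request, so a spider that follows ordinary links has nothing to land on.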
Simple Examples Ken Wiseman - 602 Hits None contain my contact info But…