Search Engine Strategies . . . Beyond Yahoo and Google Presented by Linda J. Goff, Head, Instructional Services, CSUS Library, Fall 2004 LJG 8/11/04
Today’s Agenda • Web Structure, Jargon & definitions. • How search engines think and work. • Picking the right web search tool. • Searching techniques & tips. • Evaluating your sources - thinking critically about information. • Demonstration.
Glossary: blogging browser cache cookies html http hypertext link Metasearch Invisible Web phishing portal sites SacLink telnet URL
What is the World Wide Web? • The World Wide Web (WWW) is a global interactive, dynamic, cross-platform, graphical hypertext information system that runs on the Internet.
The Web is Growing Exponentially Over 10% of the world is connected! Internet users estimated to be 605.60 million as of September 2002. Source: http://www.nua.ie/surveys/how_many_online/
3 Main Types of Search Tools: • Web Directory - Hierarchical - organized in a classification system. • Standard Search Engine - uses mathematical algorithms and Boolean searches for keyword searches. • Expert Pages - reviewed list.
Expert Pages • Infomine - Scholarly Internet Resource Collection http://infomine.ucr.edu/ • Librarians Index to the Internet - Information You Can Trust http://lii.org/ • The WWW Virtual Library http://www.vlib.org • CSUS Librarian Guides: http://library.csus.edu/guides/
Now Web search tools can ... • Search multiple search engines simultaneously. • Find sites that answer natural language questions. • Rank sites by how many links have been made to them. • Sort matches into folders by categories. • Offer advanced searching features. • Or - a combination of the above.
Metasearch engines • Search simultaneously across multiple search engines and display the top sites in each: • Dogpile.com • Vivisimo.com • Warning: Some now charge for higher listings, e.g., Overture
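A minimal sketch of the metasearch idea, assuming each engine has already returned a ranked list of URLs. The engine names, URLs and the `merge_results` helper are hypothetical sample material, not how Dogpile or Vivisimo actually work.

```python
from collections import OrderedDict

def merge_results(results_by_engine, top_n=3):
    """Take the top_n hits from each engine and merge them, dropping duplicates."""
    merged = OrderedDict()  # URL -> list of engines that returned it
    for engine, urls in results_by_engine.items():
        for url in urls[:top_n]:
            merged.setdefault(url, []).append(engine)
    return merged

# Hypothetical sample data standing in for real engine responses
sample = {
    "EngineA": ["http://a.example/1", "http://b.example/2", "http://c.example/3"],
    "EngineB": ["http://b.example/2", "http://d.example/4"],
}

merged = merge_results(sample)
for url, engines in merged.items():
    print(url, "found by", ", ".join(engines))
```

A URL returned by more than one engine appears once, with every engine that found it listed beside it.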
Natural Language Search • Type your questions in Natural Language, e.g., AskJeeves.com • Analyzes words, grammar and syntax, and uses "templatics" to look for patterns in the way questions are asked. • Jeeves responds with one or more closely related questions that it already knows the answer to.
Part 2 How Search Engines Think and Work
Most search engines and databases use Boolean Operators to create search statements, e.g., (domestic or family) and (violence not sexual abuse)
Boolean Operators • AND requires both terms to appear in the items that are retrieved. • OR requires either term to appear in the items that are retrieved. • NOT excludes a term.
Boolean Search Strategy • a AND b: family and violence • a OR c: family or domestic • b NOT d: violence not sexual abuse
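The Boolean operators can be sketched with Python sets, where each search term maps to the set of documents that contain it. The tiny index below is made-up sample data, and treating "sexual abuse" as a single phrase is an assumption.

```python
# Hypothetical mini-index: search term -> IDs of the documents containing it
index = {
    "family": {1, 2, 3},
    "domestic": {3, 4},
    "violence": {2, 3, 4, 5},
    "sexual abuse": {5},
}

both = index["family"] & index["violence"]            # family AND violence
either = index["family"] | index["domestic"]          # family OR domestic
excluded = index["violence"] - index["sexual abuse"]  # violence NOT "sexual abuse"

# (domestic OR family) AND (violence NOT "sexual abuse")
combined = (index["domestic"] | index["family"]) & excluded

print("AND:", sorted(both))
print("OR:", sorted(either))
print("NOT:", sorted(excluded))
print("Combined:", sorted(combined))
```

AND narrows a search (set intersection), OR broadens it (set union), and NOT removes matches (set difference).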
What Search Engines Don’t Search • ‘Bots only crawl the visible web, which is only about 20% of everything that is on the Internet. • They don’t look at the “Deep Web”, or “The Invisible Web.”
Invisible Web contains • Commercial databases that charge a fee, e.g., library research databases of periodical articles. • Sites that require membership or a login. • Searchable pages such as catalogs, phone books or directories, e.g., AMA Physician Search.
Library Databases Access • Authentication automatic for users with Web access via CSUS and SacLink. • CSUS users with other Internet Service Providers (AOL, Prodigy etc.) must create a P.I.N. in EUREKA for authentication to access Library databases. • To connect from off campus go to http://www.lib.csus.edu/databases/help/page.
Choose based on your Information Need • Try Noodle Tools: http://www.noodletools.com/debbie/literacies/information/5locate/adviceengine.html
Search Engine Comparisons • Most have built-in search tips or help screens. • Boolean operators, phrase searching and other limiters are often available. • Be aware! Some now charge for higher page placement, e.g., Overture.
Handout • See the “Searching the Web” handout of special search features and URLs for most popular search engines. http://libweb.uoregon.edu/guides/searchweb/srchweb-info.html
Part 3 Search Tips & Strategies World Wide Web
Reading Parts of the URL http://www.lib.csus.edu/databases/ • The part before the colon is the access method or protocol (hypertext transfer protocol). • The part after the double slashes is the net address or domain name of the computer where the resource is located. • The directory path and filename come after the next slash.
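The same breakdown can be sketched with Python's standard `urllib.parse` module. The variable names are illustrative only, and taking the last dot-separated label as the domain suffix is a simplifying assumption.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.lib.csus.edu/databases/")

protocol = parts.scheme             # the part before the colon
domain = parts.hostname             # the part after the double slashes
path = parts.path                   # directory path and filename
suffix = domain.rsplit(".", 1)[-1]  # last label of the domain name

print("Protocol:", protocol)
print("Domain name:", domain)
print("Path:", path)
print("Suffix:", suffix)
```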
Common Codes in Domain Names • edu - higher education • com - commercial firms (+22 million) • gov - government agencies • mil - military (US) • org - general noncommercial organizations • net - computer networks • int - international organizations • State or country of origin: uk (United Kingdom), ca (Canada), ca.us (California, United States)
New Suffixes added by ICANN, effective Spring 2002 • .info (anyone) • .biz (business) • .name (individuals) • .pro (professionals) • .museum (accredited) • .aero (air transport industry) • .coop (business cooperatives) • kids.us (content and technology restrictions)
Think critically about the information you find on the Web... • Anybody can publish anything on the Web. • There are no editors and no central authorities. • There are no guarantees that the site you find will be there next time you look.
Questions you should ask when evaluating a Web page: • Who is the author or sponsor? • What authority/expertise do they have? • What is the purpose/scope of the page? • Is it current? When was it last updated? • How complete and accurate is the information? Does it have a bias? • How usable is it? Do the links work?
You must... • Examine assumptions and possible biases. • Distinguish between fact and opinion. • Compare and contrast related pieces of information from other sources (print and online).
Bogus sites proliferate • POP! the First Human Male Pregnancy • http://www.malepregnancy.com • Dihydrogen Monoxide Research • http://www.dhmo.org/ • Clones-R-Us • http://www.d-b.net/dti/
Sites need to be examined carefully and compared • Martin Luther King Jr. – A Historical Examination • http://www.mlking.org • The King Center http://web.archive.org/web/20010208160923/http://thekingcenter.org/ • http://www.thekingcenter.com/
Web Searching Tips • Use unique words or phrases. • Check spelling! • Use synonyms or multiple spellings (e.g., marijuana, marihuana). • Try more than one search engine. • Use words like “research” or “policy” to find more scholarly sites. • Use the domain limit feature, e.g., domain:edu or domain:gov
Citing Electronic Sources Look for it on the Library Home Page under Databases and Periodical Indexes. Look on the left for Guides For General & News and click on Citing Electronic Sources. The URL is... http://www.lib.csus.edu/guides/budge/eography.htm.
WARNING • Con artists and scams are proliferating on the Web. • Don’t use your credit card number unless you are assured of a secure system. • Don’t download unfamiliar software. • Don’t give out personal information.
Browser Configuration Tips • Clear the memory cache before you begin a search session. It will speed up your response time. • Use the following path for I.E.: Tools -> Internet Options -> Delete Files. For Netscape use: Edit -> Preferences -> Advanced -> Cache • Delete Cookies at the same screen.
Shortcuts • Use Bookmarks or Favorites • Use Go from the pull-down menus instead of the Back button or use the History or right mouse button. • Use the Stop and Reload buttons if loading a document takes too long. • CTRL ALT DEL will bring up Windows 2000 Task Manager and you can close the browser if it is not responding.
Part 4 Popular Search Engines World Wide Web
US Digital Media Universe Audience Reach, Home & Work Users, January 2003 KEY: GG=Google, YH=Yahoo, MSN=MSN, AOL=AOL, AJ=Ask Jeeves, OVR=Overture (GoTo), IS=InfoSpace, NS=Netscape, AV=AltaVista, LY=Lycos, ELINK=EarthLink.com, LS=LookSmart. Source: http://searchenginewatch.com/reports/netratings.html
Billions of Textual Documents Indexed as of Sept 2, 2003 KEY: GG=Google, ATW=AllTheWeb, INK=Inktomi, AV=AltaVista, TMA=Teoma. Source: http://searchenginewatch.com/reports/article.php/2156481
There are specialized search engines for almost every topic • For a list of over 3,000 search engines go to Search Engine Guide: http://www.searchengineguide.com • For detailed information aimed at search professionals try SearchEngineWatch: http://www.searchenginewatch.com
Most Popular Search Engines • All the Web, AltaVista, Gigablast, Ask Jeeves, Dogpile, Google, HotBot, Metacrawler, LookSmart, Lycos, MSN Search, Netscape Search, Teoma, WiseNut and Yahoo! (Source: SearchEngineWatch.com)
Google.com • Result rankings are based on the number of links made to the site from other web pages. • Gives you sites that web page creators have “voted” for with their links. • An .edu link counts more than one from a .com page.
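A simplified sketch of link-based ranking: count how many other pages link to each page and sort by that count. The link graph is made-up sample data and `rank_by_inbound_links` is a hypothetical helper. This is a plain inbound-link count that treats every link equally; it is not Google's actual ranking algorithm.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to
links = {
    "a.example.edu": ["c.example.com", "d.example.org"],
    "b.example.com": ["c.example.com"],
    "c.example.com": ["d.example.org"],
    "d.example.org": ["c.example.com"],
}

def rank_by_inbound_links(link_graph):
    """Return (page, inbound link count) pairs, most linked-to first."""
    counts = Counter()
    for source, targets in link_graph.items():
        for target in set(targets):   # each linking page "votes" once
            if target != source:      # ignore links from a page to itself
                counts[target] += 1
    return counts.most_common()

ranking = rank_by_inbound_links(links)
for page, votes in ranking:
    print(page, votes)
```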
Google.com • Most popular search engine & searches greatest number of pages (3.3 billion) • Special features include Advanced search, Image, Froogle, Blogger, Google Catalogs etc.
Vivisimo.com • Queries one or more web search engines (Metasearch). • Clusters documents into groups based on the results returned. • Orders the groups and the documents within each group. • Displays the hierarchical categories.
Yahoo.com • Originated “Directory” format to organize sites by subject and subheadings. • Can personalize: “My Yahoo”. • Geographic versions “Get Local.”
Teoma.com • Results – ranked list. • Refine – suggestions to narrow your search. • Resources – link collections from experts to enthusiasts. • Watch out for “Sponsored” page results – paid listings.
Hotbot.com • Advanced searching in Hotbot and other search engines lets you limit by: • Language • Domain • Region • Date • Content etc.
Alltheweb.com • Indexes 3.15 billion pages (almost as many as Google). • You can customize your preferences. • Language translator and language settings.
This PowerPoint presentation was prepared by: Linda J. Goff, Head, Instructional Services, University Library, California State University, Sacramento. email@example.com http://www.lib.csus.edu/services/instruction/indiv/ LJG:2/16/2004
Search Engine Comparison • Always try more than one! • http://www.llrx.com/features/searchenginechart.htm