For the Google-Dependent: The Other Search Engines. Michael Hunter, Reference Librarian, Hobart and William Smith Colleges. For Rochester Regional Library Council Member Libraries’ Staff. Sponsored by the Rochester Regional Library Council. Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the New York State Library, 2008.
For Today . . . • Landscape of Search in 2008 • Update on Established Services • New Services • Creating Custom Search Engines
Why do I need more than Google? • The Google effect -- • the single most powerful force in today’s Internet • a private profit-driven company • owns more information on individuals’ search behavior, companies and organizations than any other entity
Why do I need more than Google? • Great potential for misuse/abuse of this information for financial gain • Societies seldom leave basic services (utilities, medical and traffic regulation) totally to the “free market” • Is web search now a “basic service” ???
Search dominance --- • Potential skewing for commercial, political or social purposes • Database composition • Ranking • Privacy • No single search engine can crawl the whole web • Limits search features, results display, consumer and shopping information • http://google-watch.org
Web Search in 2008: Who’s crawling the Web? • Google • Yahoo • Live Search (MSN) • Ask (owns Teoma) • Gigablast • Exalead
Size Estimates, 7/9/08: Google and Yahoo!, text filetypes, in millions [table not preserved]
Convincing others ... • twingine.com • Searches Google and Yahoo, with results in separate frames • jux2.com • A metasearch engine for Google, Yahoo and Live, giving the rank of each result from each service
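The rank comparison described for jux2 can be illustrated with a minimal sketch. It does not call any real search service: the result lists, engine names and the functions `compare_ranks` and `shared_results` are made up for illustration, and ordering by average rank is an assumption, not jux2's documented method.

```python
from collections import defaultdict


def compare_ranks(results_by_engine):
    """Map each URL to its 1-based rank in every engine that returned it."""
    ranks = defaultdict(dict)
    for engine, urls in results_by_engine.items():
        for position, url in enumerate(urls, start=1):
            ranks[url][engine] = position
    return dict(ranks)


def shared_results(ranks, min_engines=2):
    """URLs returned by at least min_engines engines, best average rank first."""
    shared = [(url, r) for url, r in ranks.items() if len(r) >= min_engines]
    shared.sort(key=lambda item: sum(item[1].values()) / len(item[1]))
    return [url for url, _ in shared]


# Hypothetical result lists, standing in for three engines' top hits.
sample = {
    "google": ["a.example", "b.example", "c.example"],
    "yahoo": ["b.example", "d.example", "a.example"],
    "live": ["e.example", "b.example"],
}

ranks = compare_ranks(sample)
print(ranks["b.example"])      # rank of one page in each engine
print(shared_results(ranks))   # pages found by two or more engines
```

Pages that appear in only one list drop out of `shared_results`, which is a quick way to show how little the engines' top results overlap.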
Yahoo Open Strategy • Y!OS – major internal and external redesign to unify all Yahoo’s services • Owns Flickr, del.icio.us, Upcoming • “We are building social into everything we do” • Offers more control over what is shared • Easier to set up small social networks • Will open some search technology to developers and users (http://developer.yahoo.com/search/boss/)
Yahoo and the Semantic Web • Will begin to include certain metadata embedded in web pages as search and ranking elements • Dublin Core • Creative Commons • RDF • GeoRSS • hCard • hCalendar • hReview • hAtom • Will support the OpenSearch specification, allowing crawler access to deep web resources (!!!)
OpenSearch: The Invisible Web made Visible • Helps search engines and invisible web databases communicate through a common set of formats to perform search requests. • Created by Amazon.com and available through Creative Commons • Potentially one of the most significant developments for web search in the last ten years • http://www.opensearch.org
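One of the common formats in OpenSearch is a URL template in which placeholders such as `{searchTerms}` stand for the query, and a trailing `?` marks a placeholder as optional. The sketch below shows how a client might fill such a template. The host `catalog.example.org` and the helper `fill_template` are hypothetical, and the sketch covers only simple placeholders, not the whole specification.

```python
import re
from urllib.parse import quote_plus


def fill_template(template, **params):
    """Substitute {name} and {name?} placeholders in an OpenSearch-style URL template."""
    def replace(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote_plus(str(params[name]))
        if optional:
            return ""  # optional placeholders may be left empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\?)?\}", replace, template)


# Hypothetical template, as a database might publish it in its description document.
template = "http://catalog.example.org/search?q={searchTerms}&page={startPage?}"

url = fill_template(template, searchTerms="rochester history")
print(url)
```

Because the database publishes the template itself, a crawler or metasearch tool can send queries to it without knowing anything else about the site's search form.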
Search.yahoo.com • Search subscription content • Consumer Reports • Forrester Research • LexisNexis • Factiva • Wall Street Journal (30 days) • FT.com • TheStreet.com • Yahoo! Answers (answers.yahoo.com) • Online community connecting people with questions to people wanting to answer them • 90 million users sharing knowledge worldwide • Feedback and answer reviews encouraged • Limit results by Creative Commons license (advanced search)
Yahoo’s Search Assist • Ajax-based service that “suggests” terms and shortcuts as you type • Activate by clicking blue arrow below the search box before searching • Also offers “Explore Concepts”, searches “Shortcuts” highly associated with the search terms
Yahoo Pipes - pipes.yahoo.com • Users can combine, filter and display any RSS content • Finished “pipes” can be shared and embedded in other web pages • e.g., a pipe for RSS feeds from educational blogs, filtering for technology, physics or any other keywords • Version available for the iPhone (iphone.pipes.yahoo.com)
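The combine-and-filter idea behind a pipe can be sketched in a few lines. This is not how Yahoo Pipes is built or used (Pipes is a visual editor); it only mimics the educational-blog example with two made-up inline feeds, and the names `items` and `pipe` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two tiny made-up RSS feeds standing in for educational blogs.
FEED_A = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>Physics demos for class</title><link>http://a.example/1</link></item>
<item><title>Cafeteria menu</title><link>http://a.example/2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Blog B</title>
<item><title>Classroom technology roundup</title><link>http://b.example/1</link></item>
</channel></rss>"""


def items(feed_xml):
    """Yield the title and link of every <item> in one RSS document."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        yield {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }


def pipe(feeds, keywords):
    """Combine several feeds, keeping items whose title contains any keyword."""
    wanted = [k.lower() for k in keywords]
    merged = [entry for feed in feeds for entry in items(feed)]
    return [e for e in merged if any(k in e["title"].lower() for k in wanted)]


filtered = pipe([FEED_A, FEED_B], ["technology", "physics"])
for entry in filtered:
    print(entry["title"], entry["link"])
```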
Mashup • Web application combining data from more than one source into a single tool • Used to • Navigate and visualize large and/or dynamic datasets • Combine data with dimensions of time, distance and location • Juxtaposing data from different sources can reveal new relationships
MSN’s Live.com • Database increasing • Simpler interface (4/08) • “Rich Answers” blended results • Image search enhancements: filter:face, filter:portrait, filter:bw • NLP question processing improved • Live Search Books and Search Academic ended 5/08
New & Notable at Ask • The Butler is gone! Teoma is in his place! • Smart Search • Web Answers • Zoom • Superior Mapping Tools
Gigablast • Maintains unique database • Offers advanced search features • “Freshness dating limit” estimates the date that a particular page was first published or most recently edited or modified • Custom Topic Search of Gigablast – up to 500 domains (www.gigablast.com/cts.html)
Exalead - www.exalead.com • Launched October 2004, based in France • Maintains its own database • Smaller than most US services (8 billion) • Offers “Narrowing Options” • Advanced features: • Phonetic spelling with “soundslike” • Approximate spelling with “spellslike” • Limits: Site (URL), Filetype (8), Adult content, Language (57!!!)
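To show what a phonetic match such as “soundslike” is getting at, here is a sketch of classic American Soundex, which gives names that sound alike the same four-character code. The slides do not say which algorithm Exalead uses, so Soundex is a stand-in chosen for illustration, not a description of Exalead's implementation.

```python
# Letter groups of classic Soundex; vowels, h, w and y carry no code.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    previous = CODES.get(first, "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not break a run of same-coded letters
        code = CODES.get(c, "")
        if code and code != previous:
            encoded += code
        previous = code  # a vowel resets the run, so a repeated code counts again
    return (encoded + "000")[:4]


def sounds_like(a, b):
    """True when two words share a Soundex code."""
    return soundex(a) == soundex(b)


print(soundex("Robert"), soundex("Rupert"))
print(sounds_like("Smith", "Smyth"))
```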
Wikiasari: Quick rummaging search • “User ranked results” • Open source SE by Jimmy Wales and Amazon • Initial results ordered with algorithms à la Google • Users reorder results, which will be used in ranking of future similar searches
Wikiasari: Quick rummaging search • Strength is in general search topics • Deep, complex or unusual searches will not benefit as much • Intended to rival Google and Yahoo • Edits allowed on all search results • Recently launched • http://search.wikia.com
Kosmix www.kosmix.com • Google interface • Offers overview of results by document type: Basic Facts, Reviews & Opinions, Media, People & Community, Shopping, News • Extensive clustering by subject • Blended results with thumbnails of images, video and audio clips, presentations and reports • Human-created “topic pages” for subjects of current interest
icerocket www.icerocket.com • Searches: Blogs, Web, MySpace, News, Image • Link to cached version from Internet Archive’s Wayback Machine • Limited advanced search features • MAY be a Google interface
ChaCha www.chacha.com • Free mobile search service • Requires a (free) account • Text your questions and a human “guide” sends back an answer, limited to 160 characters • Supported by 98% of mobile providers
Clusty www.clusty.com • Metaengine – Source engines for the Web include Live, Ask, Gigablast, Wisenut, Open Directory • Searches: Web, News, Images, Wikipedia, Blogs, Jobs • Most extensive clustering capability of any meta (Vivisimo) • Custom “Tabs” run saved searches on engines you select
Searchcrystal.com: Visual Metasearch • Options include List View, Spiral View and Cluster Display • Results common to more engines appear in the center • Color = source engine • Shape = number of engines retrieving the page • Size = rank position • Web, Blogs, Images, Tagging sites and more…
Google’s Custom Search www.google.com/coop/cse/ • A tool from the Google Coop initiative • Keywords chosen determine content and weighting of results (limit of 100 characters) • Search • Entire web • Your selected sites only • Entire web with selected sites emphasized • Within Coop, a CSE can be created and maintained collaboratively • Stored or Linked versions available
Adding sites to a Stored CSE • Manually • Using Google Marker – bookmarking tool available for Firefox and IE7 • RSS feeds may be included • Add: • Full domains: www.moma.org • Subdomains: www.moma.org/*research* • A single page: www.moma.org/modernteachers/
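The three kinds of entries listed above (a whole domain, a wildcard pattern, a single page) can be thought of as simple URL patterns. The sketch below is one possible reading of them, not Google's actual matching logic: the function `matches` and its rules (a trailing slash means one exact page, `*` matches any run of characters) are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def matches(url, pattern):
    """Illustrative test of whether a URL falls under one site pattern (assumed rules)."""
    url = url.split("://", 1)[-1]  # drop "http://" so URLs look like the patterns
    if "*" in pattern:
        return fnmatchcase(url, pattern)      # wildcard pattern
    if pattern.endswith("/"):
        return url == pattern                 # assumed: one exact page
    return url == pattern or url.startswith(pattern + "/")  # whole domain


patterns = [
    "www.moma.org/*research*",
    "www.moma.org/modernteachers/",
]

for candidate in [
    "http://www.moma.org/learn/research/",
    "http://www.moma.org/modernteachers/",
    "http://www.moma.org/modernteachers/lessons",
]:
    print(candidate, any(matches(candidate, p) for p in patterns))
```

With the full-domain entry `www.moma.org` added to `patterns`, every page on the site would match, which is why narrower patterns are useful when only part of a site is relevant.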
Linked CSE • Sites can be added “in bulk” • Select among your sites for individual queries through specification files • Requires user to host and maintain their own XML specification files • Migration from stored to linked versions possible • More difficult to add single sites • Use Google’s Search APIs to integrate other Google services into a CSE
Other CSEs • Gigablast – Custom Topic Search www.gigablast.com/cts.html • Live Search Macros search.live.com/macros • Rollyo – searches Yahoo www.rollyo.com • Swicki www.swicki.eurekster.com
Semantic Search Systems • Understand the user’s query • Understand Web text • Bring these together for query results that are contextually relevant • Algorithms that match the meanings and not just the words • Natural Language Processing • Concept Mapping
Semantic Search Systems • Expensive and time-consuming for general web search; more feasible in subject-specific contexts • Example query: “What is palladium used for?” • Link-based crawler results: London’s Palladium Theatre • Including the concept map “used for”: sites about the element palladium • Hakia, Powerset, Cognition Search
Post Search: What do we do AFTER a search? • The search engine size wars are over • WANTED: Services that help manage, share and update • Web search results • Tagged sites • WITH scalability, confidentiality and “collaborability” across all platforms, devices and file formats
Thank You! Michael Hunter Reference Librarian Hobart and William Smith Colleges Geneva, NY 14456 (315) 781-3552 hunter@hws.edu