
Don’t accept the limits of Google! Presentation for the Energy Institute April 2009


Presentation Transcript


  1. Don’t accept the limits of Google! Presentation for the Energy Institute, April 2009. Terry Kendrick, Information Now Limited, terry.kendrick@btconnect.com, 01603 628818

  2. Google enough? Comprehensive? And enough for any searcher? Biggest? Best (ease of use? sources?)? 90%-plus market share for search

  3. Google the biggest? (Sometimes, but not always….) Hmmm… but how many hits can you really see anyway? Source: search, 27 April 2009, 19.00

  4. Google the biggest? (Sometimes, but not always….) Hmmm… but how many hits can you really see anyway? Cuil: 3,126. Source: search, 12 October 2008, 20.50

  5. Google best? • Google is great for coverage and accessibility. • Academic library resources are of better quality. Source: Brophy, J., & Bawden, D. (2005). Is Google enough? Comparison of an internet search engine with academic library resources. Aslib Proceedings, 57(6), 498-5

  6. Comprehensive and all you need? • “There is nothing in this study to explain why web users seem to greatly prefer the Google search engine, since overall the performance of Google and Yahoo is more or less equivalent, and ahead of their competitors. We therefore suppose that the reasons go beyond the criteria of relevance of results” • Jean Veronis, University of Provence, Comparative Study of Six Search Engines (2006)

  7. Limits of Google • Doesn’t have everything on the web in its cache • Doesn’t show you everything it has in its cache • Other search engines may have different material • Even “breaking” Google will only give you up to around 1,000 hits per search • Advanced Search is better done by typing directly into the search line rather than through the Advanced Search form • (But it’s still an excellent search engine!)
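The point about typing Advanced Search options straight into the search line can be sketched in Python. This is a minimal illustration, not Google's own tooling: the helper names `build_query` and `search_url` are hypothetical, and the sketch assumes the long-standing query operators (quoted phrases, `site:`, `filetype:`, and `-` to exclude a word) and the `q` parameter of `https://www.google.com/search`. Google changes which operators it honours from time to time.

```python
from urllib.parse import urlencode


def build_query(terms, phrase=None, site=None, filetype=None, exclude=()):
    """Compose a query string from advanced-search options (hypothetical helper)."""
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')            # exact-phrase match
    if site:
        parts.append(f"site:{site}")           # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")   # restrict results by file format
    parts.extend(f"-{word}" for word in exclude)  # drop results containing these words
    return " ".join(parts)


def search_url(query):
    """Turn a typed query into a search URL; urlencode escapes quotes and colons."""
    return "https://www.google.com/search?" + urlencode({"q": query})


q = build_query(["energy", "efficiency"], phrase="invisible web",
                site="ac.uk", filetype="pdf", exclude=["jobs"])
print(q)
print(search_url(q))
```

Everything the Advanced Search form offers here ends up as plain text in the single `q` parameter, which is why typing the operators directly gives the same result with more control.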

  8. First-page results – Google, Microsoft, Yahoo, Ask • Among 12,570 random user-defined queries, just over 1 percent of first-page search results were the same across the engines. • The percentage of total results unique to one search engine was 88.3 percent. • The percentage of total results shared by any two search engines was 8.9 percent. • The percentage of total results shared by three search engines was 2.2 percent. • The percentage of total results shared by the top four search engines was 0.6 percent. Source: Dogpile, April 2007. Research by Queensland University of Technology and Pennsylvania State University
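One plausible reading of how overlap figures like these are computed: pool every distinct first-page URL from all the engines, count how many engines returned each one, and report the share of URLs seen by exactly one, two, three or four engines. Under that reading the categories partition the pool, which fits the slide's four figures adding up to 100 percent (88.3 + 8.9 + 2.2 + 0.6). The sketch assumes this method (Dogpile's exact procedure is not given here), and the engine names and URLs are hypothetical toy data, not the study's results.

```python
from collections import Counter


def overlap_profile(results):
    """Given {engine: set of first-page URLs}, return {k: percent of distinct
    URLs returned by exactly k engines} for k = 1 .. number of engines."""
    seen_by = Counter()
    for urls in results.values():
        for url in set(urls):          # count each engine at most once per URL
            seen_by[url] += 1
    total = len(seen_by)
    if total == 0:
        return {}
    shares = Counter(seen_by.values())  # k -> number of URLs seen by k engines
    return {k: 100 * shares[k] / total for k in range(1, len(results) + 1)}


# Hypothetical toy data, not the Dogpile figures
toy = {
    "engine_a": {"u1", "u2", "u3", "u4"},
    "engine_b": {"u1", "u5", "u6"},
    "engine_c": {"u1", "u2", "u7"},
    "engine_d": {"u1", "u8"},
}
print(overlap_profile(toy))
```

Because every pooled URL falls into exactly one "seen by k engines" bucket, the percentages always sum to 100, however the engines overlap.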

  9. Despite Dogpile’s self-supporting research, there’s a high overlap in the first ten pages or so, though, right? Intuitive…. But is it really the case? See: http://ranking.thumbshots.com/

  10. “Must see” search engines (all .com unless noted otherwise): Yahoo, Altavista, Alltheweb, Google, Live, Ask, BBC, Searchme, Cuil, Trovando.it, Exalead, Quintura, A9, Ixquick, Vivisimo / Clusty, Mamma, Dogpile, ez2www, Surfwax, Webcrawler, Fazzle, Killerinfo, Icerocket, Zuula, Mahalo, Toolbe, Baidu (China), Yandex (Russia) • Altsearchengines (top 100): http://altsr.us/ • www.thesearchrace.com/

  11. …don’t forget specialist search engines. Examples: • www.zoominfo.com – people/company summaries • www.base-search.net – academic search engine • www.searchmil.com/ – military search engine, but good for tools and techniques • www.truveo.com – video search engine • www.questia.com – “world’s largest online library” • www.archive.org – includes the “Wayback Machine” • www.seeqpod.com / www.songza.com – playable audio files • www.bandsintown.com – gigs • www.masterseek.com – business directory

  12. Searching the human web: blogs, newsgroups and mailing lists • www.boardreader.com • www.twazzup.com • www.bloogz.com • www.blogpulse.com • www.feedster.com • www.technorati.com • http://groups.google.com/groups • http://google.com/blogsearch • …also the Dark Net (see www.darknet.com), such as BitTorrent

  13. Google’s view on the size of the web • “Recently, even our search engineers stopped in awe about just how big the web is these days -- when our systems that process links on the web to find new content hit a milestone: 1 trillion (as in 1,000,000,000,000) unique URLs on the web at once! • The number of individual web pages out there is growing by several billion pages per day. • So how many unique pages does the web really contain? We don't know; we don't have time to look at them all! :-) Strictly speaking, the number of pages out there is infinite -- for example, web calendars may have a "next day" link, and we could follow that link forever, each time finding a "new" page. We're not doing that, obviously, since there would be little benefit to you. But this example shows that the size of the web really depends on your definition of what's a useful page, and there is no exact answer. We don't index every one of those trillion pages -- many of them are similar to each other, or represent auto-generated content similar to the calendar example that isn't very useful to searchers. But we're proud to have the most comprehensive index of any search engine, and our goal always has been to index all the world's data.” • Google Blog: http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html

  14. How big is the deep web? “The Deep Web covers somewhere in the vicinity of 900 billion pages of information located through the World Wide Web in various files and formats that the current search engines on the Internet either cannot find or have difficulty accessing. The current search engines find about 8 billion pages at the time of this writing.” Source: Deep Web Research 2006 by Marcus P. Zillman, published January 15, 2006. Fall 2007 data: • Google.com indexes 12.5 billion public web pages. • 71 billion static web pages are publicly available. These pages can easily be found by Google and other search engines. • 6.5 billion static pages are hidden from the public. As private intranet content, these are the corporate pages that are only open to employees of specific companies. • 220+ billion database-driven pages are completely invisible to Google. • Does Google therefore cover only 6% of the internet? http://netforbeginners.about.com/cs/secondaryweb1/a/secondaryweb.htm

  15. The invisible web includes key information resources… • Databases, e.g. Companies House • Library catalogues • Picture collections • “Mashups” • Password-protected/subscription sites, e.g. newspaper archives

  16. Example databases (many of them part of the invisible web) • www.oscars.org • http://vads.ahds.ac.uk/collections/ST.html • www.a2a.org.uk • www.ipo.gov.uk • www.ncjrs.gov/abstractdb/Search.asp • http://businesscreditusa.com/index.asp • http://plants.ifas.ufl.edu/search80/NetAns2/ • www.allmusic.com/ • http://aad.archives.gov/aad/ • www.eric.ed.gov • www.istl.org/01-winter/internet.html

  17. Mashups and podcasts. Mashups (e.g. built on Google Maps): • www.folkestonegerald.com/map/ • www.chicagocrime.org/map • www.housingmaps.com • www.yourhistoryhere.com • www.ufomaps.com • www.gypsymaps.com • www.programmableweb.com/matrix Podcasting: • www.ipodder.org • http://britcaster.com/ • www.podcast.net • www.podcastcentral.com • Subject-specific example: www.jodcast.net/amp/index.html

  18. Video streaming (academic and community) • www.researchchannel.org • www.britishpathe.com • http://mitworld.mit.edu/index.php • http://web.sls.csail.mit.edu/lectures/ • http://videolectures.net • www.monkeysee.com/ • www.loc.gov/film/arch.html • www.mediachannel.com • http://showbiz.quickfound.net/video_search_and_news.html • www.youtube.com • www.veoh.com • www.eefoof.com • http://communityvideo.aol.com/Main.do • cf. www.video.google.com

  19. Open access repositories • www.doaj.org/ • http://oaister.umdl.umich.edu/o/oaister/viewcolls.html • www.freefulltext.com • www.arl.org/sparc/repos/ir.html • http://archives.eprints.org/ • www.sherpa.ac.uk • http://re.cs.uct.ac.za// • www.hw.ac.uk/libwww/irn/irn142/irn142.html (large list) • http://www.interdok.com/dopp/search.cfm (conference proceedings; not free access)

  20. What if? • The bot visits the site but leaves before covering the whole site (e.g. parts of pages, number of pages)? • The page author used a “No robots” command? • The material was put up last week, or is real time? • The content is dynamically generated (CGI, ASP and others)? • The material is graphic or embedded deep (e.g. PowerPoint notes pages)? • The spelling is wrong! (e.g. “Mary J Bilge”) • Other reasons!
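The “No robots” case can be seen with Python's standard `urllib.robotparser` module, which applies robots.txt rules the way a well-behaved crawler checks them before fetching a page. The robots.txt content and the example.org URLs below are hypothetical. Real crawlers such as Googlebot also support extensions (for example wildcards in paths) that this parser may treat differently, and a page can also opt out with a robots meta tag, which this sketch does not cover.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: keep all bots out of
# the archive and the site's own search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text instead of fetching them

for url in ["https://example.org/index.html",
            "https://example.org/archive/2009/report.html",
            "https://example.org/search?q=energy"]:
    # can_fetch() reports whether a bot with this user-agent may crawl the URL
    print(url, parser.can_fetch("Googlebot", url))
```

Pages under `/archive/` and the `/search` results are skipped by a crawler that obeys these rules, so they never reach its index even though any human visitor can open them.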

  21. How invisible is the invisible web? How do I find these “invisible” resources? • http://oedb.org/library/college-basics/research-beyond-google, “Research Beyond Google: 119 Authoritative, Invisible, and Comprehensive Resources” • www.completeplanet.com/ (and BrightPlanet; a little out of date) • http://virtualchase.com/search_engines/databases.html • www.freepint.com/gary/direct.htm (very out of date) • www.deepwebresearch.info (up to date; incredibly detailed, often technical) • www.turbo10.com (Hmm…) • www.incywincy.com • www.deepdyve.com • http://www.osti.gov/media/deepWebWM_256.html • www.enth.com • www.iage.com/invisible.html • www.weblens.org/invisible.html • www.deepweb.us • www.llrx.com/features/deepweb2009.htm • http://library.laguardia.edu/invisibleweb/webography • Federated search: Deep Web Technologies • Long shots: search for “Search our database” [subject term] • Database [subject term]

  22. Virtual libraries / gateways / portals. Examples: • www.hw.ac.uk/libWWW/irn/pinakes/pinakes.html • www.intute.ac.uk • www.loc.gov/rr/askalib/virtualref.html • www.loc.gov/rr/international/portals.html • www.lii.org

  23. Different types of subject gateways • www.tasi.ac.uk/advice/using/finding.html • http://www.kidsclick.org/ • http://yahooligans.yahoo.com/ • www.ala.org/gwstemplate.cfm?section=greatwebsites&template=/cfapps/gws/default.cfm • www.anthus.com/CyberDewey/CyberDewey.html • http://library.bendigo.latrobe.edu.au/irs/webcat/ddcindex.htm • http://listverse.com

  24. Google on the future: “Coming up with elegant, fitting and relevant solutions to meet the challenges of mobility, modes, media, personalization, location, socialization, and language will take decades. Search is a science that will develop and advance over hundreds of years. Think of it like biology and physics in the 1500s or 1600s: it’s a new science where we make big and exciting breakthroughs all the time. … Just like biology and physics several hundred years ago, the biggest advances are yet to come. That’s what makes the field of Internet search so exciting.” http://googleblog.blogspot.com/2008/09/future-of-search.html
