
Search Engines


Presentation Transcript


  1. Search Engines HIMA 4160 Fall 2008

  2. Agenda • Components of a Search Engine • Boolean Search • Advanced Features of Google

  3. What is a web search engine?

  4. Why do we need search engines?

  5. Which search engine do you usually use to search for information online?

  6. http://marketshare.hitslink.com/report.aspx?qprid=4

  7. What is in the deep web? • Dynamic webpages • Un-indexed webpages • Frequently updated webpages • Facebook, etc.

  8. Do you think your personal page can be found through Google? • Yes • No • Maybe

  9. Send your website to Google http://www.google.com/addurl

  10. Search Engines • Crawler • Indexer • Query processor

  11. Search Engines • Web Crawler: fetches webpages from the Internet • Indexer: saves indexed webpages in a database • Query processor

  12. Web Crawler

  13. <a href="http://www.ecu.edu"> East Carolina University </a>

  14. Web Crawler • A computer program that sends HTTP requests to various websites • Follows hyperlinks to continuously visit webpages • Similar to a web browser • Also called a spider or web robot
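
The crawler behavior on slides 13 and 14 can be sketched roughly as follows. This is a minimal illustration, not how any real search engine crawls: FAKE_WEB is a hypothetical in-memory stand-in for HTTP responses (so nothing is actually requested), and the names LinkExtractor and crawl are made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical pages standing in for real HTTP responses (assumption).
FAKE_WEB = {
    "http://www.ecu.edu/": (
        '<a href="/about.html">About</a> '
        '<a href="http://www.ecu.edu/hsim/">HSIM</a>'
    ),
    "http://www.ecu.edu/about.html": '<a href="http://www.ecu.edu/">Home</a>',
    "http://www.ecu.edu/hsim/": "<p>No links here.</p>",
}


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl: fetch a page, extract its links, queue unseen ones."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


pages = crawl("http://www.ecu.edu/", FAKE_WEB.get)
print(pages)
```

A real crawler would replace FAKE_WEB.get with a function that issues an HTTP request, and would also need politeness rules (robots.txt, rate limits) that this sketch leaves out.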

  15. Google Server

  16. Indexing

  17. How to Build the Index • Robot: a computer program that moves from webpage to webpage by following hyperlinks • You can also submit a URL to Google • After a page is visited, a snapshot is stored on the search engine’s server • An index file is created: words, frequency, URL, and other useful information
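
One common way to store the "words, frequency, URL" information from slide 17 is an inverted index. The sketch below is an assumption about the general shape of such an index, not Google's actual format; the page texts and example.edu URLs are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to {url: number of times the word appears on that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][url] = count
    return index


# Hypothetical crawled snapshots (assumption, for illustration only).
PAGES = {
    "http://example.edu/a": "Health information management and health records",
    "http://example.edu/b": "Information systems management",
}

index = build_index(PAGES)
print(index["health"])
print(sorted(index["management"]))
```

Looking up a word then returns every page that contains it, which is what lets the query processor answer a search without rescanning the pages themselves.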

  18. Interface

  19. Interface (Yahoo)

  20. Interface (Ask Jeeves)

  21. Interface (Bing)

  22. What do they have in common?

  23. How Google Processes a Query

  24. Boolean • AND • OR • NOT • female AND student • female OR student • NOT female • (NOT female) AND (NOT student)
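
The Boolean expressions on slide 24 map directly onto set operations. In this sketch the record IDs and the POSTINGS table are invented for illustration: each term points to the set of records that match it.

```python
# Hypothetical data: record IDs 1-4, and which records match each term.
ALL_RECORDS = {1, 2, 3, 4}
POSTINGS = {
    "female": {1, 2},
    "student": {2, 3},
}


def and_(a, b):
    """Records matching both terms (set intersection)."""
    return a & b


def or_(a, b):
    """Records matching either term (set union)."""
    return a | b


def not_(a, universe=ALL_RECORDS):
    """Records that do not match the term (set complement)."""
    return universe - a


female = POSTINGS["female"]
student = POSTINGS["student"]

print(and_(female, student))              # female AND student
print(or_(female, student))               # female OR student
print(not_(female))                       # NOT female
print(and_(not_(female), not_(student)))  # (NOT female) AND (NOT student)
```

Note that (NOT female) AND (NOT student) selects the same records as NOT (female OR student).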

  25. Let’s tweak it (Google) • health information management • health AND information AND management • health OR information OR management • “health information management”

  26. Basic Google Search Features • Choose search terms • Usually 1–3 words • Capitalization • NOT case sensitive • Automatic “and” query • Health Information Management = Health AND Information AND Management • Automatic exclusion of common words • Where, when, in, single digits, single letters …
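
The basic features on slide 26 (case insensitivity, automatic AND, dropping common words) could be approximated as below. The STOP_WORDS list and the rule for one-character terms are assumptions made for this example; the real list Google used is not given in the slides.

```python
# Hypothetical stop-word list (assumption; not Google's actual list).
STOP_WORDS = {"where", "when", "in", "the", "a", "of"}


def normalize_query(query):
    """Lowercase the query and drop common words and one-character terms."""
    terms = []
    for raw in query.lower().split():
        if raw in STOP_WORDS:
            continue
        if len(raw) == 1 and raw.isalnum():  # single digit or single letter
            continue
        terms.append(raw)
    return terms


def to_boolean(query):
    """Join the remaining terms with an implicit AND."""
    return " AND ".join(normalize_query(query))


print(to_boolean("Health Information Management"))
print(normalize_query("Where is ECU in 5 days"))
```

Because the query is lowercased first, "Health" and "health" end up as the same term, which is one simple way to make matching case insensitive.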

  27. Basic Google Search Features • Word variation • Diet, dietary • Phrase search • “Health Information Management” • Negative terms • bass -music • “I’m Feeling Lucky” • Goes straight to the first result in the list

  28. Advanced Google Search • Site search • health information management site:ecu.edu • Filetype search • health information management filetype:pdf • “+” search • Star Wars Episode +I • Synonym search • ~food ~fact

  29. Advanced Google Search Features • “OR” search • vacation London OR Paris • Number range search • DVD player $100..$200 • Wildcard search • East * university

  30. Advanced Google Operators • cache: • cache:www.ecu.edu • link: • link:www.ecu.edu • related: • related:www.ecu.edu • info: • info:www.ecu.edu

  31. Advanced Google Operators • define: • define:health • stocks: • stocks:GOOG • allintitle: • allintitle:health information management • intitle: • intitle:health information management • allinurl: • allinurl:ecu hsim • inurl: • inurl:ecu hsim

  32. Other Things Google Can Do • Converter • Calculator • Translator • Scholar • Shopper • Usher • Map • Instant messaging • Switchboard • Many more … http://www.pcmag.com/article2/0,1895,1858681,00.asp

  33. Additional Information • http://douweosinga.com/

  34. Rank – how do you know which link is important? • Before Google • Conventional information retrieval • After Google • Link analysis

  35. Link Analysis

  36. Page Rank
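
The link-analysis idea behind slides 34 to 36 is that a page is important when important pages link to it. The sketch below is a simplified textbook version of PageRank computed by repeated updates; the damping factor 0.85, the iteration count, and the three-page LINKS graph are assumptions for illustration, and this is not a description of Google's production ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: A and B both link to C, and C links back to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"]}

rank = pagerank(LINKS)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this toy graph C ranks highest because two pages link to it, A comes next because the highly ranked C links to it, and B is last because nothing links to it.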

  37. Google Bomb • “Miserable failure” • Defused by Google in 2007 through algorithm changes

  38. Google’s Business Model • How to make money from search? • Google’s business model • Click stream • Pay per click

  39. Google Becomes the Information Portal • Google has become the portal to the world’s information • How scary is it? • “Don’t be evil”

  40. Tips for Efficient Search • Be clear about what sort of page you seek • Think about what type of organization might publish the page you want • List terms that are likely to appear on the pages you are looking for • Assess the results • Consider a two-pass strategy

  41. Summary • Three parts of a search engine • How the indexer works • How the query processor works
