
Search Engines

Search Engines. Jan Damsgaard Dept. of Informatics Copenhagen Business School http://www.cbs.dk/staff/damsgaard. Introduction. How to find relevant information on the web a major problem Size, growth, lack of universal semantic organization major impediments Two major strategies





Presentation Transcript


  1. Search Engines
     Jan Damsgaard
     Dept. of Informatics, Copenhagen Business School
     http://www.cbs.dk/staff/damsgaard

  2. Introduction
     • Finding relevant information on the web is a major problem
     • The web's size, growth, and lack of universal semantic organization are the major impediments
     • Two major strategies
       • Improve users' search capability by using raw computer power: search engines
       • Help organize user-relevant information into meaningful categories and bundles of services: portals
     Jan Damsgaard, 2004

  3. Definitions
     • Search engine
       • Specific information retrieval software that returns URLs and descriptions of web pages as results
     • Portal
       • Site that serves as a major starting site for users when they connect to the web; portals combine directories, services, search capabilities, and personalization

  4. Search Engines
     • Technical and business solutions that provide these services on a mass scale are important internet phenomena for two reasons:
       1) They obtain immense hit rates and are therefore major points of origin for any internet activity
       2) They are the most important means of channeling user search and retrieval
     • They are therefore strategically important, as reflected in the market valuations of search engine companies
     • www.mediametrix.com
     • www.nielsen-netratings.com

  5. FDIM (top ti)

  6. Look at the stickiness
     Top 10 sites in November 2000 in terms of minutes spent per month

  7. Where Do Search Engines Develop Market Value?
     • Market recognition, leading to
       • popular use and adoption
       • selling ad impressions
       • long-term contracts for search engine functionality
     • Market assessment of real options associated with the recognition of the tool in the marketplace
       • future value-added alliances and spin-offs

  8. Search engine basics
     • Basic information retrieval techniques
     • Market trends and capabilities
     • Awareness of popular assessment metrics for search engine performance
     • Search engine business models

  9. How Search Engines Work
     • Three components:
       • spider or link crawler (software agent)
       • index or catalog (database of content)
       • search engine software, or a combined meta-search engine
     • Require significant hardware horsepower, server connectivity, and database capabilities
     • If your site is not linked from pages the spider reaches, you submit your links yourself
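
A minimal sketch of the three components on this slide, assuming a tiny in-memory "web" instead of real HTTP fetching. The `WEB` dictionary, its URLs, and the function names `crawl`, `build_index`, and `search` are hypothetical and chosen only for illustration; a production engine distributes each step over many machines.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example/home": ("search engines index the web",
                       ["a.example/about", "b.example/news"]),
    "a.example/about": ("portals organize web services", []),
    "b.example/news": ("search engines crawl links", ["a.example/home"]),
    "c.example/orphan": ("unlinked page about search", []),
}


def crawl(seeds):
    """Spider: follow links breadth-first, starting from the seed URLs."""
    seen, queue = set(), deque(seeds)
    while queue:
        url = queue.popleft()
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        queue.extend(WEB[url][1])
    return seen


def build_index(urls):
    """Index: map every word to the set of crawled URLs that contain it."""
    index = defaultdict(set)
    for url in urls:
        for word in WEB[url][0].lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Search software: return the URLs that contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


# The orphan page is never reached by following links from the home page ...
reachable = crawl(["a.example/home"])
# ... so its owner has to submit it as an extra seed.
with_submission = crawl(["a.example/home", "c.example/orphan"])
```

Searching the index built from `reachable` for "search engines" returns the home and news pages. The orphan page only becomes searchable once it is submitted as a seed, which is the situation the last bullet describes.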

  10. How Do Search Engines Work?
     • Users add keywords to text fields
     • Critical factors are the choice of keywords, the possibilities for combining them, and how the search engine exploits the results
     • Multilingual support
     • Another issue is how the engine organizes search results
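
Keyword combination is commonly offered as Boolean AND, OR, and NOT. The sketch below shows how the three map onto set operations over an index. The `INDEX` contents and the helper names `all_of`, `any_of`, and `none_of` are assumptions made for this example, not the syntax of any particular engine.

```python
# Hypothetical toy index: keyword -> set of document ids.
INDEX = {
    "search": {1, 2, 4},
    "engine": {1, 2},
    "portal": {3, 4},
}
ALL_DOCS = {1, 2, 3, 4}


def lookup(word):
    """Documents containing the keyword (empty set if it is unknown)."""
    return INDEX.get(word.lower(), set())


def all_of(*words):
    """AND: documents that contain every keyword."""
    result = set(ALL_DOCS)
    for word in words:
        result &= lookup(word)
    return result


def any_of(*words):
    """OR: documents that contain at least one keyword."""
    result = set()
    for word in words:
        result |= lookup(word)
    return result


def none_of(*words):
    """NOT: documents that contain none of the keywords."""
    return ALL_DOCS - any_of(*words)


# "search AND NOT engine" keeps only document 4.
narrowed = all_of("search") & none_of("engine")
```

Adding a keyword with AND narrows the result set, while OR widens it, which is why the choice and combination of keywords matters so much.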

  11. The most popular search engines

  12. Popularity over time
     • March 2002: Google, WiseNut, AlltheWeb
     • August 2001: Google, Fast, WiseNut
     • April 2001: Google, Fast, MSN (Inktomi)
     • Oct. 2000: Fast, Google, Northern Light
     • July 2000: iWon, Google, AltaVista
     • April 2000: Fast, AltaVista, Northern Light
     • Feb. 2000: Fast, Northern Light, AltaVista
     • Jan. 2000: Fast, Northern Light, AltaVista
     • Nov. 1999: Northern Light, Fast, AltaVista
     • Sept. 1999: Fast, Northern Light, AltaVista
     • Aug. 1999: Fast, Northern Light, AltaVista
     • May 1999: Northern Light, AltaVista, Anzwers
     • March 1999: Northern Light, AltaVista, HotBot
     • January 1999: Northern Light, AltaVista, HotBot
     • August 1998: AltaVista, Northern Light, HotBot
     • May 1998: AltaVista, HotBot, Northern Light
     • February 1998: HotBot, AltaVista, Northern Light
     • October 1997: AltaVista, HotBot, Northern Light
     • September 1997: Northern Light, Excite, HotBot
     • June 1997: HotBot, AltaVista, Infoseek
     • October 1996: HotBot, Excite, AltaVista
     http://searchengineshowdown.com/stats/size.shtml

  13. Also specific services
     • E.g. Google provides
       • Find PDF files
       • Stock quotes
       • Cached links
       • Similar pages
       • Who links to you
       • Specific site
       • Dictionary definitions
       • Find maps
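
Most of these services have historically been reachable through query operators typed into the search box. The sketch below builds such query strings. The mapping from slide bullets to operators is an assumption based on Google's historically documented syntax (`filetype:`, `stocks:`, `cache:`, `related:`, `link:`, `site:`, `define:`); several of these have since been retired or changed, so treat the table as illustrative. Maps is omitted because it is a separate service rather than an operator. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed mapping from the slide's bullets to historical query operators.
OPERATORS = {
    "pdf_files": "filetype:pdf",
    "stock_quotes": "stocks:{arg}",
    "cached_link": "cache:{arg}",
    "similar_pages": "related:{arg}",
    "who_links_to_you": "link:{arg}",
    "specific_site": "site:{arg}",
    "definition": "define:{arg}",
}


def build_query(feature, arg="", keywords=""):
    """Combine free-text keywords with the operator for one feature."""
    operator = OPERATORS[feature].format(arg=arg)
    return f"{keywords} {operator}".strip()


def search_url(query):
    """URL-encode the query string into a search request URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


pdf_query = build_query("pdf_files", keywords="search engines")
site_query = build_query("specific_site", arg="cbs.dk", keywords="informatics")
```

Here `pdf_query` is `search engines filetype:pdf` and `site_query` is `informatics site:cbs.dk`; `search_url` only percent-encodes the string and sends nothing over the network.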

  14. Major design issues: completeness and relevance
     [Figure: two overlapping sets, "The set of relevant replies" and "The set of obtained results"]
     • The larger the overlap between the two sets, the better in terms of completeness
     • The smaller the set of non-relevant replies, the more relevant the search
     • How to organize the results for fast reviewing
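
In information retrieval terms, completeness corresponds to recall and relevance to precision. The sketch below computes both from the two sets in the diagram; the document ids are made-up example data.

```python
def recall(relevant, retrieved):
    """Completeness: share of the relevant replies that were obtained."""
    if not relevant:
        return 0.0
    return len(relevant & retrieved) / len(relevant)


def precision(relevant, retrieved):
    """Relevance: share of the obtained results that are relevant."""
    if not retrieved:
        return 0.0
    return len(relevant & retrieved) / len(retrieved)


# Hypothetical example: four relevant pages exist, the engine returns three.
relevant_replies = {1, 2, 3, 4}
obtained_results = {3, 4, 5}

r = recall(relevant_replies, obtained_results)     # overlap {3, 4} of 4 relevant
p = precision(relevant_replies, obtained_results)  # overlap {3, 4} of 3 returned
```

Returning more pages can only raise recall, but it usually lowers precision, which is why the two design goals pull against each other.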

  15. Page Ranking for Relevance
     • Biased or unbiased by the search engine?
     • The size of the search space (e.g. Google currently addresses 1,346,966,000 pages)
     • Use of keywords: in the title, in meta-tag information in the HTML code, or near the top of the page
     • Use of other facilities such as semantic nets or reliability indices (e.g. Google uses page ranks and filtering)
     • Daily, weekly, or monthly refresh by web crawler software
     • For an analysis see http://www.notess.com/search/
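
The page ranks mentioned above can be illustrated with the textbook power-iteration form of PageRank. This is a simplified sketch, not Google's production ranking: the damping factor of 0.85, the iteration count, and the four-page link graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a dict: page -> outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page "c" is linked to by a, b, and d.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(example_links)
best = max(ranks, key=ranks.get)
```

In this graph `best` is page "c", the page with the most incoming links, while "d", which nobody links to, ends up with only the baseline share.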

  16. Special features of search engines
     • Multilingual searches
     • Natural language interfaces
     • Image searches
     • Agents (specific crawlers and service providers, e-mail, news agents, shopping and trading agents)

  17. Search Assistance Features
     • Phrase searching
       • Finds the terms you enter into the search box as a phrase; tells you in the results whether any full or partial matches were found
     • Stemming
       • Ability of the search engine to search for variations of a word based on its stem
       • Entering "swim" might also find "swims" and maybe "swimming," depending on the search engine; more important in some other languages
       • Some search engines have stemming switched on by default
     • Clustering
       • Allows only one page per site to be represented in the results
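
The three features above can be sketched in a few lines each. The stemmer is a deliberately crude suffix stripper written for this example (real engines use more careful algorithms such as Porter's), and the function names and sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def contains_phrase(text, phrase):
    """Phrase searching: the phrase's words must appear consecutively, in order."""
    words, target = text.lower().split(), phrase.lower().split()
    if not target:
        return False
    return any(words[i:i + len(target)] == target
               for i in range(len(words) - len(target) + 1))


def stem(word):
    """Crude stemming: strip one common English suffix, then undouble a final consonant."""
    w = word.lower()
    for suffix in ("ing", "es", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[:-len(suffix)]
            break
    if len(w) > 3 and w[-1] == w[-2] and w[-1] not in "aeiou":
        w = w[:-1]
    return w


def cluster_by_site(ranked_urls):
    """Clustering: keep only the first (best-ranked) page from each site."""
    seen_hosts, kept = set(), []
    for url in ranked_urls:
        host = urlparse(url).netloc
        if host not in seen_hosts:
            seen_hosts.add(host)
            kept.append(url)
    return kept


stems = {stem(w) for w in ("swim", "swims", "swimming")}
clustered = cluster_by_site([
    "http://a.example/page1",
    "http://a.example/page2",
    "http://b.example/index",
])
```

All three forms of "swim" reduce to the same stem, so a query for one can match the others. Clustering keeps `page1` and drops `page2` because both come from the same host.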

  18. Conclusions
     • Search engines are key elements of Internet business
     • The next wave will integrate new interfaces and new access channels (digital TV, wireless)
     • Mass-scale business with the value of the installed base
