Explore the database content, comparative size, overlap, and future developments of popular search engines. Understand the evolution, challenges, and possibilities in the world of online search. Dive into the search engine showdown!
Web Search Engines by Greg R. Notess notess@imt.net imt.net/~notess/search
Overview: • Comparing the database content • Change • Comparative Size • Overlap • Looking towards future developments • Portal or Destination • Output sorting
Results are limited by • Database content • The Web sites included • The depth to which they are indexed
If it’s not in the database, the best search engine will not be able to find the Web page
So what’re they like? • Very large databases • Most index all words on page • None index words in images • Let’s see how the databases compare to the real Web
Overall Size Change: Is the Web in general • Growing? • Shrinking? • Remaining the same?
What about the rest? • Who’s the biggest? • How to measure? • Actual search results • Verified hits
And over time? • 8/98 -- AltaVista, Northern Light, HotBot • 5/98 -- AltaVista, HotBot, Northern Light • 2/98 -- HotBot, AltaVista, Northern Light • 10/97 -- AltaVista, HotBot, Northern Light • 9/97 -- Northern Light, Excite, HotBot • 6/97 -- HotBot, AltaVista, Infoseek • 10/96 -- HotBot, Excite, AltaVista
Back to change in size • Let’s look at six search engines • Over the course of two years
But at least • They have a high degree of duplication between them • Right?
Try 4 small searches • Using five search engines • How many pages are found by all five or at least by four of them?
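The overlap test described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the engine names and URLs in `sample` are hypothetical placeholders, not results from the talk, and `found_by_at_least` is a helper name invented for this sketch.

```python
from collections import Counter


def found_by_at_least(results_by_engine, k):
    """Return the URLs that appear in the result sets of at least k engines."""
    counts = Counter()
    for urls in results_by_engine.values():
        # set() so that an engine listing a URL twice still counts once
        counts.update(set(urls))
    return sorted(url for url, n in counts.items() if n >= k)


# Hypothetical result sets for one small search on five engines.
sample = {
    "engine_a": ["http://example.com/1", "http://example.com/2", "http://example.com/3"],
    "engine_b": ["http://example.com/1", "http://example.com/2", "http://example.com/4"],
    "engine_c": ["http://example.com/1", "http://example.com/2", "http://example.com/5"],
    "engine_d": ["http://example.com/1", "http://example.com/2"],
    "engine_e": ["http://example.com/1", "http://example.com/6"],
}

all_five = found_by_at_least(sample, 5)       # found by every engine
four_or_more = found_by_at_least(sample, 4)   # found by at least four
print(len(all_five), len(four_or_more))       # prints: 1 2
```

In this made-up sample, six distinct pages are found in total but only one is found by all five engines, which is the kind of low duplication the test is meant to expose.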
And they exclude most: • Content of Adobe PDF and formatted files • The content in most sites requiring a log in • CGI output: data requested by a form • Other dynamically produced data • Pages protected by a robots.txt file • Intranets, pages not linked from anywhere else • Commercial resources with domain limitations • Non-Web resources
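As an illustration of the robots.txt item in the list above, the sketch below uses Python's standard `urllib.robotparser` to show how a crawler might decide to skip a page. The rules, the `ExampleBot` user agent, and the example.com URLs are assumptions made up for this example; they do not describe how any particular search engine's crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that keeps all crawlers out of /private/.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse rules from text instead of fetching them

blocked = parser.can_fetch("ExampleBot", "http://example.com/private/report.html")
allowed = parser.can_fetch("ExampleBot", "http://example.com/index.html")
print(blocked, allowed)  # prints: False True
```

A crawler that honors these rules never requests the page under `/private/`, so its words never reach the search engine's database.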
Scope Summary: • Inconsistent growth • Not full coverage • Surprisingly low duplication
Positive Side? • Essential for searching the Net • Can be used effectively • Phrase search • Use more than one • Smart searching
Incredibly popular • Even when they fail • But then, since when is finding information always easy?
Overview: • Comparing the database content • Change • Comparative Size • Overlap • Looking towards future developments • Portal or Destination • Output sorting
What is a search engine? • Portal? • Gateway? • Destination?
Search Engine • the software that searches a database
Development • Database of Web pages • adds Supplementary Database • Phone numbers, reference, businesses, news • then adds Subject directory • then Services • email, ISP, shopping, travel agent • now Communities
Portal to Destination? • Driving force • advertising revenue • Keep users longer for more • Conflicts with portal and gateway principle
Future possibilities? • Smaller databases • Less pointing to external pages • Paid advertising or sponsorship for visibility • Rise of search only sites?
Output Development • Initially, “Relevance” ranking • Crude • Not site or URL based • Some site sorting from Excite • No date sorting
Site Sorting • Infoseek, then Lycos, now HotBot • Group together by site • More relevant than prior algorithms • Northern Light includes it in Custom Folders
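Grouping results by site can be sketched as follows. This is a minimal illustration, not the method used by Infoseek, Lycos, HotBot, or Northern Light: it assumes results arrive as a relevance-ranked list of URLs, treats the hostname as the "site", and uses a hypothetical helper name `group_by_site` and made-up URLs.

```python
from urllib.parse import urlsplit


def group_by_site(ranked_urls):
    """Group a relevance-ranked URL list by hostname.

    Sites appear in the order of their best-ranked page, and pages
    within a site keep their original relative order.
    """
    groups = {}  # dicts keep insertion order in Python 3.7+
    for url in ranked_urls:
        host = urlsplit(url).hostname or ""
        groups.setdefault(host, []).append(url)
    return groups


# Hypothetical ranked results for one query.
ranked = [
    "http://www.example.com/a.html",
    "http://www.example.org/x.html",
    "http://www.example.com/b.html",
    "http://www.example.net/",
    "http://www.example.org/y.html",
]

grouped = group_by_site(ranked)
for host, pages in grouped.items():
    print(host, len(pages))
```

With this grouping, one site with many matching pages takes up a single slot in the output instead of crowding other sites off the first screen of results.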
Other Output • RealNames on AltaVista • Direct Hit on HotBot • Subject Directory Categories • News • Books, CDs, etc. “about search term”
Search Engine Showdown • imt.net/~notess/search • Search engine features • See also • www.searchenginewatch.com • See also • Rich Wiggins, Coming up next . . .