Today’s Agenda

louisa

  1. Today’s Agenda • Exam post-mortem • Should I drop? • Search Engines: Details & Ramifications • Future: • 2 Presentations every Thursday • Project #2 is coming out next week (Website) • Lab: Switching to JavaScript

  2. Exam post-mortem 1. Communication substrates • Ethernet • USB • Wireless • Satellite • Infrared

  3. Exam post-mortem 2. Content & Services • email (POP, IMAP, etc.) • webpage (HTTP) • documents (FTP) • peer-to-peer, chat, IM, etc. • video • audio

  4. Exam post-mortem 3. • Two or more computers → Network • Two or more networks → Inter-network • A sub-network is actually part of a network. • WAN → Wide Area Network; implies long distance. • An inter-network can be in the same room.

  5. Exam post-mortem 6. • The Internet was a military project, DARPANET, until 1969/70. • The Internet was a research project, ARPANET, throughout the 1970s (available only at national labs and universities). • Bulletin boards and email became available to the general public in the 1980s. • However, 1990 is a reasonable answer.

  6. Exam post-mortem 7. • In 1946, computers didn’t really exist yet. • In 1956, people had yet to even envision connecting computers in different locations. 8. • Web browsing via hypertext was going on in the late 1980s. • By 1995, e-commerce was already happening. • Emerged → to come forth from obscurity

  7. Exam post-mortem 19. 200 million hosts in 2002. 20. Did you notice that it multiplies by 4 every year? 800,000,000 was the best answer; anything from 500 million to 1 billion was only -1. 21. False. In 2010, will there be 100 billion people on the earth?
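The extrapolation behind question 20 is a single multiplication. A minimal sketch, assuming the 200 million figure for 2002 and the factor of 4 per year quoted on the slide; the function name is only illustrative.

```python
def projected_hosts(base_hosts, yearly_factor, years_ahead):
    """Extrapolate a host count that multiplies by a fixed factor each year."""
    return base_hosts * yearly_factor ** years_ahead


hosts_2002 = 200_000_000  # figure from question 19

# One year ahead at a factor of 4 gives the "best answer" for question 20.
print(projected_hosts(hosts_2002, 4, 1))  # 800000000
```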

  8. Exam post-mortem 23-28. 100 cable pro connections is better • Costs less • More bandwidth • The problem is that you’d have to manage 100 separate connections • Road Runner probably doesn’t have that many separate lines crossing one point.

  9. Exam post-mortem 38. Perhaps I wasn’t clear, but… • Domain registration is a yearly cost. • Internet access is required. • Was anyone thinking: at school or work, internet access is free? • At school it’s not! Is your room free? • At work it’s not! Someone pays. • NetZero is free, right? Good luck.

  10. Q: Should I drop? • A: No! • Here’s why.

  11. Today’s Agenda • Exam post-mortem • Should I drop? • Search Engines: How exactly they work. • Future: • 2 Presentations every Thursday • Project #2 is coming out next week (Website) • Lab: Switching to JavaScript

  12. Search Engines • Background • In the early ’90s, people still wore mullets, and finding info on the WWW was not easy. • Links were important • Hubs & Authorities emerged.

  13. Search Engines • Typical Academic Webpage • Welcome to UCLA’s Neurosurgery Website • Here are some academic publications • Here are links to other Neurosurgery Websites • Typical Personal Webpage • Hi my name is Rupert “I don’t have a life” McNerd • Here are pictures of Heather Locklear • Here are links to other people who have nerdy websites with pictures of Heather Locklear

  14. Search Engines • Through • word of mouth • email • message boards • Hubs & Authorities emerged • A hub is a website that links you to other important websites • There are good hubs and bad hubs • An authority is any website that has information, data, etc. • There are good and bad authorities • Some websites are both Authorities and Hubs
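The hub and authority definitions above reinforce each other: a good hub links to good authorities, and a good authority is linked to by good hubs. One well-known way to turn that into numbers is Kleinberg's HITS iteration. The sketch below is an illustration under that assumption, not necessarily the method the lecture had in mind; the page names and the toy link graph are made up.

```python
def hubs_and_authorities(links, iterations=20):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page is a good authority if good hubs link to it.
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, []))
                for p in pages}
        # A page is a good hub if it links to good authorities.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize so the scores stay bounded from round to round.
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical toy web: two pages of links pointing at two content pages.
toy_links = {
    "links-page-a": ["neuro-papers", "weather-data"],
    "links-page-b": ["neuro-papers"],
    "neuro-papers": [],
    "weather-data": [],
}
hub, auth = hubs_and_authorities(toy_links)
print(max(auth, key=auth.get))  # the page most pointed to by good hubs
print(max(hub, key=hub.get))    # the page linking to the best authorities
```

In this toy graph the content pages end up with hub scores of zero and the link pages with authority scores of zero; a site that both links out and is linked to would score on both, matching the last bullet above.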

  15. Search Engines • The Problem: • People designed “homepages” with lots of links • Not for the benefit of others but • to help themselves find stuff • The lists of links were not necessarily related, organized, or kept up to date • As a result, it’s hard for ordinary people to find or identify good hubs. • Example • http://alumni.umbc.edu/~efreem2/lynx.html

  16. Search Engines • The Solution: • Comprehensive Directories • The first big one was Yahoo! • Yahoo! began as a student hobby in February 1994 • Its founders, David Filo and Jerry Yang, were Ph.D. candidates in Electrical Engineering at Stanford University • They started their guide to keep track of their personal interests on the Internet. • Eventually, their lists became too long and unwieldy, and they broke them out into categories. • When the categories became too full, they developed subcategories ...

  17. Search Engines • Yahoo! • an acronym for “Yet Another Hierarchical Officious Oracle” • Even though much of the process was automated, a lot of human care went into their directory. • Yahoo! distinguished itself by using a combination of custom software and human care to make a well-organized and somewhat comprehensive directory of the WWW. • They had experts who would help organize categories • They allowed people to submit web pages, locations, and descriptions

  18. Search Engines • Yahoo’s directory is stored in a database

  19. Search Engines • Yahoo! Directory is created using • User submission • Staff, consultants, etc. • Robots/Spiders (programs that fetch pages automatically and add them to the directory)
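A robot/spider is essentially a loop: fetch a page, pull out its links, and queue any it has not seen. The sketch below shows that loop using Python's standard `html.parser`. To stay self-contained it crawls a made-up in-memory "web" (`FAKE_WEB`) rather than making real HTTP requests; the `crawl` function and its `fetch` parameter are illustrative names, not part of any real crawler.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML text, or None if missing."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical in-memory "web" standing in for real HTTP requests.
FAKE_WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a>',
    "http://b.example/": '<a href="http://a.example/">A</a> '
                         '<a href="http://c.example/">C</a>',
    "http://c.example/": "No links here.",
}
found = crawl("http://a.example/", FAKE_WEB.get)
print(sorted(found))
```

Starting from one page, the robot reaches all three, including the one that nobody submitted by hand. A page with no inbound links would never be found this way.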

  20. Search Engines • Initially, Yahoo! was not a search engine. • It was a directory. • While it was possible to search the directory using keywords, • Users were not searching the entire WWW • Problem: If you were not in the directory, your site would not be found by a Yahoo! search.

  21. Search Engines • Ways to get your site noticed by Yahoo! • Fill out an online site submission form • Get lots of people to link their pages to your page and hope that a Yahoo! staff member or robot finds it. • Add lots of meta tags that are consistent with your site’s content.
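Meta tags sit in a page's head section and describe it to robots. As a rough illustration of what a robot sees, the sketch below reads the standard `description` and `keywords` meta tags from a made-up page using Python's `html.parser`; the sample page and the helper names are assumptions for the example.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content


def read_meta(html):
    reader = MetaTagReader()
    reader.feed(html)
    return reader.meta


# Hypothetical page whose meta tags match its actual content.
SAMPLE_PAGE = """
<html><head>
<title>Campus Weather Station</title>
<meta name="description" content="Hourly readings from a rooftop weather station.">
<meta name="keywords" content="weather, temperature, humidity">
</head><body>Current readings go here.</body></html>
"""
meta = read_meta(SAMPLE_PAGE)
print(meta["keywords"])
```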

  22. Search Engines • Problem: Great websites pop up so quickly that Yahoo! can’t find them all. • User submission (many people don’t submit their site) • Staff, consultants, etc. (you’d need an army) • Robots/Spiders (most effective way to build a directory)

  23. Search Engines • Another Problem: • Robots/Spiders aren’t good at automatically determining • Description • Keywords • Category • even the title • Web pages are often poorly composed, and • downright misleading.

  24. Search Engines • Example: Click here to view your local weather (Actually this will bring you to a porn site and the makers of this web page get 0.5 cents every time some idiot clicks this link) your local weather, your local weather, weather channel, current temp, local weather, your local weather, your local weather, your local weather, your local weather, your local weather, weather channel, current temp, local weather, your local weather, your local weather, your local weather, your local weather, your local weather, weather channel, current temp, local weather, your local weather, your local weather, your local weather, your local weather, your local weather, weather channel, current temp, weather, your local weather, your local weather, your local weather, your local weather, weather channel, current temp, local weather, your local weather, your local weather, your local weather, weather channel, current temp, local local weather, your local weather, your local weather, your local weather, your local weather, weather channel, current temp, local, your local weather, your local weather, your local weather,
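One rough way a robot could flag a page like this is to measure how much of the text is taken up by its single most repeated word. This is a heuristic sketch only, not a technique named in the lecture; the sample strings are made up, and where to set a cutoff is left open.

```python
import re
from collections import Counter


def top_word_share(text):
    """Return (word, share): the most frequent word and its fraction of all words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return None, 0.0
    word, count = Counter(words).most_common(1)[0]
    return word, count / len(words)


# A shortened stand-in for the stuffed page above, and an ordinary sentence.
stuffed = "your local weather, " * 20 + "weather channel, current temp"
normal = "Today will be sunny with a light breeze and a high near seventy degrees."

print(top_word_share(stuffed))
print(top_word_share(normal))
```

On the stuffed text, "weather" accounts for roughly a third of all words; in the ordinary sentence no word comes close to that.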

  25. Search Engines • Even though these robots/spiders do a poor job of analyzing information • Search engines emerge with directories completely built from information gathered automatically. • As the WWW grows, • directories become more automated to the point where there is little human care involved • search engines compete to try to index the entire WWW

  26. Search Engines • Quantity becomes more important than quality and the Search Engine is born. • (see the history of search engines) • Q: What is the difference between a search engine and a searchable directory? • A: Nothing really. • In fact, some search engines automatically generate a categorized directory from their index database. • If there is a difference… it’s the quality and correctness of the categories.

  27. Search Engines • Recall the database behind Yahoo’s directory

  28. Search Engines • Recall that robots/spiders do NOT do a good job of determining • Description • Keywords • Category • even the title • Q: So what is actually stored in the database of a search engine?

  29. Search Engines • All you can store is the raw content (i.e., the words)

  30. Search Engines • How to make a search engine: • Send robots out to collect websites • Build an index: URL → list of words • Remove stop words • Invert the index: word → list of URLs • Design some formula or methodology for ranking URLs
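The steps above can be sketched in a few lines. The example below assumes the robots have already collected the pages (here a made-up `PAGES` dictionary), uses a tiny hand-picked stop-word list, and ranks by raw word count. That ranking formula is deliberately naive and is only a stand-in for whatever formula a real engine would design.

```python
import re
from collections import defaultdict

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "at", "for", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_forward_index(pages):
    """URL -> list of words, with stop words removed."""
    return {url: [w for w in tokenize(text) if w not in STOP_WORDS]
            for url, text in pages.items()}


def invert(forward_index):
    """word -> {URL: number of occurrences on that page}."""
    inverted = defaultdict(dict)
    for url, words in forward_index.items():
        for word in words:
            inverted[word][url] = inverted[word].get(url, 0) + 1
    return inverted


def search(inverted, query):
    """Rank URLs by how often the query words occur (a naive formula)."""
    scores = defaultdict(int)
    for word in tokenize(query):
        for url, count in inverted.get(word, {}).items():
            scores[url] += count
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical pages standing in for what the robots brought back.
PAGES = {
    "http://a.example/": "The weather today is sunny. Weather for the weekend is rain.",
    "http://b.example/": "Pictures of the local weather station.",
    "http://c.example/": "Academic publications in neurosurgery.",
}
index = invert(build_forward_index(PAGES))
print(search(index, "the weather"))
# ['http://a.example/', 'http://b.example/']
```

Ranking by raw word count is exactly what the keyword-stuffed page on slide 24 exploits, which is one reason the last step, designing the ranking formula, is the hard part.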

  31. Search Engines • Despite the problems, search engines dramatically changed the WWW. • People had the notion that the WWW was itself a huge database of information that could be searched. • Most prominent Search Engines • Altavista, Lycos, Infoseek, AskJeeves, Looksmart, Hotbot, Google. • To survive ($$$) search engines became advertising venues

  32. Search Engines • The Big Problem → Too Much Information. • Even if most of the information retrieved by a search engine was relevant to what the user wanted, • users could easily get overwhelmed and give up if the first two or three hits were not appropriate. • Dealing with too much information was never really a problem in the past.

  33. Search Engines • Designing an effective search engine was an information management dilemma that had never been seen before. • The information was vastly distributed • Not uniform or consistent • Excessively redundant • Massively large

  34. Next Class • Two presentations
