
Searching the Internet


Presentation Transcript


  1. Searching the Internet CSCI-N 100 Department of Computer and Information Science

  2. Searching the Internet • What is the Internet? • Does anyone own the Internet? • How is the Internet controlled?

  3. The Internet… • It is not a centrally owned or organized institution. • It is not a single entity. • It is not a 'Den of Iniquity'. • It is not crawling with eight-year-old children controlling nuclear bombs. • It is not a hive of viruses waiting to attack your computer. • It is not just for pimple-faced teenagers with propeller beanies.

  4. The Internet… • Is a vast repository of information. • Is relatively universal • Is dynamic – changing minute-by-minute

  5. The Internet • InterNIC (Internet Network Information Center) - An international coalition of Internet organizations that has what control there is of the Internet • IAB (Internet Architecture Board) - An organization that sets standards for the Internet • ICANN (Internet Corporation for Assigned Names and Numbers) - An organization responsible for the global coordination of the Internet's system of unique identifiers • W3C (World Wide Web Consortium) - An organization that develops interoperable technologies, specifications, guidelines, software, and tools

  6. Search engines • Search Engines • an information retrieval system • allows one to ask for content meeting specific criteria • list is often sorted with respect to some measure of relevance of the results • use regularly updated indexes to operate quickly and efficiently
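
The index and relevance ideas on this slide can be sketched in a few lines of Python. This is only an illustration, not how any real search engine works: the sample `documents`, the function names, and the "count the matching words" relevance score are all assumptions made for the example.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, documents, query):
    """Return ids of documents containing any query word, best match first."""
    scores = defaultdict(int)
    for word in query.lower().split():
        # The index tells us which documents to look at without scanning them all.
        for doc_id in index.get(word, set()):
            # Assumed relevance measure: how often the word occurs in the document.
            scores[doc_id] += documents[doc_id].lower().split().count(word)
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical sample collection.
documents = {
    "a": "peanut butter and peanut brittle",
    "b": "butter cookies",
    "c": "garden peas",
}
index = build_index(documents)
results = search(index, documents, "peanut butter")
```

The index is built once and reused for every query, which is why a search engine with a regularly updated index can answer quickly.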

  7. Search engines • First search engines • Archie - "archive" without the "v" • created in 1990 by a student in Montreal • program downloaded the directory listings of all the files located on public anonymous FTP (File Transfer Protocol) sites • creating a searchable database of filenames • could not search by file contents
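
A filename-only search of this kind might look roughly like the sketch below. It is not Archie's actual code: the host names, file names, and function names are made up for illustration, and the listings are hard-coded rather than downloaded over FTP.

```python
# Hypothetical directory listings, as if gathered from anonymous FTP sites.
listings = {
    "ftp.example.org": ["readme.txt", "kermit.tar", "gopher-client.zip"],
    "ftp.example.net": ["gopher-server.tar", "notes.txt"],
}

def build_filename_db(listings):
    """Flatten the listings into (host, filename) records."""
    return [(host, name) for host, names in listings.items() for name in names]

def find_files(db, pattern):
    """Return records whose filename contains the pattern (case-insensitive)."""
    return [(host, name) for host, name in db if pattern.lower() in name.lower()]

db = build_filename_db(listings)
hits = find_files(db, "gopher")
```

Because the database stores only names, a search for a word that appears inside `notes.txt` but not in its filename would find nothing.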

  8. Search engines • Gopher • indexed plain text documents • created in 1991 at the University of Minnesota: Gopher was named after the school's mascot • most of the Gopher sites became websites after the creation of the World Wide Web because these were text files

  9. Search engines • Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) • provided a keyword search of most Gopher menu titles in the entire Gopher listings • Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display) • a tool for obtaining menu information from various Gopher servers

  10. And the answer is … • People have trouble with • How to ask • What to ask • Where to ask • When to ask

  11. How to ask • Search criteria • Build a query • Date • File name • Location • Keyword • Domain • Country

  12. How to ask • Boolean operators • AND, + (plus) • Finds documents containing all of the specified words or phrases • Peanut AND butter finds documents with both the word peanut and the word butter. • OR • Finds documents containing at least one of the specified words or phrases • Peanut OR butter finds documents containing either peanut or butter. The found documents could contain both words, but not necessarily. • NOT, - (minus) • Excludes documents containing the specified word or phrase • Peanut NOT butter finds documents that contain peanut but do not contain butter. • Wild card (*) • Finds documents containing words that start with the given letters; * fills in the rest • Pea* returns all pages containing words that begin with pea, such as peas, peanut, and peacock (Be careful!!)
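
The Boolean operators above behave like set operations on the documents that contain each word. A minimal sketch, assuming a tiny made-up collection and hypothetical helper names:

```python
# Hypothetical sample collection: document id -> text.
documents = {
    1: "peanut butter sandwich",
    2: "roasted peanut snack",
    3: "butter cookies",
    4: "split pea soup",
    5: "peacock feathers",
}

def docs_with(word):
    """Ids of documents containing the exact word."""
    return {doc_id for doc_id, text in documents.items() if word in text.split()}

def docs_with_prefix(prefix):
    """Ids of documents containing any word that starts with the prefix."""
    return {doc_id for doc_id, text in documents.items()
            if any(w.startswith(prefix) for w in text.split())}

peanut = docs_with("peanut")
butter = docs_with("butter")

and_result = peanut & butter                # peanut AND butter: intersection
or_result = peanut | butter                 # peanut OR butter: union
not_result = peanut - butter                # peanut NOT butter: difference
wildcard_result = docs_with_prefix("pea")   # pea*: any word starting with "pea"
```

The wildcard result shows why the slide says to be careful: pea* matches the pea soup page, but also the peanut and peacock pages.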

  13. What to ask • All of these words • Documents must contain all of the words you list • This exact phrase • Documents must contain these exact words in the order you typed them • Any of these words • Documents must contain at least one of the words you list • None of these words • Documents that contain these words will be omitted from your results
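
The four boxes of a typical advanced-search form can be combined into one test per document. The sketch below is an assumption about how such a filter could work, not any search engine's real implementation; the function name and parameters are hypothetical, and the exact-phrase check is a simple substring match.

```python
def matches(text, all_words=(), exact_phrase="", any_words=(), none_words=()):
    """Return True if the text satisfies every filled-in search box."""
    lowered = text.lower()
    words = set(lowered.split())
    # All of these words: every listed word must be present.
    if not all(w.lower() in words for w in all_words):
        return False
    # This exact phrase: the words must appear together, in order.
    # (Simplification: a plain substring test, so it ignores word boundaries.)
    if exact_phrase and exact_phrase.lower() not in lowered:
        return False
    # Any of these words: at least one listed word must be present.
    if any_words and not any(w.lower() in words for w in any_words):
        return False
    # None of these words: no listed word may be present.
    if any(w.lower() in words for w in none_words):
        return False
    return True
```

For example, `matches("creamy peanut butter", exact_phrase="peanut butter")` is true, while `matches("butter and peanut", exact_phrase="peanut butter")` is false because the words are in the wrong order.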

  14. Where to ask • Search engines • Do not really search the World Wide Web directly • Search a database of the full text of web pages selected from the billions of web pages residing on servers • Search engine databases are selected and built by computer robot programs called "spiders" • After spiders find pages, they pass them on to another computer program for "indexing."
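
The spider-then-indexer pipeline can be sketched with a pretend web held in memory. Everything here is an assumption for illustration: the `WEB` dictionary stands in for real servers, and the page names, text, and function names are invented. A real spider would fetch pages over the network.

```python
from collections import deque

# Hypothetical miniature "web": page name -> its text and outgoing links.
WEB = {
    "home": {"text": "welcome to the peanut site", "links": ["recipes", "about"]},
    "recipes": {"text": "peanut butter recipes", "links": ["home"]},
    "about": {"text": "about this site", "links": []},
    "orphan": {"text": "nobody links here", "links": []},
}

def crawl(start):
    """Spider: follow links breadth-first and return pages in visit order."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in WEB[url]["links"]:
            if link not in seen:   # never visit the same page twice
                seen.add(link)
                queue.append(link)
    return visited

def index_pages(urls):
    """Indexer: map each word to the set of crawled pages containing it."""
    index = {}
    for url in urls:
        for word in WEB[url]["text"].split():
            index.setdefault(word, set()).add(url)
    return index

pages = crawl("home")
index = index_pages(pages)
```

The page named `orphan` is never found because no crawled page links to it, so its words never reach the index. A query is answered from `index`, not from the web itself.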

  15. Types of Search Tools • Search engines • built by computer robot programs ("spiders") -- not by human selection • NOT organized by subject categories -- all pages are ranked by a computer algorithm • contain full-text (every word) of the web pages they link to -- you find pages by matching words in the pages you want • huge and often retrieve a lot of information -- for complex searches use ones that allow you to search within results • Unevaluated -- contain the good, the bad, and the ugly -- YOU must evaluate everything you find • Google, Yahoo, Ask.com

  16. Types of Search Tools • Subject directories • built by human selection -- not by computers or robot programs • organized into subject categories, classification of pages by subjects -- subjects not standardized and vary according to the scope of each directory • NEVER contain full-text of the web pages they link to -- you can only search what you can see (titles, descriptions, subject categories, etc.) -- use broad or general terms • small and specialized to large, but smaller than most search engines -- huge range in size • often carefully evaluated and annotated (but not always!!)

  17. Directories • Librarians Index • www.lii.org • Infomine • infomine.ucr.edu • AcademicInfo • www.academicinfo.us • About.com • www.about.com • Google Directory • directory.google.com • Yahoo! • dir.yahoo.com

  18. Types of Search Tools • Searchable database contents or the "Invisible Web" • Invisible Web is estimated to offer two to three times as many pages as the visible web • Pages in non-HTML formats (PDF, Word, Excel, Corel suite, etc.) are "translated" into HTML • Script-based pages, whose links contain a ? or other script coding, no longer cause most search engines to exclude them • Pages generated dynamically by other types of database software (e.g., Active Server Pages, Cold Fusion) can be indexed if there is a stable URL somewhere that search engine spiders can find

  19. Types of search engines • Meta-Search Engines • you submit keywords in its search box • it transmits your search simultaneously to several individual search engines and their databases of web pages • Meta-search engines do not own a database of Web pages • Examples • Dogpile.com • Clusty.com • Surfwax.com
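
A meta-search engine's job of forwarding one query and merging the answers could be sketched as follows. The two "engines" are stand-in functions returning fixed lists, and the merging rule (more engines agreeing first, then best position) is just one plausible assumption, not how the sites listed above actually rank results.

```python
def engine_a(query):
    """Hypothetical engine: returns a ranked list of results."""
    return ["site1.example", "site2.example", "site3.example"]

def engine_b(query):
    """Second hypothetical engine with its own database and ranking."""
    return ["site2.example", "site4.example"]

def meta_search(query, engines):
    """Send the query to every engine and merge the ranked lists."""
    votes = {}      # result -> number of engines that returned it
    best_rank = {}  # result -> best (lowest) position in any engine's list
    for engine in engines:
        for rank, result in enumerate(engine(query)):
            votes[result] = votes.get(result, 0) + 1
            best_rank[result] = min(best_rank.get(result, rank), rank)
    # Results found by more engines come first, then those ranked higher.
    return sorted(votes, key=lambda r: (-votes[r], best_rank[r], r))

merged = meta_search("peanut butter", [engine_a, engine_b])
```

Note that `meta_search` keeps no page database of its own; it only combines what the underlying engines return, and duplicates are listed once.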

  20. References • Module #8: Communication and Internet protocols • http://www.cs.iupui.edu/~aharris/mmcc/mod8/abip.html • Module #2: Communication and the World Wide Web • http://www.cs.iupui.edu/~aharris/mmcc/mod2/abwww.html • World Wide Web Consortium • http://www.w3.org/ • Search engine • http://en.wikipedia.org/wiki/Search_engine

  21. References • The BEST Search Engines, UC Berkeley - Teaching Library Internet Workshops • http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html • http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/FindInfo.html
