
Computer Science 1000




  1. Computer Science 1000 • Information Searching I • Redistribution of these slides without permission is strictly prohibited

  2. World Wide Web – The Basics • our next topic examines how to find information on the web • we consider a few basic terms here (which you’re probably familiar with): • page/web page • link/hyperlink • site/web site • later in the semester, we will revisit web technologies in much more detail

  3. World Wide Web • a system of linked documents accessed via the internet • often simply referred to as the web • sometimes used interchangeably with the internet, but this isn’t exactly correct • the internet is the global network of interconnected devices (computers, routers, etc) that exchange data • the web refers to the documents being stored, the software that broadcasts and receives them, and the protocols used for transmission

  4. Web Page • a document stored and accessed on the web • identified by a unique URL (Uniform Resource Locator) • often referred to simply as a page • today’s web pages are very rich in content • text • images • hyperlinks • videos

  5. Web Site • a collection of related web pages on the internet • the pages typically belong to a common organization or event • example • all pages served by the University of Lethbridge make up its website

  6. Hyperlink • a part of a web page that refers to a different location • often just called a link • hyperlinks can reference: • another place on the same page • another webpage • hypertext: text containing hyperlinks

  7. The Age of Information • the computer, internet, and web have changed how we interact with information • information storage • the amount of available information is significantly greater (and growing rapidly) than even a generation ago • information transmission • large amounts of information are available with a single mouse click, and transfer almost immediately

  8. Information Age – Rapid Onset • the situation has transformed tremendously in your lifetimes • consider the global information capacity: • in 1986: 2.6 exabytes (< 1 CD per person) • in 1993: 15.8 exabytes • in 2000: 54.5 exabytes • in 2007: 295 exabytes (61 CDs per person) • how does one successfully navigate such a mountain of digital content? • source: Hilbert and López. The World’s Technological Capacity to Store, Communicate, and Compute Information. Science 332(6025), 2011
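The per-person figures on this slide can be roughly checked with a few lines of arithmetic. The sketch below is only an illustration: the CD size (730 MB) and the world population values are assumptions introduced here, not numbers taken from the slides.

```python
EXABYTE = 10**18          # bytes in one exabyte (decimal definition)
CD_BYTES = 730 * 10**6    # ASSUMPTION: one CD holds about 730 MB

# Global storage capacity in exabytes, as quoted on the slide.
capacity_exabytes = {1986: 2.6, 2007: 295}

# ASSUMPTION: rough world population for each year (not from the slides).
world_population = {1986: 4.9e9, 2007: 6.6e9}

def cds_per_person(year):
    """Convert global capacity into the number of CDs each person would hold."""
    bytes_per_person = capacity_exabytes[year] * EXABYTE / world_population[year]
    return bytes_per_person / CD_BYTES

for year in capacity_exabytes:
    print(year, round(cds_per_person(year), 1), "CDs per person")
```

Under these assumptions the 1986 figure comes out below one CD per person and the 2007 figure close to 61, in line with the slide.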

  9. Information Access • even in pre-internet days, there was a wealth of information • large-scale: library • medium-scale: Encyclopaedia set • small-scale: newspaper • strategies developed to manage information • categories • hierarchies • indices

  10. Classification • "systematic arrangement in groups or categories according to established criteria" (Merriam-Webster) • in other words, the information is categorized according to relevant features • consider our course notes: • terminology (4 sets of slides) • information searching (2-3 sets of slides) • etc ...

  11. Classification • classification is not specific to digital information • library classification: • Library of Congress Classification • Dewey Decimal Classification

  12. Classification • classification is not specific to digital information • newspaper classification

  13. Classification • classification level of detail leads to tradeoffs • consider a coarse level of detail • e.g. taxonomy of living organisms • classify organisms according to Domain (Archaea, Bacteria, Eukarya) • advantage: small number of groups • disadvantage: each group is massive

  14. Classification • classification level of detail leads to tradeoffs • consider a fine level of detail • e.g. taxonomy of living organisms • classify organisms according to Genus (Canis, Felis) • advantage: each group reasonably small • disadvantage: massive number of groups • solution: hierarchy

  15. Hierarchy • a decomposition of classifications according to detail • hierarchies contain levels • at the top (root) level, there is typically a small number of broad categories • each category is decomposed into smaller categories • a classification group is defined by its categorization at each level

  16. Hierarchy • organism taxonomy hierarchy: • each Domain is categorized into Kingdoms • example: Domain Eukarya contains the Kingdoms Protista, Animalia, Fungi, Plantae

  17. Hierarchy • organism taxonomy hierarchy: • each Kingdom is classified into Phyla • each Phylum is classified into Classes • and so on ... http://ag.arizona.edu/pubs/garden/mg/entomology/intro.html

  18. Hierarchy • an object is still categorized, but by multiple levels (instead of one) http://schoolworkhelper.net/scientific-taxonomy/

  19. Hierarchy • facilitates efficient searching through exclusion • example (text): • suppose you have a collection of a million items • these items organized into 10 equal-sized groups • each top-level group is also organized into 10 equal subgroups • choosing first category eliminates 900000 items • choosing second category eliminates 90000 items • and so on …
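The arithmetic on this slide can be sketched in a few lines. This is a minimal illustration that assumes, as the slide does, that every group splits into equal-sized subgroups; the function name and the choice of six levels are introduced here for the example.

```python
def search_by_exclusion(total_items, groups_per_level, levels):
    """Return (eliminated, remaining) after choosing one category at each level."""
    results = []
    remaining = total_items
    for _ in range(levels):
        kept = remaining // groups_per_level   # items inside the chosen category
        results.append((remaining - kept, kept))
        remaining = kept
    return results

steps = search_by_exclusion(1_000_000, 10, 6)
for level, (eliminated, remaining) in enumerate(steps, start=1):
    print(f"level {level}: eliminated {eliminated}, {remaining} remain")
```

With 10 groups per level, six choices are enough to narrow a million items down to one, which is why the number of levels grows much more slowly than the size of the collection.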

  20. Hierarchy • hierarchies are very popular • consider our previous examples: • Library of Congress Classification

  21. Hierarchy • hierarchies are very popular • consider our previous examples: • Newspaper

  22. Index • a detailed list of words, phrases, and/or topics indicating place of occurrence • in essence, it maps keywords of interest to their location • e.g. a page number • a bottom-up approach to information organization • as opposed to the top-down structure of a hierarchy • particularly popular in printed material • books, magazines, volumes, etc
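The idea of mapping keywords to their locations can be sketched as follows. The page texts are hypothetical examples made up for this illustration, and indexing every word (rather than only keywords of interest) is a simplification.

```python
from collections import defaultdict

# HYPOTHETICAL page contents, keyed by page number.
pages = {
    1: "the web is a system of linked documents",
    2: "a hyperlink refers to a different location",
    3: "a web page is identified by a unique url",
}

def build_index(pages):
    """Map each word to the sorted list of page numbers where it occurs."""
    index = defaultdict(set)
    for number, text in pages.items():
        for word in text.split():
            index[word].add(number)
    # Alphabetical ordering, as in a printed index.
    return {word: sorted(numbers) for word, numbers in sorted(index.items())}

index = build_index(pages)
print(index["web"])        # pages that mention "web"
print(index["hyperlink"])  # pages that mention "hyperlink"
```

Looking up a word goes straight to its page numbers without reading any page, which is the bottom-up behaviour the slide contrasts with a hierarchy.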

  23. Index - Example

  24. Index • typically used at a small scale • books and volumes vs. libraries • made efficient through an organizational scheme • alphabetical is very common • some overlap with hierarchies • e.g. subtopics

  25. Finding Information – The Web • as discussed, the amount of information on the web is immense • many of the discussed techniques for information finding also apply digitally • classification/hierarchies • indexing

  26. Classification • many commercial websites have a classification structure • navigation bars

  27. Hierarchies • many websites, especially large ones, will also arrange their categories in hierarchical fashion

  28. Partition • a hierarchy where every object occurs only once • organism taxonomy – every species appears only once • some hierarchies are necessarily partitions • e.g. a particular book will only occur at one point in a library classification • however, in some cases a partition is not natural • an object might have an inherent fit in more than one classification

  29. Partitions • digital content is often stored using overlapping hierarchies (non-partition) • potentially more intuitive • with hyperlinking, it’s easy to accomplish (two links to the same page) • example (text): • Three Books for Frugal Fashionistas was stored on NPR’s website under: • Home > Arts & Life > Books > Three Books for Frugal Fashionistas • Home > Listen > Latest Program > Three Books for Frugal Fashionistas
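The NPR example can be modelled as a small tree in which the same page title is listed under two categories. This is a sketch only: the nested-dictionary layout and the function name are assumptions chosen for illustration, not a description of how the site is actually stored.

```python
# Categories are dicts; the pages filed under a category are a list of titles.
site = {
    "Home": {
        "Arts & Life": {"Books": ["Three Books for Frugal Fashionistas"]},
        "Listen": {"Latest Program": ["Three Books for Frugal Fashionistas"]},
    }
}

def find_paths(node, title, trail=()):
    """Collect every category path that leads to the given page title."""
    paths = []
    if isinstance(node, dict):
        for name, child in node.items():
            paths.extend(find_paths(child, title, trail + (name,)))
    elif title in node:
        paths.append(" > ".join(trail + (title,)))
    return paths

paths = find_paths(site, "Three Books for Frugal Fashionistas")
for path in paths:
    print(path)

# In a partition every object has at most one path.
print("partition for this page:", len(paths) <= 1)
```

Finding two paths for one page is what makes this hierarchy overlapping rather than a partition.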

  30. Indexes for the Web • unlike hierarchies, indexes are much less common on individual websites • site maps might be considered an index of sorts • however, there are analogous technologies to indexes that pertain to the web as a whole • Search Engines!
