Course Introduction

Prof. Ray Larson & Prof. Marc Davis, UC Berkeley SIMS. Tuesday and Thursday, 10:30 am - 12:00 pm, Fall 2003. SIMS 202: Information Organization and Retrieval. Credits to Marti Hearst for some of the slides in this lecture.


Presentation Transcript


  1. Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2003 Course Introduction SIMS 202: Information Organization and Retrieval Credits to Marti Hearst for some of the slides in this lecture

  2. Today • Introductions • Course Overview • Administrivia

  3. Today • Introductions • Course Overview • Administrivia

  4. IS202 Teaching Team • TA Maria Lawrence • TA Mayjane Co • Professor Ray Larson • Professor Marc Davis

  5. Who Am I? • Professor and Associate Dean at SIMS • Here from the founding of SIMS, faculty member of the “previous school”

  6. What Do I Do? • Research • Design, development and evaluation of information retrieval systems and digital libraries • Cheshire II and III • Bibliometrics of the WWW • Geographic information retrieval (GIR) • Distributed search and retrieval • Applications of Grid computing to (large-scale) IR • Teaching • Information Retrieval • Database Management

  7. Who Am I? • Assistant Professor at SIMS (School of Information Management and Systems) • Background

  8. What Do I Do? • Create technology and applications that will enable daily media consumers to become daily media producers • Research and teaching in the theory, design, and development of digital media systems for creating and using media metadata to automate media production and reuse • Research • Director of the Garage Cinema Research group • Executive Committee member and co-founder of the Center for New Media • Teaching • Multimedia Information • Digital Media Design Studio

  9. Student Introductions • Who are you? • Name • Undergrad degree • Special areas of expertise and interest • Why are you here? • What you want to learn from the course

  10. Today • Introductions • Course Overview • Administrivia

  11. Goals of the Course • Learn about • Design, development, and use of information organization and retrieval systems • Practical and theoretical foundations of information organization and analysis • Evaluation of information access systems • Cognitive and user-centric considerations • Hands-on experience with information systems

  12. Two Main Themes • Information Organization and Design • Information Retrieval and the Search Process

  13. Information Organization and Retrieval • To organize is to (1) furnish with organs, make organic, make into living tissue, become organic; (2) form into an organic whole; give orderly structure to; frame and put into working order; make arrangements for. • Knowledge is knowing, familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known. • To retrieve is to (1) recover by investigation or effort of memory, restore to knowledge or recall to mind; regain possession of; (2) rescue from a bad state, revive, repair, set right. • Information is (1) informing, telling; thing told, knowledge, items of knowledge, news. The Oxford English Dictionary, cf. Rowley

  14. (Approximate) Course Schedule • Organization • Overview • Categorization • Knowledge Representation • Metadata Introduction • Controlled Vocabularies Introduction • Thesaurus Design and Construction • Multimedia Information Organization and Retrieval • Metadata for Media • Database Design • XML

  15. Information Properties • Information can be communicated electronically • Broadcasting • Networking • Information can be easily duplicated and shared • Problems of ownership • Problems of control Adapted from ‘Silicon Dreams’ by Robert W. Lucky

  16. Information Hierarchy • Wisdom • Knowledge • Information • Data

  17. Information Hierarchy • Data • The raw material of information • Information • Data organized and presented by someone • Knowledge • Information read, heard, or seen and understood • Wisdom • Distilled and integrated knowledge and understanding

  18. Information Where is the Life we have lost in living? Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information? -- T.S. Eliot, “The Rock” Where is the information we have lost in data?

  19. Information Life Cycle [Diagram. Stages: Authoring/Modifying, Organizing/Indexing, Storing/Retrieval, Distribution/Networking, Accessing/Filtering, Using/Creating. Other labels: Creation, Searching, Utilization, Disposition, Retention/Mining, Discard. Activity levels: Active, Semi-Active, Inactive]

  20. Authoring/Modifying • Converting data+information+knowledge to new information • Creating information from observation, thought • Editing and publication • Gatekeeping

  21. Organizing/Indexing • Collecting and integrating information • Affects data, information, and metadata • “Metadata” describes data and information • More on this later • Organizing information • Types of organization? • Indexing

  22. Storing/Retrieving • Information storage • How and where is information stored? • Retrieving information • How is information recovered from storage? • How do we find needed information? • Linked with accessing/filtering stage

  23. Distribution/Networking • Transmission of information • How is information transmitted? • Networks vs. broadcast

  24. Accessing/Filtering • Using the organization created in the O/I stage to: • Select desired (or relevant) information • Locate that information • Retrieve the information from its storage location (often via a network)
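The select, locate, and retrieve steps on slide 24 can be illustrated with a minimal sketch of an inverted index, one common organizing structure in retrieval systems. The slides do not prescribe this structure; the function names `build_index` and `search` and the sample documents are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Organizing/indexing: map each lowercase term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, documents, query):
    """Accessing/filtering: select ids matching every query term, then fetch the documents."""
    terms = query.lower().split()
    if not terms:
        return []
    # Select: keep only documents that contain all of the query terms.
    matching = set.intersection(*(index.get(term, set()) for term in terms))
    # Locate and retrieve: look each selected id up in storage.
    return [(doc_id, documents[doc_id]) for doc_id in sorted(matching)]


# Hypothetical stored documents (assumption, not course data).
docs = {
    1: "campus map and directory",
    2: "academic calendar",
    3: "campus housing directory",
}
index = build_index(docs)
print(search(index, docs, "campus directory"))
```

Requiring every term to match is only one possible selection rule; a ranking approach would order partial matches instead of excluding them.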

  25. Using/Creating • Using information • Transformation of information to knowledge • Knowledge to new data and new information

  26. Key Issues in This Course • How to describe information resources in ways so that they may be effectively used by those who need to use them • Organizing • How to find the appropriate information resources for someone’s (or your own) needs • Retrieving

  27. Key Issues [Information Life Cycle diagram. Stages: Authoring/Modifying, Organizing/Indexing, Storing/Retrieval, Distribution/Networking, Accessing/Filtering, Using/Creating. Other labels: Creation, Searching, Utilization, Disposition, Retention/Mining, Discard. Activity levels: Active, Semi-Active, Inactive]

  28. (Approximate) Course Schedule • Organization • Overview • Categorization • Knowledge Representation • Metadata Introduction • Controlled Vocabularies Introduction • Thesaurus Design and Construction • Multimedia Information Organization and Retrieval • Metadata for Media • Database Design • XML • Retrieval • Introduction to Search Process • Boolean Queries and Text Processing • Statistical Properties of Text and Vector Representation • Probabilistic Ranking and Relevance Feedback • Evaluation • Web Search Issues and Architecture • Interfaces for Information Retrieval

  29. Web Search Questions • What do people search for? • How do people use search engines? • How often do people find what they are looking for? • How difficult is it for people to find what they are looking for? • How can search engines be improved?

  30. What Do People Search for on the Web? • Study by Spink et al., Oct 98 • www.shef.ac.uk/~is/publications/infres/paper53.html • Survey on Excite, 13 questions • Data for 316 surveys

  31. What Do People Search for on the Web? • Topics • Genealogy/Public Figure: 12% • Computer related: 12% • Business: 12% • Entertainment: 8% • Medical: 8% • Politics & Government 7% • News 7% • Hobbies 6% • General info/surfing 6% • Science 6% • Travel 5% • Arts/education/shopping/images 14% • Something is missing…

  32. What Do People Search for on the Web? • 50,000 queries from Excite, 1997 • Most frequent terms: • 4660 sex • 3129 yahoo • 2191 internal site admin check from kho • 1520 chat • 1498 porn • 1315 horoscopes • 1284 pokemon • 1283 SiteScope test • 1223 hotmail • 1163 games • 1151 mp3 • 1140 weather • 1127 www.yahoo.com • 1110 maps • 1036 yahoo.com • 983 ebay • 980 recipes
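A frequency list like the one on slide 32 can be produced by tallying identical query strings in a log. The sketch below assumes the log is simply a list of strings and normalizes only case and surrounding whitespace; the function name `most_frequent_queries` and the sample log are hypothetical and are not the Excite data.

```python
from collections import Counter


def most_frequent_queries(queries, n=5):
    """Return the n most common normalized query strings with their counts."""
    normalized = (q.strip().lower() for q in queries if q.strip())
    return Counter(normalized).most_common(n)


# Hypothetical sample log (assumption, for illustration only).
sample_log = ["yahoo", "Chat", "weather", "yahoo", "chat ", "maps", "yahoo"]
print(most_frequent_queries(sample_log, n=2))  # [('yahoo', 3), ('chat', 2)]
```

Counting whole query strings rather than individual words matches entries on the slide such as multi-word queries; counting words instead would need a tokenization step.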

  33. Why Do These Differ? • Self-reporting survey • The nature of language • Only a few ways to say certain things • Many different ways to express most concepts • UFO, flying saucer, space ship, satellite • How many ways are there to talk about history?

  34. What is on the Web? • 65002930 the • 62789720 a • 60857930 to • 57248022 of • 54078359 and • 52928506 in • 50686940 s • 49986064 for • 45999001 on • 42205245 this • 41203451 is • 39779377 by • 35439894 with • 35284151 or • 34446866 at • 33528897 all • 31583607 are • 30998255 from • 30755410 e • 30080013 you • 29669506 be • 29417504 that • 28542378 not • 28162417 an • 28110383 as • 28076530 home • 27650474 it • 27572533 i • 24548796 have • 24420453 if • 24376758 new • 24171603 t • 23951805 your • 23875218 page • 22292805 about • 22265579 com • 22107392 information Source: http://elib.cs.berkeley.edu/docfreq/index.html

  35. Intranet Queries (Aug 2000) • 3351 bearfacts • 3349 telebears • 1909 extension • 1874 schedule+of+classes • 1780 bearlink • 1737 bear+facts • 1468 decal • 1443 infobears • 1227 calendar • 989 career+center • 974 campus+map • 920 academic+calendar • 840 map • 773 bookstore • 741 class+pass • 738 housing • 721 tele-bears • 716 directory • 667 schedule • 627 recipes • 602 transcripts • 582 tuition • 577 seti • 563 registrar • 550 info+bears • 543 class+schedule • 470 financial+aid

  36. Intranet Queries • Summary of sample data from 3 weeks of UCB queries • 13.2% Telebears/BearFacts/InfoBears/BearLink (12297) • 6.7% Schedule of classes or final exams (6222) • 5.4% Summer Session (5041) • 3.2% Extension (2932) • 3.1% Academic Calendar (2846) • 2.4% Directories (2202) • 1.7% Career Center (1588) • 1.7% Housing (1583) • 1.5% Map (1393) • Average query length over last 4 months: 1.8 words • This suggests what is difficult to find from the home page
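The summary figures on slide 36 come down to simple arithmetic over a query log. The sketch below is hedged in two ways: it assumes "+" separates words, as in the log entries on slide 35, and it treats the total number of queries as unknown, since the slide does not state it. The helper `implied_total` only back-computes a rough estimate from one category count and its rounded percentage. Both function names are hypothetical.

```python
def average_query_length(queries):
    """Mean number of words per non-empty query, treating '+' as a space."""
    lengths = [len(q.replace("+", " ").split()) for q in queries]
    lengths = [n for n in lengths if n > 0]
    return sum(lengths) / len(lengths) if lengths else 0.0


def implied_total(category_count, percent):
    """Rough total implied by a category count and its rounded percentage."""
    return category_count / (percent / 100.0)


# A few entries borrowed from slide 35, used here only as sample input.
sample = ["bearfacts", "schedule+of+classes", "campus+map", "housing"]
print(average_query_length(sample))  # 1.75

# Estimate only: 12297 queries reported as 13.2% of the three-week sample.
print(round(implied_total(12297, 13.2)))
```

Because the percentages on the slide are rounded to one decimal place, different categories imply slightly different totals, so the estimate should be read as approximate.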

  37. IR Issues in the Course • What metadata is collected • How the indexes are created • How queries are formed • How documents are ranked • How shortest paths are computed • How the system is built • … among other things! • This is just an introduction! Much more on these issues in the second half of the course

  38. Course Format • Most classes will be lecture/discussion sessions • Lecture ~60 minutes • Discussion ~20 minutes • For each class students will prepare discussion questions for each reading and help lead discussion • Some classes will be working sessions • Information Organization Summary and Phone Project Update • Phone Project Presentations • Final Review • Some classes will be exams • In Class Midterm Exam • Final Exam

  39. IS202 Course Project

  40. Phone Project Goals • Experience the actual process of information organization and retrieval • Especially as regards mobile media metadata creation, sharing, and (re)use • Work in small, focused teams performing a variety of tasks • Image capture, cataloging, and application design • Explore and design new applications for an emerging information organization and retrieval platform • Develop an ongoing resource for SIMS (an annotated photo database) for • Internal research and teaching • External promotional and informational purposes

  41. Phone Project Requirements • Create engaging and useful application scenarios and photos • Create a shared, reusable resource of annotated photos • All photos will be stored in one directory • Design your metadata • So that all photos would be accessible from all applications • Not only for the needs of your particular application, but also for the reusability of your photos and metadata

  42. Assignments and Exams • Approximately 12 assignments • Most due within one week to ten days • Many related to the Phone Project • Sometimes “checked”, sometimes graded • Final exam (during finals week) • Grading • Assignments: 60% • Not evenly weighted • Final: 25% • Class Participation: 15%

  43. Today • Introductions • Course Overview • Administrivia

  44. Readings • Course reader • Will be available in about a week (will announce) • Textbooks • Modern Information Retrieval, Baeza-Yates and Ribeiro-Neto (Eds.), Addison Wesley, 1999 • The Organization of Information, Arlene G. Taylor, Libraries Unlimited, 1999

  45. Homework (!) • Read the handouts • Borges, Dennett, and Reddy • Write one or two paragraphs on • What is information, according to your background or area of expertise? • Due in class this Thursday, Aug 29

  46. Next Time • More information about the Phone Project • More on what is information? • And how much of it is out there?
