
Course Overview: An Introduction to Information Retrieval and Applications

Course Overview: An Introduction to Information Retrieval and Applications. J. H. Wang Feb. 23, 2011. Instructor & TA. Instructor J. H. Wang ( 王正豪 ) Assistant Professor, CSIE, NTUT Office: R1534, Technology Building E-mail: jhwang@csie.ntut.edu.tw Tel: ext. 4238


Presentation Transcript


  1. Course Overview: An Introduction to Information Retrieval and Applications J. H. Wang Feb. 23, 2011

  2. Instructor & TA • Instructor • J. H. Wang (王正豪) • Assistant Professor, CSIE, NTUT • Office: R1534, Technology Building • E-mail: jhwang@csie.ntut.edu.tw • Tel: ext. 4238 • Office Hours: 10:00-12:00, every Wednesday and Thursday • TA • Mr. Lin (林承翰): 2011.ir.ta@gmail.com • R1424, Technology Building NTUT CSIE

  3. Course Description • Course Web Page • http://www.ntut.edu.tw/~jhwang/IR/ • Time: 13:10-16:00, Wed. • Classroom: R327, 6th Teaching Building • Textbook: • Christopher D. Manning, Prabhakar Raghavan and Hinrich Schuetze, Introduction to Information Retrieval, Cambridge University Press, 2008. • Available online • International Student Edition, imported by Kai-Fa (開發) Publishing • Prerequisites: • Basic knowledge of data structures and algorithms, linear algebra, and probability theory • Programming experience is necessary for projects

  4. Additional References • References: • Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search, Addison-Wesley, 2011. • This is the second edition of their 1999 book Modern Information Retrieval. (華通) • Stefan Buettcher, Charles L.A. Clarke, and Gordon V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010. • Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, Addison-Wesley, 2010. (全華)

  5. More Books on IR • Gerard Salton, Automatic Information Organization and Retrieval, McGraw-Hill, 1968. • Gerard Salton and M.J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, 1983. • Two classics, but out of print. • C. J. van Rijsbergen, Information Retrieval, Butterworths, 1979. • The classic. More than 30 years old, but still worth reading. • K. Sparck Jones, P. Willett, Readings in Information Retrieval, Morgan Kaufmann, 1997. • A collection of classical IR papers. (out of print) • I.H. Witten, A. Moffat, T.C. Bell, Managing Gigabytes, Morgan Kaufmann, 1999. • The authority on index construction and compression.

  6. Grading Policy • Homework assignments and programming exercises: 40% • Mid-term exam: 25% • Term project (including the proposal): 35%

  7. Programming Exercises and Term Project • At least two programming exercises • Team-based (at most 4 persons per team) • You can either write your own code or reuse existing open source code • Topics: (to be announced…) • The term project • Either team-based system development (the same as programming exercises) • Or academic paper presentation • In that case, you must do it on your own (only 1 person), NOT as a team • A proposal is required around midterm (Apr. 2011) • Introduction, methods, experiment designs

  8. Online Submission • Submission instructions • Programs, project proposals, and project reports in electronic files must be submitted to the TA online at: • http://140.124.183.39/ir/ • Before submission: • User name: Your student ID • Please change your default password at your first login

  9. What This Course Is NOT About • This course will NOT tell you • Tips and tricks for using search engines, although power users might have better ideas on how to improve them • There are plenty of books and websites on that… • How to find books in libraries, although that is somewhat related to the basic concepts of IR • How to make money on the Web, although the currently largest search engine has done so

  10. What’s Information Retrieval

  11. On Wikipedia

  12. On GeoNet

  13. On Google Maps

  14. On Google News

  15. On Blogs

  16. Or More Related Keywords • South Island • Christchurch • Canterbury • Christchurch Cathedral • …

  17. What if We Search in Chinese

  18. And More… • 南島 (South Island) • 第二大城 (second-largest city) • 基督城 (Christchurch) • 大教堂 (cathedral) • … • And other languages…

  19. What Is Information Retrieval? • “Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.” (Salton, 1968)

  20. Goal • Information retrieval (IR): a research field that aims to search information in text and multimedia documents effectively and efficiently • In this course, we will introduce the basic text and query models in IR, retrieval evaluation, indexing and searching, and applications of IR

  21. A Big Picture

  22. [Figure: architecture of an IR system. Components: User Interface, Text Operations, Query Expansion, Indexing, Inverted Index, Retrieval, Ranking, Document Collection. Labeled flows: user need, text, logical view (doc representation), user feedback, query, inverted file, retrieved docs, ranked docs.]
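
As a rough illustration of how the indexing and retrieval components named on the preceding slide fit together, the Python sketch below builds an inverted index over a toy collection and answers a Boolean AND query against it. The documents, the whitespace `tokenize` helper, and the function names are all assumptions made for illustration; none of them come from the course material.

```python
from collections import defaultdict


def tokenize(text):
    """A deliberately simple text operation: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def boolean_and(index, query):
    """Return the IDs of documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical toy collection (not from the slides).
docs = {
    1: "Christchurch earthquake news",
    2: "Christchurch cathedral on the South Island",
    3: "South Island travel guide",
}
index = build_inverted_index(docs)
print(boolean_and(index, "christchurch island"))  # → [2]
```

A real system would add further text operations (stemming, stop-word removal) before indexing and would rank the retrieved documents instead of returning an unordered match set.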

  23. Topics • Text IR • Indexing and Searching • Query Languages and Operations • Retrieval Evaluation • Modeling • Boolean model • Vector space model • Probabilistic model • Applications for IR • Multimedia IR • Web Search • Digital Libraries
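
To give a flavor of the vector space model listed among the topics, here is a minimal Python sketch that weights terms by tf-idf and ranks documents by cosine similarity to the query. The weighting formula (raw term frequency times log(N/df)), the toy documents, and all function names are assumptions chosen for illustration; the course may use different variants.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document, plus the idf table.

    Weight = raw term frequency * log(N / document frequency).
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / df[term]) for term in df}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return indexes of documents with a positive score, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    scored = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0.0]


# Hypothetical toy collection (not from the slides).
docs = [
    "christchurch earthquake news",
    "christchurch cathedral south island",
    "south island travel guide",
]
print(rank("christchurch cathedral", docs))  # → [1, 0]
```

Unlike the Boolean model, which only reports whether a document matches, this approach orders the documents, so the one containing both query terms comes ahead of the one containing only the more common term.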

  24. Organization of the Textbook • Basics in IR (focus) • Inverted indexes for Boolean queries (Ch. 1-5) • Term weighting and vector space model (Ch. 6-7) • Evaluation in IR (Ch. 8) • Advanced Topics • Relevance feedback (Ch. 9) • XML retrieval (Ch. 10) • Probabilistic IR (Ch. 11) • Language models (Ch. 12) • Machine learning in IR • Text classification (Ch. 13-15) • Document clustering (Ch. 16-18) • Web Search • Web crawling and indexes (Ch. 19-20) • Link analysis (Ch. 21)

  25. Pointers to Other Topics • Cross-language IR • Image, video, and multimedia IR • Speech retrieval • Music retrieval • User interfaces • Parallel, distributed, and P2P IR • Digital libraries • Information science perspective • Logic-based approaches to IR • Natural language processing techniques

  26. Tentative Schedule • Before midterm • Boolean retrieval (1 wk) • Indexing (2 wks) • Vector space model and evaluation (2 wks) • Relevance feedback (1 wk) • Probabilistic IR (2 wks) • After midterm • Text classification (1 wk) • Document clustering (1 wk) • Web search (2 wks) • Advanced topics: CLIR, IE, … (2 wks) • Term Project Presentation (3 wks)

  27. Generic Resources • Wikipedia page on Information Retrieval: http://en.wikipedia.org/wiki/Information_retrieval • Information Retrieval Resources: http://www-csli.stanford.edu/~hinrich/information-retrieval.html

  28. Academic Resources • Journals • ACM TOIS: Transactions on Information Systems • JASIST: Journal of the American Society for Information Science and Technology • IP&M: Information Processing and Management • Conferences • ACM SIGIR: International ACM SIGIR Conference on Research and Development in Information Retrieval • ACM CIKM: Conference on Information and Knowledge Management • JCDL: ACM/IEEE Joint Conference on Digital Libraries • TREC: Text Retrieval Conference

  29. Thanks for Your Attention!