Web IR/NLP Group (WING) @ NUS

Presentation Transcript


  1. Web IR/NLP Group (WING) @ NUS Min-Yen Kan School of Computing National University of Singapore http://wing.comp.nus.edu.sg/

  2. Web IR/NLP Group @ NUS • PI: Min-Yen KAN (NLP and IR/DL) • Postdoc: Su Nam KIM (Multiword Expressions) • PhDs: Hendra SETIAWAN (Stat MT), Long QIU (Scenario Templates), Yee Fan TAN (Web Record Linkage), Jin ZHAO (Math IR), Jesse PRABAWA (UI/HCI for DLs), Ziheng LIN (Summarization) • Support staff (undergraduate): system administrators, system programmers • Undergraduate projects: 4 this year (ask me about topics) • One of many groups doing this type of research at NUS • Will go over NLP, then DL, today MSRA Web-Scale NLP Workshop (Daedeok, Korea)

  3. Information Extraction • Keyphrase Extraction • Idea: Use section information as evidence (ICADL 07) • Scenario Template Generation (Long Qiu) • Aim: to generate database rows from similar news events • Example text: "Charley landed further south on the Gulf Coast than predicted, … The hurricane … was weakened and is moving over South Carolina"; "At least 21 missing after the storm hit…"; "But Tokage had weakened by the time it passed over Tokyo, where it had left little damage before moving out to sea." • Model context and cluster to convergence using EM (EMNLP 06)

  4. Using less data • URL Classification (WWW 04) • Example URLs: http://www.usatoday.com/stories/080502/ent/hilton.html http://www.cancersupportgroup.org/forum/230.html • Classifies thousands of URLs per minute, with about two-thirds of the accuracy of full-text classification • Useful for focused crawling and web mining applications
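To make the URL classification idea on the slide above concrete, here is a minimal sketch that labels a page using only the tokens in its URL. It is not the WWW 04 system: it swaps in a plain multinomial naive Bayes classifier, and the training URLs, the labels ("news", "forum"), the stop list, and the names `url_tokens` and `UrlNaiveBayes` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical stop list of tokens that carry little topical signal.
STOP = {"www", "com", "org", "net", "html", "htm"}


def url_tokens(url):
    """Split the host and path of a URL into lowercase alphabetic tokens."""
    parsed = urlparse(url)
    text = (parsed.netloc + " " + parsed.path).lower()
    # Digits and punctuation act as separators; stop tokens are dropped.
    return [t for t in re.split(r"[^a-z]+", text) if t and t not in STOP]


class UrlNaiveBayes:
    """Multinomial naive Bayes over URL tokens with add-one smoothing."""

    def __init__(self):
        self.class_docs = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, labelled_urls):
        for url, label in labelled_urls:
            self.class_docs[label] += 1
            for tok in url_tokens(url):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, url):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_docs):
            score = math.log(self.class_docs[label] / total_docs)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in url_tokens(url):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data (not from the presentation).
TRAIN = [
    ("http://news.example.com/stories/ent/movie.html", "news"),
    ("http://www.example.org/stories/2004/sports.html", "news"),
    ("http://www.example.net/forum/thread/12.html", "forum"),
    ("http://health.example.org/forum/support/7.html", "forum"),
]

model = UrlNaiveBayes().fit(TRAIN)
print(model.predict("http://www.usatoday.com/stories/080502/ent/hilton.html"))
print(model.predict("http://www.cancersupportgroup.org/forum/230.html"))
```

Because no page is fetched, the cost per URL is a handful of string operations, which is consistent with the slide's emphasis on throughput for focused crawling.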

  5. Question-Answering (Hang Cui) • Our Approaches to QA • Use of external resources from Web & WordNet (SIGIR 04) • Employ dependency & SRL for answer extraction (SIGIR 05, 06) • Soft pattern analysis of definitional patterns (WWW 05) • Explore temporal relationships and events • Extend techniques to precise passage retrieval • Came 2nd (in 2003, 2004 & 2005) in TREC QA Task • Licensed technology to a company in legal search • Current focus • Relation-based IE & QA: continue focus on linguistic knowledge • Ontology-based Interactive QA: leverage domain knowledge • Searching for answers and mining terminology from the Web

  6. Summarization (Ziheng Lin) • Document Concept Lattice Model (IPM 07) • Aim to find a list of sentences that results in minimal information loss • Extract key concept terms, and build concept lattice • Perform sentence extraction that covers the maximum number of concept terms • Participated in DUC, came in 1st (2005) and 2nd (2006) • Pioneered iterative construction model for graph-based summarization (DUC 07) • [Figure: documents doc1 to doc3 and extracted sentences s1 to s3]
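The sentence extraction step on the summarization slide above (pick sentences that cover as many concept terms as possible) can be illustrated with a small sketch. This is not the Document Concept Lattice model: it assumes the concept terms are already given as a flat set and uses plain greedy coverage. The function name `greedy_extract` and the toy sentences are hypothetical.

```python
import re


def greedy_extract(sentences, concepts, max_sentences):
    """Greedily pick sentences that add the most not-yet-covered concept terms.

    sentences: list of strings; concepts: set of lowercase terms.
    Returns (selected sentences in document order, set of covered concepts).
    """
    # Concept terms mentioned by each sentence.
    sent_concepts = [concepts & set(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    covered, chosen = set(), []
    while len(chosen) < max_sentences:
        best_i, best_gain = None, 0
        for i, cs in enumerate(sent_concepts):
            if i in chosen:
                continue
            gain = len(cs - covered)
            if gain > best_gain:  # ties keep the earlier sentence
                best_i, best_gain = i, gain
        if best_i is None:  # no remaining sentence adds a new concept
            break
        chosen.append(best_i)
        covered |= sent_concepts[best_i]
    return [sentences[i] for i in sorted(chosen)], covered


# Toy input, made up for illustration.
sentences = [
    "The hurricane weakened over land.",
    "The storm left little damage.",
    "The hurricane caused damage and flooding.",
    "Officials met on Tuesday.",
]
concepts = {"hurricane", "weakened", "storm", "damage", "flooding"}

summary, covered = greedy_extract(sentences, concepts, max_sentences=2)
print(summary)
print(sorted(covered))
```

Greedy selection is only an approximation of maximum coverage; it is used here because it is short and makes the trade-off between summary length and covered concepts easy to see.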

  7. Statistical Machine Translation (Hendra Setiawan) • Function Word Based Reordering (ACL 07) • Example source sentence: 表单是网页上的数据输域的集合 • Gloss in source order: a form is a page on data entry fields of a coll. • Reordered step by step: 上网页 (on a page) • 数据输域的上网页 (data entry fields on a page) • 集合的数据输域的上网页 (of a coll. data entry fields on a page) • 表单是集合的数据输域的上网页 (a page is a coll. of data entry fields on a page)

  8. Commercial record linkage (Yee Fan Tan) • Addresses • Dongwon Lee, 110 E. Foster Ave. #410, State College, PA, 16802 • LEE Dong, 110 East Foster Avenue Apartment 410, Univ. Park, PA 16802-2343 • Products • Honda Fit vs. Honda Jazz • Apple iPod Nano 4GB vs. 4GB iPod nano 4GB • Idea: use web as additional context for disambiguation and clustering (JCDL 06, WIDM 07) • Placed 3rd in Web People Search Task (WEPS 2007)
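To show why the two address records on the slide above are hard to link from their strings alone, the sketch below computes a token-overlap (Jaccard) score with and without abbreviation expansion. This is a string-only baseline, not the web-context method of the JCDL 06 and WIDM 07 work; the abbreviation table and the names `record_tokens` and `jaccard` are assumptions.

```python
import re

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"e": "east", "ave": "avenue", "univ": "university"}


def record_tokens(record, expand=False):
    """Lowercase a record and split it into a set of alphanumeric tokens."""
    toks = re.findall(r"[a-z0-9]+", record.lower())
    if expand:
        toks = [ABBREVIATIONS.get(t, t) for t in toks]
    return set(toks)


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


rec_a = "Dongwon Lee, 110 E. Foster Ave. #410, State College, PA, 16802"
rec_b = "LEE Dong, 110 East Foster Avenue Apartment 410, Univ. Park, PA 16802-2343"

raw_score = jaccard(record_tokens(rec_a), record_tokens(rec_b))
expanded_score = jaccard(record_tokens(rec_a, expand=True), record_tokens(rec_b, expand=True))
print(round(raw_score, 3), round(expanded_score, 3))
```

Even after expansion, the name variants ("Dongwon" vs. "Dong") and the differing city names keep the overlap modest, which is the kind of gap that extra context from the web is meant to close.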

  9. Multi(ple) Extensions • Multimodal Alignment • Lyrics with Audio (ACM MM 04) • Slides with Paper (JCDL 07) • Current and future work: • Extracted Terminology with User Tagging • [Figure panels: Text in Focus, Slide in Focus]

  10. Focusing on the User • Understanding user searches better • Known item search (JCDL 2005) • Faceted classification of web queries (WebQ 2007) • Building better user interfaces (Jesse Prabawa) • Revisiting library catalog interfaces to better support searching (JCDL 2007)

  11. Putting it all together • We're building a niche academic research repository • e.g., MS Libra, CiteSeer, DBLP, Google Scholar • What? Another one? What's the catch? • User interaction and community involvement are central • Overcome faults of imperfect machine learning • Platform for researching how web-scale NLP actively involves user feedback and mechanisms for channeling this • What about Web NLP / IR? • My group emphasizes practical outcomes and deliverables • Find research within industry and practical problems • Multilingual, multimedia, web-as-data angles likely to continue

  12. Other pointers (NUS-wide) • Text Processing Seminar (with archived slides) http://wing.comp.nus.edu.sg/chimetext • Machine Learning (Graphical Models) Reading Group http://groups.google.com/group/mlnus/ • NLP Reading Group http://wing.comp.nus.edu.sg/NLPReading/index.php/Main_Page <AD> Shameless plug for my group: http://wing.comp.nus.edu.sg </AD> Thanks for listening!
