
CMUDIR group: TDT Supervised Tracking

Yi Zhang (with Jamie Callan), Carnegie Mellon University, School of Computer Science, Language Technologies Institute.



Presentation Transcript


  1. CMUDIR group: TDT Supervised Tracking. Yi Zhang (with Jamie Callan), Carnegie Mellon University, School of Computer Science, Language Technologies Institute

  2. Overview
     • Outsider’s experience
     • First participation in TDT
     • Supervised tracking task
     • Combining Rocchio and logistic regression models for a bias-variance trade-off
     • Performance analysis
     • Work in progress
     • Conclusion

  3. First Time in TDT
     • About our group: CMUDIR
     • Several years of research on the TREC adaptive filtering task
     • First time in TDT
     • Supervised tracking is different from TREC adaptive filtering
     • Burstiness, topic definition…
     • Supervised tracking may be similar to TREC adaptive filtering
     • Comparatively easy for us to enter
     • Much help from TDTers made our first participation possible

  4. Tracking System
     [Diagram. Labels: topic, initialization, document stream, tracking system (binary classifier, utility function), delivered docs, document labels, accumulated docs, Logistic_Rocchio learning]
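The loop named by the diagram labels on this slide (document stream, delivery decision, labels, accumulated docs, learning) can be sketched as follows. This is a minimal illustration, not the CMUDIR system: the `TrackingSystem` class, its linear scoring rule, the fixed threshold, and the centroid-difference learner standing in for Logistic_Rocchio are all assumptions made for the example. The point it shows is that a label is only observed for a delivered document.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class TrackingSystem:
    """Hypothetical tracker: a linear profile plus a delivery threshold."""
    weights: List[float]
    threshold: float
    labeled: List[Tuple[Vector, int]] = field(default_factory=list)

    def score(self, doc: Vector) -> float:
        return sum(w * x for w, x in zip(self.weights, doc))

    def process(self, doc: Vector, get_label: Callable[[Vector], int]) -> bool:
        """Decide on one document; learn only if it was delivered."""
        if self.score(doc) < self.threshold:
            return False              # not delivered, so no label is observed
        self.labeled.append((doc, get_label(doc)))
        self.relearn()
        return True

    def relearn(self) -> None:
        """Placeholder learner (assumption): positive minus negative centroid."""
        pos = [d for d, label in self.labeled if label == 1]
        neg = [d for d, label in self.labeled if label == 0]
        if not pos:
            return                    # keep the current profile
        dim = len(self.weights)

        def centroid(docs: List[Vector]) -> List[float]:
            if not docs:
                return [0.0] * dim
            return [sum(d[i] for d in docs) / len(docs) for i in range(dim)]

        p, n = centroid(pos), centroid(neg)
        self.weights = [a - b for a, b in zip(p, n)]


# Toy stream of (document vector, true label) pairs; the values are made up.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([0.8, 0.1], 1), ([0.6, 0.9], 0)]
system = TrackingSystem(weights=[1.0, 0.0], threshold=0.5)
delivered = [system.process(doc, lambda d, lab=label: lab) for doc, label in stream]
```

In this toy run the second document is never delivered, so its label never reaches the learner; the accumulated training set is a biased sample of the stream.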

  5. Profile Learning: Using Bayesian Prior to Combine Classifiers (Zhang, SIGIR 2004)
     • Step 1: Rocchio + threshold => w_R
     • Step 2: [formula not preserved in the transcript]
     • Step 3: Use w_m as the logistic regression prior mean
     • Step 4: Estimate the posterior distribution of the parameters, and use w_MAP = w*
     [Diagram labels: document space (N); logistic regression parameter space (N+1)]
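The steps on this slide can be illustrated with a small NumPy sketch. Several parts are assumptions rather than the authors' method: Step 2 is not preserved in the transcript, so the sketch assumes w_m is w_R rescaled by a scalar chosen to maximize the training likelihood; the prior is assumed to be an isotropic Gaussian centred on w_m; and the MAP estimate is found with a damped Newton iteration. All function names and the toy data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(X):
    # Append a constant feature: document space (N) -> parameter space (N+1).
    return np.hstack([X, np.ones((X.shape[0], 1))])


def rocchio_with_threshold(X, y, threshold):
    # Step 1: Rocchio direction, with the threshold folded in as the bias weight.
    # Requires at least one positive and one negative example.
    direction = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return np.append(direction, -threshold)


def log_likelihood(w, Xb, y):
    z = Xb @ w
    # log sigmoid(z) = -logaddexp(0, -z); log(1 - sigmoid(z)) = -logaddexp(0, z)
    return np.sum(-y * np.logaddexp(0, -z) - (1 - y) * np.logaddexp(0, z))


def scale_prior_mean(w_r, Xb, y, alphas=np.linspace(0.1, 10.0, 50)):
    # ASSUMED Step 2: rescale w_R so it behaves like a logistic regression model.
    best = max(alphas, key=lambda a: log_likelihood(a * w_r, Xb, y))
    return best * w_r


def log_posterior(w, Xb, y, w_m, prior_var):
    # Log-likelihood plus the log of a Gaussian prior N(w_m, prior_var * I).
    return log_likelihood(w, Xb, y) - np.sum((w - w_m) ** 2) / (2 * prior_var)


def map_estimate(Xb, y, w_m, prior_var=1.0, iters=25):
    # Steps 3-4: w_m is the prior mean; return the posterior mode w_MAP.
    w = w_m.copy()
    for _ in range(iters):
        p = sigmoid(Xb @ w)
        grad = Xb.T @ (y - p) - (w - w_m) / prior_var
        hess = (Xb * (p * (1 - p))[:, None]).T @ Xb + np.eye(len(w)) / prior_var
        step = np.linalg.solve(hess, grad)
        t = 1.0
        # Halve the step until the log posterior does not decrease.
        while t > 1e-8 and (log_posterior(w + t * step, Xb, y, w_m, prior_var)
                            < log_posterior(w, Xb, y, w_m, prior_var)):
            t *= 0.5
        if t <= 1e-8:
            break
        w = w + t * step
    return w


# Toy data (made up): 5 documents, 2 features, binary relevance labels.
X = np.array([[1.0, 0.0], [0.9, 0.2], [0.1, 1.0], [0.0, 0.8], [0.6, 0.6]])
y = np.array([1, 1, 0, 0, 0])
Xb = add_bias(X)
w_r = rocchio_with_threshold(X, y, threshold=0.1)
w_m = scale_prior_mean(w_r, Xb, y)
w_map = map_estimate(Xb, y, w_m, prior_var=1.0)
```

The prior variance controls the trade-off named in the overview: a small `prior_var` keeps the estimate close to the Rocchio-derived prior mean (more bias, less variance), while a large one lets the labeled data dominate.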

  6. Performance
     • Efficiency
     • 2-3 hours for the TDT5 supervised tracking task when optimizing for utility
     • 1 CPU (P4, 2.4 GHz), 512 MB RAM
     • Effectiveness (CMU8)
     • Utility: 449.17
     • Scaled utility: 0.7281

  7. More Work on Filtering (Supervised Tracking)
     Challenges: limited user supervision
     What does a human do? => Our solution
     • Use expert knowledge => Bayesian Prior (SIGIR 04)
     • Ask questions => Bayesian Active Learning (ICML 03)
     • Use multiple forms of evidence => Graphical Models (In Progress)
     Unified framework: Bayesian Graphical Models

  8. Summary
     • Outsider’s view
     • The supervised tracking task is an easy entry into TDT for people already familiar with TREC adaptive filtering
     • A TREC-style system can do well
     • Low effort with good results
     • The TREC adaptive filtering task is very similar to the TDT supervised tracking task
     • Used the TREC filtering task system for supervised tracking
     • Effort focused on understanding the TDT data format and converting the data into a form our system can handle
     • Very good performance when optimizing for the utility measure
     • Other issues
     • Biased sampling problem
     • Labels are available only for delivered documents
     • Burstiness
