This report gives an overview of CMU's first participation in the TDT supervised tracking task, covering the method used and the results obtained. The system combines Rocchio and logistic regression models to achieve a bias-variance trade-off, and the report notes where supervised tracking differs from TREC adaptive filtering. The CMU8 run reached a utility of 449.17 (scaled utility 0.7281) on TDT5 in 2-3 hours on a single CPU. The main open challenge is limited user supervision; work in progress addresses it with expert knowledge and Bayesian graphical models.
CMUDIR group: TDT Supervised Tracking
Yi Zhang (with Jamie Callan)
Carnegie Mellon University, School of Computer Science, Language Technologies Institute
Overview
• Outsider’s experience
• First participation in TDT
• Supervised tracking task
• Combine Rocchio and logistic regression models for a bias-variance trade-off
• Performance analysis
• Work in progress
• Conclusion
First Time in TDT
• About our group: CMUDIR
• Several years of research on the TREC adaptive filtering task
• First time in TDT
• Supervised tracking is different from TREC adaptive filtering
• Burstiness, topic definition…
• Supervised tracking may be similar to TREC adaptive filtering
• Comparatively easy for us to enter
• Much help from TDTers made our first participation possible
[Figure: tracking system diagram. Elements: initialization; topic; document stream; tracking system (binary classifier, utility function); delivered docs; document labels; accumulated docs; Logistic_Rocchio learning.]
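The feedback loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the CMUDIR system: the `track_topic` function, the `(doc_id, features)` stream format, and the `score`, `get_label`, and `update` callbacks are all hypothetical names introduced here. It assumes the classifier emits a score that is compared with a fixed delivery threshold, and that a label becomes available only after a document is delivered.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Doc = Tuple[str, Sequence[float]]          # hypothetical (doc_id, features) format
Labeled = Tuple[Sequence[float], bool]     # (features, relevance label)


def track_topic(
    stream: Iterable[Doc],
    score: Callable[[Sequence[float]], float],
    threshold: float,
    get_label: Callable[[str], bool],
    update: Callable[[List[Labeled]], None],
) -> Tuple[List[str], List[Labeled]]:
    """Process a document stream for one topic.

    A document is delivered when its score reaches the threshold. Only
    delivered documents receive a label, and the profile is re-learned
    from the accumulated labeled documents after each delivery.
    """
    delivered: List[str] = []
    accumulated: List[Labeled] = []
    for doc_id, features in stream:
        if score(features) >= threshold:
            delivered.append(doc_id)
            accumulated.append((features, get_label(doc_id)))
            update(accumulated)  # placeholder for the learning step
    return delivered, accumulated


# Toy run: d2 is relevant but scores low, so it is never delivered or labeled.
stream = [("d1", [0.9]), ("d2", [0.1]), ("d3", [0.7])]
truth = {"d1": True, "d2": True, "d3": False}
delivered, accumulated = track_topic(
    stream,
    score=lambda x: x[0],
    threshold=0.5,
    get_label=truth.__getitem__,
    update=lambda data: None,
)
```

In the toy run the relevant document `d2` never enters the training data, which is one way to see why labels restricted to delivered documents give a biased sample.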
Profile Learning: Using Bayesian Prior to Combine Classifiers (Zhang SIGIR 2004)
• Step 1: Rocchio + threshold => wR
• Step 2:
• Step 3: Use wm as logistic regression prior mean
• Step 4: Estimate the posterior distribution of the parameters, and use wMAP = w*
[Figure labels: document space (N); logistic regression parameter space (N+1)]
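The steps above can be sketched in NumPy as follows. This is a hedged illustration under several assumptions, not the authors' implementation. The slide's Step 2 is not recoverable here, so the sketch assumes wm is the multiple of wR with the highest logistic likelihood, found by a grid search. The isotropic Gaussian prior, its variance, the Rocchio weights `beta` and `gamma`, the threshold value, the toy data, and all function names are likewise hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_likelihood(w, Xb, y):
    """Bernoulli log-likelihood of labels y under logistic weights w."""
    z = Xb @ w
    return float(np.sum(y * z - np.logaddexp(0.0, z)))


def log_posterior(w, Xb, y, prior_mean, prior_var):
    """Log-likelihood plus an isotropic Gaussian log-prior (up to a constant)."""
    penalty = np.sum((w - prior_mean) ** 2) / (2.0 * prior_var)
    return log_likelihood(w, Xb, y) - float(penalty)


def rocchio_with_threshold(X, y, threshold, beta=1.0, gamma=0.5):
    """Step 1: Rocchio direction plus a bias of -threshold (N+1 weights).

    Requires at least one positive and one negative example.
    """
    direction = beta * X[y == 1].mean(axis=0) - gamma * X[y == 0].mean(axis=0)
    return np.append(direction, -threshold)


def rescale(w_r, Xb, y, alphas=np.linspace(0.1, 10.0, 100)):
    """Assumed Step 2: pick the multiple of w_R with the highest likelihood."""
    return max((a * w_r for a in alphas), key=lambda w: log_likelihood(w, Xb, y))


def map_estimate(Xb, y, prior_mean, prior_var=1.0, lr=0.1, steps=500):
    """Steps 3-4: gradient ascent on the log-posterior from the prior mean."""
    w = prior_mean.copy()
    for _ in range(steps):
        grad = Xb.T @ (y - sigmoid(Xb @ w)) - (w - prior_mean) / prior_var
        w = w + lr * grad / len(y)
    return w


# Toy data: two features, three relevant and three non-relevant documents.
X = np.array([[1.0, 0.0], [0.9, 0.2], [0.8, 0.1],
              [0.1, 0.9], [0.0, 1.0], [0.2, 0.8]])
y = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
Xb = np.hstack([X, np.ones((len(X), 1))])  # append a constant bias feature

w_r = rocchio_with_threshold(X, y, threshold=0.25)
w_m = rescale(w_r, Xb, y)
w_map = map_estimate(Xb, y, prior_mean=w_m)
```

With little training data the prior keeps `w_map` close to the Rocchio-derived `w_m`; as labeled documents accumulate, the likelihood term dominates and the estimate moves toward the plain logistic regression solution, which is the bias-variance trade-off the overview refers to.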
Performance
• Efficiency
• 2-3 hours for the TDT5 supervised tracking task when optimizing for utility
• 1 CPU: P4 2.4 GHz, 512 MB RAM
• Effectiveness (CMU8)
• Utility: 449.17
• Scaled utility: 0.7281
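As a rough illustration of what optimizing for utility involves, the sketch below assumes a linear utility of the kind used in TREC adaptive filtering: a fixed credit per relevant document delivered and a fixed penalty per non-relevant one. The slides do not state the TDT utility definition, so the `credit` and `penalty` values and both function names are placeholders. Under that assumption, delivering a document with relevance probability p has expected utility p * credit - (1 - p) * penalty, which is positive exactly when p exceeds penalty / (credit + penalty).

```python
def linear_utility(relevant_delivered: int, nonrelevant_delivered: int,
                   credit: float, penalty: float) -> float:
    """Linear utility: credit per relevant hit, penalty per false alarm."""
    return credit * relevant_delivered - penalty * nonrelevant_delivered


def delivery_threshold(credit: float, penalty: float) -> float:
    """Relevance probability above which delivery has positive expected utility."""
    return penalty / (credit + penalty)


# Placeholder weights, not the official TDT values.
example_threshold = delivery_threshold(credit=2.0, penalty=1.0)
example_utility = linear_utility(10, 4, credit=2.0, penalty=1.0)
```

A classifier that outputs calibrated probabilities, such as logistic regression, can apply this threshold directly.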
More Work on Filtering (Supervised Tracking)
• Challenge: limited user supervision
• What does a human do? => Our solution
• Use expert knowledge => Bayesian prior (SIGIR 04)
• Ask questions => Bayesian active learning (ICML 03)
• Use multiple forms of evidence => graphical models (in progress)
• Unified framework: Bayesian graphical models
Summary
• Outsider’s view
• The supervised tracking task is an easy entry into TDT for people already familiar with TREC adaptive filtering
• A TREC-style system can do well
• Low effort with good results
• The TREC adaptive filtering task is very similar to the TDT supervised tracking task
• Used the TREC filtering task system for supervised tracking
• Effort focused on understanding the TDT data format and converting the data into what our system can handle
• Very good performance when optimizing for the utility measure
• Other issues
• Biased sampling problem: labels are available only for delivered documents
• Burstiness