CMUDIR group: TDT Supervised Tracking

Yi Zhang (with Jamie Callan), Carnegie Mellon University, School of Computer Science, Language Technologies Institute

Overview
• Outsider’s experience
• First participation in TDT
• Supervised tracking task
• Combining Rocchio and logistic regression models for the bias-variance trade-off
• Performance analysis
• Work in progress
• Conclusion

First Time in TDT
• About our group: CMUDIR
  • Several years of research on the TREC adaptive filtering task
  • First time in TDT
• Supervised tracking differs from TREC adaptive filtering
  • Burstiness, topic definition, …
• Yet supervised tracking may be similar to TREC adaptive filtering
  • Comparatively easy for us to enter
• Much help from TDT participants made our first participation possible

Figure: tracking system overview (topic initialization documents; document stream; tracking system as a binary classifier with a utility function; delivered documents and their labels; Logistic_Rocchio learning on accumulated documents)
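The tracking loop in the diagram can be illustrated with a minimal Python sketch. It assumes a logistic model that outputs P(on-topic | document) and a linear utility that credits each delivered on-topic story by 2 and penalizes each delivered off-topic story by 1. These are the TREC-11 T11U weights; that TDT 2004 used the same weights is an assumption. Under that utility, delivering a story has positive expected utility only when p > 1/3. The names `score`, `track`, `oracle`, and `learn` are hypothetical, and the sparse term-weight document representation is an assumption.

```python
import math
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Doc = Dict[str, float]  # hypothetical sparse document: term -> weight
Labeled = List[Tuple[Doc, int]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Doc, bias: float, doc: Doc) -> float:
    """P(on-topic | doc) under a logistic model with weights w and a bias."""
    return sigmoid(bias + sum(w.get(t, 0.0) * x for t, x in doc.items()))


def track(
    stream: Iterable[Doc],
    w: Doc,
    bias: float,
    oracle: Callable[[Doc], int],
    learn: Optional[Callable[[Labeled], Tuple[Doc, float]]] = None,
    credit: float = 2.0,   # assumed reward per delivered on-topic story
    penalty: float = 1.0,  # assumed cost per delivered off-topic story
) -> Labeled:
    # Expected utility of delivering is credit*p - penalty*(1 - p),
    # which is positive exactly when p > penalty / (credit + penalty).
    threshold = penalty / (credit + penalty)
    delivered: Labeled = []
    for doc in stream:
        if score(w, bias, doc) > threshold:
            # Only delivered stories receive a label: a biased training sample.
            delivered.append((doc, oracle(doc)))
            if learn is not None:
                w, bias = learn(delivered)  # retrain on the accumulated docs
    return delivered
```

For instance, with weights {"quake": 3.0} and bias -1.0, a story containing only "stock" scores sigmoid(-1.0) ≈ 0.27, which is below 1/3. That story is never delivered, so the system never learns its label.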

Profile Learning: Using Bayesian Prior to Combine Classifiers (Zhang, SIGIR 2004)
• Step 1: Rocchio + threshold => w_R (document space, N dimensions)
• Step 2: map w_R into the logistic regression parameter space (N+1 dimensions), giving w_m
• Step 3: use w_m as the logistic regression prior mean
• Step 4: estimate the posterior distribution of the parameters and use w* = w_MAP
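The four profile-learning steps can be sketched in plain Python under several stated assumptions. The Rocchio profile uses only example-document centroids, with hypothetical weights beta=1.0 and gamma=0.25. Step 2 uses a hypothetical mapping that appends -threshold as a bias term and multiplies by a `scale` factor; the exact mapping from Zhang (SIGIR 2004) is not reproduced here. The prior is assumed to be Gaussian, N(w_m, sigma2 * I), and Step 4 finds w_MAP by plain gradient ascent on the log posterior. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rocchio(relevant: Sequence[Vec], nonrelevant: Sequence[Vec],
            beta: float = 1.0, gamma: float = 0.25) -> Vec:
    """Step 1: w_R = beta * centroid(relevant) - gamma * centroid(nonrelevant).

    Assumes at least one relevant (on-topic) example document.
    """
    n = len(relevant[0])

    def centroid(docs: Sequence[Vec]) -> Vec:
        if not docs:
            return [0.0] * n
        return [sum(d[j] for d in docs) / len(docs) for j in range(n)]

    r, s = centroid(relevant), centroid(nonrelevant)
    return [beta * r[j] - gamma * s[j] for j in range(n)]


def to_prior_mean(w_r: Vec, threshold: float, scale: float = 1.0) -> Vec:
    """Step 2 (hypothetical mapping, N -> N+1 dimensions): choose w_m so that
    w_m . [x, 1] = scale * (w_R . x - threshold)."""
    return [scale * v for v in w_r] + [-scale * threshold]


def map_logistic(xs: Sequence[Vec], ys: Sequence[int], w_m: Vec,
                 sigma2: float = 1.0, lr: float = 0.1,
                 iters: int = 2000) -> Vec:
    """Steps 3-4: MAP logistic regression with Gaussian prior N(w_m, sigma2*I).

    Gradient of the log posterior:
        sum_i (y_i - p_i) * x_i  -  (w - w_m) / sigma2
    """
    xs1 = [list(x) + [1.0] for x in xs]  # constant feature for the bias
    w = list(w_m)  # start the search at the prior mean
    for _ in range(iters):
        grad = [-(wj - mj) / sigma2 for wj, mj in zip(w, w_m)]
        for x, y in zip(xs1, ys):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (y - p) * xj
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w
```

The prior variance sigma2 is where the bias-variance trade-off shows up. A small sigma2 keeps w* close to the low-variance, Rocchio-derived w_m. A large sigma2 weakens the prior's pull, so the logistic regression fits the labeled documents more freely. With no labeled documents at all, the sketch simply returns w_m.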

Performance
• Efficiency
  • 2-3 hours for the TDT5 supervised tracking task (utility optimization)
  • Single 2.4 GHz Pentium 4 CPU, 512 MB RAM
• Effectiveness (run CMU8)
  • Utility: 449.17
  • Scaled utility: 0.7281
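The scaled-utility measure can be illustrated with the TREC-11 definitions. This sketch assumes TDT 2004 adopted the same constants: credit 2, penalty 1, and floor MinNU = -0.5. Per topic, T11U = 2 R+ - N+, T11NU = T11U / (2 * total on-topic stories), and T11SU = (max(T11NU, MinNU) - MinNU) / (1 - MinNU). The function name `t11su` is hypothetical, and the sketch does not reproduce the official scoring tool.

```python
def t11su(rel_delivered: int, nonrel_delivered: int, total_relevant: int,
          credit: float = 2.0, penalty: float = 1.0,
          min_nu: float = -0.5) -> float:
    """Scaled utility for one topic, following the TREC-11 (T11SU) recipe."""
    if total_relevant <= 0:
        raise ValueError("scaled utility needs at least one on-topic story")
    u = credit * rel_delivered - penalty * nonrel_delivered  # raw utility
    nu = u / (credit * total_relevant)  # normalize by the best possible utility
    return (max(nu, min_nu) - min_nu) / (1.0 - min_nu)  # floor, map to [0, 1]
```

For example, delivering 5 on-topic and 2 off-topic stories for a topic with 10 on-topic stories gives 0.6. Delivering nothing gives 1/3, and any sufficiently bad run bottoms out at 0.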

More Work on Filtering (Supervised Tracking)
Challenges: limited user supervision
What does a human do? → Our solution
• Use expert knowledge → Bayesian prior (SIGIR 04)
• Ask questions → Bayesian active learning (ICML 03)
• Use multiple forms of evidence → Graphical models (in progress)
• Unified framework → Bayesian graphical models

Summary
• Outsider’s view
  • Supervised tracking is an easy entry into TDT for people already familiar with TREC adaptive filtering
  • A TREC-style system can do well
• Low effort with good results
  • The TREC adaptive filtering task is very similar to the TDT supervised tracking task
  • We used our TREC filtering system for supervised tracking
  • Effort focused on understanding the TDT data format and converting the data into a form our system can handle
  • Very good performance when optimizing the utility measure
• Other issues
  • Biased sampling: labels are available only for delivered documents
  • Burstiness
