Fine-tuning Ranking Models:

Fine-tuning Ranking Models: Vitor Jan 29, 2008 Text Learning Meeting - CMU a two-step optimization approach With invaluable ideas from ….

Motivation • Rank, Rank, Rank… • Web retrieval, movie recommendation, NFL draft, etc. • Einat’s contextual search • Richard’s set expansion (SEAL) • Andy’s context sensitive spelling correction algorithm • Selecting seeds in Frank’s political blog classification algorithm • Ramnath’s thunderbird extension for • Email Leak prediction • Email Recipient suggestion

Help your brothers! • Try Cut Once!, our Thunderbird extension • Works well with Gmail accounts • It’s working reasonably well • We need feedback.

Thunderbird plug-in Leak warnings: hit x to remove recipient Suggestions: hit + to add Pause or cancel send of message Email Recipient Recommendation Timer: msg is sent after 10sec by default Classifier/rankers written in JavaScript

Email Recipient Recommendation 36 Enron users

Email Recipient Recommendation Threaded [Carvalho & Cohen, ECIR-08]

Aggregating Rankings [Aslam & Montague, 2001]; [Ogilvie & Callan, 2003]; [Macdonald & Ounis, 2006] • Many “Data Fusion” methods • 2 types: • Normalized scores: CombSUM, CombMNZ, etc. • Unnormalized scores: BordaCount, Reciprocal Rank Sum, etc. • Reciprocal Rank: • The sum of the inverse of the rank of document in each ranking.

Aggregated Ranking Results [Carvalho & Cohen, ECIR-08]

Intelligent Email Auto-completion TOCCBCC CCBCC

Can we do better? • Not using other features, but better ranking methods • Machine learning to improve ranking: Learning to rank: • Many (recent) methods: • ListNet, Perceptrons, RankSvm, RankBoost, AdaRank, Genetic Programming, Ordinal Regression, etc. • Mostly supervised • Generally small training sets • Workshop in SIGIR-07 (Einat was in the PC)

Pairwise-based Ranking Goal: induce a ranking function f(d) s.t. Rank q d1 d2 d3 d4 d5 d6 ... dT We assume a linear function f Therefore, constraints are:

Ranking with Perceptrons • Nice convergence properties and mistake bounds • bound on the number of mistakes/misranks • Fast and scalable • Many variants[Collins 2002, Gao et al 2005, Elsas et al 2008] • Voting, averaging, committee, pocket, etc. • General update rule: • Here: Averaged version of perceptron

Rank SVM [Joachims, KDD-02], [Herbrich et al, 2000] • Equivalent to maximing AUC Equivalent to:

Loss Function

Loss Functions • SVMrank • SigmoidRank Not convex

Fine-tuning Ranking Models Base ranking model Final model Base Ranker Sigmoid Rank e.g., RankSVM, Perceptron, etc. Non-convex: Minimizing a very close approximation for the number of misranks

Gradient Descent

Results in CC prediction 36 Enron users

Set Expansion (SEAL) Results [Wang & Cohen, ICDM-2007] [Listnet: Cao et al. , ICML-07]

Results in Letor

Learning Curve TOCCBCC Enron: user lokay-m

Learning Curve CCBCC Enron: user campbel-m

Regularization Parameter s=2 TREC3 TREC4 Ohsumed

Some Ideas • Instead of number of misranks, optimize other loss functions: • Mean Average Precision, MRR, etc. • Rank Term: • Some preliminary results with Sigmoid-MAP • Does it work for classification?

Thanks

Fine-tuning Ranking Models: