This presentation outlines an approach for efficiently selecting the top-k advertisements for a user query according to machine-learned scores. By building on an inverted index architecture, the method targets strict latency constraints while handling large item sets. The proposed solution learns the index with binary classification or L2 regression, enabling fast retrieval of relevant items. Experiments on simple and complex synthetic models and on a CTR model support the effectiveness of the approach.
Fast Top-k Retrieval for Model Based Recommendation Deepak Agarwal (Yahoo! Research) Maxim Gurevich (Google) Presented by Guang LING
Outline • Motivation • Problem definition • The approach • Binary classification • L2-regression of scores • Experiments • Conclusion
Motivation • Suppose that we • Are a search engine company (Google, say) • Want to display ads given a query • Have an ML score for each ad given a query • Given a query • How do we select the top-k ads to display • In a very short amount of time
Motivation • [Diagram: a request and user profile matched against content items (pages, news, ads)] • Challenges: • Many users/requests • Many content items • Strict latency constraints • Increasingly complex matching logic
Traditional IR solutions • Exploit a content-overlap matching function (tf-idf/cosine similarity) • Queries and documents “live” in the same high-dimensional space • Allows effectively reducing the query result space • Highly optimized inverted index architecture • Joins inverted lists of query terms • Returns a shortlist of result candidates • Few candidates undergo complex re-ranking
Inverted index architecture • Bag of words representation of documents • Now given a query “canon camera”
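To make the inverted index idea concrete, here is a minimal sketch in Python. The three toy documents and the helper names `build_inverted_index` and `conjunctive_query` are illustrative assumptions, not part of the talk, and "joining" the posting lists is interpreted here as a simple intersection (AND query).

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # bag-of-words tokenization
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def conjunctive_query(index, query):
    """Intersect the posting lists of all query terms."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Hypothetical toy collection.
docs = {
    1: "canon digital camera",
    2: "nikon camera lens",
    3: "canon printer ink",
}
index = build_inverted_index(docs)
result = conjunctive_query(index, "canon camera")
print(result)
```

Only the posting lists for "canon" and "camera" are read, so the cost depends on the length of those lists rather than on the size of the whole collection.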
Index based pre-filtering • [Figure: two pipelines. Brute force (expensive): the ML model scores every pair (query, ad1) … (query, adn), producing (ad1, score1) … (adn, scoren), and the top-2 ads are selected. Index-based: the query is sent to the inverted index, which returns candidates with approximate scores (ad1, score’1) … (adK, score’K); the ML model rescores only these candidates and the top-2 ads are selected.]
Problem definition • Terminology: queries and documents • scr(d,q) – the (black-box) ML score of d on q • Goal: given q, find k items from D with highest scr(d,q) • Reduce to an inverted index query • Leverage extensive work on efficient inverted indexing • Challenges • How to construct the index • How to query it
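As a reference point for the goal stated above, the exact answer can always be computed by brute force. The sketch below treats `scr` as an opaque callable; `toy_scr` is a hypothetical stand-in for the ML model, used only so the example runs.

```python
import heapq

def brute_force_top_k(scr, docs, q, k):
    """Score every document with the black-box model and keep the k best."""
    return heapq.nlargest(k, docs, key=lambda d: scr(d, q))

# Hypothetical black-box score: documents and queries are plain integers here.
def toy_scr(d, q):
    return -abs(d - q)

top = brute_force_top_k(toy_scr, range(10), q=4, k=3)
print(top)
```

This costs one model evaluation per document per query, which is exactly the cost the index-based reduction is meant to avoid.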
Prior work • Learning to rank • A different problem: second-stage reranking of few documents retrieved by the first stage • We are building the first stage given the second stage • S. Goel, J. Langford, and A. Strehl. Predictive indexing for fast search [NIPS08] • A heuristic for building the index given an ML function and a query log • Fast and simple index building and retrieval • Not the standard dot product scoring • Does not support the standard docId sorted indices - harder to integrate into existing systems • Lower accuracy
The approach • Let ascr(d,q) be an approximate score from the class of functions amenable to indexing: the vector dot product, ascr(d,q) = q’d • q is the original (sparse) query vector • d is not directly known • For each document: find d such that q’d ≈ scr(d,q) • Index the d-s • Given q, query the index and retrieve the top-K candidates according to ascr • Compute the true ML scores of the candidates and return the top-k
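The two-stage flow described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: sparse vectors are plain dicts, the index is scanned term-at-a-time, and the document vectors and `true_scores` table are made-up values standing in for learned vectors and the ML model.

```python
import heapq
from collections import defaultdict

def build_weighted_index(doc_vectors):
    """Posting lists: term -> list of (doc_id, weight) for nonzero weights."""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        for term, w in vec.items():
            if w != 0.0:
                index[term].append((doc_id, w))
    return index

def retrieve_candidates(index, q, K):
    """Top-K documents by the dot product q'd, reading only the query's posting lists."""
    acc = defaultdict(float)
    for term, q_w in q.items():
        for doc_id, d_w in index.get(term, ()):
            acc[doc_id] += q_w * d_w
    return heapq.nlargest(K, acc, key=acc.get)

def top_k(index, scr, q, K, k):
    """Stage 1: K candidates by approximate score. Stage 2: rerank by the true score."""
    candidates = retrieve_candidates(index, q, K)
    return heapq.nlargest(k, candidates, key=lambda d: scr(d, q))

# Hypothetical learned document vectors and a hypothetical black-box score table.
doc_vectors = {
    "ad1": {"canon": 0.9, "camera": 0.5},
    "ad2": {"camera": 0.8},
    "ad3": {"printer": 0.7},
}
true_scores = {"ad1": 0.2, "ad2": 0.6, "ad3": 0.9}
index = build_weighted_index(doc_vectors)
q = {"canon": 1.0, "camera": 1.0}
result = top_k(index, lambda d, q: true_scores[d], q, K=2, k=1)
print(result)
```

In this toy data "ad3" has the highest true score but shares no term with the query, so it never becomes a candidate. That is the kind of score loss the index construction tries to minimize.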
Constructing the index: an optimization problem • Objective: find D={d1, d2,…, dn} minimizing score loss on a representative query load Q • Sparsification • The d-s are high dimensional • Dense d-s will result in prohibitive index size • Add an index size constraint (L0, i.e., a bound on the number of nonzero entries in the d-s)
Relaxing the problem • Do not know how to optimize directly • Relax the L0 index size constraint to L1 • Relax the objective function • Binary classification of being in top-k • L2 regression of ML scores
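The formulas for these slides did not survive in this transcript. One plausible way to write the problem and its relaxation, consistent with the bullets above, is sketched below. The budget C, the penalty weight \lambda, the surrogate loss \ell and the choice of a penalty rather than a hard L1 constraint are all assumptions made for illustration.

```latex
% Original problem: minimize score loss subject to an L0 index size budget
\min_{d_1,\dots,d_n} \ \mathrm{ScoreLoss}(d_1,\dots,d_n;\, Q)
\quad \text{s.t.} \quad \sum_{i=1}^{n} \lVert d_i \rVert_0 \le C

% Relaxed problem, solved independently for each document d_i:
% a surrogate loss \ell (classification or squared error) plus an L1 term
\min_{d_i} \ \sum_{q \in Q} \ell\bigl(q^{\top} d_i,\ \mathrm{scr}(d_i, q)\bigr)
\;+\; \lambda \lVert d_i \rVert_1
```

The L1 term keeps each vector sparse, which is what bounds the number of postings in the index.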
Binary classification • For each document d • Learn a vector d that predicts whether d is among the top-k on q ∈ Q • Predict with a simple thresholding operator: q’d exceeds a threshold • Let y(q,d) be an indicator in {-1, +1} of whether d is among the top-k on q • Efficiently solvable [Liblinear]
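The training labels and the thresholded prediction can be sketched as below. This only builds y(q,d) and applies the threshold; it does not train the classifier (the talk uses Liblinear for that). The integer documents, `toy_scr` and the threshold value are hypothetical.

```python
import heapq

def top_k_labels(scr, docs, queries, k):
    """labels[(q_id, d)] = +1 if d is among the top-k for query q, else -1."""
    labels = {}
    for q_id, q in enumerate(queries):
        top = set(heapq.nlargest(k, docs, key=lambda d: scr(d, q)))
        for d in docs:
            labels[(q_id, d)] = 1 if d in top else -1
    return labels

def predict_in_top_k(q, d_vec, threshold):
    """Thresholded dot product over sparse dict vectors."""
    score = sum(w * d_vec.get(term, 0.0) for term, w in q.items())
    return score > threshold

# Hypothetical black-box score on integer documents and queries.
def toy_scr(d, q):
    return -abs(d - q)

labels = top_k_labels(toy_scr, [0, 1, 2, 3], queries=[0, 3], k=2)
print(labels[(0, 0)], labels[(0, 3)])
print(predict_in_top_k({"a": 1.0}, {"a": 0.5}, threshold=0.2))
```

Each document gets its own column of labels, one per query in the load, so the classifiers can be trained independently.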
L2 regression of scores • For all pairs (q,d): minimize the discrepancy between true and approximate scores • Again, decomposable by documents • Efficiently solvable by a coordinate descent algorithm
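A coordinate descent solver for one document can be sketched as follows. It assumes the per-document objective is squared error plus an L1 penalty, 0.5 * sum_q (q'd - scr(d,q))^2 + lam * ||d||_1, which is one common way to combine L2 regression with the L1 relaxation; the exact objective and solver used in the talk may differ. The function names and the toy data are illustrative.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def fit_document_vector(queries, scores, lam, n_iters=50):
    """Coordinate descent for 0.5 * sum_q (q'd - s_q)^2 + lam * ||d||_1.

    queries: list of sparse dicts (term -> weight).
    scores:  true score of this document on each query.
    """
    # term -> list of (query index, query weight)
    postings = {}
    for i, q in enumerate(queries):
        for term, w in q.items():
            postings.setdefault(term, []).append((i, w))
    d = {}
    residual = [-s for s in scores]  # residual[i] = q_i'd - s_i, with d = 0
    for _ in range(n_iters):
        for term, plist in postings.items():
            norm = sum(w * w for _, w in plist)
            if norm == 0.0:
                continue
            old = d.get(term, 0.0)
            # Correlation of this term with the residual that excludes its own contribution.
            rho = sum(w * (old * w - residual[i]) for i, w in plist)
            new = soft_threshold(rho, lam) / norm
            if new != old:
                for i, w in plist:
                    residual[i] += (new - old) * w
                if new == 0.0:
                    d.pop(term, None)  # keep the vector sparse
                else:
                    d[term] = new
    return d

# Hypothetical query load for a single document.
d = fit_document_vector([{"a": 1.0}, {"b": 1.0}], [2.0, 0.3], lam=0.5)
print(d)
```

With these inputs the weight for "b" is shrunk to zero and dropped, which shows how the L1 term trades accuracy for a smaller index.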
Practical issues • Vectors d contain negative values • Less efficient retrieval • Independent solution for each document • Easy to parallelize • Easy to add new documents
Experiments • Experiment setup • Synthetic model – simple • 10K documents, 10K terms (words), 12K queries • For each term, generate a random permutation of the 10K documents and assign weight (1-1/100)^i to the term for the document at position i • Queries are 3 terms long, with terms drawn from a power-law distribution • The final score is the sum of the individual scores for each term
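The simple synthetic model can be generated along these lines. Several details are assumptions not stated on the slide: positions are counted from 0, query terms are distinct, and the power law is over term rank with exponent 1. The sizes in the usage lines are kept small so the sketch runs quickly in pure Python.

```python
import random

def make_simple_model(n_docs, n_terms, decay=1 - 1 / 100, seed=0):
    """weights[t][d] = decay ** (position of document d in term t's random permutation)."""
    rng = random.Random(seed)
    weights = []
    for _ in range(n_terms):
        perm = list(range(n_docs))
        rng.shuffle(perm)
        w = [0.0] * n_docs
        for position, doc in enumerate(perm):
            w[doc] = decay ** position
        weights.append(w)
    return weights

def sample_query(rng, n_terms, length=3, exponent=1.0):
    """Draw `length` distinct terms with probability proportional to rank ** -exponent."""
    probs = [1.0 / (rank + 1) ** exponent for rank in range(n_terms)]
    terms = set()
    while len(terms) < length:
        terms.add(rng.choices(range(n_terms), weights=probs, k=1)[0])
    return sorted(terms)

def scr(weights, doc, query):
    """Final score: sum of the per-term scores of the document."""
    return sum(weights[t][doc] for t in query)

weights = make_simple_model(n_docs=50, n_terms=20)
query = sample_query(random.Random(1), n_terms=20)
best_doc = max(range(50), key=lambda d: scr(weights, d, query))
print(query, best_doc)
```

Because each term ranks the documents independently, the best document for a multi-term query is usually not the top document of any single term, which is what makes the retrieval problem non-trivial.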
Experiments • Experiment setup • Synthetic model – complex • 10K documents, 10K terms (words), 12K queries • For each term, generate a random permutation of the 10K documents and assign weight (1-1/100)^i to the term for the document at position i • Queries are 5 terms long, with terms drawn from a power-law distribution • In addition, each pair and triplet of terms is associated with a random permutation of documents and the induced scores • The final score is the sum of the individual scores for each term, pair and triplet
Experiments • Experiment setup • CTR model • Computational advertising dataset • Logistic regression model • 50K documents (ads), sampled 50K queries • Trained on a day’s live traffic
Experiments • Datasets • Two synthetic models: simple and complex • |D|=10K, |Q|=10K, 2K test queries • CTR model • |D|=50K, |Q|=50K, 50K test queries from a following day • Baselines • Random: k random documents • Static: fixed set of k documents with highest average scores • Predictive: Predictive indexing [Goel et al.]
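The Random and Static baselines are simple enough to sketch directly. The function names and the toy score are illustrative assumptions; Predictive indexing is not sketched here.

```python
import heapq
import random

def static_baseline(scr, docs, train_queries, k):
    """Fixed set: the k documents with the highest average score over the training queries."""
    def mean_score(d):
        return sum(scr(d, q) for q in train_queries) / len(train_queries)
    return heapq.nlargest(k, docs, key=mean_score)

def random_baseline(docs, k, rng):
    """k documents chosen uniformly at random, ignoring the query."""
    return rng.sample(list(docs), k)

# Hypothetical score: larger document ids score higher on every query.
def toy_scr(d, q):
    return d * q

static = static_baseline(toy_scr, [1, 2, 3], train_queries=[1, 2], k=2)
rand = random_baseline([1, 2, 3], 2, random.Random(0))
print(static, rand)
```

Both baselines return the same kind of result for every query, which is why their retrieval latency is negligible.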
Evaluation metrics • Recall: exact retrieval of true top-k • Overly conservative • Score loss: average loss in the score of retrieved docs • Captures application specific utility, e.g., CTR
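The two metrics can be sketched as below. The slide does not give the exact formula for score loss; this sketch assumes it is the mean true score of the exact top-k minus the mean true score of the retrieved documents. The score table is made up.

```python
def recall_at_k(retrieved, true_top_k):
    """Fraction of the exact top-k that was retrieved."""
    return len(set(retrieved) & set(true_top_k)) / len(true_top_k)

def score_loss(retrieved, true_top_k, score_of):
    """Mean true score of the exact top-k minus mean true score of the retrieved docs."""
    best = sum(score_of[d] for d in true_top_k) / len(true_top_k)
    got = sum(score_of[d] for d in retrieved) / len(retrieved)
    return best - got

# Hypothetical true scores for one query.
score_of = {"a": 0.9, "b": 0.8, "c": 0.79, "d": 0.1}
true_top_2 = ["a", "b"]
retrieved = ["a", "c"]
print(recall_at_k(retrieved, true_top_2))
print(score_loss(retrieved, true_top_2, score_of))
```

Here recall is only 0.5 even though the retrieved set is almost as good as the exact one, which illustrates why the slide calls recall overly conservative.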
Retrieval latency: CTR model • Disclaimer: prototype implementation • Brute-force (scoring all 50K ads): 4s per impression • Scoring top-100 candidates: 9ms • Top-100 retrieval • Baselines: ~0 (negligible) • Our approach: ~15ms
Index construction • ~1min per document (prototype implementation) • Trivially parallelizable • Easy to add new documents
Conclusions • A practical method for indexing black-box ML models • Integrates with existing indexing systems • Scales well to large itemsets • Tunable space-speed-accuracy tradeoff