This presentation discusses a model-based approach to the content match problem in computational advertising, based on the paper “Bandits for Taxonomies: A Model-based Approach”. The goal is to maximize ad clicks by matching advertisements to webpages effectively, while limiting the impressions wasted on learning which ads perform best. Multi-level bandit policies that exploit the class structure of ads and pages improve both click-through rate (CTR) estimation and ad allocation. In the reported experiments, the multi-level policy obtains substantially more clicks and a lower mean squared error.
Computational Advertising. Kira Radinsky. Slides based on material from the paper “Bandits for Taxonomies: A Model-based Approach” by Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, Vanja Josifovski, in SDM 200
The Content Match Problem (diagram: Advertisers, Ads DB, Ads) • Ad impression: showing an ad to a user • Ad click: a user click leads to revenue for the ad server and the content provider • The content match problem: match ads to pages so as to maximize clicks • Maximizing the number of clicks means: • For each webpage, find the ad with the best click-through rate (CTR) • But without wasting too many impressions in learning this.
Background: Bandits Bandit “arms” (Unknown payoff probabilities) • Pull arms sequentially so as to maximize the total expected reward • Estimate payoff probabilities • Bias the estimation process towards ‘better’ arms.
Background: Bandits Solutions • Try 1: Greedy solution: • Compute the sample mean of an arm A by dividing the total reward received from the arm by the number of times the arm has been pulled. • At each time step, choose the arm with the highest sample mean. • Try 2: Naïve solution: • Pull each arm an equal number of times. • Epsilon-greedy strategy: • The best arm is selected for a proportion 1 − ε of the trials. • Another arm is selected at random (with uniform probability) for a proportion ε of the trials.
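The greedy and epsilon-greedy rules above can be compared in a small simulation. This is a minimal sketch, assuming Bernoulli click rewards and hypothetical CTR values; the function name `run_epsilon_greedy` is ours. Setting `epsilon=0` gives the pure greedy policy. The exploration step draws uniformly over all arms (a common variant) rather than only over the non-best arms.

```python
import random


def run_epsilon_greedy(true_ctrs, steps, epsilon, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms; epsilon=0 is the pure greedy policy."""
    rng = random.Random(seed)
    k = len(true_ctrs)
    pulls = [0] * k    # number of times each arm was pulled
    rewards = [0] * k  # total reward (clicks) received from each arm

    def sample_mean(arm):
        # Total reward from the arm divided by its number of pulls (0.0 if untried).
        return rewards[arm] / pulls[arm] if pulls[arm] else 0.0

    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: any arm, uniformly at random
        else:
            # Exploit: highest sample mean; ties go to the lowest index.
            arm = max(range(k), key=sample_mean)
        reward = 1 if rng.random() < true_ctrs[arm] else 0  # simulated click
        pulls[arm] += 1
        rewards[arm] += reward
        total += reward
    return total, pulls


# Hypothetical CTRs; the worst arm is deliberately listed first.
ctrs = [0.02, 0.05, 0.10]
greedy_total, greedy_pulls = run_epsilon_greedy(ctrs, 10_000, epsilon=0.0)
eps_total, eps_pulls = run_epsilon_greedy(ctrs, 10_000, epsilon=0.1)
print(greedy_pulls, greedy_total)
print(eps_pulls, eps_total)
```

In this setup, the pure greedy run never leaves arm 0. Every untried arm has a sample mean of 0, ties go to the first arm, and once arm 0 earns a click its mean stays above 0. This is exactly the failure mode that the ε exploration fixes.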
Background: Bandits • Bandit “arms” are ads • Each page (Webpage1, Webpage2, Webpage3) poses its own bandit problem over the ads
Background: Bandits • Content match = a matrix (rows: webpages, columns: ads) • Each row is a bandit, i.e. one instance of the multi-armed bandit (MAB) problem • Each cell has an unknown CTR
Background: Bandits Bandit policy (diagram: one priority per arm, Priority1, Priority2, Priority3) • Assign a priority to each arm • “Pull” the arm with max priority and observe the reward (allocation) • Update the priorities (estimation)
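The three steps of a priority-based bandit policy can be sketched as a simulation loop. The slides do not say how priorities are computed. This sketch assumes UCB1 (sample mean plus sqrt(2 ln t / n)), one well-known choice, and simulates clicks from hypothetical, deliberately exaggerated CTRs. The function names are illustrative.

```python
import math
import random


def ucb1_priority(clicks, pulls, t):
    """UCB1 priority (one common choice, assumed here): sample mean + exploration bonus."""
    if pulls == 0:
        return math.inf  # untried arms get the highest priority
    return clicks / pulls + math.sqrt(2.0 * math.log(t) / pulls)


def run_priority_policy(true_ctrs, steps, priority=ucb1_priority, seed=0):
    rng = random.Random(seed)
    k = len(true_ctrs)
    clicks = [0] * k
    pulls = [0] * k
    for t in range(1, steps + 1):
        # 1. Assign a priority to each arm.
        prios = [priority(clicks[a], pulls[a], t) for a in range(k)]
        # 2. Allocation: "pull" the arm with max priority and observe the reward.
        arm = max(range(k), key=lambda a: prios[a])
        reward = 1 if rng.random() < true_ctrs[arm] else 0  # simulated click
        # 3. Estimation: update the counts that the priorities are computed from.
        pulls[arm] += 1
        clicks[arm] += reward
    return clicks, pulls


clicks, pulls = run_priority_policy([0.1, 0.3, 0.6], steps=5_000)
print(pulls)
```

Swapping in a different `priority` function changes the policy without touching the allocate/estimate loop. With these hypothetical CTRs, most of the 5,000 pulls go to the last arm.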
Background: Bandits • Why not simply apply a bandit policy directly to the problem? • Convergence is too slow, given the very large number of MAB instances (one per page) and of arms per instance (one per ad) • Additional structure is available, and we wish to use it.
Multi-level Policy (diagram: ad classes × webpage classes) • Consider only two levels.
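Multi-level Policy (diagram: ad parent classes Apparel, Computers, Travel, each split into ad child classes, against page classes with the same parents; a block is one page parent class × one ad parent class, and a row of cells is one MAB problem instance) • Idea: CTRs in a block are homogeneous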
Multi-level Policy • CTRs in a block are homogeneous. This is used in: • Allocation (picking an ad for each new page) • Estimation (updating priorities after each observation)
Multi-level Policy - Allocation (diagram: a new page “?” goes through a page classifier; A, C, T = Apparel, Computers, Travel) • Classify the webpage → page class, parent page class • Run a bandit on the ad parent classes → pick one ad parent class • The two steps above result in a block • Run a bandit among the cells of the block → pick one ad class • (In general, continue from root to leaf → final ad)
Multi-level Policy - Allocation • Bandits at higher levels: • Use aggregated information • Have fewer bandit arms • Quickly figure out the best ad parent class
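The two-level allocation can be sketched as two bandits stacked on shared counts. The top bandit picks an ad parent class using counts aggregated over each block. The lower bandit then picks an ad child class among the cells of that block, in the page's own row. This is a minimal sketch under several assumptions. The taxonomy (`AD_TAXONOMY`), the page class names, and the classes `UCBStats` and `TwoLevelPolicy` are hypothetical. The page classifier is assumed to have already run. The slides do not name the bandit used at each level, so UCB1 is assumed at both.

```python
import math
from collections import defaultdict

# Hypothetical two-level ad taxonomy: ad parent class -> ad child classes.
AD_TAXONOMY = {
    "Apparel": ["Shoes", "Jackets"],
    "Computers": ["Laptops", "Printers"],
    "Travel": ["Flights", "Hotels"],
}


class UCBStats:
    """Click/pull counts per key with a UCB1 priority (an assumed priority choice)."""

    def __init__(self):
        self.clicks = defaultdict(int)
        self.pulls = defaultdict(int)

    def pick(self, keys):
        # t = pulls so far among these candidate arms.
        t = max(1, sum(self.pulls[k] for k in keys))

        def priority(key):
            n = self.pulls[key]
            if n == 0:
                return math.inf  # try every arm at least once
            return self.clicks[key] / n + math.sqrt(2.0 * math.log(t) / n)

        return max(keys, key=priority)

    def update(self, key, clicked):
        self.pulls[key] += 1
        self.clicks[key] += int(clicked)


class TwoLevelPolicy:
    def __init__(self, taxonomy):
        self.taxonomy = taxonomy
        self.blocks = UCBStats()  # keys: (page parent class, ad parent class)
        self.cells = UCBStats()   # keys: (page class, ad child class)

    def allocate(self, page_class, page_parent):
        # Top level: bandit over ad parent classes, using block-aggregated counts.
        _, ad_parent = self.blocks.pick([(page_parent, p) for p in self.taxonomy])
        # Lower level: bandit among the cells of the chosen block (this page's row).
        _, ad_child = self.cells.pick([(page_class, c) for c in self.taxonomy[ad_parent]])
        return ad_parent, ad_child

    def update(self, page_class, page_parent, ad_parent, ad_child, clicked):
        # One observation updates both the block aggregate and the cell itself.
        self.blocks.update((page_parent, ad_parent), clicked)
        self.cells.update((page_class, ad_child), clicked)


# Toy run: a hypothetical, already-classified page; only "Flights" ads get clicks.
policy = TwoLevelPolicy(AD_TAXONOMY)
history = []
for _ in range(300):
    ad_parent, ad_child = policy.allocate("Beach resorts", "Travel pages")
    clicked = ad_child == "Flights"  # deterministic toy rewards
    policy.update("Beach resorts", "Travel pages", ad_parent, ad_child, clicked)
    history.append(ad_parent)
print(history[-10:])
```

The top-level bandit has only three arms and its counts pool every cell in the block. It therefore settles on "Travel" after a few exploratory pulls of the other parents. It never has to estimate every ad child class under "Apparel" or "Computers" individually.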
Multi-level Policy - Estimation • CTRs in a block are homogeneous • Observations from one cell therefore also give information about the other cells in the block. • How can we model this dependence?
Multi-level Policy - Estimation Shrinkage Model • Each cell has a number of impressions and a number of clicks • All cells in a block come from the same distribution
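The shrinkage model can be written in standard Beta-Binomial notation. The symbols below are ours, not the slide's. For page class i and ad class j in block b, we assume c_ij clicks out of n_ij impressions, a Binomial click model given the cell CTR p_ij, and a Beta(α_b, β_b) prior shared by every cell of the block. The weight w_ij makes the shrinkage explicit: a cell with few impressions stays close to the block CTR α_b / (α_b + β_b).

```latex
% Shrinkage model for the cells (i, j) of one block b (Beta-Binomial form, assumed)
p_{ij} \sim \mathrm{Beta}(\alpha_b, \beta_b), \qquad
c_{ij} \mid p_{ij} \sim \mathrm{Binomial}(n_{ij}, p_{ij})

% Posterior-mean CTR estimate; the second form holds for n_{ij} > 0
\hat{p}_{ij} = \mathbb{E}[\, p_{ij} \mid c_{ij}, n_{ij} \,]
  = \frac{c_{ij} + \alpha_b}{n_{ij} + \alpha_b + \beta_b}
  = w_{ij}\,\frac{c_{ij}}{n_{ij}} + (1 - w_{ij})\,\frac{\alpha_b}{\alpha_b + \beta_b},
\qquad
w_{ij} = \frac{n_{ij}}{n_{ij} + \alpha_b + \beta_b}
```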
Multi-level Policy - Estimation • Intuitively, this leads to shrinkage of cell CTRs towards the block CTR (figure: Beta prior (“block CTR”), observed CTR, estimated CTR)
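The shrinkage of cell CTRs towards the block CTR can be computed in a few lines. This is a minimal sketch of the Beta posterior mean (c + α) / (n + α + β). It assumes the prior mean is the pooled CTR of the block and that the prior strength α + β is a fixed, hypothetical `strength` parameter. The paper may fit the prior differently, and the names `Cell`, `block_prior`, and `shrunk_ctr` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    clicks: int
    impressions: int


def block_prior(cells, strength):
    """Beta(alpha, beta) prior whose mean is the pooled block CTR (an assumption).

    `strength` (= alpha + beta) is a hypothetical tuning parameter: larger
    values pull every cell harder towards the block CTR.
    """
    total_clicks = sum(c.clicks for c in cells)
    total_impressions = sum(c.impressions for c in cells)
    block_ctr = total_clicks / total_impressions if total_impressions else 0.0
    return block_ctr * strength, (1.0 - block_ctr) * strength


def shrunk_ctr(cell, alpha, beta):
    """Posterior-mean CTR of one cell: (clicks + alpha) / (impressions + alpha + beta)."""
    return (cell.clicks + alpha) / (cell.impressions + alpha + beta)


block = [Cell(1, 10), Cell(0, 0), Cell(19, 990)]  # hypothetical counts for one block
alpha, beta = block_prior(block, strength=100.0)
for cell in block:
    print(cell, round(shrunk_ctr(cell, alpha, beta), 4))
```

Here the block CTR is 20 / 1000 = 0.02. The sparse cell with a raw CTR of 0.1 (1 click in 10 impressions) is pulled down to about 0.027. The cell with no impressions falls back to the block CTR. The well-observed cell barely moves.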
Experiments (S. Pandey et al. 2007) Taxonomy structure: root (depth 0); 20 nodes at depth 1; 221 nodes at depth 2; ~7000 nodes at depth 7. The experiments use the top 2 levels (depths 1 and 2).
Experiments (S. Pandey et al. 2007) • Data collected over a 1-day period • Collected from only one server, under some other ad-matching rules (not our bandit) • ~229M impressions • CTR values have been linearly transformed for confidentiality
Experiments (S. Pandey et al. 2007) (plot: clicks vs. number of pulls) • Multi-level gives a much higher number of clicks!
Experiments (S. Pandey et al. 2007) (plot: mean-squared error vs. number of pulls) • Multi-level gives a much lower MSE: it learned more from its exploration.
Conclusions • In a CTR-guided system, exploration is a key component. • The short-term penalty of exploration needs to be limited (an exploration budget). • Most exploration mechanisms use a weighted combination of the predicted CTR (mean) and the CTR uncertainty (variance). • Exploration can be done in a reduced-dimensional space: the class hierarchy. • A top-down traversal of the hierarchy determines the class of the ad to show.
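A priority that combines predicted CTR and uncertainty can be sketched from a Beta posterior over an ad's CTR. This is a minimal sketch, not the paper's formula. It assumes a Beta(1, 1) prior, uses the posterior standard deviation as the uncertainty term (the square root of the variance, so both terms are on the CTR scale), and uses a hypothetical weight `k` to trade exploitation against exploration. The name `beta_priority` is illustrative.

```python
import math


def beta_priority(clicks, impressions, k=1.0, alpha=1.0, beta=1.0):
    """Priority = predicted CTR + k * uncertainty, from a Beta posterior.

    alpha/beta (a uniform Beta(1, 1) prior by default) and the weight k are
    hypothetical parameters; larger k favors exploring uncertain ads.
    """
    a = alpha + clicks
    b = beta + impressions - clicks
    mean = a / (a + b)                          # predicted CTR (posterior mean)
    var = a * b / ((a + b) ** 2 * (a + b + 1))  # posterior variance
    return mean + k * math.sqrt(var)


new_ad = beta_priority(clicks=1, impressions=18)    # posterior mean 0.1, few impressions
old_ad = beta_priority(clicks=99, impressions=998)  # posterior mean 0.1, many impressions
print(round(new_ad, 4), round(old_ad, 4))
```

Both ads have the same predicted CTR (0.1). The ad with fewer impressions has a wider posterior, so it receives the higher priority and gets explored. With `k=0` the rule reduces to ranking by predicted CTR alone.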