Online Advertising
Open lecture at Warsaw University, February 25/26, 2011
Ingmar Weber, Yahoo! Research Barcelona, ingmar@yahoo-inc.com

Presentation Transcript


  1. Please interrupt me at any point! Online Advertising. Open lecture at Warsaw University, February 25/26, 2011. Ingmar Weber, Yahoo! Research Barcelona, ingmar@yahoo-inc.com

  2. Disclaimers & Acknowledgments • This talk presents the opinions of the author. It does not necessarily reflect the views of Yahoo! Inc. or any other entity. • Algorithms, techniques, features, etc. mentioned here might or might not be in use by Yahoo! or any other company. • Many of the slides in this lecture are based on tables/graphs from the referenced papers. Please see the actual papers for more details.

  3. Review from last lecture • Lots of money • Ads essentially pay for the WWW • Mostly sponsored search and display ads • Sp. search: sold using variants of GSP • Disp. ads: sold in GD contracts or on the spot • Many computational challenges • Finding relevant ads, predicting CTRs, new/tail content and queries, detecting fraud, …

  4. Plan for today and tomorrow • So far • Mostly introductory, “text book material” • Now • Mostly recent research papers • Crash course in machine learning, information retrieval, economics, … Hopefully more “think-along” (not sing-along) and not “shut-up-and-listen”

  5. But first … • Third-party cookies: www.bluekai.com (and many others …)

  6. Efficient Online Ad Serving in a Display Advertising Exchange Kevin Lang, Joaquin Delgado, Dongming Jiang, et al. WSDM’11

  7. A not-so-simple landscape for display advertising • Advertisers: “Buy shoes at nike.com”, “Visit asics.com today”, “Rolex is great.” • Publishers: a running blog, “The legend of Cliff Young”, celebrity gossip • Users: 32m likes running, 50f loves watches, 16m likes sports • Basic problem: given a (user, publisher) pair, find a good ad(vertiser)

  8. Ad networks and Exchanges • Ad networks • Bring together supply (publishers) and demand (advertisers) • Have bilateral agreements via revenue sharing to increase market fluidity • Exchanges • Do the actual real-time allocation • Implement the bilateral agreements

  9. Example: a middle-aged, middle-income New Yorker visits the web site of Cigar Magazine (P1); the demand D is only known at the end • User constraints: no alcohol ads to minors • Supply constraints: a conservative network doesn’t want left-leaning publishers • Demand constraints: premium blogs don’t want spammy ads

  10. Valid Paths & Objective Function

  11. Depth-first search enumeration (Algorithm A) • Worst-case running time? • Typical running time?
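
A minimal sketch (not from the slides) of what Algorithm A’s depth-first enumeration could look like: the exchange is modeled as a graph whose nodes are publishers, ad networks, and advertisers, and every path ending at an advertiser is enumerated. The graph encoding and the `enumerate_paths` helper are illustrative assumptions; worst-case time is exponential in the graph depth, which motivates the pruning on the next slide.

```python
def enumerate_paths(graph, node, path=None):
    """Yield every path from `node` to a sink (a node with no outgoing edges).

    graph: dict mapping a node to a list of (neighbor, revenue_share) edges.
    Worst case this visits exponentially many paths, hence the need to prune.
    """
    path = path or [node]
    edges = graph.get(node, [])
    if not edges:                      # reached a sink (an advertiser)
        yield path
        return
    for neighbor, share in edges:
        if neighbor not in path:       # avoid cycles
            yield from enumerate_paths(graph, neighbor, path + [neighbor])

# Tiny example: publisher P1 -> two ad networks -> advertisers A1, A2.
g = {"P1": [("N1", 0.8), ("N2", 0.7)],
     "N1": [("A1", 0.9)],
     "N2": [("A1", 0.85), ("A2", 0.9)]}
for p in enumerate_paths(g, "P1"):
    print(" -> ".join(p))
```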

  12. Algorithm B: U/S pruning and D pruning via an upper bound • Worst-case running time? • Sum vs. product? • Why is the upper bound valid? • Optimizations?
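
A hedged sketch of upper-bound pruning in the spirit of Algorithm B (the paper’s actual bounds and U/S/D pruning rules are more involved): a partial path is abandoned once an optimistic bound on its best completion cannot beat the best complete path seen so far. Assumes a DAG with revenue shares ≤ 1, so the running product along a path can only shrink.

```python
def best_path(graph, bids, node, value=1.0, best=0.0):
    """Return the value of the best publisher-to-advertiser path.

    graph: node -> list of (neighbor, revenue_share); bids: sink -> bid.
    Since revenue shares are <= 1, `value * max_bid` is a valid upper
    bound on any completion of the current partial path.
    """
    max_bid = max(bids.values())
    if node in bids:                       # sink reached: complete path
        return max(best, value * bids[node])
    if value * max_bid <= best:            # upper-bound pruning
        return best
    for neighbor, share in graph.get(node, []):
        best = best_path(graph, bids, neighbor, value * share, best)
    return best

g = {"P1": [("N1", 0.8), ("N2", 0.7)],
     "N1": [("A1", 0.9)],
     "N2": [("A1", 0.85), ("A2", 0.9)]}
bids = {"A1": 2.0, "A2": 1.5}
print(best_path(g, bids, "P1"))   # 0.8 * 0.9 * 2.0 = 1.44
```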

  13. Reusable precomputation • Cannot fully enforce D: it depends on the reachable sink, which in turn depends on U • What if there are space limitations? How would you prioritize?

  14. Experiments – artificial data

  15. Experiments – real data

  16. Competing for Users’ Attention: On the Interplay between Organic and Sponsored Search Results Christian Danescu-Niculescu-Mizil, Andrei Broder, et al. WWW’10 What would you investigate? What would you suspect?

  17. Things to look at • General bias for near-identical things • Ads are preferred (being further “North”)? • Organic results are preferred? • Interplay between ad CTR and result CTR • Better search results, fewer ad clicks? • Mutually reinforcing? • Dependence on type • Navigational query vs. informational query • Responsive ad vs. incidental ad

  18. Data • One month of traffic for a subset of Y! search servers • Only North ads, served at least 50 times • For each query q_i: the most clicked ad A_i* and the most clicked organic result O_i* • 63,789 (q_i, O_i*, A_i*) triples • Bias?

  19. (Non-)Commercial bias? • Look at A* and O* with an identical domain • Probably similar quality … • … but the (North) ad is placed higher • What do you think? • In 52% of cases ctr_O > ctr_A

  20. Correlation [figure: average ctr_A vs. average ctr_O] • For a given (range of) ctr_O, bucket all ads

  21. Navigational vs. non-navigational [figure: average ctr_A vs. average ctr_O for both query types] • Navigational: antagonistic effect • Non-navigational: (mild) reinforcement

  22. Dependence on similarity • Measure: bag-of-words overlap of title terms • Example: overlap(“Free Radio”, “Pandora Radio – Listen to Free Internet Radio, Find New Music”) = 2/9, since the two titles share 2 of the 9 distinct title terms
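
The 2/9 can be reproduced in a few lines; the Jaccard-style tokenization below is an assumption about the exact measure:

```python
import re

def title_overlap(t1, t2):
    """Jaccard overlap of lower-cased bag-of-words title terms."""
    w1 = set(re.findall(r"[a-z]+", t1.lower()))
    w2 = set(re.findall(r"[a-z]+", t2.lower()))
    return len(w1 & w2) / len(w1 | w2)

print(title_overlap("Free Radio",
                    "Pandora Radio - Listen to Free Internet Radio, Find New Music"))
# {free, radio} against a 9-term vocabulary -> 2/9 ~= 0.222
```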

  23. Dependence on similarity [figures: average ctr_A vs. title-term overlap]

  24. A simple model • Want to model … [equation not transcribed] • Also need: [equation not transcribed]

  25. A simple model • It explains the basic (quadratic) shape of overlap vs. ad click-through rate

  26. Improving Ad Relevance in Sponsored Search Dustin Hillard, Stefan Schroedl, Eren Manavoglu, et al. WSDM’10

  27. Ad relevance ≠ ad attractiveness • Relevance • How related is the ad to the search query • q=“cocacola”, ad=“Buy Coke Online” • Attractiveness • Essentially click-through rate • q=“cocacola”, ad=“Coca Cola Company Job” • q=*, ad=“Lose weight fast and easy” • Hope: decoupling leads to better (cold-start) CTR predictions

  28. Basic setup • Get relevance from editorial judgments • Perfect, excellent, good, fair, bad • Treat non-bad as relevant • Machine learning approach • Compare query to the ad • Title, description, display URL • Word overlap (uni- and bigram), character overlap (uni- and bigram), cosine similarity, ordered bigram overlap • Query length • Data • 7k unique queries (stratified sample) • 80k judged (query, ad) pairs
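
A small illustrative sketch of the kind of query-ad text features listed above (character n-gram overlap would be computed the same way over character tuples); function names and the tokenization are assumptions:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cosine(c1, c2):
    dot = sum(c1[k] * c2[k] for k in c1)
    norm = math.sqrt(sum(v * v for v in c1.values())) * \
           math.sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0

def text_features(query, ad_text):
    """Word n-gram overlap and cosine similarity of a (query, ad) pair."""
    q, a = query.lower().split(), ad_text.lower().split()
    feats = {}
    for n in (1, 2):
        qs, as_ = set(ngrams(q, n)), set(ngrams(a, n))
        feats[f"word_{n}gram_overlap"] = len(qs & as_) / max(len(qs), 1)
    feats["cosine"] = cosine(Counter(q), Counter(a))
    feats["query_length"] = len(q)
    return feats

print(text_features("coca cola jobs", "Coca Cola Company Jobs"))
```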

  29. Basic results – text only Precision = (“said ‘yes’ and was ‘yes’”)/(“said ‘yes’”) Recall = (“said ‘yes’ and was ‘yes’”)/(“was ‘yes’”) Accuracy = (“said the right thing”)/(“said something”) F1-score = 2/(1/P + 1/R) harmonic mean < arithmetic mean What other features?
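
For concreteness, the four metrics from the slide on a toy confusion matrix; the numbers below are made up:

```python
def prf(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1 from a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 / (1 / precision + 1 / recall)   # harmonic mean of P and R
    return precision, recall, accuracy, f1

# 80 true positives, 20 false positives, 40 false negatives, 60 true negatives:
# P = 0.80, R = 0.67, F1 ~= 0.727 -- below the arithmetic mean 0.733.
print(prf(80, 20, 40, 60))
```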

  30. Incorporating user clicks • Can use historic CTRs • Assumes (ad,query) pair has been seen • Useless for new ads • Also evaluate in blanked-out setting

  31. Translation model • In search, translation models are common; here the “document” D is the ad, and a good translation is one that leads to an ad click • Typical model: P(q | a) = Π_{w in q} Σ_{t in a} P(w | t) P(t | a), where w is a query term and t is an ad term • P(w | t) is estimated by maximum likelihood from historic click data • Any problem with this?
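
A minimal sketch of the maximum likelihood step, over a hypothetical click log; it also exposes the problem the slide hints at: unsmoothed MLE assigns zero probability to every (query term, ad term) pair never seen together in a clicked pair.

```python
from collections import Counter

# Hypothetical click log: (query, ad text) pairs that received a click.
clicks = [("cheap flights", "discount airline tickets"),
          ("flights to rome", "cheap airline tickets to rome")]

pair_counts, t_totals = Counter(), Counter()
for query, ad in clicks:
    for t in ad.split():
        for w in query.split():
            pair_counts[(w, t)] += 1
            t_totals[t] += 1          # normalizer: sum over all w for this t

def p_w_given_t(w, t):
    """Unsmoothed MLE of P(query term w | ad term t)."""
    return pair_counts[(w, t)] / t_totals[t] if t_totals[t] else 0.0

print(p_w_given_t("flights", "airline"))  # seen together -> 0.4
print(p_w_given_t("hotel", "airline"))    # never seen -> 0.0, the sparsity problem
```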

  32. Digression on MLE • Maximum likelihood estimator • Pick the parameter that’s most likely to generate the observed data • Example: draw a single number from a hat with numbers {1, …, n}; you observe 7 • The MLE is n = 7: the likelihood of the observation is 1/n, maximized by the smallest n consistent with the data • The MLE underestimates the size (cf. estimating the number of species) and underestimates the unknown/impossible • Is there an unbiased estimator?
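
A quick simulation of the hat example: the MLE n̂ = X systematically underestimates, while 2X − 1 is unbiased for a single draw, since E[X] = (n+1)/2.

```python
import random

# Draw a single X from {1, ..., n}. The likelihood of observing X is 1/n
# for any n >= X, maximized by the smallest admissible n, so MLE n_hat = X,
# which can only under- or exactly estimate n.
n, trials = 20, 100_000
mle = [random.randint(1, n) for _ in range(trials)]   # n_hat = X
unbiased = [2 * x - 1 for x in mle]                   # E[2X - 1] = n

print(sum(mle) / trials)       # about (n + 1) / 2 = 10.5: severe underestimate
print(sum(unbiased) / trials)  # about 20: unbiased
```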

  33. Remove position bias • Train one model as described before • But with smoothing • Train a second model using expected clicks • Ratio of model for actual and expected clicks • Add these as additional features for the learner
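
A simplified clicks-over-expected-clicks sketch in the same spirit: the slide’s approach takes the ratio of two trained models (actual vs. expected clicks), while the version below just compares observed clicks against per-position CTR priors, which are made-up numbers here.

```python
# Average CTR per ad slot (assumed numbers, for illustration only).
position_ctr = {1: 0.12, 2: 0.06, 3: 0.03}

def click_ratio(impressions, clicks, alpha=0.5, beta=0.5):
    """Smoothed ratio of actual to expected clicks for one (query, ad) pair.

    impressions: positions where the ad was shown; clicks: clicks received.
    alpha/beta smooth rarely seen pairs toward a neutral prior.
    """
    expected = sum(position_ctr[pos] for pos in impressions)
    return (clicks + alpha) / (expected + beta)

# 2 clicks where an average ad would get ~0.33: about 3x better than
# position alone would predict -- a position-debiased quality signal.
print(click_ratio([1, 1, 2, 3], clicks=2))
```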

  34. Filtering low-quality ads • Use the relevance model to remove irrelevant ads • Don’t show ads below a relevance threshold • Showing fewer ads gave more clicks per search!

  35. Second part of Part 2

  36. Estimating Advertisability of Tail Queries for Sponsored Search Sandeep Pandey, Kunal Punera, Marcus Fontoura, et al. SIGIR’10

  37. Two important questions • Query advertisability • When to show ads at all • How many ads to show • Ad relevance and clickability • Which ads to show • Which ads to show where Focus on first problem. Predict: will there be an ad click? Difficult for tail queries!

  38. Word-based model • Query q has words {w_i}; model q’s click propensity c(q) from per-word propensities [equation not transcribed] • Good or bad? • Variant without a bias for long queries [equation not transcribed] • Maximum likelihood attempt to learn these from: s(q) = # instances of q with an ad click, n(q) = # instances of q without an ad click
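
Since the slide’s equations did not survive transcription, the forms below are plausible illustrations only, not necessarily the paper’s: a product of per-word propensities (which shrinks with query length) and a geometric-mean variant that removes that length bias, plus the empirical per-query estimate implied by s(q) and n(q).

```python
import math

def click_propensity(words, c):
    """Product of per-word propensities -- biased: shrinks with query length."""
    return math.prod(c[w] for w in words)

def click_propensity_normalized(words, c):
    """Geometric mean -- a length-normalized variant."""
    return math.prod(c[w] for w in words) ** (1 / len(words))

def observed_propensity(s_q, n_q):
    """Empirical estimate: s_q impressions with an ad click, n_q without."""
    return s_q / (s_q + n_q)

c = {"cheap": 0.3, "flights": 0.2, "rome": 0.1}   # assumed word propensities
q = ["cheap", "flights", "rome"]
print(click_propensity(q, c))             # 0.006
print(click_propensity_normalized(q, c))  # ~0.18
```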

  39. Word-based model (cont.) • The maximum likelihood derivation only works out when each q contains a single word … then give up

  40. Linear regression model • A different model: words contribute linearly • Add regularization to avoid overfitting the underdetermined problem • Problem?
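
A minimal ridge regression sketch of “words contribute linearly”: one 0/1 column per vocabulary word, one row per query, and an L2 penalty because there are far more words (parameters) than reliably observed queries. The closed-form solution and toy data are illustrative.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge solution: (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]], dtype=float)   # queries x words (0/1 indicators)
y = np.array([0.30, 0.05, 0.15])         # observed click propensities c(q)
w = fit_ridge(X, y)
print(X @ w)                             # predicted click propensities
```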

  41. Digression Taken from: http://www.dtreg.com/svm.htm and http://www.teco.edu/~albrecht/neuro/html/node10.html

  42. Topical clustering • Latent Dirichlet Allocation • Implicitly uses co-occurrence patterns • Incorporate the topic distributions as features in the regression model
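
A hedged sketch using scikit-learn (corpus and hyperparameters are illustrative): fit LDA on query text, which implicitly exploits word co-occurrence, then append each query’s topic distribution to its word features before the regression step.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

queries = ["cheap flights rome", "rome hotel deals", "python list sort",
           "sort dictionary python", "flights hotel package"]
counts = CountVectorizer().fit_transform(queries)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_features = lda.fit_transform(counts)          # one topic vector per query

X = np.hstack([counts.toarray(), topic_features])   # word + topic features
print(X.shape)
```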

  43. Evaluation • Why not use the observed c(q) directly? • “Ground truth” is not trustworthy – tail queries • Sort things by predicted c(q) • Should have included optimal ordering!

  44. Learning Website Hierarchies for Keyword Enrichment in Contextual Advertising Pavan Kumar GM, Krishna Leela, Mehul Parsana, Sachin Garg WSDM’11

  45. The problem(s) • Keywords extracted for contextual advertising are not always perfect • Many pages are not indexed – no keywords available. Still have to serve ads • Want a system that for a given URL (indexed or not) outputs good keywords • Key observation: use in-site similarity between pages and content

  46. Preliminaries • Mapping URLs u to key-value pairs • Represent webpage p as a vector of keywords • tf, df, and the section where found • Goals: • Use u to introduce new keywords and/or update existing weights • For unindexed pages, get keywords via other pages from the same site • Latency constraint!

  47. What they do • Conceptually: • Train a decision tree with keys K as attribute labels, V as attribute values and pages P as class labels • Too many classes (sparseness, efficiency) • What they do: • Use clusters of web pages as labels
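
A toy illustration of the routing idea, with a hand-built two-level “tree” standing in for the learned one; the URL-to-attribute mapping and cluster names are assumptions, and the paper’s tree construction is considerably more involved.

```python
from urllib.parse import urlparse

def url_attributes(url):
    """Map a URL to key-value pairs: path segments and query parameters."""
    parts = urlparse(url)
    attrs = {f"path{i}": seg
             for i, seg in enumerate(parts.path.strip("/").split("/"))}
    for kv in parts.query.split("&"):
        if "=" in kv:
            k, v = kv.split("=", 1)
            attrs[k] = v
    return attrs

def route_to_cluster(url):
    """Hand-built stand-in for a learned tree: split on path0, then path1."""
    a = url_attributes(url)
    if a.get("path0") == "sports":
        return ("cluster_sports_news" if a.get("path1") == "news"
                else "cluster_sports_other")
    return "cluster_misc"

# Even an unindexed page is routed to a cluster of similar pages, whose
# pooled keywords can then be used to serve contextual ads.
print(route_to_cluster("http://example.com/sports/news/article123?id=7"))
```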
