Introduction • Advertising on the Internet = $$$ • Especially search advertising and web page advertising • Problem: • Selecting ads relevant to the users, and profitable for the publishers and advertisers • In this paper: • A new framework for associating ads with web pages based on Genetic Programming • Finding a ranking function to increase overall precision and minimize the number of misplacements
Search advertising • Paid placement strategy • Advertiser pays a placement fee to get a prominent position in ad lists • The most popular = Keyword-targeted advertising • Query terms matched against ads keywords (provided by advertisers) • Has led to Content-targeted advertising • Use of the content of a web page to select ads to display
Ads • Ad is composed of • Title + textual description + hyperlink • Set of keywords chosen by advertiser • Describe the topics that should appear on the web page • Pay for each keyword • Campaign = group of ads related to one product • Only one ad per campaign should be placed in a web page in order to ensure fair use of the page advertising space
Vector Space Model Document Query
Problem to address • Ranking ads according to their relevance to the page ranking function • Indicates how relevant is an ad to a web page: rank(ai,p) • Tools: • Statistics about the structural parts of the ads • Term frequency in an ad or in a group of ads, number of ads, … • Keywords provided by the advertisers
Genetic Programming • Machine learning technique to find solutions optimized for certain problem characteristics • Based on biological inheritance and evolution • 3 key components: • Individuals • Fitness function: indicator of individuals performance • Genetic operations, to create more diverse and better individuals • Reproduction: breed new individuals identical to their parents • Crossover: breed an individual sharing attributes with 2 parents • Mutation: simulates the deviations occurring during reproduction
Individuals • Individuals are presented using a tree structure – easy parsing, easy implementing and interpretation: Nodes in the tree represent functions, leaves represent terminals (statistics about structural parts of the ads and the information provided by the advertisers like keywords )
List of Terminals • P Stands for different structural parts of the ad (keywords, title, etc.) • G indicated whether the ads are grouped. Functions:
Evaluation • The evaluation function should take into consideration the number of relevant ads and the order in which they appear, that is it should be a combination of precision and recall. An example is given by:
Genetic Operators • The genetic operations in the model are those commonly used in GP, that is – mutation, crossover and reproduction. • Mutation – implemented in such a way that a randomly selected sub-tree is replaced by a new sub-tree, also created randomly. • Crossover – taking two trees and exchanging randomly selected sub nodes of these trees forming two new children. • Reproduction – Cloning a selected individual to the next generation.
Best Individual • The last step in the GP is determining the best individual to applied to the test set. • The natural choice id the individual with best performance in the training set, but due to over fitting during the learning, the best individuals evolve over Ng generations are applied to a second document collection, which we call a validation set. • Avg. performance of individual in the training and validation sets. • Corresponding standard deviation.
Experiments • Test Collection: A set of 100 pages extracted from a Brazilian newspaper. Cover different subjects. 50 pages were used as training, 30 pages for validation, and 20 pages for test. • Relevant ads: Same pooling method used to evaluate the TREC web based collection. • The size of the population was fixed at 750 individuals. • Crossover, Mutation and Reproduction at rates 85%, 10% and 5%, respectively. • 30 generations.
Results • Comparing results against the AAK_H method which consists the Cosine similarity function to match a page to the ad.
Conclusions • Genetic Programming => discover a good ranking estimation • Very effective in placing ads in web pages • By using a real ad collection and web pages from a Brazilian newspaper, the GP model obtained a gain of 61.7% over the baseline method.