Blackjack & The Game of Tag, presented by Leonid Leontiev
Game of tag: “Competition, Coevolution and the Game of Tag”, Craig W. Reynolds, Electronic Arts, 1450 Fashion Island Boulevard, San Mateo, CA 94404 USA, telephone: 415-513-7442, fax: 415-571-1893, creynolds@ea.com, cwr@red.com
Game of tag introduction • Tag is a children’s game based on symmetrical pursuit and evasion • Tag is played by two or more players, one of whom is designated as “it” • The “it” player chases the others, who all try to escape
Background • Tag is intended as a simple model of behavior based on control of locomotion direction, or steering • It serves as a test case for learning about evolving controllers for related but more complex tasks • A player’s fitness is determined by how well it performs when placed in competition with several opponents chosen randomly from the coevolving population of players
Goals • Study the use of competitive fitness in the evolution of agent behavior • Automatically discover a controller through evolution based solely on competition between controllers • Analyze an approach that stands in contrast to evolving controllers by pitting them against a static, predetermined expert strategy
History • 1992 John Koza, “Genetic Programming: On the Programming of Computers by Means of Natural Selection” • 1993 Pete Angeline’s work on coevolution of players for the game of Tic Tac Toe, using competitive fitness • 1994 R. E. Smith’s work on coevolution of strategies for the game of Othello • 1994 K. Sims, “Evolving 3D Morphology and Behavior by Competition”
Experimental Design • Genetic programming is used to evolve control programs for simulated vehicles • There is no static, predetermined control program • The vehicles are abstract autonomous agents moving at constant speed on a two-dimensional surface • The job of the control program is to inspect the environment and compute a steering angle
Experimental Design For each player, at each simulation step: • Its control program is run to determine a steering angle • The vehicle’s heading is altered by this angle • The vehicle is moved a fixed distance along its new heading • Tags are detected and handled • The step length is typically 125% longer for “it”
Experimental Design • No simulation of force, mass, acceleration or momentum • There are always two players in a tag game • The playing field is featureless • Fitness is defined as the portion of time (simulation steps) spent not being it
Experimental Design • The entire state of the world consists of: • a flag indicating who is it • the relative position of the opponent’s vehicle
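The per-step procedure and the two-item world state from the last few slides can be sketched in Python. This is a minimal sketch under stated assumptions, not Reynolds’s implementation: the names `Vehicle`, `local_position` and `step` are hypothetical, both players are updated simultaneously from the pre-move state, `local_y` is taken as the forward axis and `local_x` as the leftward axis, the “it” step multiplier is left as a free parameter, and on a tag the role of “it” simply passes to the other player (no grace period).

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interface: a control program maps (is_it, local_x, local_y)
# to a steering angle in radians. Names are illustrative only.
Controller = Callable[[bool, float, float], float]


@dataclass
class Vehicle:
    x: float
    y: float
    heading: float  # radians, counterclockwise from the world +x axis


def local_position(me: Vehicle, other: Vehicle) -> Tuple[float, float]:
    """Opponent position in `me`'s frame (assumed axes): local_y is the
    component along my heading, local_x the component to my left."""
    dx, dy = other.x - me.x, other.y - me.y
    cos_h, sin_h = math.cos(me.heading), math.sin(me.heading)
    local_x = -dx * sin_h + dy * cos_h
    local_y = dx * cos_h + dy * sin_h
    return local_x, local_y


def step(players: List[Vehicle], programs: List[Controller], it_index: int,
         step_length: float, it_step_scale: float, tag_radius: float) -> int:
    """Advance both players one simulation step; return who is "it" afterwards."""
    # 1. Run each control program on the state before anyone moves. The whole
    #    world state is the it-flag plus the opponent's relative position.
    angles = []
    for i, me in enumerate(players):
        lx, ly = local_position(me, players[1 - i])
        angles.append(programs[i](i == it_index, lx, ly))
    # 2-3. Alter each heading by its steering angle, then move a fixed
    #      distance; "it" takes a longer step (it_step_scale is a parameter).
    for i, (me, angle) in enumerate(zip(players, angles)):
        me.heading += angle
        dist = step_length * (it_step_scale if i == it_index else 1.0)
        me.x += dist * math.cos(me.heading)
        me.y += dist * math.sin(me.heading)
    # 4. Tag detection (assumed handling): within tag_radius, "it" changes hands.
    a, b = players
    if math.hypot(a.x - b.x, a.y - b.y) <= tag_radius:
        return 1 - it_index
    return it_index
```

Calling `step` 25 times with the two programs yields one game; the history of `it_index` values is exactly what the scoring rule (non-it steps divided by 25) consumes.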
Experimental Design • A series of 4 games is played • The two players alternate starting as it for each game of the series • Before each game: • The players are given random initial headings • They are randomly positioned within a starting box measuring about 3.5 vehicle-body-lengths on a side • A player tags the opponent by getting within one vehicle length of it
Experimental Design • Each game consists of 25 simulation steps • A player’s score for a game is the number of non-it steps divided by 25
Experimental Design • To determine a player’s fitness, it is pitted against 6 randomly chosen players from the existing population • With a series of 4 games against each opponent, the scores from these 24 games are averaged to obtain the final fitness value
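The scoring and fitness rules (non-it steps divided by 25, averaged over 4 games against each of 6 random opponents) fit in a few lines. A sketch: `play_game` is a hypothetical callback standing in for the simulation, and excluding the player from its own opponent pool is an assumption.

```python
import random
from typing import Callable, List, Sequence

STEPS_PER_GAME = 25
GAMES_PER_SERIES = 4
OPPONENTS_PER_TEST = 6

# Hypothetical callback: play_game(a, b, a_starts_as_it) runs one game and
# returns 25 booleans saying whether player `a` was "it" at each step.
GameFn = Callable[[object, object, bool], Sequence[bool]]


def game_score(it_flags: Sequence[bool]) -> float:
    """Number of steps spent not being "it", divided by 25."""
    if len(it_flags) != STEPS_PER_GAME:
        raise ValueError("expected one flag per simulation step")
    return sum(1 for was_it in it_flags if not was_it) / STEPS_PER_GAME


def fitness(player, population: List[object], play_game: GameFn,
            rng: random.Random) -> float:
    """Average score over 6 random opponents x 4 games = 24 games."""
    candidates = [p for p in population if p is not player]
    scores = []
    for opponent in rng.sample(candidates, OPPONENTS_PER_TEST):
        for game in range(GAMES_PER_SERIES):
            # The two players alternate starting as "it" across the series.
            scores.append(game_score(play_game(player, opponent, game % 2 == 0)))
    return sum(scores) / len(scores)
```

Because the opponents are sampled fresh for every test, the same program can receive different fitness values at different times, which is the source of the “mediocre-but-lucky” problem noted later.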
Genetic Programming • Steady State Genetic Programming (SSGP): • choosing two parent programs from the population • creating a new offspring program from the parents by applying the crossover and mutation operators • testing the fitness of the new program • choosing a program to remove from the population to make room • adding the new program to the population
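The five SSGP steps can be sketched as a single update on a fixed-size pool. The slides do not say how parents or the removed program are chosen, so the tournament selection used here (k = 4, best of the tournament for parents, worst for removal) is an assumption, as are all function names.

```python
import random
from typing import Callable, List, Tuple, TypeVar

P = TypeVar("P")  # program representation, e.g. an expression tree


def tournament(pool: List[Tuple[P, float]], k: int, rng: random.Random,
               best: bool = True) -> int:
    """Index of the fittest (best=True) or least fit of k distinct random members."""
    picks = rng.sample(range(len(pool)), k)
    if best:
        return max(picks, key=lambda i: pool[i][1])
    return min(picks, key=lambda i: pool[i][1])


def steady_state_step(pool: List[Tuple[P, float]],
                      crossover: Callable[[P, P, random.Random], P],
                      mutate: Callable[[P, random.Random], P],
                      evaluate: Callable[[P], float],
                      rng: random.Random, k: int = 4) -> None:
    """One SSGP step; `pool` holds (program, fitness) pairs, updated in place."""
    # Choose two parents.
    mother = pool[tournament(pool, k, rng)][0]
    father = pool[tournament(pool, k, rng)][0]
    # Create one offspring by crossover followed by mutation.
    child = mutate(crossover(mother, father, rng), rng)
    # Test its fitness. With competitive fitness, `evaluate` would play the
    # child against opponents drawn from the current pool.
    child_fitness = evaluate(child)
    # Choose a program to remove and put the child in its slot.
    pool[tournament(pool, k, rng, best=False)] = (child, child_fitness)
```

Since there are no generations, a run is simply a long loop of calls to `steady_state_step`, which is why progress is reported in individuals processed.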
Problems • A mediocre-but-lucky program may receive an undeservedly high fitness and go on to dominate the population • Competitive fitness values are measured relative to the population at a certain point in time • Because steady state genetic computation proceeds individual by individual, there is no demarcation of generations
Size limitation • Program size is measured in terms of the total number of functions and/or terminals • When a program’s size exceeds this limit, the hoist genetic operator [Kinnear 1994] is used to find a smaller (but hopefully still fit) subexpression
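A sketch of the size measure and a hoist-based repair, assuming programs are stored as nested tuples. The hoist here returns a randomly chosen proper subexpression and is applied repeatedly until the program fits; whether Reynolds hoisted once, retried, or re-evaluated fitness is not stated in the slides, and the function names are hypothetical.

```python
import random
from typing import List, Union

# Assumed representation: a function call is a tuple (name, arg1, ...);
# terminals are strings such as "local-x" or numeric constants.
Expr = Union[str, float, tuple]


def size(expr: Expr) -> int:
    """Total number of functions and terminals in the expression."""
    if isinstance(expr, tuple):
        return 1 + sum(size(arg) for arg in expr[1:])
    return 1


def subexpressions(expr: Expr) -> List[Expr]:
    """The expression itself followed by all of its subexpressions (preorder)."""
    found = [expr]
    if isinstance(expr, tuple):
        for arg in expr[1:]:
            found.extend(subexpressions(arg))
    return found


def hoist(expr: Expr, rng: random.Random) -> Expr:
    """Return a randomly chosen proper subexpression (a terminal is returned as is)."""
    candidates = subexpressions(expr)[1:]
    return rng.choice(candidates) if candidates else expr


def enforce_size_limit(expr: Expr, limit: int, rng: random.Random) -> Expr:
    """Hoist repeatedly until the program fits; each hoist strictly shrinks it."""
    if limit < 1:
        raise ValueError("limit must allow at least one terminal")
    while size(expr) > limit:
        expr = hoist(expr, rng)
    return expr
```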
Results • These experiments were run on Macintosh Quadra 950 workstations. In this implementation a fitness test consisting of 24 tag games takes 7 to 12 seconds to run, depending on program size.
Run A • A population of 5000 individuals • Both players moved at the same speed • Most popular strategies at the early stage: • Evasion vehicles simply traveled in a straight line • Pursuit strategies appear to have been looping (constant steering angle) and “stumblers” that seemed to move erratically but managed to creep slowly towards their target
Run A cont. • Later an improved evasion strategy appeared: if the pursuer is behind you, go straight ahead, otherwise turn randomly (if-it <pursuer-branch> (max 0 (local-y))) • The pursuers got to be very good at picking off the easy targets, the inefficient evaders
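To read expressions like the evasion strategy above, a small interpreter for the primitives that appear in these slides (`if-it`, `iflte`, `%`, `abs`, `max`, `min`, `+`, `-`, `local-x`, `local-y`) is enough. The semantics are assumptions: `(if-it a b)` returns `a` for the player who is it and `b` otherwise, `(iflte a b c d)` returns `c` when `a <= b` and `d` otherwise, and `%` is Koza-style protected division returning 1 on a zero divisor. `<pursuer-branch>` is a placeholder in the slide, so the test substitutes 0.

```python
from typing import List, Union

Expr = Union[float, str, List["Expr"]]


def parse(text: str) -> Expr:
    """Read one S-expression; numbers become floats, other tokens stay symbols."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Expr:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing ")"
            return items
        try:
            return float(token)
        except ValueError:
            return token

    expr = read()
    if pos != len(tokens):
        raise ValueError("unexpected text after the expression")
    return expr


def evaluate(expr: Expr, is_it: bool, local_x: float, local_y: float) -> float:
    """Steering value an expression returns for one sensor reading."""
    if isinstance(expr, float):
        return expr
    if isinstance(expr, str):
        expr = [expr]  # allow a bare terminal such as local-x
    op, *raw_args = expr
    if op == "local-x":
        return local_x
    if op == "local-y":
        return local_y
    args = [evaluate(a, is_it, local_x, local_y) for a in raw_args]
    if op == "if-it":   # (if-it value-when-it value-otherwise)
        return args[0] if is_it else args[1]
    if op == "iflte":   # (iflte a b c d): c if a <= b, else d
        return args[2] if args[0] <= args[1] else args[3]
    if op == "%":       # protected division (assumed): x / 0 -> 1
        return args[0] / args[1] if args[1] != 0 else 1.0
    if op == "abs":
        return abs(args[0])
    if op == "max":
        return max(args)
    if op == "min":
        return min(args)
    if op == "+":
        return args[0] + args[1]
    if op == "-":
        return args[0] - args[1]
    raise ValueError(f"unknown primitive {op!r}")
```

With `local-y` read as the forward distance to the pursuer, the evader branch returns 0 (no turn) whenever the pursuer is behind and a positive steering value otherwise.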
Run A cont. • At the end stage of run A, the pursuit strategy used a competent but inefficient “three phase” technique
Run C • A population of 1000 individuals • Mutation was added in an attempt to prevent the loss of diversity observed in earlier runs • Many games consisted of a chase featuring near-optimal pursuit and evasion
Fitness of the optimal player placed in competition with the evolving population
Run C cont. • After 215,415 individuals were processed (about 215 generations’ worth at a population of 1000), there were 4 individuals with the same best fitness value • One of these was compared to the optimal player in a series of 100 games • It scored 49.3%
Run G • Did not segregate the pursuer and evader code • The change seemed to make the problem harder to solve • Used a larger limit on program size (100)
Fitness of the optimal player placed in competition with the evolving population
Run G cont. • Individual 113520 was the best of the population • Its program size is 98 • Many strange behavioral traits • Pursuit behavior has a reasonable two-phase strategy for opponents up to 5 units ahead but is very inept for opponents further away • The evasion behavior is strongly asymmetrical
Individual 113520 code: (% (% (if-it (abs (local-x)) (iflte (iflte (local-x) 0.57168305 (local-x) (+ (iflte (local-y) (iflte (local-y) (if-it (local-x) (abs (local-x))) (iflte 0.40530929 0.26004231 (abs (local-x)) (local-y)) (if-it 0.40530929 0.57168305)) (min (abs (local-x)) (+ (local-x) (local-x))) (local-x)) (local-x))) 0.57168305 (local-x) (+ (iflte (local-y) (iflte (local-y) (if-it (local-x) (local-x)) (iflte 0.40530929 (local-x) (abs (iflte (local-x) 0.37254661 0.32281655 (local-x))) (local-x)) (if-it 0.40530929 (abs (local-x)))) (min 0.1637349 (iflte (local-x) (local-y) (abs (iflte (abs (local-x)) (max (if-it (local-y) (abs 0.53183758)) (local-x)) 0.32281655 (local-x))) 0.53183758)) (local-x)) (local-x)))) (+ (local-x) (local-x))) (iflte (- (abs 0.53183758) (if-it (% 0.57168305 (local-y)) (- 0.1637349 (local-y)))) 0.40530929 (abs 0.53183758) 0.83426005))
Conclusions • Using the game of tag to test relative fitness, artificial evolution was able to discover skillful, near-optimal tag players • Good results were obtained despite the random selection of opponents and a fitness based only on relative performance • The population’s average performance was within 10% of the optimal player, and the best-of-population individual performed within a few percentage points of optimal (in run C)
Conclusions • The quality of evolved players approached, but did not reach, that of the optimal player • Possible reasons: • a fundamental limitation of competitive fitness • a flaw in the experimental design • limitations of genetic population size and length of runs
Blackjack: “Evolving Strategies in Blackjack”, David B. Fogel, Natural Selection, Inc., 3333 N. Torrey Pines Ct., Suite 200, La Jolla, CA 92037 USA, dfogel@natural-selection.com
Background on blackjack • Blackjack is also known as 21 • One or more players compete against the dealer, or “house” • The rules vary by casino, and even by country • The variations are minor, but they affect the potential profitability of player strategies
Blackjack rules • The dealer and each player receive two cards. The dealer turns the first of his cards face up; the other remains face down • The object is to come as close to 21 as possible without going “bust” • Each card counts as its face value, with face cards counting 10 and aces counting as 1 or 11
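The counting rule for a hand can be written as a small function. A sketch, assuming cards are encoded as ranks 1 (ace) through 13 (king); the name `hand_value` is hypothetical. At most one ace can ever count as 11, because two would already total 22, so it suffices to try adding 10 once.

```python
from typing import Iterable, Tuple


def hand_value(cards: Iterable[int]) -> Tuple[int, bool]:
    """Best total for ranks 1 (ace) to 13 (king); returns (total, is_soft)."""
    values = [min(rank, 10) for rank in cards]  # face cards count 10, aces start at 1
    total = sum(values)
    # Count one ace as 11 if that does not bust the hand (a "soft" total).
    if 1 in values and total + 10 <= 21:
        return total + 10, True
    return total, False
```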
Blackjack rules • If the first two cards dealt to the player total 21, this is called “blackjack” • If the dealer’s up card is an ace, the player may purchase “insurance” for half the amount of the player’s wager. If the dealer has blackjack, the insurance bet pays 2:1
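Insurance is a side bet whose value depends only on the chance p that the dealer’s hole card is a ten-value card. Staking 1 unit wins 2 with probability p and loses 1 otherwise, so the expected gain is 2p - (1 - p) = 3p - 1, and insurance is favorable only when more than a third of the unseen cards are tens. The function name and the assumption that only the dealer’s ace has been seen (16 tens among 51 unseen cards in a single deck) are illustrative.

```python
from fractions import Fraction


def insurance_ev(tens_remaining: int, unseen_cards: int) -> Fraction:
    """Expected gain per unit staked on insurance, which pays 2:1 on a dealer blackjack."""
    p = Fraction(tens_remaining, unseen_cards)  # chance the hole card is a ten
    return 2 * p - (1 - p)
```

For example, `insurance_ev(16, 51)` is -1/17, a loss of about 5.9% of the insurance stake; this dependence on which cards remain is what card counters exploit.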
Blackjack rules • If the player has two cards of equal denomination on the deal, he may split them into two new hands • Also, on the initial deal, when the player has two cards, he has the option of “doubling down”: doubling the wager in exchange for receiving exactly one more card • If the player goes over 21, he busts and immediately loses his wager • If the player stands at a value less than or equal to 21, play proceeds to the dealer, who completes his hand according to fixed house rules
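The end of a round for a player who simply stands can be sketched as follows. The dealer rule is an assumption (draw below 17, stand on every 17, including soft 17), since house rules vary; blackjack payouts, doubling and splitting are left out, and all names are hypothetical.

```python
from typing import Callable, List


def total(cards: List[int]) -> int:
    """Best total with ranks 1 (ace) to 13; one ace counts 11 when that does not bust."""
    t = sum(min(rank, 10) for rank in cards)
    return t + 10 if 1 in cards and t + 10 <= 21 else t


def dealer_plays(hand: List[int], draw: Callable[[], int]) -> int:
    """Assumed house rule: draw while below 17, stand on any 17 or more."""
    while total(hand) < 17:
        hand.append(draw())
    return total(hand)


def settle(player: List[int], dealer: List[int], draw: Callable[[], int]) -> int:
    """+1 win, 0 push, -1 loss for a plain stand."""
    p = total(player)
    if p > 21:
        return -1  # the player busts and loses at once; the dealer never plays
    d = dealer_plays(dealer, draw)
    if d > 21 or p > d:
        return 1
    return 0 if p == d else -1
```

Because the player acts first, a player bust loses even if the dealer would also have busted; this ordering is the source of the house edge when the player simply mimics the dealer.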
History • The intelligent player can win consistently at blackjack by “counting cards,” using the history of which cards have been played • 1956 – “The Optimum Strategy in Blackjack”, Dr. Roger Baldwin • 1962 – “Beat the Dealer”, Dr. Edward Thorp
History • Thorp analyzed the player advantage, using his basic strategy, when the (single) deck contained all 16 tens, and when a number of the tens were removed. • +0.13% advantage with all 16 tens • −1.85% disadvantage with 12 tens • −3.13% disadvantage with 8 tens • −2.14% disadvantage with 4 tens • +1.62% advantage when no ten remained. • No linear relationship between the number of tens and the player’s advantage or disadvantage.
Basic strategy • The player makes the same play in the same setting without respect to which cards have been played in prior hands • If the player mimicked the dealer’s rules, the player faced a disadvantage between -5.56% and -6.78% with 95% confidence
Counting strategy • Computer simulation has shown that the player can have an advantage over the house by altering his strategy based on the distribution of cards played in prior hands • Player advantage after removing all of the cards of a given rank • The most significant single card is the 5
Counting strategy
| Missing card | Player advantage (%) |
|---|---|
| Aces | -2.42 |
| Twos | +1.75 |
| Threes | +2.14 |
| Fours | +2.64 |
| Fives | +3.58 |
| Sixes | +2.40 |
| Sevens | +2.05 |
| Eights | +0.43 |
| Nines | -0.41 |
| Tens | +1.62 |
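The removal effects can be put in a lookup to check the claim that the 5 is the most significant single card. A sketch; the dictionary only transcribes the figures listed for this slide, and the helper names are hypothetical.

```python
from typing import Dict, List

# Player advantage (%) after removing all cards of one rank from a single deck.
EFFECT_OF_REMOVAL: Dict[str, float] = {
    "A": -2.42, "2": +1.75, "3": +2.14, "4": +2.64, "5": +3.58,
    "6": +2.40, "7": +2.05, "8": +0.43, "9": -0.41, "10": +1.62,
}


def most_significant(effects: Dict[str, float]) -> str:
    """Rank whose removal changes the player's advantage the most, in either direction."""
    return max(effects, key=lambda rank: abs(effects[rank]))


def removal_hurts_player(effects: Dict[str, float]) -> List[str]:
    """Ranks whose removal lowers the player's advantage."""
    return sorted(rank for rank, effect in effects.items() if effect < 0)
```

Only aces and nines have negative removal effects, so a deck depleted of them is worse for the player, while a deck depleted of small cards (especially fives) is better.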
Evolving Basic Strategies • Starting with Gollehon's basic strategy and three random variants of the strategy • 3 million simulated hands on a single deck, reshuffling after 2/3 of the deck had been played • Strategies were represented as entries in matrices describing decisions
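A sketch of the data structures this setup needs, under assumptions: a strategy matrix indexed by (player hard total, dealer up card) holding hit/stand/double actions, a random-variant operator that rewrites a few cells, and a single deck reshuffled between hands once two thirds of it has been dealt. The starting matrix is a placeholder, not Gollehon’s strategy; splits and soft totals are omitted, and all names are hypothetical.

```python
import random
from typing import Dict, Tuple

ACTIONS = ("H", "S", "D")      # hit, stand, double down (splits omitted here)
PLAYER_TOTALS = range(4, 22)   # hard totals 4..21
DEALER_UP = range(1, 11)       # 1 = ace, 10 = any ten-value card

Strategy = Dict[Tuple[int, int], str]


def placeholder_strategy() -> Strategy:
    """Stand on 17 or more, otherwise hit (a stand-in for a real basic strategy)."""
    return {(t, u): ("S" if t >= 17 else "H") for t in PLAYER_TOTALS for u in DEALER_UP}


def random_variant(parent: Strategy, n_changes: int, rng: random.Random) -> Strategy:
    """Copy the matrix and give n_changes distinct random cells a different action."""
    child = dict(parent)
    for cell in rng.sample(sorted(child), n_changes):
        child[cell] = rng.choice([a for a in ACTIONS if a != child[cell]])
    return child


class SingleDeck:
    """One 52-card deck (ranks 1-13), reshuffled between hands once 2/3 is dealt."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.reshuffle()

    def reshuffle(self) -> None:
        self.cards = [rank for rank in range(1, 14) for _ in range(4)]
        self.rng.shuffle(self.cards)
        self.dealt = 0

    def needs_reshuffle(self) -> bool:
        return self.dealt >= 52 * 2 / 3  # i.e. after 35 cards

    def draw(self) -> int:
        self.dealt += 1
        return self.cards.pop()
```

Checking `needs_reshuffle` only between hands keeps a hand from being interrupted mid-deal; at that point 17 cards remain, which is normally enough to finish the next hand.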