Evolving Board Game Players Without Using Expert Knowledge

Evolving Board Game Players Without Using Expert Knowledge • A presentation of research by Amit Benbassat. Advisor: Moshe Sipper. • Includes results from: A. Benbassat and M. Sipper, “Evolving Lose-Checkers Players using Genetic Programming,” IEEE Conference on Computational Intelligence and Games (CIG'10), 2010. • Also includes new, as yet unpublished results.

Synopsis • Tree based GP in a nutshell. • Applying tree based GP to Lose Checkers. • Expanding work to other games. • Available projects.

A Bit About Tree-Based GP • A method of solving problems by evolving solver programs. • The programs are represented in memory in tree form (i.e. the genomes are trees). • Initially promoted mostly through the efforts of John Koza.

Tree-Based GP • Turning expressions into a tree-shaped data structure: • (X + 1) − (√X) • IF (X ≤ 3) THEN ((X + Y) + 3) ELSE ((X * Y) * X) [Figure: the two expressions drawn as trees, one rooted at − and one rooted at IFT.]
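
The two expressions on this slide can be written down as trees and evaluated recursively. The following is a minimal sketch, not the presenter's system: the `Node` class, the `const` helper, the `evaluate` function, and the string operator names are all illustrative assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    op: str                                   # function symbol, variable name, or "const"
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0                        # only read when op == "const"


def const(v: float) -> Node:
    return Node("const", value=v)


def evaluate(node: Node, env: Dict[str, float]) -> float:
    """Recursively evaluate an expression tree under the variable bindings in env."""
    if node.op == "const":
        return node.value
    if node.op in env:                        # variable terminal such as X or Y
        return env[node.op]
    if node.op == "IFT":                      # evaluate only the branch that is taken
        cond, then_branch, else_branch = node.children
        return evaluate(then_branch if evaluate(cond, env) else else_branch, env)
    args = [evaluate(child, env) for child in node.children]
    if node.op == "+":
        return args[0] + args[1]
    if node.op == "-":
        return args[0] - args[1]
    if node.op == "*":
        return args[0] * args[1]
    if node.op == "<=":
        return float(args[0] <= args[1])
    if node.op == "SQRT":
        return math.sqrt(args[0])
    raise ValueError(f"unknown node: {node.op}")


X, Y = Node("X"), Node("Y")
# (X + 1) - sqrt(X)
expr1 = Node("-", [Node("+", [X, const(1)]), Node("SQRT", [X])])
# IF (X <= 3) THEN ((X + Y) + 3) ELSE ((X * Y) * X)
expr2 = Node("IFT", [
    Node("<=", [X, const(3)]),
    Node("+", [Node("+", [X, Y]), const(3)]),
    Node("*", [Node("*", [X, Y]), X]),
])

print(evaluate(expr1, {"X": 4.0}))            # 3.0
print(evaluate(expr2, {"X": 2.0, "Y": 5.0}))  # 10.0
print(evaluate(expr2, {"X": 4.0, "Y": 5.0}))  # 80.0
```

Because the genome is the tree itself, the genetic operators on the following slides reduce to copying, replacing, or swapping subtrees of such a structure.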

Generic Genetic Operators: Self-Replication [Figure: the tree for IF (X ≤ 3) THEN ((X + Y) + 3) ELSE ((X * Y) * X) shown twice, as parent and identical copy.]

Generic Genetic Operators: Rebuild Mutation [Figure: an expression tree before and after rebuild mutation.]

Generic Genetic Operators: Two-Way Crossover [Figure: two expression trees before and after two-way crossover.]

Synopsis • Previous results in games using GP and GAs. • Applying tree based GP to Lose Checkers. • Design. • Algorithm and operators. • Results. • Expanding work to other games. • Conclusions and future work.

Applying GP to Lose Checkers:From Genotype to Phenotype • Used strongly typed tree based GP. • Trees are seen as board-state evaluators. • The individual players are built around the evaluator, using it (integrated with alpha-beta search) to decide which move to take.

Terminal Nodes

Terminal Nodes (cont’d)

Function Nodes

Applying GP to Lose Checkers • Algorithm:
  Generate a random population of individuals of tree height 5 for generation 0.
  Repeat for each generation i:
    Evaluate fitness.
    Selection().
    Procreation(XOprob, mutProb).

Fitness Calculations • The system supports a sequence of guides. • Each guide has a number of rounds assigned to it. • Each guide has a number of games per round assigned to it. • The system also supports play between individuals in the population (referred to in the EA literature as coevolution) and a parameter coPlayNum for number of games. • Players get 1 fitness point for winning a game and 0.5 points for a draw.

Fitness Calculations (cont’d)
  for each guide i do
    for j ← 1 to guide i’s number of rounds do
      Have every individual in the population deemed fit enough play guide i’s round-size games against guide i.
  Have every individual in the population play coPlayNum games as black against coPlayNum random opponents in the population.
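
The scoring scheme can be sketched in a few lines. This is an illustration under several assumptions that the slides do not settle: the `play(black, white)` callback is hypothetical and returns 1.0, 0.5, or 0.0 from black's point of view, the "deemed fit enough" filter is omitted because its criterion is not given, co-play runs once after all guide rounds, co-play opponents are drawn with replacement, and only the individual playing black is credited.

```python
import random


def compute_fitness(population, guides, co_play_num, play, rng=random):
    """Return one fitness score per individual.

    guides: list of (guide_player, num_rounds, games_per_round) tuples.
    play(black, white): hypothetical callback, 1.0 if black wins, 0.5 for a draw, else 0.0.
    """
    fitness = [0.0] * len(population)

    # Guide play: every individual meets each guide in each of its rounds.
    for guide, num_rounds, games_per_round in guides:
        for _ in range(num_rounds):
            for idx, individual in enumerate(population):
                for _ in range(games_per_round):
                    fitness[idx] += play(individual, guide)

    # Co-play (coevolution): each individual plays as black against random peers.
    for idx, individual in enumerate(population):
        others = [p for j, p in enumerate(population) if j != idx]
        for _ in range(co_play_num):
            fitness[idx] += play(individual, rng.choice(others))

    return fitness


# Stub game in which black always wins, just to show the bookkeeping.
scores = compute_fitness(
    population=["p0", "p1", "p2"],
    guides=[("guide", 2, 10)],       # 2 rounds of 10 games
    co_play_num=20,
    play=lambda black, white: 1.0,
    rng=random.Random(0),
)
print(scores)                        # [40.0, 40.0, 40.0]
```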

Selection
  Repeat until the number of parents selected equals the original population size:
    Randomly choose two different individuals from the population: I1 and I2.
    if I1.Fitness > I2.Fitness then
      Select a copy of I1 for the parent population.
    else
      Select a copy of I2 for the parent population.
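
The selection step is a binary tournament, and a direct transcription might look like the sketch below. The `Individual` class and the `select_parents` name are assumptions made for illustration; the genome field stands in for the evaluator tree.

```python
import copy
import random
from dataclasses import dataclass


@dataclass
class Individual:
    genome: object      # stand-in for the evaluator tree
    fitness: float


def select_parents(population, rng=random):
    """Binary tournament: sample two distinct individuals, copy the fitter one."""
    parents = []
    while len(parents) < len(population):
        i1, i2 = rng.sample(population, 2)      # two different individuals
        winner = i1 if i1.fitness > i2.fitness else i2
        parents.append(copy.deepcopy(winner))   # copies, so later edits leave the originals intact
    return parents


population = [Individual(genome=i, fitness=float(i)) for i in range(10)]
parents = select_parents(population, random.Random(1))
print(sorted(p.fitness for p in parents))
```

With distinct fitness values, the weakest individual can never win a tournament, so it never enters the parent population, while stronger individuals may be copied several times.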

Genetic Operators: Local Mutation • Every tree node N returning a floating point value was assigned a number. • This number was initialized to 1.0 and acted as a factor for the return value. • Local mutation is a slight change in the node’s factor. [Figure: a + node over A and B returns f1*(A+B) before the mutation and f2*(A+B) after it.]
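
A minimal sketch of the per-node factor follows. The slide only says the change is "slight", so the Gaussian step and its `sigma` are assumptions, as are the `FloatNode` class and the function names.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class FloatNode:
    op: str                                    # "+" or "const" in this sketch
    children: List["FloatNode"] = field(default_factory=list)
    value: float = 0.0                         # only read when op == "const"
    factor: float = 1.0                        # per-node multiplier, starts at 1.0


def evaluate(node: FloatNode) -> float:
    if node.op == "const":
        raw = node.value
    elif node.op == "+":
        raw = sum(evaluate(child) for child in node.children)
    else:
        raise ValueError(f"unknown node: {node.op}")
    return node.factor * raw                   # the factor scales the node's return value


def local_mutation(node: FloatNode, rng=random, sigma: float = 0.1) -> None:
    """Nudge one node's factor; the Gaussian step and sigma are assumptions."""
    node.factor += rng.gauss(0.0, sigma)


a, b = FloatNode("const", value=2.0), FloatNode("const", value=3.0)
plus = FloatNode("+", [a, b])
print(evaluate(plus))                          # 5.0, since every factor is still 1.0
local_mutation(plus, random.Random(0))
print(evaluate(plus))                          # the new factor times 5.0
```

Unlike rebuild mutation, this operator leaves the tree's shape untouched and only retunes the weight of one node's contribution.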

Genetic Operators: One-Way Crossover [Figure: two expression trees before and after one-way crossover.]
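
The difference between the two crossover variants is easiest to see side by side. The sketch below assumes that one-way crossover copies a randomly chosen subtree of the donor over a randomly chosen subtree of the receiver, leaving the donor unchanged, while two-way crossover swaps the two subtrees. It ignores the type constraints of strongly typed GP and never replaces a root; the `Node` class and function names are illustrative.

```python
import copy
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def slots(root):
    """Return every (parent, index) pair, i.e. every non-root subtree position."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        for i, child in enumerate(node.children):
            found.append((node, i))
            stack.append(child)
    return found


def one_way_xo(donor, receiver, rng=random):
    """Copy a random donor subtree over a random receiver subtree; donor is unchanged."""
    d_parent, d_idx = rng.choice(slots(donor))
    r_parent, r_idx = rng.choice(slots(receiver))
    r_parent.children[r_idx] = copy.deepcopy(d_parent.children[d_idx])


def two_way_xo(a, b, rng=random):
    """Swap a random subtree of a with a random subtree of b."""
    a_parent, a_idx = rng.choice(slots(a))
    b_parent, b_idx = rng.choice(slots(b))
    a_parent.children[a_idx], b_parent.children[b_idx] = (
        b_parent.children[b_idx],
        a_parent.children[a_idx],
    )


donor = Node("-", [Node("X"), Node("1")])
receiver = Node("+", [Node("Y"), Node("3")])
one_way_xo(donor, receiver, random.Random(0))
print(donor)       # unchanged
print(receiver)    # one child replaced by a copy of X or 1
```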

Procreation(XOprob, mutProb)
  While there remain at least 2 unselected individuals:
    Find two unselected individuals I1, I2 at random.
    With probability XOprob:
      If I1.Fitness > I2.Fitness, use one-way XO to transfer genes from I1 to I2.
      Else use two-way XO between I1 and I2.
  For each individual I1 in the population:
    With probability mutProb, choose a node in I1’s tree at random and mutate it by either rebuild or local mutation.
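
The control flow of the procreation step can be sketched with the operators passed in as callbacks. This follows the pseudocode literally, including the asymmetric fitness test; the `procreate` signature, the `Ind` class, and the stub operators are assumptions for illustration only.

```python
import random


def procreate(population, xo_prob, mut_prob, one_way_xo, two_way_xo, mutate, rng=random):
    """Pair individuals at random for crossover, then mutate each with probability mut_prob."""
    unselected = list(population)
    rng.shuffle(unselected)                   # popping from a shuffled list gives random pairs
    while len(unselected) >= 2:
        i1, i2 = unselected.pop(), unselected.pop()
        if rng.random() < xo_prob:
            if i1.fitness > i2.fitness:
                one_way_xo(i1, i2)            # transfer genes from the fitter i1 into i2
            else:
                two_way_xo(i1, i2)
    for individual in population:
        if rng.random() < mut_prob:
            mutate(individual)                # rebuild or local mutation, chosen by the caller


class Ind:
    def __init__(self, fitness):
        self.fitness = fitness


calls = []
pop = [Ind(f) for f in (3.0, 1.0, 2.0, 0.5)]
procreate(
    pop, xo_prob=1.0, mut_prob=1.0,
    one_way_xo=lambda a, b: calls.append("one-way"),
    two_way_xo=lambda a, b: calls.append("two-way"),
    mutate=lambda ind: calls.append("mutate"),
    rng=random.Random(0),
)
print(calls)
```

With both probabilities set to 1.0, four individuals yield two crossover events followed by four mutations; which crossover variant fires depends on the random pairing.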

Opponents • There is no known simple evaluation function for Lose Checkers. • All hand-crafted players used the random function to evaluate non-trivial board-states. • Two types of opponents were written in code: • The random player. • An α-β player of depth d with a random evaluation function.
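
An α-β player with a random evaluation function still plays sensibly near the end of a game, because decided positions found within the search horizon are scored exactly and only the remaining leaves get random values. The sketch below shows the idea on a toy stand-in game, not Lose Checkers: a pile of stones from which each player removes 1 to 3, and whoever takes the last stone wins. The game, the `WIN` constant, and all function names are assumptions.

```python
import random

WIN = 1000.0  # magnitude for decided games; dominates any random score


def moves(pile):
    """Legal moves in the toy game: take 1, 2, or 3 stones."""
    return range(1, min(3, pile) + 1)


def make_random_eval(rng):
    def evaluate(pile):
        return rng.uniform(-1.0, 1.0)         # random score for undecided positions
    return evaluate


def alphabeta(pile, depth, alpha, beta, evaluate):
    """Negamax alpha-beta: value of the position for the player to move."""
    if pile == 0:
        return -WIN                           # the opponent took the last stone
    if depth == 0:
        return evaluate(pile)
    best = -float("inf")
    for m in moves(pile):
        val = -alphabeta(pile - m, depth - 1, -beta, -alpha, evaluate)
        best = max(best, val)
        alpha = max(alpha, val)
        if alpha >= beta:
            break                             # cutoff: the opponent will avoid this line
    return best


def choose_move(pile, depth, evaluate):
    """Pick the move whose resulting position is worst for the opponent."""
    return max(
        moves(pile),
        key=lambda m: -alphabeta(pile - m, depth - 1, -float("inf"), float("inf"), evaluate),
    )


evaluate = make_random_eval(random.Random(0))
print(choose_move(5, 4, evaluate))            # 1: leaves a pile of 4, a lost position for the opponent
```

At depth 4 the search from a pile of 5 reaches every decisive position, so the random scores never influence the choice; at shallower depths or in larger games they would break ties among undecided lines.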

Quality of α-β Players • To ensure that α-β players using a random evaluation function are indeed proficient players, their performance was tested. • Each test tournament consists of 10000 games.

Results with Search Against α-β Players • Using lookahead 3, playing 1000 games against αβ3.

Results with Search Against α-β Players (cont’d) • Using lookahead 3, playing against various opponents.

Results with Search Against α-β Players: Parameters • Run parameters: • Population 150, 120 generations. • No guide play, 50 co-play games as black, search depth 3. • Maximum tree depth: • 12 in runs 44A-49A. • 14 in runs 56A-61A. • XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

Evolving Players using Deeper Search • Results with players using lookahead 4.

Results with Search Against α-β Players: Parameters • Run parameters: • Population 50, 70 generations. • Guide play: • 20 games (in 2 rounds of 10) against αβ5. • 20 co-play games as black. • Search depth 4. • Maximum tree depth of 10. • XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

The Role of Mobility • Initial runs with search produced tepid results. • The introduction of the mobility terminal greatly improved those results. • Mobility is a general principle that applies to many board games and is often associated with a high level of play.

Synopsis • Tree based GP in a nutshell. • Applying tree based GP to Lose Checkers. • Expanding work to other games. • New results in Lose Checkers. • 10X10 Checkers. • Reversi. • Dodgem. • Conclusions and future work.

New Results in Lose Checkers • Results with players using lookahead 4.

New Results in Lose Checkers (cont’d) • Run parameters: • Population: 120-150 • Generations: 90-100. • Guide play: • 10 games against αβ2 in two of the runs. • 20-40 co-play games as black. • Search depth 4. • Maximum tree depth of 14. • XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

10x10 Checkers • 10x10 Board. • Objective: To eliminate all opponent pieces or render all opponent pieces immobile. • Rules: As in 8x8 version.

Quality of α-β Players • Evolved players were tested against α-β players that chose a material evaluation function at random for each turn. • To ensure that α-β players are indeed proficient players, their performance was tested. • Each test tournament consists of 10000 games.

10x10 Checkers Results

10x10 Checkers Results (cont’d) • Run parameters: • Population: 100-150 • Generations: 100 • No guide play. 25-50 co-play games as black. • Search depth 4. • Maximum tree depth 13-14. • XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

8x8 Reversi • Popular board game, also known as Othello. • 8x8 board. • Each piece has a black side and a white side. • Each player places a piece on her turn, flipping trapped opponent pieces. • Objective: Maximize the number of friendly pieces on the board.

Reversi Specific Terminals

Quality of α-β Players • Evolved players were tested against α-β players that chose a material evaluation function at random for each turn. • To ensure that α-β players are indeed proficient players, their performance was tested. • Each test tournament consists of 10000 games.

Reversi Results

Reversi Results (cont’d) • Run parameters: • Population: 120 • Generations: 100 • No guide play. 25-40 co-play games as black. • Search depth 4. • Maximum tree depth of 14. • XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

Dodgem

Synopsis • Tree based GP in a nutshell. • Applying tree based GP to Lose Checkers. • Expanding work to other games. • Available projects.

Your mission (should you decide to accept it) • Choose a game. • Write game program in C and interface with Java system. • Write game specific terminal nodes and adjustments if necessary. • Run it, document results, produce report.

Games

My Current Areas of Interest. • Games with high branching factor. • Games with random element. • Multiplayer games. • Games with partial information.

Another project. I want to check my selective crossover operator. • Adapt system to a toy problem. • Execute runs with selective XO and with typical XO using several parameter sets. • Compare and analyze results. • Write report.
