Machine Learning in Computer Game Players
Machine Learning in Computer Game Players Chikayama & Taura Lab. M1 Ayato Miki
Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion
1. Introduction • Improvements in Computer Game Players • DEEP BLUE defeated Kasparov in 1997 • GEKISASHI and TANASE SHOGI at the WCSC 2008 • Strong computer game players are usually developed by strong human players • They input heuristics manually • They devote a lot of time and energy to tuning
Machine Learning for Games • Machine learning enables automatic tuning using a large amount of data • The developer no longer needs to be an expert in the game
Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion
2. Computer Game Players • Games • Game Trees • Game Tree Search • Evaluation Function
Games • Turn-based games • ex. tic-tac-toe, chess, shogi, poker, mahjong… • Additional classification • two-player or otherwise • zero-sum or otherwise • deterministic or non-deterministic • perfect or imperfect information • Game Tree Model
Game Trees • [Diagram: a game tree whose levels alternate between the player's turn and the opponent's turn, with one branch per move (move 1, move 2, …)]
Game Tree Search • ex. Minimax search algorithm • [Diagram: a minimax tree; the leaf values 3 1 5 4 8 2 3 0 1 6 4 2 back up through alternating Max and Min levels to a root value of 5]
Game Tree Search • Difficult to search all the way to the leaf nodes • About 10^220 possible positions in shogi • Stop the search at a practical depth • And "evaluate" the nodes there • Using an Evaluation Function
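The search on these two slides can be sketched in a few lines. In the sketch below, nested lists stand in for real positions: a leaf number plays the role of an evaluation-function value at the depth cutoff, and an inner list is a position with one entry per legal move (all illustrative assumptions, not part of any real engine).

```python
def minimax(node, maximizing=True):
    """Depth-limited minimax over a game tree.

    Internal nodes are lists of child positions; leaves are numbers,
    standing in for evaluation-function values at the depth cutoff.
    """
    if not isinstance(node, list):
        return node  # frontier node: value from the evaluation function
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# The tree from the slide: Max at the root, Min below it, Max below that,
# with leaf values 3 1 5 4 8 2 3 0 1 6 4 2.
tree = [[[3, 1, 5], [4, 8, 2]], [[3, 0, 1], [6, 4, 2]]]
print(minimax(tree))  # → 5, matching the root value on the slide
```

In a real player the cutoff is an explicit depth counter, and alpha-beta pruning skips branches that cannot change the result.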
Evaluation Function • Estimates the superiority of a position • Elements • the feature vector of the position • the parameter vector • ex. a linear form V(s) = θ · φ(s), where φ(s) is the feature vector of position s and θ is the parameter vector
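A common concrete form is a linear evaluation V(s) = θ · φ(s), a weighted sum of position features. A minimal sketch, with two toy features (say, material balance and mobility) invented purely for illustration:

```python
def evaluate(phi, theta):
    """Linear evaluation function: V(s) = sum_i theta_i * phi_i(s)."""
    return sum(t * f for t, f in zip(theta, phi))

phi = [2.0, 5.0]    # φ(s): e.g. material balance, mobility (toy features)
theta = [1.0, 0.3]  # θ: the parameters that learning will tune
print(evaluate(phi, theta))  # → 3.5
```

Tuning the evaluation function, in the sections that follow, means adjusting θ while φ stays fixed.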
Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion
3. Machine Learning in Computer Game Players • Initial work • Samuel's research [1959] • Learning objective • What do computer game players learn?
Samuel’s Checker Player [1959] • Many useful techniques • Rote learning • Quiescence search • 3-layer neural network evaluation function • And some machine learning techniques • Learning through self-play • Temporal-difference learning • Comparison training
Learning Objective • Opening Book • Search Control • Evaluation Function
Learning Evaluation Functions • Automatic construction of evaluation function • Construct and select a feature vector automatically • ex. GLEM [Buro, 1998] • Difficult • Tuning evaluation function parameters • Make a feature vector manually and tune its parameters automatically • Easy and effective
Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion
4. Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithm
Supervised Learning • Provide the program with example positions and their exact evaluation values • Adjust the parameters so as to minimize the error between the evaluation function outputs and the exact values • [Diagram: example positions labeled 20, 50, 50, 40]
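A minimal sketch of this scheme (not the method of any particular program): stochastic gradient descent on the squared error of a linear evaluation function, with made-up feature vectors and labels:

```python
# Training pairs (φ(s), exact value); both are invented for illustration.
positions = [([1.0, 0.0], 20.0),
             ([0.0, 1.0], 50.0),
             ([1.0, 1.0], 70.0)]
theta = [0.0, 0.0]  # parameters of the linear evaluation V(s) = θ·φ(s)
lr = 0.1            # learning rate

for _ in range(500):
    for phi, target in positions:
        v = sum(t * f for t, f in zip(theta, phi))   # current output
        err = v - target                             # signed error
        # Gradient step on the squared error (v - target)^2:
        theta = [t - lr * err * f for t, f in zip(theta, phi)]

print([round(t) for t in theta])  # → [20, 50], fitting all three labels
```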
Difficulty of Hard Supervised Training • Labeling positions manually is costly • Exact quantitative evaluation is difficult • → Consider a softer approach
Comparison Training • A soft form of supervised training • Requires only the relative order of the possible moves (move A > move B) • Easier and more intuitive
Bonanza [Hoki, 2006] • Comparison training using the records of expert games • Uses a simple relative order: the expert move > all other moves
Bonanza Method • Based on optimal control theory • Minimizes the cost function J(θ) = Σ_{p=1}^{P} E(s_p, θ), where the s_p are the example positions in the records, P is the total number of example positions, and E is the error function
Bonanza Method • Error function: E(s, θ) = Σ_{m=1}^{M} T( v(s_m, θ) − v(s_d, θ) ), where s_m is the child position reached by move m, M is the total number of possible moves, d is the move played in the record, v is the minimax search value, and T is the order discriminant function
Order Discriminant Function • The sigmoid function T(x) = 1 / (1 + exp(−kx)) • k is a parameter that controls the gradient • As k → ∞, T(x) becomes the step function • In that case, the error function counts "the number of moves that were considered to be better than the move in the record"
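A toy version of this training loop, with a one-ply linear evaluation standing in for the minimax search value v, and made-up child positions. The feature vectors, k, and the learning rate are all illustrative:

```python
import math

def T(x, k=5.0):
    """Order discriminant function: a sigmoid with gradient parameter k."""
    return 1.0 / (1.0 + math.exp(-k * x))

def v(phi, theta):
    """Linear evaluation, standing in for the minimax search value."""
    return sum(t * f for t, f in zip(theta, phi))

# Child positions of one example position; children[0] is the expert move.
children = [[1.0, 0.5], [0.8, 0.9], [0.2, 0.1]]
theta = [0.1, 0.1]

def error(theta):
    """Soft count of moves evaluated above the expert move."""
    expert = v(children[0], theta)
    return sum(T(v(c, theta) - expert) for c in children[1:])

def step(theta, lr=0.5, eps=1e-5):
    """One numerical-gradient descent step on the error."""
    grad = []
    for i in range(len(theta)):
        bumped = theta[:]
        bumped[i] += eps
        grad.append((error(bumped) - error(theta)) / eps)
    return [t - lr * g for t, g in zip(theta, grad)]

before = error(theta)
for _ in range(100):
    theta = step(theta)
print(error(theta) < before)  # → True: the expert move is ranked higher
```

Bonanza computes the gradient analytically over millions of positions; the numerical gradient here only keeps the sketch short.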
Bonanza • 30,000 professional game records and 30,000 high-rating game records from SHOGI CLUB 24 were used • The weight parameters of about 10,000 feature elements were tuned • Bonanza then won the World Computer Shogi Championship in 2006
Problems of Supervised Learning • It is costly to accumulate a training data set • Labeling positions by hand takes a lot of time • Using expert records has been successful • But what if there are not enough expert records? • New games • Minor games • Other approaches need no training set • ex. Reinforcement Learning (next)
4. Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithm
Reinforcement Learning • The learner gets a "reward" from the environment • In the domain of games, the reward is the final outcome (win/lose) • Reinforcement learning requires only this objective information about the game
Reinforcement Learning • [Diagram: reward values (+10, +20, −10, …) assigned to positions throughout the game tree] • Inefficient in games…
Temporal-Difference Learning • [Diagram: each position's value is updated step by step toward the value of the following position]
TD-Gammon [Tesauro, 1992] • Trained through self-play
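A minimal TD(0) sketch of the update pictured above: each position's predicted value is nudged toward the next position's prediction, so the final outcome propagates backward through the game. The five positions, learning rate, and single replayed game are illustrative, and TD-Gammon itself used a neural network rather than a table:

```python
values = {s: 0.0 for s in range(5)}  # value estimate for each position
alpha = 0.5                          # learning rate

def td_update(game, outcome):
    """game: the sequence of positions played; outcome: final reward."""
    for s, s_next in zip(game, game[1:]):
        # Move this position's value toward the next position's value.
        values[s] += alpha * (values[s_next] - values[s])
    # The last position is moved toward the actual outcome (win = 1.0).
    values[game[-1]] += alpha * (outcome - values[game[-1]])

for _ in range(50):                  # replay the same winning game
    td_update([0, 1, 2, 3, 4], outcome=1.0)
print(round(values[0], 2))  # → 1.0: the win has propagated to the start
```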
Problems of Reinforcement Learning • Falling into a local optimum • Lack of playing variation • Solutions • Add intentional randomness • Play against various players (computer/human) • Credit Assignment Problem (CAP) • Not clear which action was effective
4. Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithm
Evolutionary Algorithm • [Loop diagram] Initialize the population randomly → Vary individuals → Evaluate "fitness" → Apply selection → repeat
Research of Fogel et al. [2004] • An evolutionary algorithm for a chess player • Uses an open-source chess program • Attempts to tune its parameters
Initialization • Make 10 initial parents • Initialize their parameters with random values
Variation • Create 10 offspring from each surviving parent by mutating the parental parameters • Each parameter is perturbed by a Gaussian random variable scaled by a self-adaptive strategy parameter
Evaluate Fitness and Selection • Each player plays ten games against opponents selected at random from the population • The ten best players become the parents of the next generation
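Putting the three steps together, a hedged sketch of the loop: a toy fitness (negative squared distance to a hidden "ideal" parameter vector) replaces the ten games against random opponents, and a fixed mutation width replaces the self-adaptive strategy parameters of Fogel et al.:

```python
import random

random.seed(0)
TARGET = [1.0, -2.0, 0.5]  # hidden "ideal" parameters (toy stand-in)

def fitness(ind):
    """Toy fitness in place of game results: closer to TARGET is better."""
    return -sum((x - t) ** 2 for x, t in zip(ind, TARGET))

def mutate(parent, sigma=0.3):
    """Offspring = parent plus Gaussian noise on every parameter."""
    return [x + random.gauss(0.0, sigma) for x in parent]

# Initialization: 10 parents with random parameters.
parents = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(10)]

for generation in range(50):
    offspring = [mutate(p) for p in parents for _ in range(10)]
    population = parents + offspring
    population.sort(key=fitness, reverse=True)
    parents = population[:10]        # the ten best become parents
```

After 50 generations the best individual sits close to TARGET; with game-based fitness the same loop drives the parameters toward stronger play.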
Tuned Parameters • Material value • Positional value • Weights and biases of three neural networks
Three Neural Networks • Each network has 3 layers • Input = the arrangement of a specific area (front 2 rows, back 2 rows, or center 4×4 square): 16 inputs • Hidden = 10 units • Output = the worth of that area's arrangement: 1 output
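A sketch of one such area network (16 inputs, 10 hidden units, 1 output). The tanh activation and the random initial weights are assumptions made for the sake of a runnable example; in Fogel et al. the weights and biases are evolved rather than hand-set:

```python
import math
import random

random.seed(1)
# 16 -> 10 -> 1 network for one board area (e.g. the center 4x4 square).
W1 = [[random.uniform(-0.1, 0.1) for _ in range(16)] for _ in range(10)]
b1 = [0.0] * 10
W2 = [random.uniform(-0.1, 0.1) for _ in range(10)]
b2 = 0.0

def area_value(squares):
    """squares: 16 numbers encoding the pieces in the area."""
    hidden = [math.tanh(sum(w * s for w, s in zip(row, squares)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2
```

The three area outputs feed into the overall evaluation alongside the material and positional terms.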
Result • 10 independent trials (each of 50 generations) • Initial rating = 2066 (Expert), the rating of the open-source player • Best rating = 2437 (Senior Master) • But the program cannot yet compete with the strongest chess programs (R2800~)
Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion
Future Work • Automatic position labeling • Using records or computer play • Sophisticated reward • Consider opponent’s strength • Move analysis for credit assignment • Experiment in other games