


## Machine Learning in Computer Game Players


**Machine Learning in Computer Game Players**
Chikayama & Taura Lab. M1 Ayato Miki

**Outline**
• Introduction
• Computer Game Players
• Machine Learning in Computer Game Players
• Tuning Evaluation Functions
• Supervised Learning
• Reinforcement Learning
• Evolutionary Algorithms
• Conclusion

**1. Introduction**
• Improvements in computer game players
• DEEP BLUE defeated Kasparov in 1997
• GEKISASHI and TANASE SHOGI at the WCSC 2008
• Strong computer game players are usually developed by strong human players, who input heuristics manually and devote a lot of time and energy to tuning

**Machine Learning for Games**
• Machine learning enables automatic tuning using a large amount of data
• The developer does not need to be an expert at the game

**2. Computer Game Players**
• Games
• Game trees
• Game tree search
• Evaluation functions

**Games**
• Turn-based games, ex. tic-tac-toe, chess, shogi, poker, mah-jong, …
• Additional classifications: two-player or otherwise; zero-sum or otherwise; deterministic or non-deterministic; perfect or imperfect information
• Modeled with game trees

**Game Trees**
(figure: a game tree alternating between the player's turn and the opponent's turn, branching on moves such as move 1 and move 2)

**Game Tree Search**
• ex. the minimax search algorithm
(figure: a minimax tree in which each Max node takes the maximum of its children's values and each Min node the minimum; the root Max node gets value 5, chosen over 3)

**Game Tree Search**
• It is difficult to search all the way to the leaf nodes
• There are about 10^220 possible positions in shogi
• So the search is stopped at a practicable depth, and the resulting nodes are "evaluated" using an evaluation function

**Evaluation Function**
• Estimates the superiority of a position
• Elements: a feature vector of the position s and a parameter vector

**3. Machine Learning in Computer Game Players**
• Initial work: Samuel's research [1959]
• Learning objective: what do computer game players learn?

**Samuel's Checker Player [1959]**
• Introduced many useful techniques: rote learning, quiescence search, and a 3-layer neural network evaluation function
• Also some machine learning techniques: learning through self-play, temporal-difference learning, and comparison training

**Learning Objective**
• Opening book
• Search control
• Evaluation function

**Learning Evaluation Functions**
• Automatic construction of the evaluation function: construct and select a feature vector automatically (ex. GLEM [Buro, 1998]); this is difficult
• Tuning evaluation function parameters: make the feature vector manually and tune its parameters automatically; this is easy and effective
**4. Tuning Evaluation Functions**
• Supervised Learning
• Reinforcement Learning
• Evolutionary Algorithms

**Supervised Learning**
• Provide the program with example positions and their exact evaluation values
• Adjust the parameters so as to minimize the error between the evaluation function outputs and the exact values

**Difficulty of Hard Supervised Training**
• Positions must be labeled manually
• Labels must be quantitative evaluations
• Consider a softer approach instead

**Comparison Training**
• A soft form of supervised training
• Requires only a relative order over the possible moves
• Easier and more intuitive

**Bonanza [Hoki, 2006]**
• Comparison training using records of expert games
• Uses a simple relative order: the expert move > all other moves

**Bonanza Method**
• Based on optimal control theory
• Minimize the cost function J(P, w) = Σ_{p ∈ P} E(p, w), where P is the set of example positions in the records, N = |P| is the total number of example positions, and E is the error function

**Bonanza Method**
• Error function: E(p, w) = Σ_{m ≠ m'} T(v(p_m, w) − v(p_{m'}, w)), where p_m is the child position reached by move m out of the M possible moves, m' is the move played in the record, v is the minimax search value, and T is the order discriminant function

**Order Discriminant Function**
• The sigmoid function T(x) = 1 / (1 + e^(−kx))
• k is the parameter that controls the gradient
• As k → ∞, T(x) becomes the step function
• In this case the error function means "the number of moves that were considered to be better than the move in the record"

**Bonanza**
• 30,000 professional game records and 30,000 high-rating game records from SHOGI CLUB 24 were used
• The weight parameters of about 10,000 feature elements were tuned
• Bonanza won the World Computer Shogi Championship 2006

**Problems of Supervised Learning**
• It is costly to accumulate a training data set; manual labeling takes a lot of time
• Using expert records has been successful, but what if there are not enough expert records? (new games, minor games)
• Other approaches need no training set, ex. reinforcement learning (next)
**4. Tuning Evaluation Functions**
• Supervised Learning
• Reinforcement Learning
• Evolutionary Algorithms

**Reinforcement Learning**
• The learner gets "a reward" from the environment
• In the domain of games, the reward is the final outcome (win/lose)
• Reinforcement learning requires only objective information about the game

**Reinforcement Learning**
(figure: a game tree with rewards propagated back from the final outcomes; learning only from final outcomes is inefficient in games)

**Temporal-Difference Learning**
(figure: position values updated toward the estimate at the next position instead of waiting for the final outcome)

**TD-Gammon [Tesauro, 1992]**
• A backgammon player trained through self-play

**Problems of Reinforcement Learning**
• Falling into a local optimum, caused by a lack of playing variation
• Solutions: add intentional randomness; play against various opponents (computer and human)
• The credit assignment problem (CAP): it is not clear which action was effective

**4. Tuning Evaluation Functions**
• Supervised Learning
• Reinforcement Learning
• Evolutionary Algorithms

**Evolutionary Algorithm**
• Cycle: initialize the population randomly → vary individuals → evaluate "fitness" → apply selection
**Research of Fogel et al. [2004]**
• An evolutionary algorithm for a chess player
• Uses an open-source chess program and attempts to tune its parameters

**Initialization**
• Make 10 initial parents
• Initialize their parameters with random values

**Variation**
• Create 10 offspring from each surviving parent by mutating the parental parameters
• The mutation adds a Gaussian random variable scaled by a strategy parameter

**Evaluate Fitness and Selection**
• Each player plays ten games against randomly selected opponents
• The ten best players become the parents of the next generation

**Tuned Parameters**
• Material values
• Positional values
• The weights and biases of three neural networks

**Three Neural Networks**
• Each network has 3 layers: 16 input units, 10 hidden units, and 1 output unit
• Input: the arrangement of a specific area of the board (the front 2 rows, the back 2 rows, or the center 4×4 square)
• Output: the worth of that area's arrangement

**Result**
• 10 independent trials, each of 50 generations
• Initial rating = 2066 (Expert), the rating of the open-source player
• Best rating = 2437 (Senior Master)
• But the program cannot yet compete with the strongest chess programs (rating about 2800)

**Future Work**
• Automatic position labeling, using records or computer play
• More sophisticated rewards that consider the opponent's strength
• Move analysis for credit assignment
• Experiments in other games
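The minimax search from the "Game Tree Search" slides can be sketched in a few lines of Python. This is an illustrative sketch, not code from the presentation; the nested-list tree reconstructs the example values shown on the minimax slide.

```python
def minimax(node, maximizing):
    """Minimax over a game tree given as nested lists; integer leaves
    are evaluation values.  Max nodes take the maximum of their
    children's values, Min nodes the minimum."""
    if isinstance(node, int):  # search stopped here: evaluate the node
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# The slide's tree: Max root = 5, Min level = (5, 3),
# Max level = (5, 8, 3, 6), leaves = 3 1 5 / 4 8 2 / 3 0 1 / 6 4 2.
tree = [[[3, 1, 5], [4, 8, 2]], [[3, 0, 1], [6, 4, 2]]]
root_value = minimax(tree, True)  # → 5
```

In a real player the leaves are not stored values but cut-off positions scored by the evaluation function.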
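The "Evaluation Function" slide pairs a feature vector of the position with a parameter vector. A linear combination of the two is the simplest common choice; that form, and the example features and weights below, are assumptions for illustration, since the slides do not fix the functional form.

```python
def evaluate(features, weights):
    """Score a position as the dot product of its feature vector
    and the tuned parameter vector (assumed linear form)."""
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical two-feature position: material balance = 2.0,
# mobility = 5.0, with weights 1.0 and 0.1.
score = evaluate([2.0, 5.0], [1.0, 0.1])  # → 2.5
```

Tuning the evaluation function then means adjusting `weights` while the hand-designed features stay fixed, which is the "easy and effective" option on the "Learning Evaluation Functions" slide.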
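The Bonanza-style comparison loss with its sigmoid order discriminant can be sketched as follows. This is a minimal illustration with made-up search values; `k` plays the same gradient-controlling role as on the "Order Discriminant Function" slide.

```python
import math

def T(x, k):
    """Order discriminant: a sigmoid whose steepness grows with k.
    As k approaches infinity it approaches the step function."""
    return 1.0 / (1.0 + math.exp(-k * x))

def comparison_error(expert_value, other_values, k):
    """Error for one example position: sum of T(v_m - v_expert) over
    the non-expert moves.  With a near-step T this counts how many
    moves the search valued above the recorded expert move."""
    return sum(T(v - expert_value, k) for v in other_values)

# One move (0.9) is valued above the expert move (0.5), two below.
err = comparison_error(0.5, [0.9, 0.1, 0.2], k=100.0)  # ≈ 1.0
```

Training sums this error over all example positions and adjusts the evaluation parameters to reduce it, so moves the experts chose come to be ranked first.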
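Temporal-difference learning, as sketched on the slides, moves each position's value toward the value of the position that follows it instead of waiting for the final outcome. A minimal TD(0)-style sketch; the state names, value table, and learning rate are hypothetical.

```python
def td_update(values, trajectory, alpha=0.1):
    """One TD(0) pass over a finished game: for each consecutive pair
    of positions, nudge the earlier position's value toward the later
    one's.  The last entry's value holds the final outcome."""
    for s, s_next in zip(trajectory, trajectory[1:]):
        values[s] += alpha * (values[s_next] - values[s])
    return values

# Toy game: two positions followed by a win (outcome value 1.0).
v = td_update({"opening": 0.0, "midgame": 0.0, "win": 1.0},
              ["opening", "midgame", "win"])
```

Repeating this over many self-play games propagates the outcome signal backward through earlier positions, which is how TD-Gammon trained its evaluation.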
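The evolutionary cycle used by Fogel et al. can be sketched as below. This is a toy: `fitness` here is a plain function, whereas the actual work evaluated fitness by playing ten games per player, and the self-adaptive strategy parameter is simplified to a fixed mutation scale `sigma`.

```python
import random

def evolve(fitness, dim, n_parents=10, generations=50, sigma=0.5, seed=0):
    """Initialize parents randomly, create offspring by Gaussian
    mutation of the parental parameters, evaluate fitness, and keep
    the best individuals as the next generation's parents."""
    rng = random.Random(seed)
    parents = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
               for _ in range(n_parents)]
    for _ in range(generations):
        offspring = [[x + sigma * rng.gauss(0.0, 1.0) for x in p]
                     for p in parents]
        parents = sorted(parents + offspring, key=fitness,
                         reverse=True)[:n_parents]
    return parents[0]

# Toy fitness: maximize the negated sum of squares (optimum at 0).
best = evolve(lambda xs: -sum(x * x for x in xs), dim=3)
```

Because parents compete with their offspring for survival, the best fitness found never decreases from one generation to the next, which mirrors keeping the ten best players as the next generation's parents.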
