Learning in Computer Go David Silver
The Problem • Large state space • Approximately 10^172 states • Game tree of about 10^360 nodes • Branching factor of about 200 • Evaluating a position is hard • No good heuristics known • Volatile • Highly non-linear
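As a rough sanity check on these figures (an illustrative calculation, not part of the slides): if the game tree is treated as uniform, with branching factor b and depth d, it has about b^d nodes. Under that simplifying assumption, the quoted tree size and branching factor imply a game length of roughly 156 plies.

```python
import math

BRANCHING_FACTOR = 200   # from the slide
LOG10_TREE_SIZE = 360    # game tree of about 10^360 nodes, from the slide

# Assumption: a uniform tree, so nodes ~ b^d and d = log10(nodes) / log10(b).
implied_depth = LOG10_TREE_SIZE / math.log10(BRANCHING_FACTOR)
print(round(implied_depth))
```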
Four ways to evaluate a position • Don’t even try • Hand-crafted heuristic • Monte Carlo simulation • Learned heuristic
Four choices about learning • What to learn • How to learn • State representation • Knowledge representation
What to learn • Global evaluation function • Shape • Life and death • Connectivity • Eyes
Global evaluation function • Several related concepts • Evaluation function • Heuristic • Value function • What to evaluate • Probability of winning • Expected score • How to evaluate • Sum of point territory estimates • Other approaches?
Shape • Local pattern information • Move recommendations • Learning shape from expert games • Stoutamire, Enderton, Van der Werf, Dahl • Learning shape by RL • NeuroGo v3
Life and Death • Two problems: • Will a group live or die? • Can a group live or die? • Solving the ‘can’ question • Alpha-beta search with learned heuristic [Wolf] • Solving the ‘will’ question • Supervised learning using rich feature set [Werf] • Reinforcement learning, averaged over group [Dahl]
Connectivity • Correlation between two points • Estimate potential groups of stones • Estimate potential regions of empty points • ‘Will connect’ (NeuroGo v3) • Reinforcement learning of local connectivity. • Pathfinding module for global connectivity. • Connectivity map used for learning global evaluation function
What else can we learn? • Eyes • Heuristics for endgame • Many other features…
How to learn • Reinforcement Learning • Supervised Learning • Combined Approaches • Evolutionary Methods
Reinforcement Learning • Temporal Difference Learning • Schraudolph, Dayan, Sejnowski • Enzenberger (NeuroGo) • Dahl (Honte) • Variants of TD(λ) • TD(0) • TD(λ) • TD-leaf(λ) • Training methodology • Self-play • Expert games (Q-learning)
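To make the temporal difference idea concrete, here is a minimal sketch of one TD(λ) pass over a finished game, assuming a linear value function and a reward given only at the end. The function name, the feature encoding and the default parameters are hypothetical choices for illustration, not taken from any of the programs named on the slide.

```python
def td_lambda_episode(weights, features, outcome, alpha=0.1, lam=0.7, gamma=1.0):
    """Run one TD(lambda) pass over an episode and return the updated weights.

    features: one feature vector per visited position (hypothetical encoding).
    outcome:  final reward, e.g. 1.0 for a win and 0.0 for a loss.
    """
    w = list(weights)
    trace = [0.0] * len(w)  # eligibility trace, one entry per weight
    for t, x in enumerate(features):
        value = sum(wi * xi for wi, xi in zip(w, x))
        if t + 1 < len(features):
            # Bootstrap from the value of the next position (no reward yet).
            x_next = features[t + 1]
            target = gamma * sum(wi * xi for wi, xi in zip(w, x_next))
        else:
            # Last position: the target is the game outcome itself.
            target = outcome
        delta = target - value
        # Decay old credit, then add the gradient of the linear value (x itself).
        trace = [gamma * lam * e + xi for e, xi in zip(trace, x)]
        w = [wi + alpha * delta * e for wi, e in zip(w, trace)]
    return w


# Two positions with one active feature each, ending in a win.
print(td_lambda_episode([0.0, 0.0], [[1, 0], [0, 1]], 1.0, alpha=0.5, lam=1.0))
print(td_lambda_episode([0.0, 0.0], [[1, 0], [0, 1]], 1.0, alpha=0.5, lam=0.0))
```

With `lam=0.0` the trace holds only the current position's features, which is TD(0); with `lam=1.0` the final outcome is credited to every earlier position. TD-leaf(λ), which applies the update to the leaf of a search tree, is not shown.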
Supervised Learning • Learn to mimic expert play • Expert move as +ve training example • Random move as -ve training example • Need a ranking metric and error function • e.g. Stoutamire, Enderton, Van der Werf, Dahl • Learn from labelled final game positions • e.g. final score, life and death • Data is either noisy or sparse
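The slide does not say which ranking metric or error function the cited authors used. As one possible illustration, the sketch below uses a pairwise hinge loss with a linear scorer: the expert move should outscore a randomly chosen move by some margin. All names, the margin and the learning rate are assumptions made for this example.

```python
def score(weights, move_features):
    """Linear score of a candidate move (hypothetical feature encoding)."""
    return sum(w * x for w, x in zip(weights, move_features))


def pairwise_update(weights, expert_features, random_features, margin=1.0, lr=0.1):
    """One hinge-style ranking step.

    The expert move is the positive example and the random move the negative
    one. If the expert move does not beat the random move by `margin`, move
    the weights towards the expert features and away from the random ones.
    """
    gap = score(weights, expert_features) - score(weights, random_features)
    loss = max(0.0, margin - gap)
    if loss > 0.0:
        weights = [w + lr * (e - r)
                   for w, e, r in zip(weights, expert_features, random_features)]
    return weights, loss


expert = [1.0, 0.0]   # features of the move the expert played
rand = [0.0, 1.0]     # features of a random legal move
w, loss = pairwise_update([0.0, 0.0], expert, rand)
print(w, loss)
print(score(w, expert) > score(w, rand))
```

A pairwise loss like this only needs the expert move to rank above the alternatives, which suits data where the expert's choice is known but the true value of each move is not.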
Combined approaches • Can combine elements of both reinforcement and supervised learning. • e.g. Dahl’s Honte • Search • Local searches for eyes, connections, life and death • Global search using learned territory evaluation • Supervised learning • Local move prediction (shape) • Reinforcement learning • Life and death • Territory
Evolutionary Methods • Evolve a neural network to evaluate game positions • Donnelly, Lubberts, Richards, Rutquist • Evolve rules to match positions [Kojima] • ‘Feed’ rules according to matches • Split successful rules • Weight rules according to success in predicting response • Different kinds of rule • Flexible (production rules) • Fixed (within radius from move) • Semi-fixed (within radius of move, empty points only)
State Representation • Invariances • Graph representations • Feature selection • Dimensionality reduction
Invariances • Go board has many symmetries • Rotational • Reflectional • Colour inversion • Invariant under translation • Edges must be dealt with • Schraudolph, Dayan, Sejnowski
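The rotational, reflectional and colour symmetries can be sketched as follows. The board encoding (+1 black, -1 white, 0 empty, stored as a tuple of tuples) and the function names are assumptions made for this example. Translation invariance and edge handling are not covered here.

```python
def rotate90(board):
    """Rotate a square board 90 degrees clockwise."""
    return tuple(zip(*board[::-1]))


def reflect(board):
    """Mirror the board left to right."""
    return tuple(tuple(row[::-1]) for row in board)


def invert_colours(board):
    """Swap black (+1) and white (-1) stones; empty points (0) are unchanged."""
    return tuple(tuple(-p for p in row) for row in board)


def symmetries(board):
    """Return the set of boards equivalent under rotation and reflection."""
    result = set()
    current = tuple(tuple(row) for row in board)
    for _ in range(4):
        result.add(current)
        result.add(reflect(current))
        current = rotate90(current)
    return result


# A black stone in the corner with a white stone next to it on a 3x3 board.
board = ((1, -1, 0),
         (0, 0, 0),
         (0, 0, 0))
print(len(symmetries(board)))
```

A learner can exploit these either by training on every equivalent position or by sharing weights across them. Colour inversion is kept separate because it also swaps the side to move, so the evaluation has to be flipped accordingly.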
Graph Representations • Connected blocks are also (approximately) invariant. • Graepel’s ‘Common Fate Property’ • Used previously by Baum, Stoutamire, Enzenberger. • Generate a graph between units • Turn connected blocks and empty intersections into nodes • Turn adjacencies between units into edges • Learn on graph representation • Learn relationships between units (NeuroGo v2)
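The graph construction described above can be sketched as a flood fill. This is a minimal illustration rather than the implementation used by any of the cited authors; the board encoding ('X' black, 'O' white, '.' empty, given as a list of equal-length strings for a square board) and the function name are assumptions.

```python
def common_fate_graph(board):
    """Build (nodes, edges) from a square board given as a list of strings.

    Connected stones of one colour merge into a single node; every empty
    intersection is its own node. Edges join adjacent units.
    """
    size = len(board)
    node_of = {}   # (row, col) -> node id
    nodes = []     # node id -> (colour, list of points)
    for r in range(size):
        for c in range(size):
            if (r, c) in node_of:
                continue
            colour = board[r][c]
            node_id = len(nodes)
            node_of[(r, c)] = node_id
            points, stack = [], [(r, c)]
            while stack:
                pr, pc = stack.pop()
                points.append((pr, pc))
                if colour == '.':
                    continue  # empty points are never merged
                for nr, nc in ((pr + 1, pc), (pr - 1, pc), (pr, pc + 1), (pr, pc - 1)):
                    if (0 <= nr < size and 0 <= nc < size
                            and (nr, nc) not in node_of
                            and board[nr][nc] == colour):
                        node_of[(nr, nc)] = node_id
                        stack.append((nr, nc))
            nodes.append((colour, points))
    edges = set()
    for (r, c), a in node_of.items():
        for nr, nc in ((r + 1, c), (r, c + 1)):
            if nr < size and nc < size:
                b = node_of[(nr, nc)]
                if a != b:
                    edges.add((min(a, b), max(a, b)))
    return nodes, edges


nodes, edges = common_fate_graph(["XX.",
                                  ".O.",
                                  "..."])
# Liberties of the black block are its empty neighbours in the graph.
black_liberties = sum(1 for a, b in edges
                      if 0 in (a, b) and nodes[a + b][0] == '.')
print(len(nodes), len(edges), black_liberties)
```

In the usage lines, node 0 is the black block, so `a + b` is the id of the other endpoint of each edge touching it. Features such as liberties then fall out of the graph directly.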
Feature selection • Raw board representation can be enhanced by any number of features • Comparison of important features (Werf) • Most significant: Stones, Liberties, Last Move • Also significant: Edge, Captures, Nearby stones • Trade-off between feature complexity and training time
Dimensionality Reduction • Can use feature extraction techniques • Werf compares a variety of algorithms • PCA performs well all round • Modified Eigenspace Separation Transform does even better • A combination may be best overall
Knowledge Representation • Pattern Databases • Neural Networks • Rules • Decision Trees • Others
Pattern Databases • Successful in commercial games • Can be learned in similar format • Go++ combines handcrafted pattern database and professional shape database (trade secret!)
Neural Networks • Can learn and represent pattern information • Successfully used in practice • Multilayer perceptrons + backpropagation • e.g. Schraudolph, Enzenberger, Werf, Dahl • Variants • Resilient backpropagation (Werf) • Linear architecture (e.g. Werf)
Rules • Horn clauses • Deductive inferencing (Kojima) • Production rules • Evolutionary approach (Kojima)
Decision Trees • Encodes patterns in concise, flexible form • Tilde (Ramon, Blockeel) • Relational representation language • Inductive logic programming • Successfully learns nakade shapes • Learned heuristic compares favourably to GoTools at life and death.
Other representations • Support Vector Machines (Graepel) • Boltzmann Machines (Stern, MacKay)
Conclusions • Common successful ideas • General approach • My approach
Common successful ideas • Global evaluation function • Reinforcement learning • Exploiting invariances • Carefully selected features • Neural network • Local move prediction • Supervised learning • +ve expert move, -ve random move • Neural network • But hasn’t led to a strong Go program
General Approach • There are many different approaches to learning in Go. • Focus on what to learn, and why it will help to play stronger Go. • What do we want to evaluate? • What knowledge do we need? • Which features will help? • Then select appropriate learning algorithms. • How should we train? • How should knowledge be represented?
My Approach • What to learn • Win/lose value function • How to learn • Reinforcement learning • Options • State representation • Predictive state representation • Can/will features • Knowledge representation • Kanerva code (high dimensional patterns) • Linear architecture