
Learning in Computer Go

Learning in Computer Go. David Silver. The Problem: a large state space (approximately 10^172 states, a game tree of about 10^360 nodes, a branching factor of about 200) and positions that are hard to evaluate (no good heuristics known, volatile, highly non-linear). Four ways to evaluate a position.


Presentation Transcript


  1. Learning in Computer Go David Silver

  2. The Problem • Large state space • Approximately 10^172 states • Game tree of about 10^360 nodes • Branching factor of about 200 • Evaluating a position is hard • No good heuristics known • Volatile • Highly non-linear
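The orders of magnitude on this slide can be sanity-checked with a short calculation. The sketch below assumes a 19x19 board on which every point is empty, black or white, so 3^361 is a loose upper bound on the number of states. The game length is not given on the slide; it is derived here from the slide's own branching factor and tree size.

```python
import math

POINTS = 19 * 19  # assumption: standard 19x19 board, 361 points

# Each point is empty, black, or white, so 3**361 bounds the number of
# board configurations (legal or not). Work with base-10 exponents.
state_exponent = POINTS * math.log10(3)

# A uniform tree with branching factor b and depth d has about b**d nodes.
# Solve 200**d = 10**360 for the depth implied by the slide's figures.
BRANCHING = 200
TREE_EXPONENT = 360
implied_depth = TREE_EXPONENT / math.log10(BRANCHING)

print(f"3^361 is about 10^{state_exponent:.1f}")
print(f"implied game length is about {implied_depth:.0f} moves")
```

The first figure agrees with the slide's 10^172; the second suggests the tree estimate corresponds to games of roughly 150 to 160 moves.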

  3. Four ways to evaluate a position • Don’t even try • Hand-crafted heuristic • Monte Carlo simulation • Learned heuristic

  4. Four choices about learning • What to learn • How to learn • State representation • Knowledge representation

  5. What to learn • Global evaluation function • Shape • Life and death • Connectivity • Eyes

  6. Global evaluation function • Several related concepts • Evaluation function • Heuristic • Value function • What to evaluate • Probability of winning • Expected score • How to evaluate • Sum of point territory estimates • Other approaches?
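As a minimal illustration of "sum of point territory estimates", the sketch below adds up per-point estimates into an expected score. The sign convention (+1 for a point expected to end up Black's, -1 for White's) and the function name `expected_score` are assumptions made for the example, not taken from the slides.

```python
def expected_score(territory):
    """Sum per-point territory estimates into one global evaluation.

    territory: list of rows; each entry lies in [-1, 1], where +1 means
    the point is expected to become Black's and -1 White's (assumed
    convention). The result is Black's expected margin in points.
    """
    return sum(sum(row) for row in territory)


# Toy 3x3 example: Black is strong on the left, White on the right.
estimates = [
    [1.0, 0.5, -0.5],
    [1.0, 0.0, -1.0],
    [0.5, -0.5, -1.0],
]
score = expected_score(estimates)
print(score)
```

With this decomposition the learning problem becomes one of predicting each point's final ownership, which gives many training signals per position instead of a single win/lose label.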

  7. Shape • Local pattern information • Move recommendations • Learning shape from expert games • Stoutamire, Enderton, Van der Werf, Dahl • Learning shape by RL • NeuroGo v3

  8. Life and Death • Two problems: • Will a group live or die? • Can a group live or die? • Solving the ‘can’ question • Alpha-beta search with learned heuristic [Wolf] • Solving the ‘will’ question • Supervised learning using rich feature set [Werf] • Reinforcement learning, averaged over group [Dahl]

  9. Connectivity • Correlation between two points • Estimate potential groups of stones • Estimate potential regions of empty points • ‘Will connect’ (NeuroGo v3) • Reinforcement learning of local connectivity. • Pathfinding module for global connectivity. • Connectivity map used for learning global evaluation function

  10. What else can we learn? • Eyes • Heuristics for endgame • Many other features…

  11. How to learn • Reinforcement Learning • Supervised Learning • Combined Approaches • Evolutionary Methods

  12. Reinforcement Learning • Temporal Difference Learning • Schraudolph, Dayan, Sejnowski • Enzenberger (NeuroGo) • Dahl (Honte) • Variants of TD(λ) • TD(0) • TD(λ) • TD-leaf(λ) • Training methodology • Self-play • Expert games (Q-learning)
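For readers unfamiliar with temporal difference learning, here is a minimal sketch of one episode of TD(λ) with accumulating eligibility traces. It assumes a linear value function over hand-supplied feature vectors and a single reward at the end of the game; the programs cited on the slide use neural networks, so this shows the update rule only, not any of those systems. Setting `lam=0.0` gives TD(0).

```python
def td_lambda_episode(features, final_reward, weights,
                      alpha=0.1, lam=0.7, gamma=1.0):
    """Run one episode of TD(lambda) and return the updated weights.

    features: list of feature vectors, one per position visited.
    final_reward: outcome received after the last position (e.g. 1 = win,
        0 = loss); all intermediate rewards are assumed to be zero.
    """
    w = list(weights)
    trace = [0.0] * len(w)

    def value(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    for t, x in enumerate(features):
        is_last = t == len(features) - 1
        # Target: the next prediction, or the game result at the end.
        target = final_reward if is_last else gamma * value(features[t + 1])
        delta = target - value(x)
        # For a linear value function the gradient is the feature vector.
        trace = [gamma * lam * e + xi for e, xi in zip(trace, x)]
        w = [wi + alpha * delta * e for wi, e in zip(w, trace)]
    return w


# Two positions with one-hot features, ending in a win.
episode = [[1, 0], [0, 1]]
print(td_lambda_episode(episode, 1.0, [0.0, 0.0], alpha=0.5, lam=0.0))
print(td_lambda_episode(episode, 1.0, [0.0, 0.0], alpha=0.5, lam=1.0))
```

With `lam=0.0` only the final position's weight moves toward the result; with `lam=1.0` the trace carries the credit back to the earlier position as well.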

  13. Supervised Learning • Learn to mimic expert play • Expert move as +ve training example • Random move as -ve training example • Need a ranking metric and error function • e.g. Stoutamire, Enderton, Van der Werf, Dahl • Learn from labelled final game positions • e.g. final score, life and death • Data is either noisy or sparse
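One way to make "expert move as +ve, random move as -ve" concrete is a pairwise ranking error. The sketch below assumes a linear scorer over move features and a logistic loss on the score difference; these choices, and the names `pairwise_loss`, `sgd_step` and `expert_rank`, are illustrative and not taken from the cited work.

```python
import math


def score(weights, feats):
    """Linear move score."""
    return sum(w * f for w, f in zip(weights, feats))


def pairwise_loss(weights, expert_feats, random_feats):
    """Error function: small when the expert move outscores the random one."""
    margin = score(weights, expert_feats) - score(weights, random_feats)
    return math.log(1.0 + math.exp(-margin))


def sgd_step(weights, expert_feats, random_feats, lr=0.1):
    """One gradient-descent step on the pairwise loss."""
    margin = score(weights, expert_feats) - score(weights, random_feats)
    # d(loss)/d(margin) = -1 / (1 + exp(margin))
    grad_margin = -1.0 / (1.0 + math.exp(margin))
    return [w - lr * grad_margin * (e - r)
            for w, e, r in zip(weights, expert_feats, random_feats)]


def expert_rank(weights, expert_feats, candidates):
    """Ranking metric: 1 means the expert move scored highest."""
    s = score(weights, expert_feats)
    return 1 + sum(1 for c in candidates if score(weights, c) > s)


w0 = [0.0, 0.0]
expert, rand = [1.0, 0.0], [0.0, 1.0]
w1 = sgd_step(w0, expert, rand)
print(pairwise_loss(w0, expert, rand), pairwise_loss(w1, expert, rand))
print(expert_rank(w1, expert, [rand]))
```

The loss plays the role of the error function and `expert_rank` the role of the ranking metric: training reduces the former, and evaluation reports how high the expert's move lands among the legal candidates.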

  14. Combined approaches • Can combine elements of both reinforcement and supervised learning. • e.g. Dahl’s Honte • Search • Local searches for eyes, connections, life and death • Global search using learned territory evaluation • Supervised learning • Local move prediction (shape) • Reinforcement learning • Life and death • Territory

  15. Evolutionary Methods • Evolve a neural network to evaluate game positions • Donnelly, Lubberts, Richards, Rutquist • Evolve rules to match positions [Kojima] • ‘Feed’ rules according to matches • Split successful rules • Weight rules according to success in predicting response • Different kinds of rule • Flexible (production rules) • Fixed (within radius from move) • Semi-fixed (within radius of move, empty points only)

  16. State Representation • Invariances • Graph representations • Feature selection • Dimensionality reduction

  17. Invariances • Go board has many symmetries • Rotational • Reflectional • Colour inversion • Invariant under translation • Edges must be dealt with • Schraudolph, Dayan, Sejnowski
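The rotational, reflectional and colour symmetries can be enumerated directly. The sketch below assumes a square board stored as rows of integers (1 for black, -1 for white, 0 for empty) and generates every equivalent position; translation invariance and edge handling are not covered. Note that under colour inversion the side to move swaps as well, so an evaluation from Black's point of view would have to be flipped accordingly.

```python
def rotate(board):
    """Rotate a square board 90 degrees clockwise."""
    return tuple(zip(*board[::-1]))


def reflect(board):
    """Mirror the board left to right."""
    return tuple(tuple(row[::-1]) for row in board)


def invert_colours(board):
    """Swap black (1) and white (-1); empty points (0) stay empty."""
    return tuple(tuple(-p for p in row) for row in board)


def equivalents(board):
    """All positions reachable by rotation, reflection and colour inversion."""
    board = tuple(tuple(row) for row in board)
    seen = set()
    for b in (board, reflect(board)):
        for _ in range(4):
            b = rotate(b)
            seen.add(b)
            seen.add(invert_colours(b))
    return seen


corner_stone = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
two_stones = [[1, -1, 0], [0, 0, 0], [0, 0, 0]]
print(len(equivalents(corner_stone)), len(equivalents(two_stones)))
```

A position with no symmetry of its own has 16 equivalents (8 board symmetries times 2 colourings), so exploiting these invariances can multiply the effective training data or shrink the number of free weights by a similar factor.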

  18. Graph Representations • Connected blocks are also (approximately) invariant. • Graepel’s ‘Common Fate Property’ • Used previously by Baum, Stoutamire, Enzenberger. • Generate a graph between units • Turn connected blocks and empty intersections into nodes • Turn adjacencies between units into edges • Learn on graph representation • Learn relationships between units (NeuroGo v2)
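The graph construction described on this slide can be sketched as follows: each connected block of same-coloured stones becomes one node, each empty intersection becomes its own node, and adjacent units are joined by an edge. The board encoding ('X' black, 'O' white, '.' empty) and the function name `build_unit_graph` are assumptions made for this example.

```python
def build_unit_graph(board):
    """Collapse a board into a graph of units.

    board: list of equal-length strings using 'X', 'O' and '.'.
    Returns (unit_of, edges): unit_of maps (row, col) to a unit id, and
    edges is a set of frozenset({u, v}) pairs for adjacent units.
    """
    n_rows, n_cols = len(board), len(board[0])
    unit_of = {}
    next_id = 0

    def neighbours(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n_rows and 0 <= cc < n_cols:
                yield rr, cc

    for r in range(n_rows):
        for c in range(n_cols):
            if (r, c) in unit_of:
                continue
            unit_of[(r, c)] = next_id
            if board[r][c] != '.':
                # Flood-fill the connected block of same-coloured stones.
                stack = [(r, c)]
                while stack:
                    pr, pc = stack.pop()
                    for q in neighbours(pr, pc):
                        if q not in unit_of and board[q[0]][q[1]] == board[r][c]:
                            unit_of[q] = next_id
                            stack.append(q)
            next_id += 1

    edges = set()
    for (r, c), u in unit_of.items():
        for q in neighbours(r, c):
            if unit_of[q] != u:
                edges.add(frozenset((u, unit_of[q])))
    return unit_of, edges


unit_of, edges = build_unit_graph(["XX.", ".O.", "..."])
print(len(set(unit_of.values())), len(edges))
```

In this toy position the two black stones share one node, so the 9 intersections collapse into 8 units, and the 12 board adjacencies become 11 edges because the one inside the black block disappears.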

  19. Learning Relations in NeuroGo (v2)

  20. Feature selection • Raw board representation can be enhanced by any number of features • Comparison of important features (Werf) • Most significant: Stones, Liberties, Last Move • Also significant: Edge, Captures, Nearby stones • Trade-off between feature complexity and training time

  21. Feature comparison in NeuroGo (v3)

  22. Dimensionality Reduction • Can use feature extraction techniques • Werf compares a variety of algorithms • PCA performs well all round • Modified Eigenspace Separation Transform does even better • A combination may be best overall

  23. Knowledge Representation • Pattern Databases • Neural Networks • Rules • Decision Trees • Others

  24. Pattern Databases • Successful in commercial games • Can be learned in similar format • Go++ combines handcrafted pattern database and professional shape database (trade secret!)

  25. Neural Networks • Can learn and represent pattern information • Successfully used in practice • Multilayer perceptrons + backpropagation • e.g. Schraudolph, Enzenberger, Werf, Dahl • Variants • Resilient backpropagation (Werf) • Linear architecture (e.g. Werf)

  26. Rules • Horn clauses • Deductive inferencing (Kojima) • Production rules • Evolutionary approach (Kojima)

  27. Decision Trees • Encodes patterns in concise, flexible form • Tilde (Ramon, Blockeel) • Relational representation language • Inductive logic programming • Successfully learns nakade shapes • Learned heuristic compares favourably to GoTools at life and death.

  28. Other representations • Support Vector Machines (Graepel) • Boltzmann Machines (Stern, MacKay)

  29. Conclusions • Common successful ideas • General approach • My approach

  30. Common successful ideas • Global evaluation function • Reinforcement learning • Exploiting invariances • Carefully selected features • Neural network • Local move prediction • Supervised learning • +ve expert move, -ve random move • Neural network • But hasn’t led to a strong Go program

  31. General Approach • There are many different approaches to learning in Go. • Focus on what to learn, and why it will help to play stronger Go. • What do we want to evaluate? • What knowledge do we need? • Which features will help? • Then select appropriate learning algorithms. • How should we train? • How should knowledge be represented?

  32. My Approach • What to learn • Win/lose value function • How to learn • Reinforcement learning • Options • State representation • Predictive state representation • Can/will features • Knowledge representation • Kanerva code (high dimensional patterns) • Linear architecture
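To show what a Kanerva code with a linear architecture looks like in general, here is a minimal sketch. Random binary prototypes act as features: a prototype is active when it lies within a given Hamming radius of the state, and the value is the sum of the weights of the active prototypes. How a Go position would be encoded as a bit vector, the number of prototypes and the radius are all assumptions; none of this is the author's actual design.

```python
import random


def make_prototypes(n_prototypes, n_bits, seed=0):
    """Random binary prototype vectors (hypothetical sizes)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_bits)]
            for _ in range(n_prototypes)]


def hamming(a, b):
    """Number of positions at which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def kanerva_features(state_bits, prototypes, radius):
    """Indices of the prototypes within `radius` of the state."""
    return [i for i, p in enumerate(prototypes)
            if hamming(state_bits, p) <= radius]


def value(weights, active):
    """Linear architecture: sum of the weights of the active features."""
    return sum(weights[i] for i in active)


def update(weights, active, target, alpha=0.1):
    """Move the prediction toward `target` via the active weights only."""
    if not active:
        return
    step = alpha * (target - value(weights, active)) / len(active)
    for i in active:
        weights[i] += step


prototypes = make_prototypes(50, 16)
weights = [0.0] * len(prototypes)
state = prototypes[0]  # a state that coincides with one prototype
active = kanerva_features(state, prototypes, radius=4)
update(weights, active, target=1.0, alpha=0.5)  # target 1.0 stands for a win
print(len(active), value(weights, active))
```

Because only the active weights change, each update is cheap, and states that share nearby prototypes generalise to one another.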
