
Honte, a Go-Playing Program Using Neural Nets




  1. Honte, a Go-Playing Program Using Neural Nets Frederik Dahl

  2. Combined approach • Supervised learning (shape evaluation) • Reinforcement learning (group safety, territory) • Heuristic evaluation (influence) • Search (capture, connectivity, life and death)

  3. Architecture

  4. Shape evaluation: Multilayer perceptron • 190 inputs • Receptive field of radius 3 • Distance to edge • Liberties • Captured stones • 50 hidden nodes • Single output • Will an expert play here?
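A minimal sketch of a network with the shape described on this slide (190 inputs, 50 hidden nodes, one output). The layer sizes come from the slide; the tanh hidden units, sigmoid output, and small uniform initial weights are assumptions made for illustration, as are the names `init_mlp` and `forward`.

```python
import math
import random

N_INPUTS, N_HIDDEN = 190, 50  # layer sizes taken from the slide


def init_mlp(n_in, n_hidden, rng):
    """Create small random weights (the initialisation scheme is an assumption)."""
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(params, x):
    """Return (hidden activations, output); the output lies strictly in (0, 1)."""
    w1, b1, w2, b2 = params
    # Hidden layer: tanh units (assumed; the slide does not name the activation).
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    # Single output, read as "how likely is an expert to play here?"
    out = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, out


params = init_mlp(N_INPUTS, N_HIDDEN, random.Random(0))
hidden, score = forward(params, [0.0] * N_INPUTS)
```

In use, one such forward pass would be made per candidate intersection, with the 190 features built from the radius-3 receptive field, distance to edge, liberties, and captured stones.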

  5. Shape evaluation: Training and performance • Trained on 400 expert games • Expert move used as positive example (+1) • Random legal move as negative example (0) • Error backpropagation • error = target - eval • Performance measured by treating prediction as evaluation function • What percentage of legal moves are ranked below the expert move?
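The training targets and the performance measure above can be sketched as follows. This is an illustration, not the program's actual code: the function names are hypothetical, excluding the expert move when drawing the negative example is an assumption, and so is the handling of ties (a tied move does not count as ranked below).

```python
import random


def sample_training_pair(legal_moves, expert_move, rng):
    """Return (move, target) pairs: expert move -> 1.0, one random other move -> 0.0."""
    pairs = [(expert_move, 1.0)]
    candidates = [m for m in legal_moves if m != expert_move]  # assumption: exclude expert move
    if candidates:
        pairs.append((rng.choice(candidates), 0.0))
    return pairs


def percent_ranked_below(scores, expert_move):
    """Percentage of the other legal moves whose score is strictly below the expert move's.

    `scores` maps every legal move to the network output for that move.
    """
    expert_score = scores[expert_move]
    others = [s for m, s in scores.items() if m != expert_move]
    if not others:
        return 100.0
    below = sum(1 for s in others if s < expert_score)
    return 100.0 * below / len(others)


scores = {"a": 0.9, "b": 0.2, "c": 0.95, "d": 0.1, "e": 0.5}
rank = percent_ranked_below(scores, "a")  # three of the four other moves score lower
```

Averaging this percentage over all positions in a held-out set of expert games gives a single figure for how well the network ranks expert moves.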

  6. Shape evaluation: Results

  7. Local search • Selective search for local goals • Capture • Connectivity • Life and death • Only considers moves suggested by shape evaluating network • Deep and narrow search • Captures common-sense knowledge
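A rough sketch of a "deep and narrow" goal search of this kind: at each node only the `width` best-scoring moves are tried, where the score stands in for the shape network's output. The game interface (`moves_fn`, `apply_fn`, `goal_fn`, `score_fn`) and the toy counting game are hypothetical; the slide does not describe how the program's search is implemented.

```python
def narrow_search(state, depth, attacker_to_move,
                  moves_fn, apply_fn, goal_fn, score_fn, width=2):
    """Return True if the attacker can force the goal within `depth` plies,
    looking only at the `width` highest-scoring moves at every node."""
    if goal_fn(state):
        return True
    if depth == 0:
        return False
    # Keep only the moves the (stand-in) shape evaluator likes best.
    candidates = sorted(moves_fn(state),
                        key=lambda m: score_fn(state, m),
                        reverse=True)[:width]
    if not candidates:
        return False
    results = (narrow_search(apply_fn(state, m), depth - 1, not attacker_to_move,
                             moves_fn, apply_fn, goal_fn, score_fn, width)
               for m in candidates)
    # The attacker needs one working move; the defender must fail with all of them.
    return any(results) if attacker_to_move else all(results)


# Toy game (hypothetical): players alternately add 1 or 2 to a counter;
# the attacker's goal is a counter of at least 4.
toy = dict(moves_fn=lambda s: [1, 2],
           apply_fn=lambda s, m: s + m,
           goal_fn=lambda s: s >= 4,
           score_fn=lambda s, m: m)

forced_in_3 = narrow_search(0, 3, True, **toy)            # full width, depth 3
forced_in_2 = narrow_search(0, 2, True, **toy)            # full width, depth 2
optimistic = narrow_search(0, 2, True, width=1, **toy)    # width 1 skips the defender's best reply
```

The width-1 call shows the trade-off: pruning lets the search go deeper for the same cost, but a reply the evaluator ranks low is never examined, so the result depends on the quality of the move suggestions.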

  8. Group safety evaluation: Multilayer perceptron • Groups defined by connectable blocks • 13 inputs • Number of stones in group • Number of liberties in group • Number of proven eyes • Average opponent influence over liberties • 20 hidden nodes • 1 output • Probability of group survival

  9. Group safety evaluation: Temporal difference learning • Trained by self-play • Reward signal for the group is the average final safety of stones • 0 = captured • 1 = survived • TD(0) is used, replaying games backwards • Very simple idea: • error = eval(next) - eval(now)
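The backward replay can be sketched with a lookup table standing in for the network (a deliberate simplification; the program trains a multilayer perceptron). The function name, learning rate, and default value of 0.5 are assumptions. The last position is trained toward the final reward, and every earlier position toward the evaluation of its successor, which matches error = eval(next) - eval(now).

```python
def td0_backward(values, states, final_reward, alpha=0.1, default=0.5):
    """One TD(0) pass over a finished game, replayed from the last position to the first.

    `values` is a dict used as a stand-in evaluator (assumption: tabular, not a network).
    `final_reward` is 0.0 if the group was captured and 1.0 if it survived.
    """
    target = final_reward
    for state in reversed(states):
        now = values.get(state, default)
        error = target - now               # eval(next) - eval(now), or reward - eval(now) at the end
        values[state] = now + alpha * error
        target = values[state]             # the updated value is eval(next) for the earlier state
    return values


values = td0_backward({}, ["s0", "s1", "s2"], final_reward=1.0, alpha=0.5)
```

Replaying backwards means the outcome reaches earlier positions within a single pass, because each position is updated after its successor has already moved toward the final result.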

  10. Influence evaluation • Consider random walks from an intersection • How likely to end up at a black or white stone? • Can also take account of group safety estimates
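One way to read the random-walk idea is as a Monte Carlo estimate: start many walks at an intersection and record the colour of the first stone each walk reaches. The sketch below makes several assumptions that the slide does not state: uniform steps to orthogonal neighbours, a fixed walk-length cap, and a board given as a list of lists holding "B", "W", or ".". It ignores group safety estimates.

```python
import random


def influence(board, start, n_walks=200, max_steps=20, rng=None):
    """Estimate (p_black, p_white): the fraction of random walks from `start`
    whose first stone reached is black or white. Walks that reach no stone
    within `max_steps` moves count for neither colour."""
    rng = rng or random.Random(0)
    size = len(board)
    hits = {"B": 0, "W": 0}
    for _ in range(n_walks):
        r, c = start
        for _ in range(max_steps + 1):
            if board[r][c] in hits:          # the walk stops at the first stone
                hits[board[r][c]] += 1
                break
            neighbours = [(r + dr, c + dc)
                          for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                          if 0 <= r + dr < size and 0 <= c + dc < size]
            if not neighbours:
                break
            r, c = rng.choice(neighbours)
    return hits["B"] / n_walks, hits["W"] / n_walks


board = [["B", ".", "."],
         [".", ".", "."],
         [".", ".", "W"]]
p_black, p_white = influence(board, (1, 1))
```

To take group safety into account, as the slide suggests, each hit could be weighted by the estimated survival probability of the stone's group instead of counting as 1; that weighting is left out here.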

  11. Territory evaluation • Another multilayer perceptron • 4 inputs • Revised influence (for both sides) • Distance from edge • 10 hidden nodes • 1 output • Predicted territory value • Trained by TD(0) using eventual territory value as reward signal

  12. Playing strength • Playing 19x19 Go • Approximately even against Handtalk 97-06e • Wins more than 50% against Ego 1.0 • Weaknesses • Confuses group safety with group strength • Has no concept of the aji of a group

  13. Recent work • New version of WinHonte 1.03 • Neural net to evaluate sente/gote • Trial version available online!

  14. Conclusions • Go knowledge can be learned • Combining different forms of knowledge can be a good idea • Multilayer perceptrons provide a flexible representation • Local search can be used effectively as input features for learning
