An introduction to the basics of Go, learning local shape in Computer Go, enumerating possible shapes, training methodology, and future research directions.
Learning Shape in Computer Go • David Silver
A brief introduction to Go • Black and white take turns to place stones • Once played, a stone cannot move • The aim is to surround the most territory • Usually played on a 19x19 board
Capturing • The empty points along the lines radiating from a stone are called its liberties • If a connected group of stones has all of its liberties removed, it is captured • Captured stones are removed from the board (a minimal sketch of this rule follows below)
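The capture rule is easy to make concrete in code. Below is a minimal sketch (my own illustration, not code from the talk) that flood-fills a connected group and collects its liberties; a group with zero liberties is captured and removed. Board cells are 0 (empty), 1 (black), 2 (white).

```python
def group_and_liberties(board, x, y):
    """Flood-fill the connected group containing (x, y); collect its liberties."""
    size = len(board)
    colour = board[x][y]            # 0 = empty, 1 = black, 2 = white
    group, liberties, stack = set(), set(), [(x, y)]
    while stack:
        cx, cy = stack.pop()
        if (cx, cy) in group:
            continue
        group.add((cx, cy))
        for nx, ny in ((cx - 1, cy), (cx + 1, cy), (cx, cy - 1), (cx, cy + 1)):
            if 0 <= nx < size and 0 <= ny < size:
                if board[nx][ny] == 0:
                    liberties.add((nx, ny))    # empty neighbour = liberty
                elif board[nx][ny] == colour:
                    stack.append((nx, ny))     # same colour: part of the group
    return group, liberties

def remove_if_captured(board, x, y):
    """Remove the group at (x, y) from the board if it has no liberties left."""
    group, liberties = group_and_liberties(board, x, y)
    if not liberties:
        for gx, gy in group:
            board[gx][gy] = 0
    return not liberties
```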
Atari Go (Capture Go) • Atari Go is a simplified version of Go • The winner is the first player to capture a stone • Often used to teach Go to beginners • It circumvents several tricky issues: • Games ending only by agreement • Ko (local repetitions of position) • Seki (local stalemates)
Computer Go • Computer Go programs are very weak • The search space is too large for brute-force techniques • There are no good evaluation functions • Human intuition (shape knowledge) has proven difficult to capture • Why not learn shape knowledge? • And use it to learn an evaluation function?
Local shape • Local shape describes a pattern of stones • It is used extensively by current Computer Go programs (pattern databases) • Inputting local shapes by hand takes many years of hard labour • We would like to: • Learn local shapes by trial and error • Assign a value to the goodness of each shape • Just how good is a particular shape?
Enumerating local shapes • In these experiments all possible local shapes are used as features • Up to a small maximum size (e.g. 2x2) • A local shape is defined as: • A particular configuration of stones • At a canonical position on the board • Local shapes are used as binary features by the learning algorithm (see the sketch below)
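As a concrete sketch of the enumeration (my own illustration; the exact feature indexing used in the experiments is an assumption here): each cell of a width x height window is empty, black, or white, giving 3^(width*height) configurations, with one binary feature per (configuration, window position) pair. Exactly one configuration matches at each window position, so the active feature set is very sparse and can be stored as a set of indices.

```python
from itertools import product

EMPTY, BLACK, WHITE = 0, 1, 2

def enumerate_local_shapes(width, height):
    """All 3**(width*height) stone configurations of a width x height window."""
    return list(product((EMPTY, BLACK, WHITE), repeat=width * height))

def active_features(board, width, height):
    """Indices of the binary local-shape features that are 'on' for this board."""
    size = len(board)
    n_configs = 3 ** (width * height)
    active = set()
    for x in range(size - width + 1):
        for y in range(size - height + 1):
            cells = [board[x + i][y + j]
                     for i in range(width) for j in range(height)]
            config_id = sum(c * 3 ** k for k, c in enumerate(cells))
            # one feature per (window position, configuration) pair
            active.add((x * size + y) * n_configs + config_id)
    return active
```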
Invariances • Each canonical local shape can be: • Rotated • Reflected • Inverted (black and white swapped) • So each position may cause updates to multiple instances of each feature (a canonicalisation sketch follows below)
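One way to handle those invariances (my own formulation, not code from the talk) is to generate the eight rotations/reflections of a square window plus the colour inversion of each, and map every shape to the lexicographically smallest of the sixteen variants, so that all equivalent shapes share one weight.

```python
def canonical(shape):
    """Canonical representative of a square shape under rotation, reflection
    and colour inversion; shape is a tuple of row tuples over
    {0: empty, 1: black, 2: white}."""
    def rotate(g):                      # rotate 90 degrees
        return tuple(zip(*g[::-1]))
    def reflect(g):                     # mirror horizontally
        return tuple(row[::-1] for row in g)
    def invert(g):                      # swap black and white stones
        return tuple(tuple({0: 0, 1: 2, 2: 1}[c] for c in row) for row in g)

    variants = []
    g = shape
    for _ in range(4):                  # 4 rotations x reflection x inversion = 16
        g = rotate(g)
        for v in (g, reflect(g)):
            variants.extend((v, invert(v)))
    return min(variants)                # lexicographically smallest variant
```

An update made to the canonical form then stands for updates to every symmetric instance of the feature.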
Algorithm • A value function is learnt for afterstates • Moves are selected by 1-ply greedy search (ε = 0) over the value function: • Active local shapes are identified • A linear combination of their weights is taken • A sigmoid squashing function is applied • Backups are performed using TD(0) • Reward of +1 for winning, 0 for losing (see the sketch below)
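Putting the pieces together: the evaluation is a sigmoid of a linear combination of the active binary features, and TD(0) moves each weight towards the value of the next afterstate. A minimal sketch in my own formulation (the step size, terminal handling, and gradient form are assumptions; the slides only specify TD(0), the sigmoid, and the 1/0 win/loss reward):

```python
import math
from collections import defaultdict

weights = defaultdict(float)            # one weight per local-shape feature

def value(features):
    """V = sigmoid(sum of weights of the active features)."""
    return 1.0 / (1.0 + math.exp(-sum(weights[f] for f in features)))

def td0_update(features, next_features, alpha=0.1, terminal=False, reward=0.0):
    """TD(0) backup between successive afterstates."""
    v = value(features)
    target = reward if terminal else value(next_features)
    delta = target - v
    grad = v * (1.0 - v)                # derivative of the sigmoid squashing
    for f in features:                  # active binary features have gradient 1
        weights[f] += alpha * delta * grad

def greedy_move(legal_moves, afterstate_features):
    """1-ply greedy search (epsilon = 0): pick the highest-valued afterstate."""
    return max(legal_moves, key=lambda m: value(afterstate_features(m)))
```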
Training procedure • The challenge: learn to beat the average liberty player • So the learning algorithm was trained specifically against the average liberty player • The problem: learning is very slow, since the agent almost never wins a game by chance • The solution: mix a proportion of random moves into the opponent's play until the agent wins 50% of all games • Reduce the proportion of randomness as the agent learns to win more games (a sketch of this schedule follows below)
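A sketch of that annealing trick (the exact schedule is not given in the slides; the 0.05 step, the win-rate trigger, and the assumption that the random moves dilute the opponent's play are mine):

```python
import random

def opponent_move(board, legal_moves, average_liberty_move, epsilon):
    """The average liberty player, weakened by playing randomly with prob. epsilon."""
    if random.random() < epsilon:
        return random.choice(legal_moves)
    return average_liberty_move(board, legal_moves)

def adjust_epsilon(epsilon, recent_win_rate, target=0.5, step=0.05):
    """Anneal the random fraction so the agent keeps winning about half its games."""
    if recent_win_rate > target:
        return max(0.0, epsilon - step)   # agent improving: make opponent stronger
    return min(1.0, epsilon + step)       # agent struggling: make opponent weaker
```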
Conclusions • Local shape information is sufficient to beat a naïve rule-based player • Significant shapes can be learned • The ‘goodness’ of shapes can be learned • A linear threshold unit can provide a reasonable evaluation function • Enumerating all local shapes reaches a natural limit at 3x3 • Training methodology is crucial
Future work • Learn shapes selectively rather than enumerating all possible shapes • Learn shapes to answer specific questions • Can black B4 be captured? • Can white connect A2 to D5? • Learn non-local shape: • Use connectivity relationships • Build hierarchies of shapes