An introduction to the basics of Go, learning local shape in Computer Go, enumerating possible shapes, training methodology, and future research directions.
Learning Shape in Computer Go • David Silver
A brief introduction to Go • Black and white take turns to place stones • Once played, a stone cannot move • The aim is to surround the most territory • Usually played on a 19x19 board
Capturing • The empty points along the lines radiating from a stone are called its liberties • If a connected group of stones has all of its liberties removed, it is captured • Captured stones are removed from the board (a minimal sketch of this rule follows below)
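The capture rule is easy to make concrete in code. Below is a minimal sketch (my own illustration, not code from the talk) that flood-fills a connected group and collects its liberties; a group with zero liberties is captured and removed. Board cells are 0 (empty), 1 (black), 2 (white).

```python
def group_and_liberties(board, x, y):
    """Flood-fill the connected group containing (x, y); collect its liberties."""
    size = len(board)
    colour = board[x][y]            # 0 = empty, 1 = black, 2 = white
    group, liberties, stack = set(), set(), [(x, y)]
    while stack:
        cx, cy = stack.pop()
        if (cx, cy) in group:
            continue
        group.add((cx, cy))
        for nx, ny in ((cx - 1, cy), (cx + 1, cy), (cx, cy - 1), (cx, cy + 1)):
            if 0 <= nx < size and 0 <= ny < size:
                if board[nx][ny] == 0:
                    liberties.add((nx, ny))    # empty neighbour = liberty
                elif board[nx][ny] == colour:
                    stack.append((nx, ny))     # same colour: part of the group
    return group, liberties

def remove_if_captured(board, x, y):
    """Remove the group at (x, y) from the board if it has no liberties left."""
    group, liberties = group_and_liberties(board, x, y)
    if not liberties:
        for gx, gy in group:
            board[gx][gy] = 0
    return not liberties
```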
Atari Go (Capture Go) • Atari Go is a simplified version of Go • The winner is the first player to capture a stone • Often used to teach Go to beginners • It circumvents several tricky issues: • Games ending only by agreement • Ko (local repetitions of position) • Seki (local stalemates)
Computer Go • Computer Go programs are very weak • The search space is too large for brute-force techniques • There are no good evaluation functions • Human intuition (shape knowledge) has proven difficult to capture • Why not learn shape knowledge? • And use it to learn an evaluation function?
Local shape • Local shape describes a pattern of stones • It is used extensively by current Computer Go programs (pattern databases) • Inputting local shapes by hand takes many years of hard labour • We would like to: • Learn local shapes by trial and error • Assign a value to the goodness of each shape • Just how good is a particular shape?
Enumerating local shapes • In these experiments all possible local shapes are used as features • Up to a small maximum size (e.g. 2x2) • A local shape is defined as: • A particular configuration of stones • At a canonical position on the board • Local shapes are used as binary features by the learning algorithm (see the sketch below)
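As a concrete sketch of the enumeration (my own illustration; the exact feature indexing used in the experiments is an assumption here): each cell of a width x height window is empty, black, or white, giving 3^(width*height) configurations, with one binary feature per (configuration, window position) pair. Exactly one configuration matches at each window position, so the active feature set is very sparse and can be stored as a set of indices.

```python
from itertools import product

EMPTY, BLACK, WHITE = 0, 1, 2

def enumerate_local_shapes(width, height):
    """All 3**(width*height) stone configurations of a width x height window."""
    return list(product((EMPTY, BLACK, WHITE), repeat=width * height))

def active_features(board, width, height):
    """Indices of the binary local-shape features that are 'on' for this board."""
    size = len(board)
    n_configs = 3 ** (width * height)
    active = set()
    for x in range(size - width + 1):
        for y in range(size - height + 1):
            cells = [board[x + i][y + j]
                     for i in range(width) for j in range(height)]
            config_id = sum(c * 3 ** k for k, c in enumerate(cells))
            # one feature per (window position, configuration) pair
            active.add((x * size + y) * n_configs + config_id)
    return active
```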
Invariances • Each canonical local shape can be: • Rotated • Reflected • Inverted (black and white swapped) • So each position may cause updates to multiple instances of each feature (a canonicalisation sketch follows below)
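One way to handle those invariances (my own formulation, not code from the talk) is to generate the eight rotations/reflections of a square window plus the colour inversion of each, and map every shape to the lexicographically smallest of the sixteen variants, so that all equivalent shapes share one weight.

```python
def canonical(shape):
    """Canonical representative of a square shape under rotation, reflection
    and colour inversion; shape is a tuple of row tuples over
    {0: empty, 1: black, 2: white}."""
    def rotate(g):                      # rotate 90 degrees
        return tuple(zip(*g[::-1]))
    def reflect(g):                     # mirror horizontally
        return tuple(row[::-1] for row in g)
    def invert(g):                      # swap black and white stones
        return tuple(tuple({0: 0, 1: 2, 2: 1}[c] for c in row) for row in g)

    variants = []
    g = shape
    for _ in range(4):                  # 4 rotations x reflection x inversion = 16
        g = rotate(g)
        for v in (g, reflect(g)):
            variants.extend((v, invert(v)))
    return min(variants)                # lexicographically smallest variant
```

An update made to the canonical form then stands for updates to every symmetric instance of the feature.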
Algorithm • A value function is learnt for afterstates • Moves are selected by 1-ply greedy search (ε = 0) over the value function: • Active local shapes are identified • A linear combination of their weights is taken • A sigmoid squashing function is applied • Backups are performed using TD(0) • Reward of +1 for winning, 0 for losing (see the sketch below)
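Putting the pieces together: the evaluation is a sigmoid of a linear combination of the active binary features, and TD(0) moves each weight towards the value of the next afterstate. A minimal sketch in my own formulation (the step size, terminal handling, and gradient form are assumptions; the slides only specify TD(0), the sigmoid, and the 1/0 win/loss reward):

```python
import math
from collections import defaultdict

weights = defaultdict(float)            # one weight per local-shape feature

def value(features):
    """V = sigmoid(sum of weights of the active features)."""
    return 1.0 / (1.0 + math.exp(-sum(weights[f] for f in features)))

def td0_update(features, next_features, alpha=0.1, terminal=False, reward=0.0):
    """TD(0) backup between successive afterstates."""
    v = value(features)
    target = reward if terminal else value(next_features)
    delta = target - v
    grad = v * (1.0 - v)                # derivative of the sigmoid squashing
    for f in features:                  # active binary features have gradient 1
        weights[f] += alpha * delta * grad

def greedy_move(legal_moves, afterstate_features):
    """1-ply greedy search (epsilon = 0): pick the highest-valued afterstate."""
    return max(legal_moves, key=lambda m: value(afterstate_features(m)))
```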
Training procedure • The challenge: learn to beat the average liberty player • So the learning algorithm was trained specifically against the average liberty player • The problem: learning is very slow, since the agent almost never wins a game by chance • The solution: mix a proportion of random moves into the opponent's play until the agent wins 50% of all games • Reduce the proportion of randomness as the agent learns to win more games (a sketch of this schedule follows below)
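A sketch of that annealing trick (the exact schedule is not given in the slides; the 0.05 step, the win-rate trigger, and the assumption that the random moves dilute the opponent's play are mine):

```python
import random

def opponent_move(board, legal_moves, average_liberty_move, epsilon):
    """The average liberty player, weakened by playing randomly with prob. epsilon."""
    if random.random() < epsilon:
        return random.choice(legal_moves)
    return average_liberty_move(board, legal_moves)

def adjust_epsilon(epsilon, recent_win_rate, target=0.5, step=0.05):
    """Anneal the random fraction so the agent keeps winning about half its games."""
    if recent_win_rate > target:
        return max(0.0, epsilon - step)   # agent improving: make opponent stronger
    return min(1.0, epsilon + step)       # agent struggling: make opponent weaker
```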
Conclusions • Local shape information is sufficient to beat a naïve rule-based player • Significant shapes can be learned • The ‘goodness’ of shapes can be learned • A linear threshold unit can provide a reasonable evaluation function • Enumerating all local shapes reaches a natural limit at 3x3 • Training methodology is crucial
Future work • Learn shapes selectively rather than enumerating all possible shapes • Learn shapes to answer specific questions • Can black B4 be captured? • Can white connect A2 to D5? • Learn non-local shape: • Use connectivity relationships • Build hierarchies of shapes