Advancements in AI: Temporal Difference Learning in Chess Programming
This research explores the implementation of Artificial Intelligence and Temporal Difference Learning algorithms within a computerized chess program. It investigates the challenges of searching through vast data, integrating heuristic searches, and developing effective evaluation functions for assessing chess moves. The project develops a multi-stage chess game using Python, progressing from text-based interaction to an AI capable of learning optimal evaluation strategies. Results and testing reveal the complexities of machine learning in game environments and offer insights into performance variability.
Presentation Transcript
By James Mannion Computer Systems Lab 08-09 Period 3 The Implementation of Artificial Intelligence and Temporal Difference Learning Algorithms in a Computerized Chess Program
Abstract • Searching through large sets of data • Complex, vast domains • Heuristic searches • Chess • Evaluation Function • Machine Learning
Introduction • Games • Minimax search • Alpha-beta pruning • Only look 2-3 moves into the future • Estimate strength of position • Evaluation function • Can improve heuristic by learning
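The search described above (minimax with alpha-beta pruning, cut off after 2-3 moves and scored by an evaluation function) can be sketched as follows. The `evaluate`, `legal_moves`, and `apply` callables are hypothetical stand-ins for the program's own board interface, not the author's actual code:

```python
# Minimal alpha-beta minimax sketch. When the depth limit is reached (or no
# moves remain), the heuristic evaluation function estimates the position's
# strength instead of searching deeper.
def alphabeta(board, depth, alpha, beta, maximizing,
              evaluate, legal_moves, apply):
    moves = legal_moves(board)
    if depth == 0 or not moves:
        return evaluate(board)          # heuristic estimate of the position
    if maximizing:
        best = float("-inf")
        for m in moves:
            best = max(best, alphabeta(apply(board, m), depth - 1, alpha,
                                       beta, False, evaluate, legal_moves, apply))
            alpha = max(alpha, best)
            if alpha >= beta:           # prune: the opponent avoids this line
                break
        return best
    else:
        best = float("inf")
        for m in moves:
            best = min(best, alphabeta(apply(board, m), depth - 1, alpha,
                                       beta, True, evaluate, legal_moves, apply))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best
```

Pruning is what makes the shallow 2-3 ply search affordable: whole subtrees are skipped once it is clear the opponent would never allow them.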
Introduction • Seems simple, but can become quite complex. • Chess masters spend careers learning how to “evaluate” moves • Purpose: can a computer learn a good evaluation function?
Background • Claude Shannon, 1950 • Brute force would take too long • Discusses evaluation function • 2-ply algorithm, but looks further into the future for moves that could lead to checkmate • Possibility of learning in distant future
Development • Python • Stage 1: Text based chess game • Two humans input their moves • Illegal moves not allowed
Development • Stage 2: Introduce a computer player • Searches 2-3 ply ahead • The evaluation function starts out making choices based on a simple piece differential in which each piece is weighted equally
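The starting evaluation function above reduces to counting material. A minimal sketch, assuming a hypothetical board representation that maps squares to signed piece codes (positive for the computer's pieces, negative for the opponent's):

```python
# Piece-differential evaluation sketch: every piece carries the same
# weight (1), as in the Stage 2 starting heuristic. Learning (Stage 3)
# would later adjust per-piece weights away from this uniform baseline.
def piece_differential(board):
    own = sum(1 for p in board.values() if p > 0)
    opponent = sum(1 for p in board.values() if p < 0)
    return own - opponent
```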
Development • Stage 3: Learning • Temporal Difference Learning • Weight adjustment: • w ← w + α·(Pₜ − Pₜ₋₁)·∇wPₜ₋₁ • α = 200/(199 + n) • P = 1/(1 + e⁻ʰ) • h = w₁(j₁ − k₁) + … + w₅(j₅ − k₅)
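The update rule on this slide can be sketched directly in Python. This is an illustrative reconstruction, assuming five feature weights and piece-count differences (jᵢ − kᵢ) as the features; the gradient term uses the standard sigmoid derivative P·(1 − P):

```python
import math

def sigmoid_eval(weights, diffs):
    """P = 1/(1 + e^-h), where h = sum of w_i * (j_i - k_i)."""
    h = sum(w * d for w, d in zip(weights, diffs))
    return 1.0 / (1.0 + math.exp(-h))

def td_update(weights, diffs_prev, p_t, p_prev, n):
    """One temporal-difference step: w <- w + a*(P_t - P_{t-1})*grad_w P_{t-1}."""
    a = 200.0 / (199.0 + n)                       # learning rate decays with n
    # Gradient of the sigmoid evaluation w.r.t. each weight:
    # dP/dw_i = P*(1 - P)*(j_i - k_i), evaluated at the previous position.
    grad = [p_prev * (1.0 - p_prev) * d for d in diffs_prev]
    return [w + a * (p_t - p_prev) * g for w, g in zip(weights, grad)]
```

Each weight moves in proportion to how much the position's estimated winning probability changed between successive moves, nudging earlier evaluations toward later, better-informed ones.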
Testing • Learning vs No Learning • Two equal, piece-differential players pitted against each other. • One will have the ability to learn • Multiple Games • Weight values and win-loss differential tracked over the length of the test
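The test setup above can be sketched as a simple self-play harness. The `play_game` callable and the player objects are hypothetical placeholders, assumed only to expose a `weights` list and to report game outcomes as +1/0/−1:

```python
# Hypothetical experiment loop: pit a learning player against a fixed
# piece-differential baseline and track the weight values and the running
# win-loss differential over the length of the test.
def run_experiment(play_game, learner, baseline, num_games):
    history = []                  # (game number, weight snapshot, margin)
    margin = 0                    # wins minus losses so far
    for n in range(1, num_games + 1):
        result = play_game(learner, baseline)   # +1 win, 0 draw, -1 loss
        margin += result
        history.append((n, list(learner.weights), margin))
    return history
```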
Results • Weights changed over the course of training • The changes affected performance • The weights reached equilibrium values • The program actually got worse at chess • Likely due to an error in the code
References • Shannon, Claude. “Programming a Computer for Playing Chess.” 1950. • Beal, D.F., and Smith, M.C. “Temporal Difference Learning for Heuristic Search and Game Playing.” 1999. • Moriarty, David E., and Miikkulainen, Risto. “Discovering Complex Othello Strategies Through Evolutionary Neural Networks.” • Huang, Shiu-li, and Lin, Fu-ren. “Using Temporal-Difference Learning for Multi-Agent Bargaining.” 2007. • Russell, Stuart, and Norvig, Peter. Artificial Intelligence: A Modern Approach. Second Edition. 2003. • Asgharbeygi, Nima, Stracuzzi, David, and Langley, Pat. “Relational Temporal Difference Learning.”