
Lecture 6: Adversarial Search & Games



  1. Lecture 6: Adversarial Search & Games • Reading: Ch. 6, AIMA • Rutgers CS440, Fall 2003

  2. Adversarial search • So far: single-agent search – no opponents or collaborators • Multi-agent search: • Playing a game against an opponent: adversarial search • Economies: even more complex – societies of cooperative and non-cooperative agents • Game playing and AI: • Games can be complex and (arguably) require human intelligence • Have to be played in “real time” • Well-defined problems • Limited scope

  3. Games and AI

  4. Games and search • Traditional search: a single agent searches for its own well-being, unobstructed • Games: search against an opponent • Consider a two-player board game: • e.g., chess, checkers, tic-tac-toe • board configuration: a unique arrangement of “pieces” • Representing a board game as a search problem (see the sketch below): • states: board configurations • operators: legal moves • initial state: current board configuration • goal state: winning/terminal board configuration
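A minimal Python sketch of the state/operator/goal decomposition above. The class and method names (initial_state, legal_moves, result, is_terminal, utility) are illustrative assumptions, not from the lecture; the later sketches in this transcript build on this interface.

```python
# Hypothetical interface for a two-player board game as a search problem.
# Method names are illustrative; they mirror the decomposition on slide 4.

class Game:
    def initial_state(self):
        """Return the starting board configuration."""
        raise NotImplementedError

    def legal_moves(self, state):
        """Return the moves (operators) available in `state`."""
        raise NotImplementedError

    def result(self, state, move):
        """Return the board configuration reached by playing `move`."""
        raise NotImplementedError

    def is_terminal(self, state):
        """True if `state` is a winning/terminal configuration."""
        raise NotImplementedError

    def utility(self, state):
        """Value of a terminal `state` from the agent's perspective."""
        raise NotImplementedError
```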

  5. Wrong representation • We want to optimize our (agent’s) goal, so we build a search tree based only on our own possible moves/actions • Problem: this discounts the opponent • [figure: tic-tac-toe search tree expanding only X’s moves]

  6. Better representation: game search tree • Include the opponent’s actions as well • Levels alternate: agent move, opponent move, agent move, … (an agent move plus an opponent move is one full move) • Utilities (e.g., 5, 10, 1) are assigned to goal/terminal nodes • [figure: tic-tac-toe game tree alternating X (agent) and O (opponent) moves]

  7. Game search trees • What is the size of a game search tree? • O(b^d) • Tic-tac-toe: 9! leaves (max depth = 9) • Chess: ~35 legal moves per position, average game “depth” ~100 • b^d ≈ 35^100 ≈ 10^154 states, “only” ~10^40 legal states • Too deep for exhaustive search!

  8. Utilities in search trees • Assign a utility to each (terminal) state, describing how much it is valued by the agent • High utility – good for the agent • Low utility – good for the opponent • Board evaluation is from the agent’s perspective • [figure: computer’s possible moves from A (9) lead to B (-5), C (9), D (2), E (3); the opponent’s possible moves lead to terminal states F (-7), G (-5), H (3), I (9), J (-6), K (0), L (2), M (1), N (3), O (2)]

  9. Search strategy • Worst-case scenario: assume the opponent will always make his best move (i.e., the worst move for us) • Minimax search: maximize the utility for our agent while expecting the opponent to play his best moves: • High utility favors the agent => choose the move with maximal utility • Low utility favors the opponent => assume the opponent makes the move with the lowest utility • [figure: the tree from slide 8 with backed-up values B = -7, C = -6, D = 0, E = 1]

  10. Minimax algorithm • Start with the utilities of the terminal nodes • Propagate them back up to the root node by choosing the minimax strategy: max at agent levels, min at opponent levels • [figure: animation of values propagating up the tree from slide 8: min at the opponent level gives B = -7, C = -6, D = 0, E = 1; max at the root gives A = 1]
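A minimal recursive sketch of the algorithm just described, written against the hypothetical Game interface from slide 4. It backs up utilities from the leaves, maximizing on the agent's levels and minimizing on the opponent's.

```python
# Plain minimax: return the backed-up utility of `state`.
# `maximizing` is True on the agent's (max) levels.

def minimax(game, state, maximizing):
    if game.is_terminal(state):
        return game.utility(state)            # utilities start at the leaves
    values = [minimax(game, game.result(state, m), not maximizing)
              for m in game.legal_moves(state)]
    # Max levels back up the highest child value, min levels the lowest.
    return max(values) if maximizing else min(values)

def best_move(game, state):
    # The agent moves first, so compare the min-values of its children.
    return max(game.legal_moves(state),
               key=lambda m: minimax(game, game.result(state, m), False))
```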

  11. Complexity of minimax algorithm • Utilities propagate up in a recursive fashion: DFS • Space complexity: O(bd) • Time complexity: O(b^d) • Problem: time complexity – it’s a game, and there is only finite time to make a move

  12. Reducing complexity of minimax (1) • Don’t search to the full depth d – terminate early • Prune bad paths • Problem: we don’t have utilities for non-terminal nodes • Estimate the utility of non-terminal nodes: • a static board evaluation function (SBE) is a heuristic that assigns a utility to non-terminal nodes • it reflects the computer’s chances of winning from that node • it must be easy to calculate from the board configuration • For example, in chess: SBE = α · materialBalance + β · centerControl + γ · … • material balance = value of white pieces − value of black pieces (pawn = 1, rook = 5, queen = 9, etc.)
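A sketch of such a linear SBE for chess, following the slide's formula. The weights, the `board.pieces()` iterator, and the center_control stub are hypothetical placeholders, not from the lecture.

```python
# Linear static board evaluation (SBE) sketch for chess.
# Assumes `board.pieces()` yields piece codes, uppercase for white
# and lowercase for black (an assumption for illustration).

PIECE_VALUE = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}  # pawn .. queen

def material_balance(board):
    # Value of white pieces minus value of black pieces.
    score = 0
    for piece in board.pieces():
        value = PIECE_VALUE.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    return score

def center_control(board):
    # Placeholder term: e.g., count of pieces attacking central squares.
    return 0.0

def sbe(board, alpha=1.0, beta=0.5):
    # SBE = alpha * materialBalance + beta * centerControl + ...
    return alpha * material_balance(board) + beta * center_control(board)
```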

  13. Minimax with evaluation functions • Same as general minimax, except: • it only goes to depth m • it estimates non-terminal values using the SBE function • How would this algorithm perform at chess? • if it could look ahead ~4 pairs of moves (i.e., 8 ply), it would be consistently beaten by average players • if it could look ahead ~8 pairs, as a typical PC can, it plays as well as a human master
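The depth-limited variant in code, under the same assumed Game interface: identical to plain minimax except that it stops at depth m and substitutes an SBE estimate for the true utility.

```python
# Depth-limited minimax with a static board evaluation function.
# `evaluate` is any SBE (e.g., the sbe sketch above).

def minimax_limited(game, state, depth, maximizing, evaluate):
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:
        return evaluate(state)          # heuristic estimate, not exact utility
    values = [minimax_limited(game, game.result(state, m),
                              depth - 1, not maximizing, evaluate)
              for m in game.legal_moves(state)]
    return max(values) if maximizing else min(values)
```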

  14. Reducing complexity of minimax (2) • Some branches of the tree will never be taken if the opponent plays cleverly. Can we detect them ahead of time? • Prune off paths that do not need to be explored: alpha-beta pruning • Keep track of the following while doing DFS of the game tree: • maximizing level: alpha • highest value seen so far • lower bound on the node’s evaluation/score • minimizing level: beta • lowest value seen so far • upper bound on the node’s evaluation/score
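A sketch of alpha-beta over the same hypothetical Game interface, before the worked example on the next slides. A branch is cut off as soon as alpha meets or exceeds beta.

```python
# Alpha-beta pruning: alpha = highest value seen so far on a max level,
# beta = lowest value seen so far on a min level.

def alphabeta(game, state, depth, alpha, beta, maximizing, evaluate):
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:
        return evaluate(state)
    if maximizing:
        value = float('-inf')
        for m in game.legal_moves(state):
            value = max(value, alphabeta(game, game.result(state, m),
                                         depth - 1, alpha, beta, False, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break           # beta cut-off: the min player won't allow this
        return value
    else:
        value = float('inf')
        for m in game.legal_moves(state):
            value = min(value, alphabeta(game, game.result(state, m),
                                         depth - 1, alpha, beta, True, evaluate))
            beta = min(beta, value)
            if alpha >= beta:
                break           # alpha cut-off, as on slide 23 below
        return value
```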

  15. Alpha-Beta Example • minimax(A,0,4): expand the root A (max level); α is not yet set • call stack: A • [figure: four-level game tree rooted at max node A; min nodes B, C, D (0), E; max nodes F through M, with terminal values G = -5, H = 3, I = 8, L = 2; min-level nodes N (4), O, P (9), Q (-6), R (0), S (3), T (5), U (-7), V (-9); W (-3) and X (-5) are O’s children at the depth limit]

  16. Alpha-Beta Example • minimax(B,1,4): expand B (min level); β is not yet set • call stack: B, A

  17. Alpha-Beta Example • minimax(F,2,4): expand F (max level); α is not yet set • call stack: F, B, A

  18. Alpha-Beta Example • minimax(N,3,4): N is a terminal state with value 4 • call stack: N, F, B, A

  19. Alpha-Beta Example • minimax(F,2,4) is returned to: F’s α = 4, the maximum seen so far

  20. Alpha-Beta Example • minimax(O,3,4): expand O (min level); β is not yet set • call stack: O, F, B, A

  21. Alpha-Beta Example • minimax(W,4,4): W is at the depth limit, with value -3 • call stack: W, O, F, B, A

  22. Alpha-Beta Example • minimax(O,3,4) is returned to: O’s β = -3, the minimum seen so far

  23. Alpha-Beta Example • O’s β (-3) ≤ F’s α (4): stop expanding O (alpha cut-off) – the remaining child X (-5) is pruned

  24. Alpha-Beta Example • Why? A smart opponent will choose W or worse, so O’s upper bound is -3 • So the computer shouldn’t choose O (-3), since N (4) is better

  25. Alpha-Beta Example • minimax(F,2,4) is returned to: α is not changed (maximizing), so F’s value stays 4

  26. Alpha-Beta Example • minimax(B,1,4) is returned to: B’s β = 4, the minimum seen so far

  27. Effectiveness of Alpha-Beta Search • Effectiveness depends on the order in which successors are examined: more effective if the best successors are examined first • Worst case: • ordered so that no pruning takes place • no improvement over exhaustive search • Best case: • each player’s best move is evaluated first (left-most) • In practice, performance is closer to the best case than to the worst case

  28. Effectiveness of Alpha-Beta Search • In practice we often get O(b^(d/2)) rather than O(b^d) • the same as having a branching factor of √b, since (√b)^d = b^(d/2) • For example, in chess: • the effective branching factor goes from b ≈ 35 to b ≈ 6 • permits a much deeper search in the same time • makes computer chess competitive with humans

  29. Dealing with Limited Time • In real games, there is usually a time limit T on making a move • How do we take this into account? • we cannot stop alpha-beta midway and expect to use the results with any confidence • so we could set a conservative depth limit that guarantees we will find a move in time < T • but then the search may finish early, and the opportunity to search deeper is wasted

  30. Dealing with Limited Time • In practice, iterative deepening search (IDS) is used • run alpha-beta search with an increasing depth limit • when the clock runs out, use the solution found by the last completed alpha-beta search (i.e., the deepest search that was completed); see the sketch below
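A sketch of this scheme, reusing the hypothetical alphabeta and Game sketches from earlier. Each fully completed iteration overwrites the remembered best move; a production engine would additionally abort the in-progress search at the deadline rather than let it finish.

```python
import time

# Iterative deepening under a time limit: keep re-running depth-limited
# alpha-beta with a growing depth and return the move from the deepest
# search that completed before the clock ran out.

def timed_search(game, state, evaluate, time_limit):
    deadline = time.monotonic() + time_limit
    best, depth = None, 1
    while time.monotonic() < deadline:
        best = max(game.legal_moves(state),
                   key=lambda m: alphabeta(game, game.result(state, m),
                                           depth - 1, float('-inf'),
                                           float('inf'), False, evaluate))
        depth += 1   # `best` now holds the result of a completed search
    return best
```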

  31. The Horizon Effect • Sometimes disaster lurks just beyond the search depth • e.g., the computer captures the queen, but a few moves later the opponent checkmates (i.e., wins) • The computer has a limited horizon; it cannot see that this significant event could happen • How do you avoid catastrophic losses due to “short-sightedness”? • quiescence search • secondary search

  32. The Horizon Effect • Quiescence search (see the sketch below) • when the evaluation is changing frequently, look deeper than the depth limit • look for a point when the game “quiets down” • Secondary search • 1. find the best move looking to depth d • 2. look k steps beyond to verify that it still looks good • 3. if it doesn’t, repeat step 2 for the next best move
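A minimal sketch of the quiescence idea, under stated assumptions: the helpers `is_quiet` (the evaluation is stable, e.g., no pending captures) and `noisy_moves` (the forcing moves worth extending) are hypothetical and game-specific, not from the lecture.

```python
# Quiescence search sketch: at the depth limit, keep expanding "noisy"
# positions until the game quiets down, then trust the SBE.

def quiescence(game, state, maximizing, evaluate):
    if game.is_terminal(state):
        return game.utility(state)
    if is_quiet(state):
        return evaluate(state)       # position is stable: SBE is trustworthy
    # Assumes noisy_moves() is non-empty whenever the position is not quiet.
    values = [quiescence(game, game.result(state, m), not maximizing, evaluate)
              for m in noisy_moves(state)]
    return max(values) if maximizing else min(values)
```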

  33. Book Moves • Build a database of opening moves, end games, and studied configurations • If the current state is in the database, use the database: • to determine the next move • to evaluate the board • Otherwise, do alpha-beta search (see the sketch below)
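A sketch of the lookup-then-search policy just described. The `opening_book` table (hashable states mapped to moves, built offline) is a hypothetical stand-in for the database.

```python
# Book-move lookup: consult the precomputed table first,
# fall back to alpha-beta search on a miss.

opening_book = {}   # state -> best known move, built offline (assumed)

def choose_move(game, state, evaluate, depth):
    if state in opening_book:
        return opening_book[state]       # database hit: no search needed
    return max(game.legal_moves(state),
               key=lambda m: alphabeta(game, game.result(state, m),
                                       depth - 1, float('-inf'),
                                       float('inf'), False, evaluate))
```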

  34. Examples of Algorithms which Learn to Play Well • Checkers: A. L. Samuel, “Some Studies in Machine Learning Using the Game of Checkers,” IBM Journal of Research and Development, 3(3):210-229, 1959 • Learned by playing a copy of itself thousands of times • Used only an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz • Successful enough to compete well at human tournaments

  35. Examples of Algorithms which Learn to Play Well • Backgammon: G. Tesauro and T. J. Sejnowski, “A Parallel Network that Learns to Play Backgammon,” Artificial Intelligence, 39(3):357-390, 1989 • Also learns by playing copies of itself • Uses a non-linear evaluation function – a neural network • Rated one of the top three players in the world

  36. Non-deterministic Games • Some games involve chance, for example: • roll of dice • spin of a game wheel • deal of cards from a shuffled deck • How can we handle games with random elements? • The game tree representation is extended to include chance nodes: • agent moves • chance nodes • opponent moves

  37. Non-deterministic Games • The game tree representation is extended with chance levels: max, chance, min • [figure: max node A over two 50/50 chance nodes (each outcome with probability 0.5); below them min nodes B (β = 2), C (β = 6), D (β = 0), E (β = -4), with leaves 7, 2 / 9, 6 / 5, 0 / 8, -4]

  38. Non-deterministic Games • Weight each score by the probability that the move occurs • Use the expected value for a move: the probability-weighted sum over the possible random outcomes • In the tree above: left chance node = 0.5 · 2 + 0.5 · 6 = 4; right chance node = 0.5 · 0 + 0.5 · (-4) = -2

  39. Non-deterministic Games • Choose the move with the highest expected value: A’s α = 4 (the left 50/50 node beats the right one at -2); see the sketch below
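A sketch of the resulting expectiminimax backup over the assumed Game interface. Chance nodes back up the probability-weighted average of their children instead of a max or min; `outcomes(state)` yielding (probability, next_state) pairs is a hypothetical helper.

```python
# Expectiminimax sketch: max/min levels as in minimax, chance levels
# back up the expected value over random outcomes.

def expectiminimax(game, state, node_type):
    if game.is_terminal(state):
        return game.utility(state)
    if node_type == 'max':
        return max(expectiminimax(game, game.result(state, m), 'chance-min')
                   for m in game.legal_moves(state))
    if node_type == 'min':
        return min(expectiminimax(game, game.result(state, m), 'chance-max')
                   for m in game.legal_moves(state))
    # Chance node: expected value = sum of p * value over random outcomes.
    next_type = 'min' if node_type == 'chance-min' else 'max'
    return sum(p * expectiminimax(game, s, next_type)
               for p, s in outcomes(state))
```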

  40. Non-deterministic Games • Non-determinism increases the branching factor • 21 distinct rolls with 2 dice • The value of lookahead diminishes: as depth increases, the probability of reaching a given node decreases • alpha-beta pruning is less effective • TD-Gammon: • depth-2 search • very good heuristic • plays at world-champion level

  41. Computers can play Grandmaster Chess • “Deep Blue” (IBM) • Parallel processor, 32 nodes • Each node has 8 dedicated VLSI “chess chips” • Can search 200 million configurations/second • Uses minimax, alpha-beta, and sophisticated heuristics • It can currently search to 14 ply (i.e., 7 pairs of moves) • Can avoid the horizon effect by searching as deep as 40 ply • Uses book moves

  42. Computers can play Grandmaster Chess • Kasparov vs. Deep Blue, May 1997 • 6-game full-regulation chess match sponsored by ACM • Kasparov lost the match: 1 win and 3 draws against Deep Blue’s 2 wins and 3 draws (2½–3½) • This was a historic achievement for computer chess, being the first time a computer became the best chess player on the planet • Note that Deep Blue plays by “brute force” (i.e., raw power from computer speed and memory); it uses relatively little that is similar to human intuition and cleverness

  43. Status of Computers in Other Deterministic Games • Checkers/Draughts • the current world champion is Chinook • can beat any human (beat Tinsley in 1994) • uses alpha-beta search and book moves (> 443 billion positions) • Othello • computers can easily beat the world experts • Go • branching factor b ≈ 360 (very large!) • $2 million prize for any system that can beat a world expert
