
Lecture 6: Adversarial Search & Games



  1. Lecture 6: Adversarial Search & Games • Reading: Ch. 6, AIMA • Rutgers CS440, Fall 2003

  2. Adversarial search • So far: single-agent search – no opponents or collaborators • Multi-agent search: • Playing a game against an opponent: adversarial search • Economies: even more complex – societies of cooperative and non-cooperative agents • Game playing and AI: • Games can be complex and (arguably) require human intelligence • Have to be played in “real time” • Well-defined problems • Limited scope

  3. Games and AI

  4. Games and search • Traditional search: a single agent searches for its own well-being, unobstructed • Games: search against an opponent • Consider a two-player board game: • e.g., chess, checkers, tic-tac-toe • board configuration: a unique arrangement of “pieces” • Representing a board game as a search problem (see the sketch below): • states: board configurations • operators: legal moves • initial state: current board configuration • goal state: winning/terminal board configuration
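A minimal Python sketch of the state/operator/goal decomposition above. The class and method names (initial_state, legal_moves, result, is_terminal, utility) are illustrative assumptions, not from the lecture; the later sketches in this transcript build on this interface.

```python
# Hypothetical interface for a two-player board game as a search problem.
# Method names are illustrative; they mirror the decomposition on slide 4.

class Game:
    def initial_state(self):
        """Return the starting board configuration."""
        raise NotImplementedError

    def legal_moves(self, state):
        """Return the moves (operators) available in `state`."""
        raise NotImplementedError

    def result(self, state, move):
        """Return the board configuration reached by playing `move`."""
        raise NotImplementedError

    def is_terminal(self, state):
        """True if `state` is a winning/terminal configuration."""
        raise NotImplementedError

    def utility(self, state):
        """Value of a terminal `state` from the agent's perspective."""
        raise NotImplementedError
```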

  5. Wrong representation • We want to optimize our (agent’s) goal, so we build a search tree based only on our own possible moves/actions • Problem: this discounts the opponent • [figure: tic-tac-toe search tree expanding only X’s moves]

  6. Better representation: game search tree • Include the opponent’s actions as well • Levels alternate: agent move, opponent move, agent move, … (an agent move plus an opponent move is one full move) • Utilities (e.g., 5, 10, 1) are assigned to goal/terminal nodes • [figure: tic-tac-toe game tree alternating X (agent) and O (opponent) moves]

  7. Game search trees • What is the size of a game search tree? • O(b^d) • Tic-tac-toe: 9! leaves (max depth = 9) • Chess: ~35 legal moves per position, average game “depth” ~100 • b^d ≈ 35^100 ≈ 10^154 states, “only” ~10^40 legal states • Too deep for exhaustive search!

  8. Utilities in search trees • Assign a utility to each (terminal) state, describing how much it is valued by the agent • High utility – good for the agent • Low utility – good for the opponent • Board evaluation is from the agent’s perspective • [figure: computer’s possible moves from A (9) lead to B (-5), C (9), D (2), E (3); the opponent’s possible moves lead to terminal states F (-7), G (-5), H (3), I (9), J (-6), K (0), L (2), M (1), N (3), O (2)]

  9. Search strategy • Worst-case scenario: assume the opponent will always make his best move (i.e., the worst move for us) • Minimax search: maximize the utility for our agent while expecting the opponent to play his best moves: • High utility favors the agent => choose the move with maximal utility • Low utility favors the opponent => assume the opponent makes the move with the lowest utility • [figure: the tree from slide 8 with backed-up values B = -7, C = -6, D = 0, E = 1]

  10. Minimax algorithm • Start with the utilities of the terminal nodes • Propagate them back up to the root node by choosing the minimax strategy: max at agent levels, min at opponent levels • [figure: animation of values propagating up the tree from slide 8: min at the opponent level gives B = -7, C = -6, D = 0, E = 1; max at the root gives A = 1]
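A minimal recursive sketch of the algorithm just described, written against the hypothetical Game interface from slide 4. It backs up utilities from the leaves, maximizing on the agent's levels and minimizing on the opponent's.

```python
# Plain minimax: return the backed-up utility of `state`.
# `maximizing` is True on the agent's (max) levels.

def minimax(game, state, maximizing):
    if game.is_terminal(state):
        return game.utility(state)            # utilities start at the leaves
    values = [minimax(game, game.result(state, m), not maximizing)
              for m in game.legal_moves(state)]
    # Max levels back up the highest child value, min levels the lowest.
    return max(values) if maximizing else min(values)

def best_move(game, state):
    # The agent moves first, so compare the min-values of its children.
    return max(game.legal_moves(state),
               key=lambda m: minimax(game, game.result(state, m), False))
```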

  11. Complexity of minimax algorithm • Utilities propagate up in a recursive fashion: DFS • Space complexity: O(bd) • Time complexity: O(b^d) • Problem: time complexity – it’s a game, and there is only finite time to make a move

  12. Reducing complexity of minimax (1) • Don’t search to the full depth d – terminate early • Prune bad paths • Problem: we don’t have utilities for non-terminal nodes • Estimate the utility of non-terminal nodes: • a static board evaluation function (SBE) is a heuristic that assigns a utility to non-terminal nodes • it reflects the computer’s chances of winning from that node • it must be easy to calculate from the board configuration • For example, in chess: SBE = α · materialBalance + β · centerControl + γ · … • material balance = value of white pieces − value of black pieces (pawn = 1, rook = 5, queen = 9, etc.)
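A sketch of such a linear SBE for chess, following the slide's formula. The weights, the `board.pieces()` iterator, and the center_control stub are hypothetical placeholders, not from the lecture.

```python
# Linear static board evaluation (SBE) sketch for chess.
# Assumes `board.pieces()` yields piece codes, uppercase for white
# and lowercase for black (an assumption for illustration).

PIECE_VALUE = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}  # pawn .. queen

def material_balance(board):
    # Value of white pieces minus value of black pieces.
    score = 0
    for piece in board.pieces():
        value = PIECE_VALUE.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    return score

def center_control(board):
    # Placeholder term: e.g., count of pieces attacking central squares.
    return 0.0

def sbe(board, alpha=1.0, beta=0.5):
    # SBE = alpha * materialBalance + beta * centerControl + ...
    return alpha * material_balance(board) + beta * center_control(board)
```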

  13. Minimax with evaluation functions • Same as general minimax, except: • it only goes to depth m • it estimates non-terminal values using the SBE function • How would this algorithm perform at chess? • if it could look ahead ~4 pairs of moves (i.e., 8 ply), it would be consistently beaten by average players • if it could look ahead ~8 pairs, as a typical PC can, it plays as well as a human master
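The depth-limited variant in code, under the same assumed Game interface: identical to plain minimax except that it stops at depth m and substitutes an SBE estimate for the true utility.

```python
# Depth-limited minimax with a static board evaluation function.
# `evaluate` is any SBE (e.g., the sbe sketch above).

def minimax_limited(game, state, depth, maximizing, evaluate):
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:
        return evaluate(state)          # heuristic estimate, not exact utility
    values = [minimax_limited(game, game.result(state, m),
                              depth - 1, not maximizing, evaluate)
              for m in game.legal_moves(state)]
    return max(values) if maximizing else min(values)
```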

  14. Reducing complexity of minimax (2) • Some branches of the tree will never be taken if the opponent plays cleverly. Can we detect them ahead of time? • Prune off paths that do not need to be explored: alpha-beta pruning • Keep track of the following while doing DFS of the game tree: • maximizing level: alpha • highest value seen so far • lower bound on the node’s evaluation/score • minimizing level: beta • lowest value seen so far • upper bound on the node’s evaluation/score
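A sketch of alpha-beta over the same hypothetical Game interface, before the worked example on the next slides. A branch is cut off as soon as alpha meets or exceeds beta.

```python
# Alpha-beta pruning: alpha = highest value seen so far on a max level,
# beta = lowest value seen so far on a min level.

def alphabeta(game, state, depth, alpha, beta, maximizing, evaluate):
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:
        return evaluate(state)
    if maximizing:
        value = float('-inf')
        for m in game.legal_moves(state):
            value = max(value, alphabeta(game, game.result(state, m),
                                         depth - 1, alpha, beta, False, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break           # beta cut-off: the min player won't allow this
        return value
    else:
        value = float('inf')
        for m in game.legal_moves(state):
            value = min(value, alphabeta(game, game.result(state, m),
                                         depth - 1, alpha, beta, True, evaluate))
            beta = min(beta, value)
            if alpha >= beta:
                break           # alpha cut-off, as on slide 23 below
        return value
```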

  15. Alpha-Beta Example • minimax(A,0,4): expand the root A (max level); α is not yet set • call stack: A • [figure: four-level game tree rooted at max node A; min nodes B, C, D (0), E; max nodes F through M, with terminal values G = -5, H = 3, I = 8, L = 2; min-level nodes N (4), O, P (9), Q (-6), R (0), S (3), T (5), U (-7), V (-9); W (-3) and X (-5) are O’s children at the depth limit]

  16. Alpha-Beta Example • minimax(B,1,4): expand B (min level); β is not yet set • call stack: B, A

  17. Alpha-Beta Example • minimax(F,2,4): expand F (max level); α is not yet set • call stack: F, B, A

  18. Alpha-Beta Example • minimax(N,3,4): N is a terminal state with value 4 • call stack: N, F, B, A

  19. Alpha-Beta Example • minimax(F,2,4) is returned to: F’s α = 4, the maximum seen so far

  20. Alpha-Beta Example • minimax(O,3,4): expand O (min level); β is not yet set • call stack: O, F, B, A

  21. Alpha-Beta Example • minimax(W,4,4): W is at the depth limit, with value -3 • call stack: W, O, F, B, A

  22. Alpha-Beta Example • minimax(O,3,4) is returned to: O’s β = -3, the minimum seen so far

  23. Alpha-Beta Example • O’s β (-3) ≤ F’s α (4): stop expanding O (alpha cut-off) – the remaining child X (-5) is pruned

  24. Alpha-Beta Example • Why? A smart opponent will choose W or worse, so O’s upper bound is -3 • So the computer shouldn’t choose O (-3), since N (4) is better

  25. Alpha-Beta Example • minimax(F,2,4) is returned to: α is not changed (maximizing), so F’s value stays 4

  26. Alpha-Beta Example • minimax(B,1,4) is returned to: B’s β = 4, the minimum seen so far

  27. Effectiveness of Alpha-Beta Search • Effectiveness depends on the order in which successors are examined: more effective if the best successors are examined first • Worst case: • ordered so that no pruning takes place • no improvement over exhaustive search • Best case: • each player’s best move is evaluated first (left-most) • In practice, performance is closer to the best case than to the worst case

  28. Effectiveness of Alpha-Beta Search • In practice we often get O(b^(d/2)) rather than O(b^d) • the same as having a branching factor of √b, since (√b)^d = b^(d/2) • For example, in chess: • the effective branching factor goes from b ≈ 35 to b ≈ 6 • permits a much deeper search in the same time • makes computer chess competitive with humans

  29. Dealing with Limited Time • In real games, there is usually a time limit T on making a move • How do we take this into account? • we cannot stop alpha-beta midway and expect to use the results with any confidence • so we could set a conservative depth limit that guarantees we will find a move in time < T • but then the search may finish early, and the opportunity to search deeper is wasted

  30. Dealing with Limited Time • In practice, iterative deepening search (IDS) is used • run alpha-beta search with an increasing depth limit • when the clock runs out, use the solution found by the last completed alpha-beta search (i.e., the deepest search that was completed); see the sketch below
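A sketch of this scheme, reusing the hypothetical alphabeta and Game sketches from earlier. Each fully completed iteration overwrites the remembered best move; a production engine would additionally abort the in-progress search at the deadline rather than let it finish.

```python
import time

# Iterative deepening under a time limit: keep re-running depth-limited
# alpha-beta with a growing depth and return the move from the deepest
# search that completed before the clock ran out.

def timed_search(game, state, evaluate, time_limit):
    deadline = time.monotonic() + time_limit
    best, depth = None, 1
    while time.monotonic() < deadline:
        best = max(game.legal_moves(state),
                   key=lambda m: alphabeta(game, game.result(state, m),
                                           depth - 1, float('-inf'),
                                           float('inf'), False, evaluate))
        depth += 1   # `best` now holds the result of a completed search
    return best
```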

  31. The Horizon Effect • Sometimes disaster lurks just beyond the search depth • e.g., the computer captures the queen, but a few moves later the opponent checkmates (i.e., wins) • The computer has a limited horizon; it cannot see that this significant event could happen • How do you avoid catastrophic losses due to “short-sightedness”? • quiescence search • secondary search

  32. The Horizon Effect • Quiescence search (see the sketch below) • when the evaluation is changing frequently, look deeper than the depth limit • look for a point when the game “quiets down” • Secondary search • 1. find the best move looking to depth d • 2. look k steps beyond to verify that it still looks good • 3. if it doesn’t, repeat step 2 for the next best move
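A minimal sketch of the quiescence idea, under stated assumptions: the helpers `is_quiet` (the evaluation is stable, e.g., no pending captures) and `noisy_moves` (the forcing moves worth extending) are hypothetical and game-specific, not from the lecture.

```python
# Quiescence search sketch: at the depth limit, keep expanding "noisy"
# positions until the game quiets down, then trust the SBE.

def quiescence(game, state, maximizing, evaluate):
    if game.is_terminal(state):
        return game.utility(state)
    if is_quiet(state):
        return evaluate(state)       # position is stable: SBE is trustworthy
    # Assumes noisy_moves() is non-empty whenever the position is not quiet.
    values = [quiescence(game, game.result(state, m), not maximizing, evaluate)
              for m in noisy_moves(state)]
    return max(values) if maximizing else min(values)
```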

  33. Book Moves • Build a database of opening moves, end games, and studied configurations • If the current state is in the database, use the database: • to determine the next move • to evaluate the board • Otherwise, do alpha-beta search (see the sketch below)
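A sketch of the lookup-then-search policy just described. The `opening_book` table (hashable states mapped to moves, built offline) is a hypothetical stand-in for the database.

```python
# Book-move lookup: consult the precomputed table first,
# fall back to alpha-beta search on a miss.

opening_book = {}   # state -> best known move, built offline (assumed)

def choose_move(game, state, evaluate, depth):
    if state in opening_book:
        return opening_book[state]       # database hit: no search needed
    return max(game.legal_moves(state),
               key=lambda m: alphabeta(game, game.result(state, m),
                                       depth - 1, float('-inf'),
                                       float('inf'), False, evaluate))
```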

  34. Examples of Algorithms which Learn to Play Well • Checkers: A. L. Samuel, “Some Studies in Machine Learning Using the Game of Checkers,” IBM Journal of Research and Development, 3(3):210-229, 1959 • Learned by playing a copy of itself thousands of times • Used only an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz • Successful enough to compete well at human tournaments

  35. Examples of Algorithms which Learn to Play Well • Backgammon: G. Tesauro and T. J. Sejnowski, “A Parallel Network that Learns to Play Backgammon,” Artificial Intelligence, 39(3):357-390, 1989 • Also learns by playing copies of itself • Uses a non-linear evaluation function – a neural network • Rated one of the top three players in the world

  36. Non-deterministic Games • Some games involve chance, for example: • roll of dice • spin of a game wheel • deal of cards from a shuffled deck • How can we handle games with random elements? • The game tree representation is extended to include chance nodes: • agent moves • chance nodes • opponent moves

  37. Non-deterministic Games • The game tree representation is extended with chance levels: max, chance, min • [figure: max node A over two 50/50 chance nodes (each outcome with probability 0.5); below them min nodes B (β = 2), C (β = 6), D (β = 0), E (β = -4), with leaves 7, 2 / 9, 6 / 5, 0 / 8, -4]

  38. Non-deterministic Games • Weight each score by the probability that the move occurs • Use the expected value for a move: the probability-weighted sum over the possible random outcomes • In the tree above: left chance node = 0.5 · 2 + 0.5 · 6 = 4; right chance node = 0.5 · 0 + 0.5 · (-4) = -2

  39. Non-deterministic Games • Choose the move with the highest expected value: A’s α = 4 (the left 50/50 node beats the right one at -2); see the sketch below
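A sketch of the resulting expectiminimax backup over the assumed Game interface. Chance nodes back up the probability-weighted average of their children instead of a max or min; `outcomes(state)` yielding (probability, next_state) pairs is a hypothetical helper.

```python
# Expectiminimax sketch: max/min levels as in minimax, chance levels
# back up the expected value over random outcomes.

def expectiminimax(game, state, node_type):
    if game.is_terminal(state):
        return game.utility(state)
    if node_type == 'max':
        return max(expectiminimax(game, game.result(state, m), 'chance-min')
                   for m in game.legal_moves(state))
    if node_type == 'min':
        return min(expectiminimax(game, game.result(state, m), 'chance-max')
                   for m in game.legal_moves(state))
    # Chance node: expected value = sum of p * value over random outcomes.
    next_type = 'min' if node_type == 'chance-min' else 'max'
    return sum(p * expectiminimax(game, s, next_type)
               for p, s in outcomes(state))
```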

  40. Non-deterministic Games • Non-determinism increases the branching factor • 21 distinct rolls with 2 dice • The value of lookahead diminishes: as depth increases, the probability of reaching a given node decreases • alpha-beta pruning is less effective • TD-Gammon: • depth-2 search • very good heuristic • plays at world-champion level

  41. Computers can play Grandmaster Chess • “Deep Blue” (IBM) • Parallel processor, 32 nodes • Each node has 8 dedicated VLSI “chess chips” • Can search 200 million configurations/second • Uses minimax, alpha-beta, and sophisticated heuristics • It can currently search to 14 ply (i.e., 7 pairs of moves) • Can avoid the horizon effect by searching as deep as 40 ply • Uses book moves

  42. Computers can play Grandmaster Chess • Kasparov vs. Deep Blue, May 1997 • 6-game full-regulation chess match sponsored by ACM • Kasparov lost the match: 1 win and 3 draws against Deep Blue’s 2 wins and 3 draws (2½–3½) • This was a historic achievement for computer chess, being the first time a computer became the best chess player on the planet • Note that Deep Blue plays by “brute force” (i.e., raw power from computer speed and memory); it uses relatively little that is similar to human intuition and cleverness

  43. Status of Computers in Other Deterministic Games • Checkers/Draughts • the current world champion is Chinook • can beat any human (beat Tinsley in 1994) • uses alpha-beta search and book moves (> 443 billion positions) • Othello • computers can easily beat the world experts • Go • branching factor b ≈ 360 (very large!) • $2 million prize for any system that can beat a world expert
