
Adversarial Search: Games



  1. Adversarial Search: Games Chapter 6 Feb 21 2007

  2. What are, why study games? • Games are a form of multi-agent environment • What do other agents do and how do they affect our success? • Competitive multi-agent environments give rise to adversarial search (games) • Why study games? • Fun; historically entertaining • Interesting because they are hard • Easy to represent; agents are restricted to a small number of actions

  3. Games vs. Search • Search – no adversary • Solution is (heuristic) method for finding goal • Heuristics and CSP techniques can find optimal solution • Evaluation function: estimate of cost from start to goal through given node • Examples: path planning, scheduling activities • Games – adversary • Solution is a strategy: specifies move for every possible opponent reply • Time limits force an approximate solution • Evaluation function: “goodness” of game position • Examples: chess, checkers, Othello, backgammon

  4. Historical Contributions • computer considers possible lines of play • Babbage, 1846 • algorithm for perfect play • Zermelo, 1912, Von Neumann, 1944 • finite horizon, approximate evaluation • Zuse, 1945, Wiener, 1948, Shannon, 1950 • first chess program • Turing, 1951 • machine learning to improve evaluation accuracy • Samuel, 1952-57 • pruning to allow deeper search • McCarthy, 1956

  5. Types of Games • Games are classified along two dimensions: deterministic vs. chance, and perfect information vs. imperfect information

  6. Game setup • Two players: MAX and MIN • MAX moves first; the players take turns until the game is over • Winner gets an award, loser gets a penalty • Games as search: • Initial state: e.g. the board configuration of chess • Successor function: list of (move, state) pairs specifying legal moves • Terminal test: is the game finished? • Utility function: numerical value of terminal states, e.g. win (+1), lose (−1) and draw (0) in tic-tac-toe • MAX uses the search tree to determine its next move
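
The formulation above maps directly onto code. The following is a minimal Python sketch of a game-as-search interface; the class and method names (Game, initial_state, successors, terminal_test, utility) are illustrative assumptions, not part of the slides.

    class Game:
        """Skeleton of the games-as-search formulation on this slide."""

        def initial_state(self):
            # e.g. the starting board configuration
            raise NotImplementedError

        def successors(self, state):
            # yields (move, state) pairs, one per legal move in `state`
            raise NotImplementedError

        def terminal_test(self, state):
            # True when the game is finished
            raise NotImplementedError

        def utility(self, state):
            # numeric value of a terminal state from MAX's point of view,
            # e.g. +1 win, -1 loss, 0 draw in tic-tac-toe
            raise NotImplementedError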

  7. Example Game Tree • ply = half-move = one player's turn; two plies make one full move

  8. Optimal strategies • Find the contingent strategy for MAX assuming an infallible MIN opponent. • Assumption: both players play optimally! • Given a game tree, the optimal strategy can be determined from the minimax value of each node:
  MINIMAX-VALUE(n) =
    UTILITY(n)                                   if n is a terminal node
    max_{s ∈ Successors(n)} MINIMAX-VALUE(s)     if n is a MAX node
    min_{s ∈ Successors(n)} MINIMAX-VALUE(s)     if n is a MIN node

  9. Two-Ply Game Tree

  10. Two-Ply Game Tree

  11. The minimax decision Two-Ply Game Tree Minimax maximizes the worst-case outcome for MAX.

  12. What if MIN does not play optimally? • The definition of optimal play for MAX assumes MIN plays optimally: it maximizes the worst-case outcome for MAX. • But if MIN does not play optimally, MAX will do even better (this can be shown).

  13. Minimax Algorithm
  function MINIMAX-DECISION(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state)
    return the action in SUCCESSORS(state) with value v

  function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for a, s in SUCCESSORS(state) do
      v ← MAX(v, MIN-VALUE(s))
    return v

  function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do
      v ← MIN(v, MAX-VALUE(s))
    return v
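
A runnable Python version of the minimax routines above, written against the hypothetical Game interface sketched earlier (the successors/terminal_test/utility names are assumptions):

    import math

    def minimax_decision(state, game):
        # Choose the move whose resulting state has the highest minimax value.
        move, _succ = max(game.successors(state),
                          key=lambda ms: min_value(ms[1], game))
        return move

    def max_value(state, game):
        if game.terminal_test(state):
            return game.utility(state)
        v = -math.inf
        for _move, s in game.successors(state):
            v = max(v, min_value(s, game))
        return v

    def min_value(state, game):
        if game.terminal_test(state):
            return game.utility(state)
        v = math.inf
        for _move, s in game.successors(state):
            v = min(v, max_value(s, game))
        return v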

  14. Properties of minimax • Complete? Yes (if the tree is finite) • Optimal? Yes (against an optimal opponent) • Time complexity? O(b^m) • Space complexity? O(bm) (depth-first) • For chess, b ≈ 35 and m ≈ 100 for "reasonable" games, so an exact solution is completely infeasible (roughly 35^100 ≈ 10^154 nodes)

  15. Multiplayer games • Some games allow more than two players • Scalar minimax values then become vectors, with one component per player; at each node, the player to move chooses the successor that maximizes its own component (see the sketch below)
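
As a hedged illustration of the vector backup, the sketch below assumes two extra helpers on the game object, utility_vector (utilities for all players at a terminal) and to_move (index of the player whose turn it is); both names are made up for this example.

    def multiplayer_value(state, game):
        # Returns a tuple of backed-up values, one component per player.
        if game.terminal_test(state):
            return game.utility_vector(state)   # assumed helper
        player = game.to_move(state)            # assumed helper: index of player to move
        best = None
        for _move, s in game.successors(state):
            v = multiplayer_value(s, game)
            # The player to move picks the successor that maximizes its own component.
            if best is None or v[player] > best[player]:
                best = v
        return best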

  16. Pruning • Minimax problem: the number of game states is exponential in the number of moves • Alpha-beta pruning: remove branches that cannot influence the final decision • Alpha-Beta example: the range of possible values at each node starts as [-∞, +∞]

  17. Alpha-Beta Example (continued) [-∞,+∞] [-∞,3]

  18. Alpha-Beta Example (continued) [-∞,+∞] [-∞,3]

  19. Alpha-Beta Example (continued) [3,+∞] [3,3]

  20. Alpha-Beta Example (continued) [3,+∞] This node is worse for MAX [3,3] [-∞,2] prune this subtree, already determined to be worse choice than left subtree

  21. Alpha-Beta Example (continued) [3,14] [3,3] [-∞,2] [-∞,14]

  22. Alpha-Beta Example (continued) [3,5] [3,3] [-∞,2] [-∞,5]

  23. Alpha-Beta Example (continued) [3,3] [2,2] [3,3] [−∞,2]

  24. Alpha-Beta Example (continued) [3,3] [2,2] [3,3] [-∞,2]

  25. Alpha-Beta Algorithm
  function ALPHA-BETA-SEARCH(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state, −∞, +∞)
    return the action in SUCCESSORS(state) with value v

  function MAX-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for a, s in SUCCESSORS(state) do
      v ← MAX(v, MIN-VALUE(s, α, β))
      if v ≥ β then return v
      α ← MAX(α, v)
    return v

  26. Alpha-Beta Algorithm
  function MIN-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do
      v ← MIN(v, MAX-VALUE(s, α, β))
      if v ≤ α then return v
      β ← MIN(β, v)
    return v
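
A runnable Python version of the pruning search above, again written against the assumed Game interface:

    import math

    def alpha_beta_search(state, game):
        # Pick the move leading to the successor with the highest pruned minimax value.
        best_move, best_val = None, -math.inf
        alpha, beta = -math.inf, math.inf
        for move, s in game.successors(state):
            v = ab_min_value(s, game, alpha, beta)
            if v > best_val:
                best_move, best_val = move, v
            alpha = max(alpha, best_val)
        return best_move

    def ab_max_value(state, game, alpha, beta):
        if game.terminal_test(state):
            return game.utility(state)
        v = -math.inf
        for _move, s in game.successors(state):
            v = max(v, ab_min_value(s, game, alpha, beta))
            if v >= beta:           # MIN above will never allow this branch
                return v
            alpha = max(alpha, v)
        return v

    def ab_min_value(state, game, alpha, beta):
        if game.terminal_test(state):
            return game.utility(state)
        v = math.inf
        for _move, s in game.successors(state):
            v = min(v, ab_max_value(s, game, alpha, beta))
            if v <= alpha:          # MAX above will never allow this branch
                return v
            beta = min(beta, v)
        return v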

  27. General alpha-beta pruning • Consider a node n somewhere in the tree • If the player has a better choice at • the parent node of n • or at any choice point further up • then n will never be reached in actual play. • Hence, once enough is known about n, it can be pruned.

  28. Why is it called α-β? • α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX • If v is worse than α, MAX will avoid it → prune that branch • β is defined similarly for MIN

  29. Final Comments about Alpha-Beta Pruning • Pruning does not affect the final result • Entire subtrees can be pruned • Good move ordering improves the effectiveness of pruning • With "perfect ordering," time complexity is O(b^(m/2)) • Effective branching factor of sqrt(b)!! • Alpha-beta pruning can therefore look ahead roughly twice as far as minimax in the same amount of time • Repeated states are again possible • Store them in memory: a transposition table (see the sketch below) • A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)
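
A minimal sketch of a transposition table: a Python dict that caches the value of states already searched, so repeated states are not re-expanded. It assumes hashable states and a to_move helper returning 'MAX' or 'MIN'; real game programs also store search depth and bound information, which this simplified sketch omits.

    def minimax_value_tt(state, game, table):
        # `table` maps previously searched states to their backed-up values.
        if state in table:
            return table[state]
        if game.terminal_test(state):
            v = game.utility(state)
        elif game.to_move(state) == 'MAX':      # assumed helper
            v = max(minimax_value_tt(s, game, table)
                    for _m, s in game.successors(state))
        else:
            v = min(minimax_value_tt(s, game, table)
                    for _m, s in game.successors(state))
        table[state] = v
        return v

    # usage: value = minimax_value_tt(start, game, table={})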

  30. Resource Limits • Minimax and alpha-beta pruning require evaluating too many leaf nodes • This may be impractical within a reasonable amount of time • SHANNON (1950): • Cut off the search earlier (replace TERMINAL-TEST by CUTOFF-TEST) • Apply a heuristic evaluation function EVAL (replacing the utility function of alpha-beta)

  31. Cutting off search • Change: • if TERMINAL-TEST(state) then return UTILITY(state) into • if CUTOFF-TEST(state, depth) then return EVAL(state) • This introduces a fixed depth limit • The depth is selected so that the amount of time used will not exceed what the rules of the game allow • When the cutoff occurs, the heuristic evaluation is applied (see the sketch below)
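
The change amounts to swapping the terminal test for a depth check and the utility for a heuristic evaluation. A sketch under the same assumed interface, with an illustrative depth limit and an assumed game.eval heuristic:

    import math

    CUTOFF_DEPTH = 4   # illustrative; chosen so the search fits the time the rules allow

    def max_value_cutoff(state, game, alpha, beta, depth):
        if game.terminal_test(state) or depth >= CUTOFF_DEPTH:
            return game.eval(state)   # assumed heuristic EVAL (exact utility at true terminals)
        v = -math.inf
        for _move, s in game.successors(state):
            v = max(v, min_value_cutoff(s, game, alpha, beta, depth + 1))
            if v >= beta:
                return v
            alpha = max(alpha, v)
        return v

    def min_value_cutoff(state, game, alpha, beta, depth):
        if game.terminal_test(state) or depth >= CUTOFF_DEPTH:
            return game.eval(state)
        v = math.inf
        for _move, s in game.successors(state):
            v = min(v, max_value_cutoff(s, game, alpha, beta, depth + 1))
            if v <= alpha:
                return v
            beta = min(beta, v)
        return v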

  32. Cutting off search • Does it work in practice? • With roughly 10^6 nodes per move and b ≈ 35, b^m ≈ 10^6 gives m ≈ 4 (35^4 ≈ 1.5 × 10^6) • 4-ply lookahead is a hopeless chess player! • 4-ply ≈ human novice • 8-ply ≈ typical PC, human master • 12-ply ≈ Deep Blue, Kasparov

  33. Heuristic EVAL • Idea: produce an estimate of the expected utility of the game from a given position • Performance depends on the quality of EVAL • Requirements: • EVAL should order terminal nodes in the same way as UTILITY • Computation must not take too long • For non-terminal states, EVAL should be strongly correlated with the actual chance of winning • Only useful for quiescent states (no wild swings in value in the near future)

  34. Heuristic EVAL example • For chess, typically a weighted linear sum of features: Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s) • e.g., w1 = 9 with f1(s) = (number of white queens) − (number of black queens)
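
A short Python sketch of such a weighted linear evaluation, with made-up feature functions and a made-up board representation (the helper names and the assumption that a board maps squares to piece codes like 'wQ' or 'bP' are illustrative, not from the slides):

    # Illustrative material-count feature: white count minus black count for one piece type.
    def material_diff(board, piece):
        return (sum(1 for p in board.values() if p == 'w' + piece)
                - sum(1 for p in board.values() if p == 'b' + piece))

    WEIGHTS = {'Q': 9, 'R': 5, 'B': 3, 'N': 3, 'P': 1}   # conventional point values

    def eval_material(board):
        # Eval(s) = sum_i w_i * f_i(s), here with one feature per piece type.
        return sum(w * material_diff(board, piece) for piece, w in WEIGHTS.items())

    # e.g. eval_material({'d1': 'wQ', 'e1': 'wK', 'e8': 'bK'}) == 9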

  35. Heuristic difficulties Heuristic counts pieces won

  36. Horizon effect Fixed-depth search thinks it can avoid the queening move

  37. Deterministic games in practice • Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions. • Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses a very sophisticated evaluation function, and undisclosed methods for extending some lines of search up to 40 ply. • Othello: human champions refuse to compete against computers, which are too good. • Go: human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

  38. Games that include chance • The game tree now contains chance nodes (for the dice rolls) in addition to MAX and MIN nodes • Possible moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16) and (5-11, 11-16) • Rolls [1,1] and [6,6] each have chance 1/36; all other rolls have chance 1/18 • Cannot calculate a definite minimax value, only an expected value

  39. Expected minimax value
  EXPECTED-MINIMAX-VALUE(n) =
    UTILITY(n)                                                if n is a terminal node
    max_{s ∈ Successors(n)} EXPECTED-MINIMAX-VALUE(s)         if n is a MAX node
    min_{s ∈ Successors(n)} EXPECTED-MINIMAX-VALUE(s)         if n is a MIN node
    Σ_{s ∈ Successors(n)} P(s) · EXPECTED-MINIMAX-VALUE(s)    if n is a chance node
  These equations can be backed up recursively all the way to the root of the game tree.
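
A compact Python sketch of this backup, assuming the game object can distinguish MAX, MIN and chance nodes and that chance nodes expose (probability, state) pairs; node_type and chance_successors are illustrative names, not part of the slides:

    def expectiminimax(state, game):
        # Backs up utilities through MAX, MIN and chance nodes recursively.
        if game.terminal_test(state):
            return game.utility(state)
        kind = game.node_type(state)          # assumed: 'MAX', 'MIN' or 'CHANCE'
        if kind == 'MAX':
            return max(expectiminimax(s, game) for _m, s in game.successors(state))
        if kind == 'MIN':
            return min(expectiminimax(s, game) for _m, s in game.successors(state))
        # Chance node: probability-weighted average over outcomes.
        return sum(p * expectiminimax(s, game)
                   for p, s in game.chance_successors(state))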

  40. Position evaluation with chance nodes • In the figure, the leaf values have the same ordering but different scaling: on the left, move A1 wins; on the right, move A2 wins • The outcome of the evaluation function must not change when values are scaled differently • Behavior is preserved only by a positive linear transformation of EVAL
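
A tiny numeric illustration of the point, using made-up leaf values (not the figure's): a monotone but non-linear rescaling can flip which move a chance node prefers, while a positive linear rescaling cannot.

    # Two candidate moves, each leading to a 50/50 chance node over two leaf evaluations.
    move_a = [0, 10]    # expected value 5.0
    move_b = [6, 6]     # expected value 6.0  -> prefer B

    def expect(leaves):
        # expected value of an equiprobable chance node
        return sum(leaves) / len(leaves)

    print(expect(move_a), expect(move_b))            # 5.0 6.0   -> prefer B

    # Order-preserving but non-linear rescaling (squaring) flips the preference:
    print(expect([x**2 for x in move_a]),
          expect([x**2 for x in move_b]))            # 50.0 36.0 -> prefer A

    # A positive linear rescaling (3x + 7) preserves the preference:
    print(expect([3*x + 7 for x in move_a]),
          expect([3*x + 7 for x in move_b]))         # 22.0 25.0 -> prefer B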

  41. MIN wins a trick, MIN wins another trick, then MAX wins the remaining tricks → even score

  42. Same result, reached with a different card

  43. With the card uncertain at this point, MAX has a 50% chance of losing

  44. "Averaging over clairvoyance"

  45. Summary • Games illustrate several important points about AI • Perfection is unattainable → must approximate • Good idea to think about what to think about • Uncertainty constrains the assignment of values to states • Games are to AI as Grand Prix racing is to automobile design
