

  1. Previously…

  2. Adversarial Search AIMA Chapter 5.1 – 5.5

  3. AI vs. Human Players: the State of the Art To Be Updated next year!

  4. Deterministic Games in Practice • Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions. • Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses a very sophisticated evaluation function, and undisclosed methods for extending some lines of search up to 40 ply. • Othello: human champions refuse to compete against computers, which are too good.

  5. Outline • Adversarial search problems (aka games) • Optimal (i.e., Minimax) decisions • α-β pruning • Imperfect, real-time decisions

  6. Games vs. Search Problems

  7. Let’s Play! • Two players in a zero-sum game: • Winner gets paid and loser pays. • Easy to think in terms of a max player and min player • Player 1 wants to maximize value (MAX player) • Player 2 wants to minimize value (MIN player)

  8. Game: Problem Formulation A game is defined by 7 components: • Initial state s0 • States • Players: Player(s) defines which player has the move in state s • Actions: Actions(s) returns the set of legal moves in state s • Transition model: Result(s, a) returns the state that results from making move a in state s.

  9. Game: Problem Formulation A game is defined by 7 components: • Terminal test: Terminal-Test(s) returns true if the game is over and false otherwise • Terminal states: states where the game has ended • Utility function: Utility(s, p) gives the final numeric value for player p of a game that ends in terminal state s • Chess: White wins (+1); Black wins (−1); draw (0).

  10. Game Tree (2-Player, Deterministic, Turn-Taking) • Key property: zero-sum game • Loosely, it means that there’s a loser for every winner • Total utility over all agents sums to zero (or to a constant) • Aka constant-sum game • Makes the game adversarial • Non-zero-sum games?

  11. Example: Game of NIM Several piles of sticks are given. We represent the configuration of the piles by a monotone sequence of integers, such as (1, 2, 3). A player may remove, in one turn, any number of sticks from one pile. Thus, (1, 2, 3) would become (1, 2, 2) if the player were to remove one stick from the last pile. The player who takes the last stick loses. • Represent the NIM game as a game tree.
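A concrete way to see the game tree is to generate it. The sketch below is not from the slides: the sorted-tuple representation and the helper names `moves` and `loses` are illustrative choices. It enumerates the children of a NIM position and solves the misère game by backward induction.

```python
from functools import lru_cache

# A NIM position is a monotone (sorted) tuple of pile sizes, e.g. (1, 2, 3).
def moves(piles):
    """All positions reachable in one turn: remove 1..k sticks from one pile."""
    children = set()
    for i, k in enumerate(piles):
        for take in range(1, k + 1):
            child = tuple(sorted(piles[:i] + (k - take,) + piles[i + 1:]))
            children.add(child)
    return children

@lru_cache(maxsize=None)
def loses(piles):
    """True iff the player to move loses under optimal play
    (misère rule: whoever takes the last stick loses)."""
    piles = tuple(p for p in piles if p > 0)
    if not piles:
        return False  # the opponent just took the last stick, so they lost
    return all(not loses(child) for child in moves(piles))

print(loses((1, 2, 3)))  # True: the player to move loses from (1, 2, 3)
```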

  12. Player Strategies A strategy for player i specifies what player i will do at every node of the tree where it is their turn to move. Note that a strategy must specify behavior even in states that will never be reached!

  13. Winning Strategy A strategy for player 1 is called winning if, for any strategy of player 2, the game ends with player 1 as the winner. A strategy for player 1 is called non-losing if, for any strategy of player 2, the game ends in either a tie or a win for player 1. Theorem (von Neumann): in the game of chess, exactly one of the following is true: • White has a winning strategy • Black has a winning strategy • Each player has a non-losing strategy.

  14. Optimal Strategy at a Node: Minimax Intuitively, • MAX chooses the move that maximizes the minimum payoff • MIN chooses the move that minimizes the maximum payoff

  15.–18. Minimax Play (Subgame Perfect Nash Equilibrium) Backwards Induction [game-tree figure built up across four slides: alternating P1 and P2 decision nodes, with values propagated from the leaves upward]

  19. Minimax Play (Subgame Perfect Nash Equilibrium) What are the optimal strategies in this game? [same game-tree figure]

  20. Properties of Minimax • Complete? Yes (if the game tree is finite) • Optimal? Yes (optimal gameplay) • Time and space: like DFS, O(b^m) time and O(bm) space

  21. Minimax Algorithm • Runs in time polynomial in the tree size • Returns a subgame perfect Nash equilibrium: the best action at every choice node. Are we done here?
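As a reference point, here is a minimal minimax sketch in Python. The `game` interface (`is_terminal`, `utility`, `player`, `actions`, `result`) mirrors the problem formulation of slides 8–9, but the method names are assumptions, not code from the lecture.

```python
def minimax(state, game):
    """Minimax value of `state`; `game` supplies the 7-component formulation:
    terminal test, utility, player-to-move, legal actions, transition model."""
    if game.is_terminal(state):
        return game.utility(state)
    values = [minimax(game.result(state, a), game) for a in game.actions(state)]
    return max(values) if game.player(state) == "MAX" else min(values)

def minimax_decision(state, game):
    """Best action at `state`: argmax (or argmin, for MIN) over child values."""
    pick = max if game.player(state) == "MAX" else min
    return pick(game.actions(state),
                key=lambda a: minimax(game.result(state, a), game))
```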

  22. Backwards Induction • Game trees are huge: chess has a game tree with roughly 10^154 nodes (planet Earth has about 10^50 atoms) • Impossible to expand the entire tree

  23. α-β Pruning • Basic idea: “If you have an idea that is surely bad, don't take the time to see how truly awful it is.” -- Pat Winston • Maintain a lower bound and an upper bound on the values of, respectively, MAX’s and MIN’s nodes seen thus far. We can prune subtrees that will never affect the minimax decision.

  24.–29. [α-β pruning worked example: a single game tree with alternating MIN and MAX layers, explored left to right across six slides]

  30. [same figure] No point exploring the last two nodes! Choosing that branch results in a loss of at least 7…

  31. α-β Pruning • For each MAX node n, α is the highest observed value found on the path from the root to n; initially α = −∞ • For each MIN node n, β is the lowest observed value found on the path from the root to n; initially β = +∞ • Alpha prune: given a MIN node n, stop searching below n if some MAX ancestor of n has α ≥ n’s current value • Beta prune: given a MAX node n, stop searching below n if some MIN ancestor of n has β ≤ n’s current value

  32. Analysis of α-β Pruning • When we prune a branch, it never affects the final outcome. • Good move ordering improves the effectiveness of pruning • “Perfect” ordering: time complexity = O(b^(m/2)) • Good pruning strategies allow us to search twice as deep! • Chess: a simple ordering (checks, then captures, then forward moves, then backward moves) gets you close to the best-case result. • It makes sense to have good expansion-order heuristics. • Random ordering: time complexity ≈ O(b^(3m/4)) for moderate b

  33. Summary: α-β Pruning Algorithm • Initially, α = −∞, β = +∞ • α is the max value along the search path containing the current node • β is the min value along the search path containing the current node • If a MIN node has value ≤ α, no need to explore further. • If a MAX node has value ≥ β, no need to explore further.
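Putting slides 23–33 together, a compact α-β implementation might look like the following; the `game` interface is the same assumption carried over from the minimax sketch after slide 21.

```python
import math

def alphabeta(state, game, alpha=-math.inf, beta=math.inf):
    """Minimax value of `state` with alpha-beta pruning.
    alpha: highest value some MAX ancestor can already guarantee.
    beta:  lowest value some MIN ancestor can already guarantee."""
    if game.is_terminal(state):
        return game.utility(state)
    if game.player(state) == "MAX":
        value = -math.inf
        for a in game.actions(state):
            value = max(value, alphabeta(game.result(state, a), game, alpha, beta))
            if value >= beta:        # beta prune: the MIN above never allows this
                return value
            alpha = max(alpha, value)
        return value
    value = math.inf
    for a in game.actions(state):
        value = min(value, alphabeta(game.result(state, a), game, alpha, beta))
        if value <= alpha:           # alpha prune: the MAX above already has better
            return value
        beta = min(beta, value)
    return value
```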

  34. Time Limit • Problem: very large search space in typical games • Solution: α-β pruning removes large parts of the search space • Unresolved problem: the maximum depth of the tree is still out of reach • Standard solutions: • evaluation function = estimated expected utility of a state • cutoff test: e.g., a depth limit

  35. Heuristic Minimax Value Run minimax down to depth d; from depth d on, use the evaluation function (rather than further expansion) to assign values to nodes.

  36. Evaluation Functions • An evaluation function is a mapping from game states to real values: Eval : S → ℝ • So far: Eval(s) = Utility(s), defined only at terminal states • Let’s move beyond that… • Should be cheap to compute • For non-terminal states, must be strongly correlated with the actual chances of winning • Tic-Tac-Toe: e.g., (number of lines still open for MAX) − (number of lines still open for MIN)

  37. Evaluation Functions • Chess: • Alan Turing’s evaluation function: Eval = w / b, where w is the point value of White’s pieces and b is the point value of Black’s pieces. • Modern evaluation functions: a weighted sum of position features, Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s) • Example features: piece count, piece placement, controlled squares, etc. • Deep Blue has about 8,000 features. • How do we determine the weights? Do they change dynamically?
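As an illustration of the weighted-sum idea, here is a tiny material-only evaluator; the point values below are conventional textbook material weights, not Deep Blue's actual features, and the difference form is a common alternative to Turing's quotient w / b.

```python
# Conventional material values; real engines tune many more weighted features.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_eval(white_pieces, black_pieces):
    """Linear evaluation from White's perspective: weighted material difference.
    Inputs are lists of piece letters, e.g. ["Q", "P", "P"]."""
    w = sum(PIECE_VALUES.get(p, 0) for p in white_pieces)
    b = sum(PIECE_VALUES.get(p, 0) for p in black_pieces)
    return w - b

print(material_eval(["Q", "P", "P"], ["R", "B"]))  # 11 - 8 = 3: White is ahead
```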

  38. Evaluation Functions • Suppose that Eval(s) = w1 f1(s) + … + wn fn(s) • It is possible that for two different states s and s′ we have fi(s) = fi(s′) for all i. • The evaluation function does not differentiate between s and s′! • … but it can tell us an expected utility for all nodes sharing the same feature values.

  39. Let S(f) be the set of all states whose feature values are as specified in the vector f. • “All states where White has two pawns but Black has a bishop.” • Suppose we know that in this case • Black wins a fraction pB of games • White wins a fraction pW of games • the rest of the games end in a draw • Then the expected utility for Black is pB · (+1) + pW · (−1) + (1 − pB − pW) · 0 • The evaluation function need not return actual expected values; it just needs to maintain the relative order of states.

  40. Cutting Off Search • Modify the minimax or α-β pruning algorithms by replacing • Terminal-Test(state) with Cutoff-Test(state, depth) • Utility(state) with Eval(state) • Can also be combined with iterative deepening
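In code, the substitution on this slide amounts to two changed lines in the minimax sketch from slide 21; the plain depth-limit cutoff and the `eval_fn` parameter below are illustrative choices, since any cutoff test would do.

```python
def h_minimax(state, game, depth, eval_fn):
    """Heuristic minimax: Terminal-Test becomes Cutoff-Test(state, depth),
    and Utility(state) becomes Eval(state) at the cutoff frontier."""
    if game.is_terminal(state):
        return game.utility(state)   # true utilities still apply when reached
    if depth == 0:                   # Cutoff-Test: here, a plain depth limit
        return eval_fn(state)        # Eval: heuristic estimate of utility
    values = [h_minimax(game.result(state, a), game, depth - 1, eval_fn)
              for a in game.actions(state)]
    return max(values) if game.player(state) == "MAX" else min(values)
```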

  41. Stochastic Games • Many games have randomization: • Backgammon • Settlers of Catan • Poker • How do we deal with uncertainty? • Can we still use minimax?

  42. Adding Chance Layers Calculate the expected value of a state (MUCH harder than in deterministic games)
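One standard way to add chance layers is expectiminimax: chance nodes return the probability-weighted average of their children's values. Below is a sketch under the same assumed `game` interface as before, extended with an assumed `outcomes(state)` method yielding (probability, successor) pairs.

```python
def expectiminimax(state, game):
    """Minimax with chance layers: MAX and MIN nodes behave as before,
    while chance nodes average their children's values by probability."""
    if game.is_terminal(state):
        return game.utility(state)
    mover = game.player(state)
    if mover == "CHANCE":
        return sum(p * expectiminimax(s, game) for p, s in game.outcomes(state))
    values = [expectiminimax(game.result(state, a), game)
              for a in game.actions(state)]
    return max(values) if mover == "MAX" else min(values)
```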

  43. Example: Coin Toss Game (example from Brown and Sandholm, “Safe and Nested Subgame Solving for Imperfect-Information Games”, NIPS 2017) • An unbiased coin is tossed; the MAX player sees the result, the MIN player doesn’t. • The MAX player can choose to: • Sell the coin, getting a reward of 0.5 if the coin lands on heads and −0.5 otherwise. • Play, in which case it is player MIN’s turn. • On MIN’s turn, MIN must guess either “heads” or “tails”, and gets a reward of 1 for a correct guess.

  44. Example: Coin Toss Game (continued) [game-tree figure: a chance node tosses the coin (Heads/Tails); in each branch MAX chooses Sell or Play, and after Play MIN guesses Heads or Tails]

  45. Example: Coin Toss Game (continued) • We allow players to choose random (mixed) strategies • MIN’s optimal strategy depends on the value of MAX’s Sell option - a value outside MIN’s subtree! “How desperate was MAX to let me play?” • Computing optimal strategies in perfect-information games: no need to know what happens outside your subtree!
