Adversarial Search

Adversarial Search Chapter 6 Section 1 – 4

Search in an Adversarial Environment • Iterative deepening and A* are useful for single-agent search problems • What if there are TWO agents? • Goals in conflict: • Adversarial Search • Especially common in AI: • Goals in direct conflict • I.e., GAMES.

Games vs. search problems • "Unpredictable" opponent → a strategy must specify a move for every possible opponent reply • Time limits → unlikely to find goal, must approximate • Efficiency matters a lot • HARD. • In AI, typically "zero sum": one player wins exactly as much as the other player loses.

Types of games • Perfect information, deterministic: Chess, Checkers, Othello, Tic-Tac-Toe • Perfect information, chance: Monopoly, Backgammon • Imperfect information, chance: Bridge, Poker, Scrabble

Tic-Tac-Toe • Tic Tac Toe is one of the classic AI examples. Let's play some. • Tic Tac Toe version 1. • http://www.ourvirtualmall.com/tictac.htm • Tic Tac Toe version 2. • http://thinks.com/java/tic-tac-toe/tic-tac-toe.htm • Try them both, at various levels of difficulty. • What kind of strategy are you using? • What kind does the computer seem to be using? • Did you win? Lose?

Problem Definition • Formally define a two-person game as: • Two players, called MAX and MIN. • Alternate moves • At end of game winner is rewarded and loser penalized. • Game has • Initial State: board position and player to go first • Successor Function: returns (move, state) pairs • All legal moves from the current state • Resulting state • Terminal Test • Utility function for terminal states. • Initial state plus legal moves define game tree.
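
The components of the problem definition can be sketched for Tic-Tac-Toe as follows. This is only an illustrative sketch: the board encoding (a tuple of 9 cells) and all function names are assumptions made for this example, not part of the original slides.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: a state is a tuple of 9 cells, each "X", "O", or " ".
State = Tuple[str, ...]

# The eight "3-lengths": rows, columns, and diagonals.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def initial_state() -> State:
    """Initial state: empty board; X (playing MAX) goes first."""
    return (" ",) * 9


def to_move(state: State) -> str:
    """Players alternate, so X moves whenever the mark counts are equal."""
    return "X" if state.count("X") == state.count("O") else "O"


def winner(state: State) -> Optional[str]:
    """Return "X" or "O" if that player has completed a line, else None."""
    for a, b, c in LINES:
        if state[a] != " " and state[a] == state[b] == state[c]:
            return state[a]
    return None


def successors(state: State) -> List[Tuple[int, State]]:
    """Successor function: all (move, resulting state) pairs."""
    player = to_move(state)
    return [(i, state[:i] + (player,) + state[i + 1:])
            for i, cell in enumerate(state) if cell == " "]


def terminal_test(state: State) -> bool:
    """The game ends on a win or when the board is full."""
    return winner(state) is not None or " " not in state


def utility(state: State) -> int:
    """Utility from MAX's (X's) point of view: +1 win, -1 loss, 0 draw."""
    return {"X": 1, "O": -1, None: 0}[winner(state)]
```

Note that `successors` does not itself check for terminal states; a search routine would call `terminal_test` first.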

Tic Tac Toe Game tree

Optimal Strategies • Optimal strategy is a sequence of moves leading to the desired goal state. • MAX's strategy is affected by MIN's play. • So MAX needs a strategy that yields the best possible payoff, assuming optimal play on MIN's part. • Determined by looking at the MINIMAX value for each node in the game tree.

Minimax • Perfect play for deterministic games • Idea: choose move to position with highest minimax value = best achievable payoff against best play • E.g., 2-ply game:

Minimax algorithm
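
A minimal sketch of minimax on an explicit game tree is given below. The nested-list tree encoding, the function names, and the leaf values of the 2-ply example are assumptions chosen for illustration; the algorithm on the original slide may be written differently.

```python
from typing import List, Union

# Hypothetical encoding: a leaf is an int (utility for MAX),
# an internal node is a list of child subtrees.
Tree = Union[int, List["Tree"]]


def minimax_value(node: Tree, maximizing: bool) -> int:
    """Return the minimax value of `node`; `maximizing` says whose turn it is."""
    if isinstance(node, int):          # terminal state: return its utility
        return node
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def minimax_decision(root: List[Tree]) -> int:
    """Return the index of MAX's move leading to the highest minimax value."""
    return max(range(len(root)),
               key=lambda i: minimax_value(root[i], False))


# Assumed 2-ply example: MAX picks one of three moves, then MIN replies.
TWO_PLY = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

print(minimax_value(TWO_PLY, True))   # 3
print(minimax_decision(TWO_PLY))      # 0
```

MIN would answer the three moves with 3, 2, and 2 respectively, so MAX's best achievable payoff against best play is 3, reached by the first move.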

Properties of minimax • Complete? Yes (if tree is finite) • Optimal? Yes (against an optimal opponent) • Time complexity? O(b^m), where b is the branching factor and m is the maximum depth • Space complexity? O(bm) (depth-first exploration) • For chess, b ≈ 35, m ≈ 100 for "reasonable" games → exact solution completely infeasible • Even tic-tac-toe is much too complex to diagram here, although it's small enough to implement.

Pruning the Search • “If you have an idea that is surely bad, don't take the time to see how truly awful it is.” -- Pat Winston • Minimax exponential with # of moves; not feasible in real-life • But we can PRUNE some branches. • Alpha-Beta pruning • If it is clear that a branch can't improve on the value we already have, stop analysis.

α-β pruning example

Properties of α-β • Pruning does not affect final result • Good move ordering improves effectiveness of pruning • With "perfect ordering," time complexity = O(b^(m/2)) → doubles the depth of search that can be carried out for a given level of resources. • A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)

Why is it called α-β? • α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX • If v is worse than α, MAX will avoid it → prune that branch • Define β similarly for MIN

The α-β algorithm
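
The pruning rule can be sketched as below, again on an explicit nested-list tree. The tree encoding, the `visited` list used to record which leaves are examined, and the example leaf values are assumptions for illustration, not taken from the original slide.

```python
import math
from typing import List, Union

# Hypothetical encoding: a leaf is an int, an internal node is a list of children.
Tree = Union[int, List["Tree"]]


def alphabeta(node: Tree, alpha: float, beta: float,
              maximizing: bool, visited: List[int]) -> float:
    """Minimax value of `node`, skipping branches that cannot matter.

    alpha: best value found so far for MAX along the current path.
    beta:  best value found so far for MIN along the current path.
    visited: collects the leaves actually evaluated (for inspection only).
    """
    if isinstance(node, int):
        visited.append(node)
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:              # MIN above will never allow this
                return v               # prune remaining children
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:                 # MAX above already has something better
            return v                   # prune remaining children
        beta = min(beta, v)
    return v


# Assumed 2-ply example tree.
TWO_PLY = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited: List[int] = []
value = alphabeta(TWO_PLY, -math.inf, math.inf, True, visited)
print(value)    # 3
print(visited)  # [3, 12, 8, 2, 14, 5, 2]
```

In this example the leaves 4 and 6 are never examined: once MIN can force 2 in the second branch, that branch cannot beat the 3 MAX already has. The third branch is searched in full only because its leaves happen to be in a poor order, which illustrates why move ordering matters.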

"Informed" Search • Alpha-Beta still not feasible for large game spaces. • Can we improve on performance with domain knowledge? • Yes -- if we have a useful heuristic for evaluating game states. • Conceptually analogous to A* for single-agent search.

Resource limits • Suppose we have 100 secs and explore 10^4 nodes/sec → 10^6 nodes per move • Standard approach: • cutoff test: e.g., depth limit (perhaps add quiescence search) • evaluation function = estimated desirability of position

Evaluation function • Evaluation function or static evaluator is used to evaluate the “goodness” of a game position. • Contrast with heuristic search where the evaluation function was a non-negative estimate of the cost from the start node to a goal and passing through the given node • The zero-sum assumption allows us to use a single evaluation function to describe the goodness of a board with respect to both players. • f(n) >> 0: position n good for me and bad for you • f(n) << 0: position n bad for me and good for you • f(n) near 0: position n is a neutral position • f(n) = +infinity: win for me • f(n) = -infinity: win for you DesJardins: www.cs.umbc.edu/671/fall03/slides/c8-9_games.ppt

Evaluation function examples • Example of an evaluation function for Tic-Tac-Toe: f(n) = [# of 3-lengths open for me] - [# of 3-lengths open for you] where a 3-length is a complete row, column, or diagonal • Alan Turing’s function for chess • f(n) = w(n)/b(n) where w(n) = sum of the point value of white’s pieces and b(n) = sum of black’s • Most evaluation functions are specified as a weighted sum of position features: f(n) = w1*feat1(n) + w2*feat2(n) + ... + wk*featk(n) • Example features for chess are piece count, piece placement, squares controlled, etc. • Deep Blue (which beat Garry Kasparov in 1997) had over 8000 features in its evaluation function DesJardins: www.cs.umbc.edu/671/fall03/slides/c8-9_games.ppt
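
The Tic-Tac-Toe evaluation function above can be sketched in a few lines. This sketch assumes that a 3-length is "open" for a player when it contains none of the opponent's marks, and that a board is a 9-character string of "X", "O", and spaces; both are assumptions made for this example.

```python
# The eight 3-lengths: rows, columns, and diagonals (cells indexed 0..8).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def open_lines(board: str, player: str) -> int:
    """Count the 3-lengths that contain no mark of `player`'s opponent."""
    opponent = "O" if player == "X" else "X"
    return sum(1 for line in LINES
               if all(board[i] != opponent for i in line))


def evaluate(board: str, me: str = "X") -> int:
    """f(n) = [# of 3-lengths open for me] - [# of 3-lengths open for you]."""
    you = "O" if me == "X" else "X"
    return open_lines(board, me) - open_lines(board, you)


empty = " " * 9
x_centre = " " * 4 + "X" + " " * 4           # X in the centre square
x_centre_o_corner = "O" + " " * 3 + "X" + " " * 4

print(evaluate(empty))              # 0
print(evaluate(x_centre))           # 4
print(evaluate(x_centre_o_corner))  # 1
```

With X alone in the centre, all 8 lines remain open for X while only the 4 lines that avoid the centre remain open for O, giving f(n) = 4.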

Cutting off search • MinimaxCutoff is identical to MinimaxValue except • Terminal? is replaced by Cutoff? • Utility is replaced by Eval • Does it work in practice? For chess: b^m = 10^6, b = 35 → m = 4 • 4-ply lookahead is a hopeless chess player! • 4-ply ≈ human novice • 8-ply ≈ typical PC, human master • 12-ply ≈ Deep Blue, Kasparov
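
A rough sketch of the two substitutions (Cutoff? for Terminal?, Eval for Utility) follows. The toy game used to exercise it, in which states are integers forming a binary tree and the evaluation is `n % 5`, is entirely hypothetical and exists only to make the example runnable. The last lines check the chess arithmetic from the slide.

```python
import math
from typing import Callable, List


def minimax_cutoff(state: int, depth: int, maximizing: bool,
                   successors: Callable[[int], List[int]],
                   cutoff: Callable[[int, int], bool],
                   evaluate: Callable[[int], float]) -> float:
    """Minimax with Cutoff? in place of Terminal? and Eval in place of Utility."""
    if cutoff(state, depth):
        return evaluate(state)         # estimated, not exact, value
    values = [minimax_cutoff(s, depth + 1, not maximizing,
                             successors, cutoff, evaluate)
              for s in successors(state)]
    return max(values) if maximizing else min(values)


# Hypothetical toy game: node n has children 2n+1 and 2n+2.
def toy_successors(n: int) -> List[int]:
    return [2 * n + 1, 2 * n + 2]


def toy_eval(n: int) -> float:
    return n % 5


def depth_limit(limit: int) -> Callable[[int, int], bool]:
    """Cutoff test that stops the search at a fixed depth."""
    return lambda state, depth: depth >= limit


value = minimax_cutoff(0, 0, True, toy_successors, depth_limit(2), toy_eval)
print(value)  # 3

# Chess arithmetic: with b = 35 and a budget of 10^6 nodes,
# the reachable depth m satisfies 35^m = 10^6.
print(35 ** 4)                # 1500625, i.e. roughly 10^6
print(math.log(10 ** 6, 35))  # a little under 4 plies
```

A quiescence search, mentioned under resource limits, would replace the fixed-depth `cutoff` with a test that also considers whether the position is "quiet"; that refinement is not shown here.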

Deterministic games in practice • Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions. • Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply. • Othello: human champions refuse to compete against computers, who are too good. • Go: human champions refuse to compete against computers, who are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

Games of chance • Backgammon is a two-player game with uncertainty. • Players roll dice to determine what moves to make. • White has just rolled 5 and 6 and has four legal moves: • 5-10, 5-11 • 5-11, 19-24 • 5-10, 10-16 • 5-11, 11-16 • Such games are good for exploring decision making in adversarial problems involving skill and luck. DesJardins: www.cs.umbc.edu/671/fall03/slides/c8-9_games.ppt

Decision-Making in Non-Deterministic Games • Probable state tree will depend on chance as well as moves chosen • Add "chance" nodes to the max and min nodes. • Compute expected values for chance nodes.

Game Trees with Chance Nodes • Chance nodes (shown as circles) represent random events • For a random event with N outcomes, each chance node has N distinct children; a probability is associated with each • (For 2 dice, there are 21 distinct outcomes) • Use minimax to compute values for MAX and MIN nodes • Use expected values for chance nodes • For chance nodes over a max node, as in C: • expectimax(C) = Σ_i P(d_i) * maxvalue(i) • For chance nodes over a min node: • expectimin(C) = Σ_i P(d_i) * minvalue(i) DesJardins: www.cs.umbc.edu/671/fall03/slides/c8-9_games.ppt
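
The combination of minimax and expected values can be sketched as follows. The tagged-tuple tree encoding, the function names, and the small example tree are assumptions made for illustration. The helper `dice_outcomes` checks the claim that two dice give 21 distinct outcomes, treating a roll such as 5-6 as the same outcome as 6-5.

```python
from fractions import Fraction
from typing import Dict, Tuple


def expectiminimax(node) -> float:
    """Value of a game tree containing MAX, MIN, and chance nodes.

    Hypothetical encoding:
      number                          -> leaf value
      ("max", [children])             -> MAX node
      ("min", [children])             -> MIN node
      ("chance", [(prob, child), ..]) -> chance node
    """
    if isinstance(node, (int, float)):
        return float(node)
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        # Expected value: sum over outcomes of P(d_i) * value of child i.
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind!r}")


def dice_outcomes() -> Dict[Tuple[int, int], Fraction]:
    """Distinct unordered rolls of two dice with their probabilities."""
    return {(a, b): Fraction(1 if a == b else 2, 36)
            for a in range(1, 7) for b in range(a, 7)}


# A chance node over two MAX nodes, each outcome equally likely.
tree = ("chance", [(0.5, ("max", [2, 4])),
                   (0.5, ("max", [0, 6]))])
print(expectiminimax(tree))        # 5.0
print(len(dice_outcomes()))        # 21
print(sum(dice_outcomes().values()))  # 1
```

Doubles have probability 1/36 each and every other roll 1/18, which is why the children of a dice chance node cannot simply be averaged with equal weights.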

Meaning of the evaluation function • [Figure: two game trees with chance nodes, each with 2 outcomes with prob {.9, .1}; A1 is the best move in one tree, A2 is the best move in the other] • Dealing with probabilities and expected values means we have to be careful about the “meaning” of values returned by the static evaluator. • Note that a “relative-order preserving” change of the values would not change the decision of minimax, but could change the decision with chance nodes. • Positive linear transformations are OK DesJardins: www.cs.umbc.edu/671/fall03/slides/c8-9_games.ppt
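
A small numerical sketch makes the point concrete. The leaf values below are assumed for illustration (the figure's actual values are not recoverable from the text); only the outcome probabilities {.9, .1} come from the slide.

```python
from typing import Dict, List


def expected_value(probs: List[float], values: List[float]) -> float:
    """Expected value of a chance node with the given outcome values."""
    return sum(p * v for p, v in zip(probs, values))


def best_move(moves: Dict[str, List[float]], probs: List[float]) -> str:
    """Pick the move whose chance node has the highest expected value."""
    return max(moves, key=lambda name: expected_value(probs, moves[name]))


PROBS = [0.9, 0.1]

# Assumed evaluator outputs for the two outcomes of each move.
original = {"A1": [2, 3], "A2": [1, 4]}

# Order-preserving but non-linear change: 1 < 2 < 3 < 4 becomes 1 < 20 < 30 < 400.
rescaled = {"A1": [20, 30], "A2": [1, 400]}

# Positive linear transformation v -> 10*v + 5 of the original values.
linear = {name: [10 * v + 5 for v in vals] for name, vals in original.items()}

print(best_move(original, PROBS))  # A1
print(best_move(rescaled, PROBS))  # A2
print(best_move(linear, PROBS))    # A1
```

With the original values, A1 has expected value 2.1 against 1.3 for A2. After the order-preserving rescaling, A2 wins with 40.9 against 21, even though the ranking of individual leaves is unchanged. The linear rescaling leaves the decision at A1.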

Summary • Games are fun to work on! • They illustrate several important points about AI • perfection is unattainable → must approximate • good idea to think about what to think about
