Adversarial Search CMPT 420 / CMPG 720

Outline • Game playing • Game trees • Minimax • Alpha-beta pruning

Games vs. search problems • competitive environments: agents’ goals are in conflict • adversarial search problems (games)

Types of Games • deterministic vs. chance • perfect information vs. imperfect information

Games • deterministic, fully observable, turn-taking, two-player, zero-sum games • Utility values at the end are equal and opposite • Example: tic-tac-toe

Game Search Formulation • Two players, MAX and MIN, take turns (with MAX playing first) • S0: initial state • Player(s): which player has the move in state s • Actions(s): set of legal moves in state s • Result(s,a): transition model, the state that results from taking action a in state s • Terminal-test(s): true if the game is over in state s, false otherwise (states where it is true are terminal states) • Utility(s,p): utility function that defines the final value, for player p, of a game that ends in terminal state s • zero-sum games: the total payoff to all players is the same for every outcome of the game
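The formulation above can be sketched for tic-tac-toe. This is a minimal illustration, not code from the slides: the board encoding (a 9-tuple), the choice of "X" for MAX, the +1/0/-1 utilities, and the helper winner are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

State = Tuple[Optional[str], ...]  # 9 cells, each "X", "O", or None (assumed encoding)

S0: State = (None,) * 9  # initial state: empty board

# Index triples that form a winning line on the 3x3 board.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def player(s: State) -> str:
    """Which player has the move; MAX plays "X" and moves first."""
    return "X" if s.count("X") == s.count("O") else "O"

def actions(s: State) -> List[int]:
    """Set of legal moves: indices of the empty cells."""
    return [i for i, cell in enumerate(s) if cell is None]

def result(s: State, a: int) -> State:
    """Transition model: the state after the player to move marks cell a."""
    return s[:a] + (player(s),) + s[a + 1:]

def winner(s: State) -> Optional[str]:
    """Hypothetical helper: the mark that completed a line, if any."""
    for i, j, k in LINES:
        if s[i] is not None and s[i] == s[j] == s[k]:
            return s[i]
    return None

def terminal_test(s: State) -> bool:
    """True when someone has won or the board is full."""
    return winner(s) is not None or not actions(s)

def utility(s: State, p: str) -> int:
    """Final value of terminal state s for player p: +1 win, -1 loss, 0 draw."""
    w = winner(s)
    if w is None:
        return 0
    return 1 if w == p else -1
```

With these values the utilities of the two players in any terminal state are equal and opposite, so they always sum to zero.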

Game tree (1-player)

Partial Game Tree for Tic-Tac-Toe

Optimal strategies • MAX uses the search tree to determine its next move. • Assumption: both players play optimally. • Given a game tree, the optimal strategy can be determined from the minimax value of each node.

Minimax • The minimax value of a node is the utility (for MAX) of being in the corresponding state, assuming that both players play optimally. • Minimax(s) = • Utility(s, MAX) if Terminal-test(s) • max over a in Actions(s) of Minimax(Result(s,a)) if Player(s) = MAX • min over a in Actions(s) of Minimax(Result(s,a)) if Player(s) = MIN
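The recursive definition can be sketched directly on an explicit tree. In this illustration a tree is assumed to be a nested list whose integer leaves are utilities for MAX; the leaf values are chosen only for the example.

```python
from typing import List, Union

Tree = Union[int, List["Tree"]]  # a leaf utility, or a list of subtrees

def minimax(node: Tree, max_to_move: bool = True) -> int:
    """Minimax value of node; levels alternate between MAX and MIN."""
    if isinstance(node, int):          # terminal state: return its utility
        return node
    values = [minimax(child, not max_to_move) for child in node]
    return max(values) if max_to_move else min(values)

# MAX at the root, two MIN nodes below it, four leaves.
tree = [[2, 7], [1, 8]]
print(minimax(tree))  # max(min(2, 7), min(1, 8)) = 2
```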

Optimal Play • Figure: game tree with a MAX root, two MIN nodes, and leaf values 2, 7, 1, 8; the MIN nodes have values 2 and 1 and the root has value 2. The optimal play is highlighted.

Two-Ply Game Tree • The minimax decision • Minimax maximizes the worst-case outcome for MAX.

What if MIN does not play optimally? • Definition of optimal play for MAX assumes MIN plays optimally: maximizes worst-case outcome for MAX. • But if MIN does not play optimally, MAX can do even better.
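A small numeric check of this point, under assumed values: the dictionary below is a hypothetical two-ply tree that maps each MAX action to the leaf utilities MIN can choose from.

```python
# Hypothetical two-ply tree: MAX action -> utilities reachable by MIN's reply.
tree = {"a1": [3, 12, 8], "a2": [2, 4, 6], "a3": [14, 5, 2]}

# Worst case for MAX after each action (MIN plays optimally).
worst_case = {action: min(leaves) for action, leaves in tree.items()}

# The minimax decision maximizes that worst case.
best_action = max(worst_case, key=worst_case.get)
guaranteed = worst_case[best_action]

# Whatever MIN replies, MAX gets at least the guaranteed value,
# and strictly more if MIN picks a suboptimal reply.
assert all(outcome >= guaranteed for outcome in tree[best_action])
print(best_action, guaranteed, tree[best_action])
```

Here the minimax move guarantees 3, and a suboptimal reply from MIN gives MAX 12 or 8 instead.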

Minimax Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
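A runnable sketch of the pseudocode, assuming the game is given as an explicit table. TREE, its node names, and its leaf values are hypothetical; successors returns (action, state) pairs as in the pseudocode.

```python
import math
from typing import Dict, List, Tuple, Union

Node = Union[int, str]  # an integer leaf utility, or the name of an internal node

# Hypothetical explicit game tree: internal node -> list of (action, successor).
TREE: Dict[str, List[Tuple[str, Node]]] = {
    "A": [("a1", "B"), ("a2", "C"), ("a3", "D")],
    "B": [("b1", 3), ("b2", 12), ("b3", 8)],
    "C": [("c1", 2), ("c2", 4), ("c3", 6)],
    "D": [("d1", 14), ("d2", 5), ("d3", 2)],
}

def terminal_test(state: Node) -> bool:
    return isinstance(state, int)

def utility(state: Node) -> int:
    return state  # leaves store their utility for MAX directly

def successors(state: Node) -> List[Tuple[str, Node]]:
    return TREE[state]

def max_value(state: Node) -> float:
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for _action, s in successors(state):
        v = max(v, min_value(s))
    return v

def min_value(state: Node) -> float:
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for _action, s in successors(state):
        v = min(v, max_value(s))
    return v

def minimax_decision(state: Node) -> str:
    """Action whose successor has the highest MIN-VALUE."""
    action, _s = max(successors(state), key=lambda pair: min_value(pair[1]))
    return action

print(minimax_decision("A"))  # a1
```

Instead of computing v first and then searching for the action with that value, this version picks the best action directly, which yields the same decision.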

Properties of minimax • Complete? Yes (if the tree is finite) • Optimal? Yes (against an optimal opponent) • Time complexity? O(b^m), where b is the branching factor and m is the maximum depth of the tree • Space complexity? O(bm) (depth-first exploration) • For chess, b ≈ 35 and m ≈ 100 for "reasonable" games, so an exact solution is infeasible
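To get a feel for the chess figures, the arithmetic can be checked with exact integers. This only evaluates b^m and b·m for the numbers quoted above; it is not a measurement of any real program.

```python
b, m = 35, 100          # branching factor and depth quoted for chess

time_nodes = b ** m     # nodes examined by minimax, O(b^m)
space_nodes = b * m     # nodes kept in memory by depth-first search, O(bm)

digits = len(str(time_nodes))
print(digits)           # 155, i.e. b^m is on the order of 10^154
print(space_nodes)      # 3500
```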

Alpha-Beta Pruning • Problem with minimax search: the number of nodes to examine is exponential in the depth of the tree • Can we cut the exponent in half? • It is possible to compute the minimax decision without looking at every node. • pruning: eliminate some parts of the tree

Alpha-beta pruning • We can improve on the performance of the minimax algorithm through alpha-beta pruning. • Figure: a MAX root over two MIN nodes; the first MIN node has leaves 2 and 7, the second has leaves 1 and ?. • We don’t need to compute the value at the ? node. • No matter what it is, it can’t affect the value of the root node: the first MIN node is worth 2, and the second is worth at most 1.

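Alpha-Beta Example • Do DFS until the first leaf • Each node is labelled with its range of possible values; MIN nodes are numbered left to right • root [-∞,+∞] • first MIN node [-∞,+∞]

Alpha-Beta Example (continued) • root [-∞,+∞] • first MIN node [-∞,3]

Alpha-Beta Example (continued) • root [-∞,+∞] • first MIN node [3,3]

Alpha-Beta Example (continued) • root [3,+∞] • first MIN node [3,3]

Alpha-Beta Example (continued) • root [3,+∞] • first MIN node [3,3] • second MIN node [-∞,+∞]

Alpha-Beta Example (continued) • root [3,+∞] • first MIN node [3,3] • second MIN node [-∞,2]

Alpha-Beta Example (continued) • root [3,+∞] • first MIN node [3,3] • second MIN node [-∞,2] • This node is worse for MAX, so its remaining leaves are pruned

Alpha-Beta Example (continued) • root [3,+∞] • first MIN node [3,3] • second MIN node [-∞,2] • third MIN node [-∞,+∞]

Alpha-Beta Example (continued) • root [3,14] • first MIN node [3,3] • second MIN node [-∞,2] • third MIN node [-∞,14]

Alpha-Beta Example (continued) • root [3,5] • first MIN node [3,3] • second MIN node [-∞,2] • third MIN node [-∞,5]

Alpha-Beta Example (continued) • first MIN node [3,3] • second MIN node [-∞,2] • third MIN node [2,2]

Alpha-Beta Example (continued) • root [3,3] • first MIN node [3,3] • second MIN node [-∞,2] • third MIN node [2,2]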

α-β pruning example • Minimax(root) • = max(min(3,12,8),min(2,x,y),min(14,5,2)) • = max(3,min(2,x,y),2) • = max(3,z,2) where z = min(2,x,y) ≤ 2 • = 3
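The claim that the root value does not depend on x and y can be checked numerically. The range of integers tried below is an arbitrary choice for the illustration.

```python
def root_value(x: int, y: int) -> int:
    """Minimax value of the root for given values of the two unexamined leaves."""
    return max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))

# Try every pair (x, y) in an arbitrary range of integers.
values = {root_value(x, y) for x in range(-50, 51) for y in range(-50, 51)}
print(values)  # {3}
```

Because min(2, x, y) can never exceed 2, the middle branch always loses to the first branch's value of 3.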

α-β pruning • We made the same minimax decision without ever evaluating two of the leaf nodes! • The minimax decision is independent of the values of those leaves (x and y). • It is possible to prune entire subtrees.

Why is it called α-β? • α = value of the best choice found so far at any choice point along the path for MAX • If v is worse than α, MAX will avoid it, so prune that branch • Define β similarly for MIN

Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, -∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v

Alpha-Beta Algorithm

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
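A minimal Python sketch of MAX-VALUE and MIN-VALUE with α and β, assuming the tree is a nested list of integer leaves. The leaf values, including the 4 and 6 standing in for unexamined leaves, are hypothetical, and the visited list is added only to show which leaves are evaluated.

```python
import math
from typing import List, Union

Tree = Union[int, List["Tree"]]  # a leaf utility, or a list of subtrees

visited: List[int] = []  # leaves actually evaluated, in order (for illustration)

def max_value(node: Tree, alpha: float, beta: float) -> float:
    if isinstance(node, int):
        visited.append(node)
        return node
    v = -math.inf
    for child in node:
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:          # MIN above would never allow this branch
            return v
        alpha = max(alpha, v)
    return v

def min_value(node: Tree, alpha: float, beta: float) -> float:
    if isinstance(node, int):
        visited.append(node)
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child, alpha, beta))
        if v <= alpha:         # MAX above already has something at least as good
            return v
        beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
root = max_value(tree, -math.inf, math.inf)
print(root)     # 3
print(visited)  # [3, 12, 8, 2, 14, 5, 2]
```

Seven of the nine leaves are evaluated; the 4 and 6 in the second subtree are never looked at, and the root value is the same as plain minimax would give.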

Comments: Alpha-Beta Pruning • Pruning does not affect the final result. • Entire subtrees can be pruned. • Good move ordering improves the effectiveness of pruning. • With “perfect ordering,” time complexity is O(b^(m/2)) • With good move ordering, alpha-beta pruning can look about twice as far ahead as minimax in the same amount of time
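The effect of move ordering can be illustrated by counting evaluated leaves. This is a compact sketch on hypothetical nested-list trees; alphabeta and leaves_evaluated are names invented for the example, and the two trees differ only in the order of the leaves in the last subtree.

```python
import math
from typing import List, Tuple, Union

Tree = Union[int, List["Tree"]]

def alphabeta(node: Tree, alpha: float, beta: float,
              is_max: bool, counter: List[int]) -> float:
    """Alpha-beta value of node; counter[0] counts the leaves evaluated."""
    if isinstance(node, int):
        counter[0] += 1
        return node
    v = -math.inf if is_max else math.inf
    for child in node:
        w = alphabeta(child, alpha, beta, not is_max, counter)
        if is_max:
            v = max(v, w)
            if v >= beta:
                return v
            alpha = max(alpha, v)
        else:
            v = min(v, w)
            if v <= alpha:
                return v
            beta = min(beta, v)
    return v

def leaves_evaluated(tree: Tree) -> Tuple[float, int]:
    counter = [0]
    value = alphabeta(tree, -math.inf, math.inf, True, counter)
    return value, counter[0]

original = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
reordered = [[3, 12, 8], [2, 4, 6], [2, 5, 14]]  # MIN's best reply tried first

print(leaves_evaluated(original))   # (3, 7)
print(leaves_evaluated(reordered))  # (3, 5)
```

Both orderings give the same root value, but trying the strongest reply first cuts the last subtree after a single leaf.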
