Pruning in Artificial Intelligence: Efficient Minimax Value Computation in Game Trees

Artificial Intelligence: Representation and Problem Solving
Multi-agent Systems (2): Basic Concepts in Game Theory
15-381 / 681
Instructors: Fei Fang (This Lecture) and Dave Touretzky
feifang@cmu.edu, Wean Hall 4126

Recap: Pruning Fei Fang

Recap: Pruning
• Alpha-Beta ($\alpha$-$\beta$ pruning): compute the minimax value of a game tree (or a specific state) with minimal exploration
• During the search, at state $s$, record in $v$ the min (for a MIN node) or max (for a MAX node) value of its successors that have been explored
  • $v$ is a lower bound of the minimax value for a MAX node (initialized as $-\infty$) and an upper bound for a MIN node (initialized as $+\infty$)
• During the search, at state $s$, record the lower bound ($\alpha$, initialized as $-\infty$) and upper bound ($\beta$, initialized as $+\infty$) of the minimax value based on what has been searched so far (not only its explored successors, but also the other explored branches of the tree)
• As more successors of $s$ are explored, update the values of $v$, $\alpha$, $\beta$
• Prune a subtree starting at a node $s$ if $v$ is outside of the range $(\alpha, \beta)$
  • For the MAX player, prune if $v \ge \beta$
  • For the MIN player, prune if $v \le \alpha$
• $\alpha$ and $\beta$ are bounds determined globally, and $v$ is a bound determined locally; if there is a conflict, it only means the local branch is useless and can be pruned

Recap: Pruning
• $\alpha$: lower bound of the minimax value
• $\beta$: upper bound of the minimax value
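The recap above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the nested-list tree encoding, the function name `alphabeta`, and the `visited` bookkeeping list are all assumptions made for the example.

```python
import math

def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, visited=None):
    """Return the minimax value of `node` using alpha-beta pruning.

    Assumed encoding: a node is either a number (leaf utility) or a list
    of child nodes. `visited`, if given, collects the leaves evaluated.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    # v is the local bound: best value among the successors explored so far.
    v = -math.inf if is_max else math.inf
    for child in node:
        child_value = alphabeta(child, not is_max, alpha, beta, visited)
        if is_max:
            v = max(v, child_value)
            if v >= beta:      # MAX node: v already exceeds the upper bound
                return v       # prune the remaining successors
            alpha = max(alpha, v)
        else:
            v = min(v, child_value)
            if v <= alpha:     # MIN node: v already below the lower bound
                return v       # prune the remaining successors
            beta = min(beta, v)
    return v

# A MAX root with three MIN children (illustrative numbers).
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited_leaves = []
root_value = alphabeta(tree, is_max=True, visited=visited_leaves)
```

In this run the second MIN node is cut off after its first leaf, because its value 2 is already below the lower bound 3 established by the first branch.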

Outline
• Overview
• Notations
• Basic Concepts
• Solution Concepts
  • Dominant Strategy
  • Nash Equilibrium (NE)
  • Maximin Strategy
  • Minimax Strategy
• Minimax Theorem

From Games To Game Theory
• Game theory is the study of strategic decision making (of more than one player)
• Used in economics, political science, etc.
[Figure: portraits of John von Neumann, John Nash, and Heinrich Freiherr von Stackelberg]
[Figure: winners of the Nobel Memorial Prize in Economic Sciences]

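Normal-Form Games (also called matrix form, strategic form, or standard form)
• A game in normal form consists of
  • Set of players $N = \{1, \dots, n\}$
  • Set of strategies $S_i$ for each player $i \in N$
  • Payoffs / Utility functions $u_i(\mathbf{s})$ for each player $i \in N$
• Players move simultaneously and the game ends immediately afterwards
• Strategy profile $\mathbf{s} = (s_1, \dots, s_n)$, $s_i \in S_i$
• Outcome / Utility profile $\mathbf{u} = (u_1, \dots, u_n)$
• Zero-Sum Game: $\sum_{i \in N} u_i(\mathbf{s}) = 0$ for every strategy profile $\mathbf{s}$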

Example Normal-Form Games
• Prisoner's Dilemma (PD)
  • Two suspects are charged with a crime
  • If both Cooperate: 1 year in jail each
  • If one Defects (rats out the other person) and the other Cooperates: 0 years for the defector (D), 3 years for the cooperator (C)
  • If both Defect: 2 years in jail each
• Variation: Split or Steal https://www.youtube.com/watch?v=p3Uos2fzIJ0

Example Normal-Form Games
• Rock-Paper-Scissors (RPS)
  • Rock beats Scissors
  • Scissors beats Paper
  • Paper beats Rock

Some Example Games
• Football vs Concert (FvsC)
  • Historically known as Battle of the Sexes
  • If football together: Alex 2, Berry 1
  • If concert together: Alex 1, Berry 2
  • If not together: Alex 0, Berry 0

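Normal-Form Games
• In many cases, each player $i$ has a finite set of actions $A_i$, and player $i$'s strategy set is $S_i = \Delta(A_i)$, i.e., the set of probability distributions over actions
• Action profile $\mathbf{a} = (a_1, \dots, a_n)$, $a_i \in A_i$
• Set of action profiles $A = A_1 \times \dots \times A_n$
• Utility function can be represented as $u_i(a_1, \dots, a_n)$, or simply written as $u_i(\mathbf{a})$
• Let $s_i(a_i)$ be the probability of choosing action $a_i$, then the expected utility is $u_i(\mathbf{s}) = \sum_{\mathbf{a} \in A} u_i(\mathbf{a}) \prod_{j \in N} s_j(a_j)$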
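The expected-utility sum over action profiles can be computed directly. The sketch below is only an illustration: the function name `expected_utility`, the dictionary encoding of strategies, and the Rock-Paper-Scissors payoffs (+1 for a win, -1 for a loss, 0 for a tie) are assumptions, since the slide's payoff matrix is not reproduced in the text.

```python
from fractions import Fraction
from itertools import product

def expected_utility(payoff, strategies):
    """Expected utility of one player under a mixed strategy profile.

    payoff: dict mapping an action profile (tuple) to that player's utility.
    strategies: one dict per player, mapping action -> probability.
    """
    total = 0
    for profile in product(*(s.keys() for s in strategies)):
        prob = 1
        for strategy, action in zip(strategies, profile):
            prob *= strategy[action]   # players randomize independently
        total += prob * payoff[profile]
    return total

# Assumed RPS payoffs for player 1: win = +1, loss = -1, tie = 0.
BEATS = {"R": "S", "S": "P", "P": "R"}
U1 = {(a, b): 0 if a == b else (1 if BEATS[a] == b else -1)
      for a in "RPS" for b in "RPS"}

always_rock = {"R": Fraction(1), "P": Fraction(0), "S": Fraction(0)}
rock_or_scissors = {"R": Fraction(1, 2), "P": Fraction(0), "S": Fraction(1, 2)}
uniform = {a: Fraction(1, 3) for a in "RPS"}

eu_rock = expected_utility(U1, [always_rock, rock_or_scissors])
eu_uniform = expected_utility(U1, [uniform, uniform])
```

Using `Fraction` keeps the probabilities exact, which makes later equality checks (for example, indifference between actions) reliable.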

Payoff Matrix
• A two-player normal-form game with finite actions can be represented by a (bi)matrix
• Player 1: Row player, Player 2: Column player
• Often the first number is for the row player, the second for the column player
[Figures: example payoff matrices with Player 1 as row player and Player 2 as column player, including one with Alex as row player and Berry as column player]

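Pure Strategy, Mixed Strategy, Support
• Consider a two-player normal-form game where player $i$ has finite action set $A_i$ and strategy set $S_i = \Delta(A_i)$
• Pure strategy: choose an action deterministically
• Mixed strategy: choose an action randomly
• Support: set of actions chosen with non-zero probability
• Let $s_i = (s_i^1, \dots, s_i^{|A_i|})$ where $s_i^k$ is the probability of choosing the $k$-th action of player $i$, then
  • Pure strategy: $s_i^k \in \{0, 1\}$ for all $k$, with exactly one entry equal to 1
  • Mixed strategy: $s_i^k \in [0, 1]$ for all $k$, with $\sum_k s_i^k = 1$
  • Support: $\{k : s_i^k > 0\}$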

Quiz 1
• In Rock-Paper-Scissors, if , , what is ?
  • A:
  • B:
  • C:
  • D:
[Figure: Rock-Paper-Scissors payoff matrix (Player 1: row player, Player 2: column player)]

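Best Response
• Let $a_{-i} = (a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_n)$. Action profile $\mathbf{a}$ can be denoted as $(a_i, a_{-i})$
• Similarly, define $u_{-i}$ and $s_{-i}$
• Best Response: Set of actions or strategies leading to the highest expected utility given the strategies or actions of other players
  • $a_i^* \in BR(a_{-i})$ iff $u_i(a_i^*, a_{-i}) \ge u_i(a_i, a_{-i})$ for all $a_i \in A_i$
  • $s_i^* \in BR(s_{-i})$ iff $u_i(s_i^*, s_{-i}) \ge u_i(s_i, s_{-i})$ for all $s_i \in S_i$
• Theorem (Nash 1951): A mixed strategy is BR iff all actions in the support are BR
  • $s_i \in BR(s_{-i})$ iff $a_i \in BR(s_{-i})$ for every $a_i$ with $s_i(a_i) > 0$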
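A small sketch of computing pure best responses against a mixed strategy in a two-player game follows. The function name `best_responses` and the Football vs Concert payoffs for Alex (2 for football together, 1 for concert together, 0 otherwise) are assumptions made for the illustration.

```python
from fractions import Fraction

# Assumed payoffs for Alex (row player); rows and columns are ordered
# (Football, Concert). ALEX[i][j] is Alex's payoff when Alex plays i
# and Berry plays j.
ALEX = [[2, 0],
        [0, 1]]

def best_responses(u, opponent_mix):
    """Return the indices of the row actions that maximize expected utility
    against the opponent's mixed strategy `opponent_mix`."""
    expected = [sum(q * row[j] for j, q in enumerate(opponent_mix)) for row in u]
    best = max(expected)
    return [i for i, value in enumerate(expected) if value == best]

# Against Berry mixing (1/3, 2/3), both of Alex's actions tie.
tie = best_responses(ALEX, [Fraction(1, 3), Fraction(2, 3)])
# Against Berry always choosing Football, only Football is a best response.
football_only = best_responses(ALEX, [Fraction(1), Fraction(0)])
```

When several actions tie, as in the first call, the theorem above says that every mixed strategy whose support lies within those actions is also a best response.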

Pareto Optimality
• An outcome is Pareto optimal if there is no other outcome that all players would prefer, i.e., no other outcome in which each player gets higher utility
• An outcome is Pareto dominated by another outcome if all the players would prefer the other outcome
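The definition can be checked mechanically. The sketch below uses the slide's wording (an outcome is dominated when some other outcome gives every player strictly higher utility); the function name and the encoding of the Prisoner's Dilemma as negative jail years are assumptions for the example.

```python
def pareto_optimal_outcomes(outcomes):
    """Return the action profiles whose outcome is not Pareto dominated.

    outcomes: dict mapping an action profile to a tuple of utilities,
    one entry per player.
    """
    def is_dominated(utilities):
        # Dominated if some other outcome is strictly better for every player.
        return any(all(o > u for o, u in zip(other, utilities))
                   for other in outcomes.values())

    return [profile for profile, utilities in outcomes.items()
            if not is_dominated(utilities)]

# Prisoner's Dilemma, with utility assumed to be minus the years in jail.
PD = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-3, 0),
    ("D", "C"): (0, -3),
    ("D", "D"): (-2, -2),
}
pd_optimal = pareto_optimal_outcomes(PD)
```

Under this encoding, mutual defection is the only outcome that is Pareto dominated (by mutual cooperation).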

Solution Concepts in Normal-Form Games
• In normal-form games, how should one player play and what should we expect all the players to play?
  • Dominant strategy and dominant strategy equilibrium / solution
  • Nash Equilibrium
  • Minimax strategy
  • Maximin strategy
  • Correlated Equilibrium

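Dominant Strategy
• One strategy is always better / never worse / never worse and sometimes better than any other strategy
• Focus on a single player's strategy
• Does not always exist
• $s_i$ strictly dominates $s_i'$ if $u_i(s_i, s_{-i}) > u_i(s_i', s_{-i})$ for all strategies $s_{-i}$ of the other players
• $s_i$ very weakly dominates $s_i'$ if $u_i(s_i, s_{-i}) \ge u_i(s_i', s_{-i})$ for all $s_{-i}$
• $s_i$ weakly dominates $s_i'$ if $u_i(s_i, s_{-i}) \ge u_i(s_i', s_{-i})$ for all $s_{-i}$ and $u_i(s_i, s_{-i}) > u_i(s_i', s_{-i})$ for some $s_{-i}$
• $s_i$ is a (strictly / very weakly / weakly) dominant strategy if it dominates every other strategy $s_i' \in S_i$, $s_i' \ne s_i$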

Dominant Strategy Equilibrium or Solution
• Every player plays a dominant strategy
• Focus on the strategy profile for all players
• Does not always exist

Find Dominant Strategy
• Only need to enumerate pure strategies
• Pure strategy $a_i$ is a strictly dominant strategy if $u_i(a_i, a_{-i}) > u_i(a_i', a_{-i})$ for all $a_i' \ne a_i$ and all $a_{-i}$
• If a strategy is a strictly/weakly dominant strategy, it has to be a pure strategy
• A mixed strategy is a very weakly dominant strategy iff all actions in its support are very weakly dominant strategies
[Figure: example payoff matrix (Player 1: row player, Player 2: column player)]
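Because only pure strategies need to be enumerated, a strictly dominant action can be found with two nested loops. The sketch below is an illustration under assumptions: the function name is hypothetical, and the Prisoner's Dilemma and Rock-Paper-Scissors payoffs are encoded as minus jail years and as +1/-1/0, respectively.

```python
def strictly_dominant_action(u):
    """Return the index of a strictly dominant row action, or None.

    u[a][c] is the player's payoff when they play action a and the
    opponent plays action c.
    """
    n_actions, n_opponent = len(u), len(u[0])
    for a in range(n_actions):
        if all(u[a][c] > u[other][c]
               for other in range(n_actions) if other != a
               for c in range(n_opponent)):
            return a
    return None

# Prisoner's Dilemma for the row player, actions ordered (Cooperate, Defect).
PD_ROW = [[-1, -3],
          [0, -2]]
# Rock-Paper-Scissors for the row player, actions ordered (R, P, S).
RPS_ROW = [[0, -1, 1],
           [1, 0, -1],
           [-1, 1, 0]]

pd_dominant = strictly_dominant_action(PD_ROW)    # index 1 means Defect
rps_dominant = strictly_dominant_action(RPS_ROW)  # no dominant action
```

To analyze the column player, pass a matrix indexed by the column player's own action first (that is, the transpose of their payoff matrix).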

Nash Equilibrium
• Nash Equilibrium (NE)
  • Every player's strategy is a best response to the others' strategy profile
  • Focus on the strategy profile for all players
  • One cannot gain by unilateral deviation
  • Pure Strategy Nash Equilibrium (PSNE)
  • Mixed Strategy Nash Equilibrium
• Formally
  • $\mathbf{a} = (a_1, \dots, a_n)$ is a PSNE if $a_i \in BR(a_{-i})$ for all $i$
  • $\mathbf{s} = (s_1, \dots, s_n)$ is an NE if $s_i \in BR(s_{-i})$ for all $i$

Nash Equilibrium
• What are the PSNEs in the following games?
• In FvsC, is Alex: (2/3, 1/3), Berry: (1/3, 2/3) a mixed strategy NE?
[Figures: payoff matrices of the example games]

Nash Equilibrium
• Theorem (Nash 1951): NE always exists in finite games
  • Finite game: $n < \infty$ and $|A_i| < \infty$ for all $i$
  • NE: pure or mixed
• Proof: Through Brouwer's fixed point theorem

Find PSNE
• Find pure strategy Nash Equilibrium (PSNE)
  • Enumerate all action profiles
  • For each action profile, check for each player whether there is no incentive for this player to deviate, i.e., there exists no other action of this player that leads to a higher payoff, given the actions of the other players
• Can we do better?
[Figure: example payoff matrix (Player 1: row player, Player 2: column player)]
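The enumeration described above can be sketched for a bimatrix game as follows. The function name and the payoff encodings (Football vs Concert with payoffs 2/1/0, Prisoner's Dilemma as minus jail years) are assumptions for the illustration.

```python
from itertools import product

def pure_nash_equilibria(u1, u2):
    """Enumerate all action profiles of a bimatrix game and keep those
    from which neither player can gain by deviating unilaterally.

    u1[r][c], u2[r][c]: payoffs of the row and column player.
    """
    rows, cols = range(len(u1)), range(len(u1[0]))
    equilibria = []
    for r, c in product(rows, cols):
        row_cannot_gain = all(u1[r][c] >= u1[alt][c] for alt in rows)
        col_cannot_gain = all(u2[r][c] >= u2[r][alt] for alt in cols)
        if row_cannot_gain and col_cannot_gain:
            equilibria.append((r, c))
    return equilibria

# Football vs Concert (assumed payoffs), actions ordered (Football, Concert).
FC_ALEX = [[2, 0], [0, 1]]
FC_BERRY = [[1, 0], [0, 2]]
# Prisoner's Dilemma, actions ordered (Cooperate, Defect).
PD_1 = [[-1, -3], [0, -2]]
PD_2 = [[-1, 0], [-3, -2]]

fc_psne = pure_nash_equilibria(FC_ALEX, FC_BERRY)
pd_psne = pure_nash_equilibria(PD_1, PD_2)
```

The check costs one pass over each player's alternatives per profile, so the whole enumeration is polynomial in the size of the bimatrix.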

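Find PSNE
• Strictly dominated strategies cannot be part of an NE
  • $s_i$ is strictly dominated if $\exists s_i' \in S_i$ such that $\forall s_{-i}$, $u_i(s_i, s_{-i}) < u_i(s_i', s_{-i})$
  • The dominating strategy $s_i'$ can be a mixed strategy
  • Only need to check pure strategies of the other players, i.e., all $a_{-i}$
  • Such a strategy can never be BR, thus not part of an NE
• A weakly dominated strategy can be part of an NE
• Remove strictly dominated actions (pure strategies) and then find PSNE in the remaining game
• Can we do better?
[Figures: example payoff matrices before and after removing strictly dominated actions (Player 1: row player, Player 2: column player)]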

Find PSNE
• Iterative Elimination of Strictly Dominated Strategies
  • In each step, eliminate dominated strategies (pure strategies, i.e., actions) from each player's strategy space. Repeat until no more actions can be removed
  • When the remaining game has only one action for each player, then that is the unique Nash Equilibrium of the game and the game is called dominance solvable
    • It may not be a dominant strategy equilibrium
  • When the remaining game has more than one action for some players, find PSNE in the remaining game
  • Order of removal does not matter
[Figure: example payoff matrix (Player 1: row player, Player 2: column player)]
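A simplified sketch of iterative elimination for a bimatrix game is given below. It is deliberately restricted: it only tests domination by another pure action, whereas the dominating strategy may in general be mixed, so it can remove fewer actions than the full procedure. The function name and the 2x3 example game are made up for the illustration.

```python
def iesds(u1, u2):
    """Iteratively remove actions strictly dominated by another PURE action.

    u1[r][c], u2[r][c]: payoffs of the row and column player.
    Returns the surviving row indices and column indices.
    Simplification: domination by mixed strategies is not checked.
    """
    rows = list(range(len(u1)))
    cols = list(range(len(u1[0])))
    changed = True
    while changed:
        changed = False
        for r in list(rows):
            # r is dominated if some other surviving row is strictly better
            # against every surviving column.
            if any(all(u1[o][c] > u1[r][c] for c in cols)
                   for o in rows if o != r):
                rows.remove(r)
                changed = True
        for c in list(cols):
            if any(all(u2[r][o] > u2[r][c] for r in rows)
                   for o in cols if o != c):
                cols.remove(c)
                changed = True
    return rows, cols

# Hypothetical game: rows (U, D), columns (L, C, R).
U1 = [[1, 1, 0],
      [0, 0, 2]]
U2 = [[0, 2, 1],
      [3, 1, 0]]
surviving_rows, surviving_cols = iesds(U1, U2)
```

In this example R is removed first (dominated by C), which then makes D dominated by U, which in turn makes L dominated by C, leaving the single profile (U, C).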

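Find PSNE
• If you iteratively eliminate very weakly dominated strategies, at least one equilibrium is preserved
  • $s_i$ is very weakly dominated if $\exists s_i' \in S_i$ such that $\forall s_{-i}$, $u_i(s_i, s_{-i}) \le u_i(s_i', s_{-i})$
  • Order of removal can matter
[Figure: example payoff matrix (Player 1: row player, Player 2: column player)]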

Find PSNE
• To summarize
• To find all PSNE
  • Iterative Elimination of Strictly Dominated Strategies
  • Enumerate all action profiles in the remaining game, and for each action profile, check if none of the players has an incentive to deviate
• To find a PSNE
  • Iterative Elimination of (Very Weakly) Dominated Strategies
  • Search through the action profiles in the remaining game until a PSNE is found

Find All NEs (PSNE and Mixed Strategy NE)
• Special case: Two-player, zero-sum game
  • NE = Minimax = Maximin, solved by LP (will introduce later)
• In practice, available solvers/packages: nashpy (Python), Gambit project (http://www.gambit-project.org/)
• Two-player, general-sum bimatrix game: Support Enumeration Method

Find All NEs
• Recall: A mixed strategy is BR iff all actions in the support are BR
• To find all NEs, think from the inverse direction: enumerate supports
• If we know that in the NE, for player $i$, actions $a_i^1$, $a_i^2$, and $a_i^3$ are in the support of $s_i$, what does it mean?
• They are all BR to the other players' strategies, and therefore
  • 1) Actions $a_i^1$, $a_i^2$, and $a_i^3$ are chosen with non-zero probability, and the probabilities of choosing them sum up to 1
  • 2) Actions $a_i^1$, $a_i^2$, and $a_i^3$ lead to exactly the same expected utility
    • This gives us a number of equations!
  • 3) The expected utility of taking actions $a_i^1$, $a_i^2$, and $a_i^3$ is not lower than that of any other action
• These are necessary conditions for $s_i$ with support $\{a_i^1, a_i^2, a_i^3\}$ being part of an NE

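Find All NEs
• If the support for Alex is (Football, Concert) and for Berry is (Football, Concert), i.e., each action is chosen with non-zero probability, then actions F and C lead to exactly the same expected utility for Alex when fixing Berry's strategy, and actions F and C lead to exactly the same expected utility for Berry when fixing Alex's strategy
• Assume Alex's strategy is $s_A = (p, 1-p)$ and Berry's strategy is $s_B = (q, 1-q)$, where $p$ and $q$ are the probabilities of choosing Football. With payoffs (Alex, Berry) of $(2, 1)$ for football together, $(1, 2)$ for concert together, and $(0, 0)$ otherwise, the indifference conditions are
  • Alex: $2q = 1 - q$, so $q = 1/3$
  • Berry: $p = 2(1 - p)$, so $p = 2/3$
• Now check $0 < p < 1$ and $0 < q < 1$. It is indeed a valid NE with the specified support.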

Quiz 2
• What is the probability of Berry choosing Football in NE with support size=2?
  • A:
  • B:
  • C:
  • D: No such NE
[Figure: payoff matrix (Alex: row player, Berry: column player)]

Find All NEs
• Support Enumeration Method (for bimatrix games)
  • Enumerate all support pairs with the same size, for size = 1 to $\min(|A_1|, |A_2|)$
  • For each possible support pair
    • Compute the probabilities so as to (1) keep the other player indifferent among actions in the support and (2) make the probabilities of taking actions in the support sum up to 1
      • The expected utility (EU) of choosing any action in the support is the same
    • Check if the resulting probabilities are consistent with our assumption: all actions in the support are chosen with strictly positive probability
    • Check if there is no incentive to deviate, i.e., all other actions that are not in the support do not lead to higher expected utility

Find All NEs
• Support size=1
  • Alex: Football, Berry: Football: is an NE
  • Alex: Football, Berry: Concert: is not an NE
    • Berry's action Football, which is not in the support, leads to higher utility for Berry
  • Alex: Concert, Berry: Football: is not an NE
    • Alex's action Football, which is not in the support, leads to higher utility for Alex
  • Alex: Concert, Berry: Concert: is an NE
• Support size=2
  • Alex: (Football, Concert), Berry: (Football, Concert)
    • $s_A = (2/3, 1/3)$, $s_B = (1/3, 2/3)$ is an NE
[Figure: payoff matrix (Alex: row player, Berry: column player)]
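The support-size-2 step can be sketched for a 2x2 bimatrix game by solving the two indifference equations in closed form. The function name is hypothetical, the Football vs Concert payoffs (2/1/0) are assumed, and the sketch covers only the full-support case; size-1 supports would be handled by a pure-equilibrium check.

```python
from fractions import Fraction

def full_support_ne_2x2(u1, u2):
    """Return ((p, 1-p), (q, 1-q)) for the NE of a 2x2 bimatrix game in
    which both players mix over both actions, or None if none exists.

    p: probability that the row player plays row 0.
    q: probability that the column player plays column 0.
    """
    # q makes the ROW player indifferent between row 0 and row 1.
    denom_q = u1[0][0] - u1[0][1] - u1[1][0] + u1[1][1]
    # p makes the COLUMN player indifferent between column 0 and column 1.
    denom_p = u2[0][0] - u2[1][0] - u2[0][1] + u2[1][1]
    if denom_q == 0 or denom_p == 0:
        return None
    q = Fraction(u1[1][1] - u1[0][1], denom_q)
    p = Fraction(u2[1][1] - u2[1][0], denom_p)
    # Consistency check: every action in the support needs positive probability.
    if not (0 < p < 1 and 0 < q < 1):
        return None
    return (p, 1 - p), (q, 1 - q)

# Football vs Concert (assumed payoffs), actions ordered (Football, Concert).
FC_ALEX = [[2, 0], [0, 1]]
FC_BERRY = [[1, 0], [0, 2]]
fc_mixed = full_support_ne_2x2(FC_ALEX, FC_BERRY)
```

With both actions of each player in the support there are no actions outside the support, so the deviation check from the method is vacuous in the 2x2 case.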

Maximin Strategy
• Maximin Strategy (applicable to multiplayer games)
  • Maximize worst case expected utility
  • Maximin strategy for player $i$ is $\arg\max_{s_i} \min_{s_{-i}} u_i(s_i, s_{-i})$
  • Maximin value for player $i$ is $\max_{s_i} \min_{s_{-i}} u_i(s_i, s_{-i})$ (also called safety level)
  • Focus on a single player's strategy

Compute Maximin Strategy
• For bimatrix games, the maximin strategy can be computed through linear programming
• Let $u^1_{ij}$ be player 1's payoff value when player 1 chooses action $i$ and player 2 chooses action $j$
• To get $s_1^*$, we denote $s_1 = (x_1, \dots, x_{|A_1|})$ where $x_i$ is the probability of choosing the $i$-th action of player 1. Now we need to find the value of $x$ that solves
  $\max_{x} \min_{j} \sum_i x_i u^1_{ij}$ s.t. $\sum_i x_i = 1$, $x_i \ge 0 \ \forall i$
• Only need to check pure strategies of player 2. Recall the theorem of BR: A mixed strategy is BR iff all actions in the support are BR

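Compute Maximin Strategy
• Convert to LP
• Claim: $x^*$ is an optimal solution for $P_1$ iff $x^*$, together with $v^* = \min_j \sum_i x_i^* u^1_{ij}$, is an optimal solution for $P_2$, which is an LP
  • $P_1$: $\max_{x} \min_{j} \sum_i x_i u^1_{ij}$ s.t. $\sum_i x_i = 1$, $x_i \ge 0 \ \forall i$
  • $P_2$: $\max_{x, v} v$ s.t. $v \le \sum_i x_i u^1_{ij} \ \forall j$, $\sum_i x_i = 1$, $x_i \ge 0 \ \forall i$
• Let $U$ be the payoff matrix for player 1 (row player). Then $P_2$ can be rewritten in matrix form: $\max_{x, v} v$ s.t. $v \mathbf{1} \le U^{\top} x$, $\mathbf{1}^{\top} x = 1$, $x \ge 0$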

Compute Maximin Strategy
[Example: maximin LP for the Football vs Concert game (Alex: row player, Berry: column player)]
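For a row player with exactly two actions, the maximin LP has a single free variable, so its optimum can be found without an LP solver by checking the endpoints and the points where two of the opponent's payoff lines cross. The sketch below takes that shortcut rather than calling a solver; the function name and the Football vs Concert payoffs for Alex (2/1/0) are assumptions.

```python
from fractions import Fraction
from itertools import combinations

def maximin_two_actions(u):
    """Maximin strategy and value for a row player with exactly two actions.

    u[i][j]: row player's (integer) payoff for row i against column j.
    With x the probability of row 0, the worst-case payoff
    min_j [x*u[0][j] + (1-x)*u[1][j]] is concave and piecewise linear,
    so its maximum lies at x = 0, x = 1, or a crossing of two lines.
    """
    n_cols = len(u[0])

    def worst_case(x):
        return min(x * u[0][j] + (1 - x) * u[1][j] for j in range(n_cols))

    candidates = [Fraction(0), Fraction(1)]
    for j, k in combinations(range(n_cols), 2):
        slope_diff = (u[0][j] - u[1][j]) - (u[0][k] - u[1][k])
        if slope_diff != 0:
            x = Fraction(u[1][k] - u[1][j], slope_diff)  # lines j and k cross
            if 0 <= x <= 1:
                candidates.append(x)
    best_x = max(candidates, key=worst_case)
    return (best_x, 1 - best_x), worst_case(best_x)

# Alex's assumed payoffs, actions ordered (Football, Concert).
FC_ALEX = [[2, 0], [0, 1]]
alex_strategy, alex_value = maximin_two_actions(FC_ALEX)
```

Under the assumed payoffs, Alex's maximin strategy plays Football with probability 1/3 and guarantees 2/3, which is a different strategy from Alex's part of the mixed NE; the two concepts need not coincide in general-sum games. For larger games a general LP solver would be the natural tool.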

Minimax Strategy
• Minimax Strategy (makes sense in two-player games)
  • Minimize the best case expected utility for the other player (just want to harm your opponent)
  • Minimax strategy for player $i$ is $\arg\min_{s_i} \max_{s_{-i}} u_{-i}(s_i, s_{-i})$
  • Minimax value for player $-i$ is $\min_{s_i} \max_{s_{-i}} u_{-i}(s_i, s_{-i})$
  • Focus on a single player's strategy
  • Can be computed through linear programming

Compute Minimax Strategy
• For bimatrix games, the minimax strategy can be computed through linear programming
• Let $u^2_{ij}$ be player 2's payoff value when player 1 chooses action $i$ and player 2 chooses action $j$. Denote $s_1 = (x_1, \dots, x_{|A_1|})$ where $x_i$ is the probability of choosing the $i$-th action of player 1. Then the minimax strategy can be found through solving the following LP:
  $\min_{x, v} v$ s.t. $v \ge \sum_i x_i u^2_{ij} \ \forall j$, $\sum_i x_i = 1$, $x_i \ge 0 \ \forall i$

Compute Minimax Strategy
[Example: minimax LP for the Football vs Concert game (Alex: row player, Berry: column player)]

Minimax Theorem
• Theorem (von Neumann 1928, Nash 1951): Minimax = Maximin = NE in 2-player zero-sum games
• Formally, every two-player zero-sum game has a unique value $v$ such that
  • Player 1 can guarantee value at least $v$
  • Player 2 can guarantee loss at most $v$
• $v$ is called the value of the game
• All NEs lead to the same utility profile in a two-player zero-sum game
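The two guarantees in the theorem can be checked numerically for a concrete strategy pair. The sketch below does this for Rock-Paper-Scissors with both players mixing uniformly; the helper names and the +1/-1/0 payoff encoding are assumptions for the illustration.

```python
from fractions import Fraction

# Row player's payoffs in Rock-Paper-Scissors, actions ordered (R, P, S).
# The column player's payoffs are the negatives (zero-sum).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

def guaranteed_by_row(u, x):
    """Worst-case expected payoff of the row player when playing mix x."""
    return min(sum(x[i] * u[i][j] for i in range(len(u)))
               for j in range(len(u[0])))

def conceded_by_column(u, y):
    """Best expected payoff the row player can reach against column mix y."""
    return max(sum(y[j] * u[i][j] for j in range(len(u[0])))
               for i in range(len(u)))

uniform = [Fraction(1, 3)] * 3
lower_bound = guaranteed_by_row(RPS, uniform)    # player 1 secures at least this
upper_bound = conceded_by_column(RPS, uniform)   # player 2 concedes at most this
```

Since any amount player 1 can guarantee is at most any amount player 2 can hold player 1 to, equal lower and upper bounds pin down the value of the game, here 0.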

Summary
• A game in normal form consists of
  • Set of players, Set of strategies, Payoffs / Utility functions
• Players move simultaneously
• For a bimatrix game, we expect you to be able to find:

Reading
• Textbook Chapter 17.5

Additional Resources (optional)
• Online course
  • https://www.youtube.com/user/gametheoryonline
