Multiagent Systems Extensive Form Games
Extensive Form Games • Normal form games cannot represent • the sequentiality of the agents’ decisions • multiple sequential decisions by one agent • the temporal structure of multiagent decisions • Extensive form games provide • an explicit representation of the temporal structure/protocol of decisions • an explicit representation of multiple sequential decisions by an agent
Extensive Form Games • To capture the different amounts of information available in different scenarios, there are two main variants of extensive form games • Perfect Information Games • Each player knows the current state in the decision making sequence and is aware of all decisions that the other agents have made • Imperfect Information Games • Some parts of the game look identical to an agent, which cannot tell which of them it is in • In extensive form games the decision making sequence is represented as a decision tree
Perfect Information Games • A perfect information game in extensive form is defined as: • N is the set of n agents • A is the set of actions • H is the set of non-terminal choice (decision) nodes • Z is the set of terminal nodes • χ: H → 2^A indicates all actions available to the agent in a node • ρ: H → N indicates which agent makes decisions in a given node • σ: H × A → H ∪ Z is the successor function indicating the next node in the game • u = (u_1, …, u_n) is the vector of utility functions for each player
Perfect Information Games • The sharing game in extensive form • Two siblings receive two presents • One sibling decides how to share them • The second sibling decides whether to accept the shares or to decline the presents • [Figure: game tree. Agent 1 chooses a split 2-0, 1-1, or 0-2; agent 2 then answers yes or no. Yes gives the payoffs (2,0), (1,1), or (0,2) respectively; no gives (0,0).]
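The definition can be made concrete with a small data structure. The following is a minimal sketch, not taken from the slides: the class `Node`, the table `SPLITS`, and the helper `play` are hypothetical names, and the tree encodes the sharing game under the assumption that "yes" yields the proposed split and "no" yields (0,0).

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    player: Optional[int] = None                  # rho(h); None for terminal nodes
    children: dict = field(default_factory=dict)  # action -> successor (chi and sigma)
    payoff: Optional[tuple] = None                # u(z); set only on terminal nodes

# Assumed payoffs: accepting gives the proposed split, declining gives (0, 0).
SPLITS = {"2-0": (2, 0), "1-1": (1, 1), "0-2": (0, 2)}

sharing_game = Node(player=1, children={
    split: Node(player=2, children={
        "yes": Node(payoff=shares),
        "no": Node(payoff=(0, 0)),
    })
    for split, shares in SPLITS.items()
})

def play(node, actions):
    """Follow a sequence of actions from `node` and return the payoff reached."""
    for action in actions:
        node = node.children[action]
    return node.payoff

print(play(sharing_game, ["1-1", "yes"]))
```

Storing the successors in one dictionary per node covers both χ (the keys) and σ (the values), which keeps the sketch short.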
Pure Strategies in Perfect Information Games • A pure strategy for agent i in a perfect information game is a complete specification of the (deterministic) action the agent will take in each decision node associated with the agent • Strategies have to include action choices even for nodes that cannot be reached under the strategy
Pure Strategies in Perfect Information Games • Pure strategies for agent 1: (2-0), (1-1), (0-2) • Pure strategies for agent 2: (yes,yes,yes), (yes,yes,no), (yes,no,yes), (yes,no,no), (no,yes,yes), (no,yes,no), (no,no,yes), (no,no,no) • [Figure: the sharing game tree]
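The strategy counts above follow from taking one action per decision node. A short sketch of that enumeration for the sharing game is given here; the node names ("root", "after 2-0", ...) and the function `pure_strategies` are illustrative assumptions, not part of the slides.

```python
from itertools import product

# Decision nodes of each agent with the actions available there.
# Agent 2's nodes are named after the split that agent 1 proposed.
decision_nodes = {
    1: {"root": ["2-0", "1-1", "0-2"]},
    2: {
        "after 2-0": ["yes", "no"],
        "after 1-1": ["yes", "no"],
        "after 0-2": ["yes", "no"],
    },
}

def pure_strategies(nodes):
    """All ways of picking exactly one action at every decision node of one agent."""
    names = list(nodes)
    return [dict(zip(names, choice))
            for choice in product(*(nodes[name] for name in names))]

for agent, nodes in decision_nodes.items():
    print(agent, len(pure_strategies(nodes)))
```

Agent 1 has a single node with three actions, giving 3 strategies; agent 2 has three nodes with two actions each, giving 2·2·2 = 8.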
Pure Strategies in Perfect Information Games • Pure strategies for agent 1: (A,H), (A,I), (B,H), (B,I) • Note: (A,H) and (A,I) are pure strategies even though the decision between H and I after A never has to be taken • Pure strategies for agent 2: (C,E,G), (C,F,G), (D,E,G), (D,F,G) • [Figure: game tree with a root node of agent 1 (actions A, B), three nodes of agent 2 (actions C, D; E, F; G), a second node of agent 1 (actions H, I), and leaf payoffs (1,1), (5,2), (3,2), (2,1), (1,0)]
Strategies and Equilibria • Solution strategies can be defined as in normal form games: • Mixed strategies are defined by a probability distribution over pure strategies • Best responses for agent i are strategies that lead to optimal utilities in the context of the strategies of the other agents • A Nash equilibrium is a strategy profile in which each agent’s strategy is a best response to the other agents’ strategies in the profile
Nash Equilibria in Perfect Information Extensive Form Games • Every perfect information game in extensive form has a pure strategy Nash equilibrium • Since the agents decide sequentially and are aware of all prior decisions, random decision making cannot hide the actual outcome, so the choice at each node reduces to a deterministic action • Every perfect information game in extensive form can be converted into normal form • The reverse is not true: the perfect information extensive form requires that agents know all prior decisions, which the simultaneous moves of a normal form game do not provide
Induced Normal Form • Pure strategies for agent 1: (A,H), (A,I), (B,H), (B,I) • Note: (A,H) and (A,I) are pure strategies even though the decision between H and I after A never has to be taken • Pure strategies for agent 2: (C,E,G), (C,F,G), (D,E,G), (D,F,G) • [Figure: game tree with a root node of agent 1 (actions A, B), three nodes of agent 2 (actions C, D; E, F; G), a second node of agent 1 (actions H, I), and leaf payoffs (1,1), (5,2), (3,2), (2,1), (1,0)]
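Induced Normal Form • Pure strategy Nash equilibria: • (A,G),(C,F) • (A,H),(C,F) • (B,H),(C,E) • [Figure: game tree. Agent 1 chooses A or B. After A, agent 2 chooses C with payoff (3,8) or D with payoff (8,3). After B, agent 2 chooses E with payoff (5,5) or F; after F, agent 1 chooses G with payoff (2,10) or H with payoff (1,0).]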
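The three equilibria can be checked by brute force over the induced normal form. The sketch below assumes the tree shape that the payoffs and the listed equilibria suggest (A leads to agent 2's choice between C and D, B to the choice between E and F, and F to agent 1's choice between G and H); `TREE`, `outcome`, and `is_nash` are hypothetical names.

```python
from itertools import product

# Decision node: (player, {action: subtree}); leaf: payoff tuple (u1, u2).
TREE = (1, {
    "A": (2, {"C": (3, 8), "D": (8, 3)}),
    "B": (2, {"E": (5, 5), "F": (1, {"G": (2, 10), "H": (1, 0)})}),
})

S1 = list(product("AB", "GH"))  # agent 1: one action per own node
S2 = list(product("CD", "EF"))  # agent 2: one action per own node

def outcome(s1, s2, node=TREE):
    """Payoff reached when both agents follow their pure strategies."""
    while isinstance(node[1], dict):
        player, children = node
        chosen = s1 if player == 1 else s2
        action = next(a for a in children if a in chosen)
        node = children[action]
    return node

def is_nash(s1, s2):
    """No agent gains by a unilateral deviation to another pure strategy."""
    u1, u2 = outcome(s1, s2)
    return (all(outcome(d, s2)[0] <= u1 for d in S1)
            and all(outcome(s1, d)[1] <= u2 for d in S2))

equilibria = [(s1, s2) for s1 in S1 for s2 in S2 if is_nash(s1, s2)]
print(equilibria)
```

The 4 × 4 table is never stored explicitly here; each entry is recomputed from the tree, which illustrates how the same five leaf payoffs fill all sixteen cells of the induced normal form.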
Induced Normal Form • Using the induced normal form, all techniques from normal form games can be used • Extensive form is more compact than induced normal form • The induced normal form has to represent more utility values, because each leaf payoff is repeated for every strategy profile that reaches that leaf • Some of the Nash equilibria are counterintuitive • E.g. (B,H),(C,E): why would agent 1 ever play H? • H is a threat to deter player 2 from playing F • Is this threat credible?
Subgames and Subgame Perfect Equilibria • A subgame of a game in extensive form is defined by a subtree rooted in a node in H • A subgame perfect equilibrium is a Nash equilibrium whose restriction to any subgame is also a Nash equilibrium of that subgame • Nash equilibria with non-credible threats are not subgame perfect • Every perfect information game in extensive form has at least one subgame perfect Nash equilibrium
Computing Subgame Perfect Equilibria • Backward induction can be used to compute a subgame perfect equilibrium for n-player general-sum games • Starting with the smallest subgames, propagate to the root of each subtree the utility vector that is best for the agent deciding at that root • In the equilibrium strategy each agent takes the best action (the one that leads to the maximum value) at each of its nodes • In zero-sum games this is the common minimax algorithm
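Backward induction as described above can be sketched in a few lines. The example tree is an assumed encoding of a two-agent game in which agent 1 picks A or B, agent 2 answers with C/D or E/F, and agent 1 picks G or H after F; the function name, the node naming scheme, and the tie-breaking rule (first action wins) are illustrative choices.

```python
# Decision node: (player, {action: subtree}); leaf: payoff tuple (u1, u2).
TREE = (1, {
    "A": (2, {"C": (3, 8), "D": (8, 3)}),
    "B": (2, {"E": (5, 5), "F": (1, {"G": (2, 10), "H": (1, 0)})}),
})

def backward_induction(node, name="root", plan=None):
    """Return (utility vector, plan); plan maps each decision node to its best action."""
    if plan is None:
        plan = {}
    if not isinstance(node[1], dict):  # leaf: nothing to decide
        return node, plan
    player, children = node
    best_action, best_value = None, None
    for action, child in children.items():
        value, _ = backward_induction(child, f"{name}/{action}", plan)
        # The deciding agent compares only its own component of the vector.
        if best_value is None or value[player - 1] > best_value[player - 1]:
            best_action, best_value = action, value
    plan[name] = best_action
    return best_value, plan

value, plan = backward_induction(TREE)
print(value, plan)
```

For this tree the procedure selects G, then F, then C, then A, so the resulting profile is (A,G),(C,F): agent 1 would not carry out the threat H once the node after F is actually reached.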
The Centipede Problem • Subgame perfect equilibrium: (E,E,E),(E,E) • The outcome of this strategy profile is Pareto dominated by all but one other outcome • [Figure: centipede game tree. Agents 1 and 2 alternate over five decision nodes (1, 2, 1, 2, 1), each choosing C or E; leaf payoffs (1,0), (0,2), (3,1), (2,4), (3,5), (4,3)]
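The Pareto claim can be verified directly from the leaf payoffs. The sketch below assumes, as the order of the payoffs in the figure suggests, that choosing E at the first node ends the game with (1,0); the helper `pareto_dominates` is a hypothetical name.

```python
outcomes = [(1, 0), (0, 2), (3, 1), (2, 4), (3, 5), (4, 3)]
spe_outcome = (1, 0)  # assumed outcome of exiting at the first node

def pareto_dominates(x, y):
    """x dominates y: no agent is worse off and at least one is strictly better off."""
    return (all(a >= b for a, b in zip(x, y))
            and any(a > b for a, b in zip(x, y)))

dominating = [o for o in outcomes if pareto_dominates(o, spe_outcome)]
not_dominating = [o for o in outcomes
                  if o != spe_outcome and not pareto_dominates(o, spe_outcome)]
print(dominating, not_dominating)
```

Only (0,2) fails to dominate (1,0), because agent 1 is worse off there.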
Imperfect Information Games • Imperfect information games handle situations where agents do not have complete knowledge of the stage of the game or of the decisions the other agents are taking • Imperfect information is modeled by grouping nodes in the decision tree into information sets • Different nodes in the same information set cannot be distinguished • Unknown actions of other agents or incomplete knowledge of the stage of the game lead to non-distinguishable nodes
Imperfect Information Game • An imperfect information game in extensive form is defined as: • N, A, H, Z, χ, ρ, σ, u define a perfect information game • I = (I_1, …, I_n) is the vector of the information sets I_i of agent i defining the sets of indistinguishable nodes for this agent • I_i = (I_{i,1}, …, I_{i,k_i}) is a partition of the nodes assigned to agent i where nodes in the same block (equivalence class) of the partition are indistinguishable for agent i
Pure Strategies in Imperfect Information Games • A pure strategy for agent i in an imperfect information game is a complete specification of the (deterministic) action the agent will take in each information class • Strategies have to include action choices even for information classes (and thus nodes) that cannot be reached under the strategy
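Imperfect Information Game • Prisoners’ Dilemma • Pure strategies for agent 1: (C), (S) • Pure strategies for agent 2: (C), (S) • [Figure: game tree. Agent 1 chooses S or C; agent 2 then chooses S or C without observing agent 1’s choice, so both nodes of agent 2 lie in one information set; leaf payoffs (-1,-10), (-10,-1), (-5,-5), (-3,-3)]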
Strategies and Equilibria • All solution strategies can be defined as in perfect information games • As in perfect information games, every imperfect information game can be converted into a normal form game • Every normal form game can be converted into an imperfect information game • Simply put all nodes for player 2 into the same information class
Randomized Strategies • In imperfect information games we can define a second way to generate randomized strategies • Mixed strategies: randomization over pure strategies • Behavioral strategies: strategies containing independent randomization over the actions in each information set
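Mixed and Behavioral Strategies • Mixed strategy example for agent 1: • (0.6:(A,G); 0.4:(B,H)) • Behavioral strategy example for agent 1: • ([0.5:A;0.5:B],[0.3:G;0.7:H]) • [Figure: game tree. Agent 1 chooses A or B. After A, agent 2 chooses C with payoff (3,8) or D with payoff (8,3). After B, agent 2 chooses E with payoff (5,5) or F; after F, agent 1 chooses G with payoff (2,10) or H with payoff (1,0).]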
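Because a behavioral strategy randomizes independently at each information set, it induces a product distribution over pure strategies. A minimal sketch of that conversion for the behavioral example above follows; the function `to_mixed` and the list-of-dicts encoding are assumptions made for illustration.

```python
from itertools import product

# One distribution per information set of agent 1, as in the example above.
behavioral = [{"A": 0.5, "B": 0.5}, {"G": 0.3, "H": 0.7}]

def to_mixed(behavioral):
    """Mixed strategy induced by independent randomization at each information set."""
    mixed = {}
    for combo in product(*(dist.items() for dist in behavioral)):
        strategy = tuple(action for action, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        mixed[strategy] = prob
    return mixed

for strategy, prob in to_mixed(behavioral).items():
    print(strategy, round(prob, 2))
```

The mixed strategy example (0.6:(A,G); 0.4:(B,H)) puts zero weight on (A,H) and (B,G), so it is not the product of its own per-node marginals; it correlates the two choices, which a behavioral strategy cannot express directly.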
Randomized Strategies • In general, the expressive powers of mixed and behavioral strategies are not comparable • In some games there are outcomes that can be achieved using mixed strategies but not using behavioral strategies • In some games there are outcomes that can be achieved using behavioral strategies but not using mixed strategies
Behavioral Strategy Example • Pure strategies: • Agent 1: (L), (R); Agent 2: (U), (D) • Mixed strategy equilibrium: • (R, D) • Behavioral strategy equilibrium: • ([98/198:L; 100/198:R], D) • [Figure: game tree with two nodes of agent 1 (actions L, R), one node of agent 2 (actions U, D), and leaf payoffs (100,100), (5,1), (1,0), (2,2)]
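The probability 98/198 can be reproduced with exact arithmetic. The sketch assumes the usual tree for this example, which the slide text does not spell out: both nodes of agent 1 share one information set, L followed by L yields (1,0), L followed by R yields (100,100), and R hands the move to agent 2, whose U yields (5,1) and D yields (2,2). The function name `u1_against_D` is hypothetical.

```python
from fractions import Fraction

def u1_against_D(p):
    """Agent 1's expected utility when it plays L with probability p at both of
    its (indistinguishable) nodes and agent 2 plays D."""
    return p * p * 1 + p * (1 - p) * 100 + (1 - p) * 2

# Setting the derivative of -99 p^2 + 98 p + 2 to zero gives p = 98/198.
p_star = Fraction(98, 198)

print(u1_against_D(Fraction(0)))  # always R, the best pure strategy against D
print(u1_against_D(Fraction(1)))  # always L
print(u1_against_D(p_star))       # the behavioral strategy from the slide
```

Under these assumptions D is better than U for agent 2 whenever its node is reached (2 versus 1), so agent 1 only has to optimize against D. A mixed strategy commits to L or R for both nodes at once and can never reach (100,100), whereas independent randomization at each visit can.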
Perfect Recall • A player in an imperfect information game has perfect recall if it does not forget anything it knew about the moves made so far • For any two nodes in the same information set of player i, the paths leading to them have to pass through the same sequence of information classes of player i, and the actions taken by player i have to be the same along both paths
Perfect Recall • Formally: for any two nodes h, h’ in the same information class of player i, for every pair of paths h_0, a_0, …, h_n, a_n, h and h_0, a’_0, …, h’_m, a’_m, h’ • m = n • for all j, h_j and h’_j are in the same information class for player i • for all j, ρ(h_j) = i → a_j = a’_j • A game of perfect recall is an imperfect information game in which every agent has perfect recall
Games of Perfect Recall • (Kuhn, 1953): In a game of perfect recall, any mixed strategy of a given agent can be replaced by an equivalent behavioral strategy, and any behavioral strategy can be replaced by an equivalent mixed strategy. • In games of perfect recall, Nash equilibria can be found in the form of behavioral strategies
Equilibria for Games of Perfect Recall • Convert the game to normal form and solve the resulting game • The size of the normal form, and hence the cost of this approach, can be exponential in the size of the extensive form • In games of perfect recall we can use the sequence form to accelerate the solution by avoiding the increase in the size of the game when converting to normal form • Instead of strategies, use the action sequences of the agents on the path to a terminal node and realization probabilities (representing the probabilities of reaching the terminal nodes under the strategy) • Zero-sum games can be solved in time polynomial in the size of the extensive form game • General-sum games can be solved in time exponential in the size of the extensive form game
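Realization probabilities can be illustrated on a small case. The sketch below is an assumption-laden example rather than the sequence form algorithm itself: it takes an agent with a first choice between A and B and a later choice between G and H that is reached only after B, uses the behavioral probabilities 0.5/0.5 and 0.3/0.7, and computes the probability that the agent plays each of its own action sequences. The names `sequences`, `realization`, and `plan` are hypothetical.

```python
from fractions import Fraction

# Behavioral probabilities of the agent's own actions (assumed example values).
behavioral = {"A": Fraction(1, 2), "B": Fraction(1, 2),
              "G": Fraction(3, 10), "H": Fraction(7, 10)}

# The agent's own action sequences on paths through the tree, including the empty one.
sequences = [(), ("A",), ("B",), ("B", "G"), ("B", "H")]

def realization(sequence):
    """Probability that the agent plays every action of the sequence when it gets the chance."""
    prob = Fraction(1)
    for action in sequence:
        prob *= behavioral[action]
    return prob

plan = {seq: realization(seq) for seq in sequences}
for seq, prob in plan.items():
    print(seq, prob)
```

The resulting values satisfy linear constraints, here r(A) + r(B) = r(∅) = 1 and r(B,G) + r(B,H) = r(B). Working with one variable per sequence under such constraints, instead of one per pure strategy, is what lets the sequence form avoid the blow-up of the normal form.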