Game Theory: Sequential Bargaining and Repeated Games
Univ. Prof. dr. M.C.W. Janssen, University of Vienna
Winter semester 2010-11, Week 46 (November 14-15)
Sequential Bargaining
• The ultimatum game is a sequential bargaining game with one round; its SPE is already known (the proposer keeps everything).
• Now consider a sequential bargaining game with two rounds and alternating offers, in which players discount future pay-offs with factor $\delta$.
• SPE pay-offs are $(1-\delta, \delta)$.
• In the last round player 2 can propose to keep everything, and this will be accepted. By refusing in the first round he can therefore guarantee himself $\delta$.
• Player 1 must offer him at least $\delta$ in the first round for player 2 to accept, so player 1 can get at most $1-\delta$.
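The backward-induction argument on this slide extends to any finite number of rounds. The following is a minimal sketch, assuming a pie of size 1, a common discount factor, and a responder who accepts when indifferent; the function name `first_proposer_share` is illustrative and not part of the lecture.

```python
def first_proposer_share(rounds: int, delta: float) -> float:
    """Share of the first proposer in the SPE of a finite alternating-offers game.

    Assumptions (illustrative): pie of size 1, common discount factor delta,
    and a responder who accepts whenever indifferent.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    # In the last round the proposer keeps everything (ultimatum game).
    share = 1.0
    # Step back one round at a time: the current proposer must leave the
    # responder the discounted value of being the proposer in the next round.
    for _ in range(rounds - 1):
        share = 1.0 - delta * share
    return share


if __name__ == "__main__":
    delta = 0.9
    print(first_proposer_share(1, delta))  # one round: ultimatum game
    print(first_proposer_share(2, delta))  # two rounds: 1 - delta
```

With two rounds the function returns 1 - δ, the first-mover pay-off stated on the slide.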
Alternating offers (Rubinstein, Stahl)
• Same bargaining game, but with an infinite horizon: offers alternate until one is accepted. What are the equilibrium pay-offs?
• Define $v$ ($v^*$) as the lowest (highest) SPE pay-off a player can get when making an offer.
• Because of the infinite horizon and equal discount factors, the period 1 analysis is the same as the period 2 analysis.
• $v \ge 1 - \delta v^*$: the lowest pay-off player 1 can guarantee himself is what remains after leaving player 2 the highest discounted pay-off he can guarantee himself in the next round.
• $v^* \le 1 - \delta v$: the highest pay-off player 1 can guarantee himself is what remains after leaving player 2 the lowest discounted pay-off he can guarantee himself in the next round.
• Combining the two: $v \ge 1/(1+\delta)$ and $v^* \le 1/(1+\delta)$. Since $v \le v^*$, equalities have to hold.
• Player 1 is better off because he makes the first proposal, but this advantage disappears as $\delta$ gets close to 1. Intuitive.
• The first offer is such that it is immediately accepted. Why bother about the rest of the game? Because the continuation determines what the responder is willing to accept.
• The subgame perfect equilibrium strategies are unique.
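The step from the two inequalities to the bounds on this slide can be written out as follows. This is a worked derivation using only the definitions of $v$ and $v^*$ given here, assuming $0 < \delta < 1$.

```latex
\begin{align*}
v &\ge 1 - \delta v^{*} \ge 1 - \delta\,(1 - \delta v) = 1 - \delta + \delta^{2} v
  &&\Rightarrow\quad v \ge \frac{1-\delta}{1-\delta^{2}} = \frac{1}{1+\delta},\\
v^{*} &\le 1 - \delta v \le 1 - \delta\,(1 - \delta v^{*}) = 1 - \delta + \delta^{2} v^{*}
  &&\Rightarrow\quad v^{*} \le \frac{1}{1+\delta},\\
\frac{1}{1+\delta} &\le v \le v^{*} \le \frac{1}{1+\delta}
  &&\Rightarrow\quad v = v^{*} = \frac{1}{1+\delta}.
\end{align*}
```

The proposer therefore receives $1/(1+\delta)$ and the responder $\delta/(1+\delta)$; both shares tend to $1/2$ as $\delta$ approaches 1.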
What if the δ's differ across players?
• The period 1 analysis is similar to the period 3 analysis, but no longer to the period 2 analysis.
• Define $v_i$ ($v_i^*$) as the lowest (highest) pay-off player i can get if she makes an offer.
• $v_1 \ge 1 - \delta_2 v_2^*$; by symmetry, $v_2 \ge 1 - \delta_1 v_1^*$.
• $v_1^* \le 1 - \delta_2 v_2$; by symmetry, $v_2^* \le 1 - \delta_1 v_1$.
• $v_1 \ge (1-\delta_2)/(1-\delta_1\delta_2)$ and $v_1^* \le (1-\delta_2)/(1-\delta_1\delta_2)$.
• Hence, equalities have to hold; there is an additional advantage for the player with the higher δ.
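The closed form on this slide is easy to evaluate numerically. The sketch below assumes that player 1 makes the first offer and that the pie has size 1; the function name `rubinstein_shares` is illustrative.

```python
from typing import Tuple


def rubinstein_shares(delta1: float, delta2: float) -> Tuple[float, float]:
    """SPE division (player 1, player 2) when player 1 proposes first.

    Uses v1 = (1 - delta2) / (1 - delta1 * delta2) from the slide; the
    responder receives the remainder of a pie of size 1 (assumption).
    """
    if not (0 <= delta1 < 1 and 0 <= delta2 < 1):
        raise ValueError("discount factors must lie in [0, 1)")
    v1 = (1 - delta2) / (1 - delta1 * delta2)
    return v1, 1 - v1


if __name__ == "__main__":
    print(rubinstein_shares(0.5, 0.5))  # equal patience: reduces to 1 / (1 + delta)
    print(rubinstein_shares(0.9, 0.5))  # player 1 more patient
    print(rubinstein_shares(0.5, 0.9))  # player 2 more patient
```

Raising a player's own discount factor raises that player's share, which is the "additional advantage" for the more patient player.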
Notation in repeated games
• Define the history of play as follows.
• Let $a^0 = (a^0_1, a^0_2, \dots, a^0_n)$ be the action profile played in stage 0, i.e., the actions played by all players.
• History at the beginning of stage 1: $h^1 = a^0$.
• History at the beginning of stage $t+1$: $h^{t+1} = (a^0, \dots, a^t)$.
• $H^t$ is the set of all possible histories $h^t$; $A_i(h^t)$ is the set of actions that player i can choose after history $h^t$, and $A_i(H^t)$ is the union of these sets over all possible histories.
• A strategy $\sigma_i$ of player i is a sequence of mappings $\{\sigma_i^k\}$, where each $\sigma_i^k$ maps $H^k$ to mixed actions.
• Note that a strategy cannot condition on the random events themselves (the mixing probabilities of the other players), only on the realised actions recorded in the history.
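To make the notation concrete, the sketch below represents a history as a tuple of action profiles and a strategy as a function from histories to actions. It is a simplified illustration: it covers two players and pure actions only, and the action labels "C" and "D" and the two example strategies are assumptions, not taken from this slide.

```python
from typing import Callable, Tuple

Profile = Tuple[str, str]                  # actions of players 0 and 1 in one stage
History = Tuple[Profile, ...]              # h^t = (a^0, ..., a^{t-1}); h^0 is empty
Strategy = Callable[[History, int], str]   # maps (history, own index) to an action


def grim_trigger(history: History, me: int) -> str:
    """Play C until the opponent has played D at least once (illustrative)."""
    opponent = 1 - me
    return "D" if any(profile[opponent] == "D" for profile in history) else "C"


def always_defect(history: History, me: int) -> str:
    """Play D after every history (illustrative)."""
    return "D"


def play(s0: Strategy, s1: Strategy, stages: int) -> History:
    """Generate the history produced by two pure strategies."""
    history: History = ()
    for _ in range(stages):
        profile = (s0(history, 0), s1(history, 1))
        history = history + (profile,)
    return history


if __name__ == "__main__":
    print(play(grim_trigger, always_defect, 3))
```

Each strategy sees only the realised actions in the history, which mirrors the remark that strategies condition on histories.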
Subgame perfection and the one-stage deviation principle in finitely repeated games
• One-stage deviation principle: no player can gain by deviating in a single period and then returning to the (equilibrium) strategy.
• Formally: there is no player i and strategy $s'_i$ that equals $s^*_i$ apart from the action in one period after one history $h$, such that $u_i(s'_i, s^*_{-i}) > u_i(s^*_i, s^*_{-i})$ given that history $h$.
• Prop. In finite-horizon games, a strategy combination $s^*$ is a SPE if, and only if, it satisfies the one-stage deviation principle.
• Only if: clear; otherwise there is an immediate violation of the SPE definition.
• If: suppose to the contrary that $s^*$ satisfies the principle but is not a SPE. Then there is a stage $t$ and a history $h^t$ such that at least one player has a strategy $s'_i$ with $s'_i(h^t) \ne s^*_i(h^t)$ that is a better response. (Continued on the next slide.)
Proof of the one-stage deviation principle
• Let $t'$ be the last period in which $s'_i(h^{t'}) \ne s^*_i(h^{t'})$.
• Because of the one-stage deviation principle, $t' > t$.
• Period $t'$ is defined such that $s'_i(h^{t''}) = s^*_i(h^{t''})$ for all $t'' > t'$.
• Define another strategy $s_i$ that coincides with $s'_i$ up to $t'$ and coincides with $s^*_i$ at $t'$ and afterwards.
• Because of the one-stage deviation principle, and since $s'_i(h^{t''}) = s^*_i(h^{t''})$ for all $t'' > t'$, $s_i$ is as good a response as $s'_i$ given history $h^t$.
• If $t' = t+1$, then $s_i$ differs from $s^*_i$ in only one period, and therefore the one-stage deviation principle implies that $s_i$ cannot be strictly better, a contradiction.
• If $t' > t+1$, a similar argument applies (details page 109).
Additional equilibria in repeated games
• The main interest in repeated games is which equilibrium outcomes can be supported that cannot be supported in the static game.
• Repetition of a static equilibrium is always an equilibrium of the repeated game; not so interesting.
• Thus, what else? Consider an example.
Multiple Equilibria
[Figure: pay-off matrix of the example; label: Nash Equilibria]
Can non-Nash outcomes of the static game be supported in equilibrium if the game is played twice?
Last-period analysis
• In the last period they cannot choose (U,L),
• as both firms have an incentive to “cheat”: 16 is a higher pay-off than 12.
• Punishment is not possible (as it is the last period).
First-period analysis
• But: in the first period they can choose (U,L).
• Strategy:
  - Choose “U (L)” in period 1.
  - Choose “M (C)” in period 2 if the other player chose “L (U)” in period 1.
  - Choose “B (R)” in period 2 if the other player chose something else in period 1.
• Punishment is part of the strategy.
• Is this an equilibrium? Is it a SPE?
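The two questions above can be checked mechanically once a pay-off matrix is fixed. The matrix of the lecture example is not reproduced in this transcript, so the sketch below uses a hypothetical one: only the pay-offs 12 at (U,L) and 16 from cheating come from the slides, and every other number is an assumption chosen so that (M,C) and (B,R) are the pure Nash equilibria. The game is symmetric, so checking the row player suffices.

```python
# Hypothetical symmetric 3x3 stage game. Only 12 at (U,L) and 16 from cheating
# are taken from the slides; all other pay-offs are assumptions.
ROWS, COLS = ["U", "M", "B"], ["L", "C", "R"]
PAYOFF = {
    ("U", "L"): (12, 12), ("U", "C"): (0, 16), ("U", "R"): (0, 0),
    ("M", "L"): (16, 0),  ("M", "C"): (9, 9),  ("M", "R"): (0, 0),
    ("B", "L"): (0, 0),   ("B", "C"): (0, 0),  ("B", "R"): (1, 1),
}


def pure_nash():
    """All pure-strategy Nash equilibria of the stage game."""
    result = []
    for r in ROWS:
        for c in COLS:
            u1, u2 = PAYOFF[(r, c)]
            row_ok = all(PAYOFF[(r2, c)][0] <= u1 for r2 in ROWS)
            col_ok = all(PAYOFF[(r, c2)][1] <= u2 for c2 in COLS)
            if row_ok and col_ok:
                result.append((r, c))
    return result


def first_period_deviation_gain():
    """Best gain of the row player from deviating in period 1 (no discounting).

    On the path: (U,L) in period 1, then (M,C). After any other first-period
    outcome both players switch to (B,R) in period 2.
    """
    on_path = PAYOFF[("U", "L")][0] + PAYOFF[("M", "C")][0]
    best_deviation = max(
        PAYOFF[(r, "L")][0] + PAYOFF[("B", "R")][0] for r in ROWS if r != "U"
    )
    return best_deviation - on_path


if __name__ == "__main__":
    print(pure_nash())
    print(first_period_deviation_gain())
```

In this hypothetical game the deviation gain is negative, so the strategy is a Nash equilibrium; because both second-period continuations, (M,C) and (B,R), are stage-game Nash equilibria, it is also subgame perfect.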
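Pay-offs in infinitely repeated games
• Overall pay-offs $u_i$; stage-game pay-offs $g_i$; continuation pay-off from period $t$ onwards.
• We want an expression in which stage-game pay-offs and repeated-game pay-offs can easily be compared, i.e., a normalisation: $u_i = (1-\delta)\sum_{t=0}^{\infty} \delta^{t} g_i(a^t)$, with continuation pay-off $(1-\delta)\sum_{\tau=t}^{\infty} \delta^{\tau-t} g_i(a^\tau)$ from period $t$ onwards.
• Time averaging is sometimes used for the case of complete patience.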
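The point of the normalisation is that a constant stream of stage pay-offs g has repeated-game pay-off g. The sketch below assumes the standard normalisation factor (1 - δ) and truncates the infinite sum to a finite list of stage pay-offs; the function name is illustrative.

```python
def normalised_payoff(stage_payoffs, delta):
    """(1 - delta) * sum over t of delta**t * g_t, for a finite list of stage pay-offs.

    A finite list truncates the infinite sum; the error is of order
    delta**len(stage_payoffs).
    """
    return (1 - delta) * sum(delta ** t * g for t, g in enumerate(stage_payoffs))


if __name__ == "__main__":
    # A constant stream of 3 has a normalised pay-off of (almost exactly) 3.
    print(normalised_payoff([3] * 2000, 0.9))
    # A one-off gain of 5 followed by 1 forever is worth less than 3 at delta = 0.9.
    print(normalised_payoff([5] + [1] * 2000, 0.9))
```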
Folk Theorem I
• If players are sufficiently patient, then any feasible, individually rational pay-offs can be enforced by an equilibrium.
• Individually rational pay-offs: pay-offs above the minimax pay-off
• $\underline{v}_i = \min_{\alpha_{-i}} \max_{\alpha_i} g_i(\alpha_i, \alpha_{-i})$
• $m^i_j$ is the action player j chooses to minimax player i.
• Feasible pay-offs: the convex hull $V$ of the static game pay-offs, i.e., $V = \text{convex hull}\,\{v \mid \text{there is an } a \in A \text{ such that } g(a) = v\}$.
• Both terms need some explanation.
Minimax pay-offs
• What are the Nash equilibria of this game?
• Denote by $q$ the probability that player 2 chooses L.
• In a mixed-strategy equilibrium $1/3 \le q \le 2/3$, with pay-offs 0 and 1 for players 1 and 2.
• Minimax for player 1: $u(U) = -3q+1$, $u(M) = 3q-2$, $u(D) = 0$.
• The minimax pay-off is 0 (for any $q$ between 1/3 and 2/3, D is a best response).
• The minimax pay-off for player 2 is also 0, enforced by player 1 choosing (½,½,0).
• Thus, minimax pay-offs can be lower than Nash equilibrium pay-offs.
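Player 1's minimax pay-off can be verified numerically from the three expressions on this slide alone. The sketch below searches a grid of values for q; the grid size and function names are illustrative choices.

```python
def player1_best_response_payoff(q: float) -> float:
    """Player 1's best pay-off when player 2 plays L with probability q."""
    return max(-3 * q + 1, 3 * q - 2, 0.0)  # u(U), u(M), u(D) from the slide


def player1_minimax(grid_points: int = 1001):
    """Grid search for the q that minimises player 1's best-response pay-off."""
    qs = [k / (grid_points - 1) for k in range(grid_points)]
    best_q = min(qs, key=player1_best_response_payoff)
    return best_q, player1_best_response_payoff(best_q)


if __name__ == "__main__":
    q, value = player1_minimax()
    print(q, value)
```

The search returns a q inside [1/3, 2/3] and a value of 0, in line with the slide; outside that interval either U or M gives player 1 a strictly positive pay-off.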
Feasible pay-offs
• Equilibrium pay-offs are (2,1), (1,2) and (⅔,⅔).
• The convex hull of the equilibrium pay-offs is the triangle connecting these three points (it also contains, e.g., (1½,1½)).
• V is the triangle connecting (2,1), (1,2) and (0,0).
• But (1½,1½) cannot be obtained by independent mixing, only as a correlated equilibrium.
• Correlated mixing can happen in a repeated setting by alternating between playing the two equilibria (with time-averaged pay-offs or δ close to 1).
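The last bullet can be illustrated numerically: alternating between the equilibria with pay-offs (2,1) and (1,2) gives normalised discounted pay-offs close to (1½,1½) when δ is close to 1. The sketch assumes the normalisation factor (1 - δ), a start at (2,1), and a truncated horizon; the function name is illustrative.

```python
def alternating_normalised_payoffs(delta: float, periods: int = 20000):
    """Normalised discounted pay-offs from playing (2,1), (1,2), (2,1), ..."""
    stream = [(2, 1) if t % 2 == 0 else (1, 2) for t in range(periods)]
    u1 = (1 - delta) * sum(delta ** t * g[0] for t, g in enumerate(stream))
    u2 = (1 - delta) * sum(delta ** t * g[1] for t, g in enumerate(stream))
    return u1, u2


if __name__ == "__main__":
    for delta in (0.5, 0.9, 0.99):
        print(delta, alternating_normalised_payoffs(delta))
```

In closed form the pay-offs are (2+δ)/(1+δ) and (1+2δ)/(1+δ), so the player whose preferred equilibrium is played first keeps a small advantage that vanishes as δ approaches 1.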
Folk Theorem II
• Prop. For every feasible pay-off vector $v$ with $v_i > \underline{v}_i$ for all players i, there exists a $\underline{\delta} < 1$ such that for all $\delta \in (\underline{\delta}, 1)$ there exists a Nash equilibrium of the infinitely repeated game with pay-offs $v$.
• Pay-offs in the repeated game can be not only larger, but also smaller than the static Nash equilibrium pay-offs!
• Basic idea: if players are sufficiently patient, then any finite gain from a one-period deviation is nothing compared to a small, but permanent loss in future pay-offs (punishment by minimaxing a player).
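“Proof” of the “Nash Folk Theorem”
• Consider a feasible pay-off $v$ and an action profile $a$ with $g(a) = v$.
• If there is no action profile $a$ that yields $v$, choose a sequence of actions such that $v$ is (close to) the average (discounted) pay-off (or use a public randomisation).
• Consider the strategy: start by playing $a_i$; play $a_i$ as long as the others do; if one player j deviates, minimax him forever, i.e., choose $m^j_i$.
• A deviation in period $t$ yields at most the normalised pay-off $(1-\delta^{t})\, v_i + \delta^{t}(1-\delta) \max_a g_i(a) + \delta^{t+1} \underline{v}_i$,
• which is smaller than $v_i$ if $\delta$ is larger than $\underline{\delta}_i$, where $\underline{\delta}_i$ solves $(1-\underline{\delta}_i) \max_a g_i(a) + \underline{\delta}_i\, \underline{v}_i = v_i$.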
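The critical discount factor can be computed directly. The sketch below assumes the standard form of the argument: a deviator gains at most max g_i once and is held to the minimax pay-off afterwards, so the threshold solves (1 - δ) max g_i + δ · minimax = v_i. The function names and the numbers 5, 3 and 1 in the usage example are hypothetical.

```python
def critical_delta(max_gain: float, target: float, minimax: float) -> float:
    """Discount factor at which deviating and conforming pay the same.

    Solves (1 - d) * max_gain + d * minimax = target for d (assumed form).
    """
    if not (minimax < target <= max_gain):
        raise ValueError("need minimax < target <= max_gain")
    return (max_gain - target) / (max_gain - minimax)


def deviation_payoff(delta, t, max_gain, target, minimax):
    """Normalised pay-off from conforming until period t, deviating once at t,
    and being minimaxed from t + 1 onwards (assumed form)."""
    return ((1 - delta ** t) * target
            + delta ** t * (1 - delta) * max_gain
            + delta ** (t + 1) * minimax)


if __name__ == "__main__":
    # Hypothetical numbers: best one-shot gain 5, target pay-off 3, minimax 1.
    print(critical_delta(5, 3, 1))
    print(deviation_payoff(0.6, 0, 5, 3, 1))  # patient player: below the target 3
    print(deviation_payoff(0.4, 0, 5, 3, 1))  # impatient player: above the target 3
```

The threshold does not depend on the period t in which the deviation takes place, because the factor δ^t multiplies both sides of the comparison.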
Is the threat of Minimaxing credible? • If we restrict analysis to static “Nash threats”, then Friedman shows that only pay-offs larger than the static Nash equilibrium pay-offs can be supported • Others show that in games where the minimax pay-offs are lower than the static equilibrium pay-offs, even worse outcomes can be compatible with a SPE of the infinitely repeated game.
Basic idea of SPE with minimax pay-offs: time averaging
• After a deviation, minimax the deviator for N periods, where N is chosen for all players i such that $\max_a g_i(a) + N \underline{v}_i < \min_a g_i(a) + N v_i$.
• After N periods, return to the “cooperative” mode.
• (Finite) N ensures that no player has an incentive to deviate.
• The cost of punishing is extremely small: with time averaging, pay-offs in a finite number of periods “do not make a difference”.
• The average pay-off to player j when i is punished is $v_j$.
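The length of the punishment phase can be found by a simple search. The sketch below assumes the textbook condition for N, namely max g_i + N · minimax_i < min g_i + N · v_i: the best one-shot gain followed by N periods of being minimaxed must be worse than the worst stage pay-off followed by N periods at the target. The function name and the numbers in the usage example are hypothetical.

```python
def punishment_length(max_g: float, min_g: float, target: float, minimax: float) -> int:
    """Smallest N with max_g + N * minimax < min_g + N * target (assumed condition)."""
    if target <= minimax:
        raise ValueError("target pay-off must exceed the minimax pay-off")
    n = 1
    # The loop terminates because target > minimax.
    while max_g + n * minimax >= min_g + n * target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical numbers: stage pay-offs between 0 and 5, target 3, minimax 1.
    print(punishment_length(5, 0, 3, 1))
```

A target pay-off close to the minimax pay-off requires a longer punishment phase, but N stays finite for any target strictly above it.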
Basic idea of SPE with minimax pay-offs: discounted pay-offs
• The previous strategies (for time-averaged pay-offs) do not work, as minimaxing another player may give a player a lower pay-off than his own minimax pay-off.
• Reward punishers for punishing, instead of punishing them if they do not punish.
• Choose a pay-off vector in the interior of V such that for each i you can still give a higher pay-off.
• For this, V needs to be of “full dimension”.
• Play in three phases:
  - Initial cooperative phase.
  - Punishment phase in which players minimax the deviator j for N(j) periods (as before); switch to the punishment phase for player i if i deviates in one of the N(j) periods.
  - Reward phase after the punishment phase is fully completed.
Renegotiation proofness in repeated games
• Is SPE the best notion of a credible threat?
• Suppose you cooperate for some time in the PD and then someone defects, by chance. Should you switch immediately to always defect?
• Or should players “renegotiate”?
• It is in both players' interest to revert to the cooperative outcome.
• In any subgame, the equilibrium played must not be Pareto-dominated.
• Pareto-optimality as an assumption, and the critique that is possible (risk dominance versus Pareto-dominance).
• Deviations are accidents and unlikely to be repeated? “Bygones are bygones.”
Pareto perfection only applies in two-player games
• [Figure: pay-off matrices A and B of the three-player example]
• Two Nash equilibria in pure strategies: (U,L,A) and (D,R,B).
• (U,L,A) is Pareto-efficient.
• Natural candidate?
• Suppose players 1 and 2 expect the matrix chooser to choose A. Then they can renegotiate and gain by playing (D,R).
Definition of Pareto perfect equilibrium
• Fix a stage game g and play it for T periods; call this game G(T).
• Let P(T) be the set of pay-offs of pure-strategy SPE of G(T).
• Set Q(1) = P(1).
• For any t > 1, let Q(t) be the set of pay-offs of pure-strategy SPE that can be enforced with continuation pay-offs in R(t-1).
• R(t) is the set of strongly efficient points of Q(t), i.e., the set of points for which there is no other pay-off point where no player is worse off and some player is better off.
• A SPE is Pareto perfect if, for every possible history and in every time period t, the continuation pay-offs are in R(T-t).
Pareto perfection restricts threats
• Some efficient equilibria can no longer be supported under Pareto perfection.
• It restricts the set of threats, which makes it more difficult to keep players on the equilibrium path.
• Example
Example Pareto perfection
• Three pure-strategy equilibria in G(1), with pay-offs (4,1), (1,4) and (3,3).
• In G(2) without discounting, a pay-off of 8 is possible; it is the unique element in R(2).
• Without the restriction to Pareto perfection, a pay-off of 13 is possible in G(3).
• With Pareto perfection, no threat is possible in the first period of G(3); one has to play a stage-game equilibrium.
• Equilibrium play alternates between odd and even periods under Pareto perfection.