
Understanding Learning in Games Theory

Explore the application of fictitious play and Cournot adjustment in game theory through speaker Tzur Sayag's analysis of PM Sharon's peace process decisions over specific events. Dive deep into common models of learning in games, sophisticated player behavior, and replicator dynamics.


Presentation Transcript


  1. Fictitious Play • The Theory of Learning in Games, D. Fudenberg and D. Levine • Speaker: Tzur Sayag, 03/06/2003

  2. Do you believe that PM Sharon is serious about the peace process? • A voter has to decide if he should support PM Sharon • Belief: Sharon will never evacuate settlements • Action: Vote against the new economics revolution. • May 24: Sharon announces “occupation is no-good” • Belief: Sharon will probably never evacuate settlements • Action: Vote against the new economics revolution • Jun 5: Sharon meets Abu-Mazen and declares support for a Palestinian state. • Belief: Seems like Sharon might evacuate the settlements after all • Action: Vote for the new economics revolution.

  3. Roadmap • Introduction to the common models of learning in games • Cournot adjustment • Fictitious play and Nash equilibria • Motivation • Definitions • Results • Generalizations of fictitious play – if we have time

  4. Notations • P1 gets a1 and P2 gets b1 if they play Action1, Action1 respectively. [Payoff table for Player 1 and Player 2 not reproduced.]

  5. Learning in Games - 1 • Repeated games – same or related • fixed-player model • Teach the opponent to play a best response to a particular action by repeating it over and over.

  6. Being Sophisticated – Example • D is dominant for Bob. • If Alice learns that Bob only plays D, the game converges to <D,L>. • Bob's payoff for <D,L> is 2. • If Bob is patient, he can always play U and just wait for a while. • If Bob always plays U, Alice, who thought Bob was going to play D, should shift her play from L to R (since L was only good when Bob actually played D). • So Bob plays constant U, which leads Alice to play constant R with payoff 2 > 1. • In this case Bob gets 3, which is better than 2. • Bingo! [Payoff table for Alice and Bob not reproduced.]

  7. Being Sophisticated – Abstracting • Most learning theory relies on models in which the incentive to alter the opponent's future play is small. • Locked in for 2 periods • Large anonymous population • Embed a two-player game by pairing players randomly from a large population.

  8. Models of Embedding • Single-pair model • a random single pair plays, actions revealed to everyone • Aggregate static model • all players randomly matched, aggregate outcomes revealed to everyone • Random-matching model • all players randomly matched, each player sees only the outcome of his own game

  9. Three common models of Learning • Fictitious play • Players observe only their own matches and play a best response to the frequencies. • Partial best-response • A fixed portion switches each period from its current action to a BR to the aggregate stats from the previous period. • Replicator Dynamics • The share of the population using each strategy grows proportionally to that strategy’s current payoff.
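
The replicator bullet above can be sketched in a few lines. This is only an illustration under assumptions that are not in the slides: a discrete-time update, a symmetric game, and a hypothetical coordination payoff matrix `COORDINATION` (2 for matching, 0 otherwise). The function names are made up for this sketch.

```python
def replicator_step(shares, payoff):
    """One discrete-time replicator update for a symmetric game.

    shares[i]    : fraction of the population playing strategy i
    payoff[i][j] : payoff to strategy i against strategy j
                   (assumed to give a positive population average)
    """
    n = len(shares)
    # Expected payoff of each strategy against the current population mix.
    fitness = [sum(payoff[i][j] * shares[j] for j in range(n)) for i in range(n)]
    # Population-average payoff, used to renormalize so shares sum to one.
    average = sum(shares[i] * fitness[i] for i in range(n))
    # Each share grows in proportion to its own payoff.
    return [shares[i] * fitness[i] / average for i in range(n)]


def run_replicator(shares, payoff, steps):
    """Apply replicator_step repeatedly and return the final shares."""
    for _ in range(steps):
        shares = replicator_step(shares, payoff)
    return shares


# Hypothetical symmetric coordination game (an assumption for this sketch).
COORDINATION = [[2.0, 0.0], [0.0, 2.0]]
final = run_replicator([0.6, 0.4], COORDINATION, steps=10)
print(final)
```

Starting slightly above one half, the share of the first strategy keeps growing, which matches the idea that initial conditions pick the outcome.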

  10. Cournot Adjustment – a flavor of analysis • Two firms, 1 and 2. • Strategy: choose a quantity $s_i \in [0,\infty)$ • Strategy profile is $\langle s_i, s_{-i} \rangle \in S$ • Utility for i is $u_i(s_i, s_{-i})$ • Assume $u_i(\cdot, s_{-i})$ is strictly concave • $BR_i(s_{-i}) = \arg\max_{x} u_i(x, s_{-i})$ • BR is unique: since u is strictly concave, u' is strictly decreasing, which means it has at most one zero, which means, yes, you guessed it right, u has only one extreme point and the max is therefore unique. u can't be constant on an interval, since it is STRICTLY concave by assumption.

  11. Cournot Adjustment Model • Time periods t = 0,1,2,… are discrete • Initial state profile $\theta^0 \in S$ • In each period each player chooses a pure strategy that is a BR to the opponent's play in the previous period • Formally, i chooses $s_i^t = BR_i(s_{-i}^{t-1})$
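
A minimal sketch of the adjustment process, assuming a textbook linear duopoly that is not in the slides: inverse demand `a - b*(q1 + q2)` and constant marginal cost `c`, so each firm's profit is strictly concave in its own quantity and the best response is `(a - c - b*q_other) / (2*b)`. The parameter values are hypothetical.

```python
def best_response(q_other, a=10.0, b=1.0, c=1.0):
    """Profit-maximizing quantity against q_other in the assumed linear duopoly.

    Profit is q * (a - b * (q + q_other) - c), strictly concave in q,
    so the first-order condition gives the unique maximizer.
    """
    return max(0.0, (a - c - b * q_other) / (2.0 * b))


def cournot_adjustment(q1, q2, periods):
    """Each period both firms best-respond to the other's previous quantity."""
    path = [(q1, q2)]
    for _ in range(periods):
        # Simultaneous update: both right-hand sides use last period's values.
        q1, q2 = best_response(q2), best_response(q1)
        path.append((q1, q2))
    return path


path = cournot_adjustment(0.0, 0.0, periods=60)
print(path[:4])
print(path[-1])
```

With these assumed parameters the Nash quantity is `(a - c) / (3*b) = 3` for each firm, and the path oscillates around it with shrinking amplitude.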

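  12. Cournot Dynamics: Reaction Curves • [Figure: reaction curves BR1 and BR2 in the (θ1, θ2) plane, showing the step from $\theta^t = (\theta^t_1, \theta^t_2)$ to $\theta^{t+1}$.] • For every θ2, the BR1 line states the BR of player 1 against it; the value for player 1 is the "height" at point θ2. • The new BR if 2 plays $\theta^t_2$ is read off BR1. • Can you convince yourself that the marked point is a Nash?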

  13. Cournot Dynamics • A movement between profiles such that • $\theta^{t+1} = f(\theta^t)$, with $f_i(\theta^t) = BR_i(\theta^t_{-i})$ • A steady state is $\theta^s$ s.t. $\theta^s = f(\theta^s)$ • Once $\theta^t = \theta^s$ the system remains there • Claim (simple): $\theta^s$ is a Nash • Proof: by definition, for every player i, $\theta^s_i = BR_i(\theta^s_{-i})$, so players don't want to move. • SO: EVERY STEADY STATE IS A NASH EQUILIBRIUM

  14. Cournot Dynamics – oblivious to linear transformations • Proposition 1.1: Suppose $u'_i(s) = a \cdot u_i(s) + v_i(s_{-i})$ for all players i, with a > 0. Then u' and u are best-response equivalent • Proof: • $v_i(s_{-i})$ depends only on the opponent's play, so it does not change the ranking of my own actions • Multiplying all payoffs by the same positive constant a has no effect on the ranking either • So a transformation that leaves preferences, and consequently best responses, unchanged will give rise to the same dynamic learning process.
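
A quick numerical check of Proposition 1.1, as a sketch only. The payoff matrix `U`, the scale `a=2` and the shift vector `v` are made-up values; `u[i][j]` is read as my payoff when I play action i and the opponent plays action j.

```python
def best_responses(u, s_other):
    """Set of own actions that maximize u[own][s_other]."""
    column = [row[s_other] for row in u]
    best = max(column)
    return {i for i, value in enumerate(column) if value == best}


def transform(u, a, v):
    """Return u' with u'[i][j] = a * u[i][j] + v[j] (v depends on the opponent only)."""
    return [[a * u[i][j] + v[j] for j in range(len(v))] for i in range(len(u))]


# Hypothetical payoffs for one player (integers, so comparisons are exact).
U = [[1, 0, 4],
     [3, 2, 1],
     [0, 5, 4]]
U_NEW = transform(U, a=2, v=[7, -3, 10])

# Best responses agree against every opponent action, ties included.
same = all(best_responses(U, j) == best_responses(U_NEW, j) for j in range(3))
print(same)
```

The scale has to be positive: with `a=-1` the ranking of my actions flips, and the best responses change.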

  15. Cournot Dynamics and Zero Sum Games • Recall: payoffs in a ZSG add to zero. • Proposition 1.2: every 2 x 2 game for which the best response correspondences have a unique intersection that lies in the interior of the strategy space is best-response equivalent to a zero-sum game. • Proof: given G, a 2x2 game, with unique intersection, • w.l.o.g. assume 1) A is a BR for player 1 against A 2) B is a BR for player 2 against A • If A were also a BR for player 2 against A, then the best response correspondences would intersect at the pure profile <A,A>, which contradicts our assumption of a unique interior intersection.

  16. Cournot Dynamics and Zero Sum Games 2 • Proof outline: Given G, the 2x2 game with unique intersection, we build a zero-sum game that has the same best responses. Observe the following ZSG, written as payoffs to player 1 with player 1's action listed first (player 2 gets the negative): $u_1(A,A) = 1$, $u_1(A,B) = 0$, $u_1(B,A) = a$, $u_1(B,B) = b$. • If $a < 1$ then $BR_1(A) = A$, since $u_1(A,A) = 1$ but $u_1(B,A)$ is "only" a • $BR_2(A) = B$, since $u_2(A,B) = 0$ but $u_2(A,A)$ is "only" -1 • Denote by $\sigma_i$ player i's probability of playing A • Claim 1: player 1 is indifferent between A and B if $\sigma_2 = a \sigma_2 + b (1 - \sigma_2)$ • Claim 2: player 2 is indifferent between A and B if $\sigma_1 + a (1 - \sigma_1) = b (1 - \sigma_1)$

  17. Proof of: player 1 is indifferent between A and B if $\sigma_2 = a \sigma_2 + b (1 - \sigma_2)$ • Assume $\sigma_2 = a \sigma_2 + b (1 - \sigma_2)$ (*), where $\sigma_2$ is the probability that 2 plays A • (1) If player 1 plays A he gets: $u_1(A, \sigma_2) = \sigma_2 u_1(A,A) + (1 - \sigma_2) u_1(A,B) = \sigma_2 \cdot 1 + (1 - \sigma_2) \cdot 0 = \sigma_2$ (by the game table) • (2) If player 1 plays B he gets: $u_1(B, \sigma_2) = \sigma_2 u_1(B,A) + (1 - \sigma_2) u_1(B,B) = a \sigma_2 + b (1 - \sigma_2)$ (by the game table) • So if (1) = (2) he does not care which to choose, and (1) = (2) reads $\sigma_2 = a \sigma_2 + b (1 - \sigma_2)$, as required. • The proof of claim 2, regarding 2's indifference, follows the same path.

  18. Proof cont.: Building the ZSG • Mental note: $\sigma_i$ = Pr[player i plays A] • 1 is indifferent between A and B if $\sigma_2 = a \sigma_2 + b (1 - \sigma_2)$ • 2 is indifferent between A and B if $\sigma_1 + a (1 - \sigma_1) = b (1 - \sigma_1)$ • Fixing an intersection point $(\sigma_1, \sigma_2)$, we can solve for the unknown payoffs a, b: $a = (\sigma_2 - \sigma_1) / (1 - \sigma_1)$ and $b = \sigma_2 / (1 - \sigma_1)$ • Notice that $0 < \sigma_i < 1$ (the intersection is interior; otherwise i never mixes), so $\sigma_2 - \sigma_1 < 1 - \sigma_1$, which implies $a < 1$. Q.E.D. • We already showed that when $a < 1$ we get the same best responses we had in the original game G: A for player 1 against A, B for player 2 against A • To sum up: it should have been obvious that $(\sigma_1, \sigma_2)$ is a Nash; the point was to find a 2x2 ZSG which has the same best responses as the original game
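
The two indifference conditions are linear in a and b, so they can be solved and checked numerically. The closed forms used below, `a = (s2 - s1) / (1 - s1)` and `b = s2 / (1 - s1)`, come from solving those two equations by hand for this sketch; the function names and the sample point (0.3, 0.6) are hypothetical.

```python
def zero_sum_payoffs(sigma1, sigma2):
    """Solve the two indifference conditions for the unknown payoffs a and b.

    Condition for player 1: sigma2 = a*sigma2 + b*(1 - sigma2)
    Condition for player 2: sigma1 + a*(1 - sigma1) = b*(1 - sigma1)
    Requires an interior point, in particular sigma1 < 1.
    """
    a = (sigma2 - sigma1) / (1.0 - sigma1)
    b = sigma2 / (1.0 - sigma1)
    return a, b


def indifference_gaps(sigma1, sigma2, a, b):
    """Left side minus right side of each condition; both are zero at a solution."""
    gap1 = sigma2 - (a * sigma2 + b * (1.0 - sigma2))
    gap2 = (sigma1 + a * (1.0 - sigma1)) - b * (1.0 - sigma1)
    return gap1, gap2


a, b = zero_sum_payoffs(0.3, 0.6)
gap1, gap2 = indifference_gaps(0.3, 0.6, a, b)
print(a, b, gap1, gap2)
```

For any interior point the computed a stays below 1, which is the property the proof needs.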

  19. Strategic-Form Games • Finite actions • One shot simultaneous-move games • {Players, strategy space, payoff functions} is the strategic form of a game

  20. Nash and Correlated Nash • A game can have several Nashs: <A,A>, <B,B>, <(1/2,1/2), (1/2,1/2)>, but the payoffs may be different. <A,A> gets 2 for each; <(1/2,1/2), (1/2,1/2)> gets 1 for each. • Let's question the robustness of the mixed-strategy Nash point. • Intuitively, at the mixed Nash players are indifferent: "in real life" they play A, B, whatever… So one may believe that the other one plays A with slightly higher probability. He then wants to switch to pure A, so the robustness of the mixed Nash seems questionable.

  21. Nash and Correlated Nash • A Nash is strict if for each player i, si is the unique best response to s-i • Only pure strategies can be strict: if a mixed strategy is a BR, then so is every pure strategy in the mixed strategy's support, otherwise there would be no point in including it. • Recall: the support of a mixed strategy is the set of pure strategies that are played with positive probability.

  22. Some Questions in Theory of Games • When and why should we expect play to correspond to a Nash equilibrium? • If there are several Nash equilibria, which one should we expect to occur? • In the previous example, in the absence of coordination, we are faced with the possibility that player 1 expects NE1=<A,A> so he plays A, while the opponent expects NE2=<B,B> so he plays B, resulting in the non-equilibrium outcome profile <A,B>

  23. The Idea of Learning based explanation of equilibrium • Intuitively, the history of observations can provide a way for the players to coordinate their expectations on one of the two pure-strategy equilibria. • Typically, learning models predict that this coordination will eventually occur, with the determination of which of the two equilibria arises left to initial conditions or to random chance.

  24. The Idea of Learning based explanation of equilibrium • For the history to serve this coordination role, the sequence of actions played must eventually become constant, or at least readily predictable by the players. Of course, there is no presumption that this is always the case. • Perhaps, rather than going to a Nash, players wander around the space aimlessly, or perhaps play lies in some set of alternatives larger than the set of Nashs?

  25. The Idea of Learning based explanation of equilibrium • For the simple coordination game (symmetric <2,2>, <0,0>) there is no reason to think that any learning process will prefer one Nash over the other. • What if we alter it such that there is a better Nash? Will the players learn to play the <A,A> Nash? [Altered payoff table not reproduced.]

  26. Correlated Nash (Aumann 74) • Suppose the players have access to randomizing devices that are privately viewed. • If each player chooses a strategy according to his own randomizing device, the result is a probability distribution over strategy profiles, denoted μ ∈ Δ(S). • Unlike a profile of mixed strategies, which is by definition uncorrelated, such a distribution may be correlated.

  27. Correlated Nash – Jordan's matching pennies • 3 players. • Each chooses H or T • Payoffs are +1 or -1 only • 1 wins if he matches 2 • 2 wins if he matches 3 • 3 wins by not matching 1 • This game has a unique NE: each player plays (1/2,1/2) • However: it has many correlated NE. [Payoff tables for "Player 3 plays H" and "Player 3 plays T" not reproduced.]

  28. Correlated Nash – Jordan's matching pennies • C-NE: the uniform distribution over these 6 profiles: (H,H,H) (H,H,T) (H,T,T) (T,T,T) (T,T,H) (T,H,H) • Each player has a 50% chance of playing H. • No weight is placed on (H,T,H) or (T,H,T), so the play of the players is not independent (it is correlated) • For player 1: when he plays H, he faces a 1/3 chance each that his opponents play (H,H), (H,T), (T,T). Since his goal is to match 2, he wins 2/3 of the time by playing H and only one third if he plays T. Similarly, if he plays T his opponents might only play (T,T), (T,H), (H,H). Now tails wins 2/3 of the time, as against heads which wins only 1/3 of the time. Either way he does best by sticking with his action, so he has no reason to deviate: he is at a correlated Nash.
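
The argument for player 1 can be repeated mechanically for all three players. The sketch below assumes each player conditions only on his own drawn action and that the other two follow theirs; profiles are written as 3-letter strings and the helper names are made up for this illustration.

```python
from fractions import Fraction

# Support of the uniform distribution from the slide, as (p1, p2, p3) strings.
SUPPORT = ["HHH", "HHT", "HTT", "TTT", "TTH", "THH"]


def payoff(player, profile):
    """+1 if the player wins, -1 otherwise (players are indexed 0, 1, 2)."""
    p1, p2, p3 = profile
    if player == 0:
        win = p1 == p2      # player 1 wants to match player 2
    elif player == 1:
        win = p2 == p3      # player 2 wants to match player 3
    else:
        win = p3 != p1      # player 3 wants to mismatch player 1
    return 1 if win else -1


def conditional_payoff(player, drawn, played):
    """Expected payoff when the device says `drawn` but the player uses `played`."""
    relevant = [s for s in SUPPORT if s[player] == drawn]
    total = Fraction(0)
    for s in relevant:
        actual = s[:player] + played + s[player + 1:]
        total += payoff(player, actual)
    return total / len(relevant)


def is_correlated_equilibrium():
    """True if no player gains by deviating from any drawn action."""
    for player in range(3):
        for drawn in "HT":
            other = "T" if drawn == "H" else "H"
            follow = conditional_payoff(player, drawn, drawn)
            deviate = conditional_payoff(player, drawn, other)
            if follow < deviate:
                return False
    return True


print(is_correlated_equilibrium())
```

Following wins two times out of three, so the conditional expected payoff is 1/3 against -1/3 for deviating.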

  29. Why is Correlated Nash of Significance? • Hint – Cycles create correlation between profile strategies. • Informally a cycle is a finite sequence of profiles of length k such that s0=sk. • Cournot play can exhibit cycles – example follows. • So cycles => correlation => correlated Nash

  30. Cournot Cycle – matching pennies • 3 players, each choosing heads or tails • 1 wants to match 2. • 2 wants to match 3. • 3 wants to un-match 1. • [Cournot:] means each player assumes his opponents play the same as in their last step
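
A small simulation of the Cournot process in this 3-player game, as a sketch. The starting profile (H,H,H) is an arbitrary choice for illustration, and the function names are hypothetical.

```python
def flip(side):
    """Return the opposite coin side."""
    return "T" if side == "H" else "H"


def cournot_step(profile):
    """Each player best-responds to the others' previous actions."""
    p1, p2, p3 = profile
    return (
        p2,        # player 1 matches what player 2 did last period
        p3,        # player 2 matches what player 3 did last period
        flip(p1),  # player 3 mismatches what player 1 did last period
    )


def cournot_path(start, steps):
    """List of profiles visited, starting from `start`."""
    path = [start]
    for _ in range(steps):
        path.append(cournot_step(path[-1]))
    return path


path = cournot_path(("H", "H", "H"), steps=6)
for profile in path:
    print("".join(profile))
```

From this start the process returns to (H,H,H) after six steps, visiting the six profiles that carry weight in the correlated equilibrium of slide 28.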

  31. Roadmap • Introduction to the common models of learning • Cournot adjustment • Fictitious play and Nash equilibria • Motivation • Definitions • Results • Generalizations of fictitious play

  32. Fictitious play - Introduction • Motivation… • Repeated game, stationary assumption…. • Each player forms a belief about his opponent's "strategy" by looking at what happened • Each player plays a best response according to his/her belief

  33. Two-Player Fictitious Play - notations • S1 and S2 are finite actions spaces for players one and two respectively. • S1 = {■,●,▲} • S2 = {♥,●,♦} • u1, u2 – player payoff functions • u1(■, ♥)=15 • for mixed strategy we take • u1(<½,½>,<¼, ¾> )= u1( • u1(■,<¼, ¾> )=¼ u1(■, ♥)+¾ u1(■, ♦) • Player is pi, opponent is p-i i={1,2}

  34. Two-Player Fictitious Play • Notion of belief • A prediction of the opponent's action distribution (for example, the degree to which 1 believes 2 will play ●) • Assume players choose their actions in each period to maximize their expected payoff with respect to their belief for the current period.

  35. Two-Player Fictitious Play – Forming Beliefs • Player i starts with a weight function $K_0^i$ • $K_0^i : S_{-i} \to \mathbb{R}_+$ • For example: • $K_0^i : \{▲, ■, ●\} \to \mathbb{R}_+$ • $K_0^i(■) = 4$ • As the game is iteratively repeated, K is updated

  36. Two-Player Fictitious Play – Belief update • If some action, say ■, was played (by the opponent!) the last time, we add 1 to its count. Generally: • $K_t^i(s_{-i}) = K_{t-1}^i(s_{-i}) + 1$ if $s_{-i}^{t-1} = s_{-i}$, and $K_t^i(s_{-i}) = K_{t-1}^i(s_{-i})$ otherwise • That's a complicated way of saying that K(s) simply counts the number of times the opponent played s (on top of the initial weight).

  37. Two-Player Fictitious Play – Using frequencies to form beliefs • Given K, the weight (frequency) vector, • each player forms a probability vector $\gamma_t^i$ over his opponent's actions: $\gamma_t^i(s_{-i}) = K_t^i(s_{-i}) / \sum_{\tilde{s}_{-i} \in S_{-i}} K_t^i(\tilde{s}_{-i})$ • Reads: the belief player i holds at time t regarding the probability that his opponent plays $s_{-i}$ at time t • For example, his belief can be said to be Pr[opponent plays ■] = $K_t(■)$ / #steps (ignoring the initial weights) • Simple normalization
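
The weight update and the normalization fit in a few lines. This sketch assumes integer weights so that exact fractions can be used; the action names, the initial weights and the observed sequence are hypothetical.

```python
from fractions import Fraction


def update_weights(weights, observed):
    """Add 1 to the weight of the action the opponent just played."""
    updated = dict(weights)
    updated[observed] += 1
    return updated


def beliefs(weights):
    """Normalize the weights into a probability vector over opponent actions."""
    total = sum(weights.values())
    return {action: Fraction(weight, total) for action, weight in weights.items()}


# Hypothetical initial weights (the "fictitious sample" held before play starts).
weights = {"square": 4, "circle": 1, "triangle": 1}
for observed in ["square", "circle", "square"]:
    weights = update_weights(weights, observed)

gamma = beliefs(weights)
print(weights)
print(gamma)
```

Note that the initial weights stay in the totals, so the belief differs from the raw empirical frequency until enough observations accumulate.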

  38. Two-Player Fictitious Play – Using frequencies 2 • We now have a belief about how the opponent plays. • A FP is any rule $\rho_t^i$ which assigns a best-response action to the belief $\gamma_t^i$ • Example: • $\rho^1$(<½,¼,¼>) = ■ (extend naturally to mixed) • Reads: my belief is that my opponent plays ♥ with probability ½, ● with probability ¼ and ♦ with probability ¼; looking at my payoff table, by playing ■ I can max the utility • This implies that u1(■, <½,¼,¼>) is at least as good for player 1 as any other action against <½,¼,¼>

  39. Two-Player Fictitious Play – remarks • Many BR are possible for a given belief set • An example of such rules ρ may be: • Always prefer pure action over mixed action • Pick the best response for which your action index is least, (that’s the limit of my creativity) • (both of course must still be best responses)
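
One such rule, "pick the best response with the least index", can be sketched as follows. The payoff table is made up for illustration (the slides do not give one), with `payoff[i][j]` read as my payoff for action i against opponent action j, and exact fractions are used so that ties are detected reliably.

```python
from fractions import Fraction


def expected_payoffs(payoff, belief):
    """Expected payoff of each of my actions against the belief vector."""
    return [sum(p * u for p, u in zip(belief, row)) for row in payoff]


def fictitious_play_rule(payoff, belief):
    """Return the best response with the least action index."""
    values = expected_payoffs(payoff, belief)
    best = max(values)
    # list.index returns the first position holding the maximum value.
    return values.index(best)


# Hypothetical 3x3 payoff table and the belief <1/2, 1/4, 1/4> from slide 38.
PAYOFF = [[4, 0, 0],
          [0, 4, 4],
          [1, 1, 1]]
BELIEF = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

print(expected_payoffs(PAYOFF, BELIEF))
print(fictitious_play_rule(PAYOFF, BELIEF))
```

Here actions 0 and 1 are both best responses, and the rule resolves the tie in favor of action 0.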

  40. Two-Player Fictitious Play – Interpretation (page 31-32) • Bayesian inference • Player i believes the opponent's play corresponds to a sequence of i.i.d. multinomial random variables with a fixed but unknown distribution. • Player i's prior over that unknown distribution takes the form of a Dirichlet distribution. • i's prior and posterior beliefs correspond to a distribution over the set Δ(S-i) of probability distributions over S-i • The distribution over the opponent's strategies $\gamma_t^i$ is the induced marginal distribution over pure strategies. • If beliefs over Δ(S-i) are denoted $\mu_t^i$, then we have: $\gamma_t^i(s_{-i}) = \int_{\Delta(S_{-i})} \sigma_{-i}(s_{-i}) \, \mu_t^i(d\sigma_{-i})$

  41. Two-Player Fictitious Play – Interpretation • Denote the marginal empirical distribution as: $d_t^i(s_{-i}) = (K_t^i(s_{-i}) - K_0^i(s_{-i})) / t$ • The assessment $\gamma_t^i$ is not the same as $d_t^i$ because of the influence of i's prior belief • The prior has the form of a "fictitious sample" observed before the game started. • As observations are incorporated into $\gamma_t^i$, it will converge to $d_t^i$ (the empirical distribution)

  42. Two-Player Fictitious Play – Interpretation • Notes: • As long as the initial weights are positive, the beliefs stay positive • The belief reflects the conviction that the opponent's strategy is constant and unknown. • This conviction may be wrong if the process cycles. • Any finite sequence of what looks like a cycle is actually consistent with the assumption that the world is constant and those observations are a fluke • If cycles persist, we might expect i to notice, but in any case his beliefs will not be falsified in the first few periods, as they were in the Cournot process.

  43. Asymptotic Behavior – does play converge? • Sufficient conditions • Proposition 2.1 • (1) If s is a strict Nash and s is played at time t in the process of FP, then s is played at all subsequent dates • (2) Any pure-strategy steady state of FP must be a Nash

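  44. If ŝ is a strict Nash and is played at time t, then ŝ is played at all subsequent dates • Proof: • Suppose $\gamma_t^i$ (the players' beliefs) are such that the actions played are the strict Nash ŝ. • [believe me that:] When profile ŝ is played at time t, each player's beliefs at t+1 are a convex combination of $\gamma_t^i$ and a point mass on $\hat{s}_{-i}$: $\gamma_{t+1}^i = (1 - \alpha_t) \gamma_t^i + \alpha_t \delta(\hat{s}_{-i})$ • Since expected utility is linear in beliefs, we get: $u_i(\hat{s}_i, \gamma_{t+1}^i) = (1 - \alpha_t) u_i(\hat{s}_i, \gamma_t^i) + \alpha_t u_i(\hat{s}_i, \delta(\hat{s}_{-i}))$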

  45. If ŝ is a strict Nash and is played at time t, then ŝ is played at all subsequent dates • We want to show that this payoff is still better than the payoff of any other action against $\gamma_{t+1}^i$ • Now $\hat{s}_i$ was a strict BR for $\gamma_t^i$ • So the claim should be obvious for the first term (by the assumption that $\hat{s}_i$ is a strict BR for $\gamma_t^i$). • For the second term, note that against the point mass it is obvious that $\hat{s}_i$ is better, because the profile <ŝi, ŝ-i> is a strict Nash, which was our assumption

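  46. So, what is a point mass on $\hat{s}_{-i}$, and why is $\gamma_{t+1}^i$ a convex combination of $\gamma_t^i$ and that point mass? • I need to show you that: $\gamma_{t+1}^i = (1 - \alpha_t) \gamma_t^i + \alpha_t \delta(\hat{s}_{-i})$ • For clarity, let's say t=10 and there are 2 players. • Let's say ŝ={( ½, ½ ), ( ¼ , ¾ )} • $S_{-i} = S_2 = \{C, D\}$; look at $\gamma_{11}^1(C)$ • Recall $\gamma_{10}^1(C) = K_{10}(C) / 10$ (ignore the prior, it does not matter here) • Suppose that at time 10 player 2 actually played C (he played a mixed strategy, which is interpreted as playing C ¼ of the time) • Then $\gamma_{11}^1(C) = (K_{10}(C) + 1) / 11 = (10/11) \gamma_{10}^1(C) + (1/11) \cdot 1$, which is a convex combination with $\alpha_t = 1/11$ and a point mass on C.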
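
The convex-combination identity can be checked with exact arithmetic. The sketch ignores the prior, as the slide does, so the belief after t periods is just count / t and the weight on the new observation is 1 / (t + 1); the function name is made up for this illustration.

```python
from fractions import Fraction


def next_belief(gamma_t, t, played):
    """Belief in an action at t+1, written as a convex combination.

    gamma_t : belief at time t that the opponent plays the action
    played  : True if the opponent actually played it in period t
    """
    alpha = Fraction(1, t + 1)
    point_mass = 1 if played else 0
    return (1 - alpha) * gamma_t + alpha * point_mass


# Hypothetical count: the opponent played C in 4 of the first 10 periods.
gamma_10 = Fraction(4, 10)
gamma_11 = next_belief(gamma_10, t=10, played=True)
print(gamma_11)
```

The result equals the direct count (4 + 1) / 11, so both ways of computing the belief agree.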

  47. 2nd part: Any pure-strategy steady state of FP must be a Nash • A steady state is a strategy profile that is played in every step after perhaps a finite time T. • Ideas? • If play remains at a pure-strategy profile, then eventually the assessments will become concentrated at this profile. • If the profile were not a Nash, then for one of the players the action he keeps playing would not be a BR to his assessment. This contradicts how FP works, • since all players always play a BR according to their belief. • Food for thought: Why does it not work for a mixed-strategy profile?

  48. To Conclude this • We wanted to show that "if s is a strict Nash and s is played at time t in the process of FP, then s is played at all subsequent dates" • We showed it by looking at what happens to the players' beliefs and proving that, given the new beliefs, the same actions are still strict BRs. • This means the system is at a steady state. • We also showed that if it is a pure-strategy steady state, it is a Nash.

  49. No Pure Nash => FP can’t converge to a pure profile • Matching pennies • For example: • At time=3 player I believes that II prefers Tails, so he plays Tails to match • But II plays Heads so I adds one to Heads • Now Heads>Tails and I convinced himself II will play Heads so he switches to H • The game cycles and never converges to the Nash profile.

  50. No Pure Nash => FP can't converge to a pure profile • If the game did "converge", it would be in a steady state that is pure and not a Nash (since matching pennies has no pure Nash), but we showed that any pure steady state must be a Nash. • It's OK then that the game does not converge. • Interestingly, the empirical distribution over each player i's strategies converges to ( ½ , ½ ), and their product {( ½ , ½ ), ( ½ , ½ )} is a Nash.
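
A short simulation makes the last point concrete. This is only a sketch: the initial weights (1.5, 2) and (2, 1.5) are hypothetical and chosen so that ties never occur, player I is assumed to want a match and player II a mismatch, and the number of periods is arbitrary.

```python
def best_response(payoff, weights):
    """Index of the action with the highest expected payoff against the belief."""
    total = sum(weights)
    belief = [w / total for w in weights]
    values = [sum(b * u for b, u in zip(belief, row)) for row in payoff]
    return values.index(max(values))


# Action 0 is Heads, action 1 is Tails.
# U1[a1][a2]: payoff to player I (the matcher) for a1 against II's a2.
U1 = [[1, -1], [-1, 1]]
# U2[a2][a1]: payoff to player II (the mismatcher) for a2 against I's a1.
U2 = [[-1, 1], [1, -1]]


def fictitious_play(periods, weights1=(1.5, 2.0), weights2=(2.0, 1.5)):
    """Run two-player fictitious play and return both action histories."""
    w1 = list(weights1)  # player I's weights over II's actions
    w2 = list(weights2)  # player II's weights over I's actions
    history1, history2 = [], []
    for _ in range(periods):
        a1 = best_response(U1, w1)
        a2 = best_response(U2, w2)
        w1[a2] += 1      # I observes what II played
        w2[a1] += 1      # II observes what I played
        history1.append(a1)
        history2.append(a2)
    return history1, history2


history1, history2 = fictitious_play(periods=10000)
freq_heads_1 = history1.count(0) / len(history1)
freq_heads_2 = history2.count(0) / len(history2)
print(freq_heads_1, freq_heads_2)
```

Play itself keeps switching between Heads and Tails in ever longer runs, while both empirical frequencies of Heads drift toward one half.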
