
Computing Nash Equilibrium



  1. Computing Nash Equilibrium Presenter: Yishay Mansour

  2. Outline • Problem Definition • Notation • Last week: Zero-Sum game • This week: • Zero Sum: Online algorithm • General Sum Games • Multiple players – approximate Nash • 2 players – exact Nash

  3. Model • Multiple players N = {1, ..., n} • Strategy set • Player i has m actions S_i = {s_i1, ..., s_im} • S_i are the pure actions of player i • S = ×_i S_i • Payoff functions • Player i: u_i : S → ℝ

  4. Strategies • Pure strategies: actions • Mixed strategy • Player i: p_i is a distribution over S_i • Game: P = ×_i p_i • Product distribution • Modified distribution • P_{-i} = the distribution P restricted to all players except i • (q, P_{-i}) = player i plays q, every other player j plays p_j

  5. Notations • Average Payoff • Player i: u_i(P) = E_{s~P}[u_i(s)] = Σ_s P(s) u_i(s) • P(s) = Π_i p_i(s_i) • Nash Equilibrium • P* is a Nash Eq. if for every player i • and any distribution q_i • u_i(q_i, P*_{-i}) ≤ u_i(P*) • Best Response • i.e., each p*_i is a best response to P*_{-i}
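
The definitions on this slide can be checked mechanically on small games. The following is a minimal sketch, assuming payoffs are given as Python functions on pure action profiles; the names `expected_payoff` and `is_nash` are illustrative and do not come from the slides. Because u_i is linear in player i's own mixed strategy, it is enough to test deviations to pure actions.

```python
from itertools import product

def expected_payoff(u_i, profile):
    """u_i(P) = sum over pure profiles s of P(s) * u_i(s), with P(s) = prod_i p_i(s_i)."""
    total = 0.0
    for s in product(*(range(len(p)) for p in profile)):
        prob = 1.0
        for p, action in zip(profile, s):
            prob *= p[action]
        total += prob * u_i(s)
    return total

def is_nash(payoffs, profile, tol=1e-9):
    """True if no player gains more than tol by deviating to a pure action."""
    for i, u_i in enumerate(payoffs):
        base = expected_payoff(u_i, profile)
        for a in range(len(profile[i])):
            pure = [1.0 if b == a else 0.0 for b in range(len(profile[i]))]
            deviated = profile[:i] + [pure] + profile[i + 1:]
            if expected_payoff(u_i, deviated) > base + tol:
                return False
    return True

# Matching pennies (a hypothetical test game): player 1 wants to match, player 2 to mismatch.
u1 = lambda s: 1 if s[0] == s[1] else -1
u2 = lambda s: -u1(s)
print(is_nash([u1, u2], [[0.5, 0.5], [0.5, 0.5]]))  # uniform mixing
print(is_nash([u1, u2], [[1.0, 0.0], [1.0, 0.0]]))  # both play action 0
```

The enumeration over S is exponential in the number of players, so this is only meant for toy examples.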

  6. Two player games • Payoff matrices (A, B) • m rows and n columns • player 1 has m actions, player 2 has n actions • strategies p and q • Payoffs: u_1(p,q) = p A q^T and u_2(p,q) = p B q^T • Zero sum game • A = −B

  7. Online learning • Playing with unknown payoff matrix • Online algorithm: • at each step selects an action • can be stochastic or fractional • Observes all possible payoffs • Updates its parameters • Goal: Achieve the value of the game • Payoff matrix of the “game” is defined only at the end

  8. Online learning - Algorithm • Notations: • Opponent distribution Q_t • Our distribution P_t • Observed cost M(i, Q_t) = (M Q_t)_i, and M(P_t, Q_t) = P_t M Q_t • costs in [0,1] • Goal: minimize cost • Algorithm: Exponential weights • Action i has weight proportional to b^{L(i,t)} • L(i,t) = loss of action i until time t

  9. Online algorithm: Notations • Formally: • Number of total steps T is known • parameter: b, 0 < b < 1 • w_{t+1}(i) = w_t(i) · b^{M(i,Q_t)} • Z_t = Σ_i w_t(i) • P_{t+1}(i) = w_{t+1}(i) / Z_{t+1} • Initially, P_1(i) > 0 for every i
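
The update rule can be sketched in a few lines. This is a minimal illustration, assuming uniform initial weights, the value of b given on the optimization slide, and a hypothetical adversary `best_response_opponent` that always picks the column costing the current P_t the most; none of these function names come from the slides.

```python
import math

def exponential_weights(M, opponent, T):
    """Run T rounds of exponential weights on cost matrix M (entries in [0,1]).

    opponent(t, P) must return the column distribution Q_t.
    Returns the average cost M(P_t, Q_t) and the list of distributions P_t.
    """
    n = len(M)
    b = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / T))
    w = [1.0] * n                      # uniform start, so P_1(i) > 0 for every i
    total, history = 0.0, []
    for t in range(T):
        Z = sum(w)
        P = [wi / Z for wi in w]
        Q = opponent(t, P)
        # cost[i] = M(i, Q_t): expected cost of row i against Q_t
        cost = [sum(M[i][j] * Q[j] for j in range(len(Q))) for i in range(n)]
        total += sum(P[i] * cost[i] for i in range(n))
        w = [w[i] * b ** cost[i] for i in range(n)]
        history.append(P)
    return total / T, history

def best_response_opponent(M):
    """Adversary that plays the pure column maximizing the row player's expected cost."""
    def opponent(t, P):
        k = len(M[0])
        col_cost = [sum(P[i] * M[i][j] for i in range(len(M))) for j in range(k)]
        worst = max(range(k), key=lambda j: col_cost[j])
        return [1.0 if j == worst else 0.0 for j in range(k)]
    return opponent

# Matching-pennies style cost matrix (hypothetical example); the value of this game is 1/2.
M = [[0.0, 1.0], [1.0, 0.0]]
avg_cost, _ = exponential_weights(M, best_response_opponent(M), T=1000)
print(avg_cost)
```

Against this adversary the average cost cannot fall below the value of the game, and the guarantee for the algorithm keeps it within O(sqrt((ln n) / T)) above it.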

  10. Online algorithm: Theorem • Theorem • For any matrix M with entries in [0,1] • Any sequence of dist. Q_1, ..., Q_T • The algorithm generates P_1, ..., P_T • RE(A||B) = E_{x~A}[ln(A(x) / B(x))]
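
The inequality itself is not shown on this slide. For reference, the guarantee usually stated for this exponential-weights rule (Freund and Schapire) is written out below; reading it as the theorem meant here is an assumption, although it is consistent with the choice of b on slide 13.

```latex
\sum_{t=1}^{T} M(P_t, Q_t)
  \;\le\; \min_{P} \Big[\, a_b \sum_{t=1}^{T} M(P, Q_t) \;+\; c_b \,\mathrm{RE}(P \,\|\, P_1) \Big],
\qquad
a_b = \frac{\ln(1/b)}{1-b}, \quad c_b = \frac{1}{1-b}.
```

With P_1 uniform over n actions, RE(P||P_1) ≤ ln n for every P, which is where the ln n in the additional loss comes from.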

  11. Relative Entropy • For any two distributions A and B • RE(A||B) = E_{x~A}[ln(A(x) / B(x))] • can be infinite • when B(x) = 0 and A(x) ≠ 0 for some x • Always non-negative • log is concave • Σ_i a_i log b_i ≤ log Σ_i a_i b_i (for weights a_i ≥ 0 summing to 1) • −RE(A||B) = Σ_x A(x) ln(B(x) / A(x)) ≤ ln Σ_x A(x) · B(x) / A(x) = ln Σ_x B(x) ≤ 0
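
A small numeric sketch of the definition, assuming distributions are given as lists of probabilities over the same finite set; the function name `relative_entropy` is illustrative. Terms with A(x) = 0 contribute nothing, and the result is infinite as soon as B(x) = 0 while A(x) > 0.

```python
import math

def relative_entropy(A, B):
    """RE(A||B) = sum_x A(x) * ln(A(x) / B(x)), with the usual conventions for zeros."""
    total = 0.0
    for a, b in zip(A, B):
        if a == 0:
            continue            # 0 * ln(0 / b) is taken to be 0
        if b == 0:
            return math.inf     # A puts mass where B has none
        total += a * math.log(a / b)
    return total

print(relative_entropy([0.5, 0.5], [0.5, 0.5]))
print(relative_entropy([1.0, 0.0], [0.5, 0.5]))
print(relative_entropy([0.5, 0.5], [1.0, 0.0]))
```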

  12. Online algorithm: Analysis • Lemma • For any mixed strategy P • Corollary

  13. Online Algorithm: Optimization • b = 1/(1 + sqrt(2 (ln n) / T)) • additional loss • O(sqrt((ln n) / T)) • Zero sum game: • Average Loss: v • additional loss O(sqrt((ln n) / T))

  14. Example: Zero Sum

  15. Two players General sum games • Input matrices (A,B) • No unique value • Computational issues: • find some Nash, • all Nash • Can be exponentially many • identity matrix • Example 2xN
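
The "identity matrix" remark can be made concrete. In the game A = B = I with n actions per player, both players mixing uniformly over the same non-empty set of actions is a Nash equilibrium, which gives 2^n − 1 equilibria. The sketch below verifies this by brute force with exact arithmetic; the helper names are illustrative, not from the slides.

```python
from fractions import Fraction
from itertools import combinations

def is_nash_bimatrix(A, B, p, q):
    """Exact check: every action played with positive probability is a best response."""
    m, n = len(A), len(A[0])
    row_pay = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    col_pay = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    u, v = max(row_pay), max(col_pay)
    return (all(p[i] == 0 or row_pay[i] == u for i in range(m)) and
            all(q[j] == 0 or col_pay[j] == v for j in range(n)))

def count_uniform_equilibria(n):
    """Count supports S for which (uniform on S, uniform on S) is Nash in the identity game."""
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    count = 0
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            p = [Fraction(1, size) if i in S else Fraction(0) for i in range(n)]
            if is_nash_bimatrix(identity, identity, p, p):
                count += 1
    return count

for n in range(1, 6):
    print(n, count_uniform_equilibria(n), 2 ** n - 1)
```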

  16. Computational Complexity • Complexity of finding a sample equilibrium is unknown • “…no proof of NP-completeness seems possible” (Papadimitriou, 94) • Equilibria with certain properties are NP-Hard • e.g., max-payoff, max-support • (Even) for symmetric 2-player games: • ∃ NE with expected social welfare at least k? • ∃ NE with least payoff at least k? • ∃ Pareto-optimal NE? • ∃ NE with player 1 EU of at least k? • ∃ multiple NE? • ∃ NE where player 1 plays (or not) a particular strategy? • (Gilboa & Zemel, Conitzer & Sandholm)

  17. Two players General sum games • player 1 best response: • Like for zero sum: • Fix strategy q of player 2 • maximize p (A q^T) such that Σ_j p_j = 1 and p_j ≥ 0 • dual LP: minimize u such that u ≥ A q^T (componentwise) • Strong Duality: p (A q^T) = u = Σ_j p_j u • p (u − A q^T) = 0 • complementary system • Player 2: q (v − p B)^T = 0

  18. Nash: Linear Complementarity System • Find distributions p and q and values u and v • u ≥ A q^T • v ≥ p B • p (u − A q^T) = 0 • q (v − p B)^T = 0 • Σ_j p_j = 1 and p_j ≥ 0 • Σ_j q_j = 1 and q_j ≥ 0
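
The system can be evaluated directly for a candidate pair (p, q). The sketch below takes u and v to be the largest entries of A q^T and p B, so the two inequalities hold automatically and only the two complementarity products remain to be checked. The function name `complementarity_gaps` and the test game (a Battle of the Sexes variant) are assumptions made for illustration.

```python
from fractions import Fraction

def complementarity_gaps(A, B, p, q):
    """Return (u, v, gap_p, gap_q) with gap_p = p(u - Aq^T) and gap_q = q(v - pB)^T.

    (p, q) is a Nash equilibrium exactly when both gaps are zero.
    """
    m, n = len(A), len(A[0])
    Aq = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    pB = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    u, v = max(Aq), max(pB)            # smallest u, v satisfying u >= Aq^T and v >= pB
    gap_p = sum(p[i] * (u - Aq[i]) for i in range(m))
    gap_q = sum(q[j] * (v - pB[j]) for j in range(n))
    return u, v, gap_p, gap_q

A = [[2, 0], [0, 1]]
B = [[1, 0], [0, 2]]
F = Fraction
print(complementarity_gaps(A, B, [F(2, 3), F(1, 3)], [F(1, 3), F(2, 3)]))  # mixed candidate
print(complementarity_gaps(A, B, [F(1), F(0)], [F(0), F(1)]))              # mismatched pure pair
```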

  19. Two players General sum games • Assume the supports of the strategies are known • p has support S_p and q has support S_q • Can formulate the Nash as LP:
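
The formulation itself is not part of the transcript. One common way to use known supports, sketched here under the assumption that both supports have the same size (which is enough for non-degenerate games), is to solve the indifference equations on the supports and then check non-negativity and the best-response inequalities. All function names are illustrative, and the linear solver is a plain Gauss-Jordan elimination over exact fractions.

```python
from fractions import Fraction
from itertools import combinations

def solve(M, rhs):
    """Solve the square system M z = rhs exactly; return None if M is singular."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if piv is None:
            return None
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [x / d for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[-1] for row in aug]

def equilibrium_for_supports(A, B, Sp, Sq):
    """Return (p, q) if a Nash equilibrium with exactly these supports exists, else None."""
    m, n, k = len(A), len(A[0]), len(Sp)
    rhs = [0] * k + [1]
    # Unknowns (q_j for j in Sq, u): rows in Sp all earn u, and q sums to 1.
    sol_q = solve([[A[i][j] for j in Sq] + [-1] for i in Sp] + [[1] * k + [0]], rhs)
    # Unknowns (p_i for i in Sp, v): columns in Sq all earn v, and p sums to 1.
    sol_p = solve([[B[i][j] for i in Sp] + [-1] for j in Sq] + [[1] * k + [0]], rhs)
    if sol_q is None or sol_p is None:
        return None
    *qs, u = sol_q
    *ps, v = sol_p
    if min(qs) <= 0 or min(ps) <= 0:
        return None
    p = [Fraction(0)] * m
    q = [Fraction(0)] * n
    for i, x in zip(Sp, ps):
        p[i] = x
    for j, y in zip(Sq, qs):
        q[j] = y
    # No action outside the support may do strictly better.
    if any(sum(A[i][j] * q[j] for j in range(n)) > u for i in range(m)):
        return None
    if any(sum(p[i] * B[i][j] for i in range(m)) > v for j in range(n)):
        return None
    return p, q

def support_enumeration(A, B):
    """Try every pair of equal-size supports and collect the equilibria found."""
    m, n = len(A), len(A[0])
    found = []
    for k in range(1, min(m, n) + 1):
        for Sp in combinations(range(m), k):
            for Sq in combinations(range(n), k):
                eq = equilibrium_for_supports(A, B, Sp, Sq)
                if eq is not None:
                    found.append(eq)
    return found

# Hypothetical test game (a Battle of the Sexes variant).
for p, q in support_enumeration([[2, 0], [0, 1]], [[1, 0], [0, 2]]):
    print(p, q)
```

Each pair of supports costs only polynomial time; the exponential part is the number of support pairs.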

  20. Approximate Nash • Assume we are given Nash • strategies (p,q) • Show that there exists: • small support • epsilon-Nash • Brute force search • enumerate all small supports! • Each one requires only poly. time • Proof!
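
The proof is not included in the transcript. A standard argument for this kind of statement (due to Lipton, Markakis and Mehta, named here as an assumption about what the slide has in mind) samples k actions independently from p and from q and plays the empirical distributions; for k on the order of (log n) / ε² these form an ε-Nash with positive probability. The sketch below only illustrates the sampling step and how ε is measured; the function names and the rock-paper-scissors test game are illustrative.

```python
import random
from collections import Counter

def empirical(dist, k, rng):
    """Draw k i.i.d. actions from dist and return the empirical distribution (support <= k)."""
    draws = rng.choices(range(len(dist)), weights=dist, k=k)
    counts = Counter(draws)
    return [counts[a] / k for a in range(len(dist))]

def epsilon_of(A, B, p, q):
    """Largest gain any player can get by deviating from (p, q) to a pure action."""
    m, n = len(A), len(A[0])
    row = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    col = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    u1 = sum(p[i] * row[i] for i in range(m))
    u2 = sum(q[j] * col[j] for j in range(n))
    return max(max(row) - u1, max(col) - u2)

# Rock-paper-scissors, whose exact equilibrium is uniform for both players.
A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
B = [[-a for a in row] for row in A]
p = q = [1 / 3, 1 / 3, 1 / 3]
rng = random.Random(0)
for k in (10, 100, 1000):
    p_hat, q_hat = empirical(p, k, rng), empirical(q, k, rng)
    print(k, epsilon_of(A, B, p_hat, q_hat))
```

The brute-force search on the slide then replaces sampling by enumeration: every multiset of k actions per player is a candidate, and each candidate is checked with a computation like `epsilon_of`.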

  21. Nash: Linear Complementarity System • Find distributions p and q and values u and v • u ≥ A q^T • v ≥ p B • p (u − A q^T) = 0 • q (v − p B)^T = 0 • Σ_j p_j = 1 and p_j ≥ 0 • Σ_j q_j = 1 and q_j ≥ 0

  22. Lemke & Howson • Define labeling • For strategy p (player 1): • Label i: if p_i = 0, where i is an action of player 1 • Label j: if action j (player 2) is a best response to p • b_j p ≥ b_k p for every k • Similar for player 2 • Label j: if q_j = 0, where j is an action of player 2 • Label i: if action i (player 1) is a best response to q • a_i q ≥ a_k q for every k

  23. LM algo • strategy (p,q) is Nash if and only if: • Each label k is either a label of p or q (or both) • Proof! • Example
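
The labeling and the characterization can be tried on a small game. The sketch below numbers player 1's actions 1..m and player 2's actions m+1..m+n, as in the example figures. The payoff matrices are an assumption: they are a 3×2 game from the literature whose vertices agree with the coordinates that survive on the example slides, not matrices taken from the transcript. Function names are illustrative.

```python
from fractions import Fraction

def labels_of_p(p, B):
    """Labels of player 1's strategy p: unplayed own actions plus player 2's best responses."""
    m, n = len(B), len(B[0])
    labels = {i + 1 for i in range(m) if p[i] == 0}
    col_pay = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    labels |= {m + j + 1 for j in range(n) if col_pay[j] == max(col_pay)}
    return labels

def labels_of_q(q, A):
    """Labels of player 2's strategy q: unplayed own actions plus player 1's best responses."""
    m, n = len(A), len(A[0])
    labels = {m + j + 1 for j in range(n) if q[j] == 0}
    row_pay = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    labels |= {i + 1 for i in range(m) if row_pay[i] == max(row_pay)}
    return labels

def completely_labeled(p, q, A, B):
    """(p, q) is Nash iff every label 1..m+n appears on p or on q."""
    m, n = len(A), len(A[0])
    return (labels_of_p(p, B) | labels_of_q(q, A)) == set(range(1, m + n + 1))

# Assumed payoff matrices (rows: player 1's actions, columns: player 2's actions).
A = [[0, 6], [2, 5], [3, 3]]
B = [[1, 0], [0, 2], [4, 3]]
F = Fraction
p, q = [F(0), F(1, 3), F(2, 3)], [F(2, 3), F(1, 3)]
print(sorted(labels_of_p(p, B)), sorted(labels_of_q(q, A)), completely_labeled(p, q, A, B))
```

Exact fractions are used so that the best-response ties, which is what the labels record, are detected reliably.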

  24. Lemke-Howson: Example • [Figure: labeled strategy diagrams G1 (player 1) and G2 (player 2), with markers a1 to a5 and labels 1 to 5. G1 shows the points (1,0,0), (0,1,0), (0,0,1), (0,1/3,2/3) and (2/3,1/3,0); G2 shows (1,0), (0,1), (1/3,2/3) and (2/3,1/3). The payoff matrices U1 and U2 are not part of the transcript.]

  25. Lemke-Howson: Example • [Figure: the same diagrams G1 and G2 as on slide 24]

  26. LM: non-degenerate • Two player game is non-degenerate if • given a strategy (p or q) • with support of size k • there are at most k pure best responses • Many equivalent definitions • Theorem: For a non-degenerate game • finite number of p with m labels • finite number of q with n labels

  27. LM: Graphs • Consider distributions where: • player 1 has m labels • player 2 has n labels • Graph (per player): • join nodes that share all but 1 label • Product graph: • nodes are pairs of nodes (p,q) • edges: if (p,p’) is an edge then (p,q)-(p’,q) is an edge

  28. LM • completely labeled node: • node that has m+n labels • Nash! • node: k-almost completely labeled • has all labels except label k • edge: k-almost completely labeled • all labels on both sides except label k • artificial node: (0,0)

  29. LM : Paths • Any Nash Eq. • connected to exactly one vertex which is • k-almost completely labeled • Any k-almost completely labeled node • has two neighbors in the graph • Follows from the non-degeneracy!

  30. LM: algo • start at (0,0) • drop label k • follow a path • end of the path is a Nash
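
The path-following steps can be sketched with complementary pivoting on two tableaus. This is a minimal illustration, not the presenter's implementation, and it rests on several assumptions: the game is non-degenerate (no ties in the minimum-ratio test), payoffs are shifted to be positive (which does not change the equilibria), labels are numbered 1..m for player 1 and m+1..m+n for player 2, and the test matrices are a 3×2 game from the literature whose vertices agree with the coordinates on the example slides.

```python
from fractions import Fraction

def pivot(tab, basis, col):
    """Bring column `col` into the basis of `tab`; return the label that leaves."""
    rows = [r for r in range(len(tab)) if tab[r][col] > 0]
    row = min(rows, key=lambda r: tab[r][-1] / tab[r][col])   # minimum-ratio test
    d = tab[row][col]
    tab[row] = [v / d for v in tab[row]]
    for r in range(len(tab)):
        if r != row and tab[r][col] != 0:
            f = tab[r][col]
            tab[r] = [a - f * b for a, b in zip(tab[r], tab[row])]
    leaving, basis[row] = basis[row], col
    return leaving

def lemke_howson(A, B, k=1):
    """Start at the artificial node (0,0), drop label k, follow the path to a Nash equilibrium."""
    m, n = len(A), len(A[0])
    shift = 1 - min(min(map(min, A)), min(map(min, B)))       # make all payoffs >= 1
    A = [[Fraction(a) + shift for a in row] for row in A]
    B = [[Fraction(b) + shift for b in row] for row in B]
    one = Fraction(1)
    # Player 1: B^T x + s = 1. Columns 0..m-1 are x_i, columns m..m+n-1 are slacks.
    P = [[B[i][j] for i in range(m)] + [Fraction(int(c == j)) for c in range(n)] + [one]
         for j in range(n)]
    # Player 2: r + A y = 1. Columns 0..m-1 are slacks, columns m..m+n-1 are y_j.
    Q = [[Fraction(int(c == i)) for c in range(m)] + A[i] + [one] for i in range(m)]
    # A column that is NOT in the basis is a label currently held by that player.
    basis_P, basis_Q = list(range(m, m + n)), list(range(m))
    dropped = entering = k - 1
    in_P = dropped < m
    while True:
        entering = pivot(P, basis_P, entering) if in_P else pivot(Q, basis_Q, entering)
        if entering == dropped:        # the missing label was picked up again: done
            break
        in_P = not in_P                # the duplicated label is dropped in the other tableau
    x, y = [Fraction(0)] * m, [Fraction(0)] * n
    for r, lab in enumerate(basis_P):
        if lab < m:
            x[lab] = P[r][-1]
    for r, lab in enumerate(basis_Q):
        if lab >= m:
            y[lab - m] = Q[r][-1]
    return [v / sum(x) for v in x], [v / sum(y) for v in y]

A = [[0, 6], [2, 5], [3, 3]]
B = [[1, 0], [0, 2], [4, 3]]
for label in (1, 2):
    print(label, lemke_howson(A, B, label))
```

On this assumed game, dropping label 1 ends at p = (0,0,1), q = (1,0), and dropping label 2 ends at p = (2/3,1/3,0), q = (1/3,2/3), so different starting labels can reach different equilibria.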

  31. Lemke-Howson: Algorithm • [Figure: diagrams G1 and G2 as on slide 24]

  32. Lemke-Howson: Algorithm • [Figure: diagrams G1 and G2 as on slide 24]

  33. Lemke-Howson: Algorithm • [Figure: diagrams G1 and G2 as on slide 24]

  34. Lemke-Howson: Other Equilibria • [Figure: diagrams G1 and G2 as on slide 24]

  35. LM: Theorem • Consider a non-degenerate game • Graph consists of disjoint paths and cycles • End points of paths are Nash • or (0,0) • Number of Nash is odd.

  36. LM: Sketch of Proof • Deleting a label k • making support larger • making BR smaller • Smaller BR • solve for the smaller BR • subtract from dist. until one component is zero • Larger support • unique solution (since non-degenerate)
