Computing Nash Equilibrium Presenter: Yishay Mansour
Outline • Problem Definition • Notation • Today: Zero-Sum game • Next week: General Sum Games • Multiple players
Model • Multiple players $N = \{1, \dots, n\}$ • Strategy set • Player $i$ has $m$ actions $S_i = \{s_{i1}, \dots, s_{im}\}$ • $S_i$ are the pure actions of player $i$ • $S = \prod_i S_i$ • Payoff functions • Player $i$: $u_i : S \to \mathbb{R}$
Strategies • Pure strategies: actions • Mixed strategy • Player $i$: $p_i$ is a distribution over $S_i$ • Game: $P = \prod_i p_i$ • Product distribution • Modified distribution • $P_{-i}$ = the profile $P$ without player $i$ • $(q, P_{-i})$ = player $i$ plays $q$, every other player $j$ plays $p_j$
Notations • Average payoff • Player $i$: $u_i(P) = E_{s \sim P}[u_i(s)] = \sum_s P(s)\, u_i(s)$ • $P(s) = \prod_i p_i(s_i)$ • Nash equilibrium • $P^*$ is a Nash equilibrium if for every player $i$ • and any distribution $q_i$ • $u_i(q_i, P^*_{-i}) \le u_i(P^*)$ • Best response
Notations • Alternative payoff • $x_{ij}(P) = u_i(s_{ij}, P_{-i}) = E_{s \sim P}[u_i(s) \mid s_i = s_{ij}]$ • Difference in payoff • $z_{ij}(P) = x_{ij}(P) - u_i(P)$ • Improvement in payoff • $g_{ij}(P) = \max\{z_{ij}(P), 0\}$
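The three quantities above can be computed directly for a small game. The following is a minimal sketch for player 1 of a two-player game; the function names and the matching-pennies matrix `A` are illustrative assumptions, not part of the slides.

```python
def expected_payoff(U, p, q):
    """u(P) = sum over (j, k) of p[j] * q[k] * U[j][k] for a two-player game."""
    return sum(p[j] * q[k] * U[j][k]
               for j in range(len(p)) for k in range(len(q)))

def improvements_row(U, p, q):
    """Return the lists (x, z, g) for player 1 (the row player)."""
    u = expected_payoff(U, p, q)
    # x_1j: payoff of the pure row j against the opponent's mix q
    x = [sum(q[k] * U[j][k] for k in range(len(q))) for j in range(len(U))]
    z = [xj - u for xj in x]          # difference in payoff
    g = [max(zj, 0.0) for zj in z]    # improvement in payoff
    return x, z, g

# Matching pennies payoffs for player 1 (hypothetical example game).
A = [[1, -1], [-1, 1]]

x, z, g = improvements_row(A, p=[0.5, 0.5], q=[0.5, 0.5])
x_off, z_off, g_off = improvements_row(A, p=[0.5, 0.5], q=[0.75, 0.25])
```

When the opponent mixes uniformly no row improves on the current payoff, so every $g_{1j}$ is zero; against the skewed mix, the first row has a strictly positive improvement.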
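Fixed point Theorems • Intermediate Value Theorem • domain $[a,b]$ • function $f$ continuous • $f(a) f(b) < 0$ • there exists $z$ such that $f(z) = 0$ • Proof: $M^+ = \{x \mid f(x) \ge 0\}$, $M^- = \{x \mid f(x) \le 0\}$ • both sets are closed and non-empty and together cover $[a,b]$, so by connectedness they intersect; any $z$ in the intersection has $f(z) = 0$.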
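The Intermediate Value Theorem also suggests a simple way to locate such a $z$ numerically. Below is a bisection sketch under the same hypotheses (continuous $f$ with a sign change on $[a,b]$); the name `bisect_root` and the tolerance are assumptions made for illustration.

```python
def bisect_root(f, a, b, tol=1e-12):
    """Find z in [a, b] with f(z) close to 0, given f(a) * f(b) < 0."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("need f(a) * f(b) < 0")
    while b - a > tol:
        mid = (a + b) / 2
        fm = f(mid)
        if fm == 0:
            return mid
        if fa * fm < 0:
            # the sign change is in [a, mid]
            b, fb = mid, fm
        else:
            # the sign change is in [mid, b]
            a, fa = mid, fm
    return (a + b) / 2

# Example: the positive root of x^2 - 2 on [0, 2].
root = bisect_root(lambda x: x * x - 2, 0.0, 2.0)
```

Each step halves an interval on which the sign change persists, which mirrors the existence argument but does not replace it.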
Brouwer’s Fixed point theorem • $f: S \to S$ continuous, $S$ compact and convex • There exists $z$ in $S$ with $z = f(z)$ • For $S = [0,1]$ this follows from the previous theorem applied to $f(x) - x$
Kakutani’s Fixed Point Theorem • $L: S \to 2^S$ correspondence • $L(x)$ is a non-empty convex set • $L$ upper semi-continuous • $S$ compact and convex • There exists $z$ with $z \in L(z)$
Nash Equilibrium I • Best response correspondence • $L(P) = \prod_i \arg\max_{q_i} u_i(q_i, P_{-i})$ • $L$ is a correspondence, upper semi-continuous with convex values • Nash is a fixed point of $L$ • $P^* \in L(P^*)$ • Kakutani’s fixed point theorem
Nash Equilibrium II • Fixed point • $K(P)$ has $mn$ coordinates, one per player and action • $K_{ij}(P) = (p_{ij} + g_{ij}(P)) / (1 + \sum_k g_{ik}(P))$ • Nash is a fixed point of $K$ • $P^* = K(P^*)$ • Original proof of Nash • Continuous function on a compact convex space • Brouwer’s fixed point theorem
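To make the map $K$ concrete, here is a minimal sketch of one application of it for a two-player game with payoff matrices `A` and `B`. The function name `nash_map` and the matching-pennies example are assumptions; each coordinate is normalized by one plus the sum of that player's improvements, so that the result is again a distribution.

```python
def nash_map(A, B, p, q):
    """Apply K once: K_ij = (p_ij + g_ij) / (1 + sum_k g_ik)."""
    m, n = len(A), len(A[0])
    u1 = sum(p[j] * q[k] * A[j][k] for j in range(m) for k in range(n))
    u2 = sum(p[j] * q[k] * B[j][k] for j in range(m) for k in range(n))
    # g1[j]: improvement of pure row j for player 1
    g1 = [max(sum(q[k] * A[j][k] for k in range(n)) - u1, 0.0) for j in range(m)]
    # g2[k]: improvement of pure column k for player 2
    g2 = [max(sum(p[j] * B[j][k] for j in range(m)) - u2, 0.0) for k in range(n)]
    new_p = [(p[j] + g1[j]) / (1.0 + sum(g1)) for j in range(m)]
    new_q = [(q[k] + g2[k]) / (1.0 + sum(g2)) for k in range(n)]
    return new_p, new_q

# Matching pennies (hypothetical example): B = -A.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]

fixed_p, fixed_q = nash_map(A, B, [0.5, 0.5], [0.5, 0.5])
moved_p, moved_q = nash_map(A, B, [0.5, 0.5], [0.75, 0.25])
```

The uniform profile is left unchanged, while the non-equilibrium profile shifts player 1's weight toward the improving row. Brouwer's theorem only guarantees that a fixed point exists; repeatedly applying $K$ is not claimed to converge to one.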
Nash Equilibrium III • Nonlinear complementarity problem (NCP) • Recall $z_{ij}(P)$ • For every player $i$ and action $s_{ij}$: • $z_{ij}(P) \cdot p_{ij} = 0$ • $z_i(P)$ is orthogonal to $p_i$ • Nash: $z(P^*) \le 0$ • $z_{ij}(P^*) \le 0$
Nash Equilibrium IV • Stationary point problem • Recall: $x$ = alternative payoff • Nash: $P^*$ • For every $P$ • $(P - P^*) \cdot x(P^*) \le 0$ • $\sum_{i,j} (p_{ij} - p^*_{ij})\, x_{ij}(P^*) \le 0$
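The link between the stationary point condition and the Nash condition can be spelled out in a short derivation. This is a sketch using only the definitions of $x_{ij}$ and $u_i$ from the Notations slides:

```latex
\begin{align*}
\sum_j p_{ij}\, x_{ij}(P^*) &= u_i(p_i, P^*_{-i}), \qquad
\sum_j p^*_{ij}\, x_{ij}(P^*) = u_i(P^*), \\
(P - P^*) \cdot x(P^*) &= \sum_i \bigl[\, u_i(p_i, P^*_{-i}) - u_i(P^*) \,\bigr] \le 0 .
\end{align*}
```

If $P^*$ is a Nash equilibrium every bracket is non-positive, so the sum is as well. Conversely, choosing $P$ equal to $P^*$ except in the coordinates of a single player $i$ leaves only that player's bracket and recovers the Nash inequality for $i$.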
Nash Equilibrium V • Minimizing a function • Objective function: • $V(P) = \sum_i \sum_j [g_{ij}(P)]^2$ • $V(P)$ is a continuous, differentiable, non-negative function • Nash: $V(P^*) = 0$ • Local minima
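Nash Equilibrium VI • Semi-algebraic set • distribution $P$: $\sum_j p_{ij} = 1$ and $p_{ij} \ge 0$ • difference in payoff: • $z_{ij}(P) \le 0$ • $z_{ij}(P) = x_{ij}(P) - u_i(P) \le 0$ • Explicitly: $\sum_{s_{-i}} u_i(s_{ij}, s_{-i}) \prod_{k \ne i} p_k(s_k) - \sum_{s} u_i(s) \prod_{k} p_k(s_k) \le 0$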
Two player games • Payoff matrices $(A, B)$ • $m$ rows and $n$ columns • player 1 has $m$ actions, player 2 has $n$ actions • strategies $p$ and $q$ • Payoffs: $u_1(p,q) = p A q^t$ and $u_2(p,q) = p B q^t$ • Zero sum game • $A = -B$
Linear Programming • Primal LP: • $x \in SET_{primal}$ is feasible • maximize $\langle c, x \rangle$ subject to $x \in SET_{primal}$
Linear Programming • Dual LP: • $y \in SET_{dual}$ is feasible • minimize $\langle b, y \rangle$ subject to $y \in SET_{dual}$
Duality Theorem • Weak duality: $\langle c, x \rangle \le \langle b, y \rangle$ • for any feasible $x$ and $y$ • proof! • Strong duality • If both programs have feasible solutions then • $\langle c, x \rangle = \langle b, y \rangle$ for some feasible $x$ and $y$ • sketch of proof.
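The weak duality proof is a one-line chain of inequalities once the feasible sets are written out. The slides do not show the sets explicitly, so the standard form below (constraint matrix $A$, primal constraints $Ax \le b$, $x \ge 0$, dual constraints $A^t y \ge c$, $y \ge 0$) is an assumption made for this sketch:

```latex
\begin{align*}
\text{Primal: } & \max \langle c, x \rangle \ \text{ s.t. } A x \le b,\ x \ge 0, \\
\text{Dual: }   & \min \langle b, y \rangle \ \text{ s.t. } A^{t} y \ge c,\ y \ge 0, \\
\langle c, x \rangle &\le \langle A^{t} y, x \rangle = \langle y, A x \rangle \le \langle y, b \rangle = \langle b, y \rangle .
\end{align*}
```

The first inequality uses $x \ge 0$ together with $A^t y \ge c$; the second uses $y \ge 0$ together with $Ax \le b$.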
Two players zero sum • Fix strategy $q$ of player 2, • player 1 best response: • maximize $p (A q^t)$ such that $\sum_j p_j = 1$ and $p_j \ge 0$ • dual LP: minimize $u$ such that $u \ge A q^t$ (in every coordinate) • Player 2: select strategy $q$: • minimize $u$ such that $u \ge A q^t$ and $\sum_i q_i = 1$ and $q_i \ge 0$ • dual (strategy for player 1) • maximize $v$ such that $v \le p A$, $\sum_j p_j = 1$ and $p_j \ge 0$ • There exists a unique value $v$.
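The two programs above bracket the value: any strategy $p$ for player 1 guarantees at least the smallest coordinate of $pA$, and any strategy $q$ for player 2 concedes at most the largest coordinate of $Aq^t$. The sketch below computes both bounds without an LP solver; `value_bounds` and the example matrix are illustrative assumptions.

```python
def value_bounds(A, p, q):
    """Return (lower, upper) with lower <= v <= upper for the zero-sum game A."""
    m, n = len(A), len(A[0])
    # what p guarantees player 1 against every column
    lower = min(sum(p[j] * A[j][k] for j in range(m)) for k in range(n))
    # the most player 1 can get against q with any row
    upper = max(sum(q[k] * A[j][k] for k in range(n)) for j in range(m))
    return lower, upper

# Matching pennies (hypothetical example).
A = [[1, -1], [-1, 1]]

tight = value_bounds(A, p=[0.5, 0.5], q=[0.5, 0.5])
loose = value_bounds(A, p=[1, 0], q=[0.5, 0.5])
```

When the two bounds coincide, as they do for the uniform pair, the pair $(p, q)$ is optimal and the common number is the value $v$.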
Summary • Two players zero sum • linear programming • polynomial time • can have multiple Nash equilibria • unique value! • If $(p,q)$ and $(p',q')$ are Nash then • $(p,q')$ and $(p',q)$ are Nash
Online learning • Playing with unknown payoff matrix • Online algorithm: • at each step selects an action • can be stochastic or fractional • Observes all possible payoffs • Updates its parameters • Goal: Achieve the value of the game • Payoff matrix of the “game” is defined at the end
Online learning - Algorithm • Notations: • Opponent distribution $Q_t$ • Our distribution $P_t$ • Observed cost $M(i, Q_t)$ • i.e. the $i$-th entry of $M Q_t$ • Goal: minimize cost • Algorithm: Exponential weights • Action $i$ has weight proportional to $b^{L(i,t)}$ • $L(i,t)$ = loss of action $i$ until time $t$
Online algorithm: Notations • Formally: • parameter: $b$, $0 < b < 1$ • $w_{t+1}(i) = w_t(i)\, b^{M(i, Q_t)}$ • $Z_t = \sum_i w_t(i)$ • $P_{t+1}(i) = w_{t+1}(i) / Z_{t+1}$ • Number of total steps $T$ is known
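The update rule can be sketched in a few lines. This version assumes all initial weights equal 1 (so the first distribution is uniform) and normalizes each distribution by the sum of the current weights; the function name, the loss matrix, and the fixed opponent are assumptions chosen for illustration.

```python
def exponential_weights(M, opponent_plays, b):
    """Run the multiplicative update; return (list of P_t, total expected loss)."""
    n = len(M)                 # number of our actions (rows of M)
    w = [1.0] * n              # assumed start: equal weights, uniform P_1
    history, total_loss = [], 0.0
    for Q in opponent_plays:
        Z = sum(w)
        P = [wi / Z for wi in w]
        history.append(P)
        # M(i, Q_t): expected loss of action i against Q_t
        losses = [sum(M[i][k] * Q[k] for k in range(len(Q))) for i in range(n)]
        total_loss += sum(P[i] * losses[i] for i in range(n))
        # w_{t+1}(i) = w_t(i) * b ** M(i, Q_t)
        w = [w[i] * b ** losses[i] for i in range(n)]
    return history, total_loss

# Hypothetical run: the opponent always plays column 0, so action 1 always loses.
M = [[0, 1], [1, 0]]
history, total_loss = exponential_weights(M, [[1, 0]] * 3, b=0.5)
```

With $b = 0.5$ the weight of the losing action halves each round, so the distribution moves from uniform toward the action with zero loss.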
Online algorithm: Theorem • Theorem • For any matrix $M$ with entries in $[0,1]$ • Any sequence of distributions $Q_1, \dots, Q_T$ • The algorithm generates $P_1, \dots, P_T$ • $RE(A \| B) = E_{x \sim A}[\ln(A(x) / B(x))]$
Online algorithm: Analysis • Lemma • For any mixed strategy P • Corollary
Online Algorithm: Optimization • $b = 1 / (1 + \sqrt{2 (\ln n) / T})$ • Average loss: $v + O(\sqrt{(\ln n) / T})$
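The effect of this choice of $b$ can be observed in a small experiment. The sketch below runs exponential weights against an opponent that best-responds to the current distribution in every round; the loss matrix (value $1/2$), the horizon `T`, and the function names are assumptions, and the tie-breaking rule of the opponent is arbitrary.

```python
import math

def tuned_b(n, T):
    """b = 1 / (1 + sqrt(2 ln(n) / T)) for n actions and T rounds."""
    return 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / T))

def average_loss_vs_best_response(M, T):
    """Average loss of exponential weights against a best-responding opponent."""
    n, cols = len(M), len(M[0])
    b = tuned_b(n, T)
    w = [1.0] * n
    total = 0.0
    for _ in range(T):
        Z = sum(w)
        P = [wi / Z for wi in w]
        # expected loss of P against each pure column
        col_loss = [sum(P[i] * M[i][k] for i in range(n)) for k in range(cols)]
        k_star = max(range(cols), key=lambda k: col_loss[k])  # worst column for us
        total += col_loss[k_star]
        w = [w[i] * b ** M[i][k_star] for i in range(n)]
    return total / T

# Hypothetical loss matrix with entries in [0, 1]; the value of this game is 1/2.
M = [[1, 0], [0, 1]]
T = 1000
avg = average_loss_vs_best_response(M, T)
```

Against a best-responding opponent the per-round loss can never drop below the value, so the average sits slightly above $1/2$ and the gap shrinks as $T$ grows.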
Two players General sum games • Input matrices $(A, B)$ • No unique value • Computational issues: find some, all Nash • player 1 best response: • Like for zero sum: • Fix strategy $q$ of player 2 • maximize $p (A q^t)$ such that $\sum_j p_j = 1$ and $p_j \ge 0$ • dual LP: minimize $u$ such that $u \ge A q^t$
Two players General sum games • Assume the supports of the strategies are known • $p$ has support $S_p$ and $q$ has support $S_q$ • Can formulate the Nash as LP:
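Once the supports are fixed, the equilibrium conditions become linear: every action in a player's support must earn the same payoff. As a minimal sketch of this idea, the code below handles only the special case of a 2x2 game in which both supports are assumed to contain both actions, where the conditions reduce to two indifference equations. The function name and the example game are assumptions; a general support pair would need a linear feasibility solver.

```python
from fractions import Fraction

def full_support_equilibrium_2x2(A, B):
    """Mixed equilibrium of a 2x2 game assuming both supports are {0, 1}.

    Returns (p, q), or None if the indifference equations have no solution
    strictly inside the simplex.
    """
    # q must make player 1 indifferent between the two rows of A
    den_q = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    # p must make player 2 indifferent between the two columns of B
    den_p = B[0][0] - B[0][1] - B[1][0] + B[1][1]
    if den_q == 0 or den_p == 0:
        return None
    q0 = Fraction(A[1][1] - A[0][1], den_q)
    p0 = Fraction(B[1][1] - B[1][0], den_p)
    if not (0 < q0 < 1 and 0 < p0 < 1):
        return None
    return [p0, 1 - p0], [q0, 1 - q0]

# Hypothetical coordination game with asymmetric preferences.
A = [[2, 0], [0, 1]]
B = [[1, 0], [0, 2]]
p, q = full_support_equilibrium_2x2(A, B)

# A game with a dominant action has no full-support equilibrium.
no_mixed = full_support_equilibrium_2x2([[3, 0], [5, 1]], [[3, 5], [0, 1]])
```

Exact rational arithmetic is used so the indifference equations hold without rounding error.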