
Issues on the border of economics and computation



Presentation Transcript


  1. Issues on the border of economics and computation. Speaker: Dr. Michael Schapira. Topic: Dynamics in Games (Part III). (Some slides from Prof. Yishay Mansour’s course at TAU.)

  2. Two Things • Ex1 to be published by Thu • submission deadline: 6.12.12, midnight • can submit in pairs • submit through Dr. Blumrosen’s mailbox • Debt from last class.

  3. Reminder: Zero-Sum Games • A zero-sum game is a 2-player strategic game such that for each s ∈ S, we have u1(s) + u2(s) = 0. • What is good for me is bad for my opponent, and vice versa. • Example payoff matrix (row player chooses a row, column player a column):

                 Left      Right
        Left    (-1,1)    (1,-1)
        Right   (1,-1)    (-1,1)

  4. Reminder: Minimax-Optimal Strategies • A (mixed) strategy s1* is minimax optimal for player 1 if min over s2 ∈ S2 of u1(s1*, s2) ≥ min over s2 ∈ S2 of u1(s1, s2), for all s1 ∈ S1. • Similarly for player 2. • Can be found via linear programming (see the sketch below).
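A minimal sketch of the linear program, assuming numpy and scipy are available; the helper name minimax_strategy and the test matrix are illustrative, not from the slides. The idea: maximize a value v subject to the row player's mixed strategy guaranteeing expected gain at least v against every column.

```python
import numpy as np
from scipy.optimize import linprog

def minimax_strategy(A):
    """A[i][j] = payoff to the row player when row i meets column j.
    Returns (x, v): a minimax-optimal mixed strategy x and the game value v."""
    m, n = A.shape
    # Variables: x_1..x_m (row mixed strategy) and v (guaranteed value).
    # Maximize v  <=>  minimize -v.
    c = np.concatenate([np.zeros(m), [-1.0]])
    # For every column j:  v - sum_i x_i * A[i][j] <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # x must be a probability distribution: sum_i x_i = 1.
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(None, None)]  # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[m]

# The matching-pennies-style game from slide 3:
A = np.array([[-1.0, 1.0], [1.0, -1.0]])
x, v = minimax_strategy(A)
print(x, v)  # x ~ (1/2, 1/2), v ~ 0
```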

  5. Reminder: Minimax Theorem • Every 2-player zero-sum game has a unique value V. • A minimax optimal strategy for R guarantees that R’s expected gain is at least V. • A minimax optimal strategy for C guarantees that R’s expected gain is at most V.

  6. Algorithmic Implications • The minimax theorem is a useful tool in the analysis of randomized algorithms • Let’s see why.

  7. Find Bill • There are n boxes; exactly one contains a dollar bill and the rest are empty. • A probe is defined as opening a box to see if it contains the dollar bill. • The objective is to locate the box containing the bill while minimizing the number of probes performed. • How well can a deterministic algorithm do? • Can we do better via a randomized algorithm? • i.e., an algorithm that is a probability distribution over deterministic algorithms

  8. Randomized Find Alg • Randomized Find: select x ∈ {H,T} uniformly at random. • If x = H, probe boxes in order from 1 through n and stop when the bill is found. • Otherwise, probe boxes in order from n down to 1 and stop when the bill is found. • The expected number of probes made by the algorithm is (n+1)/2: if the dollar bill is in the i-th box, then i probes are made with probability ½ and (n − i + 1) probes are made with probability ½, so the expectation is (i + n − i + 1)/2 = (n+1)/2. A sketch follows below.
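A minimal sketch of the algorithm just described; the helper name randomized_find and the boolean-list input encoding are assumptions for illustration.

```python
import random

def randomized_find(boxes):
    """boxes[i] is True iff box i holds the bill (exactly one True).
    Returns the number of probes made."""
    order = range(len(boxes))
    if random.choice("HT") == "T":   # flip a fair coin
        order = reversed(order)      # T: probe boxes n down to 1
    for probes, i in enumerate(order, start=1):
        if boxes[i]:
            return probes

# If the bill is in box i, this makes i probes with prob. 1/2 and
# n - i + 1 probes with prob. 1/2, so the expectation is (n+1)/2.
n = 10
boxes = [False] * n
boxes[3] = True
print(randomized_find(boxes))
```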

  9. Randomized Find is Optimal • Lemma: A lower bound on the expected number of probes required by any randomized algorithm to solve the Find-Bill problem is (n + 1)/2. • Proof via the minimax theorem!

  10. The Algorithm Game • A matrix game: rows are inputs Input1, Input2, …, Inputm; columns are algorithms ALG1, ALG2, …, ALGn; the entry for (I, ALG) is T(ALG, I). • The row player aims to choose malicious inputs; the column player aims to choose efficient algorithms. • The payoff for (I, ALG) is the running time of ALG on I.

  11. The Algorithm Game • Pure strategies: • a specific input for the row player • a deterministic algorithm for the column player • Mixed strategies: • a distribution over inputs for the row player • a randomized algorithm for the column player

  12. The Algorithm Game • If I’m the column player, what strategy (i.e., randomized algorithm) do I want to choose?

  13. The Algorithm Game • What does the minimax theorem mean here?

  14. Yao’s Principle • Let T(I, Alg) denote the time required for deterministic algorithm Alg to run on input I. Then: max over distributions p on inputs of min over Alg of E[T(Ip, Alg)] = min over distributions q on algs of max over I of E[T(I, Algq)]. • So, for any two probability distributions p and q: min over deterministic Alg of E[T(Ip, Alg)] ≤ max over I of E[T(I, Algq)].

  15. Using Yao’s Principle • A useful technique for proving lower bounds on the running times of randomized algorithms • Step I: design a probability distribution p over inputs for which every deterministic algorithm’s expected running time (over the random input Ip) is at least a • Step II: deduce that every randomized algorithm’s (expected) running time is at least a

  16. Back to Find-Bill • Lemma: A lower bound on the expected number of probes required by any randomized algorithm to solve the Find-Bill problem is (n + 1)/2. • Proof: • Consider the input distribution in which the bill is located in each of the n boxes uniformly at random. • Consider only deterministic algorithms that do not probe the same box twice (repeating a probe only wastes work). • By symmetry we can assume that the probe order for a deterministic algorithm ALG is 1 through n. • The expected number of probes for ALG is Σ over i=1..n of i/n = (n+1)/2. • Yao’s principle implies the lower bound. (A small numeric check appears below.)
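A purely illustrative check of the arithmetic in the last step of the proof:

```python
# Expected probes of the order-1..n deterministic algorithm when the bill
# is uniform over the n boxes: sum_{i=1}^{n} i/n, which equals (n+1)/2.
n = 100
expected = sum(range(1, n + 1)) / n
assert expected == (n + 1) / 2
print(expected)  # 50.5
```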

  17. No Regret Algs: So far… • In some games (e.g., potential games), best-/better-response dynamics are guaranteed to converge to a PNE. • In 2-player zero-sum games no-regret dynamics converge to a NE. • What about general games?

  18. Chicken Game • Payoff matrix:

                 Stop      Go
        Stop    (0,0)     (-3,1)
        Go      (1,-3)    (-4,-4)

  • What are the pure NEs? What are the (mixed) NEs? • The slide’s ½, ½, ¼, ¼ annotations mark the mixed NE: each player plays each action with probability ½, which puts probability ¼ on every profile (verified in the sketch below).
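A small illustrative check of both answers, encoding the payoff table above; the helper name pure_nash is an assumption.

```python
# Chicken game: U[a][b] = (row payoff, column payoff), action 0 = Stop, 1 = Go.
U = [[(0, 0), (-3, 1)],
     [(1, -3), (-4, -4)]]

def pure_nash():
    """Profiles where neither player gains by a unilateral deviation."""
    eq = []
    for a in range(2):
        for b in range(2):
            row_ok = all(U[a][b][0] >= U[a2][b][0] for a2 in range(2))
            col_ok = all(U[a][b][1] >= U[a][b2][1] for b2 in range(2))
            if row_ok and col_ok:
                eq.append((a, b))
    return eq

print(pure_nash())  # [(0, 1), (1, 0)]: (Stop, Go) and (Go, Stop)

# Mixed NE: if the column player stops with probability q, the row player is
# indifferent when 0*q + (-3)*(1-q) == 1*q + (-4)*(1-q), i.e., q = 1/2.
q = 0.5
assert 0*q + (-3)*(1-q) == 1*q + (-4)*(1-q)
```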

  19. Correlated Equilibrium: Illustration • Consider the distribution P over the Chicken game’s pure strategy profiles with probability ½ on (Stop, Go), ½ on (Go, Stop), and 0 elsewhere. • Suppose that there is a trusted random device that samples a pure strategy profile from the distribution P… • …and tells each player his component of the strategy profile. • If all players other than i are following the strategy suggested by the random device, then i does not have any incentive to deviate.

  20. Correlated Equilibrium: Illustration • Now consider the distribution P with probability ⅓ on each of (Stop, Stop), (Stop, Go), and (Go, Stop), and 0 on (Go, Go). • Again, a trusted random device samples a pure strategy profile from P… • …and tells each player his component of the strategy profile. • If all players other than i are following the strategy suggested by the random device, then i does not have any incentive to deviate.

  21. Correlated Equilibrium • Consider a game: • Si is the set of (pure) strategies for player i • S = S1 × S2 × … × Sn • s = (s1, s2, …, sn) ∈ S is a vector of strategies • ui: S → R is the payoff function for player i. • Notation: given a strategy vector s, let s-i = (s1, …, si-1, si+1, …, sn) • the vector s with the i-th element omitted

  22. Correlated Equilibrium • A correlated equilibrium is a probability distribution p over (pure) strategy profiles in S such that for any i, si, si′: Σ over s-i of p(si, s-i) · ui(si, s-i) ≥ Σ over s-i of p(si, s-i) · ui(si′, s-i)

  23. Facts About Correlated Equilibrium • A CE always exists • why? • The set of CEs is convex • what about NE? • The CE conditions are a set of linear inequalities in p • so a CE can be computed efficiently (e.g., via linear programming; see the sketch below)
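Since the CE conditions above are linear in p, a CE can be found with an LP solver. A minimal sketch for two-player games, assuming numpy and scipy; the helper name correlated_equilibrium and the zero objective (pure feasibility) are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

def correlated_equilibrium(U1, U2):
    """U1[a][b], U2[a][b]: payoffs of players 1 and 2. Returns p[a][b]."""
    m, n = U1.shape
    rows = []  # each row encodes one incentive constraint as <row, p> <= 0
    # Player 1: for all a, a':  sum_b p[a,b] * (U1[a',b] - U1[a,b]) <= 0
    for a in range(m):
        for a2 in range(m):
            if a == a2: continue
            c = np.zeros((m, n)); c[a, :] = U1[a2, :] - U1[a, :]
            rows.append(c.ravel())
    # Player 2: for all b, b':  sum_a p[a,b] * (U2[a,b'] - U2[a,b]) <= 0
    for b in range(n):
        for b2 in range(n):
            if b == b2: continue
            c = np.zeros((m, n)); c[:, b] = U2[:, b2] - U2[:, b]
            rows.append(c.ravel())
    res = linprog(np.zeros(m * n),
                  A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=np.ones((1, m * n)), b_eq=[1.0],
                  bounds=[(0, None)] * (m * n))
    return res.x.reshape(m, n)

# Chicken game from slide 18:
U1 = np.array([[0, -3], [1, -4]]); U2 = np.array([[0, 1], [-3, -4]])
print(correlated_equilibrium(U1, U2))
```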

  24. Moreover… • When every player uses a no-regret algorithm to select strategies, the dynamics converge to a CE • in any game! • But this requires a stronger definition of no-regret…

  25. Types of No-Regret Algs • No external regret: do (nearly) as well as the best single strategy in hindsight • what we’ve been talking about so far • “I should have always taken the same route to work…” • No internal regret: the algorithm could not gain (in hindsight) by consistently substituting a single strategy with another • each time strategy si was chosen, substitute si′ • “each time I bought a Microsoft stock I should have bought the Google stock” • No internal regret implies no external regret • why?

  26. Reminder: Minimizing Regret • At each round t = 1, 2, …, T: • There are n actions (experts) 1, 2, …, n • The algorithm selects an action in {1, …, n} • and then observes the gain gi,t ∈ [0,1] of each action i ∈ {1, …, n} • Let gi = Σt gi,t and gmax = maxi gi. • No external regret: do (at least) “nearly as well” as gmax in hindsight. (A standard such algorithm is sketched below.)
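The slides don't name a specific algorithm; below is a minimal sketch of one standard no-external-regret algorithm (multiplicative weights / Hedge, here phrased for gains). The function name hedge, the input format, and the eta choice are illustrative assumptions.

```python
import math

def hedge(gain_rounds, n, eta):
    """gain_rounds yields, per round, a gain vector (g_1..g_n) in [0,1]^n.
    Returns the algorithm's total expected gain."""
    w = [1.0] * n
    total = 0.0
    for g in gain_rounds:
        W = sum(w)
        p = [wi / W for wi in w]           # play action i with probability p_i
        total += sum(pi * gi for pi, gi in zip(p, g))
        # Exponentially reward actions in proportion to their gains.
        w = [wi * math.exp(eta * gi) for wi, gi in zip(w, g)]
    return total

# With eta ~ sqrt(ln(n) / T), Hedge's expected gain is at least
# gmax - O(sqrt(T log n)), i.e., the per-round external regret vanishes.
```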

  27. Internal Regret • Assume that the alg outputs the action sequence A = a1 … aT • The action sequence A(b → d): • change every at = b to at = d in A • g(b→d) is the gain of A(b → d) (for the same gains gi,t) • Internal regret: max over {b,d} of [g(b→d) − galg] = max over {b,d} of Σt (gd,t − gb,t) pb,t, where pb,t is the probability that the algorithm plays b at time t • An algorithm has no internal regret if its internal regret goes to 0 as T goes to infinity. (A sketch of measuring this quantity follows below.)
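A minimal sketch of computing the internal regret of a played deterministic action sequence, directly from the definition above (so pb,t ∈ {0,1}); the helper name internal_regret and the input format are assumptions.

```python
def internal_regret(actions, gains, n):
    """actions[t] in 0..n-1 is the action played at round t;
    gains[t][i] is the gain of action i at round t.
    Returns max over pairs (b, d) of g(b->d) - g_alg."""
    alg_gain = sum(gains[t][a] for t, a in enumerate(actions))
    best = 0.0
    for b in range(n):
        for d in range(n):
            if b == d:
                continue
            # Replay the sequence with every play of b swapped to d.
            swapped = sum(gains[t][d if a == b else a]
                          for t, a in enumerate(actions))
            best = max(best, swapped - alg_gain)
    return best
```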

  28. Internal Regret and Dominated Strategies • Suppose that a player uses a no-internal-regret algorithm to select strategies • in a repeated game against others • What guarantees does the player have? • beyond the no-regret guarantee

  29. Dominated Strategies • Strategy si is dominated by a (mixed) strategy si′ if for every s-i we have that ui(si, s-i) < ui(si′, s-i) • Clearly, we would like to avoid choosing dominated strategies

  30. Internal Regret and Dominated Strategies • Suppose si is dominated by si′ • every time we played si we would have done better with si′ • this is exactly the internal regret from swapping the pair of strategies (si → si′) • No internal regret ⇒ (in the long run) no dominated strategies

  31. Does a No-Internal-Regret Alg Exist? • Yes! • In fact, there exist algorithms with a stronger guarantee: no swap regret. • No swap regret: the alg cannot benefit in hindsight by changing action i to F(i), for any function F: {1,…,n} → {1,…,n}. • We show a generic reduction that turns no-external-regret algorithms into a no-swap-regret algorithm.

  32. External to Swap Regret • Our algorithm uses n no-external-regret algorithms Alg1, Alg2, …, Algn to achieve no swap regret: • intuitively, each algorithm Algi represents a strategy in {1,…,n} • for algorithm Algi, and for any sequence of gain vectors: gAlgi ≥ gmax − Ri

  33. External to Swap Regret • At time t: • each Algi outputs a distribution qi • the qi’s induce a matrix Q (row i is qi) • our algorithm uses Q to decide on a distribution p over the strategies {1,…,n} • the adversary decides on a gains vector g = (g1, …, gn) • our algorithm returns to each Algi some gains vector

  34. Combining the No-External-Regret Algs • Approach I: • select an expert Algi with probability ri • let the “selected” expert decide the outcome • strategy distribution p = Qr • Approach II: • directly decide on p • Our approach: make p = r • find a p such that p = Qp

  35. Distributing Gain • The adversary selects gains g = (g1, …, gn) • Return to Algi the gain vector pi·g • Note: Σi pi·g = g

  36. External to Swap Regret • At time t: • each Algi outputs a distribution qi • the qi’s induce a matrix Q • output a distribution p such that p = Qp, i.e., pj = Σi pi qi,j • observe gains g = (g1, …, gn) • return to Algi the gain vector pi·g • (A sketch of the p = Qp step follows below.)
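A minimal sketch of the p = Qp step, assuming numpy: since Q is row-stochastic, p is a stationary distribution of the Markov chain Q, found here by solving a linear system. The helper name stationary and the example matrix are illustrative.

```python
import numpy as np

def stationary(Q):
    """Find p with p_j = sum_i p_i * Q[i, j] and sum(p) = 1,
    i.e., the fixed point p = Qp used by the reduction."""
    n = Q.shape[0]
    # Stack (Q^T - I) p = 0 with the normalization sum(p) = 1.
    A = np.vstack([Q.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

# One round of the reduction, given the rows q_i output by the Alg_i's:
Q = np.array([[0.7, 0.3],
              [0.4, 0.6]])
p = stationary(Q)                  # play action j with probability p_j
g = np.array([1.0, 0.0])           # adversary's gain vector
feedback = [p[i] * g for i in range(len(p))]  # hand Alg_i the vector p_i * g
print(p)                           # ~ [0.571, 0.429]
```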

  37. External to Swap Regret • Gain of Algi (from its own view) at round t: ⟨qi,t, pi,t·gt⟩ = pi,t⟨qi,t, gt⟩ • No-external-regret guarantee (for any fixed action j): gAlgi = Σt pi,t⟨qi,t, gt⟩ ≥ Σt pi,t gj,t − Ri • For any swap function F: gAlg = Σt ⟨pt, gt⟩ = Σt ⟨ptQt, gt⟩ = Σt Σi pi,t⟨qi,t, gt⟩ = Σi gAlgi ≥ Σi Σt pi,t gF(i),t − Σi Ri = gAlg,F − Σi Ri

  38. Swap Regret • Corollary: the reduction guarantees swap regret at most Σi Ri; with standard no-external-regret algorithms (Ri = O(√(T log n))) this gives swap regret O(n√(T log n)). • Can be improved to O(√(nT log n)).

  39. Summary • The Minimax Theorem is a useful tool for analyzing randomized algorithms • Yao’s Principle • There exist no-swap-regret algorithms • Next time: When all players use no-swap-regret algorithms to select strategies the dynamics converge to a CE
