
Issues on the border of economics and computation



Presentation Transcript


  1. Issues on the border of economics and computation. Speaker: Dr. Michael Schapira. Topic: Dynamics in Games (Part III). (Some slides from Prof. Yishay Mansour’s course at TAU.)

  2. Two Things • Ex1 to be published by Thu • submission deadline: 6.12.12, midnight • can submit in pairs • submit through Dr. Blumrosen’s mailbox • Debt from last class.

  3. Reminder: Zero-Sum Games • A zero-sum game is a 2-player strategic game such that for each s ∈ S, we have u1(s) + u2(s) = 0. • What is good for me is bad for my opponent, and vice versa. • Example payoff matrix (row player chooses a row, column player a column):

                 Left      Right
        Left    (-1,1)    (1,-1)
        Right   (1,-1)    (-1,1)

  4. Reminder: Minimax-Optimal Strategies • A (mixed) strategy s1* is minimax optimal for player 1 if min over s2 ∈ S2 of u1(s1*, s2) ≥ min over s2 ∈ S2 of u1(s1, s2), for all s1 ∈ S1. • Similarly for player 2. • Can be found via linear programming (see the sketch below).
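A minimal sketch of the linear program, assuming numpy and scipy are available; the helper name minimax_strategy and the test matrix are illustrative, not from the slides. The idea: maximize a value v subject to the row player's mixed strategy guaranteeing expected gain at least v against every column.

```python
import numpy as np
from scipy.optimize import linprog

def minimax_strategy(A):
    """A[i][j] = payoff to the row player when row i meets column j.
    Returns (x, v): a minimax-optimal mixed strategy x and the game value v."""
    m, n = A.shape
    # Variables: x_1..x_m (row mixed strategy) and v (guaranteed value).
    # Maximize v  <=>  minimize -v.
    c = np.concatenate([np.zeros(m), [-1.0]])
    # For every column j:  v - sum_i x_i * A[i][j] <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # x must be a probability distribution: sum_i x_i = 1.
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(None, None)]  # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[m]

# The matching-pennies-style game from slide 3:
A = np.array([[-1.0, 1.0], [1.0, -1.0]])
x, v = minimax_strategy(A)
print(x, v)  # x ~ (1/2, 1/2), v ~ 0
```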

  5. Reminder: Minimax Theorem • Every 2-player zero-sum game has a unique value V. • A minimax optimal strategy for R guarantees that R’s expected gain is at least V. • A minimax optimal strategy for C guarantees that R’s expected gain is at most V.

  6. Algorithmic Implications • The minimax theorem is a useful tool in the analysis of randomized algorithms • Let’s see why.

  7. Find Bill • There are n boxes; exactly one contains a dollar bill and the rest are empty. • A probe is defined as opening a box to see if it contains the dollar bill. • The objective is to locate the box containing the bill while minimizing the number of probes performed. • How well can a deterministic algorithm do? • Can we do better via a randomized algorithm? • i.e., an algorithm that is a probability distribution over deterministic algorithms

  8. Randomized Find Alg • Randomized Find: select x ∈ {H,T} uniformly at random. • If x = H, probe boxes in order from 1 through n and stop when the bill is found. • Otherwise, probe boxes in order from n down to 1 and stop when the bill is found. • The expected number of probes made by the algorithm is (n+1)/2: if the dollar bill is in the i-th box, then i probes are made with probability ½ and (n − i + 1) probes are made with probability ½, so the expectation is (i + n − i + 1)/2 = (n+1)/2. A sketch follows below.
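A minimal sketch of the algorithm just described; the helper name randomized_find and the boolean-list input encoding are assumptions for illustration.

```python
import random

def randomized_find(boxes):
    """boxes[i] is True iff box i holds the bill (exactly one True).
    Returns the number of probes made."""
    order = range(len(boxes))
    if random.choice("HT") == "T":   # flip a fair coin
        order = reversed(order)      # T: probe boxes n down to 1
    for probes, i in enumerate(order, start=1):
        if boxes[i]:
            return probes

# If the bill is in box i, this makes i probes with prob. 1/2 and
# n - i + 1 probes with prob. 1/2, so the expectation is (n+1)/2.
n = 10
boxes = [False] * n
boxes[3] = True
print(randomized_find(boxes))
```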

  9. Randomized Find is Optimal • Lemma: A lower bound on the expected number of probes required by any randomized algorithm to solve the Find-Bill problem is (n + 1)/2. • Proof via the minimax theorem!

  10. The Algorithm Game • A matrix game: rows are inputs Input1, Input2, …, Inputm; columns are algorithms ALG1, ALG2, …, ALGn; the entry for (I, ALG) is T(ALG, I). • The row player aims to choose malicious inputs; the column player aims to choose efficient algorithms. • The payoff for (I, ALG) is the running time of ALG on I.

  11. The Algorithm Game • Pure strategies: • a specific input for the row player • a deterministic algorithm for the column player • Mixed strategies: • a distribution over inputs for the row player • a randomized algorithm for the column player

  12. The Algorithm Game • If I’m the column player, what strategy (i.e., randomized algorithm) do I want to choose?

  13. The Algorithm Game • What does the minimax theorem mean here?

  14. Yao’s Principle • Let T(I, Alg) denote the time required for deterministic algorithm Alg to run on input I. Then: max over distributions p on inputs of min over Alg of E[T(Ip, Alg)] = min over distributions q on algs of max over I of E[T(I, Algq)]. • So, for any two probability distributions p and q: min over deterministic Alg of E[T(Ip, Alg)] ≤ max over I of E[T(I, Algq)].

  15. Using Yao’s Principle • A useful technique for proving lower bounds on the running times of randomized algorithms • Step I: design a probability distribution p over inputs for which every deterministic algorithm’s expected running time (over the random input Ip) is at least a • Step II: deduce that every randomized algorithm’s (expected) running time is at least a

  16. Back to Find-Bill • Lemma: A lower bound on the expected number of probes required by any randomized algorithm to solve the Find-Bill problem is (n + 1)/2. • Proof: • Consider the input distribution in which the bill is located in each of the n boxes uniformly at random. • Consider only deterministic algorithms that do not probe the same box twice (repeating a probe only wastes work). • By symmetry we can assume that the probe order for a deterministic algorithm ALG is 1 through n. • The expected number of probes for ALG is Σ over i=1..n of i/n = (n+1)/2. • Yao’s principle implies the lower bound. (A small numeric check appears below.)
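A purely illustrative check of the arithmetic in the last step of the proof:

```python
# Expected probes of the order-1..n deterministic algorithm when the bill
# is uniform over the n boxes: sum_{i=1}^{n} i/n, which equals (n+1)/2.
n = 100
expected = sum(range(1, n + 1)) / n
assert expected == (n + 1) / 2
print(expected)  # 50.5
```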

  17. No Regret Algs: So far… • In some games (e.g., potential games), best-/better-response dynamics are guaranteed to converge to a PNE. • In 2-player zero-sum games no-regret dynamics converge to a NE. • What about general games?

  18. Chicken Game • Payoff matrix:

                 Stop      Go
        Stop    (0,0)     (-3,1)
        Go      (1,-3)    (-4,-4)

  • What are the pure NEs? What are the (mixed) NEs? • The slide’s ½, ½, ¼, ¼ annotations mark the mixed NE: each player plays each action with probability ½, which puts probability ¼ on every profile (verified in the sketch below).
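A small illustrative check of both answers, encoding the payoff table above; the helper name pure_nash is an assumption.

```python
# Chicken game: U[a][b] = (row payoff, column payoff), action 0 = Stop, 1 = Go.
U = [[(0, 0), (-3, 1)],
     [(1, -3), (-4, -4)]]

def pure_nash():
    """Profiles where neither player gains by a unilateral deviation."""
    eq = []
    for a in range(2):
        for b in range(2):
            row_ok = all(U[a][b][0] >= U[a2][b][0] for a2 in range(2))
            col_ok = all(U[a][b][1] >= U[a][b2][1] for b2 in range(2))
            if row_ok and col_ok:
                eq.append((a, b))
    return eq

print(pure_nash())  # [(0, 1), (1, 0)]: (Stop, Go) and (Go, Stop)

# Mixed NE: if the column player stops with probability q, the row player is
# indifferent when 0*q + (-3)*(1-q) == 1*q + (-4)*(1-q), i.e., q = 1/2.
q = 0.5
assert 0*q + (-3)*(1-q) == 1*q + (-4)*(1-q)
```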

  19. Correlated Equilibrium: Illustration • Consider the distribution P over the Chicken game’s pure strategy profiles with probability ½ on (Stop, Go), ½ on (Go, Stop), and 0 elsewhere. • Suppose that there is a trusted random device that samples a pure strategy profile from the distribution P… • …and tells each player his component of the strategy profile. • If all players other than i are following the strategy suggested by the random device, then i does not have any incentive to deviate.

  20. Correlated Equilibrium: Illustration • Now consider the distribution P with probability ⅓ on each of (Stop, Stop), (Stop, Go), and (Go, Stop), and 0 on (Go, Go). • Again, a trusted random device samples a pure strategy profile from P… • …and tells each player his component of the strategy profile. • If all players other than i are following the strategy suggested by the random device, then i does not have any incentive to deviate.

  21. Correlated Equilibrium • Consider a game: • Si is the set of (pure) strategies for player i • S = S1 × S2 × … × Sn • s = (s1, s2, …, sn) ∈ S is a vector of strategies • ui: S → R is the payoff function for player i. • Notation: given a strategy vector s, let s-i = (s1, …, si-1, si+1, …, sn) • the vector s with the i-th element omitted

  22. Correlated Equilibrium • A correlated equilibrium is a probability distribution p over (pure) strategy profiles in S such that for any i, si, si′: Σ over s-i of p(si, s-i) · ui(si, s-i) ≥ Σ over s-i of p(si, s-i) · ui(si′, s-i)

  23. Facts About Correlated Equilibrium • A CE always exists • why? • The set of CEs is convex • what about NE? • The CE conditions are a set of linear inequalities in p • so a CE can be computed efficiently (e.g., via linear programming; see the sketch below)
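Since the CE conditions above are linear in p, a CE can be found with an LP solver. A minimal sketch for two-player games, assuming numpy and scipy; the helper name correlated_equilibrium and the zero objective (pure feasibility) are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

def correlated_equilibrium(U1, U2):
    """U1[a][b], U2[a][b]: payoffs of players 1 and 2. Returns p[a][b]."""
    m, n = U1.shape
    rows = []  # each row encodes one incentive constraint as <row, p> <= 0
    # Player 1: for all a, a':  sum_b p[a,b] * (U1[a',b] - U1[a,b]) <= 0
    for a in range(m):
        for a2 in range(m):
            if a == a2: continue
            c = np.zeros((m, n)); c[a, :] = U1[a2, :] - U1[a, :]
            rows.append(c.ravel())
    # Player 2: for all b, b':  sum_a p[a,b] * (U2[a,b'] - U2[a,b]) <= 0
    for b in range(n):
        for b2 in range(n):
            if b == b2: continue
            c = np.zeros((m, n)); c[:, b] = U2[:, b2] - U2[:, b]
            rows.append(c.ravel())
    res = linprog(np.zeros(m * n),
                  A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=np.ones((1, m * n)), b_eq=[1.0],
                  bounds=[(0, None)] * (m * n))
    return res.x.reshape(m, n)

# Chicken game from slide 18:
U1 = np.array([[0, -3], [1, -4]]); U2 = np.array([[0, 1], [-3, -4]])
print(correlated_equilibrium(U1, U2))
```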

  24. Moreover… • When every player uses a no-regret algorithm to select strategies, the dynamics converge to a CE • in any game! • But this requires a stronger definition of no-regret…

  25. Types of No-Regret Algs • No external regret: do (nearly) as well as the best single strategy in hindsight • what we’ve been talking about so far • “I should have always taken the same route to work…” • No internal regret: the algorithm could not gain (in hindsight) by consistently substituting a single strategy with another • each time strategy si was chosen, substitute si′ • “each time I bought a Microsoft stock I should have bought the Google stock” • No internal regret implies no external regret • why?

  26. Reminder: Minimizing Regret • At each round t = 1, 2, …, T: • There are n actions (experts) 1, 2, …, n • The algorithm selects an action in {1, …, n} • and then observes the gain gi,t ∈ [0,1] of each action i ∈ {1, …, n} • Let gi = Σt gi,t and gmax = maxi gi. • No external regret: do (at least) “nearly as well” as gmax in hindsight. (A standard such algorithm is sketched below.)
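The slides don't name a specific algorithm; below is a minimal sketch of one standard no-external-regret algorithm (multiplicative weights / Hedge, here phrased for gains). The function name hedge, the input format, and the eta choice are illustrative assumptions.

```python
import math

def hedge(gain_rounds, n, eta):
    """gain_rounds yields, per round, a gain vector (g_1..g_n) in [0,1]^n.
    Returns the algorithm's total expected gain."""
    w = [1.0] * n
    total = 0.0
    for g in gain_rounds:
        W = sum(w)
        p = [wi / W for wi in w]           # play action i with probability p_i
        total += sum(pi * gi for pi, gi in zip(p, g))
        # Exponentially reward actions in proportion to their gains.
        w = [wi * math.exp(eta * gi) for wi, gi in zip(w, g)]
    return total

# With eta ~ sqrt(ln(n) / T), Hedge's expected gain is at least
# gmax - O(sqrt(T log n)), i.e., the per-round external regret vanishes.
```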

  27. Internal Regret • Assume that the alg outputs the action sequence A = a1 … aT • The action sequence A(b → d): • change every at = b to at = d in A • g(b→d) is the gain of A(b → d) (for the same gains gi,t) • Internal regret: max over {b,d} of [g(b→d) − galg] = max over {b,d} of Σt (gd,t − gb,t) pb,t, where pb,t is the probability that the algorithm plays b at time t • An algorithm has no internal regret if its internal regret goes to 0 as T goes to infinity. (A sketch of measuring this quantity follows below.)
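A minimal sketch of computing the internal regret of a played deterministic action sequence, directly from the definition above (so pb,t ∈ {0,1}); the helper name internal_regret and the input format are assumptions.

```python
def internal_regret(actions, gains, n):
    """actions[t] in 0..n-1 is the action played at round t;
    gains[t][i] is the gain of action i at round t.
    Returns max over pairs (b, d) of g(b->d) - g_alg."""
    alg_gain = sum(gains[t][a] for t, a in enumerate(actions))
    best = 0.0
    for b in range(n):
        for d in range(n):
            if b == d:
                continue
            # Replay the sequence with every play of b swapped to d.
            swapped = sum(gains[t][d if a == b else a]
                          for t, a in enumerate(actions))
            best = max(best, swapped - alg_gain)
    return best
```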

  28. Internal Regret and Dominated Strategies • Suppose that a player uses a no-internal-regret algorithm to select strategies • in a repeated game against others • What guarantees does the player have? • beyond the no-regret guarantee

  29. Dominated Strategies • Strategy si is dominated by a (mixed) strategy si′ if for every s-i we have that ui(si, s-i) < ui(si′, s-i) • Clearly, we would like to avoid choosing dominated strategies

  30. Internal Regret and Dominated Strategies • Suppose si is dominated by si′ • every time we played si we would have done better with si′ • this is exactly the internal regret from swapping the pair of strategies (si → si′) • No internal regret ⇒ (in the long run) no dominated strategies

  31. Does a No-Internal-Regret Alg Exist? • Yes! • In fact, there exist algorithms with a stronger guarantee: no swap regret. • No swap regret: the alg cannot benefit in hindsight by changing action i to F(i), for any function F: {1,…,n} → {1,…,n}. • We show a generic reduction that turns no-external-regret algorithms into a no-swap-regret algorithm.

  32. External to Swap Regret • Our algorithm uses n no-external-regret algorithms Alg1, Alg2, …, Algn to achieve no swap regret: • intuitively, each algorithm Algi represents a strategy in {1,…,n} • for algorithm Algi, and for any sequence of gain vectors: gAlgi ≥ gmax − Ri

  33. External to Swap Regret • At time t: • each Algi outputs a distribution qi • the qi’s induce a matrix Q (row i is qi) • our algorithm uses Q to decide on a distribution p over the strategies {1,…,n} • the adversary decides on a gains vector g = (g1, …, gn) • our algorithm returns to each Algi some gains vector

  34. Combining the No-External-Regret Algs • Approach I: • select an expert Algi with probability ri • let the “selected” expert decide the outcome • strategy distribution p = Qr • Approach II: • directly decide on p • Our approach: make p = r • find a p such that p = Qp

  35. Distributing Gain • The adversary selects gains g = (g1, …, gn) • Return to Algi the gain vector pi·g • Note: Σi pi·g = g

  36. External to Swap Regret • At time t: • each Algi outputs a distribution qi • the qi’s induce a matrix Q • output a distribution p such that p = Qp, i.e., pj = Σi pi qi,j • observe gains g = (g1, …, gn) • return to Algi the gain vector pi·g • (A sketch of the p = Qp step follows below.)
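A minimal sketch of the p = Qp step, assuming numpy: since Q is row-stochastic, p is a stationary distribution of the Markov chain Q, found here by solving a linear system. The helper name stationary and the example matrix are illustrative.

```python
import numpy as np

def stationary(Q):
    """Find p with p_j = sum_i p_i * Q[i, j] and sum(p) = 1,
    i.e., the fixed point p = Qp used by the reduction."""
    n = Q.shape[0]
    # Stack (Q^T - I) p = 0 with the normalization sum(p) = 1.
    A = np.vstack([Q.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

# One round of the reduction, given the rows q_i output by the Alg_i's:
Q = np.array([[0.7, 0.3],
              [0.4, 0.6]])
p = stationary(Q)                  # play action j with probability p_j
g = np.array([1.0, 0.0])           # adversary's gain vector
feedback = [p[i] * g for i in range(len(p))]  # hand Alg_i the vector p_i * g
print(p)                           # ~ [0.571, 0.429]
```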

  37. External to Swap Regret • Gain of Algi (from its own view) at round t: ⟨qi,t, pi,t·gt⟩ = pi,t⟨qi,t, gt⟩ • No-external-regret guarantee (for any fixed action j): gAlgi = Σt pi,t⟨qi,t, gt⟩ ≥ Σt pi,t gj,t − Ri • For any swap function F: gAlg = Σt ⟨pt, gt⟩ = Σt ⟨ptQt, gt⟩ = Σt Σi pi,t⟨qi,t, gt⟩ = Σi gAlgi ≥ Σi Σt pi,t gF(i),t − Σi Ri = gAlg,F − Σi Ri

  38. Swap Regret • Corollary: the reduction guarantees swap regret at most Σi Ri; with standard no-external-regret algorithms (Ri = O(√(T log n))) this gives swap regret O(n√(T log n)). • Can be improved to O(√(nT log n)).

  39. Summary • The Minimax Theorem is a useful tool for analyzing randomized algorithms • Yao’s Principle • There exist no-swap-regret algorithms • Next time: When all players use no-swap-regret algorithms to select strategies the dynamics converge to a CE
