
No Regret Algorithms in Games


Presentation Transcript


  1. No Regret Algorithms in Games Georgios Piliouras Georgia Institute of Technology Johns Hopkins University

  2. No Regret Algorithms in Games Georgios Piliouras Georgia Institute of Technology Johns Hopkins University

  3. No Regret Algorithms in Games Georgios Piliouras Georgia Institute of Technology Johns Hopkins University

  4. No Regret Algorithms in Social Interactions Georgios Piliouras Georgia Institute of Technology Johns Hopkins University

  5. No Regret Algorithms in Social Interactions Georgios Piliouras Georgia Institute of Technology Johns Hopkins University

  6. No Regret Behavior in Social Interactions Georgios Piliouras Georgia Institute of Technology Johns Hopkins University

  7. No Regret Behavior in Social Interactions Georgios Piliouras Georgia Institute of Technology Johns Hopkins University

  8. “Reasonable” Behavior in Social Interactions Georgios Piliouras Georgia Institute of Technology Johns Hopkins University

  9. No Regret Learning (review) No single action significantly outperforms the dynamic. Regret(T) in a history of T periods: Regret(T) = (total profit of the best fixed action in hindsight) − (total profit of the algorithm). An algorithm is characterized as “no regret” if for every input sequence the regret grows sublinearly in T. [Blackwell ’56], [Hannan ’57], [Fudenberg, Levine ’94], …
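
Written out (notation mine, consistent with the verbal definition above), the regret of player i over a history a¹, …, a^T is

    \[
      \mathrm{Regret}_i(T) \;=\; \max_{s \in S_i} \sum_{t=1}^{T} u_i\big(s,\, a_{-i}^{t}\big)
      \;-\; \sum_{t=1}^{T} u_i\big(a_i^{t},\, a_{-i}^{t}\big),
    \]

and “no regret” means Regret_i(T) = o(T) against every sequence of opponents’ actions.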

  10. No Regret Learning (review) No single action significantly outperforms the dynamic.

  11. Games (i.e. Social Interactions) • Interacting entities • Pursuing their own goals • Lack of centralized control

  12. Games (review) • n players • A set of strategies Si for each player i • Possible states (strategy profiles) S = ×i Si • Utilities ui : S → R • Social welfare Q : S → R • Extend to probabilities Δ(Si), Δ(S): ui(Δ(S)) = E[ui(S)], Q(Δ(S)) = E[Q(S)]

  13. Games & Equilibria [Rock-Paper-Scissors payoff matrix; each player plays Rock, Paper, Scissors with probability 1/3 each] Nash: a product of mixed strategies s.t. no player has a profitable deviating strategy.

  14. Games & Equilibria [Same matrix; the uniform mixes put probability 1/9 on each of the nine outcomes] Choose any of the green outcomes uniformly (prob. 1/9). Nash: a probability distribution over outcomes that is a product of mixed strategies, s.t. no player has a profitable deviating strategy.

  15. Games & Equilibria [Same matrix] Nash: a probability distribution over outcomes, s.t. no player has a profitable deviating strategy. Coarse Correlated Equilibria (CCE) drop the requirement that the distribution be a product:

  16. Games & Equilibria [Rock-Paper-Scissors matrix] Coarse Correlated Equilibria (CCE): a probability distribution over outcomes, s.t. no player has a profitable deviating strategy.

  17. Games & Equilibria [Rock-Paper-Scissors matrix with six outcomes highlighted in green] Choose any of the green outcomes uniformly (prob. 1/6). Coarse Correlated Equilibria (CCE): a probability distribution over outcomes, s.t. no player has a profitable deviating strategy.
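
Written out (my notation, using the setup of slide 12), the CCE condition on a distribution σ ∈ Δ(S) is

    \[
      \mathbb{E}_{s \sim \sigma}\big[u_i(s)\big] \;\ge\; \mathbb{E}_{s \sim \sigma}\big[u_i(s_i',\, s_{-i})\big]
      \qquad \text{for every player } i \text{ and every fixed } s_i' \in S_i.
    \]

A Nash equilibrium is the special case where σ is a product of the players’ mixed strategies; e.g. in Rock-Paper-Scissors the uniform distribution over the six non-tie outcomes satisfies the condition but is not a product, so the inclusion Nash ⊆ CCE is strict.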

  18. Algorithms Playing Games [Rock-Paper-Scissors matrix; Alg 1 picks the row, Alg 2 the column] Online algorithm: takes as input the past history of play up to day t, and chooses a randomized action for day t+1.

  19. Today’s class [Same matrix; now both Alg 1 and Alg 2 are no-regret algorithms] Online algorithm: takes as input the past history of play up to day t, and chooses a randomized action for day t+1.

  20. No Regret Algorithms & CCE • A history of play among no-regret algorithms is a sequence of outcomes s.t. no agent has a single deviating action that could have increased her average payoff. • A Coarse Correlated Equilibrium is a probability distribution over outcomes s.t. no agent has a single deviating action that can increase her expected payoff.
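
The parallel is quantitative: if σ_T denotes the empirical distribution of the outcomes played up to time T, then by the definition of regret, for every player i and every fixed s_i' ∈ S_i,

    \[
      \mathbb{E}_{s \sim \sigma_T}\big[u_i(s)\big] \;\ge\; \mathbb{E}_{s \sim \sigma_T}\big[u_i(s_i',\, s_{-i})\big] \;-\; \frac{\mathrm{Regret}_i(T)}{T},
    \]

so when every player runs a no-regret algorithm the error term vanishes, and every limit point of σ_T is a CCE.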

  21. How good are the CCE? • It depends… • Which class of games are we interested in? • Which notion of social welfare? • Today • Class of games: potential games • Social welfare: makespan • [Kleinberg, P., Tardos STOC ’09]

  22. Congestion Games • n players and m resources (“edges”) • Each strategy corresponds to a set of resources (“paths”) • Each edge has a cost function ce(x) that gives its cost as a function of the number of players using it • Cost experienced by a player = sum of the costs of the edges on her path [Example network with edge costs x and 2x; Cost(red path) = 6, Cost(green path) = 8]
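
A minimal sketch of this cost model in code (the function names and the toy instance are mine, not from the slides):

    from collections import Counter

    def player_costs(paths, edge_cost):
        """paths[i] = set of edges chosen by player i;
        edge_cost[e] maps the load x on edge e to its cost c_e(x)."""
        load = Counter(e for path in paths for e in path)   # players per edge
        return [sum(edge_cost[e](load[e]) for e in path) for path in paths]

    # Toy instance: two parallel links with c(x) = x (the example of slide 40).
    edge_cost = {"top": lambda x: x, "bottom": lambda x: x}
    print(player_costs([{"top"}, {"top"}], edge_cost))      # [2, 2]: collision
    print(player_costs([{"top"}, {"bottom"}], edge_cost))   # [1, 1]: a pure Nash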

  23. Equilibria and Social Welfare Load balancing: n players assigning themselves to identical parallel links, each with cost c(x) = x. Makespan: expected maximum latency over all links. [Figure: parallel links, each labeled c(x) = x]

  24. Equilibria and Social Welfare Pure Nash: one player per link, so every load is 1. Makespan = 1. [Figure: each link carries a single player]

  25. Equilibria and Social Welfare (Mixed) Nash: every player picks each link with probability 1/n. Makespan = Θ(log n / log log n) > 1. [Koutsoupias, Mavronicolas, Spirakis ’02], [Czumaj, Vöcking ’02]

  26. Equilibria and Social Welfare Coarse Correlated Equilibria: Makespan = Ω(√n) >> Θ(log n / log log n). [Blum, Hajiaghayi, Ligett, Roth ’08]

  27. Linear Load Balancing (Q = makespan) [Diagram: nested regions OPT ⊆ Pure Nash ⊆ Nash ⊆ CCE inside Δ(S)] Q(worst CCE) = Θ(√n) >> Q(worst Nash) = Θ(log n / log log n) >> Q(worst pure Nash) = Q(OPT).

  28. Linear Load Balancing (Q = makespan) [Same nested diagram] Price of Total Anarchy = Θ(√n) >> Price of Anarchy = Θ(log n / log log n) >> Price of Pure Anarchy = 1.
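
The three ratios, written out (standard definitions; Q is a cost here, so larger ratios are worse, and the nesting Pure Nash ⊆ Nash ⊆ CCE forces the ordering shown):

    \[
      \mathrm{PoA}_{\mathrm{pure}} = \max_{s \in \mathrm{PNE}} \frac{Q(s)}{Q(\mathrm{OPT})},
      \qquad
      \mathrm{PoA} = \max_{\sigma \in \mathrm{NE}} \frac{Q(\sigma)}{Q(\mathrm{OPT})},
      \qquad
      \mathrm{PoTA} = \max_{\sigma \in \mathrm{CCE}} \frac{Q(\sigma)}{Q(\mathrm{OPT})}.
    \]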

  29. Our Hope Natural no-regret algorithms should be able to steer away from worst-case equilibria.

  30. The Multiplicative Weights Algorithm (MWA) [Littlestone, Warmuth ’94], [Freund, Schapire ’99] • Pick s with probability proportional to (1−ε)^total(s), where total(s) denotes the combined cost of s over all past periods. • Provable performance guarantees against arbitrary opponents. No Regret: against any sequence of opponents’ play, avg. payoff converges to that of the best fixed option (or better).
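
A minimal runnable sketch of this rule (my code; the full-information feedback, where the cost of every action is observed each period, is an assumption consistent with the (1−ε)^total(s) update):

    import random

    def mw_play(total_cost, eps):
        """Sample action s with probability proportional to (1 - eps) ** total_cost[s]."""
        weights = [(1 - eps) ** c for c in total_cost]
        return random.choices(range(len(weights)), weights=weights)[0]

    # Usage sketch: three actions (e.g. Rock, Paper, Scissors), costs in [0, 1].
    total = [0.0, 0.0, 0.0]
    for t in range(1000):
        s = mw_play(total, eps=0.1)
        costs = [random.random() for _ in total]   # stand-in for the game's cost feedback
        total = [c + d for c, d in zip(total, costs)]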

  31. (Multiplicative Weights) Algorithm in (Potential) Games • x(t) is the current state of the system (a tuple of randomized strategies, one for each player). • Each player tosses her coins and a specific outcome is realized. • Depending on the outcome of these random events, we transition to the next state x(t+1), which lies within O(ε) of x(t). This is an infinite Markov chain over the infinite state space Δ(S). [Diagram: x(t) in Δ(S) with several candidate successors x(t+1), each O(ε) away]

  32. (Multiplicative Weights) Algorithm in (Potential) Games • Problem 1: hard to get intuition about the problem, let alone analyze it. • Let’s try to come up with a “discounted” version of the problem. • Ideas?? [Same Markov chain diagram]

  33. (Multiplicative Weights) Algorithm in (Potential) Games • Idea 1: analyze the expected motion. [Same Markov chain diagram]

  34. (Multiplicative Weights) Algorithm in (Potential) Games • Idea 1: analyze the expected motion. • The system evolution is now deterministic: there exists a function f s.t. E[x(t+1)] = f(x(t), ε). • I wish to analyze this function (e.g. find its fixed points). [Diagram: x(t) mapped to E[x(t+1)], O(ε) away, in Δ(S)]

  35. (Multiplicative Weights) Algorithm in (Potential) Games • Idea 2: I wish to analyze the MWA dynamics for small ε. • Use a Taylor expansion to find a first-order approximation to f: f(x(t), ε) = f(x(t), 0) + ε · f′(x(t), 0) + O(ε²). • Problem 2: the function f is still rather complicated.

  36. (Multiplicative Weights) Algorithm in (Potential) Games • Idea 2: I wish to analyze the MWA dynamics for small ε. • Use a Taylor expansion to find a first-order approximation to f: f(x(t), ε) ≈ f(x(t), 0) + ε · f′(x(t), 0). • Problem 2: the function f is still rather complicated.

  37. (Multiplicative Weights) Algorithm in (Potential) Games • As ε→0, the quotient [f(x(t), ε) − f(x(t), 0)] / ε → f′(x(t), 0) specifies a vector at each point of our state space (i.e. a vector field). This vector field defines a system of ODEs, which we are going to analyze. [Diagram: the vector f′(x(t), 0) attached at x(t) in Δ(S)]

  38. (Multiplicative Weights) Algorithm in (Potential) Games • As ε→0, the quotient [f(x(t), ε) − f(x(t), 0)] / ε → f′(x(t), 0) specifies a vector at each point of our state space (i.e. a vector field). This vector field defines a system of ODEs, which we are going to analyze. [Diagram: the vector f′(x(t), 0) attached at x(t) in Δ(S)]

  39. Deriving the ODE • Take expectations of the MWA update. • Differentiate w.r.t. ε at ε = 0 and take the expected value. • This is the replicator dynamic studied in evolutionary game theory.
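
The slide’s equations were images lost in this transcript; the resulting ODE, in the cost convention of congestion games (as in [Kleinberg, P., Tardos STOC ’09]), is

    \[
      \frac{dx_{is}}{dt} \;=\; x_{is} \Big( \sum_{s' \in S_i} x_{is'}\, c_{is'}(x) \;-\; c_{is}(x) \Big),
    \]

where x_{is} is the probability player i assigns to strategy s and c_{is}(x) is her expected cost for playing s against the mixed profile x: probability mass flows toward strategies cheaper than the player’s current average.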

  40. Motivating Example [Two players, two parallel links, each with cost c(x) = x]

  41. Motivating Example • Each player’s mixed strategy is summarized by a single number: the probability of picking machine 1. Plot the mixed strategy profile in R². [Phase portrait with the mixed Nash in the interior and the pure Nash marked]

  42. Motivating Example • Each player’s mixed strategy is summarized by a single number: the probability of picking machine 1. Plot the mixed strategy profile in R². [Phase portrait of the resulting vector field]

  43. Motivating Example • Even in the simplest case of two balls and two bins with linear utilities, the replicator equation has a nonlinear form.
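
A quick numerical sketch (my code, forward Euler; p and q are the two players’ probabilities of machine 1). Specializing the ODE of slide 39 to this game gives dp/dt = p(1 − p)(1 − 2q) and dq/dt = q(1 − q)(1 − 2p): nonlinear, as claimed, and off the mixed Nash it flows to a pure Nash.

    def replicator_step(p, q, dt=0.01):
        """One Euler step of the replicator ODE for two balls / two bins, c(x) = x."""
        dp = p * (1 - p) * (1 - 2 * q)   # move toward machine 1 iff the opponent favors machine 2
        dq = q * (1 - q) * (1 - 2 * p)
        return p + dt * dp, q + dt * dq

    p, q = 0.51, 0.49                    # a small push off the mixed Nash (.5, .5)
    for _ in range(5000):
        p, q = replicator_step(p, q)
    print(round(p, 3), round(q, 3))      # -> 1.0 0.0: the anti-coordinated pure Nash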

  44. The potential function • The congestion game has a potential function Φ [Monderer, Shapley ’96]. • Let Ψ = E[Φ]. A calculation yields dΨ/dt ≤ 0 along the dynamics. • Hence Ψ decreases except when every player randomizes over paths of equal expected cost; i.e., Ψ is a Lyapunov function of the dynamics.
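
The formula lost from this slide is presumably Rosenthal’s congestion-game potential,

    \[
      \Phi(s) \;=\; \sum_{e} \sum_{j=1}^{n_e(s)} c_e(j),
    \]

where n_e(s) is the number of players using edge e in state s. Its defining property is that a unilateral path switch changes Φ by exactly the switching player’s change in cost, which is why Ψ = E[Φ] can only drift downward under a dynamic whose mass flows toward cheaper strategies.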

  45. Unstable vs. stable fixed points • The derivative of the vector field ξ is a matrix J (the Jacobian) whose spectrum distinguishes stable from unstable fixed points: unstable if some eigenvalue has positive real part, else neutrally stable. • Non-Nash fixed points are unstable. (Easy) • Which Nash are unstable?
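
In symbols (standard linearization, notation mine): writing the dynamics as dx/dt = ξ(x), a fixed point x* has Jacobian

    \[
      J = D\xi(x^{*}), \qquad x^{*} \text{ unstable if } \operatorname{Re}\lambda > 0 \text{ for some eigenvalue } \lambda \text{ of } J.
    \]

For the two-balls example, at the mixed Nash (1/2, 1/2) the system dp/dt = p(1−p)(1−2q), dq/dt = q(1−q)(1−2p) has Jacobian [[0, −1/2], [−1/2, 0]] with eigenvalues ±1/2; the positive eigenvalue makes the mixed Nash unstable, which is exactly what the next slides show.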

  46. Unstable Nash [Two-link example at the fully mixed Nash: both players play (.5, .5)]

  47. Motivating Example [Player 1 perturbed to (.51, .49); player 2 still at (.5, .5)]

  48. Motivating Example [Player 1 at (.51, .49); player 2 has drifted to (.4, .6)]

  49. Motivating Example [Player 1 has drifted to (.65, .35); player 2 at (.4, .6)]

  50. Unstable vs. stable fixed points • The derivative of the vector field ξ is a matrix J (the Jacobian) whose spectrum distinguishes stable from unstable fixed points: unstable if some eigenvalue has positive real part, else neutrally stable. • Non-Nash fixed points are unstable. (Easy) • Which Nash are unstable?
