## Games, Optimization, and Online Algorithms

Martin Zinkevich, University of Alberta, November 8th, 2006

**Question**

- Does pop culture have anything to offer advanced research projects?

**Fun and Games for Scientists**

- Fun problem (in scientist-ese):
  1. A problem that has a wide base of players at a variety of skill levels.
  2. A problem with aspects that provide interesting challenges for the human mind.
- Game problem (in scientist-ese):
  1. A problem that has a formal structure (rules) with a variety of parameter settings (opponents).
  2. A problem where the world IS out to get you.

**Fun and Games**

- “Fun” can capture aspects of difficulty that are orthogonal to the size of the state space or the algorithmic complexity of the problems involved.
- “Games” are environments where:
  - learning-to-learn can be studied across a variety of opponents, and
  - non-stationarity can be studied in the presence of other learning agents.

**Two Objectives of This Talk**

- Finding Nash equilibria
- Developing “experts” a priori in games

**Main Point**

- Algorithms that learn in self-play can be used to generate both an equilibrium and experts.
- Constraint/column generation is one such algorithm.

**Question in This Talk**

- What are interesting unbalanced strategies to consider?

**Outline**

- Introduction
- Iterated Best Response
- Iterated Generalized Best Response
- Other Applications
- Conclusion

**Iterated Best Response (Broken Version)**

One broken idea:

1. INIT: Start with an arbitrary strategy.
2. RESPONSE: Compute the best response to the latest strategy.
3. REPEAT: Go back to step 2 until satisfied.

**Hide and Seek**

*Figure: hide-and-seek diagram; hide actions shown in blue, seek actions in red.*

**Problem: No Balance**

- In some games, there is no single killer strategy.
- Without adding some balance, there is no way to fully explore the strategy space.

**What Games Require Balance?**

- Simultaneous-move games
- Imperfect-information games (games with private information)

**Balancing Existing Strategies**

*Figure: seek actions shown in red, hide actions in blue; the restricted Nash equilibrium mixes each player's existing strategies 50/50.*

**Iterated Balanced Best Response**

1. INIT: Start with a set of strategies S for player 1 and T for player 2.
2. BALANCE: Build the bimatrix game between S and T and solve it for an equilibrium.
3. RESPONSE: Add each player's best response to that equilibrium to S and T, respectively.
4. REPEAT: Repeat steps 2 and 3 until satisfied.

**What’s The Point?**

- In general, computing an equilibrium is significantly harder than computing a best response.
- In practice, it is easier to compute an approximate best response than an approximate Nash equilibrium.

**Pure Poker**

- Player 1 and player 2 each receive a “card” in [0,1] (a real number).
- Then player 1 bets or checks.
- If player 1 bets, player 2 calls or folds.

*Figure: strategies for player 1 (check/bet) and player 2 (call/fold), plotted as probability mass against card value from 0 to 1.*

**Pure Poker**

- Continuous state space.
- Given a strategy that splits [0,1] into a finite number of intervals and plays a fixed distribution in each interval, the best response is also of this form.

*Figure: interval strategies for player 1 (check/bet) and player 2 (call/fold) across the card range.*

**Real Poker**

- In one abstraction we are currently working with, each player has 625 private states and there are about 16,000 betting sequences, for a total of several BILLION states.
  While it is possible to iterate over all of these states in a short period of time, you cannot really perform complex operations on a problem of this size.

**Positive Results**

- In under a hundred iterations, this technique can approximately solve simple variants of poker, such as Kuhn poker and Leduc poker.

**Practical Problem**

- Although the balanced best response technique above works, it can generate many strategies before an equilibrium is reached. Is there a way to cut down on this?

**Robustness**

- How do you develop a strategy that is robust, assuming that your opponent will play a strategy you have already seen?

**Robustness: Generalized Best Response**

Maximize the MINIMUM payoff against a set of opponents. Rows are candidate strategies; columns a, b, c are the opponents.

| Strategy | a | b | c | Min |
|---|---|---|---|---|
| A | 3 | 1 | 2 | 1 |
| B | 9 | 2 | 10 | 2 |
| X | 3 | 7 | 5 | 3 |
| Y | 5 | 4 | 4 | 4 |
| Z | 7 | 3 | 1 | 1 |

The set of possible actions could be INFINITE.

**Iterated Generalized Best Response**

1. Start with strategies S and T.
2. Add to T a generalized best response to S.
3. Add to S a generalized best response to T.
4. Repeat until satisfied.

**How to Compute a Generalized Best Response?**

- Use a linear program.
  - Could be slow.
  - Could have arbitrarily high precision.
- Use iterated best response.
  1. Start with a set of strategies S and a (possibly empty) set T.
  2. Compute a Nash equilibrium between S and T.
  3. Find a best response to the resulting mixture over S.
  4. Add it to T.

**Results in Poker**

- Using this technique (iterated GBR), we solved a four-round game of Texas Hold’em.
- We beat Opti4 (Sparbot)!
  - By 0.01 small bets per hand.

**Other Applications**

- Economics (non-zero-sum)
- Counter-Strike/RTS games (best response is not easy)

**Extensions**

- Non-zero-sum games
- Approximate best response operation (through reinforcement learning)
- Learning the abstraction while learning the strategy

**Conclusions**

- Algorithms that learn in self-play (such as iterated generalized best response) yield a wealth of useful strategies, including approximate Nash equilibria.

**How Hard is a Game?**

- For a game to be hard, it must at least be POSSIBLE to play it badly: otherwise, however complex it is, it is still easy.
- The depth of human skill in a particular game indicates its complexity.

**Formalism**

- If the complexity of a game is at least k, then there exist people 1 to k such that, for any two people i > j in the list, person i can beat person j with probability at least 2/3.

**Why People?**

- Choose a number between 1 and 100. The highest number wins a dollar; no money is exchanged on a tie.

**Formalism**

- If the complexity of a game is at least k, then there exist strategies 1 to k such that, for any two strategies i > j in the list, strategy i can beat strategy j with probability at least 2/3.

**Formalism**

- The epsilon-complexity of a game is at least k if there exist strategies 1 to k such that, for any two strategies i > j, EV[i playing against j] > epsilon.

**Make it a Linear Program?**

- The linear program (sequence form) has numbers of constraints and variables roughly proportional to the size of the game tree.
- The coefficient matrix is big, which makes inversion difficult.
- Also: numerical instabilities.

**A Theoretical Guarantee? (No!)**

*Figure: hide-and-seek diagram; hide actions shown in blue, seek actions in red.*

**The Theoretical Problem**

- Each new bot is a best response to a particular mixture of the previous bots.
- There could be a different mixture over those bots that would do BETTER against the new bot: in fact, it could even beat the new bot!
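The Iterated Balanced Best Response loop from the slides (INIT, BALANCE, RESPONSE, REPEAT) can be sketched on a small matrix game. This is a minimal sketch under assumptions that are not in the slides: the game is a finite two-player zero-sum matrix game with payoffs in [0, 1] (the slides speak of a general bimatrix game); the BALANCE step is solved approximately by letting both players run Hedge (exponential weights) against each other, a swapped-in technique standing in for an exact equilibrium solver; and the loop stops as soon as neither player's best response is new. The function names and the 4-spot hide-and-seek matrix are hypothetical illustrations.

```python
import math


def softmax(xs):
    """Numerically stable softmax (subtract the max before exponentiating)."""
    top = max(xs)
    ws = [math.exp(x - top) for x in xs]
    total = sum(ws)
    return [w / total for w in ws]


def solve_restricted(payoff, rows, cols, iters=5000):
    """Approximate equilibrium of the zero-sum game restricted to rows x cols.

    payoff[i][j] in [0, 1] is what the row player wins from the column player.
    Both players run Hedge (exponential weights) against each other; the
    time-averaged strategies form an approximate Nash equilibrium.
    """
    n, m = len(rows), len(cols)
    eta_r = math.sqrt(8 * math.log(n) / iters)  # 0.0 when there is one action
    eta_c = math.sqrt(8 * math.log(m) / iters)
    gain = [0.0] * n  # cumulative expected gain of each row action
    loss = [0.0] * m  # cumulative expected loss of each column action
    avg_p, avg_q = [0.0] * n, [0.0] * m
    for _ in range(iters):
        p = softmax([eta_r * g for g in gain])
        q = softmax([-eta_c * x for x in loss])
        for a in range(n):
            avg_p[a] += p[a] / iters
            gain[a] += sum(q[b] * payoff[rows[a]][cols[b]] for b in range(m))
        for b in range(m):
            avg_q[b] += q[b] / iters
            loss[b] += sum(p[a] * payoff[rows[a]][cols[b]] for a in range(n))
    return avg_p, avg_q


def iterated_balanced_best_response(payoff, iters=5000):
    """Grow strategy sets S (rows) and T (columns) until no new best response appears."""
    S, T = [0], [0]  # INIT: one arbitrary pure strategy per player
    while True:
        # BALANCE: approximate equilibrium (p over S, q over T) of the restricted game.
        p, q = solve_restricted(payoff, S, T, iters)
        # RESPONSE: best responses in the FULL game to that equilibrium.
        row_vals = [sum(qb * payoff[i][T[b]] for b, qb in enumerate(q))
                    for i in range(len(payoff))]
        col_vals = [sum(pa * payoff[S[a]][j] for a, pa in enumerate(p))
                    for j in range(len(payoff[0]))]
        best_row = max(range(len(row_vals)), key=row_vals.__getitem__)
        best_col = min(range(len(col_vals)), key=col_vals.__getitem__)
        # Duality gap: total gain available to the two players by deviating.
        gap = row_vals[best_row] - col_vals[best_col]
        if best_row in S and best_col in T:
            return S, T, p, q, gap  # REPEAT ends: nothing new to add
        if best_row not in S:
            S.append(best_row)
        if best_col not in T:
            T.append(best_col)


# Hypothetical example: hide and seek over 4 spots; the seeker (row) wins 1
# if it searches the spot where the hider (column) is hiding.
hide_seek = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
S, T, p, q, gap = iterated_balanced_best_response(hide_seek)
print(sorted(S), sorted(T), round(gap, 3))
```

Because the strategy sets only grow and the game is finite, the loop must stop; at that point `gap` bounds how much the two players together could gain by deviating in the full game. For this hide-and-seek matrix the equilibrium value is 1/4 (both players randomize uniformly), so the returned pair should score close to 0.25. An exact LP solver could replace `solve_restricted` without changing the outer loop.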
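The Generalized Best Response table (strategies A, B, X, Y, Z against opponents a, b, c) can be checked directly. A minimal sketch, assuming, as the table's Min column does, that each row is a fixed pure strategy and that a, b, c are the only opponents; `PAYOFFS`, `best_response`, `generalized_best_response`, and `worst_case` are hypothetical names.

```python
# Row strategy -> payoffs against the opponents (a, b, c), copied from the table.
PAYOFFS = {
    "A": (3, 1, 2),
    "B": (9, 2, 10),
    "X": (3, 7, 5),
    "Y": (5, 4, 4),
    "Z": (7, 3, 1),
}
OPPONENTS = ("a", "b", "c")


def best_response(payoffs, opponent):
    """Plain best response: maximize payoff against a single opponent."""
    k = OPPONENTS.index(opponent)
    return max(payoffs, key=lambda s: payoffs[s][k])


def generalized_best_response(payoffs):
    """Pure strategy maximizing the MINIMUM payoff over the opponent set."""
    return max(payoffs, key=lambda s: min(payoffs[s]))


def worst_case(payoffs, mixture):
    """Worst-case expected payoff of a mixture {strategy: weight} over all opponents."""
    return min(
        sum(w * payoffs[s][k] for s, w in mixture.items())
        for k in range(len(OPPONENTS))
    )


print([best_response(PAYOFFS, o) for o in OPPONENTS])  # ['B', 'X', 'B']
print(generalized_best_response(PAYOFFS))              # Y (worst case 4)
print(worst_case(PAYOFFS, {"B": 0.5, "X": 0.5}))       # 4.5
```

A plain best response picks B or X depending on the opponent, while the best single row by worst case is Y, guaranteeing 4. The last line shows that mixing rows can do better still: playing B and X with equal probability guarantees 4.5 against every opponent in the set, which is why solving over mixtures (with an LP or iterated best response) can yield a stronger generalized best response than scanning single rows.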