Games, Optimization, and Online Algorithms

Martin Zinkevich, University of Alberta, November 8th, 2006

Question • Does pop culture have anything to offer advanced research projects?

Fun and Games for Scientists • Fun problem (in scientist-ese) • (1) A problem which has a wide base of players at a variety of levels • (2) A problem which has aspects which provide interesting challenges for the human mind

Fun and Games for Scientists • Game problem (in scientist-ese) • (1) A problem which has a formal structure (rules) with a variety of parameter settings (opponents). • (2) A problem where the world IS out to get you.

Fun and Games • “Fun” can capture aspects of difficulty that are orthogonal to the size of the state space or the algorithmic complexity of the problems involved. • “Games” are environments where issues such as: • learning-to-learn can be studied amongst a variety of opponents, and • non-stationarity can be studied in the presence of other learning agents.

Two Objectives of This Talk • Finding Nash equilibria • Developing “experts” a priori in games

Main Point • Algorithms that learn in self-play can be used to generate both an equilibrium and experts. • Constraint/column generation is one such algorithm.

Question in This Talk • What are interesting unbalanced strategies to consider?

Outline • Introduction • Iterated Best Response • Iterated Generalized Best Response • Other Applications • Conclusion

Iterated Best Response (Broken Version) • One broken idea • INIT: Start with an arbitrary strategy. • RESPONSE: Compute the best response. • REPEAT: Repeat step 2 (RESPONSE) until satisfied.
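The broken loop above can be sketched as follows. This is a minimal illustration, not the speaker's code: the three-spot hide-and-seek game, the function names, and the lowest-index tie-breaking rule are all assumptions made for the example.

```python
# Assumed toy game: the hider picks one of n_spots, the seeker picks one,
# and the seeker wins exactly when the two spots match.

def seeker_best_response(hide_spot):
    # The seeker's best response is to look exactly where the hider is.
    return hide_spot

def hider_best_response(seek_spot, n_spots):
    # Any other spot is a best response; ties are broken by lowest index (assumption).
    return next(s for s in range(n_spots) if s != seek_spot)

def naive_iterated_best_response(n_spots=3, start=0, rounds=6):
    """Alternate pure best responses and record the (hide, seek) pairs visited."""
    history = []
    hide = start
    for _ in range(rounds):
        seek = seeker_best_response(hide)
        history.append((hide, seek))
        hide = hider_best_response(seek, n_spots)
    return history

history = naive_iterated_best_response()
print(history)
```

Under these assumptions the pair of players bounces between spots 0 and 1 forever and never visits spot 2, which is one way to see why the loop is called broken.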

Hide and Seek • Figure: hide actions shown in blue, seek actions shown in red.

Problem: No Balance • There is no one killer strategy in some games. • Without adding some balance, there is no way to fully explore the space.

What Games Require Balance? • Simultaneous move games • Imperfect Information Games (games with private information).

Balancing Existing Strategies • Figure: seek actions shown in red, hide actions shown in blue; the restricted Nash equilibrium is labeled 50/50 for each player.

Iterated Balanced Best Response • INIT: Start with strategies S for player 1 and T for player 2. • BALANCE: Make a bimatrix game from S and T and solve for its equilibrium. • RESPONSE: Add the best responses to the equilibrium of that game to S and T. • REPEAT: Repeat steps 2 and 3 (BALANCE and RESPONSE) until satisfied.
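The INIT/BALANCE/RESPONSE/REPEAT loop can be sketched on a small zero-sum matrix game. Everything concrete here is an assumption for illustration: rock-paper-scissors stands in for the real game, the game is zero-sum rather than a general bimatrix game, and the BALANCE step uses fictitious play as a simple approximate equilibrium solver in place of whatever solver the talk used.

```python
# RPS[i][j] is the row player's payoff; the column player receives the negative.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

def fictitious_play(M, iters=20000):
    """Approximate an equilibrium of zero-sum matrix game M (row player maximizes)."""
    m, n = len(M), len(M[0])
    row_counts, col_counts = [0] * m, [0] * n
    r, c = 0, 0
    for _ in range(iters):
        row_counts[r] += 1
        col_counts[c] += 1
        # Each player best-responds to the opponent's empirical play so far.
        r = max(range(m), key=lambda i: sum(col_counts[j] * M[i][j] for j in range(n)))
        c = min(range(n), key=lambda j: sum(row_counts[i] * M[i][j] for i in range(m)))
    return [k / iters for k in row_counts], [k / iters for k in col_counts]

def iterated_balanced_best_response(A, S=(0,), T=(0,), max_rounds=50):
    S, T = list(S), list(T)                      # INIT
    for _ in range(max_rounds):
        # BALANCE: approximately solve the game restricted to S x T.
        restricted = [[A[i][j] for j in T] for i in S]
        x, y = fictitious_play(restricted)
        # RESPONSE: best responses over ALL actions to the restricted mixtures.
        br_row = max(range(len(A)),
                     key=lambda i: sum(y[k] * A[i][T[k]] for k in range(len(T))))
        br_col = min(range(len(A[0])),
                     key=lambda j: sum(x[k] * A[S[k]][j] for k in range(len(S))))
        if br_row in S and br_col in T:          # nothing new to add: stop
            break
        if br_row not in S:
            S.append(br_row)
        if br_col not in T:
            T.append(br_col)
    return S, T, x, y

S, T, x, y = iterated_balanced_best_response(RPS)
print(S, T, x, y)
```

Starting from rock only, each round adds the action that beats the current restricted equilibrium, so both sets should grow to all three actions and the final mixtures should be close to uniform.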

What’s The Point? • In general, equilibrium computations are significantly harder than best responses. • In practice, it is easier to compute an approximate best response than an approximate Nash equilibrium.

Pure Poker • Player 1, Player 2 each receive a “card” in [0,1] (a real number) • Then, player 1 bets or checks. • If player 1 bets, player 2 calls or folds.

Strategies • Figure: two panels, Player 1 (check/bet) and Player 2 (fold/call), each plotting probability mass (0 to 1) against card (0 to 1).

Pure Poker • Continuous state space • Given a strategy that splits [0,1] into a finite number of intervals and plays a fixed distribution in each interval, the best response is also of this form.

Pure Poker • Figure: Player 1 (bet/check) and Player 2 (call/fold) strategies.

Real Poker • In one abstraction we are currently working with, each player has 625 private states and there are about 16,000 betting sequences, for a total of several BILLION states. While it is possible to iterate over all states in a short period of time, you can’t really perform complex operations on a problem of this size.

Positive Results • In under a hundred iterations, this technique can approximately solve simple variants of poker, such as Kuhn and Leduc Poker.

Outline • Introduction • Iterated Best Response • Iterated Generalized Best Response • Other Applications • Conclusion

Practical Problem • Although the balance-response technique above works, it can generate many strategies before an equilibrium is reached. Is there a way to cut down on this?

Robustness • How do you develop a strategy that is robust assuming that your opponent will play a strategy you have already seen?

Robustness: Generalized Best Response • Maximize the MINIMUM against a set of opponents

Strat   a   b   c   Min
A       3   1   2   1
B       9   2  10   2
X       3   7   5   3
Y       5   4   4   4
Z       7   3   1   1

Robustness: Generalized Best Response • Maximize the MINIMUM against a set of opponents • The set of possible actions could be INFINITE
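The "maximize the minimum" rule can be checked directly against the table on this slide. The sketch below copies the table's payoffs into a dictionary; the function and variable names are hypothetical, and it only considers pure strategies from a finite list.

```python
# Payoffs from the slide's table: payoffs[our_strategy][opponent].
payoffs = {
    "A": {"a": 3, "b": 1, "c": 2},
    "B": {"a": 9, "b": 2, "c": 10},
    "X": {"a": 3, "b": 7, "c": 5},
    "Y": {"a": 5, "b": 4, "c": 4},
    "Z": {"a": 7, "b": 3, "c": 1},
}

def generalized_best_response(payoffs):
    """Return the strategy with the best worst-case payoff, plus all worst cases."""
    worst_case = {s: min(row.values()) for s, row in payoffs.items()}
    best = max(worst_case, key=worst_case.get)
    return best, worst_case

best, worst_case = generalized_best_response(payoffs)
print(best, worst_case[best])  # Y 4
```

Note that B has the highest payoff against opponents a and c individually, yet Y is chosen because its worst case (4) is better than B's (2).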

Iterated Generalized Best Response • Start with strategies S and T. • Add to T a generalized best response to S. • Add to S a generalized best response to T. • Repeat until satisfied.

Hide and Seek • Figure: hide actions shown in blue, seek actions shown in red.

How to Compute a Generalized Best Response? • Use a linear program: could be slow; could be arbitrarily high precision. • Use iterated best response: (1) start with sets of strategies S and a possibly empty T; (2) compute a Nash equilibrium between S and T; (3) find a best response to the mixture over S; (4) add it to T.
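One reading of the iterated-best-response option is sketched below. It reuses the payoff table from the robustness slide as a stand-in game, and several pieces are assumptions rather than the speaker's method: fictitious play approximates the Nash equilibrium between S and T, a uniform mixture over S is used while T is still empty, and the loop stops when the best response is already in T.

```python
# Our payoff PAYOFF[ours][theirs], copied from the robustness table.
PAYOFF = {
    "A": {"a": 3, "b": 1, "c": 2},
    "B": {"a": 9, "b": 2, "c": 10},
    "X": {"a": 3, "b": 7, "c": 5},
    "Y": {"a": 5, "b": 4, "c": 4},
    "Z": {"a": 7, "b": 3, "c": 1},
}

def fictitious_play(M, iters=20000):
    """Approximate an equilibrium of zero-sum matrix game M (row player maximizes)."""
    m, n = len(M), len(M[0])
    row_counts, col_counts = [0] * m, [0] * n
    r, c = 0, 0
    for _ in range(iters):
        row_counts[r] += 1
        col_counts[c] += 1
        r = max(range(m), key=lambda i: sum(col_counts[j] * M[i][j] for j in range(n)))
        c = min(range(n), key=lambda j: sum(row_counts[i] * M[i][j] for i in range(m)))
    return [k / iters for k in row_counts], [k / iters for k in col_counts]

def generalized_best_response(payoff, S, pool, max_rounds=20):
    """Grow T by best responses; return T and our mixture over T."""
    T, mix_T = [], []
    mix_S = [1.0 / len(S)] * len(S)  # assumption: uniform over S while T is empty
    for _ in range(max_rounds):
        # Best response from the whole pool to the current mixture over S.
        br = max(pool, key=lambda t: sum(w * payoff[t][s] for w, s in zip(mix_S, S)))
        if br in T:
            break
        T.append(br)
        # Approximate Nash equilibrium between T (rows, us) and S (columns, opponents).
        M = [[payoff[t][s] for s in S] for t in T]
        mix_T, mix_S = fictitious_play(M)
    return T, mix_T

S = ["a", "b", "c"]
T, mix_T = generalized_best_response(PAYOFF, S, pool=list(PAYOFF))
# Worst-case payoff of the returned mixture against every opponent in S.
guaranteed = min(sum(w * PAYOFF[t][s] for w, t in zip(mix_T, T)) for s in S)
print(T, mix_T, guaranteed)
```

For this table the exact equilibrium mixes 4/11 on B and 7/11 on X and guarantees 57/11 (about 5.18), which is better than the 4 guaranteed by the best pure strategy Y; the fictitious-play result should land close to that.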

Results in Poker • Using this technique (iterated GBR), we solved a four-round game of Texas Hold’Em • We beat Opti4 (Sparbot)! • By 0.01 small bets/hand 

Other Applications • Economics (non-zero sum) • Counter-Strike/RTS Games (best response not easy)

Extensions • Non-zero sum games • Approximate best response operation (through reinforcement learning) • Learning the abstraction while learning the strategy

Conclusions • Algorithms that learn in self-play (such as iterated generalized best response) yield a wealth of useful strategies, including approximate Nash equilibria.

How Hard is a Game? • For a game to be hard, it has to be at least POSSIBLE to play it badly: otherwise, regardless of how complex it is, it is still easy. • The depth of human skill in a particular game indicates its complexity.

Formalism • If the complexity of a game is at least k, then there exist people 1 to k such that, for any two people i > j in the list, player i can beat player j with probability at least 2/3.

Important Property: Transitivity in The List

Why People? • Choose a number between 1 and 100. Highest number wins a dollar, no money is exchanged on a tie.

Formalism • If the complexity of a game is at least k, then there exist strategies 1 to k such that, for any two strategies i > j in the list, strategy i can beat strategy j with probability at least 2/3.

Formalism • The epsilon-complexity of a game is at least k if there exist strategies 1 to k such that, for any two strategies i > j, EV[i playing against j] > epsilon.
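The epsilon-complexity condition can be tested mechanically for a candidate list of strategies. The sketch below applies it to the pick-a-number game from the "Why People?" slide, using the pure strategies "always pick i". The payoff convention (+1 for a win, 0 for a tie, -1 for a loss) and all names are assumptions for illustration.

```python
from itertools import combinations

def expected_value(i, j):
    """EV of 'always pick i' against 'always pick j' (assumed +1 win, 0 tie, -1 loss)."""
    return (i > j) - (i < j)

def is_epsilon_chain(strategies, ev, epsilon):
    """True if every later strategy beats every earlier one by more than epsilon."""
    return all(ev(later, earlier) > epsilon
               for earlier, later in combinations(strategies, 2))

chain = list(range(1, 101))  # strategy i = "always pick the number i"
print(is_epsilon_chain(chain, expected_value, 0.5))        # True
print(is_epsilon_chain(chain[::-1], expected_value, 0.5))  # False
```

Under this strategy-based definition the trivial pick-a-number game already admits a chain of length 100, so its epsilon-complexity is at least 100 for any epsilon below 1.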

Make it a Linear Program? • The linear program (sequence form) has a number of constraints and a number of variables that are each roughly proportional to the size of the game tree. • The coefficient matrix is big: this makes inversion difficult. • Also: numerical instabilities

A Theoretical Guarantee? (No!) • Figure: hide actions shown in blue, seek actions shown in red.

The Theoretical Problem • Each new bot is a best response to a particular mixture of the previous bots. • There could be a different mixture over those bots which would do BETTER against that new bot: in fact, it could even beat the new bot!
