Poker for Fun and Profit (and intellectual challenge). Robert Holte Computing Science Dept. University of Alberta. Poker. World Series of Poker. Poker Research Group - core. Darse Billings (Ph.D.) Aaron Davidson M.Sc., Poki Neil Burch P/A, PsOpti Terence Schauenberg (M.Sc.), Adapti
Poker for Fun and Profit (and intellectual challenge) • Robert Holte • Computing Science Dept. • University of Alberta
Poker Research Group - core • Darse Billings (Ph.D.) • Aaron Davidson M.Sc., Poki • Neil Burch P/A, PsOpti • Terence Schauenberg (M.Sc.), Adapti • Advisors: J Schaeffer, D Szafron
Poker Research Group – new arrivals • Bret Hoehn (M.Sc.) • Finnegan Southey (postdoc) • Michael Bowling • Dale Schuurmans • Rich Sutton • Robert Holte
Play Us Online http://games.cs.ualberta.ca/poker/
Poki’s Poker Academy • http://poki-poker.com
Poker Variants • Many different variants of poker • Texas Hold’em the most skill-testing • No-Limit Texas Hold’em used to determine the world champion • Our research: Limit Texas Hold’em • Current focus: 2-player (heads up)
2-player, limit, Texas Hold’em game tree, O(10^18) states (figure) • Initial: 1,624,350 deals of 2 private cards to each player, then bet sequence (9 of 19) • Flop: 17,296 deals of 3 community cards, then bet sequence (9 of 19) • Turn: 45 deals of 1 community card, then bet sequence (9 of 19) • River: 44 deals of 1 community card, then bet sequence (19)
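The chance-node counts in the game-tree figure can be checked with binomial coefficients. The sketch below is only a sanity check of those numbers; the variable names are illustrative and not from the slides, and it does not attempt to reproduce the full O(10^18) state count, which also depends on the betting sequences.

```python
from math import comb

# Ways to deal 2 private cards to each of two players from a 52-card deck.
initial_deals = comb(52, 2) * comb(50, 2)

# Ways to deal a 3-card flop from the 48 cards that remain.
flops = comb(48, 3)

# One turn card from the remaining 45, then one river card from the remaining 44.
turns = 45
rivers = 44

# Number of distinct chance outcomes over the whole hand (betting ignored).
chance_outcomes = initial_deals * flops * turns * rivers

print(initial_deals, flops, turns, rivers)
```

The first two values match the 1,624,350 and 17,296 shown on the slide.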
Research Issues • Chance events • Imperfect Information • Sheer size of the game tree • Opponent modelling is crucial • How best to use domain knowledge ? • Experimental method Variants have even more challenges: • More than 2 players (up to 10) • “No limit” (bid any amount)
Issues: Chance Events • Utility of outcomes • currently just reason about expected payoff • short-term vs. long-term • High variance • was the outcome due to luck or skill ? • experiment design
Issues: Imperfect Information • Probabilistic strategies are essential • Cannot construct your strategy in a bottom-up manner, as is done with perfect information games
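A small illustration of why probabilistic strategies are essential, using rock-paper-scissors as a stand-in (an assumption for brevity; it is not a poker model and does not come from the slides). Any deterministic strategy loses to an opponent who knows it, while the uniform mixed strategy cannot be exploited.

```python
# Row player's payoff in rock-paper-scissors (win = +1, loss = -1, tie = 0).
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper    vs rock, paper, scissors
    [-1, 1, 0],   # scissors vs rock, paper, scissors
]


def worst_case_value(strategy):
    """Expected payoff of `strategy` when the opponent picks the best reply."""
    return min(
        sum(p * PAYOFF[i][j] for i, p in enumerate(strategy))
        for j in range(3)
    )


always_rock = [1.0, 0.0, 0.0]      # deterministic strategy
uniform = [1 / 3, 1 / 3, 1 / 3]    # mixed (probabilistic) strategy

print(worst_case_value(always_rock))  # the opponent simply plays paper
print(worst_case_value(uniform))      # no reply does better than break even
```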
Issues: Size of the game • 2-player, Limit, Texas Hold’em game tree has about 10^18 states • Linear Programming can solve games with 10^8 states
Issues: Opponent Modelling • Nash equilibrium not good enough • Static • Defensive • Even the best humans have weaknesses that should be exploited • How to learn very quickly, with very noisy information? • Exploitation vs. exploration • How not to be exploited yourself?
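A minimal sketch of the gap between equilibrium play and exploitation, again using rock-paper-scissors as an assumed toy game rather than poker. The counts in `observed_counts` are hypothetical data, and the frequency-count model is only one simple way to model an opponent.

```python
# Row player's payoff in rock-paper-scissors (win = +1, loss = -1, tie = 0).
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def best_response(opponent_freqs):
    """Return (action index, expected payoff) of the best pure reply."""
    values = [
        sum(q * PAYOFF[i][j] for j, q in enumerate(opponent_freqs))
        for i in range(3)
    ]
    best = max(range(3), key=lambda i: values[i])
    return best, values[best]


# Hypothetical observations: the opponent played rock 50 times,
# paper 25 times and scissors 25 times.
observed_counts = [50, 25, 25]
total = sum(observed_counts)
freqs = [c / total for c in observed_counts]

action, value = best_response(freqs)

# The equilibrium (uniform) strategy only breaks even against this opponent.
equilibrium_value = sum(
    (1 / 3) * q * PAYOFF[i][j]
    for i in range(3)
    for j, q in enumerate(freqs)
)

print(action, value, equilibrium_value)
```

The best response (paper) wins on average against this opponent, but it is itself exploitable, which is the tension the last bullet raises.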
Issues: Using Expert Knowledge • We are fortunate to have unlimited access to a poker-playing expert (Darse) • How best to use his knowledge ? • Expert system (explicitly encoded knowledge) was not effective • Used his knowledge to devise abstractions that reduced the game size with minimal impact on strategic aspects of the game • Use him to evaluate the system
Experimental Method • High variance • ‘bot play not the same as human play • Very limited access to expert humans other than our own expert
Coping with very large games (figure) • Full game tree T: too big to solve • Abstraction (lossy) gives abstract game tree T* • Solve T* (LP) to get a strategy for T* • Reverse mapping turns the strategy for T* into a strategy for T
Abstraction • Texas Hold'em 2-player game tree is too big for current LP solvers (1,179,000,604,565,715,751) • Many ways of doing the abstractions • We require coarse-grained abstractions • Avoiding a severe loss of accuracy • Abstract to a set of smaller problems: 10^8 states, 10^6 equations and unknowns
Alternate Game Structures • Truncation of betting rounds • Bypassing betting rounds • Models with 3 rounds, 2 rounds, or 1 round • Many-to-one mapping of game-tree nodes to single nodes in the abstract game tree • How you do the mapping determines the overall accuracy (few good and many bad mappings) • This is the limiting factor of the method
Figure: 3-round Model (expected value leaf nodes), shown with the Texas Hold'em game tree, O(10^18) • Initial: 1,624,350, bet sequence 9 of 19 • Flop: 17,296, bet sequence 9 of 19 • Turn: 45, bet sequence 9 of 19 • River: 44, bet sequence 19
Figure: 1-round Preflop Model and 3-round Postflop Model (single flop), shown with the Texas Hold'em game tree, O(10^18) • Initial: 1,624,350, bet sequence 9 of 19 • Flop: 17,296, bet sequence 9 of 19 • Turn: 45, bet sequence 9 of 19 • River: 44, bet sequence 19
Abstractions • Board Q–7–2 • Compare: (1) A–3, (2) A–4, (3) A–K • Suit isomorphism (24x) (exact) • Rank near-equivalence (small error) • Bucketing: hands are mapped to a small set of buckets depending on • Current hand strength • Potential for improvement in hand strength
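A minimal sketch of suit isomorphism, restricted to the two private cards before the flop (an assumption made to keep the example short; the slide applies the idea more generally). The 24x factor is presumably the 4! ways of relabelling the four suits. The helper name `canonical` and the card encoding are illustrative.

```python
from itertools import combinations
from math import factorial

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]


def canonical(hole_cards):
    """Map two private cards to a suit-independent class such as 'AKs'."""
    (r1, s1), (r2, s2) = hole_cards
    high, low = sorted((r1, r2), key=RANKS.index, reverse=True)
    if r1 == r2:
        return high + low                      # pocket pair, e.g. 'QQ'
    return high + low + ("s" if s1 == s2 else "o")  # suited or offsuit


hands = list(combinations(DECK, 2))
classes = {canonical(hand) for hand in hands}

suit_relabellings = factorial(4)  # number of permutations of the four suits

print(len(hands), len(classes), suit_relabellings)
```

Before the flop, only whether the two cards share a suit matters, so the 1,326 raw hands collapse to 169 classes with no loss of strategic information.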
Bucketing • Reduce branching factor at chance nodes • Partition hands into six classes per player • Overlaying strategically similar sub-trees • Figure: original bucketing (1,1), (1,2), (1,3), …, (6,6), linked by transition probabilities to next-round bucketing (1,1), (1,2), (1,3), …, (6,6)
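A minimal sketch of bucketing under stated assumptions: hand strength is taken to be a number in [0, 1], and hands are cut into six equal-width classes. The slides say the real buckets also consider potential for improvement, which this sketch ignores; the function name and the cut points are hypothetical.

```python
from itertools import product

N_BUCKETS = 6  # six classes per player, as on the slide


def bucket_of(hand_strength, n_buckets=N_BUCKETS):
    """Map a hand strength in [0, 1] to a bucket index 0..n_buckets-1."""
    if not 0.0 <= hand_strength <= 1.0:
        raise ValueError("hand_strength must lie in [0, 1]")
    # Equal-width buckets; a strength of exactly 1.0 goes in the top bucket.
    return min(int(hand_strength * n_buckets), n_buckets - 1)


# With six buckets per player, a chance node in the abstract game has
# 6 x 6 = 36 joint outcomes, labelled (1,1) through (6,6) on the slide.
bucket_pairs = list(product(range(1, N_BUCKETS + 1), repeat=2))

print(bucket_of(0.05), bucket_of(0.5), bucket_of(0.97), len(bucket_pairs))
```

Replacing thousands of card deals with 36 bucket pairs is what reduces the branching factor at chance nodes.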
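Figure: Abstract Preflop Model, O(10^7), and Abstract Postflop Model, O(10^7), shown beside the Texas Hold'em game tree, O(10^18) • Initial: 1,624,350 in the full game, w2 (36) in the abstraction • Flop: 17,296 in the full game, x2 (36) in the abstraction • Turn: 45 in the full game, y2 (36) in the abstraction • River: 44 in the full game, z2 (36) in the abstraction • Bet sequences: 9 of 19 in the full game, 7 of 15 in the abstraction; final round 19 and 15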
Reverse Mapping • Bucket splitting • LP solution gives a strategy (recipe) • Each partition class split strong / weak • Split the randomized mixed strategy • {0, 0.2, 0.8} => {0, 0, 1.0} & {0, 0.4, 0.6} • Better hand selection (with some risk)
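The slide gives the split {0, 0.2, 0.8} => {0, 0, 1.0} & {0, 0.4, 0.6} without stating the procedure. The sketch below is one procedure that reproduces that example, assuming the bucket is split into two equally likely halves and the actions are ordered from most passive to most aggressive (for instance fold, call, raise). The function name and both assumptions are illustrative, not taken from the source.

```python
def split_strategy(probs):
    """Split one bucket's mixed strategy into (strong_half, weak_half).

    `probs` lists action probabilities from most passive to most aggressive.
    The strong half takes as much of the aggressive probability as it can;
    the weak half gets what is left. Averaging the two halves gives `probs`.
    """
    # Each half is a full distribution, so there are 2.0 units to share out.
    remaining = [2.0 * p for p in probs]
    strong = [0.0] * len(probs)
    need = 1.0
    for i in reversed(range(len(probs))):  # most aggressive action first
        take = min(remaining[i], need)
        strong[i] = take
        remaining[i] -= take
        need -= take
    return strong, remaining


strong_half, weak_half = split_strategy([0.0, 0.2, 0.8])
print(strong_half, weak_half)
```

The overall action frequencies are unchanged, but the aggressive actions are concentrated on the stronger hands in the bucket, which is the "better hand selection" the slide mentions.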
Putting It All Together – PsOpti1 • Figure: Preflop: Selby preflop model • Flop, Turn, River: four postflop models (Post), one for each of Bets 2, 4, 6, 8
Putting It All Together – PsOpti2 • Figure: Preflop: 3-round preflop model • Flop, Turn, River: seven postflop models (Post), labelled Bets + model 2, 4, 4, 6, 6, 8, 8
Conclusions • Game Theory can be applied to large problems and practical systems • Nash Equilibrium (minimax) too defensive, does not exploit the opponent’s weaknesses • Current work involves opponent modelling • Preliminary results are very promising • We hope to beat the best poker players in the world in the near future