The Pentium Goes to Vegas


Presentation Transcript


  1. The Pentium Goes to Vegas: Training a Neural Network to Play BlackJack • Paul Ruvolo and Christine Spritke

  2. Goals • Investigate result-based learning • Develop a strategy for a highly random game • Train the network to play effectively without explicitly teaching it the rules of the game

  3. Strategy • Simplify the game to allow only HIT or STAY • Feedforward 3-layer backpropagation network (input layer, 1 hidden layer, output layer) • Give the input units information about the hand and the dealer’s up card • 2 output units, one for HIT and one for STAY • Measure performance with Efficiency, the return on a dollar: Efficiency = (win % * 2) + (tie %)
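The Efficiency measure can be sketched in a few lines of Python. This is an illustration only: the function name `efficiency` and its count-based arguments are hypothetical, and it assumes the "return on a dollar" reading in which a win returns 2, a tie returns 1, and a loss returns 0, so that 1.0 (100%) is break-even.

```python
def efficiency(wins: int, ties: int, losses: int) -> float:
    """Return on a dollar: (win fraction * 2) + (tie fraction).

    Assumes a win pays back 2 (stake plus winnings), a tie pays back 1,
    and a loss pays back 0. A value of 1.0 means break-even.
    """
    total = wins + ties + losses
    if total == 0:
        raise ValueError("no hands played")
    win_rate = wins / total
    tie_rate = ties / total
    return 2.0 * win_rate + tie_rate


if __name__ == "__main__":
    # Made-up counts, purely for illustration (not results from the slides).
    print(efficiency(wins=40, ties=10, losses=50))
```

Under this reading, any Efficiency below 100% means the player loses money on average.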

  4. Background

  5. Background • To form a basis of comparison, we measured the efficiency of a player using: • Random guessing: Efficiency = 60.3% • The dealer’s algorithm (hit when below 17, otherwise stay): Efficiency = 92.2%
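The two baseline players can be written as simple decision functions. The sketch below is a hypothetical rendering (the names `dealer_policy` and `random_policy` are not from the slides) and covers only the decision rule, not the game simulation that produced the efficiency figures.

```python
import random


def dealer_policy(hand_total: int) -> str:
    """Dealer's algorithm from the slide: hit below 17, otherwise stay."""
    return "HIT" if hand_total < 17 else "STAY"


def random_policy(hand_total: int, rng: random.Random) -> str:
    """Random guessing: ignore the hand and pick an action uniformly."""
    return rng.choice(["HIT", "STAY"])


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    for total in (12, 16, 17, 20):
        print(total, dealer_policy(total), random_policy(total, rng))
```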

  6. PHASE I Input Specific Cards Showing

  7. PHASE I – Network Setup • 104 input units: 52 for the possible cards in the player’s hand, 52 for the possible dealer’s up card • 20 hidden units • 2 output units: HIT and STAY • Learning rate = 0.3; momentum = 0.3
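One plausible way to build the 104-unit input is sketched below. The slides do not say how cards map to units, so the 0..51 card indexing and the function name `encode_phase1` are assumptions made for illustration.

```python
from typing import Iterable, List

CARDS_IN_DECK = 52


def encode_phase1(player_cards: Iterable[int], dealer_up_card: int) -> List[float]:
    """Build a 104-element input vector.

    Assumption: each card is identified by an index 0..51 (the actual
    mapping used by the authors is not given in the slides).
    Units 0..51 flag the cards in the player's hand;
    units 52..103 flag the dealer's up card.
    """
    vec = [0.0] * (2 * CARDS_IN_DECK)
    for card in player_cards:
        if not 0 <= card < CARDS_IN_DECK:
            raise ValueError("card index out of range: %r" % card)
        vec[card] = 1.0
    if not 0 <= dealer_up_card < CARDS_IN_DECK:
        raise ValueError("dealer card index out of range: %r" % dealer_up_card)
    vec[CARDS_IN_DECK + dealer_up_card] = 1.0
    return vec


if __name__ == "__main__":
    x = encode_phase1(player_cards=[4, 19], dealer_up_card=30)
    print(len(x), sum(x))
```

With this encoding the network sees card identities only, so it has to discover for itself that suits are irrelevant and that ranks add up.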

  8. PHASE I – Network Setup • Target High = 0.9 • Target Low = 0.1 • Target Mid = 0.5 • If hitting and staying yield the same result: HIT = STAY = Target Mid • If hitting produces a win while staying produces a loss: HIT = Target High, STAY = Target Low • If staying produces a win while hitting produces a loss: STAY = Target High, HIT = Target Low
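The target rule can be sketched as a small function. The slide only spells out the win-versus-loss and same-result cases; treating a tie as an outcome between a loss and a win, and rewarding whichever action did better, is an assumption of this sketch, as are the names `training_targets`, `LOSS`, `TIE`, and `WIN`.

```python
from typing import Tuple

TARGET_HIGH = 0.9
TARGET_LOW = 0.1
TARGET_MID = 0.5

# Hypothetical outcome codes, ordered so that a larger value is a better result.
LOSS, TIE, WIN = -1, 0, 1


def training_targets(hit_result: int, stay_result: int) -> Tuple[float, float]:
    """Return (HIT target, STAY target) for one training pattern.

    Same result for both actions -> both outputs get Target Mid.
    Otherwise the better action gets Target High and the other Target Low.
    Extending "win vs. loss" to comparisons involving ties is an assumption.
    """
    if hit_result == stay_result:
        return TARGET_MID, TARGET_MID
    if hit_result > stay_result:
        return TARGET_HIGH, TARGET_LOW
    return TARGET_LOW, TARGET_HIGH


if __name__ == "__main__":
    print(training_targets(WIN, LOSS))
    print(training_targets(LOSS, WIN))
    print(training_targets(LOSS, LOSS))
```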

  9. PHASE I – Results • Efficiency peaks at about 88% but never settles

  10. PHASE I – Modifications • Tried multiple variations on the initial network: • Hidden units ranging from 1 to 20 • Learning rate and momentum adjustments • Aging algorithm for the learning rate • 20 input units (10 possible values for the player’s cards, 10 possible values for the dealer’s up card) • No significant changes in performance

  11. PHASE I - Analysis • Analyzed why the network can’t improve, or even learn the dealer’s algorithm • Network hits on a hand summing to 21

  12. PHASE II Input “best” sum of current hand

  13. PHASE II – Strategy • 4 types of inputs • No dealer card, no ace differentiation • No dealer card, with ace differentiation • Include dealer card, no ace differentiation • Include dealer card, with ace differentiation • All use 2 output units and 4 hidden units
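For readers unfamiliar with the network type, here is a minimal pure-Python sketch of a one-hidden-layer feedforward network trained by backpropagation with momentum. It is not the authors' code: the class name `TinyMLP`, the sigmoid units, the squared-error loss, and the weight initialisation are all assumptions, and the default learning rate and momentum of 0.3 are the Phase I values (the slides do not state the Phase II values).

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One hidden layer, sigmoid units, squared-error backprop with momentum."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.3, momentum=0.3, seed=0):
        rng = random.Random(seed)
        # Each weight row has one extra entry that acts as the bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]
        # Previous weight changes, used for the momentum term.
        self.v1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.v2 = [[0.0] * (n_hidden + 1) for _ in range(n_out)]
        self.lr = lr
        self.momentum = momentum

    def forward(self, x):
        xb = list(x) + [1.0]  # append constant bias input
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_step(self, x, target):
        xb, hb, y = self.forward(x)
        # Output deltas for squared error through a sigmoid.
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, y)]
        # Hidden deltas, computed before w2 changes (bias unit excluded).
        d_hid = [
            hb[j] * (1.0 - hb[j])
            * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
            for j in range(len(hb) - 1)
        ]
        self._update(self.w2, self.v2, d_out, hb)
        self._update(self.w1, self.v1, d_hid, xb)
        return y

    def _update(self, weights, velocity, deltas, inputs):
        for k, delta in enumerate(deltas):
            for j, value in enumerate(inputs):
                velocity[k][j] = (self.lr * delta * value
                                  + self.momentum * velocity[k][j])
                weights[k][j] += velocity[k][j]


if __name__ == "__main__":
    net = TinyMLP(n_in=18, n_hidden=4, n_out=2)
    x = [0.0] * 18
    x[8] = 1.0  # an arbitrary one-hot input pattern
    for _ in range(200):
        net.train_step(x, [0.9, 0.1])
    print(net.forward(x)[2])
```

The 18-4-2 shape in the demo matches the simplest Phase II configuration; the other configurations would differ only in `n_in`.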

  14. PHASE II – No dealer, no aces • 18 input units, representing all possible hand values when making a decision (ranging from 4 to 21) • Results: develops the dealer’s algorithm • Hits on sum < 17 • Stays on sum > 16
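The 18-unit input can be read as a one-hot encoding of the hand total. The sketch below assumes exactly that (one unit per total from 4 to 21); the function name `encode_hand_total` is hypothetical.

```python
from typing import List

MIN_TOTAL = 4   # lowest two-card total without aces (2 + 2)
MAX_TOTAL = 21  # highest total at which a decision is still possible


def encode_hand_total(total: int) -> List[float]:
    """One-hot encode a hand total in 4..21 into 18 input units."""
    if not MIN_TOTAL <= total <= MAX_TOTAL:
        raise ValueError("hand total must be in 4..21, got %r" % total)
    vec = [0.0] * (MAX_TOTAL - MIN_TOTAL + 1)
    vec[total - MIN_TOTAL] = 1.0
    return vec


if __name__ == "__main__":
    x = encode_hand_total(16)
    print(len(x), x.index(1.0))
```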

  15. PHASE II – No dealer, aces

  16. PHASE II – Dealer, no aces • 28 input units: 18 possible player hand values, 10 possible values for the dealer’s up card • Results: • High efficiency • Good at accounting for the dealer’s card in boundary cases

  17. PHASE II – Dealer, no aces

  18. PHASE II – Dealer, no aces

  19. PHASE II – Dealer, no aces • Network is more likely to stay when the dealer has a bust card

  20. PHASE II – Dealer, aces • 38 input units: 28 for the player’s hand (18 possible hard hand values, 10 possible soft hand values) and 10 for the dealer’s up card • Results: • Good at adjusting strategy for hard vs. soft hands
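A sketch of how the 38 inputs might be laid out follows. The unit ordering is an assumption, as are the ranges it uses: hard totals 4..21 (18 units), soft totals 12..21 (10 units), and dealer up-card values 1..10 with the ace counted as 1. The names `hand_value` and `encode_phase2` are hypothetical.

```python
from typing import List, Tuple


def hand_value(ranks: List[int]) -> Tuple[int, bool]:
    """Return (best total, is_soft) for a hand.

    ranks use 1 for an ace and 2..10 for other cards (face cards count 10).
    A hand is soft when one ace can count as 11 without busting.
    """
    total = sum(ranks)
    if 1 in ranks and total + 10 <= 21:
        return total + 10, True
    return total, False


def encode_phase2(ranks: List[int], dealer_up: int) -> List[float]:
    """Build the 38-element input: 18 hard + 10 soft + 10 dealer units."""
    total, soft = hand_value(ranks)
    if total > 21:
        raise ValueError("hand is already bust")
    if not 1 <= dealer_up <= 10:
        raise ValueError("dealer up card must be in 1..10")
    vec = [0.0] * 38
    if soft:
        vec[18 + (total - 12)] = 1.0  # soft 12..21 -> units 18..27
    else:
        vec[total - 4] = 1.0          # hard 4..21  -> units 0..17
    vec[28 + (dealer_up - 1)] = 1.0   # dealer 1..10 -> units 28..37
    return vec


if __name__ == "__main__":
    print(hand_value([1, 6]))   # ace + 6
    print(hand_value([10, 7]))  # ten + 7
```

Because a soft 17 and a hard 17 activate different input units, the network can learn different actions for the two hands.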

  21. PHASE II – Dealer, aces • Network always hits a soft 17 and stays on a hard 17

  22. Conclusion • Neural networks are not magical! • They require the teacher to eliminate duplicate patterns • e.g., 5 of diamonds + 7 of clubs is equivalent to 8 of hearts + 4 of spades • Result-based training is inherently more difficult • 2 hidden layers might help • We’re not optimistic!
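The duplicate-pattern point can be made concrete: once suits are dropped and ranks are summed, the two example hands collapse to the same input. The sketch below is illustrative only; the `(rank, suit)` tuple representation and the name `canonical_total` are assumptions, and aces are left out for simplicity.

```python
from typing import List, Tuple

Card = Tuple[int, str]  # (rank 2..10, suit name); aces omitted in this sketch


def canonical_total(hand: List[Card]) -> int:
    """Collapse a hand to its total, discarding suits and card order."""
    return sum(rank for rank, _suit in hand)


if __name__ == "__main__":
    hand_a = [(5, "diamonds"), (7, "clubs")]
    hand_b = [(8, "hearts"), (4, "spades")]
    # Different cards, same decision-relevant pattern.
    print(canonical_total(hand_a) == canonical_total(hand_b))
```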
