Learning to Play Blackjack

Presentation Transcript

1. Learning to Play Blackjack. Thomas Boyett. Presentation for CAP 4630. Teacher: Dr. Eggen.

2. Objective

3. Specifying the Task Environment: The task environment can be considered the problem to which a rational agent is the solution. Designing a good solution always includes gaining in-depth knowledge of the problem. PEAS: Performance (an objective measure of work quality); Environment (the things the agent will interact with); Actuators; Sensors.

4. Other Properties of the Task Environment: Fully observable vs. partially observable; deterministic vs. stochastic; episodic vs. sequential; static vs. dynamic; discrete vs. continuous; single agent vs. multi-agent.

5. The Rules: Unlike most card games, you play only against the dealer. Whoever has the highest-valued hand without exceeding 21 is the winner. Going above 21 is called busting and is an immediate loss. Aces are worth 11 or 1, your choice. Kings, Queens, Jacks and Tens are worth 10. All other cards are worth their face value. The suit of the card is ignored. The dealer gives you two cards face up and deals himself two cards, only one of them face up. This hidden card is one of the features of Blackjack that make it a partially observable task environment.
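The card values above can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the rank strings ("A", "K", "10", and so on) and the function names `card_value`, `hand_value` and `is_bust` are assumptions made for the example. Aces are counted as 11 first and demoted to 1 only when the hand would otherwise bust, which is one way to realize "11 or 1, your choice".

```python
def card_value(rank: str) -> int:
    """Return the base value of a card rank; an ace starts at 11."""
    if rank == "A":
        return 11
    if rank in ("K", "Q", "J", "10"):
        return 10
    return int(rank)  # "2" through "9" are worth their face value


def hand_value(ranks: list[str]) -> int:
    """Best total for a hand: demote aces from 11 to 1 while the hand would bust."""
    total = sum(card_value(r) for r in ranks)
    soft_aces = ranks.count("A")  # aces still counted as 11
    while total > 21 and soft_aces > 0:
        total -= 10  # count one ace as 1 instead of 11
        soft_aces -= 1
    return total


def is_bust(ranks: list[str]) -> bool:
    """Going above 21 is a bust."""
    return hand_value(ranks) > 21
```

For instance, `hand_value(["A", "A", "9"])` counts one ace as 11 and the other as 1. Suits never appear because the rules ignore them.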

6. The Rules: If you want another card you can hit. If you are satisfied with your hand you can stand. Whoever has the best score wins the game. If the scores are equal then the game is a draw. If either player receives an ace and any card worth 10 on the initial deal, they have a Blackjack. A Blackjack is an immediate victory unless both players have a Blackjack, in which case the game is a draw.
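The outcome rules above can be expressed as a small decision function. The sketch below is illustrative only: the names `is_blackjack` and `settle`, the string results "win", "loss" and "draw", and the choice to check the player's bust before the dealer's are assumptions, not details taken from the slides.

```python
TEN_VALUED = {"10", "J", "Q", "K"}


def is_blackjack(initial_cards: list[str]) -> bool:
    """An ace plus any ten-valued card on the initial two-card deal."""
    return (
        len(initial_cards) == 2
        and "A" in initial_cards
        and any(card in TEN_VALUED for card in initial_cards)
    )


def settle(player_total: int, dealer_total: int,
           player_blackjack: bool = False,
           dealer_blackjack: bool = False) -> str:
    """Return the result from the player's point of view."""
    if player_blackjack and dealer_blackjack:
        return "draw"
    if player_blackjack:
        return "win"
    if dealer_blackjack:
        return "loss"
    if player_total > 21:
        return "loss"  # busting is an immediate loss (assumed to be checked first)
    if dealer_total > 21:
        return "win"
    if player_total > dealer_total:
        return "win"
    if player_total < dealer_total:
        return "loss"
    return "draw"
```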

7. Types of Rational Agents: Simple reflex agents; model-based reflex agents; goal-based agents; utility-based agents. All types can be extended to be learning agents. The Blackjack agent will be designed as a learning model-based reflex agent.

8. The Components of a Learning Agent: Performance element; critic; learning element; problem generator.

9. The Performance Element: The part of the agent that chooses what to do. The Blackjack agent in this design will be limited to hitting and standing. Optimal winning and betting are separate and complex problems.

10. The Performance Element: Chooses to hit or stand based on the dealer's value and its own value. Actions are stored in a reference table. Columns represent the dealer's value and rows represent the agent's value.
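A reference table of this kind might look like the sketch below. The value ranges are assumptions: the slides do not say which dealer and agent values the table covers, so the example assumes the dealer's value is that of the single visible card (2 to 11) and the agent's value runs from 4 to 21. The names `make_table` and `choose_action` are likewise hypothetical.

```python
HIT, STAND = "hit", "stand"

DEALER_VALUES = range(2, 12)  # assumed: value of the dealer's visible card
PLAYER_VALUES = range(4, 22)  # assumed: lowest two-card total up to 21


def make_table(default: str = HIT) -> dict[int, dict[int, str]]:
    """Rows are the agent's value, columns are the dealer's value."""
    return {p: {d: default for d in DEALER_VALUES} for p in PLAYER_VALUES}


def choose_action(table: dict[int, dict[int, str]],
                  dealer_value: int, player_value: int) -> str:
    """Performance element: look up the stored action for this situation."""
    return table[player_value][dealer_value]
```

With this layout the performance element is a pure lookup. All of the learning happens elsewhere, by rewriting table cells.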

11. The Performance Element

12. The Critic: The critic tells the learning element whether the results of an action were good or bad. The critic must be objective and independent of the learning element.

13. The Critic: If the agent chooses to hit, the outcome is good if the agent did not bust and bad if it did. If the agent chooses to stand, the outcome is good if the agent won the game and bad if the agent lost. The outcome is ignored if the game ends in a draw: neither dealer nor player benefits from a draw or is penalized by it.
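The critic's rules translate directly into a small function. This is a sketch under assumptions: the name `critique`, its parameters, and the encoding of the verdict (`True` for good, `False` for bad, `None` for ignored) are choices made for illustration.

```python
from typing import Optional


def critique(action: str, busted: bool = False,
             result: Optional[str] = None) -> Optional[bool]:
    """Judge one action: True = good, False = bad, None = ignored."""
    if action == "hit":
        # A hit is judged only by whether the agent busted.
        return not busted
    if action == "stand":
        if result not in ("win", "loss", "draw"):
            raise ValueError("standing requires the final game result")
        if result == "draw":
            return None  # draws are ignored
        return result == "win"
    raise ValueError(f"unknown action: {action!r}")
```

The function knows nothing about the lookup table or its counts, which keeps the critic independent of the learning element.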

14. The Learning Element: Makes improvements to the performance element. Works in direct response to feedback provided by the critic.

15. The Learning Element: Each lookup table entry actually consists of four values that represent the agent's previous experience with a specific dealer/player value combination. The learning element maintains these values.

16. The Learning Element: The good/bad ratios of the hitting and standing results are computed, and whichever ratio is larger decides the perceived optimal action. This approach allows the agent to improve based on previous results. Thousands of games must be played to generate a reliable lookup table.
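One table entry and its ratio comparison could be sketched as follows. The field names, the `record` and `best_action` methods, and two edge-case policies are assumptions the slides do not cover: a zero count of bad outcomes is treated as one to avoid division by zero, and a tie between the two ratios resolves to standing.

```python
from dataclasses import dataclass


def ratio(good: int, bad: int) -> float:
    """Good/bad ratio; zero bad outcomes are treated as one (assumption)."""
    return good / max(bad, 1)


@dataclass
class Entry:
    """Four counters for one dealer/player value combination."""
    good_hits: int = 0
    bad_hits: int = 0
    good_stands: int = 0
    bad_stands: int = 0

    def record(self, action: str, good: bool) -> None:
        """Learning element: update the counters from the critic's verdict."""
        if action == "hit":
            if good:
                self.good_hits += 1
            else:
                self.bad_hits += 1
        elif action == "stand":
            if good:
                self.good_stands += 1
            else:
                self.bad_stands += 1
        else:
            raise ValueError(f"unknown action: {action!r}")

    def best_action(self) -> str:
        """Pick the action with the larger ratio; ties go to standing (assumption)."""
        hit_ratio = ratio(self.good_hits, self.bad_hits)
        stand_ratio = ratio(self.good_stands, self.bad_stands)
        return "hit" if hit_ratio > stand_ratio else "stand"
```

With purely hypothetical counts, `Entry(good_hits=30, bad_hits=10, good_stands=5, bad_stands=20)` gives a hit ratio of 3.0 against a stand ratio of 0.25, so the entry prefers hitting.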

17. The Learning Element: An example of table computation. A hypothetical table entry for a dealer with value 8 and a player with value 12. Since GH/BH (good hits over bad hits) is greater than GS/BS (good stands over bad stands), this data evaluates to MUST HIT on (8,12).

18. The Problem Generator: The problem generator's job is to occasionally tell the learning agent to try a non-optimal action for a given situation. At the cost of sometimes behaving less optimally, the agent is given the opportunity to find less obvious ways to perform better.

19. The Problem Generator: Forces the agent to play a set of games either only hitting or only standing. This is a naïve policy if you are playing to win, but it allows the agent to learn about the quality of both choices in all circumstances. Problem generation may seem counterproductive, but it allows the agent to learn information that would otherwise have been left undiscovered.
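One way to force sets of hit-only and stand-only games is a fixed schedule, sketched below. The schedule itself (a block of hit-only games, then an equally long block of stand-only games, then the learned policy) and the names `forced_action`, `select_action` and `games_per_phase` are assumptions. The slides do not say how the forced games were arranged.

```python
from typing import Optional


def forced_action(game_index: int, games_per_phase: int) -> Optional[str]:
    """Return the action forced for this game, or None once exploration is over."""
    if game_index < games_per_phase:
        return "hit"  # first phase: only hitting
    if game_index < 2 * games_per_phase:
        return "stand"  # second phase: only standing
    return None


def select_action(game_index: int, games_per_phase: int,
                  learned_action: str) -> str:
    """Let the problem generator override the performance element when it applies."""
    forced = forced_action(game_index, games_per_phase)
    return forced if forced is not None else learned_action
```

During the forced phases the critic and learning element would still run as usual, so both actions accumulate good and bad counts even in situations where the learned policy would never have tried them.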

20. Results: Before being allowed to learn: Average Win%: 17%, Average Loss%: 80%. After being allowed to learn without guidance from a problem generator (50,000 games): Average Win%: 32%, Average Loss%: 60%. After being allowed to learn with a problem generator (50,000 games): Average Win%: 45%, Average Loss%: 49%.

21. Results in Perspective: A player that always hits: Average Win%: 15%, Average Loss%: 80%. A player that flips a coin to decide hit/stand: Average Win%: 24%, Average Loss%: 70%.

22. Results in Perspective: A player that always stands: Average Win%: 40%, Average Loss%: 55%. A professional Blackjack player who uses Basic (Optimal) Strategy: Average Win%: 45%, Average Loss%: 49%.

23. References: Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach, Second Edition. Prentice Hall, 2003.