
Bayesian Reinforcement Learning with Gaussian Processes


Presentation Transcript


  1. Bayesian Reinforcement Learning with Gaussian Processes Huanren Zhang Electrical and Computer Engineering Purdue University

  2. Outline • Introduction to Reinforcement Learning (RL) • Markov Decision Processes (MDPs) • Traditional RL Solution Methods • Gaussian Processes (GPs) • Gaussian Process Temporal Difference (GPTD) • Experiment • Conclusion

  3. Reinforcement Learning (RL) • An agent interacts with the environment and learns how to map situations to actions in order to maximize the reward. • Involves sequences of decisions • Many Artificial Intelligence (AI) problems can be formulated as RL problems

  4. Reinforcement Learning (RL) • Evaluative feedback (reward or reinforcement) • Indicates how good the action taken is, but not whether it is correct or incorrect. • Balance between exploration and exploitation • Exploitation — make the most of the information the agent has already gathered • Exploration — visit unknown states that may yield a higher return in the long run • Online learning

  5. Markov Decision Processes (MDPs) • RL problems can be formulated as Markov Decision Processes (MDPs) • An MDP is a tuple (S, A, R, P) • State space: S • Action space: A • Reward function: R(s, a) • State-transition function: P(s' | s, a) • P(s' | s, a) represents the probability of making a transition from state s to state s' when taking action a • By a model, we mean the reward function and the state-transition function.
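As an illustration that is not part of the original slides, here is a minimal Python sketch of how such an MDP might be encoded for a small maze world; the grid size, goal cell, reward values, and deterministic transitions are all assumptions made for the example.

# Hypothetical 4-connected maze world: states are grid cells, actions move the agent.
ROWS, COLS = 5, 5
GOAL = (4, 4)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def transition(state, action):
    """Deterministic state-transition function: move if the target cell is inside the grid."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < ROWS and 0 <= nc < COLS:
        return (nr, nc)
    return state  # bumping into a wall leaves the state unchanged

def reward(state, action):
    """Reward function R(s, a): +1 for reaching the goal, 0 otherwise (assumed values)."""
    return 1.0 if transition(state, action) == GOAL else 0.0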

  6. Maze world problem

  7. Traditional RL Solution Methods • Dynamic Programming (DP) • Monte Carlo (MC) Methods • Temporal Difference (TD) Methods • All these methods are based on estimating the value function under a given policy • The value of a state is the total (discounted) reward an agent can expect to accumulate starting from that state

  8. Maze world problem • The values of states near the goal should be greater than the values of states far from the goal.

  9. Temporal Difference (TD) Methods • Learn directly from experience • Bootstrap: update estimates based on other learned estimates • Does not need a model • Updating rule: V(s_t) ← V(s_t) + α_t δ_t, with δ_t = r_t + γ V(s_{t+1}) − V(s_t) • δ_t : the temporal difference • α_t : time-dependent learning rate • γ : discount rate
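A minimal sketch of the tabular TD(0) update described on this slide, in Python; the discount value and the dictionary-based value table are assumptions made for the example.

gamma = 0.9      # discount rate (assumed value)
values = {}      # tabular value estimates V(s), defaulting to 0

def td_update(s, r, s_next, alpha):
    """One TD(0) step: V(s) <- V(s) + alpha * delta, with delta = r + gamma*V(s') - V(s)."""
    v_s = values.get(s, 0.0)
    v_next = values.get(s_next, 0.0)
    delta = r + gamma * v_next - v_s      # the temporal difference
    values[s] = v_s + alpha * delta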

  10. TD Method (With Optimistic Policy Iteration)

  11. Policy Learned by TD Method (After 100 Trials)

  12. Gaussian Processes (GPs) • A Bayesian approach: provides full posterior over values, not just point estimates • Forces us to make our assumptions explicit • Non-parametric – priors are placed and inference is performed directly in function space (kernels) • Domain knowledge intuitively coded in priors

  13. Gaussian Processes (GPs) • "An indexed set of jointly Gaussian random variables" • The index set X may be just about any set. • A Gaussian process F is fully specified by its mean function m(x) = E[F(x)] and its covariance function k(x, x') = Cov(F(x), F(x')). • The kernel function k(x, x') is symmetric and positive definite (a Mercer kernel).
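As an illustrative sketch that is not from the slides, functions can be drawn from a GP prior by sampling a multivariate Gaussian whose covariance is the kernel matrix evaluated on a finite set of index points; the Gaussian kernel, its width, and the grid of points are assumptions.

import numpy as np

def mercer_kernel(a, b, sigma=1.0):
    """Symmetric positive-definite Gaussian kernel k(x, x')."""
    return np.exp(-(a - b) ** 2 / (2.0 * sigma ** 2))

# Index set: an assumed grid of points (in principle the index set could be almost anything).
x = np.linspace(-3, 3, 50)
K = mercer_kernel(x[:, None], x[None, :]) + 1e-8 * np.eye(len(x))  # jitter for stability

# Each draw is one realisation of the jointly Gaussian variables F(x_1), ..., F(x_n).
samples = np.random.multivariate_normal(np.zeros(len(x)), K, size=3)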

  14. Conditioning Theorem • If x_1 and x_2 are jointly Gaussian, then the conditional distribution of x_1 given x_2 is Gaussian with mean μ_1 + Σ_12 Σ_22^{-1} (x_2 − μ_2) and covariance Σ_11 − Σ_12 Σ_22^{-1} Σ_21.
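The slide's original formula image is not reproduced in the transcript; as a sketch, the standard Gaussian conditioning result can be evaluated numerically as below, where the toy means, covariance blocks, and observed value are assumptions.

import numpy as np

# Joint Gaussian over (x1, x2): mean and covariance partitioned into blocks (assumed values).
mu1, mu2 = np.array([0.0]), np.array([1.0])
S11, S12 = np.array([[2.0]]), np.array([[0.8]])
S21, S22 = np.array([[0.8]]), np.array([[1.0]])

x2_obs = np.array([1.5])   # observed value of x2 (assumed)

# Conditioning theorem: x1 | x2 is Gaussian with the moments below.
cond_mean = mu1 + S12 @ np.linalg.solve(S22, x2_obs - mu2)
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)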

  15. GP regression

  16. GP regression

  17. GP regression
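The three GP regression slides above were figures in the original deck. As a hedged sketch of the underlying computation, the posterior mean and variance at test points can be obtained with the conditioning theorem as follows; the training inputs, noise level, and kernel width are assumptions.

import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    return np.exp(-(a - b) ** 2 / (2.0 * sigma ** 2))

# Assumed toy data: noisy observations y at inputs X.
X = np.array([-2.0, -1.0, 0.0, 1.5])
y = np.sin(X) + 0.1 * np.random.randn(len(X))
noise = 0.1 ** 2

K = gaussian_kernel(X[:, None], X[None, :]) + noise * np.eye(len(X))
x_star = np.linspace(-3, 3, 100)
k_star = gaussian_kernel(x_star[:, None], X[None, :])     # cross-covariances

# Posterior mean and variance at the test points via the conditioning theorem.
post_mean = k_star @ np.linalg.solve(K, y)
post_var = gaussian_kernel(x_star, x_star) - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)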

  18. GPTD Methods • Generative model for the rewards observed along the trajectory s_1, s_2, …, s_t: r(s_i) = V(s_i) − γ V(s_{i+1}) + n_i • In compact form: R_{t−1} = H_t V_t + N_t, where H_t has 1 on its diagonal and −γ on its superdiagonal

  19. GPTD Methods • Applying the conditioning theorem to this model gives the posterior mean and variance of the value at any state s: V̂_t(s) = k_t(s)^T H_t^T (H_t K_t H_t^T + Σ_t)^{-1} R_{t−1}, p_t(s) = k(s, s) − k_t(s)^T H_t^T (H_t K_t H_t^T + Σ_t)^{-1} H_t k_t(s) • where K_t is the kernel matrix of the visited states, k_t(s) is the vector of kernel values between s and the visited states, and Σ_t is the noise covariance
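A numerical sketch of the GPTD posterior under the generative model above, in the spirit of Engel et al.; the short trajectory, rewards, kernel width, and noise level are assumptions chosen only for illustration.

import numpy as np

def k(s1, s2, sigma=1.0):
    """Gaussian kernel on states (here 2-D maze coordinates)."""
    return np.exp(-np.sum((np.asarray(s1) - np.asarray(s2)) ** 2) / (2.0 * sigma ** 2))

# Assumed short trajectory s_1..s_t and the rewards observed along it.
states = [(0, 0), (0, 1), (1, 1), (2, 1)]
rewards = np.array([0.0, 0.0, 1.0])
gamma, noise = 0.9, 0.1 ** 2

t = len(states)
K = np.array([[k(si, sj) for sj in states] for si in states])

# H is the (t-1) x t matrix with 1 on the diagonal and -gamma on the superdiagonal,
# so that R = H V + N is the compact form of the generative model.
H = np.zeros((t - 1, t))
for i in range(t - 1):
    H[i, i], H[i, i + 1] = 1.0, -gamma

G = H @ K @ H.T + noise * np.eye(t - 1)

def value_posterior(s):
    """Posterior mean and variance of V(s) given the observed rewards."""
    ks = np.array([k(s, si) for si in states])
    mean = ks @ H.T @ np.linalg.solve(G, rewards)
    var = k(s, s) - ks @ H.T @ np.linalg.solve(G, H @ ks)
    return mean, var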

  20. Can we use the uncertainty information to help balance exploitation and exploration? • New value function (improved GPTD): V_new(s) = V̂(s) + c σ(s), where V̂(s) is the posterior mean and σ(s) the posterior standard deviation • The parameter c balances the importance of exploitation and exploration. • Information theory – higher uncertainty means more information. • Visiting states with higher uncertainty gives a higher information gain – another kind of value for a state
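A sketch of how such an uncertainty-augmented value could be computed, assuming the form given above; value_posterior is the hypothetical helper from the GPTD sketch, and the weight c is an assumed constant.

import numpy as np

c = 0.5   # assumed weight on the exploration bonus

def augmented_value(s):
    """Posterior mean plus c times the posterior standard deviation."""
    mean, var = value_posterior(s)          # from the GPTD sketch above
    return mean + c * np.sqrt(max(var, 0.0))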

  21. Experiment • A Gaussian kernel is used: k(s, s') = exp(−||s − s'||^2 / (2σ^2)) • ||s − s'|| is the Euclidean distance between the two states s and s': adjacent states will have similar values in the maze problem. • Optimistic Policy Iteration (OPI) is used to determine the policy • Take the action that leads to the highest expected return based on the current value estimate.
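A hedged sketch of the OPI action choice described on this slide, reusing the hypothetical maze helpers (transition, reward, ACTIONS) and the augmented value from the earlier sketches; deterministic transitions and the discount value are assumptions.

def greedy_action(s, gamma=0.9):
    """One OPI step: pick the action with the highest one-step backup
    reward(s, a) + gamma * value(next state) under the current estimate."""
    def backup(a):
        s_next = transition(s, a)
        return reward(s, a) + gamma * augmented_value(s_next)
    return max(ACTIONS, key=backup)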

  22. GPTD Method

  23. Policy Learned by GPTD

  24. GPTD for Multi-goal Maze

  25. Policy Learned by GPTD

  26. Improved GPTD

  27. Policy Learned by Improved GPTD

  28. Conclusion • A Gaussian process provides a measure of how certain the estimate is, along with the estimate itself. • GPTD gives much better results than the traditional RL methods in these experiments • The main contribution of this project is a proposal for using the uncertainty to balance exploration and exploitation in RL; the experiments show its effectiveness

  29. References • Bishop, C. M. 2006. Pattern Recognition and Machine Learning. Secaucus, NJ, USA: Springer-Verlag New York, Inc. • Engel, Y.; Mannor, S.; and Meir, R. 2003. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning. International Conference on Machine Learning. • Engel, Y.; Mannor, S.; and Meir, R. 2005. Reinforcement Learning with Gaussian Processes. International Conference on Machine Learning. • Engel, Y. 2005. Algorithms and Representations for Reinforcement Learning. Ph.D. Dissertation, The Hebrew University of Jerusalem, Israel. • Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, New York, NY. • Russell, S. J., and Norvig, P. 2002. Artificial Intelligence: A Modern Approach (2nd Edition). Prentice Hall. • Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction. The MIT Press, Cambridge, MA.

  30. Questions? Bayesian Reinforcement Learning with Gaussian Processes
