
Hybrid Agent-Based Modeling: Architectures, Analyses and Applications (Stage One)


Presentation Transcript


  1. Hybrid Agent-Based Modeling: Architectures, Analyses and Applications (Stage One) Li, Hailin

  2. Outline Introduction Least-Squares Method for Reinforcement Learning Evolutionary Algorithms For RL Problem (in progress) Technical Analysis based upon hybrid agent-based architecture (in progress) Conclusion (Stage One)

  3. Introduction • Learning From Interaction • Interact with the environment • Consequences of actions to achieve goals • No explicit teacher, only experience • Examples • A chess player in a game • Someone preparing food • The actions of a gazelle calf after it is born

  4. Introduction • Characteristics • Decision making in an uncertain environment • Actions • Affect the future situation • Effects cannot be fully predicted • Goals are explicit • Use experience to improve performance

  5. Introduction • What is to be learned • A mapping from situations to actions • Maximizes a scalar reward or reinforcement signal • Learning • Does not need to be told which actions to take • Must discover which actions yield the most reward by trying them

  6. Introduction • Challenge • Action may affect not only immediate reward but also the next situation, and consequently all subsequent rewards • Trial and error search • Delayed reward

  7. Introduction • Exploration and exploitation • Exploit what it already knows in order to obtain reward • Explore in order to make better action selections in the future • Neither can be pursued exclusively without failing at the task • Trade-off
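To make the exploration-exploitation trade-off concrete, here is a minimal sketch of ε-greedy action selection, one common way to balance the two (the function name and the fixed ε value are illustrative, not taken from the slides):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Choose an action from a dict {action: estimated value}.

    With probability epsilon the agent explores (random action);
    otherwise it exploits its current best estimate.
    """
    if random.random() < epsilon:
        return random.choice(list(q_values.keys()))   # explore
    return max(q_values, key=q_values.get)            # exploit

# Usage: mostly picks 'b', but still samples 'a' occasionally.
print(epsilon_greedy({'a': 0.2, 'b': 0.8}, epsilon=0.1))
```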

  8. Introduction • Components of an agent • Policy • Decision-making function • Reward (total reward, average reward, discounted reward) • Good and bad events for the agent • Value • Rewards in the long run • Model of environment • Behavior of the environment
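The four components can be summarised as a minimal interface sketch; the names and type signatures below are hypothetical, chosen only to mirror the bullet points above:

```python
from typing import Hashable, Protocol

State = Hashable
Action = Hashable

class Agent(Protocol):
    def policy(self, state: State) -> Action:
        """Decision-making function: maps the current situation to an action."""
    def reward(self, state: State, action: Action, next_state: State) -> float:
        """Immediate signal marking good and bad events for the agent."""
    def value(self, state: State) -> float:
        """Expected reward in the long run (e.g. discounted) from this state."""
    def model(self, state: State, action: Action) -> State:
        """Optional model of how the environment behaves."""
```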

  9. Introduction • Markov Property & Markov Decision Processes • "Independence of path": all that matters is contained in the current state signal • A reinforcement learning task that satisfies the Markov property is called a Markov decision process (MDP) • Finite Markov Decision Process (finite MDP)
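In symbols, the Markov property says that the next state and reward depend only on the current state and action (standard notation, not reproduced from the slide):

```latex
\Pr\{s_{t+1}=s',\, r_{t+1}=r \mid s_t, a_t, r_t, s_{t-1}, a_{t-1}, \dots, r_1, s_0, a_0\}
  = \Pr\{s_{t+1}=s',\, r_{t+1}=r \mid s_t, a_t\}
```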

  10. Introduction • Three categories of methods for solving the reinforcement learning problem • Dynamic programming • Requires a complete and accurate model of the environment • Performs a full backup operation on each state • Monte Carlo methods • A backup for each state based on the entire sequence of observed rewards from that state until the end of the episode • Temporal-difference learning • Approximates the optimal value function and treats the approximation as an adequate guide

  11. LS Method for Reinforcement Learning • Consider the stochastic dynamic system $x_{t+1} = f(x_t, a_t, w_t)$, where $a_t = \pi(x_t)$ is the control decision generated by the policy, $x_t$ is the current state, and $w_t$ is a disturbance independently sampled from some fixed distribution. The MDP can be denoted by a quadruple $(S, A, P, R)$: $S$ is the state set, $A$ the action set, $P$ the state transition probability, and $R$ denotes the reward function. The policy is a mapping $\pi: S \rightarrow A$, and under a fixed policy the state sequence $\{x_t\}$ is a Markov chain.

  12. LS Method for Reinforcement Learning • For each policy $\pi$, the value function is defined by the equation $V^{\pi}(s) = E\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \mid s_{0}=s, \pi\right]$. The optimal value function is defined by $V^{*}(s) = \max_{\pi} V^{\pi}(s)$.

  13. LS Method for Reinforcement Learning • The optimal action can be generated through $a^{*} = \arg\max_{a} \sum_{s'} P(s' \mid s, a)\left[R(s,a,s') + \gamma V^{*}(s')\right]$. Introducing the Q value function $Q^{\pi}(s,a) = R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{\pi}(s')$, the optimal action can now be generated through $a^{*} = \arg\max_{a} Q^{*}(s,a)$.

  14. LS Method for Reinforcement Learning • The exact Q-values for all state-action pairs can be obtained by solving the Bellman equations (full backups) $Q^{\pi}(s,a) = R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, Q^{\pi}(s', \pi(s'))$, or, in matrix format, $Q^{\pi} = R + \gamma P^{\pi} Q^{\pi}$, where $P^{\pi}$ denotes the transition probability from $(s,a)$ to $(s', \pi(s'))$.

  15. LS Method for Reinforcement Learning • Traditional Q-learning A popular variant of temporal-difference learning for approximating Q value functions. In the absence of a model of the MDP, it uses sample data $(s_t, a_t, r_t, s_{t+1})$. The temporal difference is defined as $\delta_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)$. For one-step Q-learning, the update equation is $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\, \delta_t$.
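A minimal tabular sketch of the one-step Q-learning update above; the environment interface (reset, step, actions) is an assumption made for illustration:

```python
import random
from collections import defaultdict

def q_learning_episode(env, Q=None, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Run one episode of tabular one-step Q-learning.

    Q maps (state, action) -> estimated value. env is assumed to expose
    reset() -> state, step(action) -> (next_state, reward, done), and actions.
    """
    Q = Q if Q is not None else defaultdict(float)
    state, done = env.reset(), False
    while not done:
        # epsilon-greedy behaviour policy (explore vs. exploit)
        if random.random() < epsilon:
            action = random.choice(env.actions)
        else:
            action = max(env.actions, key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(action)
        # temporal difference: r + gamma * max_a' Q(s', a') - Q(s, a)
        td = reward + gamma * max(Q[(next_state, a)] for a in env.actions) \
             - Q[(state, action)]
        Q[(state, action)] += alpha * td          # one-step update
        state = next_state
    return Q
```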

  16. LS Method for Reinforcement Learning • The final decision based upon Q-learning: $a = \arg\max_{a} Q(s, a)$. Reasons for the development of approximation methods: • Size of the state-action space • The overwhelming computational requirement Categories of approximation methods for machine learning: • Model Approximation • Policy Approximation • Value Function Approximation

  17. LS Method for Reinforcement Learning • Model-Free Least-Squares Q-learning Linear function approximator: $\hat{Q}(s,a;w) = \sum_{j=1}^{k} \phi_{j}(s,a)\, w_{j} = \phi(s,a)^{T} w$, where the $\phi_{j}$ are basis functions and $w$ is a vector of scalar weights.
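A sketch of the linear approximator Q(s, a) ≈ φ(s, a)ᵀw using Gaussian radial basis functions over the state, replicated in a separate block per discrete action; this feature layout is a common choice, assumed here for illustration:

```python
import numpy as np

def rbf_features(state, action, centers, width, n_actions):
    """phi(s, a): Gaussian RBF features of the state, placed in the block
    that belongs to the chosen discrete action (zeros elsewhere)."""
    s = np.asarray(state, dtype=float)
    rbf = np.exp(-np.sum((centers - s) ** 2, axis=1) / (2.0 * width ** 2))
    phi = np.zeros(len(centers) * n_actions)
    phi[action * len(centers):(action + 1) * len(centers)] = rbf
    return phi

def q_hat(state, action, w, centers, width, n_actions):
    """Linear approximation: Q(s, a) ~ phi(s, a)^T w."""
    return rbf_features(state, action, centers, width, n_actions) @ w
```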

  18. LS Method for Reinforcement Learning • For a fixed policy $\pi$, $\hat{Q}^{\pi} = \Phi w^{\pi}$, where $\Phi$ is the $|S||A| \times k$ matrix of basis functions and $w^{\pi}$ is the weight vector. If the model of the MDP is available, the weights solve the linear system $A w^{\pi} = b$ with $A = \Phi^{T}(\Phi - \gamma P^{\pi}\Phi)$ and $b = \Phi^{T} R$.

  19. LS Method for Reinforcement Learning • The policy is $\pi(s) = \arg\max_{a} \phi(s,a)^{T} w^{\pi}$, where $w^{\pi} = A^{-1} b$ and $A$, $b$ are as above. If the model of the MDP is not available (model-free), then given samples $D = \{(s_i, a_i, r_i, s_i')\}$, $A$ and $b$ are approximated by $\tilde{A} = \sum_i \phi(s_i,a_i)\left(\phi(s_i,a_i) - \gamma \phi(s_i', \pi(s_i'))\right)^{T}$ and $\tilde{b} = \sum_i \phi(s_i,a_i)\, r_i$.

  20. LS Method for Reinforcement Learning • The optimal policy can be found by iterating this procedure: $\pi^{*}(s) = \arg\max_{a} \phi(s,a)^{T} w^{*}$. The greedy policy is represented by the parameter vector $w$ alone and can be determined on demand for any given state.
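A compact sketch of the model-free procedure in slides 18-20, following the standard LSTDQ/LSPI formulation; the feature function phi and the (s, a, r, s') sample format are assumptions carried over from the sketch above:

```python
import numpy as np

def lstdq(samples, phi, policy, k, gamma=0.95):
    """Estimate the weights of Q^pi from samples (s, a, r, s'):
    A = sum phi(s,a)(phi(s,a) - gamma*phi(s',pi(s')))^T,  b = sum phi(s,a)*r."""
    A, b = np.zeros((k, k)), np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.lstsq(A, b, rcond=None)[0]   # w with A w ~ b

def lspi(samples, phi, actions, k, gamma=0.95, n_iter=20):
    """Least-Squares Policy Iteration: alternate LSTDQ with greedy improvement."""
    w = np.zeros(k)
    for _ in range(n_iter):
        greedy = lambda s, w=w: max(actions, key=lambda a: phi(s, a) @ w)
        w = lstdq(samples, phi, greedy, k, gamma)
    return w   # the greedy policy argmax_a phi(s,a)^T w is recovered on demand
```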

  21. LS Method for Reinforcement Learning • Simulation • The system is hard to model but easy to simulate • Simulation implicitly reveals the features of the system in terms of state-visiting frequency • Orthogonal least-squares algorithm for training an RBF network • A systematic learning approach for solving the center selection problem • Each newly added center maximizes the amount of energy of the desired network output that is explained
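The orthogonal least-squares idea can be sketched as greedy forward selection: each new center is the candidate that explains the most remaining energy of the desired output. The code below is a simplified residual-based variant of that idea, not the full orthogonal decomposition:

```python
import numpy as np

def greedy_center_selection(candidates, states, targets, width, n_centers):
    """Pick RBF centers one at a time, choosing the candidate whose basis
    vector captures the largest share of the remaining output energy."""
    def basis(c):
        return np.exp(-np.sum((states - c) ** 2, axis=1) / (2.0 * width ** 2))

    chosen, residual = [], targets.astype(float)
    for _ in range(n_centers):
        # score every unused candidate by the energy it would explain
        scores = [(basis(c) @ residual) ** 2 / (basis(c) @ basis(c))
                  for c in candidates]
        best = int(np.argmax(scores))
        g = basis(candidates[best])
        residual = residual - (g @ residual) / (g @ g) * g   # remove explained part
        chosen.append(candidates[best])
        candidates = np.delete(candidates, best, axis=0)
    return np.array(chosen)
```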

  22. LS Method for Reinforcement Learning • Hybrid Least-Squares Method (block diagram): the agent exchanges Action, State and Reward with the Environment; Simulation & Orthogonal Least-Squares regression produce the Feature Configuration; the Least-Squares Policy Iteration (LSPI) algorithm yields the Optimal Policy

  23. LS Method for Reinforcement Learning

  24. Simulation: Cart-Pole System
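For reference, a minimal sketch of cart-pole dynamics of the kind typically used in such simulations; the physical constants are the common textbook values, not taken from the slides:

```python
import math

def cart_pole_step(state, force, dt=0.02,
                   m_cart=1.0, m_pole=0.1, length=0.5, g=9.8):
    """One Euler step of the classic cart-pole equations of motion.

    state = (x, x_dot, theta, theta_dot); force is applied to the cart.
    """
    x, x_dot, theta, theta_dot = state
    total_mass = m_cart + m_pole
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + m_pole * length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (g * sin_t - cos_t * temp) / (
        length * (4.0 / 3.0 - m_pole * cos_t ** 2 / total_mass))
    x_acc = temp - m_pole * length * theta_acc * cos_t / total_mass
    return (x + dt * x_dot, x_dot + dt * x_acc,
            theta + dt * theta_dot, theta_dot + dt * theta_acc)
```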

  25. Simulation

  26. Conclusion (Stage One) • From the reinforcement learning perspective, the intractability of exact solutions to sequential decision problems requires value function approximation methods • At present, linear function approximators are the best alternative as an approximation architecture, mainly due to their transparent structure • The model-free least-squares policy iteration (LSPI) method is a promising algorithm that uses a linear approximation architecture to achieve policy optimization in the spirit of Q-learning. It may converge in surprisingly few steps • Inspired by the orthogonal least-squares regression method for selecting the centers of an RBF neural network, a new hybrid learning method for LSPI can produce a more robust and human-independent solution
