
Hybrid Agent-Based Modeling: Architectures, Analyses and Applications (Stage One)


Presentation Transcript


  1. Hybrid Agent-Based Modeling: Architectures, Analyses and Applications (Stage One) Li, Hailin

  2. Outline Introduction Least-Squares Method for Reinforcement Learning Evolutionary Algorithms For RL Problem (in progress) Technical Analysis based upon hybrid agent-based architecture (in progress) Conclusion (Stage One)

  3. Introduction • Learning From Interaction • Interact with the environment • Consequences of actions to achieve goals • No explicit teacher, only experience • Examples • A chess player in a game • Someone preparing food • The actions of a gazelle calf after it is born

  4. Introduction • Characteristics • Decision making in an uncertain environment • Actions • Affect the future situation • Effects cannot be fully predicted • Goals are explicit • Use experience to improve performance

  5. Introduction • What is to be learned • A mapping from situations to actions • Maximizes a scalar reward or reinforcement signal • Learning • Does not need to be told which actions to take • Must discover which actions yield the most reward by trying them

  6. Introduction • Challenge • Action may affect not only immediate reward but also the next situation, and consequently all subsequent rewards • Trial and error search • Delayed reward

  7. Introduction • Exploration and exploitation • Exploit what it already knows in order to obtain reward • Explore in order to make better action selections in the future • Neither can be pursued exclusively without failing at the task • Trade-off
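To make the exploration-exploitation trade-off concrete, here is a minimal sketch of ε-greedy action selection, one common way to balance the two (the function name and the fixed ε value are illustrative, not taken from the slides):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Choose an action from a dict {action: estimated value}.

    With probability epsilon the agent explores (random action);
    otherwise it exploits its current best estimate.
    """
    if random.random() < epsilon:
        return random.choice(list(q_values.keys()))   # explore
    return max(q_values, key=q_values.get)            # exploit

# Usage: mostly picks 'b', but still samples 'a' occasionally.
print(epsilon_greedy({'a': 0.2, 'b': 0.8}, epsilon=0.1))
```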

  8. Introduction • Components of an agent • Policy • Decision-making function • Reward (total reward, average reward, discounted reward) • Good and bad events for the agent • Value • Rewards in the long run • Model of environment • Behavior of the environment
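The four components can be summarised as a minimal interface sketch; the names and type signatures below are hypothetical, chosen only to mirror the bullet points above:

```python
from typing import Hashable, Protocol

State = Hashable
Action = Hashable

class Agent(Protocol):
    def policy(self, state: State) -> Action:
        """Decision-making function: maps the current situation to an action."""
    def reward(self, state: State, action: Action, next_state: State) -> float:
        """Immediate signal marking good and bad events for the agent."""
    def value(self, state: State) -> float:
        """Expected reward in the long run (e.g. discounted) from this state."""
    def model(self, state: State, action: Action) -> State:
        """Optional model of how the environment behaves."""
```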

  9. Introduction • Markov Property & Markov Decision Processes • "Independence of path": all that matters is contained in the current state signal • A reinforcement learning task that satisfies the Markov property is called a Markov decision process (MDP) • Finite Markov Decision Process (finite MDP)
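In symbols, the Markov property says that the next state and reward depend only on the current state and action (standard notation, not reproduced from the slide):

```latex
\Pr\{s_{t+1}=s',\, r_{t+1}=r \mid s_t, a_t, r_t, s_{t-1}, a_{t-1}, \dots, r_1, s_0, a_0\}
  = \Pr\{s_{t+1}=s',\, r_{t+1}=r \mid s_t, a_t\}
```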

  10. Introduction • Three categories of methods for solving the reinforcement learning problem • Dynamic programming • Requires a complete and accurate model of the environment • Performs a full backup operation on each state • Monte Carlo methods • A backup for each state based on the entire sequence of observed rewards from that state until the end of the episode • Temporal-difference learning • Approximates the optimal value function and treats the approximation as an adequate guide

  11. LS Method for Reinforcement Learning • Consider the stochastic dynamic system $x_{t+1} = f(x_t, a_t, w_t)$, where $a_t = \pi(x_t)$ is the control decision generated by the policy, $x_t$ is the current state, and $w_t$ is a disturbance independently sampled from some fixed distribution. The MDP can be denoted by a quadruple $(S, A, P, R)$: $S$ is the state set, $A$ the action set, $P$ the state transition probability, and $R$ denotes the reward function. The policy is a mapping $\pi: S \rightarrow A$, and under a fixed policy the state sequence $\{x_t\}$ is a Markov chain.

  12. LS Method for Reinforcement Learning • For each policy $\pi$, the value function is defined by the equation $V^{\pi}(s) = E\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \mid s_{0}=s, \pi\right]$. The optimal value function is defined by $V^{*}(s) = \max_{\pi} V^{\pi}(s)$.

  13. LS Method for Reinforcement Learning • The optimal action can be generated through $a^{*} = \arg\max_{a} \sum_{s'} P(s' \mid s, a)\left[R(s,a,s') + \gamma V^{*}(s')\right]$. Introducing the Q value function $Q^{\pi}(s,a) = R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{\pi}(s')$, the optimal action can now be generated through $a^{*} = \arg\max_{a} Q^{*}(s,a)$.

  14. LS Method for Reinforcement Learning • The exact Q-values for all state-action pairs can be obtained by solving the Bellman equations (full backups) $Q^{\pi}(s,a) = R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, Q^{\pi}(s', \pi(s'))$, or, in matrix format, $Q^{\pi} = R + \gamma P^{\pi} Q^{\pi}$, where $P^{\pi}$ denotes the transition probability from $(s,a)$ to $(s', \pi(s'))$.

  15. LS Method for Reinforcement Learning • Traditional Q-learning A popular variant of temporal-difference learning for approximating Q value functions. In the absence of a model of the MDP, it uses sample data $(s_t, a_t, r_t, s_{t+1})$. The temporal difference is defined as $\delta_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)$. For one-step Q-learning, the update equation is $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\, \delta_t$.
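A minimal tabular sketch of the one-step Q-learning update above; the environment interface (reset, step, actions) is an assumption made for illustration:

```python
import random
from collections import defaultdict

def q_learning_episode(env, Q=None, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Run one episode of tabular one-step Q-learning.

    Q maps (state, action) -> estimated value. env is assumed to expose
    reset() -> state, step(action) -> (next_state, reward, done), and actions.
    """
    Q = Q if Q is not None else defaultdict(float)
    state, done = env.reset(), False
    while not done:
        # epsilon-greedy behaviour policy (explore vs. exploit)
        if random.random() < epsilon:
            action = random.choice(env.actions)
        else:
            action = max(env.actions, key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(action)
        # temporal difference: r + gamma * max_a' Q(s', a') - Q(s, a)
        td = reward + gamma * max(Q[(next_state, a)] for a in env.actions) \
             - Q[(state, action)]
        Q[(state, action)] += alpha * td          # one-step update
        state = next_state
    return Q
```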

  16. LS Method for Reinforcement Learning • The final decision based upon Q-learning: $a = \arg\max_{a} Q(s, a)$. Reasons for the development of approximation methods: • Size of the state-action space • The overwhelming computational requirement Categories of approximation methods for machine learning: • Model Approximation • Policy Approximation • Value Function Approximation

  17. LS Method for Reinforcement Learning • Model-Free Least-Squares Q-learning Linear function approximator: $\hat{Q}(s,a;w) = \sum_{j=1}^{k} \phi_{j}(s,a)\, w_{j} = \phi(s,a)^{T} w$, where the $\phi_{j}$ are basis functions and $w$ is a vector of scalar weights.
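A sketch of the linear approximator Q(s, a) ≈ φ(s, a)ᵀw using Gaussian radial basis functions over the state, replicated in a separate block per discrete action; this feature layout is a common choice, assumed here for illustration:

```python
import numpy as np

def rbf_features(state, action, centers, width, n_actions):
    """phi(s, a): Gaussian RBF features of the state, placed in the block
    that belongs to the chosen discrete action (zeros elsewhere)."""
    s = np.asarray(state, dtype=float)
    rbf = np.exp(-np.sum((centers - s) ** 2, axis=1) / (2.0 * width ** 2))
    phi = np.zeros(len(centers) * n_actions)
    phi[action * len(centers):(action + 1) * len(centers)] = rbf
    return phi

def q_hat(state, action, w, centers, width, n_actions):
    """Linear approximation: Q(s, a) ~ phi(s, a)^T w."""
    return rbf_features(state, action, centers, width, n_actions) @ w
```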

  18. LS Method for Reinforcement Learning • For a fixed policy $\pi$, $\hat{Q}^{\pi} = \Phi w^{\pi}$, where $\Phi$ is the $|S||A| \times k$ matrix of basis functions and $w^{\pi}$ is the weight vector. If the model of the MDP is available, the weights solve the linear system $A w^{\pi} = b$ with $A = \Phi^{T}(\Phi - \gamma P^{\pi}\Phi)$ and $b = \Phi^{T} R$.

  19. LS Method for Reinforcement Learning • The policy is $\pi(s) = \arg\max_{a} \phi(s,a)^{T} w^{\pi}$, where $w^{\pi} = A^{-1} b$ and $A$, $b$ are as above. If the model of the MDP is not available (model-free), then given samples $D = \{(s_i, a_i, r_i, s_i')\}$, $A$ and $b$ are approximated by $\tilde{A} = \sum_i \phi(s_i,a_i)\left(\phi(s_i,a_i) - \gamma \phi(s_i', \pi(s_i'))\right)^{T}$ and $\tilde{b} = \sum_i \phi(s_i,a_i)\, r_i$.

  20. LS Method for Reinforcement Learning • The optimal policy can be found by iterating this procedure: $\pi^{*}(s) = \arg\max_{a} \phi(s,a)^{T} w^{*}$. The greedy policy is represented by the parameter vector $w$ alone and can be determined on demand for any given state.
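A compact sketch of the model-free procedure in slides 18-20, following the standard LSTDQ/LSPI formulation; the feature function phi and the (s, a, r, s') sample format are assumptions carried over from the sketch above:

```python
import numpy as np

def lstdq(samples, phi, policy, k, gamma=0.95):
    """Estimate the weights of Q^pi from samples (s, a, r, s'):
    A = sum phi(s,a)(phi(s,a) - gamma*phi(s',pi(s')))^T,  b = sum phi(s,a)*r."""
    A, b = np.zeros((k, k)), np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.lstsq(A, b, rcond=None)[0]   # w with A w ~ b

def lspi(samples, phi, actions, k, gamma=0.95, n_iter=20):
    """Least-Squares Policy Iteration: alternate LSTDQ with greedy improvement."""
    w = np.zeros(k)
    for _ in range(n_iter):
        greedy = lambda s, w=w: max(actions, key=lambda a: phi(s, a) @ w)
        w = lstdq(samples, phi, greedy, k, gamma)
    return w   # the greedy policy argmax_a phi(s,a)^T w is recovered on demand
```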

  21. LS Method for Reinforcement Learning • Simulation • The system is hard to model but easy to simulate • Simulation implicitly reveals the features of the system in terms of state-visiting frequency • Orthogonal least-squares algorithm for training an RBF network • A systematic learning approach for solving the center selection problem • Each newly added center maximizes the amount of energy of the desired network output that is explained
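The orthogonal least-squares idea can be sketched as greedy forward selection: each new center is the candidate that explains the most remaining energy of the desired output. The code below is a simplified residual-based variant of that idea, not the full orthogonal decomposition:

```python
import numpy as np

def greedy_center_selection(candidates, states, targets, width, n_centers):
    """Pick RBF centers one at a time, choosing the candidate whose basis
    vector captures the largest share of the remaining output energy."""
    def basis(c):
        return np.exp(-np.sum((states - c) ** 2, axis=1) / (2.0 * width ** 2))

    chosen, residual = [], targets.astype(float)
    for _ in range(n_centers):
        # score every unused candidate by the energy it would explain
        scores = [(basis(c) @ residual) ** 2 / (basis(c) @ basis(c))
                  for c in candidates]
        best = int(np.argmax(scores))
        g = basis(candidates[best])
        residual = residual - (g @ residual) / (g @ g) * g   # remove explained part
        chosen.append(candidates[best])
        candidates = np.delete(candidates, best, axis=0)
    return np.array(chosen)
```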

  22. LS Method for Reinforcement Learning • Hybrid Least-Squares Method (block diagram): the agent exchanges Action, State and Reward with the Environment; Simulation & Orthogonal Least-Squares regression produce the Feature Configuration; the Least-Squares Policy Iteration (LSPI) algorithm yields the Optimal Policy

  23. LS Method for Reinforcement Learning

  24. Simulation: Cart-Pole System
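For reference, a minimal sketch of cart-pole dynamics of the kind typically used in such simulations; the physical constants are the common textbook values, not taken from the slides:

```python
import math

def cart_pole_step(state, force, dt=0.02,
                   m_cart=1.0, m_pole=0.1, length=0.5, g=9.8):
    """One Euler step of the classic cart-pole equations of motion.

    state = (x, x_dot, theta, theta_dot); force is applied to the cart.
    """
    x, x_dot, theta, theta_dot = state
    total_mass = m_cart + m_pole
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + m_pole * length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (g * sin_t - cos_t * temp) / (
        length * (4.0 / 3.0 - m_pole * cos_t ** 2 / total_mass))
    x_acc = temp - m_pole * length * theta_acc * cos_t / total_mass
    return (x + dt * x_dot, x_dot + dt * x_acc,
            theta + dt * theta_dot, theta_dot + dt * theta_acc)
```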

  25. Simulation

  26. Conclusion (Stage One) • From the reinforcement learning perspective, the intractability of exact solutions to sequential decision problems requires value function approximation methods • At present, linear function approximators are the best alternative as an approximation architecture, mainly due to their transparent structure • The model-free least-squares policy iteration (LSPI) method is a promising algorithm that uses a linear approximation architecture to achieve policy optimization in the spirit of Q-learning. It may converge in surprisingly few steps • Inspired by the orthogonal least-squares regression method for selecting the centers of an RBF neural network, a new hybrid learning method for LSPI can produce a more robust and human-independent solution
