
Hierarchical Reinforcement Learning


Presentation Transcript


  1. Hierarchical Reinforcement Learning Ersin Basaran 19/03/2005

  2. Outline • Reinforcement Learning • RL Agent • Policy • Hierarchical Reinforcement Learning • The Need • Sub-Goal Detection • State Clusters • Border States • Continuous State and/or Action Spaces • Options • Macro Q-Learning with Parallel Option Discovery • Experimental Results

  3. Reinforcement Learning • The agent observes the state and takes an action according to the policy • A policy is a function from the state space to the action space • A policy can be deterministic or non-deterministic (stochastic) • State and action spaces can be discrete, continuous, or hybrid

  4. RL Agent • No model of the environment • The agent observes state s, takes action a, and moves to state s’ while observing reward r • The agent tries to maximize the total expected reward (return) • Finite state machine view: S → S’ on action a with reward r
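
A minimal sketch of this interaction loop, assuming an `env` object with `reset()` and `step(a)` returning `(next_state, reward, done)`; the interface and the `policy` callable are illustrative, not from the slides:

```python
# Minimal agent-environment loop: observe s, act, observe r and s',
# and accumulate the (undiscounted) return for one episode.
def run_episode(env, policy, max_steps=1000):
    """Run one episode and return the total reward collected."""
    s = env.reset()                    # observe the initial state
    total_reward = 0.0
    for _ in range(max_steps):
        a = policy(s)                  # action chosen by the policy
        s, r, done = env.step(a)       # move to s', observe reward r
        total_reward += r              # accumulate the return
        if done:                       # episode ends (e.g. goal reached)
            break
    return total_reward
```

Here `policy` could be as simple as `lambda s: random.choice(actions)`; the next slide replaces it with a policy derived from learned Q values.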

  5. Policy • In a flat RL model, the policy maps each state to a primitive action • Under the optimal policy, the action taken by the agent yields the highest expected return at each step • The policy can be kept in tabular form for small state and action spaces • Function approximators can be used for large or continuous state and action spaces
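
A sketch of such a tabular policy: the Q values live in a dictionary, the policy is epsilon-greedy over them, and a one-step Q-learning backup improves the estimates (names and hyperparameters are illustrative, not from the slides):

```python
# Tabular Q-learning agent: the Q table doubles as the policy.
import random
from collections import defaultdict

class TabularQAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.Q = defaultdict(float)   # Q[(s, a)] -> estimated return
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def policy(self, s):
        """Epsilon-greedy policy derived from the Q table."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def update(self, s, a, r, s_next):
        """One-step backup toward r + gamma * max_a' Q(s', a')."""
        best_next = max(self.Q[(s_next, a2)] for a2 in self.actions)
        td_error = r + self.gamma * best_next - self.Q[(s, a)]
        self.Q[(s, a)] += self.alpha * td_error
```

For large or continuous spaces the Q table would be replaced by a function approximator, as the slide notes.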

  6. The Need For Hierarchical RL • Increases performance • Applying RL to problems with large action and/or state spaces becomes feasible • Detecting sub-goals lets the agent define abstract actions over the primitive actions • Sub-goals and abstract actions can be reused in different tasks in the same domain, transferring knowledge between tasks • The agent's policy can be translated into natural language

  7. Sub-goal Detection • A sub-goal can be a single state, a subset of the state space, or a constraint in the state space • Reaching a sub-goal should help the agent reach the main goal (obtain the highest return) • Sub-goals must be discovered by the agent autonomously

  8. State Clusters • The states in a cluster are strongly connected to each other • The number of state transitions between clusters is small • The states at the two ends of a transition between two different clusters are sub-goal candidates • Clusters can be hierarchical • Different clusters can belong to the same cluster at a higher level
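
One way to sketch this idea in code, assuming the agent has logged its transitions and some clustering step has assigned each state to a cluster (both inputs are assumptions, not specified in the slides):

```python
# Sub-goal candidates from a state clustering: transitions whose two
# ends fall in different clusters are rare, and their endpoint states
# are kept as candidates.
def subgoal_candidates(transitions, cluster_of):
    """transitions: iterable of (s, s_next) pairs seen by the agent.
    cluster_of: dict mapping each state to a cluster id."""
    candidates = set()
    for s, s_next in transitions:
        if cluster_of[s] != cluster_of[s_next]:   # cluster-crossing transition
            candidates.add(s)
            candidates.add(s_next)
    return candidates
```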

  9. Border States • Some actions cannot be applied in some states; these states are defined as border states • Border states are assumed to form a transition sequence: the agent can travel along the border states by taking some actions • Each end of this transition sequence is a candidate sub-goal, assuming the agent has sufficiently explored the environment

  10. Border State Detection • For discrete action and state spaces • F(s): the set of states that can be reached from state s in one time step • G(s): the set of actions that cause no state transition when applied at state s • H(s): the set of actions that move the agent to a different state when applied at state s

  11. Border State Detection • Detect the longest state sequence s0, s1, s2, …, sk-1, sk that satisfies the following constraints • si ∈ F(si+1) or si+1 ∈ F(si) for 0 ≤ i < k • G(si) ∩ G(si+1) ≠ ∅ for 0 < i < k-1 • H(s0) ∩ G(s1) ≠ ∅ • H(sk) ∩ G(sk-1) ≠ ∅ • s0 and sk are candidate sub-goals
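
A sketch that checks these constraints for a given candidate sequence, assuming F, G, and H have been estimated from experience as dictionaries of sets and that the constraints are read as the memberships and non-empty intersections listed above; searching for the longest such sequence is left out:

```python
# Check whether a state sequence s0, ..., sk satisfies the border-state
# constraints, making s0 and sk candidate sub-goals.
def is_border_sequence(seq, F, G, H):
    """seq: list of states [s0, ..., sk]; F, G, H: dicts of sets."""
    k = len(seq) - 1
    if k < 1:
        return False
    # consecutive states are reachable from one another in one step
    for i in range(k):
        if seq[i] not in F[seq[i + 1]] and seq[i + 1] not in F[seq[i]]:
            return False
    # neighbouring interior states share at least one blocked action
    for i in range(1, k - 1):
        if not (G[seq[i]] & G[seq[i + 1]]):
            return False
    # at each end, some action blocked at the neighbour is applicable
    if not (H[seq[0]] & G[seq[1]]):
        return False
    if not (H[seq[k]] & G[seq[k - 1]]):
        return False
    return True
```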

  12. Border States on Continuous State and Action Spaces • The environment is assumed to be bounded • State and action vectors can include both continuous and discrete dimensions • The derivative of the state vector with respect to the action vector can be used • Border state regions have small derivatives for some action vectors • A large change in these derivatives indicates a border state region
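
A rough numerical version of the derivative test, using one-step finite differences; the one-step model `simulate(s, a)` and the threshold are assumptions for illustration:

```python
# Flag a state as a border region if the state barely changes along
# at least one action direction (small d(state)/d(action_j)).
import numpy as np

def is_border_region(s, a, simulate, eps=1e-3, threshold=1e-2):
    """simulate(s, a) is an assumed one-step model returning s'."""
    base = np.asarray(simulate(s, a), dtype=float)
    for j in range(len(a)):
        a_pert = np.array(a, dtype=float)
        a_pert[j] += eps                                # perturb one action dimension
        moved = np.asarray(simulate(s, a_pert), dtype=float)
        deriv = np.linalg.norm(moved - base) / eps      # finite-difference estimate
        if deriv < threshold:                           # movement blocked along this direction
            return True
    return False
```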

  13. Options • An option is a policy • It can be local (defined on a subset of the state space) or global • The option policy can use primitive actions or other options • It is hierarchical • Options are used to reach sub-goals
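
A small sketch of executing an option as described above; an "option" here is assumed to be any object exposing `policy(s)` and `terminates(s)`, and anything else is treated as a primitive action (the interface is illustrative, not from the slides):

```python
# Execute a primitive action or an option from state s.  Because an
# option's policy may itself return another option, the call is
# naturally recursive, which is where the hierarchy comes from.
def execute(option_or_action, env, s):
    """Returns (next_state, accumulated_reward, steps_taken)."""
    if not hasattr(option_or_action, "policy"):       # primitive action
        s_next, r, _ = env.step(option_or_action)
        return s_next, r, 1
    total_r, steps = 0.0, 0
    while not option_or_action.terminates(s):         # e.g. sub-goal not yet reached
        choice = option_or_action.policy(s)           # primitive action or sub-option
        s, r, n = execute(choice, env, s)             # recursive, hierarchical call
        total_r, steps = total_r + r, steps + n
    return s, total_r, steps
```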

  14. Macro Q-Learning with Parallel Option Discovery • The agent starts with no sub-goals and no options • It detects the sub-goals and learns the option policies and the main policy simultaneously • Options are formed and removed from the model according to the sub-goal detection algorithm • When a possible sub-goal is detected, a new option is added to the model holding the policy for reaching that sub-goal • All option policies are updated in parallel • The agent generates an internal reward when a sub-goal is reached

  15. Macro Q-Learning with Parallel Option Discovery • An option is defined by the tuple O = (πo, βo, Io, Qo, ro), where πo is the option policy, βo is the termination condition, Io is the initiation set, Qo holds the Q values for the option, and ro is the internal reward signal associated with the option • The intra-option learning method is used
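
A simplified sketch of this option tuple as a data structure, together with updating every option's Q values in parallel from a single experience tuple; the greedy πo, the sub-goal-based βo and ro, and the off-policy update are illustrative stand-ins rather than the exact algorithm from the slides:

```python
# Option tuple O = (pi_o, beta_o, I_o, Q_o, r_o) plus a parallel update
# of all options from one experience tuple (s, a, s').
from dataclasses import dataclass, field

@dataclass
class Option:
    subgoal: object                            # the sub-goal the option targets
    I: set = field(default_factory=set)        # I_o: initiation set
    Q: dict = field(default_factory=dict)      # Q_o[(s, a)] -> value

    def beta(self, s):                         # beta_o: terminate at the sub-goal
        return s == self.subgoal

    def r(self, s_next):                       # r_o: internal reward signal
        return 1.0 if s_next == self.subgoal else 0.0

    def pi(self, s, actions):                  # pi_o: greedy w.r.t. the option's own Q values
        return max(actions, key=lambda a: self.Q.get((s, a), 0.0))

def update_all_options(options, actions, s, a, s_next, alpha=0.1, gamma=0.99):
    """Update every option in parallel from the same experience tuple."""
    for o in options:
        target = o.r(s_next)                   # internal reward for this option
        if not o.beta(s_next):                 # bootstrap only if the option continues
            target += gamma * max(o.Q.get((s_next, a2), 0.0) for a2 in actions)
        q_sa = o.Q.get((s, a), 0.0)
        o.Q[(s, a)] = q_sa + alpha * (target - q_sa)
```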

  16. Experiments [figures comparing Flat RL and Hierarchical RL]

  17. Options in HRL

  18. Questions and Suggestions!!!
