1 / 8

Soar-RL and Reinforcement Learning

Soar-RL and Reinforcement Learning. Introducing talks by Shiwali Mohan, Mitchell Keith Bloch & Nick Gorski. Reinforcement Learning. Environment. Observations Rewards. Actions. Agent. Value in Reinforcement Learning. Value : future expected reward RL goal: maximize value

soyala
Télécharger la présentation

Soar-RL and Reinforcement Learning

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Soar-RL and Reinforcement Learning Introducing talks byShiwali Mohan, Mitchell Keith Bloch & Nick Gorski

  2. Reinforcement Learning Environment Observations Rewards Actions Agent

  3. Value in Reinforcement Learning • Value: future expected reward • RL goal: maximize value • RL agent: select actions with highest value

  4. State and Observability Agent observes a representation of world state Can be Markovian or partial Agent doesn’t observesemantics of task Must does learn the meanings of symbols and actions 0, 2 Markovian representation Partial representation

  5. Why Reinforcement Learning Is Hard Temporal credit Assignment problem Exploration / exploitation tradeoff Curse of dimensionality

  6. Soar 9.3.1

  7. Soar-RL • Reward <state> ^reward-link ^reward ^value float • Value representation sp {rl*move*left (state <s> ^name left-right ^operator <op> +) (<op> ^name move ^dir left) --> (<s> ^operator <op> = -1.0) } • Decisions Move*left -0.8 Move*right -0.2 Move*sit -1.2 • Adaptive behavior

  8. Soar-RL Talks C • Modular RL in SoarShiwali Mohan • Improving Off-Policy HRLMitchell Keith Bloch • Learning to Use MemoryNick Gorski A/B

More Related