
Emotion-Driven Reinforcement Learning

Bob Marinier & John Laird, University of Michigan, Computer Science and Engineering. CogSci’08.


Presentation Transcript


  1. Emotion-Driven Reinforcement Learning • Bob Marinier & John Laird • University of Michigan, Computer Science and Engineering • CogSci’08

  2. Introduction • Interested in the functional benefits of emotion for a cognitive agent • Appraisal theories of emotion • PEACTIDM theory of cognitive control • Use emotion as a reward signal to a reinforcement learning agent • Demonstrates a functional benefit of emotion • Provides a theory of the origin of intrinsic reward

  3. Outline • Background • Integration of emotion and cognition • Integration of emotion and reinforcement learning • Implementation in Soar • Learning task • Results

  4. Appraisal Theories of Emotion • A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals • Novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc. • Appraisals influence emotion • Emotion can then be coped with (via internal or external actions) • [Diagram labels: Situation, Goals, Appraisals, Emotion, Coping]

  5. Appraisals to Emotions (Scherer 2001)

  6. Cognitive Control: PEACTIDM (Newell 1990)

  7. Unification of PEACTIDM and Appraisal Theories • [Diagram labels, stages: Perceive, Encode, Attend, Comprehend, Intend, Decode, Motor] • [Diagram labels, information passed between stages: Environmental Change, Raw Perceptual Information, Stimulus Relevance, Stimulus chosen for processing, Current Situation Assessment, Prediction, Action, Motor Commands] • [Diagram labels, appraisals: Suddenness, Unpredictability, Goal Relevance, Intrinsic Pleasantness, Outcome Probability, Causal Agent/Motive, Discrepancy, Conduciveness, Control/Power]

  8. Distinction between emotion, mood, and feeling (Marinier & Laird 2007) • Emotion: Result of appraisals; is about the current situation • Mood: “Average” over recent emotions; provides historical context • Feeling: Emotion “+” Mood; what the agent actually perceives
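
The three-way distinction on this slide can be sketched numerically. This is only an illustration under stated assumptions: emotion, mood, and feeling are each treated as a list of appraisal values in [-1, 1], mood is an exponential moving average toward the current emotion (one possible reading of “Average”), and feeling is a clipped sum (one possible reading of “+”). The function names and the rate of 0.1 are hypothetical; the combination functions actually used by Marinier & Laird may differ.

```python
from typing import List


def update_mood(mood: List[float], emotion: List[float], rate: float = 0.1) -> List[float]:
    """Move mood a fraction `rate` toward the current emotion (assumed moving average)."""
    return [m + rate * (e - m) for m, e in zip(mood, emotion)]


def feeling(emotion: List[float], mood: List[float]) -> List[float]:
    """Combine emotion and mood with a clipped sum (assumed stand-in for the slide's "+")."""
    return [max(-1.0, min(1.0, e + m)) for e, m in zip(emotion, mood)]


# Hypothetical two-dimensional example: mood starts neutral and drifts toward the emotion.
mood = [0.0, 0.0]
emotion = [0.5, -0.4]
mood = update_mood(mood, emotion)
print("mood:", mood)
print("feeling:", feeling(emotion, mood))
```

Because mood changes slowly, it carries a trace of earlier emotions into the current feeling, which is the “historical context” the slide mentions.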

  9. Intrinsically Motivated Reinforcement Learning (Sutton & Barto 1998; Singh et al. 2004) • Reward = Intensity * Valence • [Diagram labels: Environment, Critic, Agent, with Actions, States, and Rewards passed between them; External Environment, “Organism”, Internal Environment, Sensations, Decisions; Appraisal Process, +/- Feeling Intensity]
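
A minimal sketch of the reward formula on this slide, followed by a generic one-step temporal-difference update that consumes it. The value ranges (intensity in [0, 1], valence in [-1, 1]), the function names, and the learning parameters are assumptions for illustration; the slide gives only the product, and the update shown is the textbook form, not necessarily the rule the agent uses.

```python
def feeling_reward(intensity: float, valence: float) -> float:
    """Reward = Intensity * Valence, with assumed ranges checked up front."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity is assumed to lie in [0, 1]")
    if not -1.0 <= valence <= 1.0:
        raise ValueError("valence is assumed to lie in [-1, 1]")
    return intensity * valence


def td_update(q: float, reward: float, q_next: float,
              alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Textbook one-step TD update: q <- q + alpha * (reward + gamma * q_next - q)."""
    return q + alpha * (reward + gamma * q_next - q)


# A strongly felt negative feeling yields a negative reward and lowers the value estimate.
r = feeling_reward(intensity=0.8, valence=-1.0)
print(r)
print(td_update(q=0.0, reward=r, q_next=0.0))
```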

  10. Extending Soar with Emotion (Marinier & Laird 2007) • [Diagram labels: Symbolic Long-Term Memories (Procedural, Semantic, Episodic); Reinforcement Learning, Chunking, Semantic Learning, Episodic Learning; Appraisal Detector; Short-Term Memory (Situation, Goals); Decision Procedure; Visual Imagery; Perception, Action, Body]

  11. Extending Soar with Emotion (Marinier & Laird 2007) • [Diagram labels: Symbolic Long-Term Memories (Procedural, Semantic, Episodic); Reinforcement Learning (+/-Intensity), Chunking, Semantic Learning, Episodic Learning; Appraisal Detector; Short-Term Memory (Situation, Goals, Feelings); Decision Procedure; Visual Imagery; Perception, Action, Body; Knowledge, Architecture] • [Values shown in diagram: Appraisals; Emotion .5,.7,0,-.4,.3,…; Mood .7,-.2,.8,.3,.6,…; Feeling .9,.6,.5,-.1,.8,…]

  12. Learning task • [Figure: task environment with Start and Goal marked]

  13. Learning task: Encoding • North: Passable: false, On path: false, Progress: true • East: Passable: false, On path: true, Progress: true • West: Passable: false, On path: false, Progress: true • South: Passable: true, On path: true, Progress: true

  14. Learning task: Encoding & Appraisal • North: Intrinsic Pleasantness: Low, Goal Relevance: Low, Unpredictability: High • East: Intrinsic Pleasantness: Low, Goal Relevance: High, Unpredictability: High • West: Intrinsic Pleasantness: Low, Goal Relevance: Low, Unpredictability: High • South: Intrinsic Pleasantness: Neutral, Goal Relevance: High, Unpredictability: Low
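
The mapping from the encoded features on slide 13 to the appraisals on slide 14 can be sketched as a small rule set. The three rules below are inferred only from the four directions shown on those slides and reproduce exactly those values; they are an assumption, not the agent's actual appraisal rules. The `Encoding` class and `appraise` function are hypothetical names. Progress is true for every direction shown, so its effect cannot be inferred and it is left unused.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Encoding:
    passable: bool
    on_path: bool
    progress: bool  # true in all four examples, so no rule is inferred from it


def appraise(enc: Encoding) -> Dict[str, str]:
    """Rules inferred from the four slide examples (assumption)."""
    return {
        "Intrinsic Pleasantness": "Neutral" if enc.passable else "Low",
        "Goal Relevance": "High" if enc.on_path else "Low",
        "Unpredictability": "Low" if enc.passable else "High",
    }


# Encodings copied from slide 13.
directions = {
    "North": Encoding(passable=False, on_path=False, progress=True),
    "East": Encoding(passable=False, on_path=True, progress=True),
    "West": Encoding(passable=False, on_path=False, progress=True),
    "South": Encoding(passable=True, on_path=True, progress=True),
}

for name, enc in directions.items():
    print(name, appraise(enc))
```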

  15. Learning task: Attending, Comprehending & Appraisal • South: Intrinsic Pleasantness: Neutral, Goal Relevance: High, Unpredictability: Low, Conduciveness: High, Control: High, …

  16. Learning task: Tasking

  17. Learning task: Tasking Optimal Subtasks

  18. What is being learned? • When to Attend vs Task • If Attending, what to Attend to • If Tasking, which subtask to create • When to Intend vs. Ignore

  19. Learning Results

  20. Results: With and without mood

  21. Discussion • Agent learns both internal (tasking) and external (movement) actions • Emotion allows for more frequent rewards, so the agent learns faster than with standard RL • Mood “fills in the gaps”, allowing for even faster learning and less variability
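
The claim that mood “fills in the gaps” can be illustrated with a toy reward stream. This sketch assumes a single signed feeling value stands in for the reward, that mood is a moving average of past emotions, and that feeling is the clipped sum of emotion and mood. The emotion sequence, the rate of 0.5, and the function name are hypothetical. It shows only that the reward signal becomes denser, not the learning-speed result itself.

```python
from typing import List


def reward_stream(emotions: List[float], use_mood: bool, rate: float = 0.5) -> List[float]:
    """Per-step reward: emotion alone, or emotion plus a decaying mood (assumed model)."""
    mood = 0.0
    rewards = []
    for e in emotions:
        f = e + mood if use_mood else e
        rewards.append(max(-1.0, min(1.0, f)))  # clip feeling to [-1, 1]
        mood += rate * (e - mood)               # mood drifts toward the latest emotion
    return rewards


# Hypothetical episode: steps with emotion 0.0 produce no emotion-based reward.
emotions = [0.4, 0.0, 0.0, 0.4, 0.0]
without_mood = reward_stream(emotions, use_mood=False)
with_mood = reward_stream(emotions, use_mood=True)

print(without_mood)
print(with_mood)
print(sum(r != 0 for r in without_mood), "vs", sum(r != 0 for r in with_mood), "nonzero rewards")
```

With emotion alone, the steps that generate no emotion carry no reward; with mood included, those steps still carry a small reward inherited from recent emotions.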

  22. Conclusion & Future Work • Demonstrated computational model that integrates emotion and cognitive control • Confirmed emotion can drive reinforcement learning • We have already successfully demonstrated similar learning in a more complex domain • Would like to explore multi-agent scenarios
