Emotion-Driven Reinforcement Learning

Emotion-Driven Reinforcement Learning. Bob Marinier & John Laird, University of Michigan, Computer Science and Engineering. CogSci’08.

Introduction • Interested in the functional benefits of emotion for a cognitive agent • Appraisal theories of emotion • PEACTIDM theory of cognitive control • Use emotion as a reward signal to a reinforcement learning agent • Demonstrates a functional benefit of emotion • Provides a theory of the origin of intrinsic reward

Outline • Background • Integration of emotion and cognition • Integration of emotion and reinforcement learning • Implementation in Soar • Learning task • Results

Appraisal Theories of Emotion • A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals • Novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc. • Appraisals influence emotion • Emotion can then be coped with (via internal or external actions) [Diagram: Situation, Goals, Appraisals, Emotion, Coping]

Appraisals to Emotions (Scherer 2001)

Cognitive Control: PEACTIDM (Newell 1990)

Unification of PEACTIDM and Appraisal Theories [Diagram: PEACTIDM cycle. Stages shown: Perceive, Encode, Attend, Comprehend, Intend, Decode, Motor. Data labels: Environmental Change, Raw Perceptual Information, Stimulus Relevance, Stimulus chosen for processing, Current Situation Assessment, Action, Prediction, Motor Commands. Appraisal labels: Suddenness, Unpredictability, Goal Relevance, Intrinsic Pleasantness; Causal Agent/Motive, Discrepancy, Conduciveness, Control/Power; Outcome Probability]

Distinction between emotion, mood, and feeling (Marinier & Laird 2007) • Emotion: result of appraisals; is about the current situation • Mood: “average” over recent emotions; provides historical context • Feeling: emotion “+” mood; what the agent actually perceives
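The relationship between the three terms can be sketched in code. This is a minimal illustration, not the model from Marinier & Laird 2007: the slide only says mood is an “average” over recent emotions and that feeling is emotion “+” mood, so the sliding window, the plain arithmetic mean, the clipped sum, and the names AffectState, update and clip are all assumptions made for this example.

```python
from collections import deque


def clip(x, lo=-1.0, hi=1.0):
    """Keep a value inside [lo, hi]."""
    return max(lo, min(hi, x))


class AffectState:
    """Hypothetical holder for recent emotion vectors (one value per appraisal)."""

    def __init__(self, window=5):
        # Assumption: "recent" means the last `window` emotions.
        self.recent = deque(maxlen=window)

    def update(self, emotion):
        """Record the current emotion and return (mood, feeling)."""
        self.recent.append(list(emotion))
        n = len(self.recent)
        # Assumption: mood is the per-dimension mean of the recent emotions,
        # including the current one.
        mood = [sum(col) / n for col in zip(*self.recent)]
        # Assumption: the quoted "+" is modelled as a clipped sum.
        feeling = [clip(e + m) for e, m in zip(emotion, mood)]
        return mood, feeling
```

With this sketch, a run of similar emotions pulls mood toward them, so the feeling stays non-neutral even when the current emotion is weak, which is the “historical context” role the slide assigns to mood.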

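Intrinsically Motivated Reinforcement Learning (Sutton & Barto 1998; Singh et al. 2004) • Reward = Intensity * Valence [Diagram: the standard RL loop (Environment, Critic, Agent; Actions, Sensations, States, Rewards) shown next to a version split into an External Environment and an Internal Environment inside the “Organism”, where the Appraisal Process takes the Critic role and the +/- Feeling Intensity supplies Rewards to the Agent; labels: Actions, Sensations, Decisions, States, Rewards]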
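A short sketch of how the reward formula on this slide could feed a learner. Only Reward = Intensity * Valence comes from the slide; the value ranges, the function names feeling_reward and q_update, and the use of a one-step tabular Q-learning update are assumptions chosen to keep the example generic, not a description of the authors' Soar implementation.

```python
from collections import defaultdict


def feeling_reward(intensity, valence):
    """Reward = Intensity * Valence, as stated on the slide.

    Assumption: intensity lies in [0, 1] and valence in [-1, 1].
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0, 1]")
    if not -1.0 <= valence <= 1.0:
        raise ValueError("valence must be in [-1, 1]")
    return intensity * valence


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """One-step tabular Q-learning update (a generic stand-in learner)."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Usage: a mildly positive feeling rewards the action just taken.
q = defaultdict(float)
r = feeling_reward(intensity=0.5, valence=1.0)
q_update(q, "s0", "south", r, "s1", ["north", "south"])
```

Because a feeling is available on every cycle, a learner like this receives a reward signal at each step instead of only when the goal is reached.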

Extending Soar with Emotion (Marinier & Laird 2007) [Diagram: Soar architecture. Symbolic Long-Term Memories: Procedural, Semantic, Episodic. Learning mechanisms: Reinforcement Learning, Chunking, Semantic Learning, Episodic Learning. Other components: Appraisal Detector, Short-Term Memory (Situation, Goals), Decision Procedure, Visual Imagery, Perception, Action, Body]

Extending Soar with Emotion (Marinier & Laird 2007) [Diagram: the same Soar architecture with the emotion components added. Appraisals from Short-Term Memory (Situation, Goals) go to the Appraisal Detector; Emotion (.5,.7,0,-.4,.3,…) and Mood (.7,-.2,.8,.3,.6,…) yield Feeling (.9,.6,.5,-.1,.8,…); Feelings appear in Short-Term Memory, and +/-Intensity goes to Reinforcement Learning. The diagram also marks a Knowledge/Architecture division]

Learning task [Figure: task environment with Start and Goal marked]

Learning task: Encoding • North: Passable: false, On path: false, Progress: true • East: Passable: false, On path: true, Progress: true • West: Passable: false, On path: false, Progress: true • South: Passable: true, On path: true, Progress: true

Learning task: Encoding & Appraisal • North: Intrinsic Pleasantness: Low, Goal Relevance: Low, Unpredictability: High • East: Intrinsic Pleasantness: Low, Goal Relevance: High, Unpredictability: High • West: Intrinsic Pleasantness: Low, Goal Relevance: Low, Unpredictability: High • South: Intrinsic Pleasantness: Neutral, Goal Relevance: High, Unpredictability: Low
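The mapping from encoded features to appraisals on these two slides can be reproduced with a few rules. The rules below are inferred only from the four example directions shown (North, East, West, South) and are not taken from the authors' agent; the function name appraise and the dictionary keys are hypothetical.

```python
def appraise(passable, on_path):
    """Map encoded features of one direction to appraisal values.

    Assumption: rules chosen solely to match the four slide examples.
    """
    return {
        # Impassable directions are shown as unpleasant, passable ones as neutral.
        "intrinsic_pleasantness": "Neutral" if passable else "Low",
        # Directions on the path to the goal are shown as goal relevant.
        "goal_relevance": "High" if on_path else "Low",
        # Impassable directions are shown as highly unpredictable.
        "unpredictability": "Low" if passable else "High",
    }


# The encoded features from the Encoding slide.
encoded = {
    "North": {"passable": False, "on_path": False},
    "East": {"passable": False, "on_path": True},
    "West": {"passable": False, "on_path": False},
    "South": {"passable": True, "on_path": True},
}
appraisals = {d: appraise(**f) for d, f in encoded.items()}
```

The Progress feature is true for every direction in the example, so it does not discriminate between them and is left out of this sketch.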

Learning task: Attending, Comprehending & Appraisal • South: Intrinsic Pleasantness: Neutral, Goal Relevance: High, Unpredictability: Low, Conduciveness: High, Control: High, …

Learning task: Tasking

Learning task: Tasking Optimal Subtasks

What is being learned? • When to Attend vs. Task • If Attending, what to Attend to • If Tasking, which subtask to create • When to Intend vs. Ignore

Learning Results

Results: With and without mood

Discussion • Agent learns both internal (tasking) and external (movement) actions • Emotion provides more frequent rewards, so the agent learns faster than with standard RL • Mood “fills in the gaps”, allowing for even faster learning and less variability

Conclusion & Future Work • Demonstrated computational model that integrates emotion and cognitive control • Confirmed emotion can drive reinforcement learning • We have already successfully demonstrated similar learning in a more complex domain • Would like to explore multi-agent scenarios
