
Modelling Motivation for Experience-Based Attention Focus in Reinforcement Learning






Presentation Transcript


1. Modelling Motivation for Experience-Based Attention Focus in Reinforcement Learning
PhD Thesis Defence, July 2007
Candidate: Kathryn Merrick, School of Information Technologies, University of Sydney
Supervisor: Prof. Mary Lou Maher, Key Centre for Design Computing and Cognition, University of Sydney

2. Introduction
• Learning environments may be complex, with many states and possible actions
• The tasks to be learned may change over time
• It may be difficult to predict tasks in advance
• Doing ‘everything’ may be infeasible
• How can artificial agents focus attention to develop behaviours in complex, dynamic environments?
• This thesis considers this question in conjunction with reinforcement learning

3. Objectives
[Diagram: a behavioural cycle through states S1–S4 via actions A1–A4]
1. Develop models of motivation that focus attention based on experiences
2. Model complex, dynamic environments using a representation that enables adaptive behaviour
3. Develop learning agents with three aspects of attention focus:
   • Behavioural cycles
   • Adaptive behaviour
   • Multi-task learning
4. Develop metrics for comparing the adaptability and multi-task learning behaviour of MRL agents
5. Evaluate the performance and scalability of MRL agents using different models of motivation and different RL approaches

4. Modelling Motivation as Experience-Based Reward
Interest-based motivation: Rm(t) = I(t)
• Compute observations and events OS(t), ES(t)
• Task selection using a self-organising map
• Compute experience-based reward using:
   • Stanley’s model of habituation
   • Wundt Curve
• No arbitration required
Interest and competence motivation: Rm(t) = max(I(t), C(t))
• Compute observations and events OS(t), ES(t)
• Task selection using a self-organising map
• Compute experience-based reward using:
   • Policy error
   • Deci and Ryan’s model of optimal challenges
• Arbitrate by taking the maximum of interest and competence motivation
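A minimal sketch of these two reward models, assuming my own function names and illustrative constants; the thesis's actual habituation and competence computations are more detailed than this:

```python
def habituate(novelty, stimulus, n0=1.0, alpha=1.05, tau=14.3):
    """One discrete step of Stanley's habituation model,
    tau * dN/dt = alpha * (N0 - N) - S: novelty N for a stimulus
    decays while the stimulus keeps arriving and recovers towards
    N0 when it stops. Constants here are illustrative."""
    return novelty + (alpha * (n0 - novelty) - stimulus) / tau

def motivation_reward(interest, competence=None):
    """Experience-based reward. Interest-only model: Rm(t) = I(t).
    Combined model: Rm(t) = max(I(t), C(t)); taking the maximum is
    the whole arbitration step."""
    return interest if competence is None else max(interest, competence)

# Repeated stimulation habituates (novelty falls), so interest in a
# frequently repeated task eventually drops.
n = 1.0
for _ in range(20):
    n = habituate(n, stimulus=1.0)
print(round(n, 3))  # well below the starting novelty of 1.0
```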

5. Representing Complex, Dynamic Environments
P = {P1, P2, P3, …, Pi, …}
S → <sensations>
<sensations> → <PiSensations> <sensations> | ε
<PiSensations> → <sj> <PiSensations> | ε
<sj> → <number> | <string>
<number> → 1 | 2 | 3 | ...
<string> → ...
A → <actions>
<actions> → <PiActions> <actions> | ε
<PiActions> → <Aj> <PiActions> | ε
<Aj> → ...
Example sensed states and action sets:
S(1) = (<visiblePick:1> <visibleForge:1> <visibleSmithy:1>)
A(1) = {A(pick-up, pick), A(pick-up, forge), A(pick-up, smithy)}
S(2) = (<visibleAxe:1> <visibleLathe:1>)
A(2) = {A(pick-up, axe), A(pick-up, lathe)}
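A small sketch of what this grammar yields in practice, using plain Python dictionaries; the helper name `available_actions` is my own, not the thesis's API:

```python
# A sensed state is a variable-length collection of labelled sensations,
# grouped by the objects P_i currently in view, so its shape changes as
# the world does.
s1 = {"visiblePick": 1, "visibleForge": 1, "visibleSmithy": 1}   # S(1)
a1 = [("pick-up", "pick"), ("pick-up", "forge"), ("pick-up", "smithy")]

s2 = {"visibleAxe": 1, "visibleLathe": 1}                        # S(2)
a2 = [("pick-up", "axe"), ("pick-up", "lathe")]

def available_actions(sensed_state):
    """Actions are attached to the visible objects, so the action set
    grows and shrinks with the sensed state."""
    return [("pick-up", label.replace("visible", "").lower())
            for label, value in sensed_state.items() if value]

assert available_actions(s2) == a2
```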

6. Metrics and Evaluation
• A classification of different types of MRL and the role played by motivation in these approaches
• Metrics for comparing learned behavioural cycles in terms of adaptability and multi-task learning
• Evaluation of the performance and scalability of MRL agents using different:
   • Models of motivation
   • RL approaches
   • Types of environment
• New approaches to the design of non-player characters for games, which can adapt in open-ended virtual worlds

7. Experiment 1
[Charts: behavioural variety and behavioural complexity]
• Task-oriented learning emerges using a task-independent motivation signal to direct learning
• The greatest behavioural variety in simple environments is achieved by MFRL agents
• The greatest behavioural complexity is achieved by MFRL and MHRL agents, which can interleave solutions to multiple tasks

8. Experiments 2–4
[Charts comparing MFRL, MMORL and MHRL agents]
• MFRL agents are the most adaptable and the most scalable as the number of tasks in the environment increases
• MMORL agents are the most scalable as the complexity of tasks increases
• Agents motivated by interest and competence achieve greater adaptability, and show increased behavioural variety and complexity

9. Conclusions
• MRL agents can learn task-oriented behavioural cycles using a task-independent motivation signal
• The greatest behavioural variety and complexity in simple environments is achieved by MFRL agents
• The greatest adaptability is displayed by MRL agents motivated by interest and competence
• The most scalable approach when recall is required uses MMORL

10. Limitations and Future Work
• Scalability of MRL in other types of environments
• Additional approaches to motivation:
   • Biological models
   • Cognitive models
   • Social models
   • Combined models
• Motivation in other machine learning settings:
   • Motivated supervised learning
   • Motivated unsupervised learning
• Additional metrics for MRL:
   • Usefulness
   • Intelligence
   • Rationality (Linden, 2007)

11. Tasks
[Diagram: the agent senses the world state through sensors, forming a sensed state and an observation]
• Maintenance tasks: defined by observations
• Achievement tasks: defined by events
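A brief sketch of the observation/event distinction; representing an event as the before/after pairs of changed sensations is my assumption for illustration:

```python
def event(prev_obs, obs):
    """Sketch: an event is the set of sensation changes between
    successive observations. Maintenance tasks are defined by an
    observation holding; achievement tasks by an event occurring."""
    keys = set(prev_obs) | set(obs)
    return {k: (prev_obs.get(k), obs.get(k))
            for k in keys if prev_obs.get(k) != obs.get(k)}

# The achievement event 'food appears' when the food machine is used:
print(event({"location": "FoodMachine"},
            {"location": "FoodMachine", "Food": 1}))
# -> {'Food': (None, 1)}
```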

12. Behavioural Cycles
[Diagram: behavioural cycles through states S1 … Sn via actions A1 … An]
Example cycle:
S1 = (<location:Food Machine> <Food Machine:1>), A1 = use(Food Machine)
S2 = (<location:Food Machine> <Food Machine:1> <Food:1>), A2 = move to(Food)
S3 = (<location:Food> <Food Machine:1> <Food:1>), A3 = use(Food)
S4 = (<location:NO_OBJECT> <Food Machine:1> <Food:1>), A4 = move to(Food Machine), returning to S1
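A short sketch of how such cycles could be detected in a state-action trace; the function name and trace encoding are assumptions for illustration:

```python
def extract_cycles(trace):
    """Find behavioural cycles in a state-action trace: whenever a
    state repeats, the actions since its last visit form one cycle.
    States are assumed hashable (e.g. tuples of sensations)."""
    last_seen, cycles = {}, []
    for i, (state, action) in enumerate(trace):
        if state in last_seen:
            cycles.append([a for _, a in trace[last_seen[state]:i]])
        last_seen[state] = i
    return cycles

# The food-machine loop above repeats S1 after four actions:
trace = [("S1", "use(Food Machine)"), ("S2", "move to(Food)"),
         ("S3", "use(Food)"), ("S4", "move to(Food Machine)"),
         ("S1", "use(Food Machine)")]
print(extract_cycles(trace))  # one 4-action cycle
```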

13. Agent Models
[Architecture diagrams for the reflex, MFRL and MMORL agent models: the world state W(t) reaches the agent through sensors as the sensed state S(t); observations O(t) and events E(t) drive the motivation process M, which produces the experience-based reward Rm(t); the RL or MORL process updates the policy π(t) and behaviours B(t); the selected action A(t) is executed through effectors.]
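A sketch of the sense-motivate-learn-act flow these diagrams describe; the `agent`/`world` interface is hypothetical, named only for illustration:

```python
def mrl_step(agent, world):
    """One sense-motivate-learn-act step of a motivated RL agent:
    W(t) -> sensors -> S(t); observations O(t) and events E(t) feed
    the motivation process M, which emits Rm(t); an RL update revises
    the policy pi(t); the action A(t) goes to the effectors."""
    s = agent.sense(world)                  # S(t) from sensors
    obs, events = agent.observe(s)          # O(t), E(t)
    reward = agent.motivate(obs, events)    # Rm(t), experience-based
    agent.learn(s, reward)                  # update pi(t), e.g. Q-learning
    action = agent.act(s)                   # A(t) from the current policy
    world.apply(action)                     # effectors change W(t)
```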

14. Sensitivity
Change in interest with (a) ρ+ = ρ- = 5, F+min = 0.5, F-min = 1.5 and (b) ρ+ = ρ- = 30, F+min = 0.5, F-min = 1.5
Change in interest with (a) ρ+ = ρ- = 10, F+min = 0.1, F-min = 1.9 and (b) ρ+ = ρ- = 10, F+min = 0.9, F-min = 1.1
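For reference, a sketch of one common form of the Wundt curve as a difference of two sigmoids, plotted at the slide's four parameter settings; the exact functional form and the `wundt` name are my assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

def wundt(novelty, rho, f_pos_min, f_neg_min, f_max=1.0):
    """Interest as a positive sigmoid (attraction to novelty,
    midpoint F+min) minus a negative sigmoid (aversion to high
    novelty, midpoint F-min); both share gradient rho here."""
    f_pos = f_max / (1.0 + np.exp(-rho * (novelty - f_pos_min)))
    f_neg = f_max / (1.0 + np.exp(-rho * (novelty - f_neg_min)))
    return f_pos - f_neg

n = np.linspace(0.0, 2.0, 200)
# The slide's four settings: larger rho sharpens the interest peak;
# moving F+min and F-min together or apart narrows or widens it.
for rho, fp, fn in [(5, 0.5, 1.5), (30, 0.5, 1.5),
                    (10, 0.1, 1.9), (10, 0.9, 1.1)]:
    plt.plot(n, wundt(n, rho, fp, fn),
             label=f"rho+=rho-={rho}, F+min={fp}, F-min={fn}")
plt.xlabel("novelty"); plt.ylabel("interest"); plt.legend(); plt.show()
```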

15. Metrics
• A task is complete when its defining observation or event is achieved
• A task is learned when the standard deviation of the number of actions in the behavioural cycles completing the task is less than some error threshold
• Behavioural variety measures the number of tasks learned
• Behavioural complexity measures the number of actions in a behavioural cycle
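A minimal sketch of these metrics in Python; the threshold value and function names are my own:

```python
import statistics

def is_learned(cycle_lengths, threshold=1.0):
    """A task counts as learned once the standard deviation of the
    number of actions in the cycles completing it falls below an
    error threshold, i.e. the behavioural cycle has stabilised."""
    return len(cycle_lengths) > 1 and statistics.stdev(cycle_lengths) < threshold

def behavioural_variety(cycle_lengths_by_task, threshold=1.0):
    """Behavioural variety: the number of tasks learned."""
    return sum(is_learned(v, threshold)
               for v in cycle_lengths_by_task.values())

def behavioural_complexity(cycle):
    """Behavioural complexity: the number of actions in one cycle."""
    return len(cycle)

# The stable 4-action food cycle counts as learned; the erratic one doesn't.
lengths = {"make-food": [4, 4, 4, 4], "use-lathe": [7, 3, 9]}
print(behavioural_variety(lengths))  # -> 1
```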
