Life-long Learning in Sociable Agents A Hierarchical Reinforcement Learning Approach Professor Andrea Thomaz Peng Zhou
Sociable Agents • What are sociable agents? • Essentially, agents that must interact with humans in a social manner • Why sociable agents?
Major Issues • Natural language processing • Required for talking systems • Activity recognition • Not just in the real world • User interface • Agent-human communication, non-linguistic • Life-long learning • Teach, explore, revise • The role of emotions • Not just fluff
My Focus (for the moment) • How to build persistent agents that accumulate concepts and skills "opportunistically" from their environment • The environment includes humans (usually non-experts) • Socially guided learning
Background: Teaching Agents Through Social Interaction • Human input is a long-standing topic in machine learning (e.g. supervised learning, learning by demonstration) • Many existing techniques for "teaching" the robot • Psychological benefits • Ease of use ("how humans want to teach"), increased believability, personal investment
Previous Work: Sophie’s Kitchen • Reinforcement Learning over a domain of ~1000 states • Autonomous exploration • Human input: guidance & state rewards • Communication channel: gazing, explicit actions • Conducted user studies • Results: • Improved learning speed • Insight into how humans like to teach • Fun for the human
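A rough sketch of how guidance and state rewards from a human teacher can be folded into an ordinary tabular Q-learning loop. This is illustrative only, not the actual Sophie's Kitchen code; the names `guidance` and `human_reward` are hypothetical placeholders for the interaction channel described on the slide.

```python
# Minimal sketch (not the original Sophie's Kitchen implementation) of folding
# a human-supplied reward/guidance channel into tabular Q-learning.
from collections import defaultdict
import random

Q = defaultdict(float)          # Q[(state, action)] -> value estimate
ALPHA, GAMMA, EPSILON = 0.3, 0.9, 0.1

def choose_action(state, actions, guidance=None):
    """Prefer an action the human pointed at; otherwise act epsilon-greedily."""
    if guidance in actions:
        return guidance
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, next_state, next_actions, env_reward, human_reward=0.0):
    """One Q-learning step; the human's state reward is simply added to the
    environment reward before the usual temporal-difference update."""
    r = env_reward + human_reward
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```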
Reinforcement Learning • Basic idea: Finding an Optimal Policy • Act in the environment, receive rewards, modify policy accordingly • Typical formulation: an MDP defined by (S, A, R, T) • Advantages: • Desirable statistical properties • Unsupervised, autonomous learning • Limitations • The curse of scale • Poor transfer of knowledge • Rewards can be hard to define
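To make the (S, A, R, T) formulation concrete, here is a minimal value-iteration sketch on a toy two-state MDP. The states, actions, and numbers are invented purely for illustration; Q-learning (as in the sketch above) solves the same problem without needing T explicitly.

```python
# Toy MDP given as (S, A, R, T); value iteration finds the optimal policy.
S = ["hungry", "fed"]
A = ["cook", "wait"]
R = {("hungry", "cook"): -1, ("hungry", "wait"): -2,
     ("fed", "cook"): 0,     ("fed", "wait"): 1}
# T[(s, a)] -> list of (next_state, probability) pairs
T = {("hungry", "cook"): [("fed", 0.9), ("hungry", 0.1)],
     ("hungry", "wait"): [("hungry", 1.0)],
     ("fed", "cook"):    [("fed", 1.0)],
     ("fed", "wait"):    [("fed", 0.8), ("hungry", 0.2)]}

GAMMA = 0.9
V = {s: 0.0 for s in S}
for _ in range(100):   # repeat Bellman backups until V converges
    V = {s: max(R[(s, a)] + GAMMA * sum(p * V[s2] for s2, p in T[(s, a)])
                for a in A)
         for s in S}

policy = {s: max(A, key=lambda a: R[(s, a)] + GAMMA *
                 sum(p * V[s2] for s2, p in T[(s, a)]))
          for s in S}
print(policy)   # e.g. {'hungry': 'cook', 'fed': 'wait'}
```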
Hierarchical Reinforcement Learning • Tackles the scaling and transfer problems • May more closely resemble human cognitive processes, and therefore better match a teacher's expectations of the agent • "I'm trying to teach you how to open doors, darn it!" • Two main components • Hierarchical task structure • State abstraction • Learning the hierarchy (as opposed to handcrafting it) • U-trees, HEXQ, diverse density approaches, …
My Approach: Extend Sophie’s Kitchen to HRL • Basic idea behind Sophie’s Kitchen: unsupervised learning is great, but if non-expert supervision is available, why not make use of it? • Humans typically have insights into the domain • HRL could make very good use of those structures • Challenges extending this to HRL • Adapting non-expert, ambiguous input • Modifying existing HRL algorithms to use adapted input • Skill reuse and retention, evaluation of human suggestions, improvement through practice, personality and trust issues
Current Research Status • Extended the Sophie’s Kitchen domain to a tool-use grid-world domain: Sophie’s Adventure • Basic Features • Navigation • Tool use • Hierarchical Structure • Transferable skills • Large number of states
Current Research Status • Options • Sutton, Precup & Singh (1999) • HRL method that addresses hierarchical task structure • Temporally extended actions consisting of: • (I, π, β), where the initiation set I is a subset of S, π is a local policy, and β is a termination condition mapping states in S to [0, 1] • Learning options is a natural extension of RL learning • Primitive actions can be thought of as one-step options; optimality is preserved when options augment, rather than replace, the primitive action set
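A minimal sketch of an option as the (I, π, β) triple and of the SMDP-style Q-update applied when an option finishes. The notation follows Sutton, Precup & Singh (1999), but the field and function names below are my own, not theirs.

```python
# Sketch of the options framework: an option is (I, pi, beta), and the Q-table
# ranges over both options and primitive (one-step) actions.
from dataclasses import dataclass
from typing import Any, Callable, Set
from collections import defaultdict

State = Any
Action = Any

@dataclass(eq=False)   # identity hash, so options can key the Q-table
class Option:
    initiation: Set[State]                   # I: states where the option may start
    policy: Callable[[State], Action]        # pi: local policy while executing
    termination: Callable[[State], float]    # beta: P(terminate | state), in [0, 1]

Q = defaultdict(float)   # Q[(state, option_or_action)]
ALPHA, GAMMA = 0.3, 0.9

def smdp_update(s, option, s_next, cumulative_reward, k, available):
    """Q-update after an option ran for k steps and accumulated the (already
    discounted) cumulative_reward; primitive actions are the case k = 1."""
    best_next = max((Q[(s_next, o)] for o in available), default=0.0)
    target = cumulative_reward + (GAMMA ** k) * best_next
    Q[(s, option)] += ALPHA * (target - Q[(s, option)])
```

The `eq=False` choice is just a convenience so that each option instance hashes by identity and can be stored directly as a Q-table key.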
Current Research Status • Learning Options • Feature-based • “Clapping” reward channel • Multi-step guidance • Intra-option learning • Keep track of successes and failures • Practice when the user is not around • Aggregate similar options
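A hedged sketch of the intra-option learning idea, reusing the `Option` class and Q-table from the sketch above: every option whose local policy agrees with the primitive action just taken is updated from that single experience. This is also what makes off-line "practice" from logged transitions possible when the user is not around; the names below are illustrative.

```python
# Intra-option Q-learning sketch: one primitive transition (s, a, r, s_next)
# updates every option that could have produced it.
def intra_option_update(Q, options, s, a, r, s_next, alpha=0.3, gamma=0.9):
    for opt in options:
        if s in opt.initiation and opt.policy(s) == a:
            beta = opt.termination(s_next)
            # value of continuing the option vs. terminating and acting greedily
            continue_val = Q[(s_next, opt)]
            terminate_val = max((Q[(s_next, o)] for o in options), default=0.0)
            target = r + gamma * ((1 - beta) * continue_val + beta * terminate_val)
            Q[(s, opt)] += alpha * (target - Q[(s, opt)])
```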
In Progress • Formalize Reward Types • State rewards: “doing good” • Object-specific rewards: “look at this…” • Special rewards: “that’s the way to do it” • Extracting state abstractions from rewards • Object-specific reward -> make object a feature • ???
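A speculative sketch of the "object-specific reward → make object a feature" idea; the reward structure and feature set below are entirely hypothetical, since this part of the design is still in progress.

```python
# Hypothetical handling of an object-specific reward: the rewarded object is
# promoted into the option's feature (state-abstraction) set.
def handle_reward(reward, option_features):
    """reward is a dict like {"value": 1.0, "kind": "object", "object": "kettle"}."""
    if reward.get("kind") == "object":
        option_features.add(reward["object"])   # attend to this object from now on
    return reward["value"]

features = set()
r = handle_reward({"value": 1.0, "kind": "object", "object": "kettle"}, features)
# features == {"kettle"}; downstream option learning can now include this object
```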
Planned Future Work • Options-level state abstraction (MAXQ, HAM, etc.) • Learning options-level state abstraction • U-trees • Involving human input, e.g. pointing out salient features of the environment • The “trust” issue: extending the user-evaluation process to build up “trust” in particular users
Planned Future Work • Actual transfer learning experiments, and exploring how humans could facilitate the process • Carry out user studies on the system • Agent transparency in HRL: how to communicate internal state to the human • Ambiguous user signals • Should the agent ask for clarification?
Conclusion • Sociable agents are, or will be, ubiquitous • These agents should be able to learn from humans • Socially guided learning can both improve the learning speed and “personalize” the agent • Higher-order learning likely necessary for realistic applications • Interesting inquiry into our own social expectations and desires