This presentation by Shichao Ou and Roderic Grupen describes a developmental approach to robot learning, drawing on how infants learn in stages, through maturation processes, and in contexts that parents constrain. It builds on the control basis for composing robot actions, establishes stages of learning by constraining resources over time, and applies prospective learning to a 2D navigation problem. In that problem the prospective learner finds solutions in fewer than 2,000 episodes, compared with 30,000 episodes for flat Q-learning, and it adapts to new contexts while keeping most of its existing policy.
Learning Prospective Robot Behavior Shichao Ou and Roderic Grupen Laboratory for Perceptual Robotics University of Massachusetts Amherst
A Developmental Approach • Infant Learning • In stages • Maturation processes • Parents provide constrained learning contexts • Protect • Easy → Complex • Motion mobile for newborns • Use brightly colored, easy to pick up objects • Use building blocks • Association of words and objects
Application in Robotics • Framework for Robot Developmental Learning • Role of teacher: set up learning contexts that make the target concept conspicuous • Role of robot: acquire concepts, generalize to new contexts by autonomous exploration, provide feedback • Control Basis • Robot actions are created using combinations of <σ,φ,τ> • Establish stages of learning by time-varying constraints on resources • Easy → Complex
Example • Learning to Reach for Objects • Stage 1: SearchTrack • Focus attention using single brightly colored object (σ) • Limit DOF (τ) to use head ONLY • Stage 2: ReachGrab • Limit DOF (τ) to use one arm ONLY • Stage 3: Handedness, Scale-Sensitive (Hart et al., 2008)
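The staged resource constraints in this example can be sketched in a few lines. The sketch assumes that σ names a sensory resource, φ a controller objective, and τ an effector resource; the resource names and the `STAGES` table are hypothetical and chosen only to echo the reaching example, not taken from the authors' implementation.

```python
from itertools import product

# Hypothetical resource sets (illustrative names, not from the slides).
SENSORS = ["red_ball", "blue_block"]            # sigma: what the robot attends to
POTENTIALS = ["track", "reach"]                 # phi: controller objectives
EFFECTORS = ["head", "left_arm", "right_arm"]   # tau: degrees of freedom

# Each stage lists the resources the teacher leaves available (assumed values).
STAGES = {
    1: {"sigma": {"red_ball"}, "phi": {"track"}, "tau": {"head"}},
    2: {"sigma": {"red_ball"}, "phi": {"track", "reach"}, "tau": {"left_arm"}},
    3: {"sigma": set(SENSORS), "phi": set(POTENTIALS), "tau": set(EFFECTORS)},
}

def available_actions(stage):
    """Return every (sigma, phi, tau) combination permitted in the given stage."""
    allowed = STAGES[stage]
    return [
        (s, p, t)
        for s, p, t in product(SENSORS, POTENTIALS, EFFECTORS)
        if s in allowed["sigma"] and p in allowed["phi"] and t in allowed["tau"]
    ]
```

With these assumed tables, stage 1 exposes a single action, stage 2 exposes two, and stage 3 exposes all twelve combinations, so relaxing the constraints over time is what moves the learner from easy to complex.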
Prospective Learning • Infants adapt to new situations by prospectively looking ahead, predicting failure, and then learning a repair strategy
Robot Prospective Learning with Human Guidance • [Diagram: a policy drawn as a chain of states S0, S1, …, Si, …, Sj, …, Sn linked by actions a0, a1, …, an-1, shown three times: the original chain; the chain under a challenge, annotated g(f)=0 and g(f)=1; and the chain with a sub-task chain Si1, …, Sij, …, Sin]
A 2D Navigation Domain Problem • 30x30 map • 6 doors, randomly closed • 6 buttons • 1 start and 1 goal • 3-bit door sensor on robot
Flat Learning Results • Flat Q-Learning • 5-element state • (x, y, door-bit1, door-bit2, door-bit3) • 4 actions • up, down, left, right • Reward • 1 for reaching the goal • -0.01 for every step taken • Learning parameters • α=0.1, γ=1.0, ε=0.1 • Learned solutions after 30,000 episodes
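For readers unfamiliar with the flat baseline, here is a minimal tabular Q-learning sketch that uses the reward and learning parameters listed on this slide. It is not the authors' code: it assumes a small open 4x4 grid with no doors or door bits, gives the goal step a reward of 1 with no step cost, and caps episode length; the function names are made up for illustration.

```python
import random

# The four actions from the slide, as (dx, dy) moves on the grid.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(state, action, size, goal):
    """Move one cell, staying inside the grid; return (next_state, reward, done)."""
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), size - 1)
    y = min(max(state[1] + dy, 0), size - 1)
    nxt = (x, y)
    if nxt == goal:
        return nxt, 1.0, True    # reward 1 for reaching the goal
    return nxt, -0.01, False     # -0.01 for every other step

def q_learn(size=4, start=(0, 0), goal=(3, 3), episodes=3000,
            alpha=0.1, gamma=1.0, epsilon=0.1, max_steps=200, seed=0):
    """Tabular epsilon-greedy Q-learning; unseen entries default to 0."""
    rng = random.Random(seed)
    names = list(ACTIONS)
    q = {}
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.choice(names)                              # explore
            else:
                a = max(names, key=lambda n: q.get((s, n), 0.0))   # exploit
            s2, r, done = step(s, a, size, goal)
            best_next = 0.0 if done else max(q.get((s2, n), 0.0) for n in names)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            s = s2
            if done:
                break
    return q

def greedy_path(q, size=4, start=(0, 0), goal=(3, 3), limit=50):
    """Follow the greedy policy from start and return the visited states."""
    s, path = start, [start]
    for _ in range(limit):
        a = max(ACTIONS, key=lambda n: q.get((s, n), 0.0))
        s, _, done = step(s, a, size, goal)
        path.append(s)
        if done:
            break
    return path
```

In the slide's domain the table would be indexed by the full (x, y, door-bit1, door-bit2, door-bit3) state, so a 30x30 map with three door bits has up to 30 × 30 × 2³ = 7,200 states instead of the 16 used here.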
Prospective Learning • Stage 1 • All doors open • Constrain resources to use only (x,y) sensors • Allow the agent to learn a policy from start to goal • [Diagram: learned policy as a chain of states S0, S1, …, Si, …, Sj, …, Sn, with example actions Down, Right, Right, Up, Right, Right, Right]
Prospective Learning • Stage 2 • Close 1 door • Robot learns the cause of the failure • Robot backtracks and finds an earlier indicator of this cause • Create a sub-task • Learn a new policy for the sub-task • Resume original policy
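One way to picture the Stage 2 steps is the sketch below. It is an assumption-laden illustration, not the authors' algorithm: it treats "finding an earlier indicator" as locating the first time step at which the door-bit reading in failed runs never matches the reading in successful runs, and it treats "create a sub-task, then resume" as splicing extra actions into the existing action sequence. The function names, the episode format, and the detour actions are all hypothetical.

```python
def earliest_indicator(episodes):
    """Find the first step whose observation separates failures from successes.

    episodes: list of (observations, failed) pairs, where observations is a
    list of door-bit tuples, one per step of the original policy.
    Returns (step_index, failing_values) or None if no step separates them.
    """
    horizon = min(len(obs) for obs, _ in episodes)
    for t in range(horizon):
        ok = {obs[t] for obs, failed in episodes if not failed}
        bad = {obs[t] for obs, failed in episodes if failed}
        if bad and ok.isdisjoint(bad):
            return t, bad
    return None

def splice_subtask(policy, step_index, subtask):
    """Insert sub-task actions before policy[step_index]; the rest is unchanged."""
    return policy[:step_index] + subtask + policy[step_index:]

# Made-up logs: the third door bit reads 1 from step 1 onward in the failed run.
episodes = [
    ([(0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0)], False),
    ([(0, 0, 0), (0, 0, 1), (0, 0, 1), (0, 0, 1)], True),
    ([(0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0)], False),
]
original_policy = ["down", "right", "right", "up"]
detour = ["left", "left", "right", "right"]  # hypothetical trip to a button and back

found = earliest_indicator(episodes)
repaired = splice_subtask(original_policy, found[0], detour) if found else original_policy
```

Here the indicator is found at step 1, so the detour is inserted after the first action and the remaining three actions of the original policy run unchanged afterward, which mirrors the "resume original policy" bullet.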
Prospective Learning Results • Learned solutions in fewer than 2,000 episodes
Humanoid Robot Manipulation Domain • Benefits of Prospective Learning • Adapts to new contexts while maintaining the majority of the existing policy • Automatically generates sub-goals • Sub-tasks can be learned in a completely different state space • Supports interactive learning
Conclusion • A developmental view of robot learning • A framework that enables interactive, incremental learning in stages • Extension of the control basis learning framework using the idea of prospective learning