This research explores vision-based reinforcement learning for a real robot to shoot a ball into a goal. The robot's behavior is guided by purposive learning to achieve its goal in interaction with its environment. The study models the robot and environment as synchronized finite state automata, operating in discrete time cycles. Q-learning is applied to help the robot learn optimal strategies. The robot is equipped with a camera and mobile capabilities, interacting with the environment comprising a ball and a goal. The learning process is designed to start with easy tasks, gradually progressing to more complex challenges. Small state-space categorization promotes efficient learning, and a rough ordering strategy is applied to tackle varying difficulty levels. Different states are organized based on proximity to the goal, ensuring a systematic learning progression.
Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, Koh Hosoda Presented by: Subarna Sadhukhan
Reinforcement learning • Vision-based reinforcement learning by which a robot learns to shoot a ball into a goal; the aim is a method that acquires this behavior automatically. • The robot and its environment are modeled as two synchronized finite state automata interacting in discrete time cycles. • Robot: senses the current state and selects an action. Environment: makes a transition to a new state and returns a reward to the robot. • Through this purposive behavior the robot learns to achieve the given goal.
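The cycle described above can be written as a short loop. This is a minimal sketch, not the authors' code; the two-state world, the transition function, and the reward function below are illustrative placeholders.

```python
import random

def interaction_loop(states, actions, transition, reward, policy, steps=5):
    """Run the synchronized discrete-time robot/environment cycle."""
    s = random.choice(states)            # robot senses an initial state
    for t in range(steps):
        a = policy(s)                    # robot selects an action
        s_next = transition(s, a)        # environment moves to a new state
        r = reward(s, a)                 # environment returns a reward
        print(f"t={t}: state={s}, action={a}, reward={r}")
        s = s_next

# Toy world: driving forward eventually reaches the goal state (reward 1).
states = ["far_from_goal", "goal"]
actions = ["forward", "stop", "back"]
interaction_loop(
    states, actions,
    transition=lambda s, a: "goal" if a == "forward" else s,
    reward=lambda s, a: 1 if s == "goal" else 0,
    policy=lambda s: random.choice(actions),
)
```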
Environment – Ball, Goal • Robot – mobile and equipped with a camera • No a priori knowledge of the system is assumed • Assume the robot can discriminate the set S of states and take actions from the set A on the world
Q-learning
Let Q*(s,a) be the expected return for taking action a in situation s:

  Q*(s,a) = r(s,a) + γ Σ_s' T(s,a,s') max_a' Q*(s',a')

where T(s,a,s') is the probability of a transition from s to s' under action a, r(s,a) is the reward for the state-action pair (s,a), and γ is the discounting factor.

Since T and r are not known, the estimate is updated from experience:

  Q(s,a) ← (1 − α) Q(s,a) + α ( r + γ max_a' Q(s',a') )

where r is the actual reward received for taking a, s' is the next state observed, and α is the learning rate.
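The update rule above translates directly into a small tabular Q-learning step. This is a generic sketch, not the authors' implementation; the state/action names and the values of α and γ are illustrative.

```python
from collections import defaultdict

ALPHA = 0.25   # learning rate (illustrative)
GAMMA = 0.9    # discount factor (illustrative)

Q = defaultdict(float)   # Q[(state, action)] -> estimated return, default 0.0

def q_update(s, a, r, s_next, actions):
    """Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a'))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * (r + GAMMA * best_next)

# Example: kicking the ball into the goal from a close-up state earns r = 1.
actions = ["forward_forward", "forward_stop", "stop_forward"]
q_update("ball_large_goal_large", "forward_forward", 1, "goal", actions)
print(Q[("ball_large_goal_large", "forward_forward")])   # 0.25 after one update
```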
State Set • 9×27 + 27 + 9 = 279 states: the ball substate is position (3) × size (3) = 9, the goal substate is position (3) × size (3) × orientation (3) = 27; combining them gives 9×27 states, plus 27 states in which only the goal is visible and 9 in which only the ball is visible.
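As a sanity check on the count, the state set can be enumerated combinatorially. The substate labels below are my own reading of the slide (the paper's exact category names may differ); the arithmetic matches 9×27 + 27 + 9 = 279.

```python
from itertools import product

ball_substates = list(product(["left", "center", "right"],           # position
                              ["small", "medium", "large"]))         # size -> 9
goal_substates = list(product(["left", "center", "right"],           # position
                              ["small", "medium", "large"],          # size
                              ["left-oriented", "front", "right-oriented"]))  # -> 27

states = (
    [("ball", b, "goal", g) for b, g in product(ball_substates, goal_substates)]  # 9*27
    + [("ball-lost", "goal", g) for g in goal_substates]                           # 27
    + [("ball", b, "goal-lost") for b in ball_substates]                           # 9
)
print(len(states))   # 279
```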
Action set • Two motors (left and right wheel) • Each motor – forward, stop, back • 9 actions in all (3 × 3 combinations) • State-action deviation problem: a small physical change near the observer produces a large change in the image, while a large change far from the observer produces only a small change, so the effect of the same action varies widely across states.
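The 9 actions are simply all pairs of per-motor commands; a tiny sketch (command names assumed):

```python
from itertools import product

motor_commands = ["forward", "stop", "back"]
actions = list(product(motor_commands, motor_commands))   # (left, right) pairs
print(len(actions))   # 3 * 3 = 9
```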
Learning from Easy Missions • Delayed reinforcement problem: there is no explicit teacher signal, since a reward is received only after the ball is kicked into the goal; r(s,a) = 1 only in the goal state and 0 otherwise. • Construct the learning schedule so that the robot learns in easy situations at the early stages and only later in more difficult ones: Learning from Easy Missions (LEM), sketched below.
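One way to realize such a schedule, as a hedged sketch: run Q-learning episodes whose start states are drawn from progressively harder sets. The set names, episode count, and the assumption that a large ball/goal image marks an easy start are illustrative, not the paper's exact setup.

```python
import random

def lem_schedule(ordered_state_sets, episodes_per_set, run_episode):
    """ordered_state_sets: easiest starting situations first."""
    for state_set in ordered_state_sets:
        for _ in range(episodes_per_set):
            start_state = random.choice(state_set)
            run_episode(start_state)      # one Q-learning episode from this start

# Illustrative ordering: ball and goal look large when the robot is close.
easy   = ["ball_large_goal_large"]
medium = ["ball_medium_goal_medium"]
hard   = ["ball_small_goal_small"]
lem_schedule([easy, medium, hard], episodes_per_set=100,
             run_episode=lambda s: None)  # placeholder episode runner
```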
Complexity analysis • k states on the path to the goal, m possible actions • Plain Q-learning: the reward must propagate back from the goal, so it takes roughly m trials to learn the state next to the goal, roughly m^2 for the state before that, and so on, hence on the order of m^k steps overall (for example, with m = 9 and k = 5 that is about 9^5 ≈ 59,000) • LEM: the robot gets a reward at each step of the schedule, so learning takes on the order of m·k steps (45 for the same example)
Implementing LEM
Rough ordering of easy situations: the ball's image grows small -> medium -> large as the robot closes in, so the ball size roughly orders situations by how near they are to scoring. The state space is categorized into sub-states such as ball size, ball position, and so on, and the ordering is done over one such sub-state rather than over exact states. With n the size of the state space and m the number of ordered sets, applying LEM means the reward only has to propagate across roughly n/m states at each stage rather than across the whole space at once, which reduces the learning time accordingly (see the sketch below).
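A sketch of the rough-ordering step, under the assumption that a larger ball image marks an easier starting situation; the dictionary-based state representation and labels are illustrative, not the paper's.

```python
def order_by_ball_size(states):
    """Return [S1, S2, S3]: states with a large, medium, small ball image."""
    order = ["large", "medium", "small"]            # easy -> hard (assumed)
    return [[s for s in states if s.get("ball_size") == size] for size in order]

states = [
    {"ball_size": "large",  "goal_size": "large"},
    {"ball_size": "medium", "goal_size": "medium"},
    {"ball_size": "small",  "goal_size": "small"},
]
S1, S2, S3 = order_by_ball_size(states)
print(len(S1), len(S2), len(S3))   # 1 1 1 for this toy state list
```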
When to shift • S1 is the set of states nearest the goal, S2 is the next, and so on. • Shifting occurs when the Q values for the current set have stopped improving, i.e. when the change in max_a Q(s,a) over the states of the current set, measured over a time interval of Δt steps, falls below a small threshold. • We suppose that the current state set S(k-1) can make transitions only to its neighboring sets. A sketch of one possible shift test follows.
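A hedged sketch of one possible shift test consistent with the description: track the summed max-Q value over the current set and shift once it stops changing over a window of Δt steps. The saturation test and threshold are my assumptions, not necessarily the paper's exact criterion.

```python
def should_shift(q_sum_history, delta_t, epsilon=1e-3):
    """q_sum_history[t] = sum over s in S(k) of max_a Q(s, a) after step t."""
    if len(q_sum_history) <= delta_t:
        return False                      # not enough history yet
    return abs(q_sum_history[-1] - q_sum_history[-1 - delta_t]) < epsilon

# Example: the summed Q value saturates, so a shift to the next set is due.
history = [0.0, 0.4, 0.7, 0.85, 0.90, 0.9005, 0.9008]
print(should_shift(history, delta_t=2))   # True: change over last 2 steps < 1e-3
```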