This research explores vision-based reinforcement learning for a real robot to shoot a ball into a goal. The robot's behavior is guided by purposive learning to achieve its goal in interaction with its environment. The study models the robot and environment as synchronized finite state automata, operating in discrete time cycles. Q-learning is applied to help the robot learn optimal strategies. The robot is equipped with a camera and mobile capabilities, interacting with the environment comprising a ball and a goal. The learning process is designed to start with easy tasks, gradually progressing to more complex challenges. Small state-space categorization promotes efficient learning, and a rough ordering strategy is applied to tackle varying difficulty levels. Different states are organized based on proximity to the goal, ensuring a systematic learning progression.
Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, Koh Hosoda Presented by: Subarna Sadhukhan
Reinforcement learning • Vision-based reinforcement learning by which a robot learns to shoot a ball into a goal; the aim is a method that acquires this behavior automatically. • The robot and its environment are modeled as two synchronized finite state automata interacting in discrete time cycles. • Robot: senses the current state and selects an action. Environment: makes a transition to a new state and returns a reward to the robot. • Through this purposive behavior the robot learns to achieve the given goal.
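The cycle described above can be written as a short loop. This is a minimal sketch, not the authors' code; the two-state world, the transition function, and the reward function below are illustrative placeholders.

```python
import random

def interaction_loop(states, actions, transition, reward, policy, steps=5):
    """Run the synchronized discrete-time robot/environment cycle."""
    s = random.choice(states)            # robot senses an initial state
    for t in range(steps):
        a = policy(s)                    # robot selects an action
        s_next = transition(s, a)        # environment moves to a new state
        r = reward(s, a)                 # environment returns a reward
        print(f"t={t}: state={s}, action={a}, reward={r}")
        s = s_next

# Toy world: driving forward eventually reaches the goal state (reward 1).
states = ["far_from_goal", "goal"]
actions = ["forward", "stop", "back"]
interaction_loop(
    states, actions,
    transition=lambda s, a: "goal" if a == "forward" else s,
    reward=lambda s, a: 1 if s == "goal" else 0,
    policy=lambda s: random.choice(actions),
)
```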
Environment – Ball, Goal • Robot – mobile and equipped with a camera • No a priori knowledge of the system is assumed • Assume the robot can discriminate the set S of states and take actions from the set A on the world
Q-learning
Let Q*(s,a) be the expected return for taking action a in situation s:

  Q*(s,a) = r(s,a) + γ Σ_s' T(s,a,s') max_a' Q*(s',a')

where T(s,a,s') is the probability of a transition from s to s' under action a, r(s,a) is the reward for the state-action pair (s,a), and γ is the discounting factor.

Since T and r are not known, the estimate is updated from experience:

  Q(s,a) ← (1 − α) Q(s,a) + α ( r + γ max_a' Q(s',a') )

where r is the actual reward received for taking a, s' is the next state observed, and α is the learning rate.
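The update rule above translates directly into a small tabular Q-learning step. This is a generic sketch, not the authors' implementation; the state/action names and the values of α and γ are illustrative.

```python
from collections import defaultdict

ALPHA = 0.25   # learning rate (illustrative)
GAMMA = 0.9    # discount factor (illustrative)

Q = defaultdict(float)   # Q[(state, action)] -> estimated return, default 0.0

def q_update(s, a, r, s_next, actions):
    """Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a'))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * (r + GAMMA * best_next)

# Example: kicking the ball into the goal from a close-up state earns r = 1.
actions = ["forward_forward", "forward_stop", "stop_forward"]
q_update("ball_large_goal_large", "forward_forward", 1, "goal", actions)
print(Q[("ball_large_goal_large", "forward_forward")])   # 0.25 after one update
```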
State Set • 9×27 + 27 + 9 = 279 states: the ball substate is position (3) × size (3) = 9, the goal substate is position (3) × size (3) × orientation (3) = 27; combining them gives 9×27 states, plus 27 states in which only the goal is visible and 9 in which only the ball is visible.
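As a sanity check on the count, the state set can be enumerated combinatorially. The substate labels below are my own reading of the slide (the paper's exact category names may differ); the arithmetic matches 9×27 + 27 + 9 = 279.

```python
from itertools import product

ball_substates = list(product(["left", "center", "right"],           # position
                              ["small", "medium", "large"]))         # size -> 9
goal_substates = list(product(["left", "center", "right"],           # position
                              ["small", "medium", "large"],          # size
                              ["left-oriented", "front", "right-oriented"]))  # -> 27

states = (
    [("ball", b, "goal", g) for b, g in product(ball_substates, goal_substates)]  # 9*27
    + [("ball-lost", "goal", g) for g in goal_substates]                           # 27
    + [("ball", b, "goal-lost") for b in ball_substates]                           # 9
)
print(len(states))   # 279
```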
Action set • Two motors (left and right wheel) • Each motor – forward, stop, back • 9 actions in all (3 × 3 combinations) • State-action deviation problem: a small physical change near the observer produces a large change in the image, while a large change far from the observer produces only a small change, so the effect of the same action varies widely across states.
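The 9 actions are simply all pairs of per-motor commands; a tiny sketch (command names assumed):

```python
from itertools import product

motor_commands = ["forward", "stop", "back"]
actions = list(product(motor_commands, motor_commands))   # (left, right) pairs
print(len(actions))   # 3 * 3 = 9
```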
Learning from Easy Missions • Delayed reinforcement problem: there is no explicit teacher signal, since a reward is received only after the ball is kicked into the goal; r(s,a) = 1 only in the goal state and 0 otherwise. • Construct the learning schedule so that the robot learns in easy situations at the early stages and only later in more difficult ones: Learning from Easy Missions (LEM), sketched below.
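One way to realize such a schedule, as a hedged sketch: run Q-learning episodes whose start states are drawn from progressively harder sets. The set names, episode count, and the assumption that a large ball/goal image marks an easy start are illustrative, not the paper's exact setup.

```python
import random

def lem_schedule(ordered_state_sets, episodes_per_set, run_episode):
    """ordered_state_sets: easiest starting situations first."""
    for state_set in ordered_state_sets:
        for _ in range(episodes_per_set):
            start_state = random.choice(state_set)
            run_episode(start_state)      # one Q-learning episode from this start

# Illustrative ordering: ball and goal look large when the robot is close.
easy   = ["ball_large_goal_large"]
medium = ["ball_medium_goal_medium"]
hard   = ["ball_small_goal_small"]
lem_schedule([easy, medium, hard], episodes_per_set=100,
             run_episode=lambda s: None)  # placeholder episode runner
```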
Complexity analysis • k states on the path to the goal, m possible actions • Plain Q-learning: the reward must propagate back from the goal, so it takes roughly m trials to learn the state next to the goal, roughly m^2 for the state before that, and so on, hence on the order of m^k steps overall (for example, with m = 9 and k = 5 that is about 9^5 ≈ 59,000) • LEM: the robot gets a reward at each step of the schedule, so learning takes on the order of m·k steps (45 for the same example)
Implementing LEM
Rough ordering of easy situations: the ball's image grows small -> medium -> large as the robot closes in, so the ball size roughly orders situations by how near they are to scoring. The state space is categorized into sub-states such as ball size, ball position, and so on, and the ordering is done over one such sub-state rather than over exact states. With n the size of the state space and m the number of ordered sets, applying LEM means the reward only has to propagate across roughly n/m states at each stage rather than across the whole space at once, which reduces the learning time accordingly (see the sketch below).
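A sketch of the rough-ordering step, under the assumption that a larger ball image marks an easier starting situation; the dictionary-based state representation and labels are illustrative, not the paper's.

```python
def order_by_ball_size(states):
    """Return [S1, S2, S3]: states with a large, medium, small ball image."""
    order = ["large", "medium", "small"]            # easy -> hard (assumed)
    return [[s for s in states if s.get("ball_size") == size] for size in order]

states = [
    {"ball_size": "large",  "goal_size": "large"},
    {"ball_size": "medium", "goal_size": "medium"},
    {"ball_size": "small",  "goal_size": "small"},
]
S1, S2, S3 = order_by_ball_size(states)
print(len(S1), len(S2), len(S3))   # 1 1 1 for this toy state list
```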
When to shift • S1 is the set of states nearest the goal, S2 is the next, and so on. • Shifting occurs when the Q values for the current set have stopped improving, i.e. when the change in max_a Q(s,a) over the states of the current set, measured over a time interval of Δt steps, falls below a small threshold. • We suppose that the current state set S(k-1) can make transitions only to its neighboring sets. A sketch of one possible shift test follows.
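A hedged sketch of one possible shift test consistent with the description: track the summed max-Q value over the current set and shift once it stops changing over a window of Δt steps. The saturation test and threshold are my assumptions, not necessarily the paper's exact criterion.

```python
def should_shift(q_sum_history, delta_t, epsilon=1e-3):
    """q_sum_history[t] = sum over s in S(k) of max_a Q(s, a) after step t."""
    if len(q_sum_history) <= delta_t:
        return False                      # not enough history yet
    return abs(q_sum_history[-1] - q_sum_history[-1 - delta_t]) < epsilon

# Example: the summed Q value saturates, so a shift to the next set is due.
history = [0.0, 0.4, 0.7, 0.85, 0.90, 0.9005, 0.9008]
print(should_shift(history, delta_t=2))   # True: change over last 2 steps < 1e-3
```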