

  1. Hybridizing evolutionary computation and reinforcement learning for the design of almost universal controllers for autonomous robots (Neurocomputing 2009) Dario Maravall, Javier de Lope, Jose Antonio Martin H. Soft Computing Lab. Yongjun Kim 7th May, 2009

  2. Outline • Introduction • Proposed Approach • Guidelines for the design of an optimum controller for autonomous robots by the combination of evolutionary algorithms and RL • A hard motion robot control problem • Evolving the table of situations-actions • The transition from innate behavior to knowledge-based behavior by means of on-line experience • Experimental results • Conclusions and further research work • Discussion

  3. Introduction: How important are representations in robotics? • A representation of the external world is necessary for an agent to carry out a particular task in a specific world. • The nature and the characteristics of representations depend strongly on the physical nature of the agent itself. • Robot representations are computational, while human mind representations are “phenomenal”. • The development of intelligent robots therefore requires obtaining proper computational representations of the robot’s environments.

  4. Introduction: How important are representations in robotics? • The dominant approaches are based on the manipulation of mathematical models of the environment with varying levels of formal representation. • Extremely demanding as regards the reasoning and perceptual abilities of robots. • No robot has ever been able to navigate in truly real environments. • The question then is how to progress toward higher levels of robot autonomy. • Reduce the complexity of both the reasoning and the perceptual tasks to be accomplished by the robots. • Mainly through a direct coupling between perception and action. • Adaptation and learning have become the central issues concerning world representations.

  5. Introduction: How important are representations in robotics? • In the reactive approach, robots are practically restricted to the interaction between perception and action. • The close coupling between perception and action severely limits the activities and the possibilities of reactive robots. • Reactive robots use only a primitive reasoning ability. • Reactive navigation schemes nevertheless yield robots able to adapt to very dynamic, uncertain and unknown environments. • It is almost compulsory to integrate reactive schemes into any autonomous navigation system (a hybrid navigation system). • The representation issue in robotics coincides with the sensory-motor coordination problem, which in the end is precisely the problem of designing the robotic controllers.

  6. Guidelines for the design of an optimum controller for autonomous robots by the combination of evolutionary algorithms and RL • Current robotic systems require controllers able to solve complex problems in uncertain and dynamic environments. • Reinforcement Learning (RL) is particularly attractive and efficient in the very common and hard situation in which the designer does not have all the necessary information. • The RL approach seamlessly fits the usual modeling of the robot-environment interaction as a Markov decision process (MDP). • Control techniques for MDPs, like dynamic programming, may be applied to the design of the robot controller. • The main drawback of RL is the curse of dimensionality. • Hence the proposal to initialize the standard Q-learning algorithm with a look-up table of situations-actions obtained by means of an evolutionary algorithm.

  7. Guidelines for the design of an optimum controller for autonomous robots by the combination of evolutionary algorithms and RL • The proposed method consists of the following stages: • The selection and subsequent granulation of the state variables involved, which is a designer-dependent task. • The obtaining of the knowledge rule base by means of a genetic algorithm. • The starting of the standard Q-learning algorithm to build its Q-table.

  8. Guidelines for the design of an optimum controller for autonomous robots by the combination of evolutionary algorithms and RL • The first step is (1) the identification and selection and (2) the subsequent granulation of the state variables associated with the robot-environment pair. • The granulation of state variables can be done by applying one of the following: • A fuzzy set concept: a fuzzy knowledge rule base (FKRB). • A Boolean sets-based granulation: a Boolean KRB (BKRB). • The next step is obtaining the knowledge base of production or control rules of the type “if situation then action”. • When the number of possible control rules is high, the search for the optimum knowledge rule base can only be undertaken by efficient, parallel, population-based search techniques such as evolutionary algorithms. • The last step is starting the standard Q-learning algorithm, which exploits the knowledge provided by the genetic algorithm.
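
A minimal Python sketch (not from the paper) of a Boolean granulation and of a knowledge rule base of “if situation then action” entries; the thresholds, labels and action names are illustrative assumptions:

```python
# Illustrative sketch of a Boolean (crisp) granulation and of a knowledge rule
# base of "if situation then action" entries. Thresholds, labels and action
# names are assumptions, not taken from the paper.

def granulate_distance(d):
    # Crisp granulation of a distance error into three labels.
    if d < 0.05:
        return "Z"   # zero
    if d < 0.5:
        return "S"   # small
    return "B"       # big

# Boolean knowledge rule base (BKRB): one action per discrete situation.
BKRB = {
    ("Z", "Z"): "stop",
    ("S", "Z"): "move_forward",
    ("B", "Z"): "move_forward",
    ("B", "S"): "turn_left",
    # ... one entry per possible situation
}

def control(distance_error, orientation_label):
    situation = (granulate_distance(distance_error), orientation_label)
    return BKRB.get(situation, "stop")   # default for uncovered situations
```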

  9. A hard motion robot control problem • A two-link L-shaped robot moving in a cluttered environment with polygonal obstacles. • Allow several degrees-of-freedom: • The linear movement along the XY Cartesian axes of the robot’s middle joint (x, y). • The rotational movement for controlling the robot’s orientation (Ф). • Two additional independent rotational movements around the central joint (θ1, θ2).

  10. Evolving the table of situations-actions • Distinguish two different groups or classes of state variables to describe all the possible states: • Variables for reaching a desired goal position. • Variables for collision avoidance. • Use three different state variables: • εd : the error between the current position and target position. • Zero (Z), small (S), big (B). • εΦ : the error between the current orientation and target orientation. • Zero (Z), small (S), big (B). • ρo : the distance to the nearest obstacle. • Very small and could collide (VS), small (S), big (B), very far (VB). • Discretize actions: • Positional movement : move forward, move backward, move left, move right, and four diagonal movements. • Rotational movement : turn left, turn right.
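
One way to encode this discretization in Python is sketched below; only the labels and the counts (3x3x4 situations, 10 movements) come from the slide, while the numeric thresholds are assumptions:

```python
# Sketch of the state/action discretization for the simple case
# (3 x 3 x 4 = 36 situations, 10 movements). Numeric thresholds are assumptions.

def discretize_state(eps_d, eps_phi, rho_o):
    d   = "Z" if eps_d < 0.05 else ("S" if eps_d < 0.5 else "B")
    phi = "Z" if abs(eps_phi) < 0.05 else ("S" if abs(eps_phi) < 0.5 else "B")
    if rho_o < 0.2:
        obst = "VS"   # very small: could collide
    elif rho_o < 1.0:
        obst = "S"
    elif rho_o < 3.0:
        obst = "B"
    else:
        obst = "VB"   # very far
    return (d, phi, obst)

ACTIONS = [
    "forward", "backward", "left", "right",                              # axis moves
    "forward_left", "forward_right", "backward_left", "backward_right",  # diagonals
    "turn_left", "turn_right",                                           # rotations
]
```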

  11. Evolving the table of situations-actions • Encoding mechanism • Take the complete knowledge rule base or look-up table as the genotype of an individual (i.e. the Pittsburgh approach). • Only the mutation operator is applied, because a crossover operator could not be clearly defined for the proposed problem. • Each action has two possible adjacent actions defined in an internal table. • This internal table is fixed and “circular”. • When one of the bounds is encountered, the other bound is used as the value for the mutation. • Replacement policy • The n worst individuals are replaced by new randomly generated ones.
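
The encoding, mutation and replacement scheme can be sketched in Python as follows; the action names, the adjacency order, the mutation rate and n are illustrative assumptions:

```python
import itertools
import random

# Sketch (not the authors' code) of the Pittsburgh-style encoding, the circular
# adjacent-action mutation and the replace-the-worst policy. Action names,
# adjacency order and parameters are assumptions.

SITUATIONS = list(itertools.product(["Z", "S", "B"], ["Z", "S", "B"],
                                    ["VS", "S", "B", "VB"]))   # 36 situations

# Fixed, circular table of adjacent actions: a gene may only mutate to one of
# its two neighbours; at a bound, the opposite bound is used (wrap-around).
ADJACENT = ["forward", "forward_right", "right", "backward_right", "backward",
            "backward_left", "left", "forward_left", "turn_left", "turn_right"]

def random_individual():
    # One individual = a complete situation -> action look-up table.
    return {s: random.choice(ADJACENT) for s in SITUATIONS}

def mutate(individual, rate=0.05):
    child = dict(individual)
    for situation, action in child.items():
        if random.random() < rate:
            i = ADJACENT.index(action)
            child[situation] = ADJACENT[(i + random.choice([-1, 1])) % len(ADJACENT)]
    return child

def next_generation(population, fitness, n_worst=2):
    # Replace the n worst individuals with new random ones; mutate the rest.
    ranked = sorted(population, key=fitness, reverse=True)
    return ([mutate(ind) for ind in ranked[:-n_worst]]
            + [random_individual() for _ in range(n_worst)])
```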

  12. The transition from innate behavior to knowledge-based behavior by means of on-line experience • A stationary deterministic policy πd commits to a single action choice per state. • πd : S -> A, where πd(s) indicates the action that the agent takes in state s. • The goal is to produce a robot controller that is initially based on its innate behavior and experiences a transition to a knowledge-based behavior by means of on-line experience. • Use the RL paradigm. • A classical temporal difference (TD) learning rule is used: Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. • This basic update rule can be directly derived from the formula of the expected value.
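
In Python, this tabular update can be sketched as follows (the values of α and γ are assumptions):

```python
from collections import defaultdict

# Sketch of the tabular TD (Q-learning) update quoted above.
# The learning rate alpha and discount factor gamma are assumptions.
Q = defaultdict(float)   # Q[(state, action)] -> estimated return

def td_update(state, action, reward, next_state, actions, alpha=0.1, gamma=0.95):
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```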

  13. The transition from innate behavior to knowledge-based behavior by means of on-line experience • It is proved that the Q-learning algorithm reaches an optimal control policy under certain strong assumptions. • Since Q-learning is an off-policy algorithm, we can separate the policy used to select actions from the policy that the experience-based behavior controller is learning. • Use πd as the initial behavioral policy while learning a new policy π, which will be based on the information provided by the Q-table. • The method consists of behaving with innate capabilities until enough experience has been gained from the interaction with the environment and the Q-table has converged to the optimal values. • Once the Q-learning controller has been trained, the policy πd can be inhibited, letting the controller behave with the new policy π.
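
A minimal sketch of this transition mechanism (the convergence test is left to the caller; all names are assumptions):

```python
# Sketch of the innate-to-learned transition: act with the evolved innate
# policy pi_d while the Q-table is filled off-policy, then switch to the
# greedy policy pi derived from the Q-table.

def select_action(state, actions, innate_policy, Q, q_converged):
    if not q_converged:
        return innate_policy[state]                    # innate behavior (pi_d)
    return max(actions, key=lambda a: Q[(state, a)])   # learned behavior (pi)
```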

  14. The transition from innate behavior to knowledge-based behavior by means of on-line experience • The final controller will be an adaptive on-line experience-based behavior controller. • It will remain adaptive in the sense that it can adapt to changes in the environment. • It is rational in the sense that it has an action selection mechanism over different choices based on cause and effect relations. • It will perform on-line, that is, it can learn continuously by direct real-time interaction with the environment.

  15. Experimental Results • Simple case • Move a stick in a cluttered environment with unknown obstacles. • Consider a total of 36 (3x3x4) possible states and 10 possible robot movements. • The search space of possible situation-action tables is not extremely large (10^36). • The standard Q-learning algorithm can find the optimum solution. • The genetic algorithm (GA) can also find the optimum solution.

  16. Experimental Results • Complex case • Control a two-link L-shaped robot. • Add two new rotational movements, left and right, for each link. • Consider a total of 324 (3x3x4x3x3) possible states and 14 possible robot movements. • The search space of possible situation-action tables is extremely large (14^324). • The standard Q-learning algorithm cannot find the optimum solution. • The genetic algorithm (GA) can find the optimum solution (within 150 generations).
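
The two search-space figures can be checked with a few lines of Python (the state and action counts come from the slides; the rest is arithmetic):

```python
import math

# The number of distinct situation-action look-up tables is
# (number of actions) ** (number of situations).
simple_states, simple_actions = 3 * 3 * 4, 10               # 36 situations
complex_states, complex_actions = 3 * 3 * 4 * 3 * 3, 14     # 324 situations

print(f"simple case:  {simple_actions}^{simple_states} candidate tables")
print(f"complex case: {complex_actions}^{complex_states} "
      f"(about 10^{complex_states * math.log10(complex_actions):.0f}) candidate tables")
```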

  17. Conclusions and further research work • The advantages and disadvantages of RL are discussed. • RL presents very attractive features for real-time and on-line applications. • It sometimes shows difficulties when the dimension of the state space is extremely high. • The combination of both paradigms is proposed for solving real-time problems with extremely high-dimensional state spaces. • The efficiency of the hybrid approach is shown on a complex problem. • Future work • Use a fuzzy granulation rather than a Boolean granulation. • Use the Michigan approach rather than the Pittsburgh approach in the GA. • Encode each individual as a single knowledge rule.

  18. Discussion • Evolutionary algorithms have produced many successful results in various areas, especially when the search space is enormous. • e.g., Multi-Agent Systems (MAS) • However, it may be unfair to compare them directly to traditional methods, since they require more computing power. • The experiments in this paper are especially unfair, since the evolutionary algorithm gets (number of generations × population size) evaluations.
