Navigation in Hostile Robotic Scenarios Using AEPSO/CAEPSO

Adham Atyabi. Supervisor: Dr. Somnuk Phon-Amnuaisuk. Co-supervisor: Dr. Chin Kuan Ho. Navigate swarms with AEPSO/CAEPSO.

What is Navigation? To navigate: to steer a course through a medium; to steer or manage (a boat) in sailing (http://www.merriam-ebster.com/dictionary/navigated). Different navigational techniques have evolved over the ages; all involve locating one's position relative to known locations or patterns.

Navigation (R.Siegwart, I.R. Nourbakhsh, 2004)

Navigation without Map

Problem Statement: The problem is a hostile robotic scenario in which cooperative robots try to locate bombs and disarm them. The robots have limited knowledge of the bombs' locations (they only know the likelihood of bombs in each area). The likelihood information is uncertain (because of noise and Illusion effects).

Objectives: To identify, design, and evaluate strategies for implementing a new Particle Swarm Optimization (PSO) for robot navigation in hazardous scenarios and hostile situations. To resolve the uncertainty in the perception level of the robots/agents in a cooperative learning scenario. To reduce the number of robots involved in navigation tasks, with the aim of reducing costs. To remove the dependency on initial location in navigation scenarios.

Order of the Slides: Navigation; Particle Swarm Optimization (PSO); Robotic; Area Extension PSO (AEPSO); Scenarios & Results; Conclusion

Navigation in Robotics. CLASSICAL APPROACHES: The classic methods developed so far are variations of a few general approaches: Roadmap (Retraction, Skeleton, or Highway approach), Cell Decomposition (CD), Potential Fields (PF), and mathematical programming. HEURISTIC APPROACHES: Artificial Neural Networks (ANN), Genetic Algorithms (GA), Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), Tabu Search (TS). Heuristic algorithms do not guarantee finding a solution, but when they do find one, they are likely to do so much faster than classical methods. (Latombe, 1991; Keil and Sack, 1985; Masehian and Sedighzadeh, 2007; Pugh et al., 2007; Ramakrishnan and Zein-Sabatto, 2001; Hettiarachchi, 2006; Hu et al., 2007; Liu et al., 2006; Mohamad et al., 2006; Mclurkin and Yamins, 2005; Ying-Tung et al., 2004)

Difficulty in Conventional Navigation Techniques: The performance of navigation techniques depends heavily on their initialization and on the reliability of their map. According to the literature, in real robotic domains a small difference in the starting location of the robots or goals may have a large effect on overall performance. Due to the dynamic, noisy, and unpredictable nature of real-world robotic applications, it is quite difficult to implement a navigation technique based on a well-known predefined map. (Pugh and Zhang, 2005; Pugh and Martinoli, 2006, 2007; Gu et al., 2003)

Particle Swarm Optimization: PSO is an Evolutionary Algorithm inspired by animal social behaviors (Kennedy, 1995; Ribeiro and Schlansker, 2005; Chang et al., 2004; Pugh and Martinoli, 2006; Sousa et al., 2003; Nomura, 2007). PSO has outperformed other Evolutionary Algorithms such as GA on some problems (Vesterstrom and Riget, 2002; Ratnaweera et al., 2004; Pasupuleti and Battiti, 2006). PSO is an optimization technique that models a set of potential problem solutions as a swarm of particles moving about in a virtual search space (Kennedy, 1995). The method was inspired by the movement of flocking birds and their interactions with their neighbors in the group (Kennedy, 1995). PSO achieves optimization using three primary principles: Evaluation, where quantitative fitness can be determined for some particle location; Comparison, where the best performer out of multiple particles can be selected; Imitation, where the qualities of better particles are mimicked by others.

Particle Swarm Optimization: Every particle in the population begins with a randomized position $x_{ij}$ and a randomized velocity $v_{ij}$ in the n-dimensional search space, where $i$ is the particle index and $j$ is the dimension in the search space. Each particle remembers the position at which it achieved its highest performance ($p$). Each particle is also a member of some neighborhood of particles and remembers which particle achieved the best overall position in that neighborhood ($g$). The new velocity is the sum of the last velocity, a cognitive component, and a social component: $v_{ij}(t) = w\,v_{ij}(t-1) + C_1 R_1 \,(p_{ij} - x_{ij}(t-1)) + C_2 R_2 \,(g_{ij} - x_{ij}(t-1))$. The new position is $x_{ij}(t) = x_{ij}(t-1) + v_{ij}(t)$.
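The update rule on this slide can be sketched in a few lines of Python. This is a minimal illustration for a single particle, not the authors' implementation: the function name `pso_step` and the default values of `w`, `c1`, and `c2` are assumptions chosen only for the example, and `R1`, `R2` are taken to be fresh uniform random numbers in [0, 1) for each dimension.

```python
import random

def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One basic-PSO update for a single particle.

    x, v, p_best, g_best are equal-length lists of floats (one entry per
    dimension j). The parameter defaults are illustrative assumptions.
    """
    new_x, new_v = [], []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()        # R1, R2 drawn per dimension
        inertia = w * v[j]                         # last velocity
        cognitive = c1 * r1 * (p_best[j] - x[j])   # pull toward personal best
        social = c2 * r2 * (g_best[j] - x[j])      # pull toward neighborhood best
        vj = inertia + cognitive + social
        new_v.append(vj)
        new_x.append(x[j] + vj)                    # position update
    return new_x, new_v
```

When a particle already sits on both its personal best and the neighborhood best, the cognitive and social terms vanish and only the inertia term moves it.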

Related Works on PSO. Single-objective domains: improvements to neighborhood topology, velocity equation, global best, and personal best. Multi-objective domains: Niching PSO, Mutation, Parallelism, Re-initialization, Clearing memory, Using Sub-Swarms. (Brits, Engelbrecht, and Van Den Bergh, 2002, 2003; Yoshida et al., 2001; Stacey, Jancic, and Grundy, 2003; Chang et al., 2005; Vesterstrom and Riget, 2002; Qin et al., 2004; Pasupuleti and Battiti, 2006; Ratnaweera et al., 2004; Peram et al., 2003; Parsopoulos and Vrahatis, 2002)

Robotic Swarm: The number of robots used in the literature ranges from 20 to 300 (Lee et al., 2005; Hettiarachchi, 2006; Werfel et al., 2005; Chang et al., 2005; Ahmadabadi et al., 2001; Mondada et al., 2004). Robots can use more knowledge (e.g. robots know the location of goals and of their teammates) (Luke et al., 2005; Ahmadabadi et al., 2001; Yamaguchi et al., 1997; Martinson and Arkin, 2003). It is common to train robots individually (Ahmadabadi et al., 2001; Yamaguchi et al., 1997; Hayas et al., 1994).

PSO-based surveys in Robotics. Parallel Learning in Heterogeneous Multi-Robot Swarms (2006, 2007): Evaluation in robotic learning is costly, even more so than the processing of the learning algorithm itself. On real robots, sensors and actuators may perform slightly differently due to variations in manufacturing. As a result, multiple robots of the same model may actually perceive and interact with their environment differently, creating a heterogeneous swarm. Path planning for mobile robot using the particle swarm optimization with mutation operator (2004). Obstacle avoidance with multi-objective optimization by PSO in dynamic environment (2005). Robot Path Planning using Particle Swarm Optimization of Ferguson Splines (2006). Obstacle-avoidance Path Planning for Soccer Robots Using Particle Swarm Optimization (2006).

Area Extension version of PSO. Designed to handle: dynamic velocity; direction and fitness criteria; cooperation; diversity of search; lack of reliable perception (Pugh and Martinoli, 2006; Bogatyreva and Shillerov, 2005).

AEPSO / CAEPSO: A new velocity heuristic, which solves premature convergence. A Credit Assignment heuristic, which solves the cul-de-sac problem. A Hot Zone/Area heuristic. Different communication range conditions, which provide dynamic neighborhoods and sub-swarms. A Help Request Signal, which provides cooperation between different sub-swarms. A Boundary Condition heuristic, which addresses the lack of diversity in basic PSO. Leave Force, which provides a high level of noise resistance. A Speculation mechanism, which provides a high level of noise resistance.

Dynamic Velocity

Hot Zone/Area Heuristic: The idea is to divide the environment into fixed virtual sub-areas with various credits. An area's credit is defined by the proportion of goals and obstacles positioned in the area. Each particle knows the credits of the first and second layers of its current neighborhood.
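The area bookkeeping can be sketched as follows. The slide does not give the exact credit formula or the shape of a neighborhood layer, so this example makes two labeled assumptions: the credit of an area is simply the fraction of all goals that fall inside it (obstacles are ignored here), and layer k of an area is the ring of areas exactly k grid steps away. All function names are hypothetical.

```python
def area_of(pos, area_size):
    """Map a pixel position (x, y) to the (col, row) index of its fixed area."""
    return (int(pos[0] // area_size), int(pos[1] // area_size))

def goal_credits(goals, area_size):
    """Assumed credit: fraction of all goals located inside each area."""
    credits = {}
    for g in goals:
        a = area_of(g, area_size)
        credits[a] = credits.get(a, 0.0) + 1.0 / len(goals)
    return credits

def layer(area, k, cols, rows):
    """Areas exactly k steps away (Chebyshev distance), clipped to the grid."""
    cx, cy = area
    return [(x, y)
            for x in range(max(0, cx - k), min(cols, cx + k + 1))
            for y in range(max(0, cy - k), min(rows, cy + k + 1))
            if max(abs(x - cx), abs(y - cy)) == k]
```

A robot in area `a` would then look up the credits of `layer(a, 1, cols, rows)` and `layer(a, 2, cols, rows)` when deciding where to move next.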

Communication Methodology and Help Request Signal: Robots can only communicate with those within their communication range. Various communication ranges were used (500, 250, 125, and 5 pixels). This heuristic has a major effect on sub-swarm size. The Help Request Signal can provide a chain of connections.
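One way to picture how the communication range shapes sub-swarms is to treat robots as nodes of a graph, link any two robots that are within range, and read each connected component as one sub-swarm (so a chain of in-range robots ends up in the same group). This is an illustrative interpretation, not the authors' procedure; the function names are hypothetical and distances are assumed to be Euclidean pixel distances.

```python
import math

def in_range(a, b, comm_range):
    """True if two positions are within communication range of each other."""
    return math.dist(a, b) <= comm_range

def sub_swarms(positions, comm_range):
    """Group robot indices into connected components of the range graph."""
    unvisited = set(range(len(positions)))
    groups = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        group, frontier = [start], [start]
        while frontier:
            i = frontier.pop()
            # Robots reachable directly from robot i that are not yet grouped.
            linked = [j for j in unvisited
                      if in_range(positions[i], positions[j], comm_range)]
            for j in linked:
                unvisited.remove(j)
                group.append(j)
                frontier.append(j)
        groups.append(sorted(group))
    return groups
```

With five robots at `[(0, 0), (100, 0), (200, 0), (900, 900), (0, 400)]`, a range of 125 pixels links the first three through a chain, while a range of 5 pixels leaves every robot on its own.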

Credit Assignment and Boundary Condition: Reward and Punishment. Suspend factor. In AEPSO, robots are suspended each time they cross boundary lines. Under this condition they can escape from areas in which they are stuck, which is as useful as reinitializing the robots' states in the environment.

Illusion Effect (Uncertainty): The Illusion idea is inspired by real-world perception errors and mistakes, which can be pictured as corrupted data caused by communication failures (e.g. satellite data) or by weaknesses in sensing elements (sensors). The Illusion effect imposes more than roughly 50% noise on the environment.

Illusion Effect

Cooperative Learning: It is common to run experiments in two phases (Ahmadabadi et al., 2001): Training and Testing. In the training phase, the choice of training method matters (individual training or team-based training). In the testing phase, there are two options: use the same initialization as in training, or use a different initialization.

Speculation mechanism and Leave Force Heuristics: The Speculation mechanism is based on an extra memory in each robot called a Mask. Masks can take values from: the Illusion effect; the robot's own observation; self speculation; neighbors' observations; neighbors' speculation. Leave Force is an extra punishment that makes robots reduce their current area's credit by 10% after a certain number of iterations.
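The Leave Force rule can be sketched as a small update on a credit table. The 10% reduction comes from the slide; the threshold `patience` (how many iterations count as "a certain number") is not given, so its default here is a hypothetical placeholder, as are the function and parameter names.

```python
def apply_leave_force(credits, area, iterations_in_area, patience=50, decay=0.10):
    """Return a copy of the credit table with the current area's credit
    reduced by `decay` once the robot has stayed there for at least
    `patience` iterations. `patience` is an assumed, illustrative value.
    """
    updated = dict(credits)
    if iterations_in_area >= patience:
        updated[area] = credits[area] * (1.0 - decay)
    return updated
```

Applied repeatedly, the punishment makes a lingering robot's current area steadily less attractive than its neighbors, which pushes the robot to move on.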

Simulated Scenarios: Static scenario. Dynamic scenario. Real-time scenario. Cooperative learning scenario: homogeneous and heterogeneous.

Empirical parameter setups in various scenarios

Static and Dynamic Scenarios: In contrast with the static scenario, in the dynamic domain bombs are able to run away. Bomb velocity is set to 2 pixels/iteration, and robot velocity is a value between 1 and 3 pixels/iteration. The bombs' explosion time is set to 20,000 iterations (the maximum number of iterations).

Static and Dynamic Scenarios - Results: The results are based on 100 runs (each run is 20,000 iterations). In each run, 5 robots, 15 goals, and 44 obstacles are used. [Figure: experimental results; axes: located bombs, runs, iterations]

Real-Time Scenario: Each bomb's explosion time is a random value between 3,000 and 20,000 iterations. Robots should locate bombs before they reach their explosion time. A simple noise is presumed in the environment (an additional +/- value added to the areas' credits).
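The real-time setup can be sketched as follows. The 3,000 to 20,000 iteration window comes from the slide; the distribution of the explosion times (uniform integers) and the size and distribution of the additive noise (`max_noise`, uniform) are assumptions made only for this illustration, and all names are hypothetical.

```python
import random

def explosion_times(n_bombs, rng, low=3000, high=20000):
    """Draw one explosion time (in iterations) per bomb, inclusive bounds."""
    return [rng.randint(low, high) for _ in range(n_bombs)]

def noisy_credit(credit, rng, max_noise=1.0):
    """Perceived credit: true credit plus an additive +/- noise term.
    `max_noise` is an assumed magnitude, not a value from the slides."""
    return credit + rng.uniform(-max_noise, max_noise)

def located_in_time(found_at, explodes_at):
    """A bomb counts as located only if it is found before it explodes."""
    return found_at < explodes_at
```

For example, `explosion_times(15, random.Random(0))` gives one deadline per bomb for a 15-bomb run, and each robot would read area credits through `noisy_credit` rather than seeing the true values.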

Movement Trajectory Basic PSO vs. AEPSO

AEPSO vs. Random Search and Linear Search

Contribution 1: AEPSO performs better local search than Basic PSO, Random Search, and Linear Search in real-time and dynamic domains. AEPSO performs well in dynamic environments, and its results were reliable in noisy environments.

Cooperative Learning Scenarios: A higher level of noise (Illusion) is presumed. The scenarios have two phases: Training and Testing. A higher level of cooperation is needed.

Cooperative Learning with Homogeneous robots: The robots have limited knowledge of the bombs' locations (they only know the likelihood of bombs in each area). The likelihood information is uncertain (because of noise and Illusion effects). Robots should find the true credit of each area and first observe the areas that have the most effect on the others. Robots can draw on knowledge from their training results to solve the task faster. Robots should give priority to the areas with the highest effect on others.

Homogeneous Cooperative Learning - Results: The results are based on 20 runs (each run is 20,000 iterations). In each run, 5 robots, 51 bombs, and 51 obstacles are used. [Figure: Bomb detection with Homogeneous Robots; axes: detected bombs, iterations]

Contribution 2: CAEPSO achieved reliable results with homogeneous robots due to its ability to reduce the effect of illusion from the environment. Results show that CAEPSO achieved 99% performance with the same initialization and 97% performance with a new initialization.

Cooperative Learning with Heterogeneous robots: There are various types of bombs and robots. Each robot can only disarm a specific type of bomb. Robot and bomb types are set randomly. Robots use a more accurate version of the Help Request Signal. Three scenarios are presumed: Homogeneous robots (S1). Heterogeneous robots (S2). Heterogeneous robots (S3).

Cooperative Learning with Heterogeneous robots

Movement Trajectory: Heterogeneous robots - Training phase

Movement Trajectory: Heterogeneous robots - Testing phase

Contribution 3: CAEPSO achieved 95% performance with heterogeneous robots. CAEPSO was able to reduce the illusion effect from the environment and improve its movements.

What are the advantages of AEPSO/CAEPSO? AEPSO showed better movements compared with Basic PSO. AEPSO achieved reliable results with only 5 robots, which is a big advantage compared with the surveyed work. AEPSO/CAEPSO proved its efficiency in complex scenarios based on navigation in hostile situations. CAEPSO achieved reliable results in the homogeneous scenario under both new and same initialization constraints. CAEPSO achieved reliable results with heterogeneous robots.

Conclusion and Future work: In this study, we introduced AEPSO as a modified version of Basic PSO and investigated its effectiveness on static, dynamic, real-time, multi-dimensional, and multi-objective problem domains. It is worth mentioning that the small number of particles (only 5 robots) gave AEPSO a great advantage (by reducing costs). Robots were able to solve highly complex problems using limited knowledge (training knowledge) together with a high level of cooperation and experience sharing. We are going to compare CAEPSO results with a behaviour-based version of Q-learning in a Cooperative Learning scenario with Heterogeneous robots.

Contribution: AEPSO performed better local search compared with other techniques (Basic PSO, Random Search, Linear Search). AEPSO and CAEPSO are robust to noise and time dependency. Cooperation between agents allowed CAEPSO to perform well.
