Environmental Robustness in Multi-agent Teams • Terence Soule, Robert B. Heckendorn • Presented at GECCO 2009 (the Genetic and Evolutionary Computation Conference)
Introduction • The topic I am interested in for this project is the use of multi-agent teams in rescue missions during a disaster. • This paper discusses several techniques for training autonomous agents to explore unknown environments. • Emphasis is placed on finding the training environment that best prepares agents for the real world. • Throughout the paper the following terms are used: • Training – the environment where the agents learn. • Testing – the real world, where the agents put into practice what they learned in training.
Introduction • Some sample environments where these autonomous agents could be used include: • Clearing landmines • Search and Rescue (Crandall Canyon) • Environmental cleanup • Mining and Resource Discovery • Aircraft Debris Recovery • Individual Guides in Evacuation and Assessment
Approaches • Several algorithms could be used for this type of team-building exercise; the authors chose a genetic algorithm. • A main concern with this approach is the time required to train agents, which forces the agents to learn in a simulated environment rather than the real world. This is why it is critical to find the training environment that best prepares the agents for the real world.
The Environment • The training world is a 45x45 grid where 10 to 20 percent of the cells are labeled 'interesting'. • Two types of agents are defined: • Scouts are fast agents that locate interesting cells and mark them with a beacon. They can move 2 cells per time step. • Investigators are slower than scouts but have better distance vision. They investigate interesting cells and mark them as investigated; if a cell contains a beacon, it is deactivated. They can move 1 cell per time step. • Example roles for scouts and investigators include: • Scouts locate and mark landmines; investigators deactivate them. • Scouts locate interesting geological formations on a distant planet; investigators take soil samples.
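The two roles above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the field names, the vision ranges, and the beacon bookkeeping via sets are my assumptions; only the speeds (2 for scouts, 1 for investigators) and the mark/deactivate behavior come from the paper.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    """Minimal agent sketch (field names are assumptions)."""
    row: int
    col: int
    speed: int   # cells moved per time step (from the paper)
    vision: int  # viewing distance (values below are assumptions)

def make_scout(row=0, col=0):
    # Scouts are fast: 2 cells per step.
    return Agent(row, col, speed=2, vision=1)

def make_investigator(row=0, col=0):
    # Investigators are slow (1 cell per step) but see farther.
    return Agent(row, col, speed=1, vision=5)

def scout_step(grid, beacons, agent):
    """If the scout's cell is interesting and unmarked, place a beacon."""
    cell = (agent.row, agent.col)
    if grid[agent.row][agent.col] and cell not in beacons:
        beacons.add(cell)

def investigator_step(grid, beacons, investigated, agent):
    """Investigate the current cell and deactivate any beacon on it."""
    cell = (agent.row, agent.col)
    if grid[agent.row][agent.col]:
        investigated.add(cell)
        beacons.discard(cell)
```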
The Environment • Using the 45x45 grid, the authors present three sample environments for their study: • Random, where approximately 20% of the cells are interesting (see Figure 1). • Clumped, where exactly 20% of the cells are interesting (see Figure 2). • Linear, where exactly 10% of the cells are interesting (see Figure 3). • For each evaluation a new random instance is generated so the agents can't memorize the layout of an environment.
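The three environments could be generated along these lines. This is a hedged sketch: the paper specifies only the percentages (~20% random, exactly 20% clumped, exactly 10% linear), so the clump size and line-segment length below are my assumptions.

```python
import random

SIZE = 45  # the grid is 45x45, per the paper

def random_env(p=0.20):
    """Random: each cell is interesting with probability ~p."""
    return [[random.random() < p for _ in range(SIZE)] for _ in range(SIZE)]

def clumped_env(fraction=0.20, clump=5):
    """Clumped: exactly `fraction` of cells interesting, placed as
    square clumps (the clump size is an assumption)."""
    grid = [[False] * SIZE for _ in range(SIZE)]
    target = int(SIZE * SIZE * fraction)
    placed = 0
    while placed < target:
        r = random.randrange(SIZE - clump)
        c = random.randrange(SIZE - clump)
        for i in range(clump):
            for j in range(clump):
                if placed < target and not grid[r + i][c + j]:
                    grid[r + i][c + j] = True
                    placed += 1
    return grid

def linear_env(fraction=0.10, seg=10):
    """Linear: exactly `fraction` of cells interesting, arranged in
    straight segments (the segment length is an assumption)."""
    grid = [[False] * SIZE for _ in range(SIZE)]
    target = int(SIZE * SIZE * fraction)
    placed = 0
    while placed < target:
        r, c = random.randrange(SIZE), random.randrange(SIZE)
        horizontal = random.random() < 0.5
        for k in range(seg):
            rr, cc = (r, c + k) if horizontal else (r + k, c)
            if rr < SIZE and cc < SIZE and not grid[rr][cc] and placed < target:
                grid[rr][cc] = True
                placed += 1
    return grid
```

Generating a fresh grid per evaluation, as the slide notes, prevents agents from overfitting to one layout.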
Fitness Function • The genetic algorithm requires a fitness function to determine the quality of agents. The functions are: • 3B - 0.1b for scouts • 3I - 0.1b for investigators • Where: • B = the number of beacons placed. • I = the number of interesting cells investigated. • b = the number of time steps spent outside the problem area. • In the Linear environment the fitness values are doubled to offset the effect of having half as many interesting cells. • It was noted that if the boundary penalty is too high, some agents evolve to sit still to avoid the penalty. • The team fitness is the sum of the fitness values of the team members.
Training Algorithms • Experiments are run using three different training algorithms: • Teams evolve complete teams of 3 scouts and 3 investigators. These tend to show strong cooperation, but individual team members can become lazy. • Islands evolve role-specific individuals which are then combined into teams. This requires more evaluations than the team approach because of the focus on individuals, and it tends to produce highly fit individuals who may not cooperate well because their areas of expertise overlap. • Orthogonal Evolution of Teams (OET) is a hybrid of the other two approaches: it alternates between treating the population as islands and as teams. In this paper, islands are used during the selection step and teams during the replacement step.
Algorithm Definitions • Some definitions that were not clear to me include: • OET – orthogonality in computer science guarantees that modifying one component of a system doesn't cause side effects in another component. The paper gives a car example, where accelerating does not interfere with other components such as steering. • Three-member tournament – in standard GA usage, three individuals are sampled at random and the fittest of the three is selected as a parent.
Results • Each of the three environments was used for training, and the resulting teams were then tested in the other two environments. The results are shown in Tables 2-5 (the ordering of the tables can be confusing). Some guidelines for reading them: • Values in parentheses are standard deviations. • Bold values mark cases where the training and test environments are the same.
Conclusions • The authors draw three main conclusions: • Evolutionary techniques evolve teams that are robust across the given environments. • The best results came from training in the Linear environment. • The best teams were produced by the OET algorithm.