Evolution of Teamwork in Multiagent Systems

Evolution of Teamwork in Multiagent Systems Research Preparation Examination by Jacob Schrum

Why Multiple Agents? • Many applications • Physical World • Robotics • Autonomous automobiles • Military applications • Network Systems • Artificial World • Games • Graphics • Entertainment • Artificial Life

Why Multiagent Perspective? • Decentralized control • Failure recovery • Individual agents simpler than whole • Some environments don’t support central control • Human interaction • Humans are also agents • Agents interacting with humans are in MAS

Teamwork in Multiagent Systems • Problem divided amongst many agents • Teamwork often required for success • Communication sometimes an issue • How to learn teamwork: open question

Direct Approach: Careful Design • Hand code everything • Benefits: • Understand end product • Drawbacks: • Not general • Difficult • Programmer time • Common in: • Robotics • Video games • Most deployed systems • What if no one knows how to program it?

Learn it: Reinforcement Learning • Environment is a Markov Decision Process (MDP) • Learn optimal policy • Depends on value function (TD methods) • Proven convergence in tabular case • Function approximation needed for bigger problems • Problems with Partially Observable MDPs (POMDPs) • Successes in • Predator/prey scenarios (Tan 1993) • Keepaway soccer (Kalyanakrishnan, Stone 2009) • RoboCup soccer (many…)

Breed it: Evolution • Based on evolution via natural selection • Benefits: • Less restrictive policy representation • Demonstrated success in POMDP domains • Drawbacks: • Computationally intensive • Time intensive • Focus of talk

Evolution Basics • Step 1: Initialize population P • Step 2: Evaluate all p in P (assign fitness) • Step 3: Derive P’ by selecting/modifying members of P based on their fitness scores • Step 4: Repeat from step 2 with P’ as P until done • P’ is usually similar to P, but slightly better • Many variations: • Genetic Algorithms, Evolution Strategies, etc.
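
The four steps on this slide can be sketched as a short generational loop. This is a minimal illustration, not the method of any study in the talk: truncation selection, Gaussian mutation, and the toy fitness function are all assumptions chosen for brevity, and the names `evolve`, `sphere_fitness`, `random_vector`, and `gaussian_mutation` are hypothetical.

```python
import random


def evolve(fitness, init, mutate, pop_size=20, generations=50, seed=0):
    """Generational loop: initialize P, evaluate, derive P', repeat."""
    rng = random.Random(seed)
    # Initialize population P.
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate all p in P and rank them by fitness (higher is better).
        ranked = sorted(population, key=fitness, reverse=True)
        # Derive P' by keeping the better half (assumed truncation selection)
        # and filling the rest with mutated copies of those survivors.
        parents = ranked[: pop_size // 2]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        # P' becomes P for the next generation.
        population = parents + children
    return max(population, key=fitness)


# Toy problem (an assumption for illustration): maximize -(x^2 + y^2 + z^2).
def sphere_fitness(x):
    return -sum(v * v for v in x)


def random_vector(rng):
    return [rng.uniform(-5.0, 5.0) for _ in range(3)]


def gaussian_mutation(x, rng):
    return [v + rng.gauss(0.0, 0.3) for v in x]


best = evolve(sphere_fitness, random_vector, gaussian_mutation)
print(best, sphere_fitness(best))
```

Because the better half of P is copied unchanged into P’, the best fitness in this sketch never decreases from one generation to the next, which matches the slide's remark that P’ is usually similar to P but slightly better.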

Evolution in Multiagent Systems • 1. Team Composition • A. Homogeneous • B. Heterogeneous • C. Heterogeneous from Subpopulations (pick one member from each subpopulation to make a team) • D. Entire population • 2. Type of Selection • A. Individual • B. Team • C. Self-Selection • 3. Multiple Objectives

1.A. Homogeneous Teams • Team members share same policy • Members know what to expect from team members • One individual evaluated per trial • Evaluations reliable because of consistent team composition

1.B. Heterogeneous Teams • Team composed of several policies • Uncertainty as to who teammates will be • Multiple individuals evaluated per trial • Evaluation differs depending on choice of team members

1.C. Subpopulations • Each slot filled by representative from specific subpopulation • Subpopulations specialize • Individuals know what to expect of members in each slot • Team composition is still heterogeneous
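
The first three team compositions (1.A through 1.C) differ mainly in how a team is assembled for an evaluation trial. The sketch below makes that difference concrete. It assumes policies are opaque objects (plain strings here) and that heterogeneous teams are drawn without replacement; the function names are hypothetical.

```python
import random


def homogeneous_team(policy, team_size):
    """1.A: every team slot is filled by the same policy."""
    return [policy] * team_size


def heterogeneous_team(population, team_size, rng):
    """1.B: slots are filled by distinct policies drawn from one population."""
    return rng.sample(population, team_size)


def subpopulation_team(subpopulations, rng):
    """1.C: slot i is always filled by a representative of subpopulation i."""
    return [rng.choice(subpop) for subpop in subpopulations]


rng = random.Random(0)
population = ["p0", "p1", "p2", "p3", "p4", "p5"]
subpopulations = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"]]

print(homogeneous_team("p0", 3))
print(heterogeneous_team(population, 3, rng))
print(subpopulation_team(subpopulations, rng))
```

In the homogeneous case one individual is evaluated per trial, while the other two cases evaluate several individuals at once, so each individual's score depends on which teammates happened to be chosen.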

1.D. Entire Population • The entire population is seen as a cooperating team • Team level selection not possible • Population may divide into competing subpopulations • Mating restrictions • Genetic/Tag-based recognition

2.A. Individual Selection • Individuals selected based on own fitness • Commonly used with heterogeneous teams • Can result in selfish behaviors • Altruism relevant • sacrificing own fitness to raise fitness of another • Reciprocity relevant • helping another to get help in return

2.B. Team Selection • Individuals selected based on team fitness • Common fitness, sum, average, etc. • Commonly used with homogeneous teams • Enables slackers in heterogeneous teams • Altruism and reciprocity have no meaning • No credit assignment problems between members
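
The difference between individual selection (2.A) and team selection (2.B) comes down to which number each team member is selected on. The following sketch assumes each member already has its own score from a trial; the function names and the example scores are hypothetical.

```python
from statistics import mean


def individual_selection_fitness(member_scores):
    """2.A: each member is selected on its own score."""
    return list(member_scores)


def team_selection_fitness(member_scores, aggregate=sum):
    """2.B: every member receives the same aggregated team score."""
    team_score = aggregate(member_scores)
    return [team_score] * len(member_scores)


# Hypothetical trial: the second member contributed nothing.
scores = [5.0, 0.0, 1.0]
print(individual_selection_fitness(scores))
print(team_selection_fitness(scores))
print(team_selection_fitness(scores, aggregate=mean))
```

Under team selection the member that scored 0.0 receives the same fitness as its teammates, which is how slackers can persist in heterogeneous teams. With homogeneous teams the issue does not arise, because all members share one policy.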

2.C. Self-Selection • Individuals choose when and with whom to mate • Common in Artificial Life simulations • AL studies emergence of biological phenomena • Usually involves a spatial component • Extinction is possible • Auto restart • Spawn new members

3. Multiple Objectives • Assume individual has fitness scores: • F = (f1,…,fN) in objectives 1 through N • Which values of F are best? • Traditional approach • fitness(F) = f1*w1 + … + fN*wN for weights w1,…,wN • Pareto-based approach • Partition population into non-dominated Pareto fronts • Assign fitness based on Pareto-front

Pareto Front Example • Each point represents an individual’s scores • Point dominates other points in its box • 3 Pareto fronts of non-dominated points
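
Both ways of ranking a score vector F = (f1,…,fN) can be written in a few lines. The sketch below assumes every objective is maximized and uses a simple repeated-filtering procedure to peel off fronts; it is an illustration, not the specific sorting algorithm used in any of the cited studies. The sample points are made up so that they split into three fronts.

```python
def weighted_sum(scores, weights):
    """Traditional approach: fitness(F) = f1*w1 + ... + fN*wN."""
    return sum(f * w for f, w in zip(scores, weights))


def dominates(a, b):
    """a dominates b if a is at least as good in every objective
    and strictly better in at least one (maximization assumed)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_fronts(points):
    """Partition points into successive non-dominated fronts."""
    remaining = list(points)
    fronts = []
    while remaining:
        # The current front holds every point no remaining point dominates.
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts


points = [(1, 5), (3, 4), (5, 1), (2, 3), (3, 2), (1, 1)]
for rank, front in enumerate(pareto_fronts(points), start=1):
    print(rank, front)
```

Fitness can then be assigned from the front index, with members of earlier fronts preferred. Unlike the weighted sum, this needs no weights, but it leaves points within the same front unordered.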

Case Studies • Review State of the Art • For each study: • Classify type of selection • Classify team composition • Identify unanswered questions • Future research directions

AntFarm • Evolve foraging behavior • Pheromones to communicate • Individual selection • Entire population as a team • No cooperative foraging! • Likely cause: individual selection • Individual selection offers less incentive for teamwork • Teamwork especially difficult when there is only one team * AntFarm: Towards Simulated Evolution. Collins, Jefferson. 1991

Evolving Communication • Exploration task • Pheromones to communicate • Team selection • Homogeneous teams vs. static bots • Pairs of objectives, Pareto-based • Different behaviors in different runs • Compromise strategy • Blocking strategy • Teamwork possible with homogeneous teams • Need to move beyond grid-worlds • Move beyond two objectives * Emergence of Communication in Competitive Multi-Agent Systems: A Pareto Multi-Objective Approach. McPartland, Nolfi, Abbass. 2005

SwarmEvolveTags • Birds visit food stations • Energy can be shared • Sharing based on tags • Self-selection • Entire population as team • Competing subpopulations emerged • Cooperation in entire population without team selection • Altruism via aiding similar individuals • Teamwork as a result of subpopulation homogeneity * Evolution of cooperation without reciprocity. Riolo, Cohen, Axelrod. 2001 * Tags and the Evolution of Cooperation in Complex Environments. Spector, Klein, Perry. 2004

Legion-I • Roman legions defend countryside and cities • Team level selection • Homogeneous teams • Multi-modal behavior • Defend city • Pursue barbarians • Homogeneous team members must fill all roles • Could not learn more complicated/strategic tasks • Example: building roads to speed up travel * Neuroevolution for Adaptive Teams. Bryant, Miikkulainen. 2003

Role-Based Cooperation • Toroidal predator/prey grid world • Individual selection • Team fitness shared by team members • Multi-Agent ESP: subpopulation based • Simple non-communicating method outperforms communicating method • Teamwork without homogeneity • Communication not always needed • May only apply to simple domains • Still need to scale up complexity • Get away from grid worlds * Coevolution of Role-Based Cooperation in Multi-Agent Systems. Yong, Miikkulainen. 2007

NERO • Machine Learning game • Human interaction via fitness function • Individual selection • Entire population is team • Multiple objectives • User defines weights dynamically • Maintenance of fitness function • Old behaviors can be forgotten when learning new ones • Need to learn multiple tasks simultaneously * Evolving Neural Network Agents in the NERO Videogame. Stanley, Bryant, Miikkulainen. 2005

Pareto Multi-objective NPCs • Evolved monsters vs. bot with stick • Individual selection • Large heterogeneous teams of 15 • Third of entire population • Multiple objectives, Pareto-based • Credit assignment trick • Learns multiple objectives simultaneously • Different runs can lead to very different results • Different areas of trade-off surface • Population becomes mostly homogeneous * Constructing Complex NPC Behavior via Multi-Objective Neuroevolution. Schrum, Miikkulainen. 2008

Dead End Game • Human prey vs. predators • Offline evolution vs. bot • Team level selection • Homogeneous teams • Online evolution vs. human • Individual selection • Small heterogeneous team • Different configurations appropriate at different levels • Sometimes the domain leaves no choice * Interactive Opponents Generate Interesting Games. Yannakakis, Hallam. 2004

Cooperating Robots • Retrieve tokens • Simulation → Robots • Compared selection levels • Individual vs. Team • Compared team compositions • Homogeneous vs. heterogeneous • Homogeneous better with teamwork and altruism • Homogeneous best with team selection • Heterogeneous best with individual selection • Did not consider subpopulations • Tasks only involved foraging (no other objectives) * Genetic Team Composition and Level of Selection in the Evolution of Cooperation. Waibel, Keller, Floreano. 2008

Summary of Issues • More complexity • Move beyond grid worlds • Need multiple contradictory objectives • Act in continuous, real-time world • Best evolutionary configuration • More comparisons between team compositions • Especially subpopulation-based method • Task/configuration pairings? • Credit assignment issues • Multi-modal behavior • What to do and when

Experiment • Four monsters vs. bot with stick • Smaller team makes task harder • Compare homogeneous, heterogeneous and subpopulation • Homogeneous uses team selection • Others use individual selection • Multiple objectives: • Group damage • Individual injury • Individual time alive

Heterogeneous Results • Many generations (600+) • Not that long in real time • Mostly selfish • Good teamwork can arise though (Baiting) • Teamwork depends on population being homogeneous (Figure panels: Selfish, Teamwork)

Homogeneous Results • Fewer generations (100-200) • Actually longer in real time • Always some form of teamwork • Baiting • Timed Assault (Figure panels: Timed Assault, Baiting)

Subpopulations Results • Many generations (400+) • Each generation takes a lot of real time • Easy for slacker subpopulation to persist • Limited teamwork • Only some members participate (Figure panel: Cooperating Pair)

Discussion • Can subpopulation method do better? • Better credit assignment • Team level selection (how?) • Speed up homogeneous and subpopulations • Heterogeneous: discourage selfishness

Future Research Questions • Credit assignment issues • Cooperating individuals cannot be identified • Objectives define best evolutionary configuration? • Complex domains/real problems • Many objectives • Continuous, real-time • Potential challenge domains • RoboCup Soccer • Unreal Tournament

Conclusion • Teamwork in Multiagent Systems important area • Evolution has been successful • Better understand why • Team configuration • Level of selection • Presence/absence of credit assignment problems • Apply to harder domains • Real-time • Continuous/noisy • Multiple contradictory objectives

Questions? schrum2@cs.utexas.edu

Auxiliary Slides

Cooperation Without Reciprocity • Abstract study of the evolution of cooperation • Donor/recipient model • 3 random pairings with option of donating fitness c so that recipient can gain fitness b • Choice to donate based on similarity of tags • Individual selection with entire population as team • Subpopulations emerged based on tags • Donation rate changes cyclically, but generally stays high (73%) for c < b • Need to apply in actual domain requiring teamwork * Evolution of cooperation without reciprocity. Riolo, Cohen, Axelrod. 2001
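
One round of the donor/recipient model can be sketched as follows. The slide only says that donation depends on tag similarity, so the details here are assumptions made for illustration: tags are real numbers, each agent has a tolerance, and an agent donates when the tag difference is within its own tolerance. The cost and benefit values and the name `donation_round` are likewise hypothetical.

```python
import random


def donation_round(tags, tolerances, cost, benefit, pairings, rng):
    """Each agent meets `pairings` random partners and may donate.

    Donating costs the donor `cost` and gives the recipient `benefit`.
    Returns the fitness of every agent and the overall donation rate.
    """
    n = len(tags)
    fitness = [0.0] * n
    donations = 0
    for donor in range(n):
        others = [i for i in range(n) if i != donor]
        for _ in range(pairings):
            recipient = rng.choice(others)
            # Assumed rule: donate when the tags are similar enough.
            if abs(tags[donor] - tags[recipient]) <= tolerances[donor]:
                fitness[donor] -= cost
                fitness[recipient] += benefit
                donations += 1
    return fitness, donations / (n * pairings)


rng = random.Random(0)
tags = [rng.random() for _ in range(10)]
tolerances = [rng.random() for _ in range(10)]
fitness, rate = donation_round(tags, tolerances, cost=0.1, benefit=1.0,
                               pairings=3, rng=rng)
print(rate, fitness)
```

Selecting on the resulting fitness and mutating tags and tolerances would complete the evolutionary loop; that part is omitted here.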

Cooperation Without Reciprocity Results

Team Composition in MAS • Taxonomy proposed by Stone*: • Definition of communication is broad: • Message passing, blackboard, information sharing, etc. * Multiagent Systems: A Survey from a Machine Learning Perspective. Stone. 2000
