Evolving Multimodal Networks for Multitask Games Jacob Schrum – schrum2@cs.utexas.edu Risto Miikkulainen – risto@cs.utexas.edu University of Texas at Austin Department of Computer Science
Evolution in videogames • Automatically learn interesting behavior • Complex but controlled environments • Stepping stone to real world • Robots • Training simulators • Complexity issues • Multiple contradictory objectives • Multiple challenging tasks
Multitask Games • NPCs perform two or more separate tasks • Each task has own performance measures • Task linkage • Independent • Dependent • Not blended • Inherently multiobjective
Test Domains • Designed to study multimodal behavior • Two tasks in similar environments • Different behavior needed to succeed • Main challenge: perform well in both [Figure: the Front Ramming and Back Ramming tasks]
Front/Back Ramming • Same goal, opposite embodiments • Front Ramming: attack with front ram, avoid counterattacks • Back Ramming: attack with back ram, avoid counterattacks
Predator/Prey • Same embodiment, opposite goals • Predator: attack prey, prevent escape • Prey: avoid attack, stay alive
Multiobjective Optimization • Game with two objectives: • Damage Dealt • Remaining Health • A dominates B iff A is strictly better in at least one objective and at least as good in the others • Points in the population that are not dominated are the best: the Pareto front • A weighted sum is provably incapable of capturing a non-convex front [Figure: Pareto front showing the tradeoff between objectives; one extreme kept high health but did not deal much damage, the other dealt a lot of damage but lost a lot of health]
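The dominance rule on this slide can be sketched in a few lines. This is a minimal illustration, assuming both objectives (damage dealt, remaining health) are maximized; the function names and the sample scores are hypothetical and not taken from the talk.

```python
from typing import List, Sequence, Tuple

Scores = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective
    and strictly better in at least one (all objectives maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Scores]) -> List[Scores]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (damage dealt, remaining health) scores for five individuals.
scores = [(10.0, 90.0), (50.0, 50.0), (90.0, 10.0), (40.0, 40.0), (10.0, 80.0)]
front = pareto_front(scores)
print(front)
```

Here (40, 40) is dominated by (50, 50) and (10, 80) by (10, 90), so only the three tradeoff points remain on the front.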
NSGA-II • Evolution: natural approach for finding optimal population • Non-Dominated Sorting Genetic Algorithm II* • Population P with size N; Evaluate P • Use mutation to get P′ with size N; Evaluate P′ • Calculate non-dominated fronts of P ∪ P′ (size 2N) • New population of size N from highest fronts of P ∪ P′ * K. Deb et al. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. Evol. Comp. 2002
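One generation of the loop listed on this slide might look like the sketch below. It is a simplification under stated assumptions: all objectives are maximized, the last front that does not fit is truncated in index order (the published NSGA-II breaks such ties by crowding distance, which is omitted here), and `evaluate`, `mutate`, and all other names are hypothetical placeholders.

```python
import random
from typing import Callable, List, Sequence, Tuple

Genome = List[float]
Scores = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a is no worse than b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated_fronts(scores: List[Scores]) -> List[List[int]]:
    """Peel off successive non-dominated fronts; returns lists of indices."""
    remaining = set(range(len(scores)))
    fronts: List[List[int]] = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def next_generation(
    parents: List[Genome],
    evaluate: Callable[[Genome], Scores],
    mutate: Callable[[Genome, random.Random], Genome],
    rng: random.Random,
) -> List[Genome]:
    """P (size N) -> P' by mutation -> best N of the 2N combined individuals."""
    n = len(parents)
    children = [mutate(p, rng) for p in parents]
    combined = parents + children
    scores = [evaluate(g) for g in combined]
    survivors: List[Genome] = []
    for front in nondominated_fronts(scores):
        for i in front:
            # Simplification: no crowding-distance tie-break in the last front.
            if len(survivors) < n:
                survivors.append(combined[i])
    return survivors
```

Because parents and children compete in the same sort, a good parent is never lost to a worse child, which is the elitism the algorithm's title refers to.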
Constructive Neuroevolution • Genetic Algorithms + Neural Networks • Build structure incrementally (complexification) • Good at generating control policies • Three basic mutations (no crossover used): Perturb Weight, Add Connection, Add Node
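The three mutations can be illustrated on a toy genome. This is only a sketch: the `Genome` and `Connection` types are hypothetical, new connections are allowed between any two nodes, and the add-node operator splits an existing connection in the way NEAT-style methods commonly do (incoming weight 1.0, outgoing weight inherited). The slides do not say which conventions the authors used.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Connection:
    src: int
    dst: int
    weight: float


@dataclass
class Genome:
    num_nodes: int
    connections: List[Connection] = field(default_factory=list)


def perturb_weight(g: Genome, rng: random.Random, sigma: float = 0.5) -> None:
    """Add Gaussian noise to the weight of one randomly chosen connection."""
    if g.connections:
        rng.choice(g.connections).weight += rng.gauss(0.0, sigma)


def add_connection(g: Genome, rng: random.Random) -> None:
    """Link two randomly chosen nodes with a random weight (assumption:
    any pair is allowed, including recurrent links)."""
    src = rng.randrange(g.num_nodes)
    dst = rng.randrange(g.num_nodes)
    g.connections.append(Connection(src, dst, rng.uniform(-1.0, 1.0)))


def add_node(g: Genome, rng: random.Random) -> None:
    """Split one connection src->dst into src->new->dst (assumed convention)."""
    if not g.connections:
        return
    old = g.connections.pop(rng.randrange(len(g.connections)))
    new_id = g.num_nodes
    g.num_nodes += 1
    g.connections.append(Connection(old.src, new_id, 1.0))
    g.connections.append(Connection(new_id, old.dst, old.weight))
```

Starting from a small network and applying these operators over generations is what the slide calls complexification.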
Multimodal Networks (1) • Multitask Learning* • One mode per task • Shared hidden layer • Knows current task • Previous work • Supervised learning context • Multiple tasks learned more quickly together than individually • Not tried with evolution yet * R. A. Caruana, "Multitask learning: A knowledge-based source of inductive bias," ICML 1993
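A multitask network of this kind can be pictured as one shared hidden layer feeding several output blocks, with the known task index picking the block. The sketch below assumes a fixed fully connected layout with tanh activations; the function name, the weight layout, and the example weights are hypothetical and only illustrate the idea.

```python
import math
from typing import List


def forward_multitask(
    inputs: List[float],
    w_hidden: List[List[float]],       # one weight row per hidden neuron
    w_modes: List[List[List[float]]],  # w_modes[task] = weight rows of that mode
    task: int,
) -> List[float]:
    """Compute the shared hidden layer, then only the outputs of the mode
    that belongs to the current (known) task."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden
    ]
    return [
        math.tanh(sum(w * h for w, h in zip(row, hidden)))
        for row in w_modes[task]
    ]


# Two inputs, two hidden neurons, two modes with one output each.
w_hidden = [[1.0, 0.0], [0.0, 1.0]]
w_modes = [[[1.0, 0.0]], [[0.0, 1.0]]]
out_task0 = forward_multitask([0.5, -0.5], w_hidden, w_modes, task=0)
out_task1 = forward_multitask([0.5, -0.5], w_hidden, w_modes, task=1)
```

The same sensor values produce different outputs depending on the task, while both modes reuse whatever the hidden layer has learned.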
Multimodal Networks (2) • Mode Mutation • Extra modes evolved • Networks choose mode • Chosen via preference neurons • MM Previous, MM(P) • Links from previous mode • Weights = 1.0 • MM Random, MM(R) • Links from random sources • Random weights • Supports mode deletion [Figure: starting network with one mode, and networks after MM(P) and MM(R)]
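Mode choice via preference neurons could be sketched as follows. The slide only says that modes are "chosen via preference neurons"; the rule used here, that the mode whose preference neuron has the highest activation controls the agent on that time step, is an assumption, and `select_mode` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

# Each mode contributes its policy outputs plus one preference activation.
ModeOutput = Tuple[Sequence[float], float]


def select_mode(mode_outputs: Sequence[ModeOutput]) -> Tuple[int, List[float]]:
    """Return the index and policy outputs of the mode with the highest
    preference activation (ties go to the lowest index)."""
    best = max(range(len(mode_outputs)), key=lambda i: mode_outputs[i][1])
    return best, list(mode_outputs[best][0])


# Hypothetical activations for a network with two modes.
modes = [([0.1, 0.2], -0.3), ([0.9, -0.9], 0.8)]
chosen, action = select_mode(modes)
```

Unlike the multitask network, nothing here tells the network which task it is in; the network's own preference outputs decide when each mode is used.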
Experiment • Compare 4 conditions: • Control: Unimodal networks • Multitask: One mode per task • MM(P): Mode Mutation Previous • MM(R): Mode Mutation Random + Delete Mutation • 500 generations • Population size 52 • “Player” behavior scripted • Network controls homogeneous team of 4
MO Performance Assessment • Reduce Pareto front to single number • Hypervolume of dominated region • Pareto compliant • Front A dominates front B implies HV(A) > HV(B) • Standard statistical comparisons of average HV
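For two maximized objectives, the hypervolume is the area of the region dominated by the front, measured from a reference point. The sketch below assumes a reference point of (0, 0) and hypothetical example fronts; it is not the authors' evaluation code.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(front: Iterable[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by `front` and bounded below by `ref` (maximization)."""
    # Only points strictly better than the reference point contribute.
    pts = sorted(
        (p for p in front if p[0] > ref[0] and p[1] > ref[1]),
        key=lambda p: p[0],
        reverse=True,
    )
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        # Sweeping from the largest x, each point adds the horizontal strip
        # between the best y seen so far and its own y.
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


front_a = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
front_b = [(2.0, 1.0), (1.0, 2.0)]
hv_a = hypervolume_2d(front_a)
hv_b = hypervolume_2d(front_b)
```

Every point of `front_b` is dominated by a point of `front_a`, and the hypervolumes order the same way, which is the Pareto-compliance property stated on the slide.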
Front/Back Ramming Behaviors [Figure panels: Multitask and MM(R) networks in the Front Ramming and Back Ramming tasks]
Predator/Prey Behaviors [Figure panels: Multitask and MM(R) networks in the Predator and Prey tasks]
Discussion (1) • Front/Back Ramming • Control < MM(P), MM(R) < Multitask • Multiple modes help • Explicit knowledge of task helps
Discussion (2) • Predator/Prey • MM(P), Control, Multitask < MM(R) • Multiple modes not necessarily helpful • Disparity in relative difficulty of tasks • Multitask ends up wasting effort • Mode deletion aids search for one good mode
How To Apply • Multitask good if: • Task division known, and • Tasks are comparably difficult • Mode mutation good if: • Task division is unknown, or • “Obvious” task division is misleading
Future Work • Games with more tasks • Does method scale? • Control mode bloat • Games with independent tasks • Ms. Pac-Man • Collect pills while avoiding ghosts • Eat ghosts after eating power pill • Games with blended tasks • Unreal Tournament 2004 • Fight while avoiding damage • Fight or run away? • Collect items or seek opponents?
Conclusion • Domains with multiple tasks are common • Both in real world and games • Multimodal networks improve learning in multitask games • Will allow interesting/complex behavior to be developed in future
Questions? Jacob Schrum – schrum2@cs.utexas.edu Risto Miikkulainen – risto@cs.utexas.edu University of Texas at Austin Department of Computer Science