
Presentation Transcript


  1. Learning to Win
David W. Aha
Head, Intelligent Decision Aids Group
Navy Center for Applied Research in AI
Naval Research Laboratory (Code 5515), Washington, DC
david.aha@nrl.navy.mil | home.earthlink.net/~dwaha
• What? • Why? • How?
15 April 2005; Georgia Institute of Technology

  2. “Outline”
• Cognitive agents that learn
• TIELT (Testbed for Integrating and Evaluating Learning Techniques)
• Evaluation: CaT (Case-based Tactician)
• AAAI’05 Game Competition
• IJCAI/ICCBR Workshops
• Transfer learning
[diagram: collaborators around the NRL effort, incl. Ron Brachman, USC/ICT, Lehigh U., U. Mich., Mad Doc Software, NWU, UT Arlington, UT Austin, U. Wisc., Troika Games]

  3. TIELT: Objective
Reduce the effort required to evaluate learning techniques on performance tasks from complex simulators.
[diagram (Brachman, 2003): a cognitive agent with reactive, deliberative, and reflective processes; concepts and sentences held in LTM and STM; perception, attention, affect, prediction/planning, communication (language, gesture, image), and other reasoning; sensors and effectors connect the agent to the external environment; learning spans these processes]

  4. Learning in Cognitive Systems
Status: Few deployed cognitive systems integrate techniques that exhibit rapid & enduring learning behavior on complex tasks.
Complication: It’s costly to integrate & evaluate embedded learning techniques, and machine learning research has tended to focus on the opposite (¬ = not):
• ¬Rapid: knowledge-poor algorithms
• ¬Enduring: learning over a short time period
• ¬Embedded: stand-alone evaluations

  5. Limitations (?) of Some ML Research

  6. A Vision for Supporting Cognitive Learning
[diagram contrasting two eras:
• 1986, supervised learning: an ML system and a reasoning system connect through a standard-format interface to a database (e.g., the UCI Repository of ML Databases).
• 2005, cognitive learning: decision systems composed of reasoning modules and ML modules connect through a standard API (e.g., TIELT) to simulated or real worlds via sensors and effectors.]

  7. TIELT: Goal: Reduce Integration Costs
Problem: Simulator/decision-system integrations are expensive (time, $).
• Connecting each of Simulator1..Simulatorm directly to each of Decision System1..Decision Systemn requires m*n integrations (1 integration per pair)
• Proposed solution: middleware; with TIELT between the simulators and decision systems, only m+n integrations are needed
Other goals:
• Permit benchmark studies on selected simulator tasks
• Encourage study of learning on knowledge-intensive problems
• Provide support for DARPA Challenge Problems on Cognitive Learning
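
To make the m+n arithmetic concrete, here is a minimal sketch of the middleware pattern in Python; the interface and class names are invented for illustration and are not TIELT’s actual API.

# Hypothetical sketch of the middleware pattern behind the m+n claim;
# names are invented for illustration, not TIELT's actual API.
class SimulatorAdapter:
    """Written once per simulator (m adapters total)."""
    def observe(self) -> dict:
        raise NotImplementedError
    def act(self, action: str) -> None:
        raise NotImplementedError

class DecisionSystemAdapter:
    """Written once per decision system (n adapters total)."""
    def decide(self, state: dict) -> str:
        raise NotImplementedError

class ToySimulator(SimulatorAdapter):
    def __init__(self):
        self.t = 0
    def observe(self) -> dict:
        return {"tick": self.t}
    def act(self, action: str) -> None:
        self.t += 1  # apply the action to the simulated world

class ToyDecisionSystem(DecisionSystemAdapter):
    def decide(self, state: dict) -> str:
        return "wait" if state["tick"] % 2 else "attack"

def run_episode(sim: SimulatorAdapter, ds: DecisionSystemAdapter, steps: int = 5):
    # Any simulator pairs with any decision system: m + n adapters
    # cover all m * n combinations without pairwise glue code.
    for _ in range(steps):
        sim.act(ds.decide(sim.observe()))

run_episode(ToySimulator(), ToyDecisionSystem())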

  8. Simulator Desiderata
• Adversarial
• Realistic relational data sources (e.g., organizational (social networks), temporal, spatial)
• Challenging environments: uncertainty (“fog of war”), noise, real-time operation (not assumed by most learning systems)
• Collaborative, with multiple reasoning levels (e.g., strategic, tactical)
• Requires multi-echelon decision making (military analogue)

  9. Characteristics of Adversarial Games
• Several AI research challenges: e.g., huge search spaces, adversarial, collaborative, real-time (some), on-line, uncertainty, multiple reasoning levels, relational (social, temporal, spatial) data
• Feasible: breakthroughs are expected to occur (with funding)
• Inexpensive
• Military analogue (i.e., training simulators involving computer-generated forces (CGF))
• Popular (with the public, industry, academia, & the military)

  10. Interest in Complex Simulation Games
Academia: Pushing research boundaries
• Interactive Computer Games: Human-Level AI’s Killer Application (Laird & van Lent, AAAI’00 invited talk)
• Meetings (e.g., AAAI symposia/workshops, AIIDE’05)
• Journals (e.g., J. of Game Development, IJIGS)
Industry: Improving the gaming experience
• USA: $7B in sales in 2003 (ESA, 2004)
• Focusing on game AI (e.g., GDC’03 roundtable report)
• Great interest, but few successes (e.g., Black & White)
• Many simulators (e.g., SimCity, Quake, SoF, UT)
Military: Training, analysis, & experimentation
• Learning: acquisition of new knowledge or behaviors
• Many simulators, but none incorporate learning (Reece, 2003)
• Concerns: learning non-doctrinal & unpredictable behaviors

  11. Gaming Genres of Interest (modified from (Laird & van Lent, 2001) and (Fairclough et al., 2001))

  12. TIELT: Description
• A free, supported tool for integrating decision systems with simulators
• Users define their own APIs (message format and content)
• “As-is” license
• Initial focus: evaluating learning techniques in complex games
• Learning foci: task, player/opponent, or game model
• Targeted technique types: broad coverage, incl. supervised/unsupervised, immediate/delayed feedback, analytic, active/passive, online/offline, direct/indirect, automated/interactive

  13. Functional Architecture
[diagram: TIELT’s internal communication modules sit between a selected game engine (from a game engine library: Stratagus, EE2, Full Spectrum Command, …) and a selected decision system (from a decision system library: reasoning systems with learning modules). TIELT’s user interface offers advice, evaluation, prediction, and coordination interfaces; its KB editors let the TIELT user select or develop five knowledge bases, drawn from knowledge base libraries: Game Model (GM), Game Interface Model (GIM), Decision System Interface Model (DSIM), Agent Description (AD), and Experiment Methodology (EM). Learned knowledge is inspectable; game players interact with the game engine.]

  14. Knowledge Bases
• Game Interface Model: defines communication processes with the game engine
• Decision System Interface Model: defines communication processes with the decision system
• Game Model: defines the interpretation of the game (e.g., initial state, classes, operators, behaviors (rules)); behaviors could be used to provide constraints on learning
• Agent Description: defines what decision tasks (if any) TIELT must support; TMK representation
• Experiment Methodology: defines selected performance tasks (taken from the Game Model description) and the experiment to conduct

  15. Use: Controlling a Game Character
[the slide 13 architecture diagram, annotated with data flow: the selected game engine sends a raw state to TIELT, which passes a processed state to the selected decision system; the decision system returns a decision that TIELT translates into a game action]

  16. Use: Prediction
[the slide 13 architecture diagram, annotated with data flow: the game engine’s raw state is processed by TIELT for the decision system, which returns a prediction through the prediction interface rather than an action]

  17. Use: Game Model Revision
[the slide 13 architecture diagram, annotated with data flow: the game engine’s raw state is processed by TIELT for the decision system, which returns edits that revise TIELT’s Game Model]

  18. A Researcher Use Case
1. Define/store a decision system interface model
2. Select a game simulator & interface
3. Select a game model
4. Select/define performance task(s)
5. Define/select an experiment methodology
6. Run experiments
7. Analyze displayed results
[diagram residue: game engine and decision system libraries; selected/developed knowledge bases (GM, GIM, DSIM, AD, EM)]

  19. TIELT: Status: Features (v0.6a)
• Pure Java implementation
• Message protocols: Java objects, console I/O, TCP/IP, UDP; now also supports external shared memory
• Message content: user-configurable; instantiated templates tell TIELT how to communicate with other modules
• Messages: Start, Stop, Load Scenario, Set Speed, …
• Agent description representations (with Lehigh University): Task-Method-Knowledge (TMK) process models, Hierarchical Task Networks (HTNs)
• Experimental results can be stored in databases supporting the JDBC or ODBC standards (e.g., Oracle, MySQL, MS Excel)
• 100 downloads from 50 people (half are collaborators)
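
The “instantiated templates” idea can be pictured with a minimal Python sketch; the message fields and wire format below are hypothetical illustrations, not TIELT’s actual configuration format.

from string import Template

# Hypothetical message template in the spirit of TIELT's user-configurable
# message content; field names and syntax are invented for illustration.
MOVE_TEMPLATE = Template("MOVE unit=$unit_id x=$x y=$y\n")

def encode_move(unit_id: int, x: int, y: int) -> bytes:
    # Instantiate the template, then ship it over whichever transport the
    # interface model selects (console I/O, TCP/IP, UDP, ...).
    return MOVE_TEMPLATE.substitute(unit_id=unit_id, x=x, y=y).encode()

print(encode_move(7, 12, 30))  # b'MOVE unit=7 x=12 y=30\n'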

  20. TIELT: http://nrlsat.ittid.com
[website screenshot with callouts: first release 10/04, expected to reach beta status in ~3 months; hosts the TIELT User’s Manual, tutorial, and publications; supports uploading/obtaining TIELT KBs; allows users to notify Matt about bugs, and him to respond to them]
• Downloads as of 22 March 2005: 87 total, from 44 people (50% collaborators)

  21. Current Development Priorities
• Speed: compile TIELT Script XML to Java
• GUI: parse interface code from free text (e.g., an emacs plugin); simplify deletion of variables; graphical depiction of system information flow
• Language: TSL higher-level data structures (e.g., Vector, Hash); parameterized game events in the Game Model
• State: separate agent state from game state
• Experiment methodology: more analytical tools; link independent variables with the other models
• Debugging: real-time view of the Game Model; a more powerful logging scheme (e.g., access to log levels)

  22. FY05 Collaboration Projects
[diagram mapping collaborators onto the TIELT test bed:
• Game library: Stratagus, Urban Terror, FSC/R, RoboCup, ToEE, EE2, FreeCiv; collaborators incl. U. Minn-Duluth, Lehigh U., UT Arlington, USC/ICT, Mad Doc, Troika, ISLE, NWU
• Decision system library: ISLE: ICARUS; Lehigh: case-based planner; U. Michigan: SOAR; UTA: DCA; UT Austin: neuroevolution
• User interfaces (advice, evaluation, prediction, coordination): U. Mich., U. Minn-Duluth, USC/ICT
• KB editors & knowledge base library (Game Interface Model, Decision System Interface Model, Game Model, Agent Descriptions, Experiment Methodology): many, incl. LU, USC, U. Mich., Mich/ISLE]

  23. TIELT APIs: Status

  24. Stratagus
• Free real-time strategy (RTS) game engine
• Player objective: achieve some game-winning condition in real time in a simulated world
• Data sets (i.e., games): 8 exist, of varying maturity
• Multiple unit types; combat follows the rock-paper-scissors principle

  25. Stratagus Data Sets
Wargus, Magnant, Battle of Survival, Battle for Mandicor, Aleona’s Tales, RoboVasion, Acorn Hunt, GnomeGnation
Wargus:
• Clone of Warcraft II
• Easily modified by changing the various unit properties
• Some characteristics: real-time; partial observability
• Adversaries: make asynchronous modifications to the state; have unknown models
• Available for download

  26. Wargus (informal intro)
• Goal: defeat opponents, build a Wonder, etc.
• Measure: score = f(time, achievements)
• Map: size (e.g., 128x128), location, terrain, climate
• Time stamp: real-time advancement
• State: map (partially observed), resources, units, and adversaries (state for each, partially observed)
• Units: attributes (location, cost, speed, weapon, health, …); actions (move, attack, gather, build generator, …)
• Generators (e.g., buildings, farms): attributes (location, cost, maintenance, …); actions (build units, conduct research, achieve next era)
• Consumables (used to build): gold, wood, iron
• Research achievements: determine what can be built
• Actions pertain to generators, units, and diplomacy

  27. Wargus: Decision Space Analysis
Variables:
• W: number of workers
• A: number of possible actions
• P: average number of workplaces
• U: number of units
• D: average number of directions a unit can move
• S: choice of a unit’s stance
• B: number of buildings
• R: average number of research objectives
• C: average number of buildable units
Decision complexity per turn (for a simple game state; this is a simplification):
O(2^W(A*P) + 2^U(D+S) + B(R+C))
• This example: W=4, A=5, P=3, U=8 (offscreen), D=2, S=3, B=4, R=1, C=1
• Decision space ≈ 1.4x10^3
• Exponential in the number of units; motivates the use of domain knowledge (to constrain this space)
• Comparison: the average decision space per chess move is ~30
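
A quick check of the slide’s example, under one reading of the garbled formula as 2^W·(A·P) + 2^U·(D+S) + B·(R+C); this reading gives ≈1.5x10^3 against the slide’s reported 1.4x10^3, so the exact grouping in the original may differ slightly.

# Evaluate the decision-space estimate with the slide's example values.
# The grouping below is an assumption recovered from the garbled slide.
W, A, P = 4, 5, 3   # workers, actions, workplaces
U, D, S = 8, 2, 3   # units, movement directions, stances
B, R, C = 4, 1, 1   # buildings, research objectives, buildable units

decision_space = 2**W * (A * P) + 2**U * (D + S) + B * (R + C)
print(decision_space)  # 1528, i.e., ~1.5e3; exponential in W and U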

  28. Learning to Win in Wargus
Ponsen, M.J.V., Muñoz-Avila, H., Spronck, P., & Aha, D.W. (IAAI’05). Automatically acquiring adaptive real-time strategy game opponents using evolutionary learning.
Goal: Eliminate all enemy units and buildings.
[diagram: game opponent scripts, background knowledge, and an action language feed a genetic algorithm that evolves counter-strategies in Wargus; the evolved counter-strategies supply tactics (Tactics1 … Tactics20) to dynamic scripting, which starts from an initial policy (i.e., equal weights) and learns an opponent-specific policy f(State_i) = Tactic_ij]

  29. Opponent AI in Wargus
• Low-level actions are hard-coded in the engine
• Game AI is defined in a script (an action sequence) written in Lua, using an “action language”:

AiNeed(AiCityCenter)       -- request a city center
AiWait(AiCityCenter)       -- wait until it exists
AiSet(AiWorker, 4)         -- maintain 4 workers
AiForce(1, {AiSoldier, 1}) -- define force 1: one soldier
AiWaitForce(1)             -- wait until force 1 is assembled
AiAttackWithForce(1)       -- send force 1 to attack
...

  30. A State Space Abstraction for Wargus
A manually predefined lattice of 20 “building states”:
• Node = building state; corresponds to the set of buildings a player possesses, which determines the units and buildings that can be created, and the technology that can be pursued
• Arc = a new building (state transition)
• Tactic = a sequence of actions within a state (a sub-plan)
[figure: the building-state lattice, with the path of evolved counter-strategy Evolved_SC5 highlighted]
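
A minimal sketch of how such a lattice can be encoded; the building names are illustrative stand-ins, not the actual 20-node Wargus lattice.

# Hypothetical encoding of a building-state lattice: a node is the set of
# building types owned; an arc adds one new building type.
BUILDINGS = ["town_hall", "barracks", "lumber_mill"]

def successors(state: frozenset) -> list:
    # States reachable by constructing one new building type (one arc each).
    return [state | {b} for b in BUILDINGS if b not in state]

start = frozenset({"town_hall"})
for nxt in successors(start):
    print(sorted(nxt))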

  31. Genetic Chromosomes (one per script): The Source for Tactics
[figure: a chromosome encodes a complete plan; each gene is one action; a contiguous action sequence within one building state forms a “tactic”]
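
A sketch of the chromosome-to-tactic decomposition described above; the gene encoding and action names are illustrative assumptions.

# Illustrative decomposition of a chromosome (a complete plan) into
# tactics: contiguous action runs grouped by building state.
from itertools import groupby

# Each gene: (building_state_id, action). Contents are hypothetical.
chromosome = [
    (1, "AiSet(AiWorker, 4)"),
    (1, "AiNeed(AiBarracks)"),
    (2, "AiForce(1, {AiSoldier, 2})"),
    (2, "AiAttackWithForce(1)"),
]

tactics = {
    state: [action for _, action in genes]
    for state, genes in groupby(chromosome, key=lambda g: g[0])
}
print(tactics[2])  # the tactic for building state 2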

  32. A Decision Space Abstraction for Wargus
[figure: the space of primitive actions (A1 … Am) is abstracted into a space of tactics (A11 … Amn), each tactic a sequence of primitive actions]

  33. Dynamic Scripting
Learns to assign weights to tactics (Tactics1 … Tactics20), starting from an initial policy (i.e., equal weights) and producing an opponent-specific policy f(State_i) = Tactic_ij, for a single opponent.
Consequences:
• Separate policies must be learned per opponent
• The opponent must be known (a priori)
Our goal: relax the assumption of a fixed adversary.
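
A minimal sketch of the dynamic-scripting idea: per state, tactics are drawn with probability proportional to their weights, and the chosen tactic’s weight is adjusted by a fitness signal after each game. The constants, the weight floor, and the proportional selection rule are illustrative assumptions.

import random

# One weight per tactic per building state; start equal (initial policy).
weights = {state: [1.0] * 20 for state in range(20)}

def select_tactic(state: int) -> int:
    """Draw a tactic index with probability proportional to its weight."""
    return random.choices(range(20), weights=weights[state])[0]

def update(state: int, tactic: int, fitness: float, lr: float = 0.3):
    """Reward (fitness > 0) or punish (fitness < 0) the chosen tactic."""
    w = weights[state]
    w[tactic] = max(0.1, w[tactic] * (1.0 + lr * fitness))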

  34. Case-Based Reasoning (CBR) Applied to Games (partial summary)

  35. Learning to Win in Wargus (versus random opponents)
Aha, D.W., Molineaux, M., & Ponsen, M. (ICCBR’05). Learning to win: Case-based plan selection in a real-time strategy game.
Sources of domain knowledge:
• State space abstraction (building state lattice)
• Decision space abstraction (tactics)
• f(State_i, Tactic_ij) = Performance
[diagram: evolved counter-strategies and game opponent scripts supply Tactics1 … Tactics20 to CaT (Case-based Tactician), which is integrated with Wargus via TIELT’s internal communication modules, editors, knowledge bases, and evaluation interface; developed by Matt & Marc]

  36. CaT: Case Representation
C = <BuildingState, Description, Tactic, Performance>
• BuildingState: an index into the building state lattice
• Description: the game state description used for retrieval (8 features; see slide 37)
• Tactic: a selected counter-strategy’s action sequence (for this BuildingState)
• Performance: a value in [0,1] that reflects the utility of choosing this tactic for this BuildingState
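
The four-field case might be written as follows; the field types are assumptions consistent with the slide (an 8-feature description and a [0,1] performance value).

from dataclasses import dataclass

@dataclass
class Case:
    """One CaT case, per the slide; types are assumed for illustration."""
    building_state: int        # index into the building-state lattice
    description: list          # 8-feature game-state description
    tactic: list               # action sequence for this building state
    performance: float         # utility in [0, 1] of this tactic here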

  37. CaT: Retrieval
Tactic retrieval occurs when a new building state b is entered.

retrieve(b, k, e):
  d = description()                        // 8 features
  IF #cases_b < e * #tactics_b
  THEN t = least_used(tactics_b)           // then explore
  ELSE C = max_performer(max_similar(d, cases_b, k))
       t = C_Tactic
  Return t

Sim(C, d) = C_Performance / dist(C_Description, d) - dist(C_Description, d)
C_Performance = Σ_{i=1..n} C_{Performance,i} / n
C_{Performance,i} = ½ (ΔScore_i + ΔScore_{i,b})
ΔScore_i = (Score_{i,p} - Score_{i,p,b}) / ((Score_{i,p} - Score_{i,p,b}) + (Score_{i,o} - Score_{i,o,b}))
ΔScore_{i,b} = (Score_{i,p,b+1} - Score_{i,p,b}) / ((Score_{i,p,b+1} - Score_{i,p,b}) + (Score_{i,o,b+1} - Score_{i,o,b}))
(Here p indexes the player’s score, o the opponent’s, and b the building state.)
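
A self-contained sketch of the retrieval policy in the pseudocode, for the cases and tactics of a single building state: explore least-used tactics until e cases per tactic exist, otherwise reuse the best performer among the k most similar cases. The Euclidean distance and tie-breaking are assumptions.

import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class Case:
    description: list          # 8 features
    tactic: int
    performance: float         # in [0, 1]

def dist(a: list, b: list) -> float:
    return math.dist(a, b)    # Euclidean distance (an assumption)

def retrieve(cases: list, tactics: range, d: list,
             k: int = 3, e: int = 3) -> int:
    # Explore until at least e cases exist per available tactic.
    if len(cases) < e * len(tactics):
        used = Counter(c.tactic for c in cases)
        return min(tactics, key=lambda t: used[t])  # least-used tactic
    # Exploit: among the k most similar cases, reuse the best performer.
    nearest = sorted(cases, key=lambda c: dist(c.description, d))[:k]
    return max(nearest, key=lambda c: c.performance).tactic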

  38. CaT: Reuse
• Adaptation is not controlled by CaT
• Instead, it’s performed by the built-in primitives of Wargus; e.g., if an action is to create a building, the game engine determines its location and which workers will build it
[screenshot: Wargus Defend_City task]

  39. CaT: Retention
• Execute all selected tactics, recording scores for player and opponent at the beginning/end of each building state
• A game ends when a win occurs or it times out (10 min)
• For each building state:
  IF a case C matches the recorded <description, tactic> pair
  THEN update C_Performance (via averaging)
  ELSE store a new case for this building state
• CaT acquires at most 1 case per building state per game
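
A sketch of that retention rule; the exact match criterion and the running-average bookkeeping (the n counter) are assumptions.

from dataclasses import dataclass

@dataclass
class StoredCase:
    description: tuple
    tactic: int
    performance: float
    n: int = 1  # number of games contributing to the running average

def retain(cases: list, description: tuple, tactic: int, perf: float) -> None:
    # Update a matching <description, tactic> case by averaging; otherwise
    # store a new case (at most one new case per building state per game).
    for c in cases:
        if c.description == description and c.tactic == tactic:
            c.performance = (c.performance * c.n + perf) / (c.n + 1)
            c.n += 1
            return
    cases.append(StoredCase(description, tactic, perf))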

  40. CaT: Evaluation
Competitors:
• Uniform: select a tactic randomly (uniform distribution)
• Best evolved counter-strategy: test each of the 8 counter-strategies (i.e., the best ones evolved vs. each opponent strategy) and record the one that performs best on randomly selected opponents
• CaT: tested on randomly selected opponents (uniform distribution)
Hypothesis: CaT can significantly outperform (i.e., attain a higher winning percentage and higher average relative scores than) Uniform and the evolved counter-strategies.

  41. Empirical Methodology
• Fixed variables: k=3, e=3 (e: exploration parameter)
• Independent variable: amount of training
• Dependent variables: win percentage, relative score
• Strategy: LOOCV (leave-one-out cross-validation); train on 7 opponents, test on the eighth (tested 10x after every 25 games)
• Trial: 140 games (100 for training, 40 for testing)
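
The protocol can be pictured with a short sketch; the opponent names are placeholders and game execution is stubbed out, but the train/test bookkeeping follows the slide (4 x 25 training games, 10 test games per checkpoint, 140 games per trial).

# Schematic LOOCV protocol: hold out each of the 8 opponents in turn;
# train on the other 7, test on the held-out one 10x after every 25 games.
import random

OPPONENTS = [f"opponent_{i}" for i in range(8)]  # hypothetical names

def play(opponent: str, learn: bool) -> bool:
    return random.random() < 0.5  # stub: did we win this game?

for held_out in OPPONENTS:
    training = [o for o in OPPONENTS if o != held_out]
    for block in range(4):                 # 4 x 25 = 100 training games
        for _ in range(25):
            play(random.choice(training), learn=True)
        wins = sum(play(held_out, learn=False) for _ in range(10))
        print(held_out, f"after {25 * (block + 1)} games: {wins}/10 wins")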

  42. Results I: Win% Performance
[plot: % of games won vs. the test set (LOOCV), y-axis 0 to 0.9, against # training trials (25 to 100), comparing CaT, the best evolved counter-strategy, and the average performance of the evolved counter-strategies]
Result: CaT outperforms the best counter-strategy, but not significantly.
Significance test:
• A one-tail paired two-sample t-test gives p=.18: an 82% probability that CaT is better than the best counter-strategy. (However, performance is significantly better at the .005 level against the second-best counter-strategy.)
• A paired-sample t-test is appropriate when two algorithms are compared on identical problems; we use it to compare performance against the same opponent.
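
The test in question can be run with scipy; the win-rate vectors below are placeholders (not the paper’s data), and since scipy.stats.ttest_rel returns a two-sided p-value, the one-tailed value is obtained by halving it when the effect is in the hypothesized direction.

from scipy import stats

# Placeholder per-opponent win rates, paired by opponent as the slide
# prescribes for a paired-sample t-test. NOT the paper's data.
cat_wins = [0.8, 0.7, 0.9, 0.6, 0.75, 0.85, 0.7, 0.8]
best_cs  = [0.7, 0.65, 0.8, 0.6, 0.7, 0.8, 0.75, 0.7]

t, p_two_sided = stats.ttest_rel(cat_wins, best_cs)
p_one_tailed = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
print(f"t={t:.2f}, one-tailed p={p_one_tailed:.3f}")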

  43. Results II: Score Performance
[plot: relative score vs. # training trials, comparing CaT, the best evolved counter-strategy, and the average performance of the evolved counter-strategies]
Result: CaT outperforms the best counter-strategy in terms of score (strongly significant).
Significance test: a one-tail paired two-sample t-test gives p=.0002: a 99.98% probability that CaT is better than the best counter-strategy.

  44. Potential Future Work Foci
• Reduce/eliminate non-determinism
• Optimize state descriptions (e.g., test for correlations)
• Stop cheating!
• Tune CaT’s parameters (i.e., k and e)
• Plan adaptation
• Compare vs. an extension of dynamic scripting
• Permit random initial states
• Multiple simultaneous adversaries
• On-line learning

  45. AAAI’05 General Game Playing (GGP) Competition (M. Genesereth, Stanford U.)
Goal: Win any game in a pre-defined category
• Initial category: “chess-like” games (e.g., Knight-Zone Chess)
• Games are produced by a game generator
• Input: rules on how to play the game; a move grammar is used to communicate actions; GDL: a language based on relational nets
• Output (desired): a winning playing strategy
Annual AAAI competition
• WWW: games.stanford.edu
• AAAI’05 prize: $10K

  46. GGP Games (2005)
Dimensions:
• One player or n players
• Simultaneous or alternating play
• Complete or incomplete information
• Competition and cooperation
Current examples: Blocks, Minichess, Buttons and Lights, Endgame, Maze, Corridor, Tictactoe, Roshambo, Ticticoe, Diplomacy

  47. GGP-TIELT Integration
[integration architecture diagram: the GGP Game Manager on the web connects reference opponents, other GGP competitors, and TIELT-ready GGP competitors via TIELT]
Use case:
1. A researcher develops a General Game Player competitor
2. Connects to the GGP Game Manager through TIELT
3. Using TIELT, designs an experiment for playing n matches of m different general games, against reference opponents and unknown GGP competitors
4. Runs the experiment online through the GGP Game Manager
5. Evaluates experimental results using TIELT evaluation methodologies and the GGP performance data available online

  48. Upcoming TIELT-Related Events
General Game Playing Competition
• AAAI’05 (24-28 July 2005; Pittsburgh); chair: Michael Genesereth (Stanford U.)
• We are integrating TIELT with GGP and attracting additional competitors
TIELT-related workshops
• At IJCAI’05 (31 July 2005; Edinburgh): Reasoning, Representation, and Learning in Gaming; co-chairs: Héctor Muñoz-Avila, Michael van Lent; 21 submissions
• At ICCBR’05 (24 August 2005; Chicago): Computer Gaming and Simulation Environments; co-chair: David C. Wilson
TIELT challenge problems: design due 15 April; code due 15 May

  49. City Control Challenge Problem (MDS)
A real-time strategy game where the goal is to gain control over city blocks that vary in utility and require different levels of control.
[screenshot: messages received from the decision system; pressing the “T” button calls on the TIELT-connected decision system to identify the enemies closest to the TIELT-controlled units and calculate the appropriate force to send into battle]
Design due 4/15; implementation due 5/15, in time for the summer workshops.

  50. Initial Challenge Problem (CP) Components
• State S:
  • Map M = {B_11, …, B_mn} ; a map consists of a set of city blocks
  • City block B_ij:
    • features(B_ij) = {f_1(), …, f_l()} ; a block has a set of features (e.g., location, utility)
    • objects(B_ij) ⊆ O
  • Objects O: objectType(), location(), status(), …
  • Units U: player(), type(), actions(), weapon(), armor(), …
  • Resources R: type(), amount(), …
    • Categories: controllable (e.g., petrol, electricity), natural (e.g., uranium, water)
• Actions A:
  • apply(U_i, A_j, S) = S′ ; units can apply actions to modify the state
• Tasks T: type(), preconditions(), subtasks(), subtaskOrdering(), …
  • Categories: strategic, tactical (where primitive tasks invoke actions)
• CP = {M, S_0, S_G, F, R, E}
  • M: map
  • S_0: initial state
  • S_G: the goal is to achieve a state (i.e., “control” of specified city blocks), e.g., {(B_ij, G_ij) | B_ij ∈ M & G_ij ∈ G} (1≤i≤m, 1≤j≤n); goals can be achieved by tasks or actions
  • F: friendly units ; controllable units
  • R: resources ; objects that can be used in actions
  • E: enemy units ; non-controllable adversary units
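
A minimal sketch of how these CP components might be typed; all field choices are assumptions reflecting the definitions above, not a mandated schema.

from dataclasses import dataclass, field

@dataclass
class CityBlock:
    location: tuple            # a block feature, per features(B_ij)
    utility: float             # another example block feature
    objects: list = field(default_factory=list)  # objects(B_ij) ⊆ O

@dataclass
class ChallengeProblem:
    map: list                  # M: the city blocks
    initial_state: dict        # S_0
    goals: dict                # S_G: block -> required level of control
    friendly_units: list       # F (controllable)
    resources: list            # R (objects usable in actions)
    enemy_units: list          # E (non-controllable adversary units)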
