Transfer Learning Site Visit August 4, 2006 Report of the ISLE Team Pat Langley Tom Fawcett

Transfer Learning Site Visit August 4, 2006 Report of the ISLE Team Pat Langley Tom Fawcett Daniel Shapiro Institute for the Study of Learning and Expertise ICARUS

Transfer Learning Site Visit August 4, 2006 Results from Year 1 for the ISLE Team

ISLE: Transfer in ICARUSPI: Pat Langley ICARUS Architecture Testbeds: Urban Combat and GGP Contains descriptions of the perceived objects Generates beliefs using observed environment and long term conceptual knowledge Contains relational and hierarchical knowledge about relevant concepts Creates internal description of the perceived environment Perceptual Buffer Long-Term Conceptual Memory Short-Term Conceptual Memory Conceptual Inference Perception Contains inferred beliefs about the environment Selects relevant skills based on beliefs and goals Environment Skill Retrieval Contains hierarchical knowledge about executable skills Finds novel solutions for achieving goals Contains goals and intentions Long-Term Skill Memory Short-Term Goal/Skill Memory Problem Solving Skill Learning Skill Execution • First-person real-time shooter game • Goal: find and defuse IEDs • Addressed by learning new skills • Logically defined arbitrary rules of play • Addressed by learning value function over game states Motor Buffer Acquires new skills based on successful problem solving traces Executes skills on the environment • Results • Urban Combat: Evaluation ongoing • GGP: Transfer ratio of 1.3 on TL 7, jump start of 20.. Architecture Components • Conceptual inference: Icarus performs bottom-up inference from relational ground state literals to higher level state concepts. • Skill execution: Icarus retrieves relevant skills for goals and executes them reactively. • Skill learning: Icarus acquires general hierarchical reactive skills that explain/generate successful solution paths. • Value learning: Icarus employs reinforcement learning to acquire a value function over game states using a factored state representation (hierarchy of first-order predicates)

University of Michigan: Transfer in SoarPI: John Laird Soar Problem/Objective Long-Term Memories • Study transfer learning using multiple online architectural learning mechanisms • Chunking (EBL) • Reinforcement Learning, • Semantic Learning • Episodic Learning • Determine strengths and weaknesses • Develop reasoningstrategies that maximize transfer Procedural Episodic Semantic Reinforcement Learning Chunking Semantic Learning Episodic Learning Short-Term Memory Decision Procedure Perception Action Body Payoff Solution Approach/Accomplishments • Fair comparison of learning mechanisms • All use same performance system • Integration and synthesis of multiple learning mechanisms and reasoning strategies on same problem • Not reliant on one mechanism • Best technique used for given problem • Positive interaction between methods • Integrated Soar & Urban Combat Testbed • Three learning approaches in UCT • Levels 0-2 • Significant transfer

Northwestern University: CompanionsPIs: Kenneth D. Forbus, Thomas Hinrichs Problem/Objective Learning to solve problemsby studying worked solutions • Extend Companion Cognitive Systems architecture to achieve transfer learning • Advance analogical processing technology • Develop techniques for learning self-models • Test using ETS Physics testbed ETS-generatedAP Physics test Learned strategies, encoding rules, and cases Worked Solutions Sketches included Y2-3 Payoff Solution Approach/Accomplishments • New techniques for robust near and far transfer learning • Advances can be incorporated in other cognitive architectures, systems • Near term: Analogy Servers • Long-range: Companions architecture used in military/intelligence systems • Today’s cluster is tomorrow’s multi-core laptop • Analogy approach based on how humans seem to do transfer • Study worked solutions to learn equations, modeling assumptions • e.g., when could something be a point mass? • Pilot experiment: Achieved transfer levels 1, 3, & 5

UT Arlington: Urban Combat Testbed (UCT)PIs: L. Holder, M. Youngblood, D. Cook Vision/Goals • Develop Urban Combat Testbed (UCT) capable of generating tasks to evaluate transfer learning performance • Conduct significant human trials to evaluate human transfer learning performance • Disseminate UCT to community as a benchmarking tool for cognitive performance • Investigate novel cognitive architectures for achieving transfer learning in Urban Combat and similar domains • Achieve 70% of human transfer learning performance Technical Details Highlights • Develop Urban Combat Testbed (UCT), a simulated, real-time, urban combat domain • Agent interface provides detailed, real-time perceptual information and command execution • Human interface provides compelling video interface and keyboard/mouse command interface • Develop scenarios for human and agent trials for each level of transfer • Execute human and agent trials, compare transfer learning performance • Investigate other approaches to transfer learning • Human transfer learning • Hierarchical reinforcement learning • Agent-based cognitive architectures • UCT version 1.0 available • Based on Quake 3 Arena first-person shooter (FPS) game • Enhanced to include realistic urban combat environments • Agent version provides interface to game percepts and commands • Human-player version provides standard interface as in commercial FPS games • Under development • Set of scenarios to evaluate different levels of transfer learning • Random generation of scenarios • Ability to log game interaction

UT Arlington: Reinforcement LearningPI: M. Huber Skill Hierarchy Concept Hierarchy Benefits and New Capabilities Technical Approach Transfer of skill and concept hierarchies from training to transfer tasks • Transfer skills and concepts are found automatically and carry probability and value attributes • Transfer skills are extracted based on local system characteristics in the task domain • Sub-skills are reward-independent • Transfer skills have an associated probabilistic model • Hierarchical concepts capture capabilities of the skill set • Concepts capture probabilistic behavior of skills • Concepts capture value attributes of the task domain • Generated representation hierarchy and refinement process have bounded optimality properties • Policies learned on the representation are within a bound of optimal Selective, task-specific state space construction • Extraction of sub-skills using subgoal discovery • Learning of concepts characterizing skill capabilities • Transferred concepts and skills are used to construct a more abstract Bounded Parameter state representation • Learning on new, more compact representation leads to improved learning performance Hierarchical state representation Skill and concept extraction Task learning Example and Performance Integration and Deliverables • The approach provides skill and concept hierarchies for use as representations by reasoning systems • Provides probabilistic and utility information to representation hierarchies in ICARUS • Explicit tie of reasoning structure and reinforcement learning • Generates new, capability-specific concepts that could serve as new predicates in Markov Logic Networks (MLN) • Probabilistic attributes can facilitate fast integration into MLN • Integration and Delivery Milestones • Integration: ICARUS int. MLN MLN/ICARUS • Development: Skill utility Skill generalization Skill extension • Deliverables: Prototype w. Prototype w. Final system • UCT interface skill gen. • Urban Combat Testbed (UCT) • Training task: Go to flag • Transfer task: Retrieve different flag • Transfer from training to transfer task • 29 sub-skills and associated concepts • Reduction from 20,000 to 81 states • Transfer Performance (Transfer Ratio - TR) • TR 2.5 with skill transfer • TR 5 with skill and concept transfer Year 1 Year 2 Year 3

University of Washington: Markov LogicPI: Pedro Domingos Problem/Objective Weight Learning ILP • Transfer learning requires: • - Relational inference & learning • - Uncertain inference & learning • Markov logic provides this • - Simple, general, unified framework • Needs: • - Scaling to large problems • - Online, “lifelong” operation • - Extension to continuous data • - Extension to decision-making Markov logic Source Domain Target Domain WalkSAT MCMC Approach/Accomplishments Payoff • Key approaches: • - Representation mapping • - Statistical predicate invention • Accomplishments to date: • - LazySAT: Efficient use of memory • (400,000 X less than WalkSAT on BibServ) • - MC-SAT: Fast mixed inference • (>1000 X faster than Gibbs, tempering) • - Alchemy system • - Collaborated on integration w/ Icarus, etc. • Enables highest levels of transfer • - Between relational structures, • as opposed to surface descriptions • Enables transfer “in the wild” • - Noisy, rich, real-world domains • - As opposed to shoehorning problems • into standard machine learning form • Broadly applicable AI technology • -Greatly increases speed of adaptation

CycorpPI: Michael Witbrock Problem/Objective: Cyc KB (background knowledge) Collect knowledgerelevant to a task,domain, or problem • Knowledge-based transfer learning • Supply background knowledge and well-encoded, logically meaningful domains and problem spaces • Elaborate on background knowledge and knowledge gathered from source tasks and domains • Informed by existing background knowledge in Cyc SourceTestbed TargetTestbed • Elaborate on knowledge: • Inferential expansion • Probabilistic weighting • Rule formation (ILP) Situation,Status, &Queries New Facts: Automated KnowledgeAcquisition New Rules and Skills: RuleInduction Expanded Knowledge: Inference & Markov Logic Perform inference;supply advice, queryresults, backgroundknowledge ExecutionAgent(s) Advice &Support Payoff Solution Approach/Accomplishments • Information flow among complementary learningand transfer mechanisms and approaches • Establish a well-founded, mutually compatible baseof assumptions and facts – necessary for transfer • Allow systems to communicate observations, conclusions, skills, memories and intentions • Learning can take full advantage of existing background knowledge, knowledge from less- obviously related domains and problems • Representation of initial domains and solutions • Existing knowledge relevant to domains identified • Physics testing domain: encoding developed, first transfer level problems represented in Cyc • Urban Combat (FPS) testbed: map space semantics defined; distribution being developed • Initial integration of probabilistic reasoning • System integrated, scalability testing underway • Alchemy system Extended • Rule and skill learning underway • First utomatically-generated resultsfrom evaluation domains • Application of work from BUTLER seedling • New high-level, semantically connected knowledge, within a context of existing knowledge: understanding

Maryland/Lehigh: Hierarchical Task NetsPIs: Dana Nau, Héctor Muñoz-Avila Problem/Objective Scenario Generator • Learn applicability conditions of HTN methodsthat tell how to decompose tasks into subtasks • Input: plan traces produced by an expert problem-solver • Reflects abstraction levels in the game • Output: methods consistent with plan traces • Can be transferred in different games TIELT HDL++ Learning Agent MadRTS real-time strategy game Statistical methods to compare learning curves Solution Approach/Accomplishments Payoff for TL • HTNs represent knowledge of different granularity at different levels • Facilitates transfer to different games • Increasingly capable HTN learning algorithms • Y1: transfer levels 1-3 • Y2: transfer levels 4-7 • Y3: transfer levels 8-10 • Approach: our new HDL algorithm • Canstart with no prior information • Can start with info transferred from a previous learning session • Accomplishments: • Development of the HDL algorithm • Theoretical conditions in which HDL achieves full convergence [paper at ICAPS-06] • Experiments: even when only halfway to convergence, HDL solved > 3/4 of test set

Rutgers University: Relational TemplatesPI: Michael Pazzani Approach Problem/Objective • Learn templates from Markov Logic Networks (MLNs) • Learn Markov Logic Networks (MLNs) from templates Payoff Solution Approach/Accomplishments • Constraining Learning of MLN clauses • Creating template from MLN clauses by Least General Generalization Speed Up • Learning general concepts and strategies applicable across many domains, • Transitivity • thwarting, feigning

UT Austin: Theory RefinementPI: Mooney Problem/Objective Summary • Develop transfer learning methods for Markov Logic Networks (MLNs) that: • Efficiently revise structure and parameters of learned knowledge from source domain to fit novel target domain. • Automatically recover an effective mapping of the predicates in the source domain to those in the target domain. • Faster learning in target domain by efficiently transferring probabilistic relational knowledge using bottom-up theory refinement. • Determine appropriate predicate mapping by searching possible mappings to find the most accurate for the target domain. • Use relational path-finding to more effectively construct new clauses in the target domain. Approach Experimental Results 2. Determine which parts of the source structure are still valid in target domain and which need to be revised; annotate source MLN accordingly. (Mihalkova & Mooney, ICML06-TL workshop) 1. Find an effective predicate mapping. • Alchemy and our transfer algorithm equally improve accuracy over learning from scratch. • Our approach decreases learning time and number of revision candidates significantly. 3. Specialize only overly-general clauses and generalize only overly-specific ones, leaving the good ones unchanged. 4. Search for additional clauses using relational path-finding.

UT Austin: Reinforcement LearningPI: Peter Stone X O X X O X X O O X X O X O X O O X Source Actions (a) 3v2 Hold 3v2 Pass1 3v2 Pass2 3v2 Hold 382 (100%) 0 (0%) 0 (0%) 3v2 Pass1 0 (0%) 330 (93%) 26 (7%) 3v2 Pass2 2 (<1%) 25 (8%) 297 (92%) Target Actions (a’) 4v3 Hold 227 (76%) 0 (0%) 71 (24%) 4v3 Pass1 1 (<1%) 174 (64%) 97 (36%) 4v3 Pass2 0 (0%) 133 (50%) 133 (50%) 4v3 Pass3 1 (<1%) 51 (24%) 163 (76%) Connect-3 (4x4, same opp) CaptureGo (3x3, same opp) Minichess (5x5) t.r. = 5.6 t.t.r. = 83% t.r. = 4.3 t.t.r. = 73% t.r. = 5.8 t.r. = 4.3 t.t.r. = 88% t.t.r. = 84% SME-QDBN I-TAC Problem/Objective β(A’)→A γ(S’)→S • Develop core architecture-independent unified transfer learning technology for reinforcement learning • Key technical idea: transfer via inter-task mapping • Generalization of value-function-based transfer • Automatic discovery of inter-task mapping • I-TAC (inter-task action correlation) • SME-QDBN (structure mapping + qualitative dynamic Bayes networks) • Value-function-based transfer and policy-based transfer • Focus on results in many domains • Transfer of knowledge among reinforcement learning tasks (within the same domain/testbed) • RoboCup Soccer, GGP • Compare with Icarus GGP performance 1  2 2 1  Technical Approach Results I-TAC SME-QDBN • Automatic discovery of inter-task mapping • I-TAC (inter-task action correlation) • Data-centered approach • Train a classifier to map state transition pairs to actions in the source • Use the classifier and state mapping to obtain the action mapping • SME-QDBN (structure mapping + qualitative dynamic Bayes networks) • Knowledge/model-centered approach • Represent action model using qualitative DBNs • Specialized and optimized SME for QDBNs, using heuristic search • RoboCup soccer • Value-function-based transfer: sarsa-learning, function approximators • Policy-based transfer: neuro-evolution (NEAT) • GGP: value-function-based • Using symmetry to scale up the same type of games • Identifying game-tree features to transfer among different types of games RoboCup GGP

UT Austin: Mapping Value FunctionsPI: Stone Objective • Current work focuses on automatic discovery of mappings of state variables and actions • Data oriented approach (I-TAC) • Model/knowledge oriented approach (structure mapping) • can be decomposed into two parts • Mappings of states () and actions () • Transforming representation of value functions (table-based or function approximation) • Model/knowledge oriented approach • Using knowledge about • How actions affects state variables? • How state variables relate to each other? • Use structure mapping to find similarities between source and target tasks • Discover β and γ together Structure Mapping Inter-Task Action Correlation (I-TAC) • Data oriented approach to automatic discovery of  • Considers mappings of states () and actions () separately (S’)→S (A’)→A • Assume that  is given. How can we learn β? Technical Approach Technical Approach • Representation: qualitative dynamic Bayes networks • Specialized and optimized SME for QDBNs • SME-QDBN uses heuristic search to find the mapping of the maximal score • Collect transition data in source domain • Train a classifier from state pairs to actions • Collect transition data in target domain, define  as β(a’) = arg maxa #{all tuples with a’ | C(γ(s’1),γ(s’2)) = a} • Generate local matches and calculate the conflict set for each local match; • Generate initial global mappings based on immediate relations of local matches; • Merge global mappings with common structures; • Search for a maximal global mapping with the highest score; Source Actions (a) Results Target Actions (a’) Results Keepaway match scores

UT Austin: Feature ConstructionPI: Stone • Feature extraction/matching based on abstract game-tree expansion upto 2 levels Objective Features discovered Feature discovery • Scale up from small to large version of same game • Simultaneous update of isomorphic states • Exploit symmetry to scale up RL in board games • Transfer between different small games • Table-based learning but transfer in feature space • Automated discovery of state-features • Initialization by feature-matching • Two person, complete-information, turn-taking games Technical Approach Results (rand opp) Othello, 4x4 Connect-3, 4x4 • Verify presence of symmetries on smaller task (larger task => too much memory) • Transfer knowledge to larger task (simultaneous backups for upto 8 transitions) tr =1.66, ttr=56.7% Minichess (5x5) tr~ 40, ttr~ 99% t.r. = 4.3 t.t.r. = 73% transfer Minimax lookahead Findings Limited look-ahead based features are quick to extract and match, few (manageable knowledgebase), highly common/reusable, and faster than minimax lookahead against suboptimal opponents Future Plan: Abstraction Matching

Transfer Learning Site Visit August 4, 2006 Proposal for Year 2 from the ISLE Team

Changes from Initial Plans Year 1 • Full integration did not happen in Y1 • Component systems (Icarus, Soar, Companions, LUTA, CaMeL) were developed independently and did not emerge as a single system • Ideas and/or subsystems of component efforts to be integrated in later years. • Little use of background knowledge in Y1 • Still believe it is critical for taking full advantage of transfer opportunities, but… • Y1 concentrated on basic navigation and problem solving without exploiting deep semantic domain knowledge • Markov logic not used in Y1 testbed evaluations • Initial integration with ICARUS is finished, but efficiency issues advised against its use for Y1 tasks • Improving efficiency is a top Y2 priority Year 2 • Continuing with three main architectures • Development of Component systems will continue. • Evaluation will focus on comparing and contrasting agent architectures. • Focus on highest transfer levels in all three testbeds – Urban Combat, Physics, and GGP • More interesting scientific results linked to key claims, but fewer total experimental conditions and less engineering • Management structure for project will change to a matrix organization

Year 2 Matrix Management Structure Michigan (Laird) Soar extension ISLE (Konik) Skill learning UT (Mooney) Theory Revision Technology work breaks down into extending Markov logic, integrating Markov logic and HTNs into ICARUS, and extending other agent architectures ISLE (Langley) Oversight ISLE (Langley) ICARUS UW (Domingos) Markov logic UW (Domingos) Alchemy NU (Forbus) Companions Maryland (Nau) HTN planning Rutgers (Pazzani) Rel. templates Technology Development Cyc. (Witbrock) CYC integr. UT (Stone) LUTA extension Experimental Evaluations ISLE (Stracuzzi) GGP evaluation ISLE (Choi) UC evaluation UT (Stone) LUTA on GGP Evaluation efforts focus on GGP (external), Urban Combat (internal), and ETS Physics (external), each used on two agent architectures WSU (Holder) UC extensions ISLE (Stracuzzi) ICARUS on GGP ISLE (Langley) Oversight ISLE (Shapiro) Urban Combat Michigan (Laird) UC evaluation NU (Forbus) Compns Physics Cyc. (Matuszek) Physics eval. WSU (Holder) Humans on UC ISLE (Konik) ICARUS Physics

Expected Year 2 Products Extended Alchemy software that includes: • Techniques for inventing new predicates that support mapping across domains • Methods for revising inference rules based on observed regularities (from UT Austin) • Methods for using relational templates to learn from few instances (from Rutgers) • Ability to access background knowledge from CYC (from Cycorp) Extended ICARUS software that includes: • Techniques for learning goal-oriented mappings that support transfer • More flexible inference using Alchemy as a central module (from Washington) • Extended methods for learning skills in adversarial contexts (from Maryland) • Methods for combining skill learning with value learning (from UT Austin) Extended versions of software for: Extended Urban Combat testbed that: • Soar (Michigan) that supports transfer by semantic learning and chunking • Companions (Northwestern) that supports transfer by deep structural analogy • LUTA (UT Austin) that achieves transfer by knowledge-based feature construction • Includes a richer variety of objects, activities, and spatial settings • Supports multi-agent coordination and multi-agent competition • Allows tests of high-level transfer from urban military operations to search and rescue activities

Claims about Transfer Learning • Claim: Transfer that produces human rates of learning depends on reusing structures that are relational and composable • Test: Design source/target scenarios which involve shared relational structures that satisfy specified classes of transformations • Example: Draw source and target problems from branches of physics with established relations among statements and solutions • Claim: Deep transfer depends on the ability to discover mappings between superficially different representations • Test: Design source/target scenarios that use different predicates and distinct formulations of states, rules, and goals • Example: Define two games in GGP that are nearly equivalent but have no superficial relationship • Meta-Claim: These claims hold for domains that involve reactive execution, problem-solving search, and conceptual inference • Test: Demonstrate deep transfer in testbeds that need these aspects of cognitive systems • Example: Develop transfer learning agents for Urban Combat, GGP, and Physics We will explore four paths to deep transfer: • Predicate invention for representation mapping in Markov logic (Washington) • Goal-directed solution analysis for hierarchical skill mapping (ISLE) • Representation mapping through deep structural analogy (Northwestern) • Semantic learning augmented with procedural chunking (Michigan)

ISLE Year 2 Plans for ICARUS Integration Plans ICARUS’ Unique Capabilities Washington Add methods for learning value fns Replace w/Alchemy inference software UT Austin Perceptual Buffer • Combine rapid analytic creation of hierarchical skills with statistical estimation of their utilities • Learn relational concepts that characterize the conditions under which skills achieve goals • Retrieve relevant skills even when the goals that index them match only incompletely • Acquire mappings among domain representations based on analysis of problem solution traces • Use these capabilities to support deep transfer Long-Term Conceptual Memory Short-Term Conceptual Memory Conceptual Inference Perception Augment with CYC knowledge base Environment Skill Retrieval Cycorp Incorporate HTN planning methods Maryland Long-Term Skill Memory Short-Term Goal/Skill Memory Problem Solving Skill Learning Skill Execution Motor Buffer Mapping Concepts and Skills Plans for Evaluation ICARUS will not only learn hierarchical skills and concepts, but also how they map across different settings source skills Urban Combat Problem Solving in Physics source concepts General Game Playing target skills target concepts We will demonstrate deep transfer in three separate testbeds with distinct characteristics.

Goals Demonstrate discovery / transfer of structural domain knowledge Build on Y1 success with first-order concepts Learn relationships among concepts to capture domain structure Expand learning of relative concept utility to revise concepts to improve utility Generalize existing concepts to expand coverage Specialize general concepts to improve utility Derive new concepts from game description Y2 Plans for ICARUS on GGP Challenges Domain Independence • Remove assumption of “chess-like” games • Expand beyond common board games, consider puzzles or games with many players • Concept learning and revision • Remove assumption that domain-specific concepts will be provided • Agent must discover new concepts or revise existing ones New Technology: Concept Revision 1. Learn new domain-specific concepts 2. Generalize these concepts to expand possible transfer opportunities 3. Specialize again in target domain to increase utility Details vary, but underlying structure unchanged New Technology: Concept Derivation 1. Derive basic concepts from game description 2. Evaluate utility through experience 3. Construct more complex structures by combining concepts and expanding derivation Return to description and derived concepts for further expansion Domain-specific concept (source) Specialized concept (target) Generalized concept GGP Game Description Simple derived concepts Complex structures

Michigan Year 2 Plans for Soar General Concept • Soar provides • Extreme flexibility in every phase of transfer • Multiple performance methods • Task-dependent knowledge for abstraction, transformation, instantiation • Multiple learning mechanisms Long-Term Memories Procedural Episodic Semantic Reinforcement Learning Chunking Semantic Learning Episodic Learning Short-Term Memory Appraisal Detector Decision Procedure Perception Action Body • Perform source task • Generate behavior • Sense environment • Create internal situational assessment • Perform target task • Generate behavior • Sense environment • Create internal situational assessment • Identify elements that might be useful • Everything, but literal (episodic) • Categories, structures (semantic) • Results of processing (chunking) • Explicit analysis (reflection) • Transform/map retrieved memory • Explicitly map to current situation or • Instantiate for current situation • Retrieve a memory based on transformed situation • Automatic (procedural) or • Deliberate (semantic/episodic) • Transform the current situation • When expecting or searching for transfer • Retrieve from memory a related concept • For some memories this will be automatic • For others, it will be deliberate • Create general concept/skill/… • Generalization based on multiple examples • Abstraction based on prior semantic knowledge • Use transfer memory to impact behavior • Control selection of actions • Decide on strategy or tactic • Store in memory for later recall • Different memories for different types of knowledge • Procedural, semantic, episodic Retrieve Transform Experience Identify Generalize Abstract Store Experience Use Transform Retrieve Source Problems Target Problems

Level 9 Transfer in Soar • Source: Hunted dies after getting trapped in a dead end • learns spatial configuration of dead end • learns dead end is deadly to hunted • Target: Hunter tries to chase hunted to a location it has recognized as a dead end. Long-Term Memories Procedural Episodic Semantic Reinforcement Learning Chunking Semantic Learning Episodic Learning Short-Term Memory Appraisal Detector Decision Procedure Perception Action Body • Perform source task • Hunted dies after getting trapped in a dead end • Perform target task • As hunter, tries to develop strategy for killing hunted • Identify elements that might be useful • Death is feedback that made mistake • Retrieve a memory based on transformed situation • Queries memory – what would be bad when I imagine myself as hunted? • Retrieves memory of dead end • Transform the current situation • Creates internal model of hunted • Create general concept/skill/… • Uses episodic knowledge to recall behavior that led up to death • Analyzes spatial configuration • Causal knowledge determines critical features • Use transfer memory to impact behavior • Searches for dead ends • Tries to “herd” hunted into dead ends • Store in memory for later recall • Stores dead-end concept in semantic memory and associates bad result Experience Identify Generalize Abstract Store Experience Use Transform Retrieve Source Problems Target Problems

Level 10 Transfer in Soar • Source: 1v1 • Learns to pick up ammo to deny enemy • Target: Fire rescue • Transforms to remove gasoline near fire Long-Term Memories Procedural Episodic Semantic Reinforcement Learning Chunking Semantic Learning Episodic Learning Short-Term Memory Appraisal Detector Decision Procedure Perception Action Body • Perform source task • Tries to kill enemy • Perform target task • As fire rescuer, try to search building (and avoid dieing, flames, etc.) • Identify elements that might be useful • Encounters experience when can pick up enemy ammo and realizes that would deny enemy ammo • Retrieve a memory based on transformed situation • Queries memory for ways to defeat an enemy • Retrieves general concept about resources • Transform the current situation • Analyzes situation • Determines that fire is its enemy • Create general concept/skill/… • Uses background knowledge to generalize to a concept of deny enemy its resources necessary to hurt me • Use transfer memory to impact behavior • Instantiates general concept in current situation [resource map to air and fuel – wood, gasoline, …] • Takes actions to eliminate fuel • Store in memory for later recall • Stores general concept in semantic memory Experience Identify Generalize Abstract Store Experience Use Transform Retrieve Source Problems Target Problems

Northwestern Year 2 Plans for Companions Foundation: Analogical Processing • Northwestern’s technology is based on how humans seem to do transfer – by analogy and similarity • Based on Gentner’s (1983) Structure-Mapping theory • Simulations of cognitive processes engineered into components in prior DARPA research • SME: Analogical matching, similarity estimation, comparison • SEQL: Generalization • MAC/FAC: Similarity-based retrieval Approach • Extend Companions Cognitive Systems architecture by • Creating and incorporating advances in analogical processing • Develop techniques to learn self-models to help formulate own knowledge goals • Compare Companions and ICARUS in physics testbed • Help ISLE and Cycorp integrate our representations and support libraries into ICARUS • Extend as necessary (e.g., sketching support) Metrics • Coverage = Fraction of time an answer is generated • Accuracy = Whether the answer is right (including partial credit) Companions Cognitive Systems Architecture • Structure-mapping operations appear to be heavily used throughout human reasoning and learning • Hypothesis: Can achieve human-like reasoning and learning by making structure-mapping operations central in a cognitive architecture

Northwestern Year 2 Plans for Far Transfer (7-10) Expanded self-modeling capabilitiesto improve skills and knowledge Need to study pulley problems more Need to figure out trig inverses Metamappingswill guide cross-domain analogies by first matching general knowledge Persistent Mappings store ongoing understanding of cross-domain analogy Advice can be about appropriate analogs, mappings, analogical inferences KB Analogical encodingwill let Companions work with more abstract advice “A battery is like a pump”

University of Washington Year 2 Plans Technologies Evaluation • Representation mapping • - Entities - Attributes - Relations • - Ontologies - Situations - Events • Based on statistical predicate invention • - Discover abstract relations, etc., & transfer • Infrastructure • - Efficient inference and learning • - Online, “lifelong” operation • - Extension to continuous data • - Extension to decision-making • Integrate into Icarus • Apply to Physics and Urban Combat • Transfer Level 8: • - 60% of human performance • - Transfer Levels 9-10: • - 30% of human performance • Infrastructure: - Component-wise evaluation - “White box” evaluation Integration Unique Contribution Predicate Invention Representation Mapping Infrastructure Alchemy Inferences Percepts ICARUS

UT Austin Year 2 Plans (Mooney) Source MLN Predicate Mapping Theory Refinement • Improve system’s ability to accurately map predicates from source to target domain. • Use schema-mapping techniques from information integration to suggest predicate mappings by analyzing source and target data. • Use lexical knowledge (e.g. WordNet) to guide matching of predicate names in source and target. • Use heuristic search to improve efficiency of finding the best overall predicate mapping. • Improve system’s ability to revise the structure of the source Markov Logic Network (MLN) • to fit the target domain. • Improve efficiency of clause generalization and specialization procedures by using bottom-up search to directly identify productive changes rather than blindly searching the space of possible refinements. • Improve generation of new clauses in the target domain by exploiting advanced ILP methods. Integration with Alchemy and ICARUS Evaluation on Testbeds • Evaluate MLN TL methods on ISLE Testbeds • In Urban Combat and other ISLE testbeds, measure the accuracy of transfer learning at making within-state inferences (using AUC) compared to learning an MLN from scratch by adapting knowledge from source to target tasks for several levels of transfer. • Measure training time of our system versus existing Alchemy to demonstrate improved efficiency. • Compare ablated version of Icarus without MLN-transfer-learning to enhanced version on final testbed performance metrics and demonstrate improved performance. • Incorporate MLN Transfer Learning Methods • into Alchemy and Icarus • Integrate predicate mapping and theory refinement methods into UW Alchemy MLN software package. • Integrate our transfer learning methods for MLNs into Icarus+Alchemy to provide transfer of static inferential knowledge from the source to the target domain. Source Training Data MLN Learner (Alchemy) Predicate Mapping Target MLN Target Training Data MLN Revision

Rutgers University Year 2 Plans Technologies Evaluation • Learning and Instantiating Templates • Based on second order learning - Discover general regularities & transfer • Entailment Learning • Combining Inductive and deductive learning • Discover simple rules and combine (e.g., OCCAM) • Infrastructure • Template Learner for Markov Logic Networks • Deductive Learner for Markov Logic Networks • Alchemy Integration into Icarus • Apply to Physics and Urban Combat • - Transfer Levels 9-10: • - 50% of human performance • Infrastructure: - Component-wise evaluation: Alchemy Integration Unique Contributions Templates Template Learning and Instantiation Learning by entailment MLNS Alchemy Inferences Percepts ICARUS

Cycorp Year 2 PlansTechnologies and Capabilities Problem/Objective: Collect knowledgerelevant to a task,domain, or problem Cyc background knowledge& inference capability • Knowledge-based transfer learning • Supply formalized domain expertise and well-encoded, logically meaningful domains and problem spaces • Elaborate on background knowledge via ILP and inference, provide advice, and extend knowledge gathered from source tasks and domains • Informed by existing background knowledge in Cyc • Technical Development • Provide domain knowledge for use by Urban Combat, Physics, and GGP performers • Provide inference capabilities, including query support, goal advice, and knowledge elaboration, for UCT, Physics and GGP performers • Pursue knowledge gathering and elaboration via ILP over domain and background knowledge • Pursue inference speedup and results improvement via Reinforcement Learning of inference pathways • Integration & Coordination • Integrate Alchemy and other probabilistic reasoning approaches with Cyc’s inference capabilities • Develop knowledge: • Inferential expansion • Probabilistic weighting • Rule formation (ILP) SourceTestbed TargetTestbed Situation,Status, &Queries New Facts: Domain & General Knowledge New Rules and Skills: Rule Inductionvia ILP Expanded Knowledge: Inference, Advice, & Probabilities Performinference;advice, query results,background, skills andmemories ExecutionAgent(s) Advice,Support, &Elaboration Tasks Payoff • Information flow among complementary learningand transfer mechanisms and approaches • Establish a well-founded, mutually compatible baseof assumptions and facts – necessary for transfer • Allow systems to communicate observations, conclusions, skills, memories and intentions • Learning can take full advantage of existing background knowledge, knowledge from less- obviously related domains and problems • High-level, semantically connected knowledge, within a context of existing knowledge =understanding • Responsibility for technical integration of Alchemy, Cyc, and other inference approaches • Responsibility for technical coordinationof groups developing on the Physics testbed

Cycorp Year 2 PlansEvaluation and Integration UrbanCombat,GGP,Physics Soar, ICARUS,Companions ICARUS S1 S2 S3 IS2-1 IS1-1 IS2-1 IS1-i IS2-j IS1-i IS1-1 IS3-1 IS1-i IS1-1 IS2-1 IS2-j COMPANIONS Coordination & Integration Representation • How many formal representations of problems and queries in different testbeds are shared by different architectures? • How many inference requests, of how many types, go through a common interface? • How effectively can knowledge be probabilistically qualified (as measured by crossfold validation)? • Coverage: in each testbed, • How many problems are represented? • How many types of problem? Whatnovel problem categories? • How many and what type of obstacles,goals, percepts, and actions? • What novel types of solution information? • Accuracy: • Well-represented domains are critical forsuccessful performance; accuraterepresentations are demonstrated bysuccessful agent evaluations. Knowledge& Inference Inference & Learning • Learning & Transfer: • What novel fact-level knowledge gathered forthe source is reused in the target space? • How many facts, in what domains? • How many rules can be obtained via ILP over gathered and domain knowledge, in what domains? • What agent skills are obtainable by ILP within Cyc? • Advice-giving and query results: • What appropriate, novel goals are presented? • What improvement on random search can be obtained through advice? • Skills, abilities, and long-term memories: Inferences (Queries, Goals, Search Paths, Elaborations) CycKB Reasoning OverQueries & Testbeds Inference Engines &Approaches Coordinating Inferences Queries & Inference Needs;Skills, Concepts, LTMs Large Scale ILP, Knowledge Seeking, Generalization, LTMs Domain Knowledge Coordination &Semantic Content Background & ProblemRepresentation Queries, Goals, Elaborations LTMs &Followup Queries Testbeds:Urban Combat,Physics, GGP Knowledge, Goals,Analysis, Advice • What novel abilities can agents demonstratewith knowledge and inference support? • What new problems are solvable that could not be solved without that support? Soar

Maryland/Lehigh Year 2 Plans a1 a2 a5 a6 a3 a4 possible outcome 1 State 1 action possible outcome 2 State 2 New Technologies Capabilities • Icarus does plan abstraction by grouping actions • The groups are analogous to Hierarchical Task Network (HTN) decomposition templates (e.g.., as in SHOP2) • “Planner-modification” techniques for systematically generalizing planners to work with nondeterministic actions (i.e., multiple possible outcomes) • Mapping between Icarus’ hierarchical representations and HTNs • Techniques for systematically extending planners to work in adversarial domains (i.e., multiple possible responses from an adversary) • Extensions to Icarus to learn in such domains adversary response 1 State 1 our action State 2 adversary response 2 Solution and Evaluation Contributions • A mapping between Icarus’ hierarchical representations and Hierarchical Task Networks • New algorithms will provide capabilities to reason about adversaries: I.e., to learn about them and to plan against them • This will provide high-level transfer in adversarial environments via learning about abstract strategies/models of the behaviors of single or groups of adversaries in one scenario and transferring this knowledge to another scenario • How: Generalize our planner-modification techniques to deal with adversaries • Work with ISLE to generalize Icarus’ learning to learn about adversaries • When: • September-December 2006: develop the theory and implement the new algorithms • January-April 2007: Work with the ISLE team to incorporate algorithms into Icarus • May 2007: Evaluation: Use the GGP testbed for Year 2

UT Austin Year 2 Plans (Stone) X O X X O X X O O X X O X O X O O X  - Hold t t+1 Dist(K1,C) Dist(K1,C) 13 inputs, 3 outputs Dist(K2,C) Dist(K2,C) min 19 inputs, 4 outputs Dist(T1,C) Dist(T1,C) Dist(K1,K2) Dist(K1,K2) Dist(K1,T1) Dist(K1,T1) min Dist(K2,T) Dist(K2,T) Ang(K2,K1,T) Ang(K2,K1,T) DAng(C,K1,K2) DAng(C,K1,K2) DAng(C,K1,T1) DAng(C,K1,T1) Dist(K2,T1) Dist(K2,T1) Ang(K2,K1,T1) Ang(K2,K1,T1) Evaluation Capabilities/Technologies Core architecture-independent TL for reinforcement learning • Evaluate same core algorithms in multiple domains • GGP: value-function-based • Use symmetry to scale up within same game types • Game-tree features to transfer among different types of games • Automatic abstraction discovery • RoboCup Soccer • Value-function-based transfer: sarsa, function approximators • Policy-based transfer: neuroevolution (NEAT) • Urban Combat -- continued evaluation of year 1 effort SME-QDBN I-TAC β(A’)→A γ(S’)→S   Unique Abilities • Automatic discovery of inter-task mapping • I-TAC (inter-task action correlation) • Train a classifier to map state transition pairs to actions in the source • Use the classifier and state mapping to obtain the action mapping • SME-QDBN (structure mapping + qualitative dynamic Bayes nets) • Knowledge/model-centered approach • Represent action model using QDBNs • Specialized and optimized SME for QDBNs using heuristic search • Policy-based transfer Integration • Incorporate RL into Icarus and/or Soar • Focus on leveraging action-value functions into generalizable planning knowledge • Abstract learned RL knowledge to relational representations • ISLE team comparisons: compare value function transfer vs. Icarus approach in GGP

UT Arlington Year 2 Plans (Huber) Skill Hierarchy Concept Hierarchy Skill and concept utility Selective, task-specific state space construction Skill and concept generalization Hierarchical state representation Skill and concept extraction Task learning Technical Approach Novel Capabilities Generalization of skills and concepts, and estimation of skill and concept utilities to improve transfer • Automatic definition of relevant representational concepts and utility-based guidance for efficient hierarchy construction and skill exploration in RL. • Learning of generalized, “parametric” skills • Generalized policies apply in novel situations and environments • Skills have operator descriptions with utilities and probabilities • Automatic generation of useful representational concepts • Generation of task-relevant relations in the form of predicates • Discovery of relevant feature sets and object types • Automatic derivation of skill and concept utilities • Concept utilities allow construction of appropriate representation • Skill utilities guide exploration or guide planning • Generalization of skills and concepts • Policy homomorphisms as general skills • Relational concepts and features • from homomorphic mapping • Relational Learning • Reinforcement learning • with relational concepts • and generalized skills • Estimation of skill/concept utility • Skill utility to regulate exploration • Concept utility to improve state hierarchy construction Evaluation Plans Integration • Provides RL-based creation of hierarchical skills and concepts with symbolic representations, and skill and concept utilities to provide guidance on their use. • RL-based skill learning component for use in ICARUS • Learned operator representations facilitate integration of skills • Learned features and concepts can augment concept hierarchy • Skill and concept utilities for search and planning guidance • Skill utility estimates can guide operator selection • Concept utility can inform the representation investigated • Development and Integration Timeline • Skill generalization: • Skill/concept utility: • New capabilities will extend the set of transfer levels the Hierarchical RL system can address • Evaluation within the Urban Combat Testbed (UCT) • Application of standalone system to • transfer levels 1-6 • Evaluation focus on tasks with • significant change of the environment • and of the task objective • Evaluation of performance using • Transfer Ratio (TR) • Target of TR values larger than 2 • Evaluation of use of capabilities by evaluation of • frequency and task utility of generalized skills Development Integration Year 2 Year 3

Year 1 Evaluation Plans Urban Combat ETS Physics GGP Comparison among architectures should reveal the conditions for successful transfer learning. But implementing agents that can operate in multiple testbeds takes considerable time and resources. Instead, we will develop agents within two architectures for each testbed, with only one (ICARUS) being applied to all three of them. Experiments will evaluate how well each pair of frameworks supports transfer involving quite different forms of knowledge. Soar Companions LUTA ICARUS

Year 2 Evaluation Plans Urban Combat ETS Physics GGP Comparison among architectures should reveal the conditions for successful transfer learning. But implementing agents that can operate in multiple testbeds takes considerable time and resources. Instead, we will develop agents within two architectures for each testbed, with only one (ICARUS) being applied to all three of them. Experiments will evaluate how well each pair of frameworks supports transfer involving quite different forms of knowledge. Soar Companions LUTA ICARUS

Urban CombatLevel 9 Transfer: Hunted to Hunter Hunted Hunter Transfer Tactical reasoning and strategies; Symbolic and spatial representations Learn to check the hidden path periodically Learn that there is a path with very low visibility Learn that getting caught in a dead end is deadly Learn to try to trap hunted in dead end Avoid path that goes near ambush places Discover a place that makes a good ambush Scenarios in Urban Combat Testbed (UCT)

Urban CombatLevel 10 Transfer: 1v1 to Fire Rescue Fire Rescue 1 vs. 1 Transfer Tactical reasoning and strategies; Symbolic and spatial representations Consume enemy resources Backburn or backdraft Remove wood from fire’s path Pick up enemy ammo Take advantage of terrain Avoid being seen or shot Use doors, walls for protection Always leave an out Always have multiple exits Don’t get caught in dead end Scenarios in Urban Combat Testbed (UCT)

Year 2 Plans for Urban Combat Evaluation Resource Tactical Source (A) Problems UCT Ammunition Spatial Terrain Interface Transferred Knowledge (TK) Source BK TL0 Deadend Source: 1v1, 2v2 Resource Tactical BK+ TK Transfer Learning Target BK Combustible Spatial Terrain TL0 TL1 B NoExit UCT UCT Target (B) Problems Target: FireRescue • Performance: • Transfer ratio (go/no-go) • Demonstrate deep transfer • Comparison to human trials Transfer Ratio > 30% (Y2 Go/No-Go) Performance Experience

Year 2 Plans for Urban Combat Evaluation • Metrics • Hunter: Time it takes agent to kill opponent plus time penalties for health loss • Hunted: Inverse of time before opponent kills agent plus time penalties for health loss • 1v1: Time to kill N opponents plus time penalties for health loss and fewer than N kills • Fire Rescue: Time to rescue ally from fire plus time penalties for health loss

Year 2 Plans for Urban Combat Evaluation • Transferred knowledge • Hunter ↔ Hunted (TL level 9) • Spatial • Visibility, dead-ends, ambush places, terrain • Tactical • Check (hunter) / seek (hunted) low visibility areas • Trap in (hunter) / avoid (hunted) dead-ends • Seek (hunter) / avoid (hunted) ambush places • 1v1 ↔ Fire Rescue (TL level 10) • Spatial • Accessibility of resources (ammunition / combustibles) • Dead-ends, exits, terrain • Tactical • Consume enemy resources • Use terrain for protection • Always leave an out • Taxonomic • Types of terrain, spaces, resources and tactics

Year 2 Plans for Urban Combat Evaluation • Performance milestones • Based on TL levels 9 and 10 • Go/No Go: Transfer ratio > 30% • Demonstrate achievement of specific deep transfer opportunities (e.g., ammunition combustible • Comparison to human trials • Transfer ratios • Deep transfer

Year 2 Experimental Plans for GGP 50 65 75 70 Protocol • Two transfer levels 9. Reformulating 10. Differing • Seven scenarios per level • Multiple consecutive matches in each game • Fixed opponent (non-learning) • Players receive score according to goal • Domain performance metric: Score from satisfied goal • Domain performance goal: Maximize score GGP Terms • Game: defines environment in which agent operates. • Includes initial state, terminal states, transition function, goals • Score associated with goal and terminal states • Match: competition between two agents in a game • Scenario: source / target pairing of games • Source / target may vary in one or more ways (initial state, terminal states, goals, transitions) • Typically exactly one source and one target game Level 10: Differing Source / target game graphs share substructure corresponding to transferable strategy. Level 9: Reformulating Source / target game graphs are isomorphic. • Source / target game descriptions fundamentally different • Different axioms • Different structural representation • Equivalent meaning (same state graph structure) • Transfer must occur at structural level Example: Generalized Fork Current goal value is 50 Opponent choices can lead to several possible successor states Source Target Structural Transfer Regardless of the opponent’s move, agent can reach a state with a higher goal value

Year 2 Plans for Physics Testbed • All of Newtonian Dynamics • Requires calculus, higher-degree polynomials, graphs • Two areas from Dynamical Analogies • Well-explored cross-domain analogies in various physical domains • Excellent venue for exploring distant transfer • Two areas from Dynamical Analogies • Well-explored cross-domain analogies in various physical domains • Excellent venue for exploring distant transfer • Example: Domain A = linear motionDomain B = rotational motion, thermal systems, hydraulics, electricity, …

Physics TestbedDynamical Analogies in Detail

BACKUP SLIDES

Transfer Using Structure Mapping - Hold t t+1 Dist(K1,C) Dist(K1,C) Dist(K2,C) Dist(K2,C) min Dist(T1,C) Dist(T1,C) Dist(K1,K2) Dist(K1,K2) Dist(K1,T1) Dist(K1,T1) min Dist(K2,T) Dist(K2,T) Ang(K2,K1,T) Ang(K2,K1,T) DAng(C,K1,K2) DAng(C,K1,K2) DAng(C,K1,T1) DAng(C,K1,T1) Dist(K2,T1) Dist(K2,T1) Ang(K2,K1,T1) Ang(K2,K1,T1) Objective Algorithm • Generate local matches and calculate the conflict set for each local match; • Generate initial global mappings based on immediate relations of local matches; • Merge global mappings with common structures; • Search for a maximal global mapping with the highest score; • Model/knowledge oriented approach • Using knowledge about • How actions affects state variables? • How state variables relate to each other? • Use structure mapping to find similarities between source and target tasks • Discover β and γ together Step 4 Step 3 Technical Approach • Qualitative DBNs • Dynamic Bayes networks are structure representation for actions: an action (directly) affects a small number of state variables • Probabilities are less relevant; more qualitative properties matter: no change, increase/decrease, etc. • Specialized and optimized SME for QDBNs • Fixed types of entities • How entities match? • How to evaluate mappings? • SME-QDBN uses heuristic search to find the mapping of the maximal score • Prune with upper bounds Results Keepaway Summary • Works nicely for Keepaway • Strong demand for domain knowledge • Provides similarity measures for source and target • Future work • Improve efficiency • Learn QDBN from data • Apply to GGP and Urban Combat

Policy Transfer Using NEAT  13 inputs, 3 outputs 19 inputs, 4 outputs Objective Results for Scheduling • An autonomic computing task • Task: determine in what order to process jobs • Goal: maximize aggregate utility • Source task: 2 job types (8 state variables, 8 actions) • Target task: 4 job types (16 state variables, 16 actions) • An alternative to value-function-based transfer • Direct policy transfer based on the mappings of state variables (β) and actions (γ) • NEAT (NeuroEvolution of Augmenting Topologies) • Uses genetic algorithms to evolve neural networks • Neural networks are used as action selectors With Transfer Scratch t.r. = 35 Cost t.t.r. = 80% Episodes Comparison of Sarsa and NEAT Results for Keepaway • Taylor, Whiteson, & Stone (GECCO-06) • NEAT evolves 3v2 players • Use a  (from β & γ) to transform organisms • NEAT evolves 4v3 with population from 3v2 t.r. = 5.8 t.t.r. = 84%

Evaluation Process • Year 1: Near-transfer (levels 1-6) • A = set of basic problems, B= transfer variations • Training runs include quiz of four problems, followed by worked solutions • Experiment design worked out by ETS, NU • ETS provided training + test examples for NU’s research needs • Novel problems from same templates were used for evaluation • Tests were carried out on a sequestered NU cluster • 5 nodes to ETS for Physics • Scripting language developed to facilitate creation of experiments • Code frozen at start of evaluation • Efforts to make Companions usable by others has been an important step towards making the architecture into a robust product

Transfer Learning Site Visit August 4, 2006 Report of the ISLE Team Pat Langley Tom Fawcett

Transfer Learning Site Visit August 4, 2006 Report of the ISLE Team Pat Langley Tom Fawcett

Presentation Transcript

Quality Texas Foundation Site Visit Team

Pat Langley Computational Learning Laboratory Center for the Study of Language and Information

Pat Langley Computational Learning Laboratory Center for the Study of Language and Information

Field Visit Report of Team:3

Pat Langley Computational Learning Laboratory Center for the Study of Language and Information

Pat Langley Computational Learning Laboratory Center for the Study of Language and Information

Pat Langley Dileep George Stephen Bay Computational Learning Laboratory

August 4, 2006

Pat Langley Computational Learning Laboratory Center for the Study of Language and Information

ISLE Transfer Learning Team Main Technology Components

Quality Texas Foundation Site Visit Team

Quality Texas Foundation Site Visit Team

The Site Visit

SITE VISIT REPORT OF KOLHAPUR

Pat Langley Institute for the Study of Learning and Expertise Palo Alto, California and

visit the site

Pat Langley Computational Learning Laboratory Center for the Study of Language and Information

Pat Langley Arizona State University and Institute for the Study of Learning and Expertise

Pat Langley Computational Learning Laboratory Center for the Study of Language and Information

August 4, 2006