Transfer Learning with Inter-Task Mappings

Matthew E. Taylor
Joint work with Peter Stone
Department of Computer Sciences, The University of Texas at Austin

Transfer Motivation
• Learning tabula rasa can be unnecessarily slow
• Humans can use information from previous tasks
  • Soccer with different numbers of players
• Agents: leverage learned knowledge in novel/modified tasks
  • Learn faster
  • Larger and more complex problems become tractable
• Different numbers of state variables and actions in tasks

Common TL Metrics
• Also: total reward accumulated

Transfer Goals
• Autonomous transfer
• AI Goal
  • Explore the world, learning
  • Transfer autonomously
  • Utilize past knowledge
  • Learn difficult tasks faster
• Engineering Goal
  • Learn a set of simple tasks
  • Eventually learn target task
  • Total time reduction

Transfer via Inter-Task Mappings
• Source Task: π(S) → A
• Target Task: π’(S’) → A’
• π is not defined for S’ and A’
• ρ is a transfer functional
• ρ is task-dependent: it relies on inter-task mappings

Inter-Task Mappings
• χA: a_target → a_source
  • Given a target task action, return a similar source task action
• χX: s_target → s_source
  • Similar, but for state variables: applied to every x in each target task state s = ⟨x1, x2, …, xn⟩
• ρ is automatically formed from χA and χX to enable transfer of:
  • π(s)
  • Q(s, a)
  • Rules
  • Model
  • etc.
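As a rough illustration of how χA and χX might be used to reuse Q(s, a), the sketch below looks up a source task Q-value for a target task state-action pair. The 2D/3D mountain car variable names, the action names, the toy Q-table, and the tie-break used when several target variables map to the same source variable are all assumptions of this sketch, not details taken from the slides.

```python
from typing import Dict, Tuple

# Hypothetical toy tasks: a 2D mountain car source and a 3D mountain car target.
SOURCE_VARS = ("x", "x_dot")
TARGET_VARS = ("x", "x_dot", "y", "y_dot")

# chi_X: target state variable -> similar source state variable (illustrative).
chi_X: Dict[str, str] = {"x": "x", "x_dot": "x_dot", "y": "x", "y_dot": "x_dot"}

# chi_A: target action -> similar source action (illustrative).
chi_A: Dict[str, str] = {
    "west": "left", "east": "right",
    "south": "left", "north": "right",
    "neutral": "neutral",
}


def to_source_state(s_target: Tuple[float, ...]) -> Tuple[float, ...]:
    """Build a source state from a target state using chi_X.

    Assumption of this sketch: if several target variables map to the same
    source variable, the first one in TARGET_VARS supplies the value.
    """
    values: Dict[str, float] = {}
    for name, value in zip(TARGET_VARS, s_target):
        values.setdefault(chi_X[name], value)
    return tuple(values[name] for name in SOURCE_VARS)


def transferred_q(q_source: Dict[Tuple[Tuple[float, ...], str], float],
                  s_target: Tuple[float, ...],
                  a_target: str,
                  default: float = 0.0) -> float:
    """Return the source Q-value for the mapped (state, action), else default."""
    return q_source.get((to_source_state(s_target), chi_A[a_target]), default)
```

A target task learner could call `transferred_q` to seed its own value estimates before learning; pairs the source task never visited simply fall back to `default`.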

Transfer Functional: ρCMAC
• New states and actions in the target task → new tiles
• [Figure panels: Source, Target]
• Counterintuitive:
  • Q-values are very low-level
  • Very task-specific
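One way to picture "new tiles" being filled from the source task is a weight copy driven by the two mappings. The sketch below assumes one-dimensional tilings per state variable, an identical tile layout in both tasks, and a weight table keyed by (action, state variable, tile index); those choices, along with every name in the code, are illustrative assumptions rather than the ρCMAC defined in the talk.

```python
from typing import Dict, Tuple

WeightKey = Tuple[str, str, int]  # (action, state variable, tile index)


def transfer_cmac_weights(source_weights: Dict[WeightKey, float],
                          chi_X: Dict[str, str],
                          chi_A: Dict[str, str],
                          n_tiles: int) -> Dict[WeightKey, float]:
    """Initialise target tile weights from the mapped source tiles.

    chi_X maps target state variables to source state variables and
    chi_A maps target actions to source actions. Tiles with no learned
    source weight start at 0.0 (an assumption of this sketch).
    """
    target_weights: Dict[WeightKey, float] = {}
    for a_target, a_source in chi_A.items():
        for x_target, x_source in chi_X.items():
            for tile in range(n_tiles):
                source_key = (a_source, x_source, tile)
                target_weights[(a_target, x_target, tile)] = (
                    source_weights.get(source_key, 0.0)
                )
    return target_weights
```

Because new target actions and variables are mapped onto existing source ones, their tiles begin with the weights of their source counterparts instead of starting empty.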

Sample Results
• Can significantly reduce target task time and total time
• Able to learn inter-task mappings with little data
• [Figure: Keepaway Transfer: 3 vs. 2 to 4 vs. 3. Labels: Source Task Time, Target Task Time, Source Task Episodes]

Empirical Domains
• Robot Soccer Keepaway
• Server Job Scheduling
• Mountain Car
• Killer Application?
  • Epilepsy?
  • Robotics?

Open Questions: 1/3
• Optimize for Total Time?
• [Figure labels: Source Task Time, Target Task Time, Source Task Episodes]

Open Questions: 2/3
• Guarantee transfer efficacy?
• Avoid Negative Transfer (“Giveaway”)?
• Similarity measure?
  • Jumpstart in Target
  • MDP similarity [Ferns, others]
  • Analysis of learned source task knowledge

Open Questions: 3/3
• Learn an inter-task mapping efficiently?
  • Sample complexity
  • Computational complexity
• Select Source Task?
  • In library (sunk cost)
  • To learn first (total time metric)

MASTER Overview: Modeling Approximate State Transitions by Exploiting Regression
• Record observed (s_source, a_source, s’_source) tuples in the source task
• Record a small number of (s_target, a_target, s’_target) tuples in the target task
• Learn a one-step transition model, T(S, A), for the target task: M(s_target, a_target) → s’_target
• For every possible action mapping χA:
  • For every possible state variable mapping χX:
    • Transform the recorded source task tuples
    • Calculate the error of the transformed source task tuples on the target task model: ∑ (M(s_transformed, a_transformed) − s’_transformed)²
• Return the χA, χX with the lowest error
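The search loop above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a 1-nearest-neighbor lookup stands in for the learned regression model M, states are tuples of numbers, χX is represented as a tuple giving a source variable index for each target variable, and the summed squared error is divided by the number of evaluated tuples so that mappings covering different amounts of data stay comparable. All function and variable names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, str, State]  # (s, a, s')
Model = Callable[[State, str], State]


def fit_nearest_neighbor_model(target_data: List[Transition]) -> Model:
    """Stand-in for M: predict s' from the closest recorded (s, a) pair."""
    def model(s: State, a: str) -> State:
        candidates = [t for t in target_data if t[1] == a]
        if not candidates:
            return s  # no data for this action: predict no change (assumption)
        nearest = min(
            candidates,
            key=lambda t: sum((u - v) ** 2 for u, v in zip(t[0], s)),
        )
        return nearest[2]
    return model


def master(source_data: List[Transition],
           target_data: List[Transition],
           n_source_vars: int,
           n_target_vars: int,
           source_actions: Sequence[str],
           target_actions: Sequence[str],
           ) -> Optional[Tuple[float, Tuple[int, ...], Dict[str, str]]]:
    """Brute-force search for the (chi_X, chi_A) pair with the lowest error."""
    model = fit_nearest_neighbor_model(target_data)
    best = None
    # chi_X[i] is the source variable index assigned to target variable i.
    for chi_X in product(range(n_source_vars), repeat=n_target_vars):
        # chi_A assigns one source action to every target action.
        for images in product(source_actions, repeat=len(target_actions)):
            chi_A = dict(zip(target_actions, images))
            error, count = 0.0, 0
            for s, a, s_next in source_data:
                # Transform the source tuple into the target representation.
                s_t = tuple(s[i] for i in chi_X)
                s_next_t = tuple(s_next[i] for i in chi_X)
                for a_t, a_s in chi_A.items():
                    if a_s != a:
                        continue
                    predicted = model(s_t, a_t)
                    error += sum((p - q) ** 2
                                 for p, q in zip(predicted, s_next_t))
                    count += 1
            if count == 0:
                continue
            mean_error = error / count
            if best is None or mean_error < best[0]:
                best = (mean_error, chi_X, chi_A)
    return best
```

The number of candidate mappings grows exponentially with the number of state variables and actions, which is why the computational complexity of learning a mapping matters even for a sketch like this one.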

Utilizing Mappings in 3D Mountain Car
