
Transfer Learning


Presentation Transcript


  1. Transfer Learning Lisa Torrey University of Wisconsin – Madison CS 540

  2. Transfer Learning in Humans
  • Education: a hierarchical curriculum; learning tasks share common stimulus-response elements
  • Abstract problem-solving: learning tasks share general underlying principles
  • Multilingualism: knowing one language affects learning in another
  • Transfer can be both positive and negative

  3. Transfer Learning in AI: given a source task S, learn a target task T.

  4. Goals of Transfer Learning: on the performance-vs-training curve, transfer aims for a higher start, a higher slope, and a higher asymptote than learning from scratch.

  5. Inductive Learning: search the space of allowed hypotheses, a subset of all hypotheses.

  6. Transfer in Inductive Learning: transfer shapes the search through the allowed-hypothesis space. Thrun and Mitchell 1995: transfer slopes for gradient descent.

  7. Transfer in Inductive Learning: Bayesian methods. In Bayesian learning, prior distribution + data = posterior distribution; Bayesian transfer uses the source task to supply an informative prior for the target task. Raina et al. 2006: transfer a Gaussian prior.
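The sketch below illustrates that idea for Bayesian linear regression with a conjugate Gaussian prior: the posterior learned on a source task becomes the prior for a data-poor target task. It is only a minimal illustration of transferring a Gaussian prior, not Raina et al.'s construction; the toy data and function names are hypothetical.

```python
import numpy as np

def gaussian_posterior(X, y, prior_mean, prior_cov, noise_var=1.0):
    """Conjugate update: Gaussian prior + Gaussian likelihood -> Gaussian posterior."""
    prior_prec = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(prior_prec + X.T @ X / noise_var)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])

# Source task: plenty of data, broad uninformative prior.
X_src = rng.normal(size=(200, 3))
y_src = X_src @ w_true + rng.normal(scale=0.1, size=200)
mean_src, cov_src = gaussian_posterior(X_src, y_src, np.zeros(3), 10.0 * np.eye(3))

# Target task: only a few examples, but the source posterior serves as its prior.
X_tgt = rng.normal(size=(5, 3))
y_tgt = X_tgt @ w_true + rng.normal(scale=0.1, size=5)
mean_tgt, _ = gaussian_posterior(X_tgt, y_tgt, mean_src, cov_src)
print(mean_tgt)  # close to w_true despite very little target data
```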

  8. Transfer in Inductive Learning: hierarchical methods compose simple learned concepts (Line, Curve, Circle, Surface) into more complex ones (Pipe). Stracuzzi 2006: learn Boolean concepts that can depend on each other.

  9. Transfer in Inductive Learning: dealing with missing data or labels in the target task T by drawing on the source task S. Shi et al. 2008: transfer via active learning.

  10. Reinforcement Learning: an agent interacts with an environment. It begins with Q(s1, a) = 0, selects actions with its policy (π(s1) = a1, π(s2) = a2), observes transitions and rewards (δ(s1, a1) = s2, r(s1, a1) = r2; δ(s2, a2) = s3, r(s2, a2) = r3), and updates its value estimates: Q(s1, a1) ← Q(s1, a1) + Δ.
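Slide 10's loop is the standard tabular Q-learning cycle. Below is a minimal sketch of it; the `env` object (with `reset`, `step`, and an `actions` list) is a hypothetical stand-in for the environment that supplies δ and r.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: Q(s, a) starts at 0 and is nudged by Δ after every step."""
    Q = defaultdict(float)                               # Q(s1, a) = 0 initially
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # ε-greedy policy π(s)
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(a)                # observes δ(s, a) and r(s, a)
            target = r + gamma * max(Q[(s_next, act)] for act in env.actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])    # Q(s, a) ← Q(s, a) + Δ
            s = s_next
    return Q
```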

  11. Transfer in Reinforcement Learning: starting-point methods, hierarchical methods, alteration methods, new RL algorithms, and imitation methods.

  12. Transfer in Reinforcement Learning: starting-point methods initialize the target task's Q-table from the source task rather than from scratch, giving target-task training a head start over no transfer. Taylor et al. 2005: value-function transfer.
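A minimal sketch of a tabular starting-point method under the assumption of a hand-written inter-task mapping (this is not Taylor et al.'s exact algorithm; `map_state` and `map_action` are hypothetical mapping functions):

```python
from collections import defaultdict

def transfer_q_table(source_Q, target_states, target_actions, map_state, map_action):
    """Build the target task's initial Q-table from source Q-values instead of zeros."""
    target_Q = defaultdict(float)
    for s in target_states:
        for a in target_actions:
            src_key = (map_state(s), map_action(a))      # inter-task mapping
            target_Q[(s, a)] = source_Q.get(src_key, 0.0)
    return target_Q

# Ordinary Q-learning in the target task then continues from target_Q
# rather than from an all-zero table.
```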

  13. Transfer in Reinforcement Learning: hierarchical methods treat source tasks as components of the target task (e.g., Run and Kick feed into Pass and Shoot, which compose Soccer). Mehta et al. 2008: transfer a learned hierarchy.

  14. Transfer in Reinforcement Learning: alteration methods modify the source task S itself, changing its original states, actions, or rewards into new ones. Walsh et al. 2006: transfer aggregate states.

  15. Transfer in Reinforcement Learning: new RL algorithms build transfer directly into the agent-environment learning loop of slide 10. Torrey et al. 2006: transfer advice about skills.

  16. Transfer in Reinforcement Learning: imitation methods use the source policy during part of target-task training as a demonstration. Torrey et al. 2007: demonstrate a strategy.

  17. My Research: within these categories, Skill Transfer (a new RL algorithm that uses advice) and Macro Transfer (an imitation method).

  18. RoboCup Domain: 3-on-2 KeepAway, 3-on-2 BreakAway, 2-on-1 BreakAway, 3-on-2 MoveDownfield.

  19. Inductive Logic Programming: search for rules from general to specific, starting with an empty body and adding conditions, e.g.
  IF [ ] THEN pass(Teammate)
  IF distance(Teammate) ≤ 5 THEN pass(Teammate)
  IF distance(Teammate) ≤ 10 THEN pass(Teammate)
  …
  IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)
  IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
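The refinement process behind slide 19 can be sketched as a greedy general-to-specific search that keeps adding the condition which best separates positive from negative examples. This is only a toy illustration under assumed data structures, not the actual ILP system used; the literal set and scoring are hypothetical.

```python
def refine_rule(pos, neg, candidate_literals, max_literals=3):
    """Greedy general-to-specific search: start from IF [ ] THEN ... and add literals."""
    def covers(rule, example):
        return all(lit(example) for lit in rule)
    def score(rule):
        tp = sum(covers(rule, e) for e in pos)   # positives covered
        fp = sum(covers(rule, e) for e in neg)   # negatives covered
        return tp - fp
    rule = []                                    # empty body covers every example
    while len(rule) < max_literals:
        best = max(candidate_literals, key=lambda lit: score(rule + [lit]))
        if score(rule + [best]) <= score(rule):
            break                                # no condition improves the rule
        rule.append(best)
    return rule

# Candidate conditions mirroring the slide, over a state described as a dict:
literals = [
    lambda e: e["distance"] <= 5,                # distance(Teammate) ≤ 5
    lambda e: e["distance"] <= 10,               # distance(Teammate) ≤ 10
    lambda e: e["angle"] >= 30,                  # angle(Teammate, Opponent) ≥ 30
]
```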

  20. Advice Taking: batch reinforcement learning via support vector regression (RL-SVR). The agent alternates between interacting with the environment (Batch 1, Batch 2, …) and computing Q-functions that minimize: ModelSize + C × DataMisfit.

  21. Advice Taking: batch reinforcement learning with advice (KBKR). The same batch setup, with advice added to the objective; find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit.
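The sketch below shows the flavor of that trade-off for a linear Q-model, with advice expressed as a soft lower bound on the Q-value in states where the advice applies. It is only an illustration of the ModelSize + C × DataMisfit + µ × AdviceMisfit idea; the real KBKR formulation encodes rule-based advice as linear constraints with slack inside a support-vector program.

```python
import numpy as np

def kbkr_style_objective(w, X, y, advice_X, advice_floor, C=1.0, mu=1.0):
    """ModelSize + C*DataMisfit + mu*AdviceMisfit for a linear Q-model Q(s) = w·s.

    advice_X:     feature vectors of states where the transferred advice applies
    advice_floor: the minimum Q-value the advice asks for in those states
    (Both are illustrative stand-ins for KBKR's rule-based advice regions.)
    """
    model_size = np.sum(np.abs(w))                                  # ||w||_1
    data_misfit = np.sum(np.abs(X @ w - y))                         # fit the batch data
    advice_misfit = np.sum(np.maximum(0.0, advice_floor - advice_X @ w))
    return model_size + C * data_misfit + mu * advice_misfit
```

Because the advice term carries its own penalty weight µ, the learner can follow the transferred advice when it agrees with the data and override it when it does not, which is what keeps imperfect advice from hurting.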

  22. Skill Transfer Algorithm: in the source task, use ILP to learn skill rules such as IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate); map them into the target task's vocabulary; then give them, together with human advice, to the advice-taking learner in the target task.
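A toy sketch of the mapping step, assuming the inter-task correspondence is given as a simple renaming of terms (the object names and mapping entries below are hypothetical, not the thesis mapping):

```python
def map_rule(rule: str, term_map: dict) -> str:
    """Rewrite a source-task rule's terms using a hand-specified inter-task mapping."""
    for source_term, target_term in term_map.items():
        rule = rule.replace(source_term, target_term)
    return rule

# e.g., renaming KeepAway objects into BreakAway vocabulary (entries illustrative only)
source_rule = "IF distance(Keeper) <= 5 AND angle(Keeper, Taker) >= 30 THEN pass(Keeper)"
advice = map_rule(source_rule, {"Keeper": "Attacker", "Taker": "Defender"})
# -> "IF distance(Attacker) <= 5 AND angle(Attacker, Defender) >= 30 THEN pass(Attacker)"
```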

  23. Selected Results: skill transfer to 3-on-2 BreakAway from several source tasks.

  24. Macro-Operators: a macro is a finite-state machine over actions such as pass(Teammate), move(Direction) (e.g., move(ahead), move(left)), shoot(goalRight), and shoot(goalLeft), with learned IF [ … ] THEN action rules attached to its nodes and arcs.
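A small hypothetical data structure for such a macro, showing how per-node and per-arc rules would drive execution (names and fields are assumptions, not the thesis implementation):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

Rule = Callable[[dict], bool]       # a learned IF [ ... ] condition over state features

@dataclass
class MacroNode:
    action: str                     # e.g. "pass(Teammate)" or "shoot(goalRight)"
    loop_rule: Rule                 # stay in this node while the rule fires
    arcs: Dict[str, Rule] = field(default_factory=dict)   # next node -> arc rule

@dataclass
class Macro:
    entry_rule: Rule                # IF [ ... ] THEN enter(State)
    nodes: Dict[str, MacroNode] = field(default_factory=dict)

    def step(self, current: str, state: dict) -> Optional[str]:
        """Follow an arc whose rule fires, loop in place, or exit the macro (None)."""
        node = self.nodes[current]
        for next_node, arc_rule in node.arcs.items():
            if arc_rule(state):
                return next_node
        return current if node.loop_rule(state) else None
```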

  25. Demonstration: an imitation method in which the source policy is used during part of target-task training.

  26. Macro Transfer Algorithm: learn macros from the source task with ILP, then use them as a demonstration in the target task.

  27. Macro Transfer Algorithm: learning structures. Positive examples: BreakAway games that score; negative examples: BreakAway games that didn't score. ILP learns rules such as:
  IF actionTaken(Game, StateA, pass(Teammate), StateB)
  AND actionTaken(Game, StateB, move(Direction), StateC)
  AND actionTaken(Game, StateC, shoot(goalRight), StateD)
  AND actionTaken(Game, StateD, shoot(goalLeft), StateE)
  THEN isaGoodGame(Game)

  28. Macro Transfer Algorithm: learning rules for arcs. Positive examples: states in good games that took the arc; negative examples: states in good games that could have taken the arc but didn't. For each arc (e.g., pass(Teammate) → shoot(goalRight)), ILP learns rules of the form IF [ … ] THEN enter(State) and IF [ … ] THEN loop(State, Teammate).
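The example construction on slide 28 amounts to a small filtering pass over the good games. The sketch below assumes each game is recorded as a list of (state, action) pairs; it approximates "could have taken the arc" as "was at the arc's source node", which is a simplification of the thesis procedure.

```python
def arc_examples(good_games, from_action, to_action):
    """Collect positive/negative training states for the arc from_action -> to_action."""
    pos, neg = [], []
    for game in good_games:                        # each game: list of (state, action)
        for (state, action), (_, next_action) in zip(game, game[1:]):
            if action != from_action:
                continue                           # the arc was not available here
            if next_action == to_action:
                pos.append(state)                  # took the arc
            else:
                neg.append(state)                  # could have taken it, but didn't
    return pos, neg
```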

  29. Selected Results: macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway.

  30. Summary
  • Machine learning is typically designed for standalone tasks
  • Transfer is a natural learning ability that we would like to incorporate into machine learners
  • There are some successes, but challenges remain, such as avoiding negative transfer and automating the inter-task mapping
