
Transfer Learning Via Advice Taking






Presentation Transcript


  1. Transfer Learning Via Advice Taking Jude Shavlik University of Wisconsin-Madison

  2. Acknowledgements • Lisa Torrey, Trevor Walker, & Rich Maclin • DARPA IPTO Grant HR0011-04-1-0007 • NRL Grant N00173-06-1-G002 • DARPA IPTO Grant FA8650-06-C-7606

  3. What Would You Like to Say to This Penguin? IF a Bee is (Near and West) & an Ice is (Near and North) Then Begin Move East Move North END

  4. Empirical Results [learning-curve plot comparing performance with advice vs. without advice]

  5. Our Approach to Transfer Learning [diagram: Source Task → Extraction → Extracted Knowledge → Mapping → Transferred Knowledge → Refinement → Target Task]

  6. Potential Benefits of Transfer [plot of performance vs. training, with transfer vs. without transfer: higher start, steeper slope, higher asymptote]

  7. Outline • Reinforcement Learning w/ Advice • Transfer via Rule Extraction & Advice Taking • Transfer via Macros • Transfer via Markov Logic Networks (time permitting) • Wrap Up

  8. Reinforcement Learning (RL) Overview • Sense state (described by a set of features) • Choose action (policy: choose the action with the highest Q-value in the current state) • Receive reward • Use the rewards to estimate the Q-values of actions in states

  9. The RoboCup Domain

  10. RoboCup Subtasks • Mobile KeepAway (variant of Stone & Sutton, ICML 2001) • MoveDownfield • BreakAway

  11. Q Learning (Watkins PhD, 1989) policy(state) = argmax_action Q(state, action) For large state spaces, need function approximation
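
To make the greedy policy concrete, here is a minimal tabular Q-learning sketch (greedy action = argmax over Q, plus the standard Watkins update). The environment interface (reset/step) and the constants are illustrative assumptions, not part of the talk.

    # Minimal tabular Q-learning sketch: policy(state) = argmax_a Q(state, a),
    # plus the standard Watkins update.  Environment API and constants are assumed.
    import random
    from collections import defaultdict

    def q_learning_episode(env, actions, Q=None, alpha=0.1, gamma=0.95, epsilon=0.1):
        Q = Q if Q is not None else defaultdict(float)   # Q[(state, action)] -> value
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:                # small amount of exploration
                action = random.choice(actions)
            else:                                        # greedy: argmax_a Q(state, a)
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
            # move Q(s, a) toward reward + gamma * max_a' Q(s', a')
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
        return Q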

  12. Learning the Q Function Feature vector: distance(me, teammate1), distance(me, opponent1), angle(opponent1, me, teammate1), … Weight vector: 0.2, -0.1, 0.9, … A standard approach: linear support-vector regression, Q-value = weight vectorᵀ · feature vector Set weights to minimize Model size + C × Data misfit
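
A rough sketch of the idea on this slide: represent Q(s, a) as a weighted sum of state features and fit one weight vector per action by minimizing a regularized objective in the spirit of "Model size + C × Data misfit". This uses plain ridge regression rather than the actual support-vector formulation, and all names are illustrative.

    # Sketch of a linear Q-function approximator: Q(s, a) = w_a . features(s).
    # Ridge regression stands in for the talk's linear support-vector regression.
    import numpy as np

    def fit_linear_q(features, targets, C=1.0):
        """features: (n, d) array of state features; targets: (n,) Q-value estimates."""
        n, d = features.shape
        # minimize  ||w||^2  +  C * ||features @ w - targets||^2   (closed form)
        A = np.eye(d) + C * features.T @ features
        b = C * features.T @ targets
        return np.linalg.solve(A, b)

    def q_value(w, state_features):
        return float(w @ state_features)   # Q-value = weight vector^T . feature vector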

  13. Advice in RL • Advice provides constraints on Q values under specified conditions IF an opponent is near me AND a teammate is open THEN Q(pass(teammate)) > Q(move(ahead)) • Apply as soft constraints in optimization: Model size + C × Data misfit + μ × Advice misfit
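
To make the soft-constraint idea concrete, here is one way an advice-misfit term could be added to the objective: a hinge-style penalty that is zero whenever the advised Q-value ordering holds in the states where the advice condition applies. This is a hand-written sketch, not the talk's exact optimization.

    # Sketch: advice as a soft constraint.  In states where the advice condition holds,
    # penalize violations of the advised ordering Q(preferred) > Q(other).
    import numpy as np

    def advice_misfit(weights, states, condition, preferred, other, margin=0.0):
        """weights: dict action -> weight vector; states: (n, d) array of feature rows."""
        penalty = 0.0
        for s in states:
            if condition(s):                                    # advice applies here
                q_pref, q_other = weights[preferred] @ s, weights[other] @ s
                penalty += max(0.0, q_other + margin - q_pref)  # 0 if advice satisfied
        return penalty

    def objective(weights, states, targets, advice, C=1.0, mu=0.5):
        model_size = sum(np.sum(w ** 2) for w in weights.values())
        data_misfit = sum(np.sum((states @ weights[a] - targets[a]) ** 2) for a in weights)
        advice_term = sum(advice_misfit(weights, states, *adv) for adv in advice)
        return model_size + C * data_misfit + mu * advice_term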

  14. Aside: Generalizing the Idea of a Training Example for Support Vector Machines (SVMs) Can extend the SVM linear program to handle “regions as training examples” Fung, Mangasarian, & Shavlik: NIPS 2003, COLT 2004

  15. Specifying Advice for Support Vector Regression Bx ≤ d ⇒ y ≥ h′x + β If input (x) is in the region specified by B and d, then output (y) should be above some line (h′x + β)

  16. Advice Format Bx ≤ d ⇒ f(x) ≥ hx + β Sample Advice: If distanceToGoal ≤ 10 and shotAngle ≥ 30 Then Q(shoot) ≥ 0.9
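
A small sketch of how an advice rule in this Bx ≤ d ⇒ f(x) ≥ hx + β format might be represented and checked against a learned model; the class and the example matrices are assumptions for illustration.

    # Sketch: advice of the form  Bx <= d  =>  f(x) >= h.x + beta.
    import numpy as np

    class RegionAdvice:
        def __init__(self, B, d, h, beta):
            self.B, self.d, self.h, self.beta = B, d, h, beta

        def applies(self, x):
            return bool(np.all(self.B @ x <= self.d))     # is x inside the advised region?

        def violation(self, x, f):
            if not self.applies(x):
                return 0.0
            lower_bound = self.h @ x + self.beta          # advised lower bound on f(x)
            return max(0.0, lower_bound - f(x))           # 0 when the advice is satisfied

    # The sample advice above, assuming features x = [distanceToGoal, shotAngle]:
    advice = RegionAdvice(B=np.array([[1.0, 0.0],         # distanceToGoal <= 10
                                      [0.0, -1.0]]),      # shotAngle >= 30
                          d=np.array([10.0, -30.0]),
                          h=np.zeros(2), beta=0.9)        # constant lower bound: Q(shoot) >= 0.9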

  17. Sample Advice-Taking Results If distanceToGoal ≤ 10 and shotAngle ≥ 30 then prefer shoot over all other actions: Q(shoot) > Q(pass), Q(shoot) > Q(move) [learning curves on 2-on-1 BreakAway (rewards +1, -1): advice vs. standard RL]

  18. Outline • Reinforcement Learning w/ Advice • Transfer via Rule Extraction & Advice Taking • Transfer via Macros • Transfer via Markov Logic Networks • Wrap Up

  19. Close-Transfer Scenarios [diagram: transfer among BreakAway games of increasing size: 2-on-1 BreakAway, 3-on-2 BreakAway, 4-on-3 BreakAway]

  20. Distant-Transfer Scenarios [diagram: transfer from 3-on-2 KeepAway and 3-on-2 MoveDownfield to 3-on-2 BreakAway]

  21. Our First Transfer-Learning Approach: Exploit the fact that the models and the advice are in the same language
      Source Q functions: Qx = wx1 f1 + wx2 f2 + bx, Qy = wy1 f1 + by, Qz = wz2 f2 + bz
      Mapped Q functions: Q′x = wx1 f′1 + wx2 f′2 + bx, Q′y = wy1 f′1 + by, Q′z = wz2 f′2 + bz
      Advice: if Q′x > Q′y and Q′x > Q′z then prefer x′
      Advice (expanded): if wx1 f′1 + wx2 f′2 + bx > wy1 f′1 + by and wx1 f′1 + wx2 f′2 + bx > wz2 f′2 + bz then prefer x′ to y′ and z′
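
A small sketch of this first approach: re-express each source Q function over mapped target features, then emit preference advice by comparing the mapped weighted sums. The feature mapping and data structures are illustrative assumptions.

    # Sketch: push weighted-sum Q functions through a feature mapping and turn them
    # into "prefer the best source action" advice.  Mappings here are assumptions.
    def map_q_function(source_weights, feature_map):
        """source_weights: {source_feature: weight}; feature_map: source -> target feature."""
        return {feature_map[f]: w for f, w in source_weights.items() if f in feature_map}

    def preference_advice(mapped_q, preferred):
        """Expand 'prefer `preferred`' into pairwise comparisons of mapped Q functions."""
        return [(mapped_q[preferred], mapped_q[other], f"Q'_{preferred} > Q'_{other}")
                for other in mapped_q if other != preferred]

    # Hypothetical example with features f1, f2 mapped to f1', f2':
    source_q = {"x": {"f1": 0.4, "f2": 0.3}, "y": {"f1": 0.2}, "z": {"f2": 0.6}}
    fmap = {"f1": "f1'", "f2": "f2'"}
    mapped = {a: map_q_function(w, fmap) for a, w in source_q.items()}
    advice = preference_advice(mapped, preferred="x")     # prefer x' to y' and z'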

  22. User Advice in Skill Transfer • There may be new skills in the target that cannot be learned from the source • We allow (human) users to add their own advice about these skills User Advice for KeepAway to BreakAway IF: distance(me, GoalPart) < 10 AND angle(GoalPart, me, goalie) > 40 THEN: prefer shoot(GoalPart)

  23. Sample Human Interaction “Use what you learned in KeepAway, and add in this new action SHOOT.” “Here is some advice about shooting …” “Now go practice for a while.”

  24. Policy Transfer to 3-on-2 BreakAway Torrey, Walker, Shavlik & Maclin: ECML 2005

  25. Our Second Approach: Use Inductive Logic Programming (ILP) on the SOURCE task to extract advice
      Given • Positive and negative examples for each action, e.g. good_action(pass(t1), state1), good_action(pass(t1), state2), good_action(pass(t1), state3), good_action(pass(t2), state2), good_action(pass(t2), state3)
      Do • Learn first-order rules that describe most positive examples but few negative examples, e.g. good_action(pass(Teammate), State) :- distance(me, Teammate, State) > 10, distance(Teammate, goal, State) < 15.
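
One plausible way to produce the positive and negative examples above from recorded source-task games is to label each (action, state) pair by whether that action's estimated Q-value clearly beat the alternatives; the labeling rule and margin below are assumptions, not the talk's exact procedure.

    # Sketch: build ILP examples good_action(action, state) from source-task data.
    def make_ilp_examples(episodes, q_values, margin=0.1):
        """episodes: iterable of (state, action) pairs; q_values: dict (state, action) -> Q."""
        positives, negatives = [], []
        for state, action in episodes:
            q_here = {a: q for (s, a), q in q_values.items() if s == state}
            best_other = max((q for a, q in q_here.items() if a != action),
                             default=float("-inf"))
            fact = f"good_action({action}, {state})"
            if q_here[action] >= best_other + margin:   # clearly the best action here
                positives.append(fact)
            else:
                negatives.append(fact)
        return positives, negatives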

  26. Searching for an ILP Clause (top-down search using A*)

  27. Skill Transfer to 3-on-2 BreakAway Torrey, Shavlik, Walker & Maclin: ECML 2006, ICML Workshop 2006

  28. Approach #3: Relational Macros [diagram: two-node finite-state machine with node policies hold ← true and pass(Teammate) ← isOpen(Teammate), and transition conditions isClose(Opponent) and allOpponentsFar] • A relational macro is a finite-state machine • Nodes represent internal states of the agent in which independent policies apply • Conditions for transitions and actions are learned via ILP

  29. Step 1: Learning Macro Structure [diagram: macro with nodes move(ahead) → pass(Teammate) → shoot(GoalPart)] • Objective: find (via ILP) an action pattern that separates good and bad games macroSequence(Game, StateA) ← actionTaken(Game, StateA, move, ahead, StateB), actionTaken(Game, StateB, pass, _, StateC), actionTaken(Game, StateC, shoot, _, gameEnd).

  30. Step 2: Learning Macro Conditions • Objective: describe when transitions and actions should be taken For the transition from move to pass: transition(State) ← distance(Teammate, goal, State) < 15. For the policy in the pass node: action(State, pass(Teammate)) ← angle(Teammate, me, Opponent, State) > 30.
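
A minimal sketch of how such a macro could be executed as a finite-state machine: each node carries a policy and an optional transition test, which in the talk's method would both come from ILP-learned rules. The node structure and the environment interface are illustrative assumptions.

    # Sketch: execute a relational macro as a finite-state machine.
    class MacroNode:
        def __init__(self, name, policy, transition=None):
            self.name, self.policy, self.transition = name, policy, transition

    def run_macro(nodes, env):
        """nodes: ordered list of MacroNode; env: assumed to offer current_state() and step()."""
        state, done, i = env.current_state(), False, 0
        trace = []
        while not done and i < len(nodes):
            node = nodes[i]
            if node.transition is not None and node.transition(state):
                i += 1                      # learned transition condition fires: next node
                continue
            action = node.policy(state)     # learned per-node policy picks the action
            trace.append((state, action))
            state, done = env.step(action)
        return trace                        # (state, action) pairs, useful for demonstration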

  31. Learned 2-on-1 BreakAway Macro [diagram: macro with nodes pass(Teammate), move(Direction), shoot(goalRight), shoot(goalLeft); one shoot step is annotated “this shot is apparently a leading pass”] • The player with the BALL executes the macro

  32. Transfer via Demonstration • Execute the macro strategy to get Q-value estimates • Infer low Q values for actions not taken by the macro • Compute an initial Q function with these examples • Continue learning with standard RL Advantage: potential for a large immediate jump in performance Disadvantage: risk that the agent will blindly follow an inappropriate strategy
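
A sketch of the demonstration step: label the actions the macro took with a high target Q-value and the untaken actions with a low one, then fit the initial Q function from those examples (for instance with the linear fit sketched earlier). The specific target values are illustrative assumptions.

    # Sketch: seed the target-task Q function from macro demonstrations.
    def demonstration_examples(trace, all_actions, high_q=1.0, low_q=0.0):
        """trace: list of (state_features, action_taken) pairs from running the macro."""
        examples = []                        # (state_features, action, target_q) triples
        for state, taken in trace:
            for action in all_actions:
                target = high_q if action == taken else low_q   # low Q for untaken actions
                examples.append((state, action, target))
        return examples

    # Fitting a per-action regression to these triples gives the initial Q function;
    # standard RL then continues from that starting point.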

  33. Macro Transfer to 3-on-2 BreakAway [learning curves; one baseline is a variant of Taylor & Stone] Torrey, Shavlik, Walker & Maclin: ILP 2007

  34. Macro Transfer to 4-on-3 BreakAway Torrey, Shavlik, Walker & Maclin: ILP 2007

  35. Outline • Reinforcement Learning w/ Advice • Transfer via Rule Extraction & Advice Taking • Transfer via Macros • Transfer via Markov Logic Networks • Wrap Up

  36. Approach #4: Markov Logic Networks (Richardson and Domingos, MLj 2006) [network diagram connecting the conditions dist2 < 10, ang1 > 45, and dist1 > 5 to the Q-value bins 0.5 ≤ Q < 1.0 and 0 ≤ Q < 0.5] IF dist2 < 10 AND ang1 > 45 THEN 0.5 ≤ Q < 1.0 (Wgt = 1.7) IF dist1 > 5 AND ang1 > 45 THEN 0 ≤ Q < 0.5 (Wgt = 2.1)

  37. Using MLNs to Learn a Q Function • Perform hierarchical clustering to find a set of good Q-value bins • Use ILP to learn rules that classify examples into bins, e.g. IF dist1 > 5 AND ang1 > 45 THEN 0 ≤ Q < 0.1 • Use MLN weight-learning methods to choose weights for these formulas
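
Once the MLN assigns each state-action pair a probability of falling into each Q-value bin, one natural readout is the probability-weighted average of the bin midpoints; this readout and the example numbers are assumptions for illustration, not necessarily the talk's exact procedure.

    # Sketch: turn per-bin probabilities from the MLN into a scalar Q-value estimate.
    def expected_q(bin_probs, bins):
        """bin_probs: probability per bin; bins: list of (low, high) Q-value ranges."""
        midpoints = [(low + high) / 2.0 for low, high in bins]
        return sum(p * m for p, m in zip(bin_probs, midpoints))

    # Example with the two bins shown earlier, 0 <= Q < 0.5 and 0.5 <= Q < 1.0:
    q_estimate = expected_q([0.3, 0.7], [(0.0, 0.5), (0.5, 1.0)])   # 0.3*0.25 + 0.7*0.75 = 0.6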

  38. MLN Transfer to 3-on-2 BreakAway Torrey, Shavlik, Natarajan, Kuppili & Walker: AAAI TL Workshop 2008

  39. Outline • Reinforcement Learning w/ Advice • Transfer via Rule Extraction & Advice Taking • Transfer via Macros • Transfer via Markov Logic Networks • Wrap Up

  40. Summary of Our Transfer Methods • Directly reuse weighted sums as advice • Use ILP to learn generalized advice for each action • Use ILP to learn macro-operators • Use Markov Logic Networks to learn probability distributions for Q functions

  41. Our Desiderata for Transfer in RL • Transfer knowledge in first-order logic • Accept advice from humans expressed naturally • Refine transferred knowledge • Improve performance in related target tasks • Major challenge: Avoid negative transfer

  42. Related Work in RL Transfer • Value-function transfer (Taylor & Stone 2005) • Policy reuse (Fernandez & Veloso 2006) • State abstractions (Walsh et al. 2006) • Options (Croonenborghs et al. 2007) Torrey and Shavlik survey paper online

  43. Conclusion • Transfer learning is an important perspective for machine learning, moving beyond isolated learning tasks • Appealing ways to do transfer learning are via the advice-taking and demonstration perspectives • Long-term goal: instructable computing, teaching computers the same way we teach humans
