
Learning Procedural Planning Knowledge in Complex Environments


Presentation Transcript


  1. Learning Procedural Planning Knowledge in Complex Environments Douglas Pearson douglas.pearson@threepenny.net March 2004

  2. Characterizing the Learner
  [Chart: learners placed on two axes – Method (implicit vs. deliberate) and KR (procedural vs. declarative). Reinforcement learning is implicit/procedural; IMPROV is deliberate/procedural; symbolic learners are deliberate/declarative and suit simple environments. The trend runs from simpler agents (weak, slower learning) to complex agents (strong, faster learning).]
  Complex environments are characterized by:
  • Actions: duration & conditional effects
  • Sensing: limited, noisy, delayed
  • Task: timely response
  • Domain: change over time, large state space

  3. Why Limit Knowledge Access?
  • Procedural – can only be accessed by executing it
  • Declarative – can answer when it will execute and what it will do
  Problems with declarative access:
  • Availability – e.g. "If (x^5 + 3x^3 – 5x^2 + 2) > 7 then Action", or chains of rules A -> B -> C -> Action, where the answer is not directly available
  • Efficiency – answering costs O(size of knowledge base) or worse, so the agent slows down as it learns more
  IMPROV's representation:
  • Sets of production rules for operator preconditions and actions
  • Assume the learner can only execute rules
  • But allow declarative knowledge to be added when it is efficient to do so
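
  To make the distinction concrete, here is a minimal Python sketch, assuming a toy rule format (an illustration, not IMPROV's implementation): a procedural rule is a black box the learner can execute against a state but never query about when it fires or what it will do.

```python
# Illustrative sketch only: a "procedural" rule is opaque. The learner's
# sole access is execute(); it cannot read the condition or the action.

class ProceduralRule:
    def __init__(self, condition, action):
        self._condition = condition  # callable: state -> bool (hidden)
        self._action = action        # callable: state -> proposal (hidden)

    def execute(self, state):
        """The only access: run the rule and observe what comes out."""
        if self._condition(state):
            return self._action(state)
        return None  # this rule proposes nothing in this state

# The slide's arithmetic precondition, opaque to the learner:
rule = ProceduralRule(
    condition=lambda s: (s["x"]**5 + 3*s["x"]**3 - 5*s["x"]**2 + 2) > 7,
    action=lambda s: "Action",
)
print(rule.execute({"x": 2}))  # -> "Action"
print(rule.execute({"x": 0}))  # -> None
```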

  4. Focusing on Part of the Problem
  [Figure: task performance (0%–100%) plotted against knowledge, with the representation held fixed. The initial rule base supplies part of the domain knowledge; the gap between it and full domain knowledge is what must be learned.]

  5. The Problem
  • Cast the learning problem as:
  • Error detection (incomplete/incorrect knowledge)
  • Error correction (fixing or adding knowledge)
  • But with only limited, procedural access to that knowledge
  • The aim is to support learning in complex, scalable agents and environments

  6. Error Detection Problem
  [Diagram: a plan S1 -> S2 -> S3 -> S4 with expected speeds Speed-30, Speed-10, Speed-0, Speed-30, built from existing (possibly incorrect) knowledge.]
  How can the agent monitor the plan during execution without direct knowledge access?

  7. Error Detection Solution
  [Diagram: the same plan S1 -> S2 -> S3 -> S4 (Speed-30, Speed-10, Speed-0, Speed-30); at S4 the engine stalls and no operator is proposed.]
  • Direct monitoring – not possible
  • Instead detect lack of progress to the goal: no rules matching, or conflicting rules
  • Not predicting the behavior of the world (useful in stochastic environments)
  • But no implicit notion of quality of solution
  • Can add domain-specific error conditions – but not required
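
  Continuing the sketch above, a hypothetical progress check: rather than inspecting rules, run them all and flag an error when nothing is proposed or when proposals conflict. The names here are illustrative, not IMPROV's API.

```python
# Lack-of-progress test over opaque rules: errors are visible in what the
# rules do (or fail to do), not in what they contain.

def detect_error(rules, state):
    proposals = [p for p in (r.execute(state) for r in rules) if p is not None]
    if not proposals:
        return "no-proposal"   # e.g. engine stalls and nothing matches
    if len(set(proposals)) > 1:
        return "conflict"      # rules propose incompatible operators
    return None                # progress toward the goal continues
```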

  8. IMPROV's Recovery Method
  [Flowchart: Replan -> Search -> Execute, recording [State, Op -> Result] at each step; repeat until the goal is found. On failure, the learning path runs: Identify Incorrect Operator(s) -> Train Inductive Learner -> Change Domain Knowledge, then replan. On success: goal reached.]
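
  The loop can be sketched as below. The four components are passed in as functions because their internals are IMPROV-specific; this is a skeleton of the control flow on the slide, not the real system.

```python
# Sketch of IMPROV's recovery loop: execute the plan, record instances,
# and on failure learn a correction before replanning.

def recover(rules, state, goal_reached,
            replan, execute_step,
            identify_incorrect_operators, train_learner):
    instances = []                           # [State, Op -> Result] records
    while not goal_reached(state):
        plan = replan(rules, state)          # search with current knowledge
        for operator in plan:
            before = state
            state = execute_step(operator, state)
            instances.append((before, operator, state))
        if not goal_reached(state):          # plan failed: learn, then retry
            bad_ops = identify_incorrect_operators(instances)
            rules = train_learner(rules, instances, bad_ops)
    return rules, state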

  9. Finding the Incorrect Operator(s)
  [Diagram: the planned sequence Speed-30 -> Speed-10 -> Change-Gear -> Speed-0 -> Speed-30 against the executed sequence Speed-30 -> Speed-10 -> Speed-0 -> Speed-30. Change-Gear is over-specific; Speed-0 is over-general.]
  By waiting before assigning blame, the learner can do better credit assignment.

  10. Learning to Correct the Operator
  • Collect a set of training instances: [State, Operator -> Result]
  • Differences between states can be identified, e.g.:
  •   Speed = 40, Light = green, Self = car, Other = ambulance
  •   Speed = 40, Light = green, Self = car, Other = car
  • These differences are used as a default bias when training the inductive learner
  • Preconditions are learned as a classification problem (predict the operator from the state), as sketched below
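
  A sketch of the classification view, with a stock decision tree standing in for IMPROV's own inductive learner; the state features mirror the slide's example, and the labels are assumed for illustration.

```python
# Precondition learning as classification: predict the operator's outcome
# from the state. DictVectorizer one-hot encodes the symbolic features.

from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

states = [
    {"speed": 40, "light": "green", "self": "car", "other": "ambulance"},
    {"speed": 40, "light": "green", "self": "car", "other": "car"},
]
results = ["fail", "succeed"]  # assumed outcomes of applying the operator

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(states)
clf = DecisionTreeClassifier().fit(X, results)

# The one differing feature (other=ambulance vs. other=car) is exactly the
# default bias the slide describes: state differences are the first
# candidates for the missing precondition.
query = vec.transform([{"speed": 40, "light": "green",
                        "self": "car", "other": "car"}])
print(clf.predict(query))  # -> ['succeed']
```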

  11. K-Incremental Learning
  • Collect a set of k instances, then train the inductive learner
  [Diagram: a spectrum of instance-set sizes, from 1 (reinforcement learners) through k1 (IMPROV: collect until a correction is needed) and k2 (EXPO: collect until a unique cause is found) up to n (non-incremental learners).]
  • k does not grow over time => incremental behavior
  • Better decisions about what to discard when generalizing
  • When doing "active learning", bad early learning can really hurt
  A sketch of the buffering scheme follows.
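
  A sketch assuming the "till correction" policy: instances accumulate until an error triggers a correction, feed one batch of training, and are then discarded so the buffer stays bounded. The class and method names are illustrative.

```python
# K-incremental learning: batch-train on a small, bounded instance set,
# then clear it, so the learner stays incremental over its lifetime.

class KIncrementalLearner:
    def __init__(self, train_fn):
        self.buffer = []           # at most k instances at any time
        self.train_fn = train_fn   # any batch inductive learner

    def observe(self, state, operator, result):
        self.buffer.append((state, operator, result))

    def correct(self):
        """Called when an error is detected ('till correction')."""
        model = self.train_fn(self.buffer)  # train on k instances, not all history
        self.buffer.clear()                 # k stays bounded over time
        return model
```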

  12. Extending to Operator Actions
  [Diagram: an action that takes Speed 30 -> Speed 20 -> Speed 0 is decomposed into an operator hierarchy of Brake and Release, which in turn decompose into primitive steps Slow -5, Slow -10, Slow -10, Slow 0.]
  • Decompose the action into an operator hierarchy
  • The decomposition terminates with operators that modify a single symbol
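
  One way to render the decomposition in code; the structure, not the exact numbers, is the point, and the node format and the top-level name are assumptions.

```python
# Hypothetical operator hierarchy: composite operators bottom out in
# primitive sub-operators that each modify a single symbol (here: speed).

slow_down = {                       # invented top-level operator name
    "op": "Slow-down",
    "children": [
        {"op": "Brake", "children": [
            {"op": "Slow", "delta": -5},
            {"op": "Slow", "delta": -10},
            {"op": "Slow", "delta": -10},
        ]},
        {"op": "Release", "children": [
            {"op": "Slow", "delta": 0},
        ]},
    ],
}

def primitive_leaves(node):
    """Collect the single-symbol operators at the bottom of the hierarchy."""
    children = node.get("children", [])
    if not children:
        return [node]
    return [leaf for child in children for leaf in primitive_leaves(child)]

print([leaf["delta"] for leaf in primitive_leaves(slow_down)])
# -> [-5, -10, -10, 0]
```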

  13. Correcting Actions
  • Expected effects of braking: Slow -5, Slow -10, Slow -10
  • Observed effects of braking on ice: Slow -2, Slow -4, Slow -6 => failure
  • Use the same correction method to change the preconditions of these sub-operators

  14. Change Procedural Actions
  Changing the effects of Brake:
  • Specialize: Braking & slow=0 & ice => reject Slow -5
  • Generalize: Braking & slow=0 & ice => propose Slow -2
  This supports complex actions:
  • Actions with durations (a sequence of operators)
  • Conditional actions (branches in the sequence of operators)
  • Multiple simultaneous effects
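
  An illustrative encoding of the two corrections, in a hypothetical rule syntax: the same learned condition (ice) is used once to reject the effect the old rule proposes (specializing its behavior) and once to propose the effect actually observed.

```python
# Hypothetical rule encoding, not IMPROV's real syntax.

old_rule = {"if": ["braking", "slow=0"], "then": ("propose", "Slow -5")}

# Specialize: on ice, reject the effect the old rule proposes.
reject_rule = {"if": ["braking", "slow=0", "ice"],
               "then": ("reject", "Slow -5")}

# Generalize: on ice, propose the effect actually observed.
propose_rule = {"if": ["braking", "slow=0", "ice"],
                "then": ("propose", "Slow -2")}
```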

  15. IMPROV Summary
  IMPROV supports:
  • Powerful agents – multiple goals; faster, deliberate learning
  • Complex environments – noise, complex actions, dynamic environments
  [Chart: the method/KR grid again – reinforcement learning (implicit, procedural, incremental), IMPROV (deliberate, procedural, k-incremental), symbolic learners (deliberate, declarative, non-incremental).]
  k-incremental learning gives improved credit assignment: which operator, and which feature.
  IMPROV is a general, weak, deliberate learner that assumes only procedural access:
  • General-purpose error detection
  • A general correction method applied to both preconditions and actions
  • Nice re-use of the precondition learner to learn actions
  • Domain-specific knowledge is easy to add to make the method stronger

  16. Redux: Diagram-based Example-driven Knowledge Acquisition Douglas Pearson douglas.pearson@threepenny.net March 2004

  17. 1. User specifies desired behavior

  18. 2. User selects features – these define the rules. Later we'll use ML to guess this initial feature set.

  19. 3. Compare desired with rules
  Desired:                    Actual:
  Turn-to-face(threat1)       Turn-to-face(neutral1)
  Shoot(threat1)              Shoot(neutral1)
  Move-through(door1)         Move-through(door1)
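
  A sketch of the comparison step, assuming traces are simple (action, argument) tuples (the deck does not show Redux's internal format): the first mismatches localize the faulty rule.

```python
# Step-by-step trace comparison between desired and actual behavior.

desired = [("Turn-to-face", "threat1"), ("Shoot", "threat1"),
           ("Move-through", "door1")]
actual  = [("Turn-to-face", "neutral1"), ("Shoot", "neutral1"),
           ("Move-through", "door1")]

for step, (want, got) in enumerate(zip(desired, actual), 1):
    if want != got:
        print(f"step {step}: expected {want}, rule produced {got}")
# -> step 1: expected ('Turn-to-face', 'threat1'), rule produced ('Turn-to-face', 'neutral1')
# -> step 2: expected ('Shoot', 'threat1'), rule produced ('Shoot', 'neutral1')
```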

  20. 4. Identify and correct problems
  • Detect differences between desired behavior and the rules
  • Detect overgeneral preconditions
  • Detect conflicts within the scenario
  • Detect conflicts between scenarios
  • Detect choice points where there's no guidance
  • etc.
  All of these errors are detected automatically when a rule is created (one such check is sketched below).
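
  One of the listed checks, sketched under assumed formats; the matcher and example representation are illustrations, not Redux's API.

```python
# Overgenerality check: a rule's preconditions are overgeneral if the rule
# matches two examples whose desired actions differ.

def overgeneral(rule_matches, examples):
    """examples: list of (state, desired_action) pairs."""
    actions = {desired for state, desired in examples if rule_matches(state)}
    return len(actions) > 1

# E.g. a rule matching any visible person is overgeneral if one scenario
# wants Shoot for a threat and another wants to hold fire for a neutral.
examples = [({"target": "threat1"}, "Shoot"),
            ({"target": "neutral1"}, "Hold-fire")]
print(overgeneral(lambda s: True, examples))  # -> True
```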

  21. 5. Fast rule creation by expert
  The expert defines behavior with diagram-based examples; the engineer works with the generated code.
  [Diagram: a library of validated behavior examples feeds analysis & generation tools (detect inconsistency, generalize, generate rules, simulate execution), which produce executable code (rules such as A -> B, C -> D, E, J -> F, G, A, C -> H, E, G -> I, J, K -> L) that runs in the simulation environment.]
