
Forward-Chaining Planning in Nondeterministic Domains



Presentation Transcript


  1. Forward-Chaining Planning in Nondeterministic Domains. Ugur Kuter and Dana Nau, Department of Computer Science and Institute for Systems Research, University of Maryland, College Park, Maryland

  2. Generating Plans of Action • Programs to aid human planners • Project management (consumer software) • Plan storage and retrieval • (e.g., variant process planning) • Automatic schedule generation • (various OR and AI techniques) • For some problems, really want to generate plans automatically • Much more difficult • One source of difficulty: nondeterministic outcomes • If I plan to perform some action a, I cannot be sure in advance what outcome a will have

  3. Planning with Nondeterminism [Figure: the action "Grasp block c", with an intended outcome and an unintended outcome] • Actions with multiple possible outcomes • Action failures • e.g., gripper drops its load • Exogenous events • e.g., road closed • Like Markov Decision Processes (MDPs), but without probabilities attached to the outcomes • Useful if accurate probabilities aren’t available, or if probability calculations would introduce inaccuracies • Existing approaches • Conditional Planning (e.g., Penberthy & Weld, 1992) • Conformant Planning (e.g., Smith & Weld, 1998) • Symbolic Model Checking (e.g., Cimatti et al., 1998, 2003)

  4. Research Motivation • Algorithms for planning with nondeterminism have very high computational complexity • Search space usually is huge • Existing algorithms search most of the space • Classical planning • Lots of work on generating plans quickly • Techniques for pruning large parts of the search space • Can we generalize any of these techniques for use in nondeterministic domains?

  5. Our Results • A way to nondeterminize any forward-chaining planner for deterministic planning domains • Rewrite it so that it works in nondeterministic domains • Theoretical analysis • Under the appropriate conditions, some nondeterminized planners can run exponentially faster than the best previous planners for nondeterministic domains • Experimental verification of the theoretical results

  6. Forward-Chaining Planners • Some of the most capable existing planners use forward chaining • Backtracking state-space search starting at the initial state • e.g., HSP, TLPlan, TALplanner, SHOP2 • FCP: abstract model of forward-chaining planners • Among different forward-chaining planners, the main difference is the action-generation function, which maps a state s to a set of actions applicable to s • Can classify them based on the action-generation function • Domain-specific • Domain-independent • Domain-configurable
     Procedure FCP(s0, g)
       π := the empty plan; s := s0
       loop
         if s satisfies g then return π
         else if s isn’t in ancestors(s) then
           A := the actions the action-generation function returns for s
           if A is empty then return failure
           nondeterministically choose a ∈ A
           π := π.a; s := the state that results from applying a in s
         else return failure
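The FCP procedure can be sketched in Python as a depth-first search in which backtracking stands in for the nondeterministic choice. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `fcp`, `is_goal`, `gen_actions`, and `apply_action`, and the toy counting domain, are hypothetical and chosen only for illustration.

```python
def fcp(s0, is_goal, gen_actions, apply_action):
    """Forward-chaining search; returns a list of actions or None on failure."""

    def search(s, plan, ancestors):
        if is_goal(s):
            return plan
        if s in ancestors:
            return None  # s repeats an ancestor: this branch fails
        for a in gen_actions(s):  # try each choice, backtracking on failure
            result = search(apply_action(s, a), plan + [a], ancestors | {s})
            if result is not None:
                return result
        return None  # no applicable action leads to the goal

    return search(s0, [], frozenset())


# Hypothetical toy domain: states are integers, the goal is to reach 3.
def toy_actions(s):
    return ["+2", "+1"] if s < 3 else []


def toy_apply(s, a):
    return s + int(a)


plan = fcp(0, lambda s: s == 3, toy_actions, toy_apply)
print(plan)  # ['+2', '+1']
```

In the toy run, the first choice after reaching 2 overshoots to 4, where no action applies, so the search backtracks and picks "+1". Swapping in a different `gen_actions` is what distinguishes one forward-chaining planner from another in this model.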

  7. Classification of Forward-Chaining Planners • Domain-specific: the action-generation function is designed or tuned for one specific domain • Several application-oriented planners work this way • e.g., EDAPS (process planning), Tignum 2 (used in Bridge Baron) • Good performance in the given domain, but hard to generalize • Domain-independent: the action-generation function works in any domain within some class • Usually, it works in any classical planning domain • Focus of most research on AI planning • So far, not practical for real-world planning • Domain-configurable: … (Procedure FCP is shown beside this list, unchanged from slide 6.)

  8. Classification (continued) • Domain-configurable • The action-generation function has a domain-independent computational engine • Give domain-specific information to the action-generation function as part of the domain description • How to prune some of the actions from the set it returns 1. Control rules written in temporal logic, used for pruning 2. Hierarchical Task Networks (HTNs) and ordered decomposition
     Procedure FCP(s0, g, K)
       π := the empty plan; s := s0
       loop
         if s satisfies g then return π
         else if s isn’t in ancestors(s) then
           A := the actions the action-generation function returns for s, given K
           if A is empty then return failure
           nondeterministically choose a ∈ A
           π := π.a; s := the state that results from applying a in s
         else return failure

  9. 1. Control Rules in Temporal Logic • Depth-first forward search, with control rules written in temporal logic • For each state s, a control rule, f • prune s if it doesn’t satisfy f • Control rules for successors of s are computed via logical progression • TLPlan (Bacchus & Kabanza, Artificial Intelligence 2000) • TALplanner (Doherty & Kvarnstrom, AMAI 2001) • Both work the same way, but they use different temporal logics • Example (next slide): • A trivial blocks-world planning problem • LTL (the logic used in TLPlan)

  10. Example [Figure: state s and the goal configuration of blocks a and b] • Goal: {on(b,a)} • Control rule f: never pick up block x from the table unless x needs to be on top of another block • Progressed formula f+ (must be true in all children of s) • If we pick up a, f+ will not be satisfied, so prune this state • If we pick up b, f+ will be satisfied, so keep searching below this state • Can write rules to prune huge parts of the search space
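The pruning effect of this control rule can be pictured with a simple filter on pickup actions. The sketch below does not implement LTL progression as TLPlan does; it only hard-codes the one rule from the example, and the function name `allowed_pickups` and the pair encoding of `on(x,y)` goals are assumptions for illustration.

```python
def allowed_pickups(blocks_on_table, goal_on):
    """Keep only pickups of blocks that the goal places on top of another block.

    goal_on is a set of (top, below) pairs, one per on(top, below) goal atom.
    """
    must_be_stacked = {top for (top, _below) in goal_on}
    return [x for x in blocks_on_table if x in must_be_stacked]


# Blocks a and b are on the table; the goal is {on(b,a)}.
kept = allowed_pickups(["a", "b"], {("b", "a")})
print(kept)  # ['b']
```

Picking up a is pruned because the goal never asks for a to sit on another block, which matches the outcome described on the slide.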

  11. 2. HTN Planning • Decompose tasks into subtasks • Handle constraints (e.g., taxi not good for long distances) • Resolve interactions (e.g., take taxi early enough to catch plane) • If necessary, backtrack and try other decompositions
     Task: travel(x,y)
       Method taxi-travel(x,y): get-taxi, ride-taxi(x,y), pay-driver
       Method air-travel(x,y): get-ticket(a(x), a(y)), travel(x, a(x)), fly(a(x), a(y)), travel(a(y), y)
     Example decomposition:
       travel(UMD, U-of-Alberta)
         get-ticket(DCA, YEG)
           go to Orbitz
           find-flights(DCA, YEG)
           buy-ticket(DCA, YEG)
         travel(UMD, DCA)
           get-taxi
           ride-taxi(UMD, DCA)
           pay-driver
         fly(DCA, YEG)
         travel(YEG, U-of-Alberta)
           get-taxi
           ride-taxi(YEG, U-of-Alberta)
           pay-driver
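The travel example can be reproduced with a small decomposition routine. This is a simplified sketch, not SHOP2: it tracks no world state, and the tables `AIRPORT` and `SHORT`, the function names, and the tuple encoding of tasks are all assumptions made for illustration.

```python
AIRPORT = {"UMD": "DCA", "U-of-Alberta": "YEG"}     # assumed a(x): nearest airport
SHORT = {("UMD", "DCA"), ("YEG", "U-of-Alberta")}   # assumed taxi-distance pairs


def taxi_travel(x, y):
    if (x, y) not in SHORT:  # constraint: taxi not good for long distances
        return None
    return [("get-taxi",), ("ride-taxi", x, y), ("pay-driver",)]


def air_travel(x, y):
    if x not in AIRPORT or y not in AIRPORT:
        return None
    ax, ay = AIRPORT[x], AIRPORT[y]
    return [("get-ticket", ax, ay), ("travel", x, ax),
            ("fly", ax, ay), ("travel", ay, y)]


def get_ticket(ax, ay):
    return [("go-to-Orbitz",), ("find-flights", ax, ay), ("buy-ticket", ax, ay)]


METHODS = {"travel": [taxi_travel, air_travel], "get-ticket": [get_ticket]}


def decompose(tasks):
    """Decompose tasks left to right; return primitive actions or None."""
    if not tasks:
        return []
    first, rest = tasks[0], tasks[1:]
    name, args = first[0], first[1:]
    if name not in METHODS:  # primitive task: keep it and continue
        tail = decompose(rest)
        return None if tail is None else [first] + tail
    for method in METHODS[name]:  # backtrack over alternative methods
        subtasks = method(*args)
        if subtasks is not None:
            plan = decompose(subtasks + rest)
            if plan is not None:
                return plan
    return None


plan = decompose([("travel", "UMD", "U-of-Alberta")])
for step in plan:
    print(step)
```

The taxi method is rejected for the top-level task because the pair is not in `SHORT`, so the air-travel method is used and each leg to or from an airport is then solved by taxi.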

  12. Ordered Decomposition • Decompose tasks in the same order in which they’ll be executed • Whenever we want to plan the next task • we’ve already planned everything that comes before it • Thus, we know the current state of the world • SHOP2 (Nau et al., IJCAI 2001, JAIR 2003) [Figure: task t0 decomposed through tasks tm, tn, … into operators op1, op2, …, opi, applied in states s0, s1, s2, …, si–1]

  13. Performance • Using control rules and HTNs • can encode domain-specific problem-solving knowledge • highly focused search • go almost directly toward a near-optimal solution, with very little backtracking • TLPlan, TALplanner, and SHOP2 have been the best performers in the International Planning Competitions • Several orders of magnitude faster than the domain-independent planners • Solved many more problems

  14. Expressivity • Forward-chaining planners always know the current state • This makes it easy to do things that would be difficult otherwise • States can be arbitrary data structures • Preconditions and effects can include • logical inference • complex numeric computations • interactions with other software packages • Applications: • SHOP2 is open-source freeware, has been used in dozens of applications (Nau et al., 2004) • Bacchus and Kabanza are attempting to commercialize TLPlan [Figure: bridge example. Us: East declarer, West dummy; opponents: defenders, South and North; contract: East, 3NT; on lead: West at trick 3; East: KJ74; West: A2; out: QT98653]

  15. How to Nondeterminize Forward-Chaining Planners • Two steps: 1. Modify FCP to generate policies rather than plans 2. Modify FCP to solve problems in which actions have multiple outcomes • Want to do this in such a way that it will work for all instances of FCP • Nondeterminized versions of HSP, TLPlan, TALplanner, SHOP2, etc.

  16. Plans Versus Policies • In classical domains, a solution is a plan (sequence of actions) • For nondeterministic domains, that’s not sufficient • An action may lead to more than one possible state • What to do next depends on what state we’re in • Instead of a plan, use a policy: a partial function from states to actions [Figure: a plan π = (a0, a1, a2) leading from initial state s0 through s1 and s2 to goal state s3, next to a policy π = {(s1,a0), (s1,a1), (s2,a3)} over states s0 to s4]

  17. Execution Graphs • An action a has more than one possible outcome … • … so a policy π has more than one possible execution path • Execution graph E(π) = the graph of all of π’s possible execution paths • Sπ = {all states in E(π)} • π = {(s0, a0), (s1, a1), (s2, a1), (s3, a2)} [Figure: execution graph of π from the initial states to the goal states, over states s0 to s5]
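An execution graph can be computed by following a policy through every possible outcome. The sketch below uses the policy from this slide, but the outcome table and the choice of initial states are hypothetical, since the figure that defined them is not recoverable; the function name `execution_graph` is also an assumption.

```python
from collections import deque


def execution_graph(policy, outcomes, initial_states):
    """Return (states, edges) of E(pi): everything reachable by executing pi."""
    states = set(initial_states)
    edges = set()
    queue = deque(initial_states)
    while queue:
        s = queue.popleft()
        if s not in policy:
            continue  # pi is undefined at s, so execution stops here
        for t in outcomes[(s, policy[s])]:
            edges.add((s, t))
            if t not in states:
                states.add(t)
                queue.append(t)
    return states, edges


policy = {"s0": "a0", "s1": "a1", "s2": "a1", "s3": "a2"}
# Hypothetical outcomes; a0 is the only action with two possible results.
outcomes = {
    ("s0", "a0"): ["s1", "s3"],
    ("s1", "a1"): ["s5"],
    ("s2", "a1"): ["s5"],
    ("s3", "a2"): ["s4"],
}
states, edges = execution_graph(policy, outcomes, ["s0", "s2"])
print(sorted(states))
print(sorted(edges))
```

Storing the policy as a dictionary reflects its definition as a partial function: a state either has exactly one action or none.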

  18. Nondeterminization (Step 1) • Rewrite FCP so that it generates solution policies rather than solution plans • Policy-FCP differs from FCP (slide 8) in three places: π starts as the empty set, the cycle test checks Sπ instead of ancestors(s), and each step adds the pair (s,a) to π
     Procedure Policy-FCP(s0, g, K)
       π := ∅; s := s0
       loop
         if s satisfies g then return π
         else if s isn’t in Sπ then
           A := the actions the action-generation function returns for s, given K
           if A is empty then return failure
           nondeterministically choose a ∈ A
           π := π ∪ {(s,a)}; s := the state that results from applying a in s
         else return failure

  19. Types of Solutions (Cimatti et al., Artificial Intelligence, 2003) • Weak solution: at least one execution path reaches a goal • Strong solution: every execution path reaches a goal • Strong-cyclic solution: every fair execution path reaches a goal • Don’t stay in a cycle forever if there’s a state-transition out of it [Figures: three example execution graphs over states s0 to s3 and actions a0 to a3, one per solution type]
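The three solution types can be told apart by simple reachability tests on the execution graph. The sketch below assumes a single initial state and treats execution as stopping at goal states; the three example domains are hypothetical stand-ins for the lost figures, and the names `reachable` and `classify` are illustrative.

```python
def reachable(starts, policy, outcomes, goals):
    """States reachable by executing the policy; execution stops at goals."""
    seen, stack = set(), list(starts)
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        if s in goals or s not in policy:
            continue
        stack.extend(outcomes[(s, policy[s])])
    return seen


def classify(policy, outcomes, s0, goals):
    states = reachable([s0], policy, outcomes, goals)

    def reaches_goal(s):
        return any(t in goals for t in reachable([s], policy, outcomes, goals))

    def on_cycle(s):
        if s in goals or s not in policy:
            return False
        return any(s in reachable([t], policy, outcomes, goals)
                   for t in outcomes[(s, policy[s])])

    if all(reaches_goal(s) for s in states):
        # Every state can still reach a goal; cycles decide strong vs strong-cyclic.
        return "strong-cyclic" if any(on_cycle(s) for s in states) else "strong"
    return "weak" if reaches_goal(s0) else "none"


base = {("s0", "a0"): ["s1", "s3"], ("s1", "a1"): ["goal"]}
weak = classify({"s0": "a0", "s1": "a1"}, base, "s0", {"goal"})
strong = classify({"s0": "a0", "s1": "a1", "s3": "a2"},
                  {**base, ("s3", "a2"): ["goal"]}, "s0", {"goal"})
cyclic = classify({"s0": "a0", "s1": "a1", "s3": "a2"},
                  {**base, ("s3", "a2"): ["s0"]}, "s0", {"goal"})
print(weak, strong, cyclic)  # weak strong strong-cyclic
```

In the third example the path s0, s3, s0, … could repeat forever, but a fair execution eventually takes the outcome s1 and reaches the goal, which is what makes the policy strong-cyclic rather than strong.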

  20. Nondeterminization (Step 2) • Modify Policy-FCP (slide 18) to generate strong-cyclic solutions • Can also modify it to generate strong and weak solutions (won’t discuss details)
     Procedure ND-FCP(S0, g, K)
       π := ∅; S := S0; solved := ∅
       loop
         if S = ∅ then return π
         select s in S and remove it from S
         if s satisfies g then put s into solved
         else if s isn’t in Sπ then
           A := the actions the action-generation function returns for s, given K
           if A is empty then return failure
           nondeterministically choose a ∈ A
           π := π ∪ {(s,a)}; S := S ∪ {all possible outcomes of applying a in s}
         else if s has no descendants in (S ∪ solved) – Sπ
           then return failure

  21. Bookkeeping • Bookkeeping to generate graphs rather than paths • S = {nodes that have been generated but not yet explored} • solved = {nodes from which we know we can get to a solution} (Procedure ND-FCP is shown beside this list, unchanged from slide 20.)

  22. Failure Detection [Figure: execution graph over states s0 to s6 and actions a0 to a3] • A node s is unsolvable in the following cases: • s is a dead end, • s is part of a cycle from which there is no escape, • every descendant of s is unsolvable • This happens if s has no descendants in (S ∪ solved) – Sπ (Procedure ND-FCP is shown beside this list, unchanged from slide 20.)
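The bookkeeping and failure test of ND-FCP can be sketched in Python, again with backtracking in place of the nondeterministic choice. This is an illustrative sketch under assumptions, not the authors' code: it reads "s is in Sπ" as "π already assigns an action to s", and the names `nd_fcp`, `gen_actions`, `outcomes`, and the door domain are hypothetical.

```python
def nd_fcp(initial_states, is_goal, gen_actions, outcomes):
    """Search for a strong-cyclic policy (dict state -> action), or None."""

    def descendants(s, policy):
        seen, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u in policy:
                for t in outcomes(u, policy[u]):
                    if t not in seen:
                        seen.add(t)
                        stack.append(t)
        return seen

    def search(policy, open_states, solved):
        if not open_states:
            return policy  # S is empty: every generated node was handled
        s, rest = open_states[0], open_states[1:]
        if is_goal(s):
            return search(policy, rest, solved | {s})
        if s not in policy:
            for a in gen_actions(s):  # backtrack over the action choice
                new_open = rest + tuple(t for t in outcomes(s, a) if t not in rest)
                result = search({**policy, s: a}, new_open, solved)
                if result is not None:
                    return result
            return None  # dead end, or every choice failed
        # s was seen before: it needs a descendant that is still open or solved
        escapes = [t for t in descendants(s, policy)
                   if (t in rest or t in solved) and t not in policy]
        return search(policy, rest, solved) if escapes else None

    return search({}, tuple(dict.fromkeys(initial_states)), frozenset())


# Hypothetical door domain: the door may close again after any action.
DOOR_DOMAIN = {
    ("closed", "wait"): ("closed",),
    ("closed", "open-door"): ("open", "closed"),
    ("open", "go"): ("hall", "closed"),
}


def door_actions(s):
    return [a for (state, a) in DOOR_DOMAIN if state == s]


def door_outcomes(s, a):
    return DOOR_DOMAIN[(s, a)]


policy = nd_fcp(["closed"], lambda s: s == "hall", door_actions, door_outcomes)
print(policy)  # {'closed': 'open-door', 'open': 'go'}
```

Choosing "wait" first creates a cycle with no escape, so the failure test rejects it and the search falls back to "open-door". The resulting policy contains a cycle between the two door states, but each pass through it has an outcome that leads toward the hall.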

  23. Formal Properties • Several planning algorithms are instances of FCP • TLPlan, TALplanner, SHOP2, etc. • Only difference: what the action-generation function is • Nondeterminizing FCP preserves the action-generation function, so it works on any instance of FCP • ND-TLPlan, ND-TALplanner, ND-SHOP2, etc. • Nondeterminizing them preserves soundness, completeness, time complexity • Details on the next few slides (Procedure ND-FCP is shown beside this list, unchanged from slide 20.)

  24. Nondeterministic Versions of Operators and Domains [Figure: the action "Grasp block c", with an intended outcome and an unintended outcome] • Nondeterministic version of an operator o • Same as o except that it may have additional possible outcomes • Failures, exogenous events, etc. • Nondeterministic version of a domain D • The operators are nondeterministic versions of the ones in D

  25. Formal Properties • Nondeterminizing an algorithm preserves its soundness and completeness • Let P be any planning algorithm that’s an instance of FCP • Let ND-P be the nondeterminization of P • Let D be any classical planning domain • Let D’ be any nondeterministic version of D • If P is sound/complete on D, then ND-P is sound/complete on D’ • Nondeterminizing an algorithm preserves its time complexity (as a function of its output) • Let TP(n) and TND-P(n) be the running times of P and ND-P, where n = size of the solution found • Then TND-P(n) is polynomially bounded by TP(n) • (Details on next slide)

  26. Time-Complexity Theorem • P = an instance of FCP; D = a classical domain • Suppose P’s time complexity is O(f(|π|)), where π is the solution plan that P finds and f is monotonic • D’ = a nondeterministic version of D • ND-P’s time complexity is O(p(f(|π’|))), where π’ is the solution policy that ND-P finds and p is a polynomial • Caveat: π’ may be exponentially larger than π [Figure: a plan from initial state s0 through s1 and s2 to goal state s3, next to a policy’s execution graph from the initial states to the goal states over states s0 to s5]

  27. Special Case • Suppose that P runs in polynomial time and ND-P produces solutions of polynomial size • Then ND-P runs in polynomial time • Example: Blocks World • Given the appropriate domain knowledge • TALplanner, TLPlan, and SHOP2 solve Blocks-World problems in polynomial time • ND-TALplanner, ND-TLPlan, and ND-SHOP2 produce solutions of polynomial size • With this domain knowledge, • ND-TALplanner, ND-TLPlan, and ND-SHOP2 solve nondeterministic Blocks-World problems in polynomial time

  28. Experimental Verification • Implementation of ND-SHOP2 • Compare with MBP (Bertoli et al., 2001) • The best-known planner for nondeterministic domains • Based on symbolic model-checking • Two experimental domains • Robot-Navigation (Kabanza et al., 1997) • The e. coli of research on planning with nondeterminism • Nondeterministic Blocks-World

  29. Robot Navigation Domain • Adapted from (Kabanza et al., 1997) • Rooms, doors, hallway • Robot can open/close doors, move packages to other rooms • Objective: move packages to their destinations • A kid runs around and randomly opens/closes doors • Robot may need to re-open a door repeatedly to go through • Experimental Setup • Kid doors: k = 1, …, 7 • Packages: n = 1, …, 5 • 20 randomly generated problems for each combination of n, k

  30. Varying the problem size

  31. Varying the amount of nondeterminism

  32. Nondeterministic Blocks World [Figure: the action "Grasp block c", with an intended outcome and an unintended outcome] • Traditional Blocks-World operators: • pickup, putdown, stack, unstack • Actions may have unintended outcomes • e.g., drop a block on the table • Experimental Setup • vary number of blocks from 3 to 10 • 20 randomly generated problems for each case

  33. Varying the problem size

  34. Complexity Analysis • Complexity analysis shows MBP running in exponential time and ND-SHOP2 running in time O(n^5) • To see why, need to understand how MBP and ND-SHOP2 work

  35. Representing Policies • A policy π is a partial function from states into actions • π(s0) = a0, π(s1) = a1, π(s2) = a1, π(s3) = a2 • Can use a symbolic representation roughly like this:
     if in(r4) and holding(b) and door-closed(r4) then π(s) = open-door(r4)
     if in(r4) and holding(b) and door-open(r4) then π(s) = go(r4, hall)
     • Each state description ignores all doors other than d4 • Includes an exponential number of states • Both MBP and ND-SHOP2 use symbolic representations of policies • Can write polynomial-size policies for exponentially large state spaces
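A symbolic policy of this kind can be sketched as a list of rules whose conditions are partial state descriptions. The encoding below is an assumption made for illustration (states as sets of ground atoms, rule matching by subset test) and is not how MBP or ND-SHOP2 store policies internally; the names `RULES`, `policy_action`, and `states_covered` are hypothetical.

```python
# Each rule: (atoms that must hold, action to take). Atoms not mentioned
# in the condition, such as the status of other doors, are ignored.
RULES = [
    (frozenset({"in(r4)", "holding(b)", "door-closed(r4)"}), "open-door(r4)"),
    (frozenset({"in(r4)", "holding(b)", "door-open(r4)"}), "go(r4, hall)"),
]


def policy_action(state):
    """Return the action of the first rule whose condition holds in state."""
    for condition, action in RULES:
        if condition <= state:  # subset test: every required atom is present
            return action
    return None  # the policy is partial: no rule applies


def states_covered(num_other_doors):
    """Number of concrete states one rule matches if each other door is open or closed."""
    return 2 ** num_other_doors


state = frozenset({"in(r4)", "holding(b)", "door-closed(r4)",
                   "door-open(r1)", "door-closed(r2)"})
chosen = policy_action(state)
print(chosen)              # open-door(r4)
print(states_covered(10))  # 1024
```

Two rules of constant size cover a number of concrete states that doubles with each additional door, which is the sense in which a polynomial-size policy can describe an exponentially large state space.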

  36. How MBP Generates Policies • MBP uses model-checking techniques • e.g., computing pre-images of sets of states • Roughly like a breadth-first backward search • MBP may need to explore exponentially many states that are unreachable from the initial state • Exponentially many states => exponential time • That’s what happens in the robot navigation and nondeterminized blocks world domains

  37. How ND-SHOP2 Generates Policies • ND-SHOP2 takes domain knowledge in the form of HTN methods
     Method m1
       Task: take-package(p, r, hall)
       Precond: in(r), holding(p), door-open(r)
       Subtasks: go(r, hall)
     Method m2
       Task: take-package(p, r, hall)
       Precond: in(r), holding(p), door-closed(r)
       Subtasks: open-door(r), go(r, hall)
     • Consider the task take-package(b, r4, hall) • ND-SHOP2 can very quickly develop the policy
     if in(r4) and holding(b) and door-closed(r4) then π(s) = open-door(r4)
     if in(r4) and holding(b) and door-open(r4) then π(s) = go(r4, hall)

  38. Conclusions • A technique for “nondeterminization” of forward-chaining classical planners • Theoretical analysis • Nondeterminization preserves soundness/completeness • Time complexity of the generalized planners is polynomially bounded by the time complexity of the original ones • Experimental verification of the results

  39. Future Work • Nondeterministic planning domains are just like MDPs except that there are no probabilities • We are quite confident that • We can generalize our approach to work in MDPs too • Our “MDP-ized” algorithms will be able to run exponentially faster than traditional MDP algorithms • Preliminary implementation and experiments • So far, very encouraging

  40. Related Work • M. Ghallab, D. Nau, and P. Traverso, Automated Planning: Theory and Practice (Morgan Kaufmann, May 2004) • First comprehensive textbook on automated planning • models, techniques, algorithms • case studies of applications • Web site: http://www.laas.fr/planning • Lecture slides available online
