
4/1: Search Methods and Heuristics



Presentation Transcript


  1. 4/1: Search Methods and Heuristics • Progression: Sapa (TLPlan; FF) • Regression: TP4 • Partial order: Zeno (IxTET)

  2. Reading List • (3/27) Papers on Metric Temporal Planning • Paper on the PDDL-2.1 standard (read up to, but not including, Section 6) • Paper on SAPA • Paper on Temporal TLPlan (see Section 3 for a slightly longer description of the progression search used in SAPA) • Paper on TP4 (regression search for temporal planning) • Paper on Zeno (plan-space search for temporal planning)

  3. State-Space Search: search is through time-stamped states • Search states should have information about: -- what conditions hold at the current time slice (P, M below) -- what actions we have already committed to put into the plan (Π, Q below) • S=(P,M,Π,Q,t), where: -- P: set <pi,ti> of predicates pi and the time of their last achievement ti < t -- M: set of functions representing resource values -- Π: set of protected persistent conditions (could be binary or resource conditions) -- Q: event queue (contains resource as well as binary fluent events) -- t: time stamp of S • In the initial state, P and M are non-empty; Q is non-empty if we have exogenous events

  4. Let the current state S be P:{have_light@0; at_steps@0}; Q:{~have_light@15}; t: 0 (presumably after doing the light-match action). Applying cross_cellar to this state gives S’ = P:{have_light@0; crossing@0}; Π:{have_light,<0,10>}; Q:{at_fuse-box@10; ~have_light@15}; t: 0. [Figure: timeline from the time stamp showing Light-match (15) and Cross-cellar (10).]
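The cross_cellar step above can be sketched in Python. This is only an illustration under stated assumptions: the `State` and `Event` classes and the `apply_cross_cellar` helper are hypothetical names, and the action model (at_steps deleted at the start, crossing added at the start, at_fuse_box added after 10 time units, have_light protected over the duration) is read off the slide rather than taken from Sapa's implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    time: float   # absolute time at which the event fires
    fact: str     # binary fluent the event changes
    adds: bool    # True adds the fact, False deletes it


@dataclass
class State:
    P: dict = field(default_factory=dict)    # fact -> time it was last achieved
    M: dict = field(default_factory=dict)    # resource name -> current value
    Pi: list = field(default_factory=list)   # protected conditions: (fact, start, end)
    Q: list = field(default_factory=list)    # pending events
    t: float = 0.0                           # time stamp of the state


def apply_cross_cellar(s: State, duration: float = 10.0) -> State:
    """Apply the durative action at the current time stamp (illustrative model only)."""
    if "at_steps" not in s.P or "have_light" not in s.P:
        raise ValueError("instantaneous preconditions not satisfied")
    P = {f: ts for f, ts in s.P.items() if f != "at_steps"}   # instantaneous delete
    P["crossing"] = s.t                                       # instantaneous add
    Pi = s.Pi + [("have_light", s.t, s.t + duration)]         # persistent precondition
    Q = sorted(s.Q + [Event(s.t + duration, "at_fuse_box", True)],
               key=lambda e: e.time)                          # delayed effect
    return State(P=P, M=dict(s.M), Pi=Pi, Q=Q, t=s.t)         # clock does not move


s = State(P={"have_light": 0.0, "at_steps": 0.0},
          Q=[Event(15.0, "have_light", False)], t=0.0)
s_next = apply_cross_cellar(s)
```

Note that applying the action leaves t unchanged; only a separate clock advance moves the time stamp.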

  5. “Advancing” the clock as a device for concurrency control • To support concurrency, we need to consider advancing the clock. • How far to advance the clock? One shortcut is to advance the clock to the time of the earliest event in the event queue, since this is the least advance needed to make changes to P and M of S. At this point, all the events happening at that time point are transferred from Q to P and M (to signify that they have happened). • This strategy will find “a” plan for every problem, but it will have the effect of enforcing concurrency by making the concurrent actions “align on the left end”. In the candle/cellar example, we will find plans where the cross-cellar action starts right when the light-match action starts. • If we need slack in the start times, we will have to post-process the plan. • If we want plans with arbitrary slacks on start times to appear in the search space, we will have to consider advancing the clock by arbitrary amounts (even if it changes nothing in the state other than the clock time itself). • In the cellar plan above, the clock, if advanced, will be advanced to 15, where an event (~have-light) will occur. This means cross-cellar can be done either at 0 or at 15 (and the latter makes no sense). [Figure: timeline with Light-match (15) ending in ~have-light, and Cross-cellar (10) drawn at two alternative start times.]
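A minimal sketch of the "advance to the earliest event" shortcut, assuming a simplified state in which P maps each fact to the time it was last achieved and Q is a list of (time, fact, adds) tuples. The function name `advance_clock` and this encoding are illustrative assumptions, and resource events (M) are left out.

```python
def advance_clock(P, Q):
    """Advance to the earliest event time in Q and fire every event at that time.

    P maps fact -> time of last achievement; Q is a list of (time, fact, adds).
    Returns the new (P, Q, t) and leaves the inputs unmodified.
    """
    if not Q:
        raise ValueError("event queue is empty; nothing to advance to")
    t_next = min(time for time, _, _ in Q)
    new_P = dict(P)
    remaining = []
    for time, fact, adds in Q:
        if time == t_next:
            if adds:
                new_P[fact] = time       # the add event has now happened
            else:
                new_P.pop(fact, None)    # the delete event has now happened
        else:
            remaining.append((time, fact, adds))
    return new_P, remaining, t_next


# State after applying cross_cellar in the candle/cellar example.
P = {"have_light": 0, "crossing": 0}
Q = [(10, "at_fuse_box", True), (15, "have_light", False)]
P1, Q1, t1 = advance_clock(P, Q)
P2, Q2, t2 = advance_clock(P1, Q1)
```

Because the clock only ever jumps to event times, any new action can start only at one of those times, which is what produces the left-aligned plans described above.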

  6. Search Algorithm (cont.) • Goal Satisfaction: S=(P,M,Π,Q,t) ⊨ G if for every <pi,ti> ∈ G either: -- ∃ <pi,tj> ∈ P with tj < ti, and no event in Q deletes pi; or -- ∃ e ∈ Q that adds pi at time te < ti. • Action Application: action A is applicable in S if: -- All instantaneous preconditions of A are satisfied by P and M. -- A’s effects do not interfere with Π and Q. -- No event in Q interferes with persistent preconditions of A. -- A does not lead to concurrent resource change. • When A is applied to S: -- P is updated according to A’s instantaneous effects. -- Persistent preconditions of A are put in Π. -- Delayed effects of A are put in Q. [TLPlan; Sapa; 2001]
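The goal-satisfaction test above can be written down directly. The sketch below assumes the same simplified encoding (P as fact -> last achievement time, Q as (time, fact, adds) tuples, goals as (fact, deadline) pairs); `satisfies_goals` is a hypothetical name, and resource goals are not modelled.

```python
def satisfies_goals(P, Q, goals):
    """Return True if every (fact, deadline) goal is met by P or by a pending add in Q."""
    for fact, deadline in goals:
        # Case 1: the fact already holds early enough and no queued event deletes it.
        held = (
            fact in P
            and P[fact] < deadline
            and not any(f == fact and not adds for _, f, adds in Q)
        )
        # Case 2: a queued event will add the fact before the deadline.
        pending = any(
            f == fact and adds and time < deadline for time, f, adds in Q
        )
        if not (held or pending):
            return False
    return True


P = {"have_light": 0, "crossing": 0}
Q = [(10, "at_fuse_box", True), (15, "have_light", False)]
```

With this state, a goal at_fuse_box with deadline 20 is satisfied through the queued event, while have_light with any deadline is not, because a delete event for it is still pending in Q.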

  7. Regression Search is similar… • In the case of regression over durative actions too, the main generalization we need is differentiating the “advancement of clock” and the “application of a relevant action” • Can use the same state representation S=(P,M,Π,Q,t) with the semantics that: -- P and M are binary and resource subgoals needed at the current time point -- Q are the subgoals needed at earlier time points -- Π are subgoals to be protected over specific intervals • We can either add an action to support something in P or Q, or push the clock backward before considering subgoals • If we push the clock backward, we push it to the time of the latest subgoal in Q • TP4 uses a slightly different representation (with State and Action information) [Figure: subgoals R, W, X, Y, with Q holding A3:W, A2:X, A1:Y. We can either work on R at tinf, or on R and Q at tinf-D(A3).] [TP4; 1999]

  8. Let the current state S be P:{at_fuse_box@0}; t: 0. Regressing cross_cellar over this state gives S’ = P:{}; Π:{have_light,<0,-10>}; Q:{have_light@-10; at_stairs@-10}; t: 0. [Figure: Cross_cellar with its Have_light condition.] Notice that, in contrast to progression, regression will align the end points of concurrent actions (e.g. when we put in Light-match to support have-light). (This example changed since the class.)

  9. Notice that, in contrast to progression, regression will align the end points of concurrent actions (e.g. when we put in Light-match to support have-light). S’ = P:{}; Π:{have_light,<0,-10>}; Q:{have_light@-10; at_stairs@-10}; t: 0. If we now decide to support the subgoal in Q using light-match: S’’ = P:{}; Π:{have_light,<0,-10>}; Q:{have-match@-15; at_stairs@-10}; t: 0. [Figure: Cross_cellar with its Have_light condition, before and after adding Light-match.]

  10. PO (Partial Order) Search • Involves LP solving over linear constraints (temporal constraints are linear too); waits for nonlinear constraints to become linear. • Involves posting temporal constraints, and durative goals. • Split the interval into multiple overlapping intervals. [Zeno; 1994]

  11. More on temporal planning by plan-space planners (Zeno) • The “accommodation” to complexity that Zeno makes by refusing to handle nonlinear constraints (waiting instead until they become linear) is sort of hilarious given it doesn’t care much about heuristic control otherwise • Basically Zeno is trying to keep the “per-node” cost of the search down (and if you do a nonlinear constraint consistency check, even that is quite hard) • Of course, we know now that there is no obvious reason to believe that reducing the per-node cost will, ipso facto, also lead to a reduction in overall search. • The idea of “goal reduction” by splitting a temporal subgoal into multiple sub-intervals is used only in Zeno, and helps it support a temporal goal over a long duration with multiple actions. Neat idea. • Zeno doesn’t have much of a problem handling arbitrary concurrency, since we are only posting constraints on temporal variables denoting the start points of the various actions. In particular, Zeno does not force either right or left alignment of actions. • In addition to Zeno, IxTeT is another influential metric temporal planner that uses the plan-space planning idea.

  12. [Figure: partial plan I → Cross_cellar → G, with Cross_cellar spanning <t1,t2>. Labels: At_fusebox; Have_light@t1; Have_light@<t1,t2>; at_fuse_box@G. Constraints: t2-t1 = 10; t1 < tG; tI < t1.]

  13. The ~have_light effect at t4 can violate the <have_light, t3, t1> causal link! Resolve by adding t4 < t3 ∨ t1 < t4. [Figure: partial plan with I, Burn_match (spanning <t3,t4>, ending in ~have-light), Cross_cellar (spanning <t1,t2>) and G. Labels: At_fusebox; Have_light@t1; Have_light@<t1,t2>; at_fuse_box@G. Constraints: t2-t1 = 10; t1 < tG; tI < t1; t4 < tG; t4-t3 = 15; t3 < t1; t4 < t3 ∨ t1 < t4.]

  14. Notice that Zeno allows arbitrary slack between the two actions. [Figure: partial plan with I, Burn_match (spanning <t3,t4>, ending in ~have-light), Cross_cellar (spanning <t1,t2>) and G. Labels: At_fusebox; Have_light@t1; Have_light@<t1,t2>; at_fuse_box@G. Constraints: t2-t1 = 10; t1 < tG; tI < t1; t4 < tG; t4-t3 = 15; t3 < t1; t4 < t3 ∨ t1 < t4; t3 < t2; t4 < t3 ∨ t2 < t4.] To work on have_light@<t1,t2>, we can either -- support the whole interval directly by adding a causal link <have-light, t3, <t1,t2>> -- or first split <t1,t2> into two subintervals <t1,t’> and <t’,t2> and work on supporting have-light on both intervals
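The constraint set on this slide can be checked mechanically. The sketch below is not Zeno's LP machinery; it swaps in a plain Bellman-Ford negative-cycle test over difference constraints, and it assumes integer time points so that a strict x < y can be encoded as x - y <= -1. The helper names (`lt`, `diff`, `consistent`, `feasible_resolutions`) are hypothetical.

```python
from itertools import product


def lt(x, y):
    """x < y over integer time points, encoded as x - y <= -1 (an assumption)."""
    return [(x, y, -1)]


def diff(x, y, d):
    """x - y == d, encoded as two inequalities."""
    return [(x, y, d), (y, x, -d)]


def consistent(variables, constraints):
    """Each constraint (x, y, c) means x - y <= c; True if some assignment satisfies all."""
    dist = {v: 0 for v in variables}          # virtual source reaches every node at cost 0
    for _ in range(len(variables)):
        changed = False
        for x, y, c in constraints:
            if dist[y] + c < dist[x]:
                dist[x] = dist[y] + c
                changed = True
        if not changed:
            return True                       # fixed point reached: no negative cycle
    return False                              # still relaxing after n passes: negative cycle


def feasible_resolutions(variables, base, disjunctions):
    """Try one disjunct from every disjunction and keep the consistent combinations."""
    feasible = []
    for choice in product(*disjunctions):
        chosen = [c for part in choice for c in part]
        if consistent(variables, base + chosen):
            feasible.append(chosen)
    return feasible


VARS = ["tI", "t1", "t2", "t3", "t4", "tG"]
BASE = (diff("t2", "t1", 10) + diff("t4", "t3", 15) + lt("tI", "t1")
        + lt("t1", "tG") + lt("t4", "tG") + lt("t3", "t1") + lt("t3", "t2"))
DISJUNCTIONS = [[lt("t4", "t3"), lt("t1", "t4")],
                [lt("t4", "t3"), lt("t2", "t4")]]
result = feasible_resolutions(VARS, BASE, DISJUNCTIONS)
```

Only the combination t1 < t4 and t2 < t4 survives, since t4 < t3 contradicts t4-t3 = 15. The surviving constraints still leave t3 free within a window before t1, which is the slack the slide refers to.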

  15. 4/3 • Discussion of the Sapa/TP4/Zeno search algorithms • Heuristics for temporal planning

  16. Q/A on Search Methods for Temporal Planning • Menkes: What is meant by the argument that resources are always easy to handle for progression planners? • The idea is that the partial plans in the search space of a progression planner are “position constrained”, so you know exactly when each action starts. Given that, it is a simple matter to check whether a particular resource constraint (however complicated and nonlinear) holds over a time point or interval. In contrast, partial order planners only have constraints on the start points. So, checking that a resource constraint is valid involves checking that it holds on every possible assignment of times to the temporal variables. The difference is akin to the difference between model checking and theorem proving [Halpern & Vardi; KR91] (you can check the consistency of more complicated formulas in more complicated logics if you only need to do model checking rather than inference/theorem proving).

  17. Q/A contd. • Dan: Can the “interval goal reduction” used in Zeno be made more goal directed? • Yes. For example, regressing a goal have_light@[1 15] over an action that gives have_light@[1 7] will make it have_light@[7 15]. • Making the reduction goal directed may actually be a smarter idea (especially for position-constrained planners; for Zeno, it doesn’t make much difference since it splits the interval into two variable-sized intervals).
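The goal-directed reduction in the example amounts to interval subtraction. A minimal sketch, assuming intervals are (start, end) pairs with fixed numeric end points; `remaining_goal_intervals` is a hypothetical helper, not something taken from Zeno.

```python
def remaining_goal_intervals(goal, provided):
    """Return the parts of the goal interval that the provided interval does not cover."""
    gs, ge = goal
    ps, pe = provided
    if pe <= gs or ps >= ge:      # no overlap: the action supports none of the goal
        return [goal]
    rest = []
    if ps > gs:                   # uncovered stretch before the provided interval
        rest.append((gs, ps))
    if pe < ge:                   # uncovered stretch after the provided interval
        rest.append((pe, ge))
    return rest


# have_light@[1 15] regressed over an action that gives have_light@[1 7]
left_over = remaining_goal_intervals((1, 15), (1, 7))
```

Here `left_over` is the single interval (7, 15), matching the slide's example.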

  18. Q/A contd. • Romeo: The TLPlan paper says that their strategy is to keep adding concurrent actions until no more actions can be added at the current point, and only then advance the clock. Is this used in SAPA too? • Rao: I am surprised to hear that TLPlan does that. If this is used as a “strategy” rather than as a “heuristic”, then it can lead to loss of completeness. In general, just because an action can be done doesn’t mean that it should be done. • For example, consider a problem where you want a goal G. Ultimately, all actions that give G wind up requiring, among other conditions, the condition P*. P* is present in the init state. There is an action A that deletes P*, and no action gives P*. A is applicable in the init state and doesn’t interfere with ANY of the other actions. Now, if we put A in the plan just because it can be done concurrently, then we know we are doomed. • I (Rao) made this mistake in my ECP-97 paper on Graphplan (see Footnote 2 in http://rakaposhi.eas.asu.edu/pub/rao/ewsp-graphplan.ps), and figured out my error later.

  19. Tradeoffs: Progression/Regression/PO planning for metric/temporal planning • Compared to PO, both progression and regression do a less than fully flexible job of handling concurrency (e.g. slacks may have to be handled through post-processing). • Progression planners have the advantage that the exact amount of a resource is known at any given state. So, complex resource constraints are easier to verify. PO (and to some extent regression) will have to verify this by posting and then verifying resource constraints. • Currently, SAPA (a progression planner) does better than TP4 (a regression planner). Both do oodles better than Zeno/IxTeT. However: -- TP4 could possibly be improved significantly by giving up the insistence on admissible heuristics -- Zeno (and IxTeT) could benefit by adapting ideas from RePOP.

  20. Heuristic Control • Temporal planners have to deal with more branching possibilities → more critical to have good heuristic guidance • Design of heuristics depends on the objective function → in temporal planning, heuristics focus on richer objective functions that guide both planning and scheduling • Classical Planning: -- Number of actions -- Parallel execution time -- Solving time • Temporal Resource Planning: -- Number of actions -- Makespan -- Resource consumption -- Slack -- …

  21. Objectives in Temporal Planning • Number of actions: Total number of actions in the plan. • Makespan: The shortest duration in which we can possibly execute all actions in the solution. • Resource Consumption: Total amount of resource consumed by actions in the solution. • Slack: The duration between the time a goal is achieved and its deadline. • Optimize max, min or average slack values • Combinations there-of
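For a position-constrained plan, these objectives are straightforward to compute. The sketch below assumes the plan starts at time 0 and is given as (name, start, duration, resource_used) tuples, with goals given as fact -> (achieved_at, deadline). The tuple layout, the `plan_objectives` name, the one-match resource cost, and the deadline of 20 are all illustrative assumptions.

```python
def plan_objectives(plan, goals):
    """Compute the objective values listed above for a fully scheduled plan."""
    slacks = [deadline - achieved for achieved, deadline in goals.values()]
    return {
        "num_actions": len(plan),
        # With the plan starting at time 0, makespan is the latest end time.
        "makespan": max(start + dur for _, start, dur, _ in plan),
        "resource": sum(used for _, _, _, used in plan),
        "min_slack": min(slacks),
        "max_slack": max(slacks),
        "avg_slack": sum(slacks) / len(slacks),
    }


# Candle/cellar plan with both actions starting at 0 (hypothetical costs and deadline).
plan = [("light_match", 0, 15, 1), ("cross_cellar", 0, 10, 0)]
goals = {"at_fuse_box": (10, 20)}
objectives = plan_objectives(plan, goals)
```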

  22. Deriving heuristics for SAPA • We use a phased relaxation approach to derive different heuristics. • Relax the negative logical and resource effects to build the Relaxed Temporal Planning Graph [AltAlt, AIJ2001]. • Pruning a bad state while preserving the completeness. • Deriving admissible heuristics: -- To minimize the solution’s makespan. -- To maximize slack-based objective functions. • Find a relaxed solution, which is used as a distance heuristic. • Adjust the heuristic values using the resource consumption information. • Adjust the heuristic values using the negative interaction (future work).

  23. Heuristics in Sapa are derived from the Graphplan-style bi-level relaxed temporal planning graph (RTPG). Sapa is a progression planner, so the RTPG is constructed anew for each state.

  24. Relaxed Temporal Planning Graph • Relaxed Action: -- No delete effects (may be okay given progression planning) -- No resource consumption (will adjust later) • [Figure: locations A and B with a Person and an Airplane; timeline from t=0 (Init) to tg (Goal, Deadline) with actions Load(P,A), Unload(P,A), Fly(A,B), Fly(B,A), Unload(P,B).] • Construction loop (deadline goals are checked after each clock advance):
      while (true)
        forall A ≠ advance-time applicable in S
          S = Apply(A,S)   {involves changing P, Π, Q, t; update Q only with positive effects, and only when there is no other earlier event giving that effect}
        if S ⊨ G then Terminate{solution}
        S’ = Apply(advance-time,S)
        if ∃ (pi,ti) ∈ G such that ti < Time(S’) and pi ∉ S then Terminate{non-solution}
        else S = S’
      end while

  25. Details on RTPG Construction • All our heuristics are based on the relaxed temporal planning graph (RTPG) structure. This is a Graphplan-style [2] bi-level planning graph generalized to temporal domains. Given a state S = (P, M, Π, Q, t), the RTPG is built from S using the set of relaxed actions, which are generated from the original actions by eliminating all effects that (1) delete some fact (predicate) or (2) reduce the level of some resource. Since delete effects are ignored, the RTPG will not contain any mutex relations, which considerably reduces the cost of constructing it. The algorithm to build the RTPG structure is summarized in Figure 4. • To build the RTPG, we need three main data structures: a fact level, an action level, and an unexecuted event queue. Each fact f or action A is marked in, and appears in the RTPG’s fact/action level at time instant tf/tA, if it can be achieved/executed at tf/tA. In the beginning, only facts that appear in P are marked in at t, the action level is empty, and the event queue holds all the unexecuted events in Q that add new predicates. • Action A will be marked in if (1) A is not already marked in and (2) all of A’s preconditions are marked in. When action A is in, all of A’s unmarked instant add effects will also be marked in at t. Any delayed effect e of A that adds fact f is put into the event queue Q if (1) f is not marked in and (2) there is no event e’ in Q that is scheduled to happen before e and that also adds f. Moreover, when an event e is added to Q, we take out of Q any event e’ that is scheduled to occur after e and also adds f. • When there are no more unmarked applicable actions in S, we stop and return no-solution if either (1) Q is empty or (2) there exists some unmarked goal with a deadline that is smaller than the time of the earliest event in Q. • If neither situation occurs, we apply the advance-time action to S and activate all events at the time point te’ of the earliest event e’ in Q. The process is repeated until all the goals are marked in or one of the conditions indicating non-solution occurs. [From Do & Kambhampati; ECP 01]
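The construction described above can be sketched compactly in Python. This is a simplified illustration, not Sapa's code. Relaxed actions are given as (preconditions, instant adds, delayed adds) with deletes and resource effects already stripped. Instead of removing later duplicate events from the queue, the sketch keeps them and ignores any event whose fact is already marked, which yields the same earliest times. The `build_rtpg` name, the logistics facts, and the durations are hypothetical.

```python
import heapq


def build_rtpg(P, t, Q, actions, goals):
    """Return {fact: earliest time marked in}, or None if a non-solution condition holds.

    P: facts true in the state; t: time stamp; Q: list of (time, fact) add events;
    actions: {name: (preconditions, instant_adds, [(fact, delay), ...])};
    goals: {fact: deadline}.
    """
    fact_time = {f: t for f in P}
    queue = list(Q)
    heapq.heapify(queue)
    marked = set()
    now = t
    while True:
        progress = True
        while progress:                       # mark in every newly applicable action
            progress = False
            for name, (pre, adds, delayed) in actions.items():
                if name in marked or not all(p in fact_time for p in pre):
                    continue
                marked.add(name)
                progress = True
                for f in adds:
                    fact_time.setdefault(f, now)
                for f, delay in delayed:
                    if f not in fact_time:
                        heapq.heappush(queue, (now + delay, f))
        if all(g in fact_time for g in goals):
            return fact_time
        while queue and queue[0][1] in fact_time:
            heapq.heappop(queue)              # skip events for facts already marked in
        if not queue:
            return None                       # nothing left that could add a goal
        if any(g not in fact_time and d < queue[0][0] for g, d in goals.items()):
            return None                       # an unmarked goal has missed its deadline
        now = queue[0][0]                     # advance-time: jump to the earliest event
        while queue and queue[0][0] == now:
            _, f = heapq.heappop(queue)
            fact_time.setdefault(f, now)


# Hypothetical logistics instance: load/unload take 1, flying takes 3.
ACTIONS = {
    "load_A": ({"person_at_A", "plane_at_A"}, [], [("person_in_plane", 1)]),
    "fly_A_B": ({"plane_at_A"}, [], [("plane_at_B", 3)]),
    "fly_B_A": ({"plane_at_B"}, [], [("plane_at_A", 3)]),
    "unload_B": ({"person_in_plane", "plane_at_B"}, [], [("person_at_B", 1)]),
}
INIT = {"person_at_A", "plane_at_A"}
graph = build_rtpg(INIT, 0, [], ACTIONS, {"person_at_B": 10})
too_tight = build_rtpg(INIT, 0, [], ACTIONS, {"person_at_B": 3})
```

In this instance the goal is first marked in at time 4, so a deadline of 3 is reported as a non-solution without any search.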

  26. Heuristics directly from RTPG (admissible) • For Makespan: distance from a state S to the goals is equal to the duration between time(S) and the time the last goal appears in the RTPG. • For Min/Max/Sum Slack: distance from a state to the goals is equal to the minimum, maximum, or summation of slack estimates for all individual goals using the RTPG. -- The slack estimate is the difference between the deadline of the goal and the expected time of achievement of that goal. • Proof: all goals appear in the RTPG at times smaller than or equal to their achievable times.
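Given the time at which each goal first appears in the RTPG, both estimates are one-liners. The sketch below assumes the RTPG result is available as a fact -> earliest time mapping; the function names, goal names, and numbers are made up for illustration.

```python
def makespan_heuristic(fact_time, goals, t):
    """Duration between time(S) = t and the time the last goal appears in the RTPG."""
    return max(fact_time[g] for g in goals) - t


def slack_estimates(fact_time, goals):
    """Per-goal slack: deadline minus the earliest time the goal appears in the RTPG."""
    return {g: deadline - fact_time[g] for g, deadline in goals.items()}


# Hypothetical RTPG appearance times and deadlines.
fact_time = {"person_at_B": 4, "package_at_B": 6}
goals = {"person_at_B": 10, "package_at_B": 7}
h_makespan = makespan_heuristic(fact_time, goals, t=0)
slacks = slack_estimates(fact_time, goals)
h_min, h_max, h_sum = min(slacks.values()), max(slacks.values()), sum(slacks.values())
```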

  27. Heuristics from Relaxed Plan Extracted from RTPG • The RTPG can be used to find a relaxed solution, which is then used to estimate the distance from a given state to the goals. • Sum actions: distance from a state S to the goals equals the number of actions in the relaxed plan. • Sum durations: distance from a state S to the goals equals the summation of action durations in the relaxed plan. [Figure: locations A and B with a Person and an Airplane; timeline from t=0 (Init) to tg (Goal, Deadline) with actions Load(P,A), Unload(P,A), Fly(A,B), Fly(B,A), Unload(P,B).]

  28. Resource-based Adjustments to Heuristics • Resource-related information, ignored originally, can be used to improve the heuristic values. • Adjusted Sum-Action: h = h + Σ_R ⌈(Con(R) - (Init(R) + Pro(R))) / Δ_R⌉ • Adjusted Sum-Duration: h = h + Σ_R [(Con(R) - (Init(R) + Pro(R))) / Δ_R] · Dur(A_R) → Will not preserve admissibility
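A small numeric sketch of the two adjustments. The slide does not define its symbols, so the reading used here is an assumption: Con(R) and Pro(R) are the total consumption and production of R in the relaxed plan, Init(R) is the level in the current state, Δ_R is the largest amount of R a single action A_R can produce, and Dur(A_R) is that action's duration. Applying the adjustment only when there is a shortfall is also an assumption, as are the function names and the fuel numbers.

```python
import math


def adjusted_sum_action(h, resources):
    """resources: {R: (con, init, pro, delta)}; adds the extra producer actions needed."""
    for con, init, pro, delta in resources.values():
        shortfall = con - (init + pro)
        if shortfall > 0:                     # assumption: adjust only when R runs short
            h += math.ceil(shortfall / delta)
    return h


def adjusted_sum_duration(h, resources):
    """resources: {R: (con, init, pro, delta, dur)}; adds the duration of the extra producers."""
    for con, init, pro, delta, dur in resources.values():
        shortfall = con - (init + pro)
        if shortfall > 0:                     # assumption: adjust only when R runs short
            h += (shortfall / delta) * dur
    return h


# Hypothetical: the relaxed plan burns 120 fuel, 50 is available, one refuel adds 40 and takes 2.
h_action = adjusted_sum_action(5, {"fuel": (120, 50, 0, 40)})
h_duration = adjusted_sum_duration(10.0, {"fuel": (120, 50, 0, 40, 2)})
```

With a shortfall of 70, the sum-action estimate grows by two refuel actions and the sum-duration estimate by 3.5 time units.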

  29. Aims of Empirical Study • Evaluate the effectiveness of the different heuristics. • Ablation studies: • Test if the resource adjustment technique helps different heuristics. • Compare with other temporal planning systems.

  30. Empirical Results
              | Adjusted Sum-Action                 | Sum-Duration
      Prob    | time     #act   nodes      dur      | time    #act   nodes       dur
      Zeno1   | 0.317    5      14/48      320      | 0.35    5      20/67       320
      Zeno2   | 54.37    23     188/1303   950      | -       -      -           -
      Zeno3   | 29.73    13     250/1221   430      | 6.20    13     60/289      450
      Zeno9   | 13.01    13     151/793    590      | 98.66   13     4331/5971   460
      Log1    | 1.51     16     27/157     10.0     | 1.81    16     33/192      10.0
      Log2    | 82.01    22     199/1592   18.87    | 38.43   22     61/505      18.87
      Log3    | 10.25    12     30/215     11.75    | -       -      -           -
      Log9    | 116.09   32     91/830     26.25    | -       -      -           -
      • Sum-action finds solutions faster than sum-dur • Admissible heuristics do not scale up to bigger problems • Sum-dur finds shorter duration solutions in most of the cases • Resource-based adjustment helps sum-action, but not sum-dur • Very few irrelevant actions. Better quality than TemporalTLPlan. • So, (transitively) better than LPSAT

  31. Empirical Results (cont.) • Logistics domain with driving restricted to intra-city (traditional logistics domain) • Sapa is the only planner that can solve all 80 problems

  32. Empirical Results (cont.) • Logistics domain with inter-city driving actions • The “sum-action” heuristic used as the default in Sapa can be misled by the long-duration actions... → Future work on fixed point time/level propagation

  33. Next Class: Multi-objective search • Multi-dimensional nature of plan quality in metric temporal planning: -- Temporal quality (e.g. makespan, slack) -- Plan cost (e.g. cumulative action cost, resource consumption) • Necessitates multi-objective optimization: -- Modeling objective functions -- Tracking different quality metrics and heuristic estimation → Challenge: there may be inter-dependent relations between different quality metrics
