Evaluating Temporal Planning Domains: Insights and Challenges in IPC Benchmarking

William Cushing, Subbarao Kambhampati, Kartik Talamadupula, Daniel Weld, Mausam

Competition winners are incomplete • How incomplete? • What should the IPC measure? • Epoch • A time at which an event happens • Decision Epoch Planning • Only start actions after epochs
[Figure: the fix-fuse and light-match actions with conditions and effects on L, F, M; labels: Temporal Planning, Required Concurrency]
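The restriction to decision epochs can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the durations, the instant at which light is needed, and the names `Scheduled`, `epochs` and `covers` are hypothetical, and the model is far simpler than a real decision epoch planner, which interleaves action selection with scheduling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheduled:
    """An action already placed on the timeline (hypothetical model)."""
    name: str
    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


def epochs(plan):
    """Times at which some action in the plan starts or ends."""
    return sorted({t for a in plan for t in (a.start, a.end)})


def covers(start, duration, instant):
    """True if an action started at `start` is strictly under way at `instant`."""
    return start < instant < start + duration


# Assumed numbers: fix-fuse takes 5 time units, a match burns for 2,
# and light is needed at the critical instant t = 4.5.
plan = [Scheduled("fix-fuse", 0.0, 5.0)]
MATCH_DURATION = 2.0
NEEDS_LIGHT_AT = 4.5

# A decision epoch planner only tries start times that are existing epochs.
feasible_at_epoch = [
    t for t in epochs(plan) if covers(t, MATCH_DURATION, NEEDS_LIGHT_AT)
]

# Starting the match at a non-epoch (t = 3) does work.
feasible_off_epoch = covers(3.0, MATCH_DURATION, NEEDS_LIGHT_AT)
```

Under these assumed numbers no epoch is a workable start time for light-match, while the non-epoch start at 3 is, which is the kind of solution a decision epoch planner cannot reach.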

How deep is the problem? • Required Concurrency • Languages • Incomplete for temporally expressive languages • Complete for temporally simple languages

(Minimal) Temporally Expressive Languages • Temporal Gap • Before-condition and effect • After-condition and effect • Two effects • Temporally Simple: No Temporal Gap

Essence of Temporal Planning • No Temporal Gap: Classical + Scheduling • Forbidding temporal gap implies • All effects at one time • Before-conditions meet effects • After-conditions meet effects • Unique transition per action
[Figure: actions A, B, C, D on a timeline; an action A of duration d with precondition pre and effect eff]
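One way to read the bullets above is that an action has no temporal gap when a single time point carries every effect and touches every condition. The Python sketch below checks that reading; it is an assumed formalization for illustration, not the authors' definition, and the function name and the interval encoding are hypothetical.

```python
def has_temporal_gap(conditions, effects):
    """Return True if no single time point touches every condition and effect.

    conditions: list of (start, end) closed intervals, relative to action start.
    effects:    list of time points, relative to action start.
    Assumed reading: effects are treated as zero-length intervals, and the
    action is gap-free when all intervals share at least one common point.
    """
    intervals = list(conditions) + [(t, t) for t in effects]
    if not intervals:
        return False  # nothing to separate, so no gap
    latest_start = max(start for start, _ in intervals)
    earliest_end = min(end for _, end in intervals)
    return latest_start > earliest_end


# Compiled navigate: a start condition, a start effect and an end effect
# (two effects at different times).
compiled_navigate = has_temporal_gap(conditions=[(0, 0)], effects=[0, 5])

# A before-condition holding over the whole action that meets one end effect.
single_transition = has_temporal_gap(conditions=[(0, 5)], effects=[5])
```

Here `compiled_navigate` is `True` (two effects at different times) and `single_transition` is `False` (the condition meets the single effect).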

Required Concurrency • Inherently sequential is easy • Timestamps (with support for arithmetic) • Loose integration with a PERT scheduler • TGP, LPG-td, SGPlan, MIPS, … • Required concurrency is hard • The plan space is larger • The scheduling sub-problem is harder • Sub-problem optimality principle • State of the art is VHPOP, LPGP, CRIKEY • TEMPO, reduction to CSP
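The "loose integration with a PERT scheduler" mentioned above can be sketched as follows: a classical planner picks and orders the actions, and a separate pass assigns each action its earliest start time from the dependencies. This is a minimal illustration, not the code of any planner named on the slide; the action names, durations and dependencies are hypothetical.

```python
from graphlib import TopologicalSorter


def pert_schedule(durations, depends_on):
    """Earliest start times: an action starts once all its predecessors end.

    durations:  {action: duration}
    depends_on: {action: iterable of actions that must finish first}
    """
    graph = {a: tuple(depends_on.get(a, ())) for a in durations}
    start = {}
    for action in TopologicalSorter(graph).static_order():
        start[action] = max(
            (start[p] + durations[p] for p in graph[action]), default=0
        )
    return start


def makespan(durations, start):
    """Total duration of the scheduled plan."""
    return max(start[a] + durations[a] for a in durations)


# Hypothetical rover-style plan produced by a classical planner.
durations = {
    "navigate": 5, "calibrate": 5, "sample": 10,
    "take_image": 7, "communicate": 10,
}
depends_on = {
    "sample": ["navigate"],
    "take_image": ["navigate", "calibrate"],
    "communicate": ["sample", "take_image"],
}

starts = pert_schedule(durations, depends_on)
total = makespan(durations, starts)
```

The planner never reasons about time in this arrangement; concurrency (here, navigate alongside calibrate) falls out of the scheduling pass, which is why the inherently sequential case is the easy one.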

The International Planning Competition • Benchmarks must not require (much) concurrency • How much? • None at all • How do we show it? • Use temporal gap? • Problem: “every” action has temporal gap

Solution: Decompile temporal gap
• Compiled encoding:
  (navigate ?rover ?alpha ?omega)
  Pre: (at start (at ?rover ?alpha))
  Eff: (and (at start (not (at ?rover ?alpha)))
            (at end (at ?rover ?omega)))
• Decompiled encoding:
  (navigate ?rover ?alpha ?omega)
  (over all (=> (at ?rover) ?alpha ?omega))

Causal Structure and Concurrency
[Figure: two arrangements of light-match and fix-fuse with events A, B, C, D; panels: Inherently Sequential, Inherently Concurrent]

Navigate’s sequential structure
[Figure: navigate, navigate, ??, ??, communicate]

Technique: Start-time Sequentialization • Do not want to enumerate plans! • Nor every sequentialization! • Start-time sequentialization • Fixed attempt • Suffices for benchmarks (not necessary) • End-time sequentialization • Critical-time sequentialization • Start times of containing actions in same order as all dependencies
[Figure: light-match and fix-fuse with events A and B]
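The last bullet, that the start times of the containing actions must agree with every dependency, can be checked mechanically. The sketch below is a simplified illustration: dependencies are given directly as pairs of actions rather than pairs of events, and the start times and action names are hypothetical.

```python
def start_time_sequentialization(starts):
    """Order actions by start time (ties broken by name for determinism)."""
    return sorted(starts, key=lambda action: (starts[action], action))


def respects_dependencies(starts, dependencies):
    """Check that every dependency agrees with the start-time order.

    dependencies: pairs (supporter, dependent), meaning some event of
    `supporter` must precede some event of `dependent`.
    """
    return all(
        supporter == dependent or starts[supporter] < starts[dependent]
        for supporter, dependent in dependencies
    )


# Hypothetical rover plan: sampling depends on having navigated first.
rover_starts = {"navigate": 0, "sample": 5}
rover_ok = respects_dependencies(rover_starts, [("navigate", "sample")])

# Hypothetical fuse plan: the match is lit after fix-fuse has started,
# yet an event of fix-fuse depends on the light.
fuse_starts = {"fix-fuse": 0, "light-match": 3}
fuse_ok = respects_dependencies(fuse_starts, [("light-match", "fix-fuse")])
```

For the rover plan the start-time order is a valid sequentialization; for the fuse plan it is not, because the dependency runs against the order of the start times.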

Element Safety • Y < X • S(A(Y)) > S(A(X)) • Threat-free • X supports Z, • Y threatens Z • Interaction-free • Z supports Y • X threatens Y • Link-free • Y supports X
[Figure: relative placements of actions A and B]

Benchmarks never require concurrency

(:durative-action navigate
  :parameters (?x - rover ?y - waypoint ?z - waypoint)
  :duration (= ?duration 5)
  :condition (and
    ;;(at start (at ?x ?y))                 ;; MV Fluent
    ;;(at start (>= (energy ?x) 8))         ;; Resource Consumption
    (over all (can_traverse ?x ?y ?z))
    (at start (available ?x))
    (over all (visible ?y ?z))
  )
  :effect (and
    ;;(at start (decrease (energy ?x) 8))   ;; Resource Consumption
    (over all (consume (energy ?x) 8))      ;; Resource Consumption
    ;;(at start (not (at ?x ?y)))           ;; MV Fluent
    ;;(at end (at ?x ?z))))                 ;; MV Fluent
    (over all (-> (at ?x) ?y ?z))           ;; MV Fluent
  ))

Declaration of at, before and after:
  ;;(at ?x - rover ?y - waypoint)
  (at ?x - rover) - waypoint

• Durative change on m.v. fluents is safe • Unbounded resources are safe • “The Perils of Discrete Resource Models” • ICAPS workshop on IPC • A few special cases • (at end (calibrated ?c ?r)) • Document… • http://rakaposhi.eas.asu.edu/is-benchmarks.html • Forthcoming

Only RC due to Modeling Bugs • 1: drop • 1.1: drop • 2.05: sample • … • (and (full ?s) (empty ?s)) • 1: recharge • 1.1: recharge • 1.2: recharge • … • (>= (energy ?x) (* k (capacity ?x)))

Syntactic Sugar for avoiding Errors • Action drop (store) • full(store) == true at start • full(store) := false at end • Should be at start • empty(store) := true at end • Explicit resources • amount(store) :consume 1 • space(store) :produce 1 • Explicit durative change + m.v. fluents • amount(store) == full => empty

Temporal Machine Shop • Benchmarks lack required concurrency • Real world lacks required concurrency?

(:durative-action fire-kiln
  :parameters (?k - kiln)
  :duration (= ?duration 20)
  :effect (and
    (over all (lend (firing ?k)))
    (over all (-> (ready ?k) true false))))

(:durative-action bake-ceramic
  :parameters (?p - piece ?k - kiln)
  :duration (= ?duration (bake-time ?p))
  :condition (and
    (over all (firing ?k))
    (over all (shaped ?p)))
  :effect (over all (-> (baked ?p) false true)))

Real world required concurrency • (and (lifted bowl-left) (lifted bowl-right)) • Spray-oil (during milling) • Heat-beaker (while adding chemicals) • Ventilate-room (while drying glue) • …

Lessons for the Competition • Competitors tune for the benchmarks • Most of the competitors simplify to TGP • Either required concurrency is important • Benchmarks should test it • Or it isn’t • Language should be inherently sequential • PDDL spec. highlights light-match • RC occurs in the real world • Might need processes, continuous effects

Conclusion • Required Concurrency separates easy and hard temporal planning • The easy case allows offloading to a scheduler • Still an intriguing problem • Simplify the language, push the classical track • The hard case forces temporal reasoning by the planner • Real world required concurrency is frequent • PDDL 2.1.3 was designed for required concurrency • But the benchmarks fell through • Analysis of domains is hard • Automatable? Embeddable within a search? • Domain modeling is very hard • Durative change • Resources
References • When is Temporal Planning Really Temporal? IJCAI 2007 • Evaluating Temporal Planning Domains. ICAPS 2007 • The Perils of Discrete Resource Models. ICAPS 2007, IPC workshop

Research in Concurrent Planning
William Cushing, advised by Subbarao Kambhampati

[Figure: the fix-fuse and light-match example, with conditions and effects on L, F, M and events A, B, C, D]

Papers
• When is Temporal Planning Really Temporal? With Mausam and Daniel Weld
• Evaluating Temporal Planning Domains. With Kartik Talamadupula, Mausam, and Daniel Weld
• The Perils of Discrete Resource Models. With David E. Smith

Existing state-space temporal planners are incomplete, yet win the competitions.

An epoch is a time at which an event happens: when a condition or effect is asserted. One attempt at solving the problem of when to dispatch actions is to make decisions only at pre-existing epochs. Such decision epoch planners overlook the possibility that an action may need to start at a non-epoch. The figure depicts the only solution to fixing a blown fuse: light the match around the end of the fix-fuse process (to avoid electrocution during the critical step of placing the new fuse in the socket). SGPlan, MIPS, SAPA, etc., incorrectly report no solution (or search forever) when given an encoding of this problem.

Why are they incomplete?
Trivially, they are incomplete because there is at least one problem that they cannot solve. How many more problems? What is common to the class of problems that cannot be solved? Carrying out this analysis at the level of temporal action languages leads to a crisp characterization: decision epoch planners are complete for inherently sequential languages and incomplete for languages permitting required concurrency. A language is inherently sequential when every solution of every problem can be trivially sequentialized; a key result is that languages which forbid required concurrency are inherently sequential. That result does not hold for planning domains or problems; a planning domain might have both sequential and non-sequentializable solutions.

Can they be fixed?
No, because the dispatch time of an action can depend upon the future (arbitrarily far).

Is complete state-space temporal planning possible?
Yes, it is possible: one must delay all scheduling decisions until after the plan is finalized. That is, represent the dispatch times of actions using temporal variables, and solve for the values only after all actions have been selected and ordered. This is a state-space search in the sense that search nodes contain the requisite information for the application of state-based reachability heuristics.

Does required concurrency occur in the real world?
Many compelling real world examples of required concurrency involve joint effects of the set of actions being applied (e.g. lifted(table)). While such problems are not easily expressed in the framework of discrete changes, all of the challenges in solving the discrete case remain in more general settings. There are also quite a number of real-world, but toy, examples (light-match).

Is required concurrency really harder?
Under reasonable assumptions, both kinds of problems are in PSPACE. Nonetheless, any existing (complete) temporal planner can have the depth of its search space halved if the problem is known to be inherently sequential, an exponential improvement in performance.

Aren’t inherently sequential problems interesting?
The duration of a plan is very often an important factor in the quality of the plan. In turn, plan quality (if not optimality) is often a critical factor in potential real world applications. So, just as supporting non-uniform costs is an important goal, supporting non-uniform durations in a GraphPlan style of concurrency (cf. Temporal GraphPlan) is also an important goal: for the classical track.

What is wrong with the competitions? What is tested? What should be tested?
Our analysis shows that domains lacking required concurrency are, by and large, inherently sequential. Since inherently sequential domains can be efficiently solved using straightforward integration of classical planning and PERT scheduling (see SGPlan and MIPS), it makes more sense to place such problems into the quality sub-track of the classical track rather than in the temporal track. That is, only those domain features which present great difficulty to planner or planner engineer should be separated into separate tracks. In particular, the temporal track should contain problems requiring concurrency. Not only do the benchmark problems never require concurrency, but the domains themselves are inherently sequential. More precisely, the intended domains are inherently sequential; however, there are a few subtle modeling errors that allow the expression of problems requiring concurrency. Proving this property of the (intended) benchmarks is, however, complicated.

Sequentializable Plans
A plan is inherently sequential when its causal dependencies permit a sequential scheduling of its actions. The critical thing to show is that if one event is causally dependent upon a concurrent event, then the order of the containing actions’ start times in a nominal schedule is the same as that required by the causal dependence.

Durative change
The benchmarks resort to compilation techniques to capture the concept of a durative (and mutually exclusive) change. This introduces temporal gap, and so needlessly complicates our analysis. We prove that the compilation is correct, in that concurrent access to the fluent is prevented, and employ a syntax to directly capture the non-compiled concept. The canonical example is a navigate action: we represent the statement “navigate(rover, alpha, omega) requires that the rover begin at alpha, and causes it to move to omega” using one effect.

Resources
Allowing concurrent production and consumption of a resource easily leads to required concurrency, for example, to avoid overflowing. However, due to the nature of PDDL, changes must be conservatively discretized. This is quite challenging, and the benchmarks only attempt the simpler scenario of discretizing inherently sequential resource usage scenarios. The key thing to show is that the discretized production effects do not threaten any constraints, which we show by rewriting the models so they lack capacity constraints.
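The poster's answer to whether complete state-space temporal planning is possible, namely representing dispatch times as temporal variables and solving for them only once the actions are selected and ordered, can be sketched with a simple temporal constraint solver. This is a minimal illustration under stated assumptions, not the authors' planner: the function name, the constraint encoding and all numbers are hypothetical.

```python
def solve_dispatch_times(variables, constraints):
    """Earliest non-negative dispatch times, or None if inconsistent.

    constraints: triples (a, b, lower) meaning time[b] - time[a] >= lower.
    Upper bounds are written as lower bounds in the opposite direction,
    e.g. time[b] - time[a] <= 4 becomes (b, a, -4).
    Uses Bellman-Ford style relaxation on longest paths.
    """
    times = {v: 0 for v in variables}
    for _ in range(len(variables)):
        changed = False
        for a, b, lower in constraints:
            if times[a] + lower > times[b]:
                times[b] = times[a] + lower
                changed = True
        if not changed:
            return times
    return None  # still changing: the constraints contain a positive cycle


variables = ["fix-fuse", "light-match"]

# Assumed durations: fix-fuse 5, match 2. The match must be lit no later
# than 4 units into fix-fuse and must still burn when fix-fuse ends.
feasible = solve_dispatch_times(variables, [
    ("light-match", "fix-fuse", -4),  # light-match <= fix-fuse + 4
    ("fix-fuse", "light-match", 3),   # light-match + 2 >= fix-fuse + 5
])

# Asking a 2-unit match to be lit within 1 unit of the start of fix-fuse
# and still burn at its end is inconsistent.
infeasible = solve_dispatch_times(variables, [
    ("light-match", "fix-fuse", -1),
    ("fix-fuse", "light-match", 3),
])
```

Because the times are solved for after the actions are chosen, the start of light-match is free to land on a non-epoch (here, 3 units into fix-fuse), which a decision epoch planner would not consider.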
