Expressive and Efficient Frameworks for Partial Satisfaction Planning

Subbarao Kambhampati, Arizona State University (Proposal submitted for consideration to Behzad Kamgar-Parsi/ONR)

Partial Satisfaction/Over-Subscription Planning • Traditional planning problems • Find the (lowest cost) plan that satisfies all the given goals • PSP Planning • Find the highest utility plan given the resource constraints • Goals have utilities and actions have costs • …arises naturally in many real-world planning scenarios • Mars rovers attempting to maximize scientific return, given resource constraints • UAVs attempting to maximize reconnaissance returns, given fuel and other constraints • Logistics problems with resource constraints • … due to a variety of reasons • Constraints on the agent’s resources • Conflicting goals • With complex inter-dependencies between goal utilities • Soft constraints • Limited time

Supporting PSP planning • PSP planning changes planning from a “satisficing” to an “optimizing” problem • It is trivial to find a plan; hard to find a good one! • Rich connections to OR(IP)/MDP • Requires selecting “objectives” in addition to “actions” • Which subset of goals to achieve • To what degree to satisfy individual goals • E.g. collect as much soil sample as possible; get done as close to 2pm as possible • Currently, objective selection is left to humans • Leads to highly suboptimal plans, since objective selection cannot be done independently of planning • We propose to develop scalable methods for synthesizing plans in such over-subscribed scenarios

Proposal Overview • Preliminary work • Simple formal model: PSP-Net Benefit • MDP-based, IP-based, and heuristic-planning based approaches • Proposed directions • Improving expressiveness of PSP planners • Handling goals needing degree of satisfaction (e.g. numeric goals) • Handling goals with soft deadlines (where utility of the delayed goals is reduced) • Handling complex interactions between objectives • Interactions between the plans of the goals • Interactions between the utilities of the goals • Improving search in PSP planners • More powerful heuristics for PSP planning (which take interactions into account) • More flexible search frameworks: non-combinable costs and utilities • Multi-objective search • Applications • Replanning as a PSP planning problem

Formulation • [Figure: planning problem variants: Plan Existence, Plan Length, PSP Goal, PSP Goal Length, Plan Cost, PSP Utility, PSP Net Benefit, PSP Utility Cost] • PSP Net Benefit: Given a planning problem $P = (F, A, I, G)$, a “cost” $c_a \ge 0$ for each action $a$, a “utility” $u_f \ge 0$ for each goal fluent $f \in G$, and a positive number $k$: is there a finite sequence of actions $\Delta = (a_1, a_2, \ldots, a_n)$ that, starting from $I$, leads to a state $S$ with net benefit $\sum_{f \in (S \cap G)} u_f - \sum_{a \in \Delta} c_a \ge k$? • Maximize the net benefit: actions have execution costs, goals have utilities, and the objective is to find the plan that has the highest net benefit. • Easy enough to extend to a mixture of soft and hard goals
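As a rough illustration of the net benefit definition, the sketch below executes a plan from an initial state and subtracts the summed action costs from the utility of the goals that hold in the final state. The `Action` type, the fluent names, and the numeric costs and utilities are assumptions made up for this example; they do not come from the proposal.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple


class Action(NamedTuple):
    """STRIPS-style action with a cost (illustrative type, not from the slides)."""
    name: str
    pre: FrozenSet[str]     # fluents that must hold before execution
    add: FrozenSet[str]     # fluents made true
    delete: FrozenSet[str]  # fluents made false
    cost: float             # c_a >= 0


def execute(initial: Iterable[str], plan: List[Action]) -> FrozenSet[str]:
    """Apply the plan step by step and return the final state S."""
    state = set(initial)
    for a in plan:
        if not a.pre <= state:
            raise ValueError(f"{a.name} is not applicable")
        state = (state - a.delete) | a.add
    return frozenset(state)


def net_benefit(initial: Iterable[str], plan: List[Action],
                utility: Dict[str, float]) -> float:
    """Sum of u_f over goals f holding in S, minus the sum of c_a over the plan."""
    final = execute(initial, plan)
    achieved = sum(u for f, u in utility.items() if f in final)
    spent = sum(a.cost for a in plan)
    return achieved - spent


# Hypothetical rover example: two goals, only one of which is pursued.
navigate = Action("navigate_X_Y", frozenset({"at_X"}), frozenset({"at_Y"}),
                  frozenset({"at_X"}), 3)
sample_soil = Action("sample_soil_Y", frozenset({"at_Y"}),
                     frozenset({"have_soil"}), frozenset(), 2)
utility = {"have_soil": 10, "have_image": 4}

value = net_benefit({"at_X"}, [navigate, sample_soil], utility)
empty_value = net_benefit({"at_X"}, [], utility)
```

The empty plan always has net benefit zero when no goal holds initially, which is one way to see why finding some plan is trivial and finding a good one is the hard part.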

A spectrum of approaches for PSP-Net Benefit [AAAI 2004; KBCS 2004] • EXACT METHODS • Deterministic MDPs: Model the problem as a deterministic MDP with action costs, where a state has a reward equal to the utility of the goals that hold in it. A special action “Done” takes the agent from any state S to a sink state Sd. Guaranteed optimal, but very slow (using SPUDD, a state-of-the-art MDP solver) • Optiplan: Integer-programming-based STRIPS planner. Optimal for a given plan length. Equivalent to a bounded-horizon MDP • HEURISTIC METHODS • Altaltps: Heuristic planner that selects the “objectives” up front heuristically. Novel use of planning-graph-based reachability analysis to pick objectives. Not optimal, but quite fast • Sapaps: Models PSP as heuristic search. Can be optimal given admissible heuristics. Can be thought of as a search-based solution to the deterministic MDP • Source of strength: planning-graph-based reachability heuristics for PSP

Comparison of approaches • Exact algorithms based on MDPs don’t scale at all [AAAI 2004]

[optional] Adapting PG heuristics for PSP • Challenges: • Need to propagate costs on the planning graph • The exact set of goals is not clear • Interactions between goals • The obvious approach of considering all $2^n$ goal subsets is infeasible • Idea: Select a subset of the top-level goals up front • Challenge: Goal interactions • Approach: Estimate the net benefit of each goal in terms of its utility minus the cost of its relaxed plan • Bias the relaxed plan extraction to (re)use the actions already chosen for other goals
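The goal-selection idea can be sketched as a greedy loop: score each goal by its utility minus the cost of the relaxed-plan actions not yet chosen for other goals, and stop when no goal adds benefit. This is a simplified illustration, not the actual Altaltps procedure; it assumes the relaxed plan for each goal is already given as a set of action names, and all names and numbers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def select_goals(utility: Dict[str, float],
                 relaxed_plan: Dict[str, Set[str]],
                 cost: Dict[str, float]) -> Tuple[List[str], Set[str]]:
    """Greedily pick goals whose marginal net benefit is positive."""
    chosen_goals: List[str] = []
    chosen_actions: Set[str] = set()
    remaining = set(utility)

    def marginal(goal: str) -> float:
        # Actions already selected for other goals are reused at no extra cost.
        new_actions = relaxed_plan[goal] - chosen_actions
        return utility[goal] - sum(cost[a] for a in new_actions)

    while remaining:
        best = max(sorted(remaining), key=marginal)
        if marginal(best) <= 0:
            break  # no remaining goal pays for its own additional actions
        chosen_goals.append(best)
        chosen_actions |= relaxed_plan[best]
        remaining.remove(best)
    return chosen_goals, chosen_actions


# Hypothetical rover data: the rock goal only pays off once navigation is shared.
cost = {"nav_XY": 5, "sample_soil": 2, "sample_rock": 3,
        "nav_YZ": 6, "take_picture": 1}
utility = {"soil": 10, "rock": 6, "image": 5}
relaxed_plan = {
    "soil": {"nav_XY", "sample_soil"},
    "rock": {"nav_XY", "sample_rock"},
    "image": {"nav_XY", "nav_YZ", "take_picture"},
}

goals, actions = select_goals(utility, relaxed_plan, cost)
```

In this data the rock goal looks unprofitable on its own (6 against a relaxed-plan cost of 8) but becomes worthwhile after the soil goal has already paid for the navigation step. A greedy pass like this can still miss groups of goals that are only beneficial together, which is the interaction challenge the slide points to.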

SAPAPS: A forward A* Approach for PSP [optional] • Anytime A* algorithm: search through the best beneficial nodes • [Figure: search tree over actions A1: Navigate(X,Y), A2: SampleSoil(Y), A3: TakePicture, A4: Navigate(Y,Z), A5: SampleRock(Y)] • A*: f(S) = g(S) + h(S) • g(S) is the net benefit of the plan that got us from the initial state to S: the difference between the utility of the goals holding in S and the cost of the actions that took us from I to S • h*(S) is the additional net benefit of the best plan P starting from S (if S’ is the result of applying P to S, then we want to maximize [U(S’) – U(S)] – C(P)) • h(S) is the estimate of h*(S)
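A minimal sketch of an anytime best-first search over net benefit, in the spirit of the f = g + h scheme above. It is not the SAPAPS implementation: the `Action` type, the example domain, and the heuristic are assumptions. The heuristic used here is simply the total utility of goals not yet holding in S, which never underestimates h*(S) because action costs are non-negative, so the search can stop once no open node has f above the best net benefit found.

```python
import heapq
from itertools import count
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Tuple


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]
    cost: float


def anytime_search(initial: Iterable[str], actions: List[Action],
                   utility: Dict[str, float]) -> Tuple[List[str], float]:
    def U(state):  # utility of the goals holding in the state
        return sum(u for f, u in utility.items() if f in state)

    def h(state):  # optimistic estimate: every missing goal achieved for free
        return sum(u for f, u in utility.items() if f not in state)

    start = frozenset(initial)
    best_plan, best_value = [], U(start)      # the empty plan is the incumbent
    best_g = {start: U(start)}
    tie = count()                              # tie-breaker for heap ordering
    frontier = [(-(U(start) + h(start)), next(tie), start, 0, [])]

    while frontier:
        neg_f, _, state, cost, plan = heapq.heappop(frontier)
        if -neg_f <= best_value:
            break                              # no open node can beat the incumbent
        if U(state) - cost < best_g[state]:
            continue                           # stale entry: a better path was found
        for a in actions:
            if not a.pre <= state:
                continue
            nxt = frozenset((state - a.delete) | a.add)
            ncost = cost + a.cost
            g = U(nxt) - ncost                 # net benefit of the plan reaching nxt
            if g <= best_g.get(nxt, float("-inf")):
                continue
            best_g[nxt] = g
            nplan = plan + [a.name]
            if g > best_value:                 # anytime behaviour: keep the best so far
                best_plan, best_value = nplan, g
            heapq.heappush(frontier, (-(g + h(nxt)), next(tie), nxt, ncost, nplan))
    return best_plan, best_value


fs = frozenset
actions = [
    Action("navigate_X_Y", fs({"at_X"}), fs({"at_Y"}), fs({"at_X"}), 3),
    Action("sample_soil_Y", fs({"at_Y"}), fs({"have_soil"}), fs(), 2),
    Action("sample_rock_Y", fs({"at_Y"}), fs({"have_rock"}), fs(), 6),
    Action("navigate_Y_Z", fs({"at_Y"}), fs({"at_Z"}), fs({"at_Y"}), 4),
    Action("take_picture_Z", fs({"at_Z"}), fs({"have_image"}), fs(), 1),
]
utility = {"have_soil": 10, "have_rock": 5, "have_image": 4}

best_plan, best_value = anytime_search({"at_X"}, actions, utility)
```

With these made-up numbers, only the soil goal is worth pursuing: the rock sample and the picture each cost more than the utility they add, so the search settles on the two-step plan.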

[optional] SAPAPS: Modeling A* search for PSP • Many state-of-the-art planners use best-first A* search. • How can A* search be adapted to PSP Net Benefit? • Classical A* search: Search node evaluation (f = g+h): lowest expected total number of actions. Candidate plans: qualifying plans, which achieve all goals. Search termination criterion: achieving all goals. • A* search for PSP Net Benefit: Search node evaluation (f = g+h): highest expected total “benefit” (goal utility – action cost). Candidate plans: “beneficial” plans, whose total achieved goal utility exceeds the total action cost. Search termination criterion: no search node appears to be extendable to a plan more beneficial than the best beneficial plan found.


Search & Heuristic Improvements • Make objective selection more sensitive to goal (achievement) interactions • Consider group interactions • Consider negative interactions • Preliminary work in ICAPS 2005 (with Sanchez Nigenda) • Consider faster techniques for exact methods • Leverage our recent work on novel IP encodings • Based on loosely coupled network flow problems, which are highly competitive with SAT methods • ICAPS 2005 (with van den Briel) • Consider adapting directed and anytime MDP techniques

Degree & Delay of Satisfaction • In metric temporal domains, PSP will involve: • Partial degree of satisfaction: “If you can’t give me $1000, give me half at least” • Need to track costs for various intervals of a numeric quantity • Delayed satisfaction: “If you submit the homework past the deadline, you will get penalty points” • Preliminary work on degree of satisfaction in [IJCAI 2005]
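One way to picture degree and delay of satisfaction is as utility functions over a numeric quantity and over completion time. The sketch below is only an illustration: the linear shapes, the function names, and the penalty rate are assumptions, and the slides do not commit to any particular functional form.

```python
def degree_utility(amount: float, target: float, full_utility: float) -> float:
    """Utility grows linearly with the fraction of the target achieved (assumed form)."""
    fraction = max(0.0, min(amount / target, 1.0))
    return full_utility * fraction


def delayed_utility(finish_time: float, deadline: float,
                    full_utility: float, penalty_per_unit: float) -> float:
    """Full utility up to the soft deadline, then a linear penalty (assumed form)."""
    lateness = max(0.0, finish_time - deadline)
    return max(0.0, full_utility - penalty_per_unit * lateness)


# Hypothetical numbers: half the requested amount, and a goal finished 2 units late.
half = degree_utility(amount=500, target=1000, full_utility=10)
late = delayed_utility(finish_time=16, deadline=14, full_utility=10, penalty_per_unit=2)
on_time = delayed_utility(finish_time=13, deadline=14, full_utility=10, penalty_per_unit=2)
```

Under either function a planner no longer faces a yes/no goal: it has to weigh how much of the quantity to achieve, or how late to finish, against the cost of the actions required.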

Utility interactions between goals • PSP-net benefit considers goal achievement interactions • ..but assumes an additive model of goal utilities • U(G1,G2) = U(G1) + U(G2) • The additive utility model is often unrealistic • The utility of having two shoes is much more than the sum of the utilities of having either one of them • The utility of having two cars is less than the sum of the utilities of having either one of them • Challenges: • Elicit utility models (preference elicitation) • Model utility interactions • Adapt and extend CP-nets for modeling goal utilities • Can also consider qualitative preference models • Extend the reachability heuristics to consider both plan interactions and goal interactions
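To make the shoes and cars examples concrete, the sketch below adds an adjustment term for each group of goals that is achieved together on top of the additive base utilities. This representation is an assumption chosen for brevity; it is not a CP-net and not a model proposed in the slides, and the numbers are invented.

```python
from typing import Dict, FrozenSet


def goal_set_utility(achieved: FrozenSet[str],
                     base: Dict[str, float],
                     interactions: Dict[FrozenSet[str], float]) -> float:
    """Additive base utilities plus an adjustment for each fully achieved goal group."""
    total = sum(base[g] for g in achieved)
    for group, adjustment in interactions.items():
        if group <= achieved:          # every goal in the group holds
            total += adjustment
    return total


# Complementary goals: a pair of shoes is worth far more than two single shoes.
shoe_base = {"left_shoe": 1, "right_shoe": 1}
shoe_bonus = {frozenset({"left_shoe", "right_shoe"}): 8}

# Substitutable goals: a second car adds less than the first.
car_base = {"car_1": 10, "car_2": 10}
car_discount = {frozenset({"car_1", "car_2"}): -5}

pair = goal_set_utility(frozenset({"left_shoe", "right_shoe"}), shoe_base, shoe_bonus)
single = goal_set_utility(frozenset({"left_shoe"}), shoe_base, shoe_bonus)
two_cars = goal_set_utility(frozenset({"car_1", "car_2"}), car_base, car_discount)
```

A positive adjustment models complementary goals and a negative one models substitutable goals; with an empty `interactions` dictionary the function reduces to the additive model U(G1,G2) = U(G1) + U(G2).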

Non-combinable costs/utilities • PSP Net Benefit assumes costs and utilities are in the same units • …which often does not hold • E.g. different types of resource costs (fuel, manpower); different types of utilities • Solution: Multi-objective search • Either elicit utility models • Alpha * manpower + Beta * mission utility • ..or search for the highest utility plans given a specific resource bound • ..or provide the Pareto (non-dominated) set of solution plans and let the user choose • Challenge: Need to adapt reachability heuristics to separately track the various types of costs and utilities • We plan to build on our work on multi-objective temporal planning in SAPA
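The Pareto option can be sketched as a filter that keeps only the plans no other plan dominates. The `PlanSummary` type, the choice of fuel and manpower as the two cost dimensions, and the numbers are illustrative assumptions; the quadratic pairwise comparison is just the simplest way to express dominance, not a proposed algorithm.

```python
from typing import List, NamedTuple


class PlanSummary(NamedTuple):
    name: str
    fuel: float       # lower is better
    manpower: float   # lower is better
    utility: float    # higher is better


def dominates(p: PlanSummary, q: PlanSummary) -> bool:
    """p dominates q if it is no worse on every dimension and better on at least one."""
    no_worse = p.fuel <= q.fuel and p.manpower <= q.manpower and p.utility >= q.utility
    better = p.fuel < q.fuel or p.manpower < q.manpower or p.utility > q.utility
    return no_worse and better


def pareto_front(plans: List[PlanSummary]) -> List[PlanSummary]:
    """Return the non-dominated plans, preserving input order."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


plans = [
    PlanSummary("A", fuel=5, manpower=2, utility=10),
    PlanSummary("B", fuel=7, manpower=2, utility=10),  # same utility as A, more fuel
    PlanSummary("C", fuel=3, manpower=1, utility=6),
    PlanSummary("D", fuel=9, manpower=4, utility=15),
]

front = pareto_front(plans)
```

Here plan B drops out because plan A achieves the same utility with less fuel, while A, C, and D each represent a different trade-off that the user would have to choose among.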

Combining uncertainty and partial satisfaction • Time permitting, we hope to extend our PSP framework to handle stochastic domains • Planning in stochastic domains already has many natural affinities to PSP • If the planner wants to ensure that its plan reaches goals with higher probability, it often needs to choose longer (costlier) plans • ..Many challenges remain in selecting objectives in stochastic domains • We expect to leverage our significant work in extending reachability heuristics for stochastic and non-deterministic domains • [UAI 2005; AAAI 2005; ICAPS 2004; JAIR in review] Note: Not in the proposal draft

Explaining the planner’s decisions in mixed-initiative scenarios • In mixed-initiative scenarios, humans would like to get explanations of the selected objectives • Anecdotal evidence suggests that in military planning applications, human users are not willing to accept a plan when the objectives selected by the planner do not match the human’s intuition • Challenge: Explaining the “optimality” of the planner’s decisions is technically hard • In contrast, explaining correctness is much simpler • Proposed approach: Modify the reachability heuristic computations to leave a trace of their reasoning • The intent would be to explain at least the Pareto-optimality of the selected set of objectives • When a subgoal cannot be included because of cost-based or preference-based interactions with other selected subgoals, annotate this fact • Summarize the Pareto set (in multi-objective optimization cases) in terms of conditional plans explaining which member of the set is “optimal” under what conditions • Support sensitivity analysis on the stability of the selected objectives (i.e., under what conditions will they no longer be optimal)

Modeling Replanning as a PSP problem • Traditionally, replanning has been cast as a “procedure” rather than a problem • Modify the old plan to handle the new situations • ..we take the stance that replanning is a “problem” • Achieve the original goals of the agent from the current initial situation • Subject to various constraints that were imposed by the partial execution of the original plan • Reservations, commitments: these are, however, soft constraints • ..Replanning can be best modeled as a PSP problem! • We propose to do this..

Summary and Impact • PSP planning problems are ubiquitous and extend the modeling power of planning frameworks • ..by foregrounding user preferences among different objectives • They pose interesting technical challenges to the state of the art • ..by emphasizing plan-quality considerations • We have already made significant progress in handling PSP problems • AAAI 2004; ICAPS 2005 (2); IJCAI 2005 • ..and propose to extend our framework significantly • ..as well as demonstrate its power through applications
