From high level goals to policies: a polynomial time algorithm for k-maintainable goals

Chitta Baral, Arizona State University (joint work with Marcus Bjareland, Thomas Eiter, Mutsumi Nakamura, and Tran Son)

Quick overview of my research
• Knowledge Representation and Reasoning
  – Language design; theoretical building blocks; implementation; applications.
• Action, change and histories
  – Developing languages for representing actions, the structure of the world, and the effects of the actions on the world.
  – Developing languages for expressing goals or directives.
  – Developing ways to achieve goals.
  – Formulating various kinds of reasoning (e.g., prediction, planning, explanation, diagnosis, counterfactuals).
• Application of the above to modeling cell behavior
  – Prediction: (side) effects of drugs
  – Planning: drug design
  – Explanation: explaining unusual behavior; medical diagnosis
  – Others: hypothesis generation

Motivation: Parameterized maintainability goals
• Always f, also written as □f
  - Too strong for many kinds of maintainability (e.g., maintain the room clean).
• Always Eventually f, also written as □◊f
  - Weak in the sense that it does not give an estimate of when f will be made true.
  - May not be achievable in the presence of continuous interference by belligerent agents.
• □f ------------------ □◊_k f -------------------------- □◊f
• □◊_3 f is shorthand for □(f ∨ ○f ∨ ○○f ∨ ○○○f), where ○ is the next-time operator.
• But if an external agent keeps interfering, how is one supposed to guarantee □◊_3 f?

Motivation: a controller-agent transcript
Controller (to the agent/robot): Your goal is to maintain the room clean.
Robot/Agent: Can you be precise about what you mean by ‘maintain’? Also, can I clean anytime, or are there restrictions?
Controller: You can only clean when the room is unoccupied.
Controller: By ‘maintain’ I mean ALWAYS clean.
Robot/Agent: I won’t be able to guarantee that. What if someone makes it dirty while the room is occupied?
Controller: OK, I understand. How about ALWAYS EVENTUALLY clean?
Controller’s Boss: ‘Eventually’ is too lenient. We can’t have the room unclean for too long. We should put some bound.

Controller-agent transcript (cont.)
Controller: Sorry, Sir. I should have made it more precise. ALWAYS EVENTUALLY_3 clean.
Robot/Agent: Sorry. I can guarantee neither ALWAYS EVENTUALLY clean nor ALWAYS EVENTUALLY_3 clean. What if the room is continuously being used? You told me I cannot clean while it is being used.
Controller: You have a good point. Let me clarify again. If you are given an opportunity of 3 units of time without the room being occupied (i.e., without any interference from external agents), then you should have the room clean during that time.
Robot/Agent: I think I understand you. But as you know, I am a robot and not that good at understanding English. Can you please input it in a precise language?

Formulating k-maintainability: a system
• A system is a quadruple A = (S, A, Φ, poss), where
  – S is the set of system states;
  – A is the set of actions, which is the union of the set of agent actions, A_ag, and the set of environmental actions, A_env;
  – Φ : S × A → 2^S is a non-deterministic transition function that specifies how the state of the world changes in response to actions;
  – poss : S → 2^A is a function that describes which actions are possible in which states.

A system
[Figure: transition diagram over states s1 to s7 with edges labeled a1 to a5]
S = {s1, s2, s3, s4, s5, s6, s7}
A = {a1, a2, a3, a4, a5}
Φ: as shown in the picture
poss(s1) = {a1, a2, a3}
poss(s4) = {a4}

[Figure: transition diagram over states b, c, d, f, g, h with edges labeled a, a’ and e]
S = {b, c, d, f, g, h}
A = {a, a’, e}
A_ag = {a, a’}
A_env = {e}
Φ: as shown in the picture
poss(b) = {a} when our policy dictates a to be executed at b.

Controls and super-controls
• Given a system A = (S, A, Φ, poss) and a set A_ag ⊆ A of agent actions,
  – a control policy for A w.r.t. A_ag is a partial function K : S → A_ag such that K(s) ∈ poss(s) whenever K(s) is defined;
  – a super-control policy for A w.r.t. A_ag is a partial function K : S → 2^A_ag such that K(s) ⊆ poss(s) and K(s) ≠ { } whenever K(s) is defined.
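The two definitions above can be checked mechanically. The following is a minimal Python sketch, assuming a dictionary-based representation; the `POSS` table and the helper names `is_control` and `is_super_control` are hypothetical and chosen only for illustration.

```python
# Hypothetical poss table for a six-state example; the entries are an
# assumption for illustration, not data read off the original figure.
POSS = {"b": {"a", "a'"}, "c": {"a"}, "d": {"a"},
        "f": {"a", "e"}, "g": set(), "h": set()}
AGENT_ACTIONS = {"a", "a'"}

def is_control(policy, poss, agent_actions):
    """Partial function K : S -> A_ag with K(s) in poss(s) wherever defined."""
    return all(
        action in agent_actions and action in poss[state]
        for state, action in policy.items()
    )

def is_super_control(policy, poss, agent_actions):
    """Partial function K : S -> 2^A_ag with non-empty K(s) inside poss(s)."""
    return all(
        len(actions) > 0 and actions <= agent_actions and actions <= poss[state]
        for state, actions in policy.items()
    )

print(is_control({"b": "a", "c": "a", "d": "a"}, POSS, AGENT_ACTIONS))
print(is_control({"f": "e"}, POSS, AGENT_ACTIONS))  # e is not an agent action
print(is_super_control({"b": {"a", "a'"}, "f": {"a"}}, POSS, AGENT_ACTIONS))
print(is_super_control({"g": set()}, POSS, AGENT_ACTIONS))  # empty set is not allowed
```

A partial function is modeled here as a dictionary that simply omits the states where K is undefined.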

Reachable states and closure
• Reachable states R(A, s): Given a system A = (S, A, Φ, poss) and a state s, R(A, s) ⊆ S is the smallest set of states that satisfies the following conditions: (i) s is in R(A, s); and (ii) if s’ is in R(A, s) and a is in poss(s’), then Φ(s’, a) ⊆ R(A, s).
• Closure: Let A = (S, A, Φ, poss) be a system and let S be a set of states of A. Then the closure of A w.r.t. S, denoted by Closure(S, A), is defined by Closure(S, A) = ⋃_{s ∈ S} R(A, s).

[Figure: transition diagram over states b, c, d, f, g, h with edges labeled a, a’ and e]
A = (S, A, Φ, poss)
R(A, d) = {d, h}
R(A, f) = {f, g, h}
Closure({d, f}, A) = {d, f, g, h}
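The reachable-set and closure definitions can be sketched as a breadth-first search. The edge list below is an assumption: it is reconstructed so that it agrees with the values stated on the slides, not copied from the figure, and `poss` is assumed to contain every action that has an outgoing edge.

```python
from collections import deque

# Assumed edge list for the example system: (state, action) -> successor states.
PHI = {
    ("b", "a"): {"c"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},
}

def poss(state):
    """Actions possible in `state`: here, every action with an outgoing edge."""
    return {action for (s, action) in PHI if s == state}

def reachable(state):
    """R(A, s): smallest set containing s and closed under possible actions."""
    seen = {state}
    queue = deque([state])
    while queue:
        current = queue.popleft()
        for action in poss(current):
            for successor in PHI[(current, action)]:
                if successor not in seen:
                    seen.add(successor)
                    queue.append(successor)
    return seen

def closure(states):
    """Closure(S, A): union of R(A, s) over all s in S."""
    result = set()
    for s in states:
        result |= reachable(s)
    return result

print(sorted(reachable("d")))
print(sorted(reachable("f")))
print(sorted(closure({"d", "f"})))
```

Under the assumed edges, the three printed sets coincide with R(A, d), R(A, f) and Closure({d, f}, A) as given on the slide.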

Unfold_k(s, A, K):
• An element of Unfold_k(s, A, K) is a sequence of states of length at most k + 1 that the system may go through if it follows the control K starting from the state s.
• Formally: Let A = (S, A, Φ, poss) be a system, let s ∈ S, and let K be a control for A. Then Unfold_k(s, A, K) is the set of all sequences σ = s_0, s_1, . . . , s_l where l ≤ k and s_0 = s, such that K(s_j) is defined for all j < l, s_{j+1} ∈ Φ(s_j, K(s_j)), and if l < k, then K(s_l) is undefined.

[Figure: variant of the example transition diagram in which action a at b is non-deterministic]
Consider policy K: do action a in states b, c, and d.
Unfold_3(b, A, K) = {<b,c,d,h>, <b,g>}
Unfold_3(c, A, K) = {<c,d,h>}
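A short recursive sketch of Unfold_k follows. It assumes a non-deterministic variant of the example in which action a at b may lead to c or to g; this edge list is inferred from the two Unfold_3 values above and is not taken from the figure.

```python
# Assumed non-deterministic example: (state, action) -> set of successor states.
PHI = {
    ("b", "a"): {"c", "g"},
    ("b", "a'"): {"f"},
    ("c", "a"): {"d"},
    ("d", "a"): {"h"},
    ("f", "a"): {"h"},
    ("f", "e"): {"g"},
}
K = {"b": "a", "c": "a", "d": "a"}  # the policy: do a in b, c and d

def unfold(k, state, control):
    """Unfold_k(s, A, K): sequences that stop after k steps or where K is undefined."""
    if k == 0 or state not in control:
        return {(state,)}
    return {
        (state,) + tail
        for successor in PHI.get((state, control[state]), set())
        for tail in unfold(k - 1, successor, control)
    }

print(sorted(unfold(3, "b", K)))
print(sorted(unfold(3, "c", K)))
```

The recursion mirrors the formal definition: a sequence ends either when k steps have been taken or when it reaches a state where the control is undefined.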

Definition of k-maintainability: the parameters
1. a system A = (S, A, Φ, poss);
2. a set A_ag ⊆ A of agent actions;
3. a set of initial states S;
4. a set of desired states E that we want to maintain;
5. a maintainability parameter k;
6. a function exo : S → 2^A_env detailing exogenous actions, such that exo(s) ⊆ poss(s); and
7. a control K (mapping a relevant part of S to A_ag) such that K(s) ∈ poss(s).

Basic Idea
• Ignoring interference:
  – From any state under consideration, by following the control policy one should visit E within k steps.
• Accounting for interference:
  – Broaden the states under consideration from the initial states to all states that can be reached due to the control policy and the environment. (Use the notion of Closure.)
• When using Closure:
  – Take into account the control policy; ignore agent actions other than the one dictated by the control policy.
  – Also consider only the exogenous actions in exo(s).

Definition of k-maintainability
• poss_{K,exo}(s) is the set {K(s)} ∪ exo(s).
• A_{K,exo} = (S, A, Φ, poss_{K,exo}).
• Given a system A = (S, A, Φ, poss), a set of agent actions A_ag ⊆ A, and a specification of exogenous action occurrence exo, we say that a control K for A w.r.t. A_ag k-maintains a set of states S with respect to a set of states E, where k ≥ 0, if
  - for each state s in Closure(S, A_{K,exo}) and each sequence σ = s_0, s_1, . . . , s_r in Unfold_k(s, A, K) with s_0 = s, it holds that {s_0, s_1, . . . , s_r} ∩ E ≠ { }.

[Figure: transition diagram over states b, c, d, f, g, h with edges labeled a, a’ and e]
Consider policy K: do action a in states b, c, and d.
poss(b) = {a, a’}
poss_{K,exo}(b) = {a}
Closure({b, c}, A) = {b, c, d, f, g, h}
Closure({b, c}, A_{K,exo}) = {b, c, d, h}

[Figure: transition diagram over states b, c, d, f, g, h with edges labeled a, a’ and e]
Goal: a 3-maintainable policy for S = {b} w.r.t. E = {h}.
Such a policy: do a in b, c, and d.
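The definition of k-maintainability can be tested directly on a small system by combining the restricted closure with Unfold_k. The sketch below assumes the same reconstructed edge list as before and assumes that exo(f) = {e} is the only exogenous occurrence; both are illustrative assumptions, and the function names are hypothetical.

```python
# Assumed edge list for the example system (reconstructed, not from the figure).
PHI = {
    ("b", "a"): {"c"}, ("b", "a'"): {"f"}, ("c", "a"): {"d"},
    ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"},
}
EXO = {"f": {"e"}}  # assumed exo(s): exogenous actions that may occur in s

def closure_under_control(initial, control):
    """Closure(S, A_{K,exo}): follow only K(s) and the actions in exo(s)."""
    seen, stack = set(initial), list(initial)
    while stack:
        s = stack.pop()
        allowed = set(EXO.get(s, set()))
        if s in control:
            allowed.add(control[s])
        for a in allowed:
            for t in PHI.get((s, a), set()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
    return seen

def unfold(k, state, control):
    """Unfold_k(s, A, K): sequences that stop after k steps or where K is undefined."""
    if k == 0 or state not in control:
        return {(state,)}
    return {
        (state,) + tail
        for t in PHI.get((state, control[state]), set())
        for tail in unfold(k - 1, t, control)
    }

def k_maintains(k, control, initial, goal):
    """True if every unfolding from every state in the closure meets the goal set."""
    return all(
        any(s in goal for s in sequence)
        for start in closure_under_control(initial, control)
        for sequence in unfold(k, start, control)
    )

safe = {"b": "a", "c": "a", "d": "a"}
risky = {"b": "a'", "f": "a"}
print(k_maintains(3, safe, {"b"}, {"h"}))
print(k_maintains(2, safe, {"b"}, {"h"}))
print(k_maintains(3, risky, {"b"}, {"h"}))
```

Under these assumptions the policy "do a in b, c, and d" passes for k = 3 and fails for k = 2, while the shorter route through f fails because the exogenous action e can divert the system to g, where no agent action is available.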

[Figure: variant of the example transition diagram with an additional edge labeled e]
Goal: a 3-maintainable policy for S = {b} w.r.t. E = {h}.
No such policy.

Constructing k-maintainable control policies: pre-formulation attempts
• Handwritten policies: subsumption architecture, RAPs, situation control rules, protocols.
• Our initial motivation for formulating maintainability arose when we tried to formalize what a control module was doing.
• Kaelbling and Rosenschein 1991: In the control rule “if condition c is satisfied then do action a”, the action a is the action that leads to the goal from any state where the condition c is satisfied.

[Figure: transition diagram over states b, c, d, f, g, h with edges labeled a, a’ and e]
Forward search: if we use minimal paths or minimal-cost paths we might pick a’; then we would have to backtrack.
Backward search: should we include both d and f?

Propositional Encoding of solutions
• Input: An input I consists of a system A = (S, A, Φ, poss), a set of goal states E and a set of initial states S (both subsets of the system's states), a set A_ag ⊆ A, a function exo, and an integer k ≥ 0.
• Output: A control K such that S is k-maintainable with respect to E (using the control K), if such a control exists. Otherwise the output is the answer that no such control exists.
• Aim: Given an input I, we construct a SAT instance sat(I) in polynomial time such that sat(I) is satisfiable if and only if the input I allows for a k-maintainable control, and such that the satisfying assignments of sat(I) encode the possible controls.

Propositional encoding: notation
• s_i denotes that
  – there is a path from state s to some state in E using only agent actions and at most i of them, to which we refer as “there is an a-path from s to E of length at most i,” and that
  – from each state s’ reachable from s, there is an a-path from s’ to E of length at most k.

The encoding sat(I)
(0) For all states s and for all j, 0 ≤ j < k: s_j ⇒ s_{j+1}
(1) For all s ∈ E: s_0
(2) For all states s, t such that Φ(s, a) = t for some action a ∈ exo(s): s_k ⇒ t_k
(3) For all states s not in E and all i, 1 ≤ i ≤ k: s_i ⇒ ∨_{t ∈ PS(s)} t_{i-1}, where PS(s) = {t ∈ S | ∃a ∈ A_ag ∩ poss(s): t = Φ(s, a)}
(4) For all initial states s not in E: s_k
(5) For all states s not in E: ¬s_0

Constructing policies from the models of sat(I)
• Let M be a model of sat(I).
• C_M = {s ∈ S | M ⊨ s_k}
• L_M(s): the smallest index j such that M ⊨ s_j (i.e., s_0, s_1, …, s_{j-1} are false and s_j is true), which we call the level of s w.r.t. M.
• K(s) is defined iff s ∈ C_M \ E, and K(s) ∈ {a ∈ A_ag | Φ(s, a) = t, t ∈ C_M, L_M(t) < L_M(s)}

Proposition
• Let I consist of a system A = (S, A, Φ, poss), where Φ is deterministic, a set A_ag ⊆ A, sets of states E and S (both subsets of the system's states), an exogenous function exo, and an integer k. Then
  (i) S is k-maintainable w.r.t. E iff sat(I) is satisfiable;
  (ii) given any model M of sat(I), any control K constructed by the algorithm above k-maintains S w.r.t. E.

Reverse Encoding (writing x’ for ¬x)
• a ⇒ b is equivalent to
• ¬a ∨ b, which is equivalent to
• ¬(¬b) ∨ ¬a, which is equivalent to
• ¬b ⇒ ¬a, which is equivalent to
• b’ ⇒ a’, which is equivalent to
• a’ ← b’

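Rearranging sat(I)
Each clause is followed by its equivalent rule over the primed atoms, where x’ stands for ¬x.
(0) For all states s and for all j, 0 ≤ j < k: s_j ⇒ s_{j+1}; rule: s’_j ← s’_{j+1}
(1) For all s ∈ E: s_0; rule: ← s’_0 (a constraint: s’_0 must not hold)
(2) For all states s, t such that Φ(s, a) = t for some action a ∈ exo(s): s_k ⇒ t_k; rule: s’_k ← t’_k
(3) For all states s not in E and all i, 1 ≤ i ≤ k: s_i ⇒ ∨_{t ∈ PS(s)} t_{i-1}; rule: s’_i ← ∧_{t ∈ PS(s)} t’_{i-1}, where PS(s) = {t ∈ S | ∃a ∈ A_ag ∩ poss(s): t = Φ(s, a)}
(4) For all initial states s not in E: s_k; rule: ← s’_k (a constraint: s’_k must not hold)
(5) For all states s not in E: ¬s_0; rule: s’_0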

[Figure: transition diagram over states b, c, d, f, g, h with edges labeled a, a’ and e]
(6) b’_0, c’_0, d’_0, f’_0, g’_0 (from 5)
(7) g’_1, g’_2, g’_3 (from 3)
(8) b’_1, c’_1 (from 6 and 3)
(9) f’_3 (from 7 and 2)
(10) f’_2 (from 9 and 0)
(11) f’_1 (from 10 and 0)
(12) b’_2 (from 8, 11, and 3)
Thus M = {g’_3, g’_2, g’_1, g’_0, f’_3, f’_2, f’_1, f’_0, b’_2, b’_1, b’_0, c’_1, c’_0, d’_0}
L_M(b) = 3
L_M(c) = 2
L_M(d) = 1
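The derivation above can be reproduced by forward chaining over the primed rules (0), (2), (3) and (5), followed by a check of the constraints (1) and (4). This is a minimal sketch using a naive fixpoint loop rather than a linear-time Horn solver; the edge list, the choice exo(f) = {e}, and the function names are assumptions made for illustration.

```python
# Assumed deterministic example: (state, action) -> single successor state.
PHI = {
    ("b", "a"): "c", ("b", "a'"): "f", ("c", "a"): "d",
    ("d", "a"): "h", ("f", "a"): "h", ("f", "e"): "g",
}
STATES = {"b", "c", "d", "f", "g", "h"}
AGENT_ACTIONS = {"a", "a'"}
EXO = {"f": {"e"}}  # assumed exogenous occurrences

def least_model(k, goal, initial):
    """Forward-chain the primed Horn rules; the pair (s, i) stands for s'_i."""
    derived = {(s, 0) for s in STATES - goal}                      # group (5)
    while True:
        new = {(s, i - 1) for (s, i) in derived if i > 0}          # group (0)
        for (s, a), t in PHI.items():                              # group (2)
            if a in EXO.get(s, set()) and (t, k) in derived:
                new.add((s, k))
        for s in STATES - goal:                                    # group (3)
            successors = [t for (u, a), t in PHI.items()
                          if u == s and a in AGENT_ACTIONS]
            for i in range(1, k + 1):
                if all((t, i - 1) in derived for t in successors):
                    new.add((s, i))
        if new <= derived:
            break
        derived |= new
    violated = any((s, 0) in derived for s in goal)                # group (1)
    violated |= any((s, k) in derived for s in initial - goal)     # group (4)
    return None if violated else derived

def build_control(k, goal, derived):
    """Pick, for each state in C_M outside E, an action that lowers the level."""
    def level(s):
        # Smallest j such that s'_j was not derived, i.e. s_j is true.
        return next((j for j in range(k + 1) if (s, j) not in derived), None)
    control = {}
    for s in STATES - goal:
        if level(s) is None:
            continue  # s'_k was derived, so s is outside C_M
        for (u, a), t in PHI.items():
            if (u == s and a in AGENT_ACTIONS
                    and level(t) is not None and level(t) < level(s)):
                control[s] = a
                break
    return control

model = least_model(3, {"h"}, {"b"})
print(sorted(model))
print(build_control(3, {"h"}, model))
```

With k = 3 the derived atoms coincide with the set M listed above, and the constructed control picks a in b, c and d. With k = 2 the atom b’_2 is derived for the initial state b, so constraint (4) is violated and the sketch returns `None`.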

Polynomial time generation of control policy and maximal control policy
• Computing a model of a Horn theory is well known to be solvable in polynomial time (Dowling & Gallier 84). Thus:
• Theorem: Under deterministic state transitions, the problem k-MAINTAIN is solvable in polynomial time.
• Maximal control
  – Each satisfiable Horn theory T has a least model, M_T, which is given by the intersection of all its models.
  – The least model is computable in linear time in the size of the encoding.
  – This model not only leads to a k-maintainable control, but also leads to a maximal control, in the sense that the control is defined on the greatest set of states outside E among all possible k-maintainable controls for S' w.r.t. E such that S is a subset of S'.

Dealing with non-deterministic transition functions
• Notation:
  – We say that there exists an a-path of length at most k ≥ 0 from a state s to a set of states S' if either s ∈ S', or s ∉ S', k > 0, and there is some action a ∈ A_ag ∩ poss(s) such that for every t ∈ Φ(s, a) there exists an a-path of length at most k-1 from t to S'.
  – s_a_i, i > 0, denotes that there is an a-path from s to E of length at most i starting with action a.
• The encoding sat'(I) again has groups (0)-(5) of clauses, as follows:
(0), (1), (4) and (5) are the same as in sat(I).
(2) For any states s and t such that t ∈ Φ(s, a) for some action a ∈ exo(s): s_k ⇒ t_k

Dealing with non-deterministic transition functions (cont.)
(3) For every state s not in E and for all i, 1 ≤ i ≤ k:
(3.1) s_i ⇒ ∨_{a ∈ A_ag ∩ poss(s)} s_a_i;
(3.2) for every a ∈ A_ag ∩ poss(s) and t ∈ Φ(s, a): s_a_i ⇒ t_{i-1};
(3.3) for every a ∈ A_ag ∩ poss(s), if i < k: s_a_i ⇒ s_a_{i+1}.

A direct algorithm
• Initialization
  – For all states s not in E, make s’_0 true.
  – For all states s not in E without any outgoing edges labeled with agent actions, make s’_0 … s’_k true.
  – For all states s, if agent action a is not executable in s, make s_a’_0 … s_a’_k true.
• Repeat until no change or until s’_k is true for some initial state s:
  – If s’_i is true, make s’_{i-1} true. If s_a’_i is true, make s_a’_{i-1} true.
  – If t ∈ Φ(s, a) for some exogenous action a and t’_k is true, make s’_k true.
  – For any state s not in E:
    - If t ∈ Φ(s, a) for some agent action a and t’_{i-1} is true, make s_a’_i true.
    - If s_a’_i is true for all agent actions a that are executable in s, make s’_i true.

A direct algorithm (cont.)
• If s’_k is true for some initial state s, then the system is not k-maintainable; otherwise construct a super-control as follows:
  – For states s in E, K(s) is undefined; for other states, K(s) = {a : s_a’_k is not true}.
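The steps of the direct algorithm can be sketched as a plain fixpoint computation over the negative atoms. This is an illustrative reading of the slide under several assumptions: the system is passed as a dictionary from (state, action) to a set of successors, only executable agent actions are tracked (so the initialization for non-executable actions is implicit), the loop always runs to the fixpoint, and states whose resulting action set is empty are left undefined. The function name and argument layout are hypothetical.

```python
def direct_algorithm(k, states, phi, agent_actions, exo, goal, initial):
    """Return a super-control as a dict, or None if no k-maintaining control exists."""
    # Agent actions executable in each state.
    moves = {s: {a for (u, a) in phi if u == s and a in agent_actions}
             for s in states}
    # (s, i) stands for s'_i; (s, a, i) stands for s_a'_i.
    bad_state = {(s, 0) for s in states - goal}
    bad_state |= {(s, i) for s in states - goal if not moves[s]
                  for i in range(k + 1)}
    bad_action = set()
    while True:
        new_state = {(s, i - 1) for (s, i) in bad_state if i > 0}
        new_action = {(s, a, i - 1) for (s, a, i) in bad_action if i > 0}
        for (s, a), targets in phi.items():
            # Exogenous step: a bad successor at level k makes s bad at level k.
            if a in exo.get(s, set()) and any((t, k) in bad_state for t in targets):
                new_state.add((s, k))
            # Agent step: one bad outcome at level i-1 spoils action a at level i.
            if a in agent_actions and s not in goal:
                for i in range(1, k + 1):
                    if any((t, i - 1) in bad_state for t in targets):
                        new_action.add((s, a, i))
        for s in states - goal:
            for i in range(1, k + 1):
                if all((s, a, i) in bad_action for a in moves[s]):
                    new_state.add((s, i))
        if new_state <= bad_state and new_action <= bad_action:
            break
        bad_state |= new_state
        bad_action |= new_action
    if any((s, k) in bad_state for s in initial):
        return None
    control = {}
    for s in states - goal:
        good = {a for a in moves[s] if (s, a, k) not in bad_action}
        if good:
            control[s] = good
    return control

# Assumed example systems (edge lists reconstructed for illustration).
STATES = {"b", "c", "d", "f", "g", "h"}
DET = {("b", "a"): {"c"}, ("b", "a'"): {"f"}, ("c", "a"): {"d"},
       ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"}}
NONDET = dict(DET)
NONDET[("b", "a")] = {"c", "g"}  # a at b may also lead to g

print(direct_algorithm(3, STATES, DET, {"a", "a'"}, {"f": {"e"}}, {"h"}, {"b"}))
print(direct_algorithm(3, STATES, NONDET, {"a", "a'"}, {"f": {"e"}}, {"h"}, {"b"}))
```

In the deterministic example the sketch keeps only a at b, since a’ leads to f, from which the exogenous action e can reach the dead end g. State f still receives the entry {a} because the slide defines K(s) from the s_a’_k atoms alone; f is not reached from b under the resulting control. In the non-deterministic variant both actions at b are ruled out, so the result is `None`.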

Direct algorithm using counters
• Idea: c[s] = i means s’_0 … s’_i are true, and c[s_a] = i means s_a’_0 … s_a’_i are true.
• Initialization
  – For all states s not in E, make s’_0 true: c[s] := 0.
  – For all states s not in E without any outgoing edges labeled with agent actions, make s’_0 … s’_k true: c[s] := k.
  – For all states s, if agent action a is not executable in s, make s_a’_0 … s_a’_k true: c[s_a] := k.
• The other steps are similar.
• The idea can then be extended to actions with durations (or costs).

Computational Complexity
• k-maintainability is PTIME-complete (under log-space reductions). PTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action.
• k-maintainability is EXPTIME-complete when we have a compact representation. EXPTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action.

Conclusion
• High-level goal specification is important.
• Certain important goal specification notions cannot be expressed using existing goal representation languages.
• k-maintainability is an important notion.
• Finite maintainability is a reinvention of Dijkstra's notion of self-stabilization.
  – There is a big research community working on self-stabilization in distributed control and fault tolerance.
  – But it has not focused much on the automatic generation of controls (protocols, in their parlance).
  – It has focused more on proving the correctness of handwritten protocols.
• Most specifications over infinite trajectories would be better off with k-maintainability-like notions as part of the specification.
• Role 1 of k: length of the window of opportunity.
• Role 2 of k: bound within which maintenance is guaranteed.

Conclusion (cont.)
• From a SAT encoding to a Horn logic program encoding: an interesting and novel approach to designing polynomial-time algorithms.
• One often does not think in terms of negative propositions.

THANK YOU!
