Opmaker2: Efficient Action Schema Acquisition

T.L. McCluskey, S.N. Cresswell, N.E. Richardson and M.M. West
The University of Huddersfield, UK
Email: lee@hud.ac.uk

Motivation – Intelligent Agents that can Plan?
The availability of knowledge via ontologies is growing.
• It is not unreasonable to expect that an agent will be able to acquire and refine such knowledge with some degree of autonomy.
• A necessary precondition for using (current) automated planning technology is that detailed knowledge exists: action specifications and heuristics.
• The effort needed to encode bug-free, accurate action specifications and planning heuristics, and to maintain them, is significant.
For every agent that can perform planning, must we hand-code and hand-maintain its action descriptions and heuristics?
No. If agents are to achieve this kind of autonomy, they should be capable of learning and refining action knowledge and heuristics.

Vision
[Diagram labels: Agent; Learning phase; Application Ontologies; Plan Engines; Agent capable of deliberative planning]

Planning and Machine Learning – moved on?
E.g. my work in 1987 (IJCAI-87, IWML-87) on deriving planning heuristics (macros, operator choice, goal orders) offline.
YES: must derive heuristics + dynamic domain model.

General Idea of Opmaker2
A knowledge base for planning can be divided into three parts:
• the easy (static) part of a domain model
• the hard (dynamic) part
• planning heuristics
The idea of Opmaker is to learn the hard (dynamic) part and the planning heuristics from “training sequences”, within the context of an existing (static) part of a domain model.

Learning Problem - INPUT
Input:
(a) Knowledge of objects, of collections of similar objects making up distinct classes, of the possible states of a typical object of each class, and of state invariants.
(b) Knowledge of existing plans that other agents, or a trainer, have used [in terms of verbs and affected objects], with the initial state and goal state for the affected objects.
The agent does not have an explicit specification of actions in such a way that it can reason about their synthesis (or the agent does have such a specification but needs to refine, maintain or evolve it).

Example of (b)
Example training sequence from the “Extended Tyre Domain”:
1. name: do_up; unchanging: wrench0, jack0, trim1; changing: hub1, nuts1
2. name: jack_down; unchanging: nil; changing: hub1, jack0
3. name: tighten; unchanging: wrench0, hub1, trim1; changing: nuts1
4. name: apply_trim; unchanging: hub1; changing: trim1, wheel5
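One way to hold such a training sequence in a program is sketched below in Python. The `TrainingStep` class and the `objects_mentioned` helper are illustrative names chosen here, not part of Opmaker2; only the action names and object lists are taken from the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TrainingStep:
    """One action of a training sequence (illustrative structure)."""
    name: str
    unchanging: Tuple[str, ...]
    changing: Tuple[str, ...]


# The Extended Tyre Domain sequence from the slide; "nil" becomes an empty tuple.
TYRE_SEQUENCE = [
    TrainingStep("do_up", ("wrench0", "jack0", "trim1"), ("hub1", "nuts1")),
    TrainingStep("jack_down", (), ("hub1", "jack0")),
    TrainingStep("tighten", ("wrench0", "hub1", "trim1"), ("nuts1",)),
    TrainingStep("apply_trim", ("hub1",), ("trim1", "wheel5")),
]


def objects_mentioned(sequence: List[TrainingStep]) -> List[str]:
    """Return every object named in the sequence, in order of first mention."""
    seen: List[str] = []
    for step in sequence:
        for obj in step.unchanging + step.changing:
            if obj not in seen:
                seen.append(obj)
    return seen
```

Note that the same object can be unchanging in one action and changing in another (jack0 in steps 1 and 2), which is why the two lists are kept per action rather than per object.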

Learning Problem - OUTPUT
1. A full parameterised specification of actions which can be used for planning.
2. Planning heuristics.
3. Refinement of any existing parameterised specification of actions, and heuristics, that the agent currently holds.
Results: Opmaker2 outputs 1, plus 2 in the form of HTN methods. Trying to avoid 3.

Opmaker vs Opmaker2
But hasn’t this been done before, by Opmaker(1) (2002) for example?
Opmaker2
• INPUT: Partial (static) Domain Theory; Training sequences (.. extra Invariants)
• OUTPUT: Primitive Operator Schema; HTN Schema
Opmaker (in GIPO)
• INPUT: Partial (static) Domain Theory; Training sequences; Intermediate State Details
• OUTPUT: Primitive Operator Schema
[Slide callout: Painful part!]

Method Overview
(1) Use a set of heuristics and inferences to track the changing states (‘object trace’) of each changing object referred to within a training example; produce a set of candidate operators.
(2) Use domain invariants to reduce the set of candidate operators (to a singleton).
(3) Use the techniques of the original Opmaker algorithm to generalise object references and create parameterised operator schema from the specific object transitions extracted in (1) from the training examples.

Method Overview for (1)
Track an object O in a training sequence Id1, Id2, …, Idk, …, Idn, where Idk includes the last mention of O as a changing object.
[Diagram labels: O.i; KNOWN; Id1; UNKNOWN; Idk; O.f]
The method seeks to fill in the state transition details ..

Method Overview for (1)
Track an object O in a training sequence Id1, Id2, …, Idk, …, Idn, where Idk includes the last mention of O as a changing object.
[Diagram labels: O.i; Id1; Idk; O.f]
.. by finding the object’s trace, including any associations it makes.
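Locating Idk, the last action in which O is listed as changing, is simple bookkeeping over the training sequence. A minimal Python sketch follows, assuming each action is a (name, unchanging, changing) triple. The function names are hypothetical, and the data is the Extended Tyre Domain training sequence used elsewhere in these slides.

```python
# Each training action: (name, unchanging objects, changing objects).
SEQUENCE = [
    ("do_up", ["wrench0", "jack0", "trim1"], ["hub1", "nuts1"]),
    ("jack_down", [], ["hub1", "jack0"]),
    ("tighten", ["wrench0", "hub1", "trim1"], ["nuts1"]),
    ("apply_trim", ["hub1"], ["trim1", "wheel5"]),
]


def changing_positions(sequence, obj):
    """1-based positions of the actions that list obj as a changing object."""
    return [
        position
        for position, (_name, _unchanging, changing) in enumerate(sequence, start=1)
        if obj in changing
    ]


def last_change(sequence, obj):
    """Position k of the last action that changes obj, or None if it never changes."""
    positions = changing_positions(sequence, obj)
    return positions[-1] if positions else None
```

For hub1 this gives positions 1 and 2, so the action at position 2 (jack_down) plays the role of Idk for that object.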

Method Overview for (1)
Consider a changing object O at the training action with id ‘Idx’. If Idx is not the last change of O, then we potentially have a choice of target state for O.
[Diagram labels: Idx; O current state = O.c; State that has parameter of object type NOT in Idx; Same state; Potential new states]

Back to Example
id1: do_up; unchanging: wrench0, jack0, trim1; changing: hub1, nuts1
id2: jack_down; unchanging: nil; changing: hub1, jack0
id3: tighten; unchanging: wrench0, hub1, trim1; changing: nuts1
id4: apply_trim; unchanging: hub1; changing: trim1, wheel5

Method Overview for (1) - example
Consider the changing object hub1 at the training action ‘do_up’. If do_up is not the last change of hub1, then we potentially have a choice of target state for hub1.
hub1 current state = unfastened(hub1), jacked_up(hub1,jack0)
[Diagram labels: do_up; S1; Same state; S2; S3; S4; Potential new states (see PAPER for definitions)]

Method Overview – result of step (1)
Form sets of candidate operators for each Id, of the form:
• Id - x
• Set of parameters
• Set of prevail conditions (unchanging objects)
• Set of object transitions
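The four components of a candidate operator map naturally onto a small record type. The Python sketch below is illustrative only: the class and field names are invented here, the identifier "id1-1" is a guess at the 'Id - x' naming, and the target state of hub1 is a HYPOTHETICAL placeholder, since the actual candidate states are defined in the paper. Only the current state of hub1 and the object lists of do_up come from the slides.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

GroundLiteral = Tuple[str, ...]  # e.g. ("jacked_up", "hub1", "jack0")


@dataclass(frozen=True)
class Transition:
    """State change of one object within one training action."""
    obj: str
    before: FrozenSet[GroundLiteral]
    after: FrozenSet[GroundLiteral]


@dataclass(frozen=True)
class CandidateOperator:
    ident: str                            # 'Id - x': training action id plus candidate number
    parameters: FrozenSet[str]            # all objects the action refers to
    prevail: FrozenSet[str]               # unchanging objects
    transitions: Tuple[Transition, ...]   # one entry per changing object


# Current state of hub1 at do_up, as given on the example slide.
hub1_before = frozenset({("unfastened", "hub1"), ("jacked_up", "hub1", "jack0")})
# HYPOTHETICAL target state, invented for illustration only.
hub1_after = frozenset({("fastened", "hub1"), ("jacked_up", "hub1", "jack0")})

candidate = CandidateOperator(
    ident="id1-1",
    parameters=frozenset({"wrench0", "jack0", "trim1", "hub1", "nuts1"}),
    prevail=frozenset({"wrench0", "jack0", "trim1"}),
    # The transition for nuts1 is omitted because its states are not given in the slides.
    transitions=(Transition("hub1", hub1_before, hub1_after),),
)
```

Making the records immutable (`frozen=True`) lets candidates be collected in sets and compared directly, which is convenient when alternatives for the same Id have to be pruned later.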

Method Overview – step (2)
Use domain invariants to reduce the set of candidate operators (to a singleton).
Eliminate any candidate that is logically inconsistent with the domain invariants.
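Pruning by invariants can be sketched as a filter over candidate target states. The Python below is a minimal sketch under stated assumptions: invariants are modelled as Boolean functions over a state, a candidate is discarded when its resulting state violates any invariant, and both the invariant and the two candidate states are HYPOTHETICAL examples rather than axioms from the actual domain model.

```python
def no_hub_both_fastened_and_unfastened(state):
    """HYPOTHETICAL invariant: no object is both fastened and unfastened."""
    fastened = {literal[1] for literal in state if literal[0] == "fastened"}
    unfastened = {literal[1] for literal in state if literal[0] == "unfastened"}
    return fastened.isdisjoint(unfastened)


def violates(state, invariants):
    """True if the state breaks at least one invariant."""
    return any(not invariant(state) for invariant in invariants)


def prune(candidate_states, invariants):
    """Keep only the candidate target states that satisfy every invariant."""
    return [state for state in candidate_states if not violates(state, invariants)]


# Two HYPOTHETICAL candidate target states for hub1 (ground literals as tuples).
state_a = frozenset({("fastened", "hub1"), ("jacked_up", "hub1", "jack0")})
state_b = frozenset({("fastened", "hub1"), ("unfastened", "hub1")})

remaining = prune([state_a, state_b], [no_hub_both_fastened_and_unfastened])
```

Here only `state_a` survives. When more than one candidate survives, the invariants are too weak to single out an operator, which is the situation in which further axioms have to be added to the static domain model.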

Method Overview – step (3)
Use the techniques of the original Opmaker algorithm to generalise object references and create parameterised operator schema from the specific object transitions extracted in (1) from the training examples.
Generalise the remaining candidates using Opmaker to create operator schema.
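The generalisation step can be pictured as replacing each object constant with a variable of that object's class. The Python sketch below is a simplified stand-in, not the original Opmaker algorithm, whose details are not given in these slides. The function name, the class names (hub, jack) and the variable naming scheme are assumptions made for illustration.

```python
def generalise(literals, sort_of):
    """Replace object constants by sort-named variables, consistently across literals.

    literals: list of ground literals such as ("jacked_up", "hub1", "jack0")
    sort_of:  mapping from object name to its class name (assumed to be known
              from the static domain model)
    """
    variables = {}   # object constant -> variable name
    counts = {}      # class name -> number of variables issued so far

    def variable_for(obj):
        if obj not in variables:
            sort = sort_of[obj]
            counts[sort] = counts.get(sort, 0) + 1
            variables[obj] = f"{sort.capitalize()}{counts[sort]}"
        return variables[obj]

    schema = [
        (predicate, *[variable_for(arg) for arg in args])
        for (predicate, *args) in literals
    ]
    return schema, variables


# Current state of hub1 at do_up (from the example slide); class names are assumed.
ground = [("unfastened", "hub1"), ("jacked_up", "hub1", "jack0")]
schema, binding = generalise(ground, {"hub1": "hub", "jack0": "jack"})
```

Numbering the variables per class keeps two distinct objects of the same class apart within one schema, for example two blocks in a stacking action.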

Producing Heuristics
• Our earlier paper (2002) on Opmaker indicated how the training sequences can be made into ‘canned plans’ and used as components in planning with the hybrid planner HyHTN.
• Even earlier work (1997) showed how invariants are key to producing goal-ordering heuristics.
• In Opmaker2 we have implemented the former method, and the canned plans are produced together with the primitive operator schema.

Experiments
3 domains considered: Extended Tyre Domain, Hiking Domain, Blocks Domain.
Enough training sequences were used to include instances of all required operator schema.
Experimental criteria:
• A. ‘How many’ axioms have to be added to the static domain model to make sure the operator set is unique?
• B. Are the operator sets induced capable of being used in solving planning problems?
• C. With the canned plans introduced, is plan generation time reduced?
• D. Does the performance of the learning program vary a great deal with different training sequences?

Experimental Results
A. ‘How many’ axioms have to be added to the static domain model to make sure the operator set is unique?
• Tyre Domain: 8, Hiking Domain: 0, Blocks Domain: 4
B. Are the operator sets induced capable of being used in solving planning problems?
• Yes, in all 3 cases.
C. With the canned plans introduced, is plan generation time reduced?
• Tyre Domain, Hiking Domain, Blocks Domain: yes, in all 3 cases (no graphs to show as yet).
D. Still much to do, BUT:
• LONG training sequences do not seem to cause a problem: we tried a 22-element solution in the Blocks World and it produced a single set of operator schema.
• The result with the Blocks Domain threw up some problems because of the multiple object references of the same class within one operator schema.

Conclusions and Future Work
Essentially, we have traded off the effort to produce a detailed “static” domain model against the effort to produce operator schema modelling actions, because:
• Static knowledge is more easily available.
• Our experience of operator building in GIPO II showed that creating the parameterised specs is the hardest part of domain encoding.
Open questions for future work:
• How to induce operator schema which require constraints not appearing in the input object state definitions?
• How to guarantee that a set of invariants is strong enough to deliver a working set of operator schema?
