ILP: Inductive Logic Programming
Induction
• Given
  • a background theory Th (clauses)
  • positive examples Pos (ground facts)
  • negative examples Neg (ground facts)
• Find a hypothesis Hyp in the form of a logic program such that
  • for every p ∈ Pos: Th ∪ Hyp ⊨ p (Hyp covers p given Th)
  • for every n ∈ Neg: Th ∪ Hyp ⊭ n (Hyp does not cover n given Th)
• ILP generates Hyp in the form of a logic program.
[Figure: consistent hypotheses, complete and incomplete]
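The terms on this slide can be stated over sets of covered examples. A minimal Python sketch, assuming the usual definitions (a hypothesis is complete if it covers every positive example and consistent if it covers no negative example); the function name `classify` and the example identifiers are hypothetical:

```python
def classify(covered, pos, neg):
    """Return (complete, consistent) for a hypothesis covering the set `covered`."""
    complete = pos <= covered           # every positive example is covered
    consistent = not (neg & covered)    # no negative example is covered
    return complete, consistent

pos = {"p1", "p2", "p3"}
neg = {"n1", "n2"}

print(classify({"p1", "p2", "p3"}, pos, neg))        # (True, True)
print(classify({"p1", "p2"}, pos, neg))              # (False, True)
print(classify({"p1", "p2", "p3", "n1"}, pos, neg))  # (True, False)
```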
Example
• Predicates (with sample atoms):
  • group(X), in_group(e1,c1)
  • circle(Z), square(Z)
  • triangle(t3,up)
• Description of the first set
  • group(e1).
  • circle(c1). triangle(t1,up). triangle(t2,up).
  • triangle(t3,up). square(s1).
  • in_group(e1,c1). in_group(e1,t1). in_group(e1,t2).
  • inside(t3,c1). inside(s1,t2).
• What can a candidate hypothesis look like?
  • positive(X) :- group(X), in_group(X,Y1), triangle(Y1,up), in_group(X,Y2), triangle(Y2,up).
  • negative(X) :- group(X), in_group(X,Y1), triangle(Y1,down).
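To see what "covers" means for this example, the bodies of the two candidate clauses can be evaluated against the ground facts of the first set. The following Python sketch is only an illustration: the tuple encoding of atoms, the convention that capitalised strings are variables, and the helper names `match`, `solve` and `covers` are assumptions, not part of any ILP system.

```python
# Ground facts describing the first set (copied from the slide).
FACTS = {
    ("group", "e1"),
    ("circle", "c1"), ("square", "s1"),
    ("triangle", "t1", "up"), ("triangle", "t2", "up"), ("triangle", "t3", "up"),
    ("in_group", "e1", "c1"), ("in_group", "e1", "t1"), ("in_group", "e1", "t2"),
    ("inside", "t3", "c1"), ("inside", "s1", "t2"),
}

def is_var(t):
    """Assumed convention: variables are strings starting with an upper-case letter."""
    return isinstance(t, str) and t[:1].isupper()

def match(literal, fact, subst):
    """Extend subst so that the literal equals the ground fact, or return None."""
    if len(literal) != len(fact) or literal[0] != fact[0]:
        return None
    subst = dict(subst)
    for a, b in zip(literal[1:], fact[1:]):
        if is_var(a):
            a = subst.get(a, a)      # replace a bound variable by its value
        if is_var(a):
            subst[a] = b             # still unbound: bind it now
        elif a != b:
            return None
    return subst

def solve(body, facts, subst):
    """Yield every substitution that makes all body literals true in facts."""
    if not body:
        yield subst
        return
    for fact in facts:
        s = match(body[0], fact, subst)
        if s is not None:
            yield from solve(body[1:], facts, s)

def covers(body, facts, binding):
    """True if the body can be satisfied once the head variables are bound."""
    return next(solve(body, facts, binding), None) is not None

POSITIVE_BODY = [("group", "X"), ("in_group", "X", "Y1"), ("triangle", "Y1", "up"),
                 ("in_group", "X", "Y2"), ("triangle", "Y2", "up")]
NEGATIVE_BODY = [("group", "X"), ("in_group", "X", "Y1"), ("triangle", "Y1", "down")]

print(covers(POSITIVE_BODY, FACTS, {"X": "e1"}))   # True
print(covers(NEGATIVE_BODY, FACTS, {"X": "e1"}))   # False
```

Note that, as in Prolog, nothing forces Y1 and Y2 to denote different triangles; the clause as written is satisfied by a group containing a single upward triangle.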
Induction: example
What operations are used in the process of induction? Generalization and specialization.
example        action        hypothesis
+p(b,[b])      add clause    p(X,Y).
-p(x,[])       specialise    p(X,[V|W]).
-p(x,[a,b])    specialise    p(X,[X|W]).
+p(b,[a,b])    add clause    p(X,[X|W]). p(X,[V|W]) :- p(X,W).
ILP algorithms
A generic ILP algorithm needs a description of the operations used to construct new hypotheses:
• Top-down approach: specialization (used e.g. in FOIL)
• Bottom-up approach: generalization (used e.g. in GOLEM)
Generality of clauses
[Figure: part of the generality lattice for m/2, with nodes m(X,Y), m(X,X), m(X,Y):-m(Y,X), m([X|Y],Z), m(X,[Y|Z]), m(X,[X|Z]), m(X,[Y|Z]):-m(X,Z)]
• The set of (equivalence classes of) clauses is a lattice:
  • C1 is more general than C2 iff for some substitution θ: C1θ ⊆ C2
  • greatest lower bound: MGS (most general specialisation); least upper bound: LGG (least general generalisation)
• Specialisation: applying a substitution and/or adding a literal
• Generalisation: applying an inverse substitution and/or removing a literal
• Comment: there can be infinite chains!
Specialization operators
Hypothesis F is a specialization of G iff F is a logical consequence of G: G ⊨ F (any model of G is a model of F).
A specialization operator spec maps a given clause to the set of its specializations.
Two basic specialization operations:
• Operating on variables
  • unification of two variables: spec(p(X,Y)) = p(X,X)
  • substitution
    • by a constant: spec(num(X)) = num(0)
    • by a compound term: spec(num(X)) = num(s(Y))
• Adding a literal to the body: spec(p(X,Y)) = (p(X,Y) :- edge(U,V))
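The two kinds of specialization step can be sketched in a few lines of Python. This is an illustration under assumed conventions (atoms and compound terms as tuples whose first element is the functor, clauses as a pair of head and body tuple); the function names are hypothetical.

```python
def apply_subst(term, subst):
    """Apply a substitution (dict: variable -> term) to a term or an atom."""
    if isinstance(term, tuple):
        # keep the functor or predicate symbol, substitute in the arguments
        return (term[0],) + tuple(apply_subst(a, subst) for a in term[1:])
    return subst.get(term, term)

def specialise_by_substitution(clause, subst):
    """Specialise a clause (head, body) by applying a substitution to it."""
    head, body = clause
    return apply_subst(head, subst), tuple(apply_subst(l, subst) for l in body)

def specialise_by_literal(clause, literal):
    """Specialise a clause (head, body) by adding one literal to its body."""
    head, body = clause
    return head, body + (literal,)

p = (("p", "X", "Y"), ())
num = (("num", "X"), ())

print(specialise_by_substitution(p, {"Y": "X"}))           # p(X,X)
print(specialise_by_substitution(num, {"X": "0"}))         # num(0)
print(specialise_by_substitution(num, {"X": ("s", "Y")}))  # num(s(Y))
print(specialise_by_literal(p, ("edge", "U", "V")))        # p(X,Y) :- edge(U,V)
```

Unifying two variables is treated here as the special case of a substitution that maps one variable to another.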
[Figure: part of the specialisation graph for element/2, with nodes element(X,Y), element(X,X), element(X,Y):-element(Y,X), element(X,[Y|Z]), element([X|Y],Z), element(X,[X|Z]), element(X,[Y|Z]):-element(X,Z)]
ILP generalization methods (searching the hypothesis space bottom-up)
The set of clauses is partially ordered by the subsumption relation, which characterizes "generalization" and specialization (refinement).
Def.: Let c, c1 be clauses. We say that c θ-subsumes c1 if there is a substitution θ such that cθ ⊆ c1.
Example:
c = daughter(X,Y) :- parent(Y,X).
c1 = daughter(X,Y) :- female(X), parent(Y,X).
c2 = daughter(mary,ann) :- female(mary), parent(ann,mary), parent(ann,tom).
Here c θ-subsumes c1 (with the empty substitution) and c2 (with θ = {X/mary, Y/ann}).
Clause c is at least as general as clause c1 iff c θ-subsumes c1.
Clause c is more general than clause c1 (c1 is a specialization of c) iff c θ-subsumes c1 and c1 does not θ-subsume c.
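The definition can be checked mechanically by searching for a suitable substitution. Below is a small Python sketch of such a test, assuming clauses are encoded as lists of signed literals and that capitalised strings are variables; the names `match`, `clause` and `theta_subsumes` are hypothetical, and the simple backtracking search is meant for slide-sized clauses only.

```python
def is_var(t):
    """Assumed convention: variables are strings starting with an upper-case letter."""
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, target, subst):
    """One-way matching: bind variables of `pattern` only; return a substitution or None."""
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == target else None
        return {**subst, pattern: target}
    if isinstance(pattern, tuple) and isinstance(target, tuple):
        if pattern[0] != target[0] or len(pattern) != len(target):
            return None
        for p, t in zip(pattern[1:], target[1:]):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == target else None

def clause(head, *body):
    """A clause as a list of signed literals: positive head, negative body atoms."""
    return [("pos", head)] + [("neg", atom) for atom in body]

def theta_subsumes(c, d, subst=None):
    """True if some substitution maps every literal of c onto a literal of d."""
    subst = {} if subst is None else subst
    if not c:
        return True
    return any(
        s is not None and theta_subsumes(c[1:], d, s)
        for s in (match(c[0], lit, subst) for lit in d)
    )

c = clause(("daughter", "X", "Y"), ("parent", "Y", "X"))
c1 = clause(("daughter", "X", "Y"), ("female", "X"), ("parent", "Y", "X"))
c2 = clause(("daughter", "mary", "ann"), ("female", "mary"),
            ("parent", "ann", "mary"), ("parent", "ann", "tom"))

print(theta_subsumes(c, c1), theta_subsumes(c1, c))    # True False
print(theta_subsumes(c, c2), theta_subsumes(c1, c2))   # True True
```

The first line of output says that c is more general than c1: c θ-subsumes c1, but not the other way round.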
Usage of θ-subsumption
Lemma 1: If c θ-subsumes c1, then c1 is a consequence of c, i.e. c ⊨ c1.
Does the converse hold? NO! See the example
c = list([V|W]) :- list(W).
c1 = list([X,Y|Z]) :- list(Z).
Lemma 2: Under the partial order defined by θ-subsumption, any two clauses c, d have a least upper bound and a greatest lower bound (each unique up to renaming of variables and θ-equivalence).
Usage? Pruning the hypothesis space.
Notation: d < c if d θ-subsumes c, i.e. d is a generalization of c.
Application: Let e be a positive example covered by the clause c, i.e. c ⊨ e. According to Lemma 1, our hypotheses should be sought among the generalizations of the examples.
θ-subsumption and search in the hypothesis space
If we generalize c to d (d < c), all examples covered by c will be covered by d as well. If c covers some negative example, there is no point in generalizing c.
If we specialize c to f (c < f), then an example not covered by c will not be covered by f either. If c does not cover some positive example, c is not worth specializing further.
Searching for the least general generalization (operator lgg) is a purely syntactic task.
Example: lgg([a,b,c], [a,c,d]) = [a,X,Y].
lgg(f(a,a), f(b,b)) = f(lgg(a,b), lgg(a,b)) = f(V,V).
Note that the same variable V is used for every occurrence of lgg(a,b); lgg(a,b) and lgg(b,a), however, are assigned different variables.
Definition of the lgg operator
lgg for terms t1, t2
• lgg(t,t) = t
• lgg(f(s1,..,sn), f(t1,..,tn)) = f(lgg(s1,t1),.., lgg(sn,tn))
• lgg(f(s1,..,sn), g(t1,..,tm)) = V, where V is a variable and f, g are different function symbols
• lgg(s,t) = V, where V is a variable, provided that at least one of the terms s, t is a variable
lgg for atomic formulas, lgg(A1,A2)
• lgg(p(s1,..,sn), p(t1,..,tn)) = p(lgg(s1,t1),.., lgg(sn,tn)): the case of two atoms with the same predicate p
• lgg(p(s1,..,sn), q(t1,..,tn)) is not defined if p and q are different symbols
lgg for literals, lgg(L1,L2)
• If both L1 and L2 are positive, the task reduces to the lgg of atomic formulas
• If both L1 and L2 are negative, i.e. L1 = not A1, L2 = not A2, then lgg(L1,L2) = not lgg(A1,A2)
• If L1 is positive and L2 negative, lgg(L1,L2) is not defined
Example:
lgg(parent(ann,mary), parent(ann,tom)) = parent(ann,X).
lgg(parent(ann,mary), daughter(ann,tom)) is not defined.
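The rules for terms and atoms translate almost directly into code. The Python sketch below is one possible rendering, not a reference implementation: terms are nested tuples with the functor first, constants and variables are strings, lists are built from hypothetical `cons`/`nil` cells, and fresh variables are named V0, V1, ... on the assumption that these names do not occur in the input. The table passed around is what guarantees that the same pair of subterms always yields the same variable.

```python
def lgg_term(s, t, table):
    """lgg of two terms; `table` maps each pair of differing subterms to one variable."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # same function symbol and arity: generalise argument by argument
        return (s[0],) + tuple(lgg_term(a, b, table) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:
        table[(s, t)] = f"V{len(table)}"   # fresh variable for a new pair
    return table[(s, t)]

def lgg_atom(a, b, table):
    """lgg of two atoms, or None when it is not defined (different predicates)."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    return lgg_term(a, b, table)

def lst(*items):
    """Build a list term from cons cells, e.g. lst('a') is cons(a, nil)."""
    term = "nil"
    for item in reversed(items):
        term = ("cons", item, term)
    return term

print(lgg_term(lst("a", "b", "c"), lst("a", "c", "d"), {}) == lst("a", "V0", "V1"))  # True
print(lgg_term(("f", "a", "a"), ("f", "b", "b"), {}))   # ('f', 'V0', 'V0')
print(lgg_term(("f", "a", "b"), ("f", "b", "a"), {}))   # ('f', 'V0', 'V1')
print(lgg_atom(("parent", "ann", "mary"), ("parent", "ann", "tom"), {}))    # ('parent', 'ann', 'V0')
print(lgg_atom(("parent", "ann", "mary"), ("daughter", "ann", "tom"), {}))  # None
```

Negative literals are not modelled separately here; they would be handled by applying `lgg_atom` to the underlying atoms, as in the definition.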
lgg for clauses c1, c2
Suppose c1 = {L1,..,Ln} and c2 = {K1,..,Km}; then
lgg(c1,c2) = { Fij = lgg(Li,Kj) : Li ∈ c1, Kj ∈ c2 and lgg(Li,Kj) is defined }
Example:
c1 = daughter(mary,ann) :- female(mary), parent(ann,mary).
c2 = daughter(eve,tom) :- female(eve), parent(tom,eve).
lgg(c1,c2) = daughter(X,Z) :- female(X), parent(Z,X).
Generalization with respect to background knowledge, represented by a conjunction K of ground facts: relative generalization by the operator rlgg
rlgg(A1,A2) = lgg(A1 :- K, A2 :- K)
Application in ILP: K is the set of all available facts from the task domain; the atoms A1, A2 correspond to the training examples.
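The clause-level lgg pairs up every literal of c1 with every compatible literal of c2 while sharing one variable table across the whole clause. A Python sketch under assumed conventions (clauses as a pair of head atom and list of body atoms, atoms as tuples, fresh variables named V0, V1, ...; all function names are hypothetical):

```python
def lgg_term(s, t, table):
    """lgg of two terms; equal pairs of subterms share one variable via `table`."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg_term(a, b, table) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:
        table[(s, t)] = f"V{len(table)}"
    return table[(s, t)]

def lgg_atom(a, b, table):
    """lgg of two atoms, or None when the predicates differ."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    return lgg_term(a, b, table)

def lgg_clause(c1, c2):
    """lgg of two definite clauses given as (head, [body atoms])."""
    table = {}                      # one table for the whole clause
    (h1, b1), (h2, b2) = c1, c2
    head = lgg_atom(h1, h2, table)  # None if the heads have different predicates
    body = []
    for l in b1:
        for k in b2:
            g = lgg_atom(l, k, table)
            if g is not None and g not in body:
                body.append(g)
    return head, body

c1 = (("daughter", "mary", "ann"), [("female", "mary"), ("parent", "ann", "mary")])
c2 = (("daughter", "eve", "tom"), [("female", "eve"), ("parent", "tom", "eve")])

print(lgg_clause(c1, c2))
# (('daughter', 'V0', 'V1'), [('female', 'V0'), ('parent', 'V1', 'V0')])
```

Up to variable names this is the clause daughter(X,Z) :- female(X), parent(Z,X) from the example.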
Example: application of rlgg
e1 = daughter(mary,ann), e2 = daughter(eve,tom)
K = parent(ann,mary) & … & parent(tom,ian) & female(ann) & … & female(eve).
Abbreviations: d = daughter, p = parent, f = female; a = ann, m = mary, t = tom, e = eve, i = ian.
c1 = e1 :- K = d(m,a) :- p(a,m), p(a,t), p(t,e), p(t,i), f(a), f(m), f(e).
c2 = e2 :- K = d(e,t) :- p(a,m), p(a,t), p(t,e), p(t,i), f(a), f(m), f(e).
rlgg(e1,e2) = lgg(c1,c2) = d(V_{m,e}, V_{a,t}) :- p(a,m), p(a,t), p(t,e), p(t,i), f(a), f(m), f(e), p(a,V_{m,t}), p(V_{a,t},V_{m,e}), p(V_{a,t},V_{m,i}), p(a,V_{t,m}), p(V_{t,a},V_{i,m}), …, f(V_{a,m}), f(V_{a,e}), f(V_{m,e}), …,
where V_{m,e} is lgg(m,e).
Caution: the result of rlgg tends to be very long!
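How long the result gets can be checked by computing the rlgg for this example directly. The Python sketch below assumes K consists of exactly the seven facts listed in c1 and c2, encodes atoms as tuples, and names fresh variables V0, V1, ... instead of V_{m,e}, V_{a,t}, ...; the function names are hypothetical.

```python
def lgg_term(s, t, table):
    """lgg of two terms; equal pairs of subterms share one variable via `table`."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg_term(a, b, table) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:
        table[(s, t)] = f"V{len(table)}"
    return table[(s, t)]

def lgg_atom(a, b, table):
    """lgg of two atoms, or None when the predicates differ."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    return lgg_term(a, b, table)

def rlgg(e1, e2, background):
    """rlgg(e1, e2) = lgg(e1 :- K, e2 :- K) for a list K of ground facts."""
    table = {}
    head = lgg_atom(e1, e2, table)
    body = []
    for l in background:
        for k in background:
            g = lgg_atom(l, k, table)
            if g is not None and g not in body:
                body.append(g)
    return head, body

K = [("parent", "ann", "mary"), ("parent", "ann", "tom"),
     ("parent", "tom", "eve"), ("parent", "tom", "ian"),
     ("female", "ann"), ("female", "mary"), ("female", "eve")]

head, body = rlgg(("daughter", "mary", "ann"), ("daughter", "eve", "tom"), K)
print(head)        # ('daughter', 'V0', 'V1')
print(len(body))   # 25
```

With only seven background facts the body already has 25 literals (16 from the pairs of parent facts, 9 from the pairs of female facts); in general the body size grows quadratically with the number of facts per predicate.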
Irrelevant literals
d(V_{m,e}, V_{a,t}) :- p(a,m), p(a,t), p(t,e), p(t,i), f(a), f(m), f(e), p(a,V_{m,t}), p(V_{a,t},V_{m,e}), p(V_{a,t},V_{m,i}), p(a,V_{t,m}), p(V_{t,a},V_{i,m}), …, f(V_{a,m}), f(V_{a,e}), f(V_{m,e}), … .
Are there literals that make no difference for distinguishing between positive and negative examples? If so, can they be omitted?
If omitting a literal does not result in covering a negative example, we consider this literal to be irrelevant.
After removing the irrelevant literals:
d(V_{m,e}, V_{a,t}) :- p(V_{a,t},V_{m,e}), f(V_{m,e}).
i.e. daughter(X,Y) :- parent(Y,X), female(X).
Generic ILP algorithm using a set R of rules for modifying hypotheses
Input: B (background knowledge), E+ and E- (the sets of positive and negative examples)
QH := initialize(B, E+, E-);   /* proposal of the starting hypothesis */
while not end_criterion(QH) do
    choose a hypothesis H from QH;
    choose modification rules r1,…,rk from R;
    apply r1,…,rk to H to create the new hypothesis H1 that best fits E+ and E-;
    QH := (QH - H) + H1;
    cancel some members of QH;   /* pruning */
    filter the sets of examples E+ and E-
choose a hypothesis P from QH
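As an illustration only, the loop can be instantiated as a tiny top-down search for the body of daughter(X,Y). Everything specific in this Python sketch is an assumption made for the example: the two negative examples, the pool of candidate literals (restricted to the head variables so that coverage is a plain fact lookup), the scoring rule, and the simplification that a hypothesis is a single clause. The step that filters the example sets is left out.

```python
FACTS = {("parent", "ann", "mary"), ("parent", "ann", "tom"),
         ("parent", "tom", "eve"), ("parent", "tom", "ian"),
         ("female", "ann"), ("female", "mary"), ("female", "eve")}
POS = [("mary", "ann"), ("eve", "tom")]   # daughter(mary,ann), daughter(eve,tom)
NEG = [("tom", "ann"), ("eve", "ann")]    # hypothetical negative examples
CANDIDATES = [("parent", "Y", "X"), ("female", "X"),
              ("parent", "X", "Y"), ("female", "Y")]   # the rule set R: add one literal

def holds(body, binding):
    """True if every body literal is a known fact under the binding of X and Y."""
    return all((lit[0],) + tuple(binding[a] for a in lit[1:]) in FACTS for lit in body)

def score(body):
    """Numbers of covered positive and covered negative examples."""
    pos = sum(holds(body, {"X": x, "Y": y}) for x, y in POS)
    neg = sum(holds(body, {"X": x, "Y": y}) for x, y in NEG)
    return pos, neg

def generic_ilp(max_steps=20):
    queue = [()]                                   # QH: start from the empty body
    for _ in range(max_steps):
        # end criterion: some hypothesis is complete and consistent
        done = [h for h in queue if score(h) == (len(POS), 0)]
        if done or not queue:
            return done[0] if done else None
        h = max(queue, key=lambda b: score(b)[0] - score(b)[1])   # choose H from QH
        queue.remove(h)
        refinements = [h + (lit,) for lit in CANDIDATES if lit not in h]
        # pruning: a specialisation of an incomplete clause stays incomplete
        queue += [r for r in refinements
                  if score(r)[0] == len(POS) and r not in queue]
    return None

print(generic_ilp())   # (('parent', 'Y', 'X'), ('female', 'X'))
```

The returned body corresponds to daughter(X,Y) :- parent(Y,X), female(X). The pruning step relies on the property that specialising a clause can never make it cover more examples.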
When is ILP useful?
• ILP is a good choice whenever
  • relations among the considered objects have to be taken into account
  • the training data have no uniform structure (some objects are described extensively, others are mentioned in only a few facts)
  • there is extensive background knowledge that should be used in constructing the hypothesis
• Some domains with successful industrial or research ILP applications:
  • Bioinformatics, medicine, ecology
  • Technical applications (finite element mesh design, ..)
  • Natural language processing
Bioinformatics: SAR tasks
• Structure Activity Relationships (SAR) task: given
  • the chemical structure of a compound
  • empirical data about its toxicity / mutagenicity / therapeutic effect
• What is the cause of the observed behaviour?
[Figure: positive and negative examples; result: a structural indicator]
Bioinformatics - structural description of organic compounds
• Primary structure = sequence of amino acids.
• Is it possible to predict the secondary structure (folds in space) from information about the primary structure?
• Support for the interpretation of NMR (nuclear magnetic resonance) spectra: classification into 23 structural types is required. Classical ML methods reach 80% accuracy, ILP 90%, which corresponds to the results of a domain expert.
Bioinformatics - carcinogenicity
• 230 aromatic and heteroaromatic nitro compounds: 188 compounds are well classifiable by attribute-based methods; the remaining 42 compounds are highly regression-unfriendly (denoted as the RU group).
• The advantages of the relational representation have been demonstrated on the RU group: the hypothesis suggested by the ILP system PROGOL achieved 88% accuracy, while classical attribute-based ML methods reached about 20% less.
Systems
• Aleph (descendant of P-Progol), Oxford University
• Tilde + WARMR = ACE (Blockeel, De Raedt 1998)
• FOIL (Quinlan 1993)
• GOLEM: constructs a hypothesis by a method that combines several rlgg steps with the omission of irrelevant literals
• MIS (Shapiro 1981), Markus (Grobelnik 1992), WiM (1994)
• RSD (Železný 2002): search for interesting subgroups
• Other systems: http://www-ai.ijs.si/~ilpnet2/systems/