
Intelligent Systems

Presentation Transcript


  1. Intelligent Systems Inductive Logic Programming (ILP) – Lecture 11 Prof. Dieter Fensel (& Dumitru Roman)

  2. Agenda • Motivation • Technical Solution • Model Theory of ILP • A Generic ILP Algorithm • Proof Theory of ILP • ILP Systems and Applications • Illustration by a Larger Example • Summary and Conclusions

  3. (Machine) Learning • The process by which relatively permanent changes occur in behavioral potential as a result of experience. (Anderson) • Learning is constructing or modifying representations of what is being experienced. (Michalski) • A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. (Mitchell)

  4. (Machine) Learning Techniques • Decision tree learning • Conceptual clustering • Case-based learning • Reinforcement learning • Neural networks • Genetic algorithms • and… Inductive Logic Programming (ILP)

  5. Inductive Logic Programming = Inductive Learning ∩ Logic Programming

  6. The inductive learning and logic programming sides of ILP • From inductive machine learning, ILP inherits its goal: to develop tools and techniques to • Induce hypotheses from observations (examples) • Synthesise new knowledge from experience • By using computational logic as the representational mechanism for hypotheses and observations, ILP can overcome the two main limitations of classical machine learning techniques: • The use of a limited knowledge representation formalism (essentially a propositional logic) • Difficulties in using substantial background knowledge in the learning process

  7. The inductive learning and logic programming sides of ILP (cont’d) • ILP inherits from logic programming its • Representational formalism • Semantic orientation • Various well-established techniques • Compared to other approaches to inductive learning, ILP is interested in • Properties of inference rules • Convergence of algorithms • Computational complexity of procedures • ILP systems benefit from using the results of logic programming • E.g. by making use of work on termination, types and modes, knowledge-base updating, algorithmic debugging, abduction, constraint logic programming, program synthesis and program analysis

  8. The inductive learning and logic programming sides of ILP (cont’d) • Inductive logic programming extends the theory and practice of logic programming by investigating induction rather than deduction as the basic mode of inference • Logic programming theory describes deductive inference from logic formulae provided by the user • ILP theory describes the inductive inference of logic programs from instances and background knowledge • ILP contributes to the practice of logic programming by providing tools that assist logic programmers to develop and verify programs

  9. Outline • Motivation • Technical Solution • Model Theory of ILP • A Generic ILP Algorithm • Proof Theory of ILP • ILP Systems and Applications • Illustration by a Larger Example • Summary and Conclusions

  10. Motivation – Family example • Imagine yourself learning about the relationships between people in your close family circle • You have been told that your grandfather is the father of one of your parents, but do not yet know what a parent is • You might have the following beliefs (B):
      grandfather(X, Y) ← father(X, Z), parent(Z, Y)
      father(henry, jane) ←
      mother(jane, john) ←
      mother(jane, alice) ←
• You are now given the following facts (positive examples) concerning the relationships between particular grandfathers and their grandchildren (E+):
      grandfather(henry, john) ←
      grandfather(henry, alice) ←
• You might be told in addition that the following relationships do not hold (negative examples) (E-):
      ← grandfather(john, henry)
      ← grandfather(alice, john)

  11. Motivation – Family example (2) • Believing B, and faced with the new facts E+ and E-, you might guess the following hypothesis H: parent(X, Y) ← mother(X, Y) • B is consistent with E-: B /\ E- |≠ □ (prior satisfiability) • H allows us to explain E+ relative to B: B /\ H |= E+ (posterior sufficiency) • B and H are consistent with E-: B /\ H /\ E- |≠ □ (posterior satisfiability) • The question: how is it possible to derive (even tentatively) the hypothesis H?
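The guess can be checked mechanically. A minimal, runnable sketch in standard Prolog syntax (with ← written as :-), containing the background knowledge B together with the guessed hypothesis H:

    % background knowledge B
    grandfather(X, Y) :- father(X, Z), parent(Z, Y).
    father(henry, jane).
    mother(jane, john).
    mother(jane, alice).

    % guessed hypothesis H
    parent(X, Y) :- mother(X, Y).

    % the positive examples E+ now succeed, the negative examples E- still fail:
    % ?- grandfather(henry, john).   % true
    % ?- grandfather(henry, alice).  % true
    % ?- grandfather(john, henry).   % false
    % ?- grandfather(alice, john).   % false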

  12. Motivation – Tweety example • Suppose that you know the following about birds:
      haswings(X) ← bird(X)
      hasbeak(X) ← bird(X)
      bird(X) ← vulture(X)
      carnivore(X) ← vulture(X)
• Imagine now that an expedition comes across a creature "Tweety". The expedition leader telegraphs you to let you know that Tweety has wings and a beak, i.e. E+ is:
      haswings(tweety) ←
      hasbeak(tweety) ←

  13. Motivation – Tweety example (2) • Even without any negative examples it would not take a very inspired ornithologist with belief set B to hazard the guess "Tweety is a bird", i.e. H = bird(tweety) ← • It could clearly be refuted if further evidence revealed Tweety to be made of e.g. plastic • The ornithologist would be unlikely to entertain the more speculative hypothesis "vulture(tweety)", even though this could also be used to explain all the evidence: H' = vulture(tweety) ← • The question: how do we know from B and E+ that H' is more speculative than H?
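To make the comparison concrete, here is a small sketch in standard Prolog (one assumption of this sketch: vulture/1 is declared dynamic so the rule bodies can be tried even when no vulture facts are asserted):

    % belief set B
    :- dynamic vulture/1.
    haswings(X)  :- bird(X).
    hasbeak(X)   :- bird(X).
    bird(X)      :- vulture(X).
    carnivore(X) :- vulture(X).

    % the cautious hypothesis H
    bird(tweety).

    % with H loaded, both positive examples are derivable:
    % ?- haswings(tweety).   % true
    % ?- hasbeak(tweety).    % true
    % replacing H by the more speculative H' = vulture(tweety) would also make
    % both examples derivable, but would additionally entail carnivore(tweety)

Both hypotheses explain E+, so the evidence alone does not decide between them; H' simply commits to more unobserved consequences.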

  14. Motivation – Sorting example • Imagine that a learning program is to be taught the logic program for "quick-sort" • The following definitions are provided as background knowledge B:
      part(X, [], [], []) ←
      part(X, [Y | T], [Y | S1], S2) ← Y =< X, part(X, T, S1, S2)
      part(X, [Y | T], S1, [Y | S2]) ← Y > X, part(X, T, S1, S2)
      app([], L, L) ←
      app([X | T], L, [X | R]) ← app(T, L, R)
• Negative examples (E-):
      ← qsort([1, 0], [1, 0])
      ← qsort([0], [])
      …
• Positive examples (E+):
      qsort([], []) ←
      qsort([0], [0]) ←
      qsort([1, 0], [0, 1]) ←
      …

  15. Motivation – Sorting example (2) • In this case we might hope that the algorithm would, given a sufficient number of examples, suggest the following H for "quick-sort":
      qsort([], []) ←
      qsort([X | T], S) ← part(X, T, L1, L2), qsort(L1, S1), qsort(L2, S2), app(S1, [X | S2], S)
• ILP systems such as Golem and FOIL can learn this definition of quick-sort from as few as 6 or 10 examples • Although much background knowledge is required to learn quick-sort, the mentioned ILP systems are able to select the correct hypothesis from a huge space of possible hypotheses
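Written out in standard Prolog syntax (← as :-), the background knowledge and the hypothesis the ILP system is expected to induce form a small runnable program; predicate names follow the slide:

    % background knowledge B
    part(_, [], [], []).
    part(X, [Y|T], [Y|S1], S2) :- Y =< X, part(X, T, S1, S2).
    part(X, [Y|T], S1, [Y|S2]) :- Y > X,  part(X, T, S1, S2).

    app([], L, L).
    app([X|T], L, [X|R]) :- app(T, L, R).

    % target hypothesis H ("quick-sort")
    qsort([], []).
    qsort([X|T], S) :-
        part(X, T, L1, L2),
        qsort(L1, S1),
        qsort(L2, S2),
        app(S1, [X|S2], S).

    % ?- qsort([1, 0], S).   % S = [0, 1], matching the positive examples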

  16. Agenda • Motivation • Technical Solution • Model Theory of ILP • A Generic ILP Algorithm • Proof Theory of ILP • ILP Systems and Applications • Illustration by a Larger Example • Summary and Conclusions

  17. Model Theory – Normal Semantics • The problem of inductive inference: • Given are background (prior) knowledge B and evidence E • The evidence E = E+ /\ E- consists of positive evidence E+ and negative evidence E- • The aim is then to find a hypothesis H such that the following conditions hold:
      Prior Satisfiability: B /\ E- |≠ □
      Posterior Satisfiability: B /\ H /\ E- |≠ □
      Prior Necessity: B |≠ E+
      Posterior Sufficiency: B /\ H |= E+
• The Sufficiency criterion is sometimes named completeness with regard to the positive evidence • The Posterior Satisfiability criterion is also known as consistency with the negative evidence

  18. Model Theory – Definite Semantics • In most ILP systems the background theory and the hypotheses are restricted to being definite clauses • This setting is simpler than the general setting because a definite clause theory T has a unique minimal Herbrand model M+(T), and any logical formula is either true or false in the minimal model • This setting is formalised as follows:
      Prior Satisfiability: all e in E- are false in M+(B)
      Posterior Satisfiability: all e in E- are false in M+(B /\ H)
      Prior Necessity: some e in E+ are false in M+(B)
      Posterior Sufficiency: all e in E+ are true in M+(B /\ H)
• The special case of the definite semantics, where the evidence is restricted to true and false ground facts (examples), is called the example setting • The example setting is equivalent to the normal semantics, where B and H are definite clauses and E is a set of ground unit clauses

  19. Model Theory – Non-monotonic Semantics • In the non-monotonic setting: • The background theory is a set of definite clauses • The evidence is empty (Reason: the positive evidence is considered part of the background theory and the negative evidence is derived implicitly, by making a kind of closed world assumption) • The hypotheses are sets of general clauses expressible using the same alphabet as the background theory

  20. Model Theory – Non-monotonic Semantics (2) • The following conditions should hold for H and B: • Validity: all h in H are true in M+(B) • All clauses belonging to a hypothesis hold in the database B, i.e. they are true properties of the data • Completeness: if a general clause g is true in M+(B) then H |= g • All information that is valid in the database should be encoded in the hypothesis • Minimality: there is no proper subset G of H which is valid and complete • Aims at deriving non-redundant hypotheses

  21. Model Theory – Non-monotonic Semantics (3) • Example for B:
      male(luc) ←
      female(lieve) ←
      human(lieve) ←
      human(luc) ←
• A possible solution is then H:
      ← female(X), male(X)
      human(X) ← male(X)
      human(X) ← female(X)
      female(X) ∨ male(X) ← human(X)

  22. Model Theory – Non-monotonic Semantics (4) • Example for B1:
      bird(tweety) ←
      bird(oliver) ←
• Example for E1+:
      flies(tweety)
• In the example setting an acceptable hypothesis H1 would be flies(X) ← bird(X) • In the non-monotonic setting H1 is not a solution since there exists a substitution {X ← oliver} which makes the clause false (non-valid) • This demonstrates that the non-monotonic setting hypothesises only properties that hold in the database
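For databases as small as these, the Validity condition (every hypothesis clause true in M+(B)) can be checked directly; a minimal sketch in SWI-Prolog using forall/2:

    % B from slide 21
    male(luc).      female(lieve).
    human(luc).     human(lieve).

    % B1 from slide 22, with the positive evidence folded into the background
    bird(tweety).   bird(oliver).
    flies(tweety).

    % validity checks for the hypothesis clauses of slide 21 (all succeed):
    % ?- forall(male(X), human(X)).                 % human(X) <- male(X)
    % ?- forall(female(X), human(X)).               % human(X) <- female(X)
    % ?- \+ (female(X), male(X)).                   % <- female(X), male(X)
    % ?- forall(human(X), (female(X) ; male(X))).   % female(X) v male(X) <- human(X)

    % the clause flies(X) <- bird(X) is not valid in M+(B1):
    % ?- forall(bird(X), flies(X)).   % fails, X = oliver is a counterexample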

  23. Model Theory – Non-monotonic Semantics (5) • The non-monotonic semantics realises induction by deduction • The induction principle of the non-monotonic setting states that the hypothesis H, which is, in a sense, deduced from the set of observed examples E and the background theory B (using a kind of closed world and closed domain assumption), holds for all possible sets of examples • This produces generalisation beyond the observations • As a consequence, properties derived in the non-monotonic setting are more conservative than those derived in the normal setting

  24. Agenda • Motivation • Technical Solution • Model Theory of ILP • A Generic ILP Algorithm • Proof Theory of ILP • ILP Systems and Applications • Illustration by a Larger Example • Summary and Conclusions

  25. ILP as a Search Problem • ILP can be seen as a search problem - this view follows immediately from the model theory of ILP • In ILP there is a space of candidate solutions, i.e. the set of hypotheses, and an acceptance criterion characterizing solutions to an ILP problem • Question: how can the space of possible solutions be structured in order to allow for pruning of the search? • The search space is typically structured by means of the dual notions of generalisation and specialisation • Generalisation corresponds to induction • Specialisation corresponds to deduction • Induction is viewed here as the inverse of deduction

  26. Specialisation and Generalisation Rules • A hypothesis G is more general than a hypothesis S if and only if G |= S • S is also said to be more specific than G. • In search algorithms, the notions of generalisation and specialisation are incorporated using inductive and deductive inference rules: • A deductive inference rule r in R maps a conjunction of clauses G onto a conjunction of clauses S such that G |= S • r is called a specialisation rule • An inductive inference rule r in R maps a conjunction of clauses S onto a conjunction of clauses G such that G |= S • r is called a generalisation rule

  27. Specialisation and Generalisation – Examples • An example of a deductive inference rule is resolution; also dropping a clause from a hypothesis realises specialisation • An example of an inductive inference rule is Absorption (stated below) • In the rule of Absorption the conclusion entails the condition • Applying the rule of Absorption in the reverse direction, i.e. applying resolution, is a deductive inference rule • Other inductive inference rules generalise by adding a clause to a hypothesis, or by dropping a negative literal from a clause
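As usually stated in the inverse-resolution literature (lower-case letters are atoms, upper-case letters conjunctions of atoms), the Absorption rule reads:

    Absorption:  given  q ← A  and  p ← A, B   conclude   p ← q, B

Resolving the conclusion p ← q, B against the premise q ← A gives back the other premise p ← A, B; this is why the conclusion (together with q ← A) entails the condition, and why applying the rule in reverse is simply resolution.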

  28. Soundness of inductive inference rules • Inductive inference rules, such as Absorption, are not sound • This soundness problem can be circumvented by associating each hypothesised conclusion H with a label L = p(H | B /\ E), where L is the probability that H holds given that the background knowledge B and evidence E hold • Assuming the subjective assignment of probabilities to be consistent, labelled rules of inductive inference are as sound as deductive inference • The conclusions are simply claimed to hold in a certain proportion of interpretations

  29. Pruning the search space • Generalisation and specialisation form the basis for pruning the search space; this is because: • When B /\ H |≠ e, where B is the background theory, H is the hypothesis and e is positive evidence, then none of the specialisations H’ of H will imply the evidence. Each such hypothesis will be assigned a probability label p(H’ | B /\ E) = 0. • They can therefore be pruned from the search. • When B /\ H /\ e |= □, where B is the background theory, H is the hypothesis and e is negative evidence, then all generalisations H’ of H will also be inconsistent with B /\ E. These will again have p(H’ | B /\ E) = 0.

  30. A Generic ILP Algorithm • Given the key ideas of ILP as search, inference rules and labelled hypotheses, a generic ILP system is defined as a search loop over a queue of candidate hypotheses (a schematic sketch follows below) • The algorithm works as follows: • It keeps track of a queue of candidate hypotheses QH • It repeatedly deletes a hypothesis H from the queue and expands that hypothesis using inference rules; the expanded hypotheses are then added to the queue of hypotheses QH, which may be pruned to discard unpromising hypotheses from further consideration • This process continues until the stop-criterion is satisfied
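A schematic Prolog sketch of this loop is given here; initialize/1, stop_criterion/1, delete_hyp/3, choose_rules/2, apply_rules/3 and prune/2 are placeholder predicates standing for the generic parameters described on the next two slides, not part of any particular ILP system:

    % schematic sketch of the generic ILP loop
    ilp(Hypotheses) :-
        initialize(QH0),                    % Initialize: the hypotheses started from
        ilp_loop(QH0, Hypotheses).

    ilp_loop(QH, QH) :-
        stop_criterion(QH), !.              % Stop-criterion satisfied
    ilp_loop(QH, Result) :-
        delete_hyp(H, QH, Rest),            % Delete: pick H (LIFO, FIFO, best-first, ...)
        choose_rules(H, Rules),             % Choose: inference rules to apply to H
        apply_rules(H, Rules, Expansions),  % expand H using rules from R
        append(Rest, Expansions, QH1),
        prune(QH1, QH2),                    % Prune: discard unpromising hypotheses
        ilp_loop(QH2, Result).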

  31. Algorithm – Generic Parameters • Initialize denotes the hypotheses started from • R denotes the set of inference rules applied • Delete influences the search strategy • Using different instantiations of this procedure, one can realise a depth-first (Delete = LIFO), breadth-first (Delete = FIFO) or best-first algorithm • Choose determines the inference rules to be applied to the hypothesis H

  32. Algorithm – Generic Parameters (2) • Prune determines which candidate hypotheses are to be deleted from the queue • This is usually realized using the labels (probabilities) of the hypotheses on QH or relying on the user (employing an “oracle”) • Combining Delete with Prune it is easy to obtain advanced search strategies such as hill-climbing, beam-search, best-first, etc. • The Stop-criterion states the conditions under which the algorithm stops • Some frequently employed criteria require that a solution be found, or that it is unlikely that an adequate hypothesis can be obtained from the current queue

  33. “specific-to-general” vs “general-to-specific” • “specific-to-general” systems • Start from the examples and background knowledge, and repeatedly generalize their hypothesis by applying inductive inference rules • During the search they take care that the hypothesis remains satisfiable (i.e. does not imply negative examples) • “general-to-specific” systems • Start with the most general hypothesis (i.e. the inconsistent clause) and repeatedly specialize the hypothesis by applying deductive inference rules in order to remove inconsistencies with the negative examples • During the search care is taken that the hypotheses remain sufficient with regard to the positive evidence

  34. Agenda • Motivation • Technical Solution • Model Theory of ILP • A Generic ILP Algorithm • Proof Theory of ILP • ILP Systems and Applications • Illustration by a Larger Example • Summary and Conclusions

  35. Proof Theory of ILP • Inductive inference rules can be obtained by inverting deductive ones • Deduction: Given B /\ H |= E+, derive E+ from B /\ H • Induction: Given B /\ H |= E+, derive H from B and E+ • The paradigm of inverting deduction can be studied under various assumptions, corresponding to different assumptions about the deductive rule for |= and the format of background theory B and evidence E+ • Different models of inductive inference are obtained • θ-subsumption • The background knowledge is supposed to be empty, and the deductive inference rule corresponds to θ-subsumption among single clauses • Inverse resolution • Takes into account background knowledge and aims at inverting the resolution principle, the best-known deductive inference rule • Inverting implication

  36. Rules of inductive inference and operators • Inference rules = what can be inferred from what • A well-known problem: the unrestricted application of inference rules results in combinatorial explosions • To control the application of inference rules, the search strategy employs operators that expand a given node in the search tree into a set of successor nodes • A specialisation operator maps a conjunction of clauses G onto a set of maximal specialisations of G; a maximal specialisation S of G is a specialisation of G such that • G is not a specialisation of S • There is no specialisation S’ of G such that S is a specialisation of S’ • A generalisation operator maps a conjunction of clauses S onto a set of minimal generalisations of S; a minimal generalisation G of S is a generalisation of S such that • S is not a generalisation of G • There is no generalisation G’ of S such that G is a generalisation of G’

  37. θ-subsumption • A clause c1 θ-subsumes a clause c2 if and only if there exists a substitution θ such that c1θ ⊆ c2 • c1 is a generalisation of c2 (and c2 a specialisation of c1) under θ-subsumption • Clauses are seen as sets of (positive and negative) literals • The θ-subsumption inductive inference rule is: from a clause c2, induce any clause c1 that θ-subsumes c2 (a small subsumption-test sketch follows below) • For example
      father(X,Y) ← parent(X,Y), male(X)
θ-subsumes
      father(jef,paul) ← parent(jef,paul), parent(jef,ann), male(jef), female(ann)
with θ = {X = jef, Y = paul}
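A minimal θ-subsumption test can be sketched in SWI-Prolog. Clauses are represented here as lists of literals with body literals wrapped in not/1 (a representation chosen for this sketch); grounding the subsumed clause with numbervars/3 forces the search for a single consistent substitution by backtracking:

    % theta_subsumes(+C1, +C2): some substitution maps every literal of C1
    % onto a literal of C2
    theta_subsumes(C1, C2) :-
        copy_term(C1-C2, D1-D2),
        numbervars(D2, 0, _),       % treat c2's variables as constants
        match_all(D1, D2).

    match_all([], _).
    match_all([Lit|Rest], C2) :-
        member(Lit, C2),            % bindings made here constrain later literals
        match_all(Rest, C2).

    % ?- theta_subsumes(
    %        [father(X,Y), not(parent(X,Y)), not(male(X))],
    %        [father(jef,paul), not(parent(jef,paul)), not(parent(jef,ann)),
    %         not(male(jef)), not(female(ann))]).
    % succeeds; the matching substitution corresponds to θ = {X = jef, Y = paul}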

  38. Some properties of θ-subsumption • Implication: If c1 θ-subsumes c2 then c1 |= c2 • The opposite does not hold for self-recursive clauses: let c1 = p(f(X)) ← p(X) and c2 = p(f(f(Y))) ← p(Y); then c1 |= c2 but c1 does not θ-subsume c2 • Deduction using θ-subsumption is therefore not equivalent to implication among clauses • Infinite Descending Chains: there exist infinite descending chains, e.g.
      h(X1,X2) ←
      h(X1,X2) ← p(X1,X2)
      h(X1,X2) ← p(X1,X2), p(X2,X3)
      h(X1,X2) ← p(X1,X2), p(X2,X3), p(X3,X4)
      ...
This series is bounded from below by h(X,X) ← p(X,X)

  39. Some properties of θ-subsumption (2) • Equivalence: there exist different clauses that are equivalent under θ-subsumption • E.g. parent(X,Y) ← mother(X,Y), mother(X,Z) θ-subsumes parent(X,Y) ← mother(X,Y) and vice versa • Reduction: for equivalence classes of clauses there is a unique representative (up to variable renaming) of each clause (the reduced clause) • The reduced clause r of a clause c is a minimal subset of the literals of c such that r is equivalent to c • Lattice: the set of reduced clauses forms a lattice, i.e. any two clauses have a unique lub (the least general generalisation - lgg) and a unique glb
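The lub of two atoms under θ-subsumption is Plotkin's least general generalisation (anti-unification). A minimal sketch for terms/atoms, where the accumulator remembers which pair of differing subterms has already been generalised by which variable (predicate names are choices of this sketch):

    lgg(T1, T2, G) :-
        lgg(T1, T2, G, [], _).

    lgg(T1, T2, T1, S, S) :-
        T1 == T2, !.                                    % identical subterms stay
    lgg(T1, T2, V, S, S) :-
        member(gen(X, Y, V), S), X == T1, Y == T2, !.   % reuse an earlier variable
    lgg(T1, T2, G, S0, S) :-
        nonvar(T1), nonvar(T2),
        T1 =.. [F|Args1], T2 =.. [F|Args2],
        length(Args1, N), length(Args2, N), !,          % same functor and arity
        lgg_args(Args1, Args2, GArgs, S0, S),
        G =.. [F|GArgs].
    lgg(T1, T2, V, S, [gen(T1, T2, V)|S]).              % differing pair -> new variable

    lgg_args([], [], [], S, S).
    lgg_args([A|As], [B|Bs], [G|Gs], S0, S) :-
        lgg(A, B, G, S0, S1),
        lgg_args(As, Bs, Gs, S1, S).

    % ?- lgg(father(jef, paul), father(jef, ann), G).   % G = father(jef, _A)
    % ?- lgg(p(a, a), p(b, b), G).                      % G = p(_A, _A)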

  40. Inverting resolution • Four rules of inverse resolution (stated below) • Lower-case letters are atoms and upper-case letters are conjunctions of atoms
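The four rules, as usually stated in the inverse-resolution literature (p, q, r are atoms; A, B, C are conjunctions of atoms; in the two construction rules q and r are newly invented predicate symbols):

    Absorption:          given  q ← A  and  p ← A, B     conclude  p ← q, B
    Identification:      given  p ← A, B  and  p ← A, q  conclude  q ← B
    Intra-construction:  given  p ← A, B  and  p ← A, C  conclude  q ← B,  p ← A, q,  q ← C
    Inter-construction:  given  p ← A, B  and  q ← A, C  conclude  p ← r, B,  r ← A,  q ← r, C

In each case, resolving the conclusions on the introduced atom (q or r) recovers the premises, which is what makes these rules inversions of resolution.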

  41. Absorption and Identification • Both Absorption and Identification invert a single resolution step • This is shown diagrammatically as a V, with the two premises at the base and on one of the arms • The new clause in the conclusion is the clause found on the other arm of the V • Absorption and Identification are collectively called the V-operators

  42. Inter- and Intra-construction • The rules of Inter- and Intra-construction introduce a new predicate symbol • The new predicate can then be generalised using a V-operator; diagrammatically the construction operators can be shown as two linked V's, or a W, each V representing a resolution • The premises are placed at the two bases of the W and the three conclusions at the top of the W • One of the clauses is shared in both resolutions • Intra- and Inter-construction are collectively called W-operators

  43. V and W operators • The V and W operators have most specific forms • In this form the V-operators realise both generalisation and specialisation, since the conclusions entail the premises • Use of most specific operators is usually implemented as a two-stage operation • In the first phase, inverse resolution operators are applied to examples • In the second phase, clauses are reduced by generalisation through the θ-subsumption lattice

  44. Inverting resolution - Matching subclauses • Just as resolution requires unification to match terms, inverse resolution operators require a matching operation • In the matching operation all clauses, including the examples, are flattened • This involves introducing a new (n+1)-ary predicate for every n-ary function symbol, e.g. the clause member(a,[a,b]) ← becomes
      member(U,V) ← a(U), dot(V,U,X), dot(X,Y,Z), b(Y), nil(Z)
• Each new predicate symbol is then separately defined, e.g. dot([X|Y],X,Y) ← • After flattening, the problem of matching clauses when applying the inverse resolution operators reduces to one-sided matching of clause bodies
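A small runnable check (standard Prolog) that the flattened clause behaves like the original fact; a/1, b/1, nil/1 and dot/3 are the new predicates introduced by flattening, and member_flat is just a name chosen here to avoid clashing with the built-in member/2:

    a(a).
    b(b).
    nil([]).
    dot([X|Y], X, Y).

    % the flattened version of  member(a, [a,b]) <-
    member_flat(U, V) :- a(U), dot(V, U, X), dot(X, Y, Z), b(Y), nil(Z).

    % ?- member_flat(a, [a, b]).   % true, just like member(a, [a, b])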

  45. Agenda • Motivation • Technical Solution • Model Theory of ILP • A Generic ILP Algorithm • Proof Theory of ILP • ILP Systems and Applications • Illustration by a Larger Example • Summary and Conclusions

  46. Characteristics of ILP systems • Incremental/non-incremental: describes the way the evidence E (examples) is obtained • In non-incremental or empirical ILP, the evidence is given at the start and not changed afterwards • Non-incremental systems typically search either specific-to-general or general-to-specific • In incremental ILP, the examples are input one by one by the user, in a piecewise fashion • Incremental systems usually employ a mixture of specific-to-general and general-to-specific strategies, as they may need to correct earlier induced hypotheses • Interactive/non-interactive • In interactive ILP, the learner is allowed to pose questions to an oracle (i.e. the user) about the intended interpretation • Usually these questions query the user for the intended interpretation of an example or a clause • The answers to the queries allow large parts of the search space to be pruned • Most systems are non-interactive

  47. Characteristics of ILP systems (2) • Single/Multiple Predicate Learning/Theory Revision • Suppose P(F) represents the predicate symbols found in formula F • In single predicate learning from examples, the evidence E is composed of examples for one predicate only, i.e. P(E) is a singleton • In multiple predicate learning, P(E) is not restricted, as the aim is to learn a set of possibly interrelated predicate definitions • Theory revision is usually a form of incremental multiple predicate learning, where one starts from an initial approximation of the theory • Numerical Data • With a small number of examples it is hard to get enough examples in which the prediction is an exact number • Instead, rules should predict an interval

  48. Application Areas – Drug design • The majority of pharmaceutical R&D is based on finding slightly improved variants of patented active drugs; this involves laboratories of chemists synthesising and testing hundreds of compounds almost at random • The ability to automatically discover the chemical properties which affect the activity of drugs could provide a great reduction in pharmaceutical R&D costs (the average cost of developing a single new drug is $230 million) • ILP techniques are capable of constructing rules which predict the activity of untried drugs • Rules are constructed from examples of drugs with known medicinal activity • The accuracy of the rules was found to be higher than for traditional statistical methods • More importantly, the easily understandable rules can provide key insights, allowing considerable reductions in the number of compounds that need to be tested

  49. Application Areas – Protein primary-secondary shape prediction • Predicting the three-dimensional shape of proteins from their amino acid sequence is widely believed to be one of the hardest unsolved problems in molecular biology • It is also of considerable interest to pharmaceutical companies, since shape generally determines the function of a protein • ILP techniques have recently had considerable success within this area • Over the last 20 years many attempts have been made to apply methods ranging from statistical regression to decision tree and neural net learning to this problem • Published accuracy results for the general prediction problem have ranged between 50 and 60%, very close to random prediction • With ILP it was found that the ability to make use of biological background knowledge, together with the ability to describe structural relations, boosted the predictivity for a restricted sub-problem from around 70% to around 80% on an independently chosen test set

  50. Application Areas – Satellite diagnosis • ILP techniques have been applied to problems within the aerospace industry • In this case a complete and correct set of rules for diagnosing power supply failures was developed by generating examples from a qualitative model of the power sub-system of an existing satellite • The resulting rules are guaranteed complete and correct for all single faults, since all examples of these were generated from the original model • Rules were described using a simple temporal formalism in which each predicate had an associated time variable
