## Learning Sets of Rules


- Introduction
- Sequential Covering Algorithms
- First Order Rules
- Induction as Inverted Deduction
- Inverting Resolution
- Summary

## First Order Rules

- We now consider rules that contain variables (first-order rules).
- More specifically, we will learn first-order Horn theories.
- Learning first-order rules is also known as inductive logic programming.
- A first-order Horn clause is a rule with one or more preconditions and a single consequent; its predicates may contain variables.

## Example

We wish to learn the relation Daughter(x,y), meaning x is the daughter of y. We are given the following training example:

| Name1 | Mother1 | Father1 | Male1 | Female1 |
| --- | --- | --- | --- | --- |
| Sharon | Louise | Bob | False | True |

| Name2 | Mother2 | Father2 | Male2 | Female2 | Daughter1,2 |
| --- | --- | --- | --- | --- | --- |
| Bob | Nora | Victor | True | False | True |

Given many such examples, we could learn the following relation:

If Father(y,x) and Female(x) then Daughter(x,y)

This is more powerful than the propositional approach:

If (Father1 = Bob) and (Name2 = Bob) and (Female1 = True) then (Daughter1,2 = True)

The advantage lies in our ability to express relations among attribute values.

## Terminology

Expressions contain the following:

- Constants: Bob, Louise, etc.
- Variables: x, y, etc.
- Predicates: Daughter, Father
- Functions: age

From these we build:

- Term: a constant, a variable, or a function applied to a term: Bob, x, age(Bob).
- Literal: a predicate applied to terms, or its negation: Married(Bob, Louise).
- Clause: a disjunction of literals.
- Horn clause: a clause containing at most one positive literal: H ∨ ¬L1 ∨ … ∨ ¬Ln.

The Horn clause H ∨ ¬L1 ∨ … ∨ ¬Ln is equivalent to

H ← (L1 ∧ … ∧ Ln)

where H is the consequent (head) and the conjunction of literals (L1 ∧ … ∧ Ln) is the body or antecedent.

Question: how do we learn sets of first-order rules?

## Learning Sets of First Order Rules

- A popular algorithm is FOIL (Quinlan, 1990).
- The method is very similar to sequential covering.
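To make the terminology concrete, here is a minimal Python sketch (the representation and the helper `covers` are my own, not part of FOIL) that checks whether a first-order Horn rule covers a ground instance by searching for a variable binding that makes every body literal a known fact:

```python
from itertools import product

# Ground facts keyed by predicate name (hypothetical toy data).
# Here Father(a, b) is read as "a is the father of b".
facts = {
    "Father": {("Bob", "Sharon"), ("Victor", "Bob")},
    "Female": {("Sharon",), ("Louise",)},
}
constants = {"Sharon", "Bob", "Victor", "Louise"}

def covers(head_vars, body, query_args, facts, constants):
    """True if some binding of the rule's variables grounds every body
    literal to a known fact, with the head bound to query_args."""
    binding = dict(zip(head_vars, query_args))
    # Variables appearing in the body but not in the head are free.
    free = sorted({v for _, args in body for v in args} - set(binding))
    for values in product(sorted(constants), repeat=len(free)):
        b = {**binding, **dict(zip(free, values))}
        if all(tuple(b[v] for v in args) in facts[pred]
               for pred, args in body):
            return True
    return False

# Daughter(x, y) <- Father(y, x) ^ Female(x)
body = [("Father", ("y", "x")), ("Female", ("x",))]
print(covers(("x", "y"), body, ("Sharon", "Bob"), facts, constants))  # True
print(covers(("x", "y"), body, ("Bob", "Victor"), facts, constants))  # False
```

The brute-force search over all bindings is exponential in the number of free variables; it is only meant to show what "the rule covers the example" means for first-order rules.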
## FOIL

FOIL(target-predicate, predicates, examples):

- Pos ← the examples for which target-predicate is true
- Neg ← the examples for which target-predicate is false
- Learned-Rules ← {}
- While Pos is not empty:
  - Learn a new rule, NewRule
  - Learned-Rules ← Learned-Rules + NewRule
  - Pos ← Pos − {members of Pos covered by NewRule}
- Return Learned-Rules

## Learning New Rules

- NewRule ← the most general rule: "If {} then target-predicate"
- CoveredNeg ← Neg
- While CoveredNeg is not empty:
  - candidate-literals ← new literals for NewRule
  - BestLiteral ← argmax over L in candidate-literals of Foil_Gain(L, NewRule)
  - Add BestLiteral to the preconditions of NewRule
  - CoveredNeg ← the subset of CoveredNeg still satisfied by NewRule

## Considerations

- FOIL learns only rules that predict when the target predicate is true; the sequential covering algorithms seen earlier learn rules that predict both true and false.
- FOIL performs a hill-climbing search; sequential covering (e.g., CN2) performs a beam search.
- FOIL rules are more expressive than Horn clauses. Why? Because the preconditions may contain negated literals.
- FOIL performs a specific-to-general search across rules: the disjunction becomes more general as rules are added.
- FOIL performs a general-to-specific search within each rule: it starts with a null precondition and adds literals one at a time (hill climbing).

## Generating Specializations

Assume our current rule is

P(x1, x2, …, xk) ← L1 ∧ … ∧ Ln

where each Li is a literal and P(x1, x2, …, xk) is the head, or postcondition. FOIL considers new literals Ln+1 of the following forms to add to the rule:

- Predicates: Q(v1, …, vr), where Q is a predicate and each vi is an existing or new variable (at least one vi must already be present in the rule).
- Functions: Equal(xj, xk), where xj and xk are already present in the rule.
- Negations of the literals above.

## Example

We wish to learn the target predicate GrandDaughter(x,y). Our predicates are Father(x,y) and Female(x). Our constants are Victor, Sharon, Bob, and Tom.
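FOIL scores each candidate literal with the information-based Foil_Gain measure. Its computation can be sketched directly in Python; the binding counts in the usage line are made up for illustration:

```python
from math import log2

def foil_gain(p0, n0, p1, n1, t):
    """Foil_Gain(L, R) = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0)))

    p0, n0: positive / negative bindings of rule R before adding L
    p1, n1: positive / negative bindings of R' = R with L added
    t:      positive bindings of R still covered after adding L
    """
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# Hypothetical counts: adding the literal keeps 4 of 5 positive
# bindings and cuts the negative bindings from 5 down to 1.
gain = foil_gain(p0=5, n0=5, p1=4, n1=1, t=4)
print(round(gain, 3))  # 2.712
```

The gain is the reduction in bits needed to encode the classification of the positive bindings, weighted by how many positives the specialized rule retains, so it rewards literals that discard negatives without losing positive coverage.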
We start with the most general rule:

GrandDaughter(x,y) ←

Possible literals we could add include Equal(x,y), Female(x), Female(y), Father(x,y), …, and their negations. Assume we find that the best choice is:

GrandDaughter(x,y) ← Father(y,z)

We add the best candidate literal and continue adding literals until we generate a rule such as:

GrandDaughter(x,y) ← Father(y,z) ∧ Father(z,x) ∧ Female(x)

At this point we remove all positive examples covered by the rule and begin the search for a new rule.

## Choosing the Best Literal

Consider the target predicate GrandDaughter(x,y), and consider all bindings of its variables to constants, for example {x/Bob, y/Sharon}. Now compare rule R before adding a literal L with the rule R′ obtained after adding it:

Foil_Gain(L,R) = t [ log2(p1 / (p1 + n1)) − log2(p0 / (p0 + n0)) ]

where

- t: the number of positive bindings of rule R still covered after adding literal L
- p0: the positive bindings of rule R
- n0: the negative bindings of rule R
- p1: the positive bindings of rule R′
- n1: the negative bindings of rule R′

## Learning Recursive Rule Sets

What happens if we include the target predicate in the list of available predicates? Then FOIL can consider it, too, as a candidate literal, and so learn recursive rule sets. Example:

If Parent(x,y) then Ancestor(x,y)
If Parent(x,z) and Ancestor(z,y) then Ancestor(x,y)

## Induction as Inverted Deduction

What is the difference between induction and deduction?

- Induction: inference from the specific to the general.
- Deduction: inference from the general to the specific.

Induction can be cast as a deduction problem as follows: we wish to learn a target function f such that each classification f(xi) follows deductively from the hypothesis h, the instance xi, and the background knowledge B:

(B ∧ h ∧ xi) ⊢ f(xi)

## Example

Learn the target predicate Child(u,v), meaning u is the child of v.
- Positive example: Child(Bob, Sharon)
- Given instance: Male(Bob), Female(Sharon), Father(Sharon, Bob)
- Background knowledge: Parent(u,v) ← Father(u,v)

Two hypotheses satisfying the constraint are:

h1: Child(u,v) ← Father(v,u)
h2: Child(u,v) ← Parent(v,u)

h2 requires the background knowledge, and illustrates the problem of constructive induction.

## Inverting Resolution

Automated deduction uses the resolution rule (Robinson, 1965). Let L be a propositional literal and let P and R be propositional clauses. The resolution rule is:

P ∨ L
¬L ∨ R
──────
P ∨ R

## Example

C1: PassExam ∨ ¬KnowMaterial
C2: KnowMaterial ∨ ¬Study
C: PassExam ∨ ¬Study

If you know C1 and C, how can you induce C2?

## Inverting Resolution

Suppose we have two clauses:

C1: B ∨ D
C2: ??
C: A ∨ B

1. A literal that occurs in C but not in C1 must be present in C2: A.
2. A literal that occurs in C1 but not in C must be the one removed by resolution, so its negation appears in C2: ¬D.

Hence C2: A ∨ ¬D. There are other solutions; what are they? Inverse resolution is not deterministic.

## Summary

- Sequential covering learns a disjunctive set of rules by learning one rule, removing the positive examples it covers, and repeating until all positive examples are covered.
- Examples of sequential covering are the AQ and CN2 families of programs.
- Learning first-order Horn clauses is the problem of inductive logic programming.
- FOIL applies sequential covering to first-order rules.
- Induction can be seen as the inverse of deduction, and programs exist to perform this form of induction.
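The propositional inverse-resolution step in the PassExam example admits a direct implementation. A minimal sketch (the clause representation and the `inverse_resolve` name are my own), which recovers one valid C2 from C1 and C:

```python
def inverse_resolve(c1, c):
    """Given one parent clause c1 and the resolvent c, return one valid
    second parent c2.  Clauses are frozensets of (name, positive) pairs;
    e.g. ("Study", False) stands for ~Study.
    This sketch assumes exactly one literal of c1 was resolved away."""
    removed = c1 - c                      # the literal resolved away from c1
    assert len(removed) == 1
    (name, positive), = removed
    # c2 must contain the literals of c missing from c1, plus the
    # negation of the resolved literal.
    return frozenset(c - c1) | {(name, not positive)}

# C1: PassExam v ~KnowMaterial,   C: PassExam v ~Study
c1 = frozenset({("PassExam", True), ("KnowMaterial", False)})
c = frozenset({("PassExam", True), ("Study", False)})
c2 = inverse_resolve(c1, c)
print(c2 == {("KnowMaterial", True), ("Study", False)})  # True, i.e. KnowMaterial v ~Study
```

As noted above, this is only one of several valid choices for C2 (for instance, C2 may also repeat literals of C1); inverse resolution is not deterministic, and this sketch simply commits to the smallest candidate.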