1 / 73

Deductive databases

Deductive databases. Toon Calders t.calders@tue.nl. Motivation: Deductive DB. Motivation is two-fold: add deductive capabilities to databases; the database contains: facts (intensional relations) rules to generate derived facts (extensional relations) Database is knowledge base

hammer
Télécharger la présentation

Deductive databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Deductive databases Toon Calders t.calders@tue.nl

  2. Motivation: Deductive DB Motivation is two-fold: add deductive capabilities to databases; the database contains: facts (intensional relations) rules to generate derived facts (extensional relations) Database is knowledge base Extend the querying datalog allows for recursion

  3. Motivation: Deductive DB Datalog as engine of deductive databases similarities with Prolog has facts and rules rules define -possibly recursive- views Semantics not always clear safety negation recursion

  4. Outline Syntax of the Datalog language Semantics of a Datalog program Relational algebra = safe Datalog with negation and without recursion Optimization techniques Conclusions

  5. Syntax of Datalog Datalog query/program: facts  traditional relational tables rules  define intensional views Rules if-then rules can contain recursion can contain negations Semantics of program can be ambiguous

  6. Syntax of Datalog Example father(X,Y) :- person(X,m), parent(X,Y). grandson(X,Y) :- parent(Y,Z), parent(Z,X), person(X,m). hbrothers(X,Y) :- person(X,m), person(Y,m), parent(Z,X), parent(Z,Y). person parent Name Sex Parent Child ingrid f ingrid toon alex m alex toon rita f rita an jan m jan an toon m an bernd an f an mattijs bernd m toon bernd mattijs m toon mattijs

  7. Syntax of Datalog Variables: X, Y Constants: m, f, rita, … Positive literal: p(t1,…,tn) p is the name of a relation (EDB or IDB) t1, …, tn constants or variables Negative literal: not p(t1, …, tn) Rule: h :- l1, …, ln h positive literal, l1, …, ln literals In Datalog: Correct negation ( In contrast to Prolog’s negation by failure )

  8. Syntax of Datalog Rule can be recursive Arithmetic operations considered as special predicates A<B : smaller(A,B) A+B=C : plus(A,B,C)

  9. Outline Syntax of the Datalog language Semantics of a Datalog program non-recursive recursive datalog aggregation Relational algebra = safe Datalog with negation and without recursion Optimization techniques Conclusions

  10. Semantics of Non-Recursive Datalog Programs Ground instantiation of a rule h :- l1, …, ln : replace every variable in the rule by a constant Example: father(X,Y) :- person(X,m), parent(X,Y) instantiation: father(toon,an) :- person(toon,m), parent(toon,an).

  11. Semantics of Non-Recursive Datalog Programs Let I be a set of facts The body of a rule instantiation R’ is satisfied by I if: every positive literal in the body of R’ is in I no negative literal in the body of R’ is in I Example: person(toon,m), parent(toon,an) not satisfied by the facts given before

  12. Semantics of Non-Recursive Datalog Programs Let I be a set of facts R is a rule h :- l1, …, ln Infer(R,I) = { h’ : h’ :- l1’, …, ln’ is a ground instantiation of R l1’ … ln’ is satisfied by I } R={R1, …, Rn}Infer(R,I) = Infer(R1,I)  …  Infer(Rn,I)

  13. Semantics of Non-Recursive Datalog Programs A rule h :- l1, …, ln is in layer 1: l1, …, ln only involve extensional predicates A rule h :- l1, …, ln is in layer i for all 0<j<i, it is not in layer j l1, …, ln only involve predicates that are extensional and in the layers 1, …, i-1

  14. Semantics of Non-Recursive Datalog Programs Let I0 be the facts in a datalog program Let R1 be the rules at layer 1 … Let Rn be the rules at layer n I1 = I0 Infer(R1, I0) I2 = I1 Infer(R2, I1) … In = In-1 Infer(Rn, In-1)

  15. Semantics of Non-Recursive Datalog Programs Example: person parent Name Sex Parent Child ingrid f ingrid toon alex m alex toon rita f rita an jan m jan an toon m an bernd an f an mattijs bernd m toon bernd mattijs m toon mattijs father(X,Y) :- person(X,m), parent(X,Y). grandfather(X,Y) :- father(X,Y), parent(Y,Z). hbrothers(X,Y) :- person(X,m), person(Y,m), parent(Z,X), parent(Z,Y).

  16. Semantics of Non-Recursive Datalog Programs Example: person parent Name Sex Parent Child ingrid f ingrid toon alex m alex toon rita f rita an jan m jan an toon m an bernd an f an mattijs bernd m toon bernd mattijs m toon mattijs Stratum 0 father(X,Y) :- person(X,m), parent(X,Y). grandfather(X,Y) :- father(X,Y), parent(Y,Z). hbrothers(X,Y) :- person(X,m), person(Y,m), parent(Z,X), parent(Z,Y).

  17. Semantics of Non-Recursive Datalog Programs Example: person parent Name Sex Parent Child ingrid f ingrid toon alex m alex toon rita f rita an jan m jan an toon m an bernd an f an mattijs bernd m toon bernd mattijs m toon mattijs Stratum 1 father alex toon jan an toon bernd toon mattijs hbrothers bernd mattijs mattijs bernd mattijs mattijs bernd bernd Stratum 0 father(X,Y) :- person(X,m), parent(X,Y). grandfather(X,Y) :- father(X,Y), parent(Y,Z). hbrothers(X,Y) :- person(X,m), person(Y,m), parent(Z,X), parent(Z,Y).

  18. Semantics of Non-Recursive Datalog Programs Example: person parent Name Sex Parent Child ingrid f ingrid toon alex m alex toon rita f rita an jan m jan an toon m an bernd an f an mattijs bernd m toon bernd mattijs m toon mattijs Stratum 1 Stratum 2 fathergrandfather alex toon jan mattijs jan an jan bernd toon bernd alex mattijs toon mattijs alex bernd hbrothers bernd mattijs mattijs bernd mattijs mattijs bernd bernd Stratum 0 father(X,Y) :- person(X,m), parent(X,Y). grandfather(X,Y) :- father(X,Y), parent(Y,Z). hbrothers(X,Y) :- person(X,m), person(Y,m), parent(Z,X), parent(Z,Y).

  19. Caveat: Correct Negation Negation in Datalog  Negation in Prolog Prolog negation (Negation by failure): not(p(X)) “is true if we fail to prove p(X)” Datalog negation (Correct negation): not(p(X)) “binds X to a value such that p(X) does not hold.”

  20. Caveat: Correct Negation Example: father(a,b). person(a). person(b). nfather(X) :- person(X), not( father (X,Y) ), person(Y). Datalog: ? nfather(X)  { (a), (b) } Prolog: ? nfather(X)  { (b) } ? person(a), not(father(a,a)), person(a)  yes

  21. Caveat: Correct Negation Prolog: Order of the clauses is important nfather(X) :- person(X), not( father (X,Y) ), person(Y). versus nfather(X) :- person(X), person(Y), not( father (X,Y) ). Order of the rules is important Datalog: Order not important More declarative

  22. Caveat: Correct Negation Difference is not fundamental: Prolog: nfather(X) :- person(X), not( father (X,Y) ).  Datalog: nfather(X) :- person(X), not(father_of_someone(X)). father_of_someone(X) :- father (X,Y).

  23. Caveat: Correct Negation Difference is not fundamental: Many systems that claim to implement Datalog, actually implement negation by failure. Debating on whether or not this is correct is pointless; both perspectives are useful Check on beforehand how an engine implements negation … Throughout the course, in all exercises, in the exam, …, we assume correct negation.

  24. Safety A rule can make no sense if variables appear in funny ways Examples: S(x) :- R(y) S(x) :- not R(x) S(x) :- R(y), x<y In each of these cases the result is infinite even if the relation R is finite

  25. Safety • Even when not leading to infinite relations, such Datalog Programs can be domain-dependent. Example: s(a,b). s(a,a). r(a). r(b). t(X) :- not(s(X,Y)), r(X). If domain is {a,b}: only t(b) holds. If domain is {a,b,c} not only t(b), but also t(a) holds ( Ground instantiation t(a) :- not(s(a,c)), r(a). )

  26. Safety Therefore, we will only consider rules that are safe. A rule h :- l1, …, ln is safe if: every variable in the head of the rule: also in a non-arithmetic positive literale in body every variable in a negative literal of the body: also in some positive literal of the body

  27. Model-Theoretic Semantics A model M of a Datalog program is: An instatiation of all intensional relations in the program That satisfies all rules in the program If the body of a ground instantiation of a rule holds in M, also the head must hold Some models are special …

  28. Model-Theoretic Semantics father(a,b). person(X) :- father(X,Y). person(Y) :- father(X,Y). M1: father person a b a b M2: father person a b a b a b a a

  29. Model-Theoretic Semantics A model is minimal if « we cannot remove tuples » M1: father person a b a b M2: father person a b a b a b a a Minimal Not Minimal

  30. Model-Theoretic Semantics For non-recursive, safe datalog programs semantics is well defined The model = all facts that can be derived from the program Closed-World Assumption: if a fact cannot be derived from the database, then it is nottrue Is a minimal model

  31. Model-Theoretic Semantics Minimal model is, however, not necessarily unique Example: r(a). t(X) :- r(X), not s(X). minimal models: M1 M2 r s t r s t a a a a

  32. Outline Syntax of the Datalog language Semantics of a Datalog program non-recursive recursive datalog aggregation Relational algebra = safe Datalog with negation and without recursion Optimization techniques Conclusions

  33. Semantics of Recursive Datalog Programs g(a,b). g(b,c). g(a,d). reach(X,X) :- g(X,Y). reach(Y,Y) :- g(X,Y) reach(X,Y) :- g(X,Y). reach(X,Z) :- reach(X,Y), reach(Y,Z). Fixpoint of a set of rules R, starting with set of facts I: repeat Old_I := II := I  infer(R,I) until I = Old_I Always termination (inflationary fixpoint)

  34. Semantics of Recursive Datalog Programs g(a,b). g(b,c). g(a,d). reach(X,X) :- g(X,Y). reach(Y,Y) :- g(X,Y) reach(X,Y) :- g(X,Y). reach(X,Z) :- reach(X,Y), reach(Y,Z). Step 0: reach = { } Step 1: reach = { (a,a), (b,b), (c,c), (d,d), (a,b), (b,c), (a,d) } Step 2: reach = {(a,a), (b,b), (c,c), (d,d), (a,b), (b,c), (a,d), (a,c) } Step 3: reach = {(a,a), (b,b), (c,c), (d,d), (a,b), (b,c), (a,d), (a,c) } STOP

  35. Semantics of Recursive Datalog Programs Datalog without negation: Always a unique minimal model. Semantics of recursive datalog with negation is less clear. Example: T(a). R(X) :- T(X), not S(X). S(X) :- T(X), not R(X). What about R(a)? S(a)?

  36. Semantics of Recursive Datalog Programs For some classes of Datalog queries with negation still a natural semantics can be defied Important class: stratified programs T depends on S if some rule with T in the head contains S or (recursively) some predicate that depends on S, in the body. Stratified program: If T depends on (notS), then S cannot depend on T or (notT).

  37. Semantics of Recursive Datalog Programs The program: T(a). R(X) :- T(X), not S(X). S(X) :- T(X), not R(X). is not stratified; R depends negatively on S S depends negatively on R R T S

  38. Semantics of Recursive Datalog Programs g(a,b). g(b,c). g(a,d). reach(X,X) :- g(X,Y). reach(Y,Y) :- g(X,Y). reach(X,Y) :- g(X,Y). reach(X,Z) :- reach(X,Y), reach(Y,Z). node(X) :- g(X,Y). node(Y) :- g(X,Y). unreach(X,Y) :- node(X), node(Y), not reach(X,Y). g reach node unreach

  39. Semantics of Recursive Datalog Programs If a program is stratified, the tables in the program can be partitioned into strata: Stratum 0: All database tables. Stratum I: Tables defined in terms of tables in Stratum I and lower strata. If T depends on (not S), S is in lower stratum than T.

  40. Semantics of Recursive Datalog Programs g(a,b). g(b,c). g(a,d). reach(X,X) :- g(X,Y). reach(Y,Y) :- g(X,Y). reach(X,Y) :- g(X,Y). reach(X,Z) :- reach(X,Y), reach(Y,Z). node(X) :- g(X,Y). node(Y) :- g(X,Y). unreach(X,Y) :- node(X), node(Y), not reach(X,Y). 0 g reach node unreach 1 2

  41. Semantics of Recursive Datalog Programs Semantics of a stratified program given by: First, compute the least fixpoint of all tables in Stratum 1. (Stratum 0 tables are fixed.) Then, compute the least fixpoint of tables in Stratum 2; then the lfp of tables in Stratum 3, and so on, stratum-by-stratum.

  42. Semantics of Recursive Datalog Programs Fixpoint of a set of rules R, starting with set of facts I: repeat Old_I := II := I  infer(R,I) until I = Old_I Fixpoint within one stratum always terminates Due to monotonicity within the strata Only positive dependence between tables in stratum l. Due to finite program, number of strata isfinite as well

  43. Semantics of Recursive Datalog Programs Stratum 0: g(a,b). g(b,c). g(a,d). Stratum 1: node(a), node(b), node(c), node(d),reach(a,a), reach(b,b), reach(c,c), reach(d,d), reach(a,b), reach(b,c), … Stratum 2: unreach(b,a), unreach(c,a), …

  44. Outline Syntax of the Datalog language Semantics of a Datalog program non-recursive recursive datalog aggregation Relational algebra = safe Datalog with negation and without recursion Optimization techniques Conclusions

  45. Aggregate Operators The < … > notation in the head indicates grouping; the remaining arguments (X, in this example) are the GROUP BY fields. In order to apply such a rule, must have all of relation g available. Stratification with respect to use of < … > is similar to negation. Degree(X, SUM(<Y>)) :- g(X,Y).

  46. Aggregate Operators bi(X,Y) :- g(X,Y). g bi(Y,X) :- g(X,Y). Degree(X, SUM(<Y>)) :- bi(X,Y). bi degree

  47. Aggregate Operators bi(X,Y) :- g(X,Y). g bi(Y,X) :- g(X,Y). Degree(X, SUM(<Y>)) :- bi(X,Y). bi degree 0 1 2

  48. Aggregate Operators bi(X,Y) :- g(X,Y). g bi(Y,X) :- g(X,Y). Degree(X, SUM(<Y>)) :- bi(X,Y). bi degree Compute stratum by stratum Assume strata 1  k fixed when computing k+1 0 1 2

  49. Aggregate Operators r(a,b). r(a,c). s(a,d). t(X,SUM(<Y>)) :- r(X,Y). r(X,Y) :- t(X,Z), Z=2, s(X,Y).

  50. Aggregate Operators r(a,b). r(a,c). s(a,d). t t(X,SUM(<Y>)) :- r(X,Y). r(X,Y) :- t(X,Z), Z=2, s(X,Y). r s

More Related