1 / 56

Chase Methods based on Knowledge Discovery

Chase Methods based on Knowledge Discovery. Agnieszka Dardzinska & Zbigniew W. Ras agnadar@wp.pl & ras@uncc.edu. Algorithm Chase. GIVEN:  Incomplete Information System ( IIS ) Constraints (functional dependencies,..) which IIS satisfies.

lerato
Télécharger la présentation

Chase Methods based on Knowledge Discovery

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chase Methods based on Knowledge Discovery Agnieszka Dardzinska & Zbigniew W. Ras agnadar@wp.pl & ras@uncc.edu

  2. Algorithm Chase GIVEN:  Incomplete Information System (IIS) • Constraints (functional dependencies,..) which IIS satisfies [ Dept  Chair, Chair  Dept  Faculty-Name, … Dept(x1) =Dept(x2) ]

  3. Tableau System for IIS– information system with null values replaced by variables

  4. Variables in Tableaux System • distinguished variables, one for each attribute (if b is an attribute of interest, then vb is the corresponding distinguished variable) • nondistinguished variables (there are countably many of them: n1, n2, n3, ….)

  5. Functional Dependencies: [Department → Chair] [Department *Chair → Faculty Name]

  6. Algorithm Chase Input: tableaux system S and set of functional dependencies F Output: tableaux system CHASEF(S) Begin S1:=S; while there are t1, t2  S1 and (B  b)  F such that t1[B]= t2[B] and t1[b] < t2[b] do change all the occurrences of the value t2[b] in S1 to t1[b] CHASEF(S):=S1 End

  7. Algorithm Chase Input: tableaux system S and set of functional dependencies F Output: tableaux system CHASEF(S) Begin S1:=S; while there are t1, t2  S1 and (B  b)  F such that t1[B]= t2[B] and t1[b] < t2[b] do change all the occurrences of the value t2[b] in S1 to t1[b] CHASEF(S):=S1 End The algorithm always terminates if applied to a finite tableaux system. If one execution of the algorithm generates a tableaux system satisfying F, then every execution of the algorithm generates the same tableaux system.

  8. Algorithm Chase 1

  9. Chase supported by rules extracted from IIS (Chase 1) • Chase 1 identifies all incomplete attributes (their values are called concepts) in IS . 2. Main Algorithm - Extraction of rules from IS describing these concepts, - Null values in IS are replaced by values suggested by these rules. 3. These two steps are repeated till fixpoint is reached.

  10. Example (Chase1) X = {x1, x2, x3, x4, x5, x6, x7, x8, x9, x10} A = {b, c, d, e, f, g} Attribute b e2b1(support 2), c1f1 b1(support 2), g2 b2(support 2), c3 b3(support 1), c2 b2(support 2), g3d2 b2(support 1), e3d3 b3(support 1), f2d2 b2(support 1).

  11. Example (Chase1) Attribute b Two null values in S: b(x6), b(x8) b(x6): e2b1(support 2), c1f1 b1(support 2), g2 b2(support 2), c3 b3(support 1), c2 b2(support 2), g3d2 b2(support 1), e3d3 b3(support 1), f2d2 b2(support 1).

  12. Example (Chase1) Attribute b Two null values in S: b(x6), b(x8) b(x6): e2b1(support 2), c1f1 b1(support 2), g2 b2(support 2), c3 b3(support 1), c2 b2(support 2), g3d2 b2(support 1), e3d3 b3(support 1), f2d2 b2(support 1).

  13. Example (Chase1) Attribute b Two null values in S: b(x6), b(x8) b(x8): e2b1(support 2), c1f1 b1(support 2), g2 b2(support 2), c3 b3(support 1), c2 b2(support 2), g3d2 b2(support 1), e3d3 b3(support 1), f2d2 b2(support 1). b(x6) = b2

  14. Example (Chase1) Two null values in S: c(x7), c(x8). c(x7): b1c1(support 2), e2 c1(support 1), f4 c1(support 1), g1 c1(support 2), b2 d2 c2(support 1), b2e1 c2(support 1), b2f2 c2(support 1), b2g3 c2(support 1), d2e1 c2(support 1), d2g3 c2(support 1). b(x6) = b2

  15. Example (Chase1) Two null values in S: c(x7), c(x8). c(x7): b1c1(support 2), e2 c1(support 1), f4 c1(support 1), g1 c1(support 2), b2 d2 c2(support 1), b2e1 c2(support 1), b2f2 c2(support 1), b2g3 c2(support 1), d2e1 c2(support 1), d2g3 c2(support 1). b(x6) = b2

  16. Example (Chase1) Two null values in S: c(x7), c(x8). c(x8): b1c1(support 2), e2 c1(support 1), f4 c1(support 1), g1 c1(support 2), b2 d2 c2(support 1), b2e1 c2(support 1), b2f2 c2(support 1), b2g3 c2(support 1), d2e1 c2(support 1), d2g3 c2(support 1). b(x6) = b2 , c(x7) = c1

  17. Algorithm Chase1(S, In(A), L(D)) Input:System S=(X, A, V) Set of incomplete attributes In(A)={a1, a2, …, ak} Set of rules L(D) Output:System Chase1(S) begin j:=1; whilej ≤ kdobegin Sj:=S for allvVajdo while there is xX and rule (t v)L(D) such that xNSj(t) and card(aj(x))≠1 begin a(x):=v; end j:=j+1 end S:={Sj:1 ≤ j ≤ k}, Chase1 (S, In(A), L(D)) end

  18. A2 A5 A6 A7 A1 A3 A4 query q3 2 4 2 25 22 22 2 q3 2 3 8 22 28 23 2 q3 22 2 2 25 12 30 3 q3 2 2 32 27 22 2 14 q1 3 4 4 2 8 4 0 q1 4 9 3 8 5 1 3 q1 14 4 3 12 4 2 3 q1 1 3 17 4 14 4 5 q2 4 2 1 4 13 14 13 q2 4 15 8 13 13 1 3 q2 4 12 15 17 13 2 3 q2 14 27 16 3 4 4 13 A1- “the no. of different attributes used in a query" A2- “the percent of null values in a queried IS" A3- “the no. of objects returned by QAS when IS-complete" A4- “the no. of objects returned by QAS when IS-incomplete" (optimistic interpretation) A5- “the no. of objects returned by QAS based on rule-based chase algorithm" (pessimistic interpretation) A6- “the no. of bad objectsretrieved" A7- “the no. of passes of rule-based chase algorithm" ZOO Database

  19. Rules Discovery from partiallyIncomplete Information Systems

  20. Data (Incomplete) Information SystemS = ( X, A, V ) X - finite set of objects, A - finite set of attributes, - set of their values. Assumption 1. For any , 2. For any

  21. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Example

  22. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID for Extracting Rules from partially Incomplete Information System Goal: Describe e in terms of {a,b,c,d}

  23. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID for Extracting Rules from partially Incomplete Information System Goal: Describe e in terms of {a,b,c,d}

  24. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID Goal: Describe e in terms of {a,b,c,d} For the values of the decision attribute we have:

  25. X a b c d e x1 x2 • Check the relationship “ ” between values of classification attributes {a,b,c,d} and values of decision attribute e x3 x4 x5 x6 x7 x8 Algorithm ERID Goal: Describe e in terms of {a,b,c,d}.

  26. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID Goal: Describe e in terms of {a,b,c,d} Let , . We say that: support iff and confidence of the rule are above some threshold values.

  27. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID Goal: Describe e in terms of {a,b,c,d} Let , . We say that: support iff and confidence of the rule are above some threshold values. How to define support and confidence of a rule ?

  28. Definition of Support and Confidence (by example) To define support and confidence of the rule a1 e3 we compute: Support of the rule: Support of the term a1: Confidence of the rule:

  29. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Extracting Rules from partially Incomplete Information System (Algorithm ERID(λ1, λ2)) Goal: Describe e in terms of {a,b,c,d} Thresholds (provided by user): Minimal support (λ1 = 1) Minimal confidence (λ2 = ½) - marked negative - marked negative - marked positive

  30. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID(λ1, λ2) but but but but but but but but but

  31. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID(λ1, λ2) but but but but but but but but but They all are not marked

  32. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID(λ1, λ2) and and

  33. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID(λ1, λ2) and and They all are marked positive.

  34. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID(λ1, λ2) and and They all are marked positive. They all are marked negative.

  35. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 Algorithm ERID(λ1, λ2) The algorithm continues for terms of length 3, 4, … till all of them have either positive or negative marks. Rules are automatically constructed from relations marked positive.

  36. Algorithm Chase 2 (for Partially IIS)

  37. is defined for any , Algorithm Chase 2 • partially incomplete information system of type λ, if S is incomplete and the following three conditions hold:

  38. Algorithm Chase 2 S1, S2 - partially incomplete, both of type λ and both classifying the same sets of objects (from X) using the same sets of attributes (A) Let and satisfies containment relation Ψ (or Ψ(S1)=S2) if: The pair (S1, S2) We also denote that fact by

  39. System S1 System S2 X a b c d e X a b c d e x1 x1 x2 x2 x3 x3 x4 x4 x5 x5 x6 x6 x7 x7 c2 x8 x8

  40. Assumptions: • - information system of type λ • - set of all pairwise independent rules extracted by ERID from S • NS(t) - standard interpretation of term t in S, meaning that: , for any where for any , we have: • In(A) = {a1, … , ak} - incomplete attributes in S

  41. for all do begin if and is a maximal subset of rules from L(D) such that then if then begin end end pj:= pj +nj; end if /containment relation holds between aj(x), [bj(x)/pj]/ then Algorithm Chase 2(S, In(A), L(D)) .................. ..................

  42. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 ERID(λ1, λ2) λ1=1, λ2=0.5 Incomplete Information System S of type λ = 0.3

  43. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 ERID(λ1, λ2) λ1=1, λ2=0.5 Algorithm Chase 2 will try to replace by enew(x1) = {(e1, ), (e2, ), (e3, )}. We will show that Ψ(e(x1)) = enew(x1) (the value e(x1) will be changed by Chase 2). Incomplete Information System S of type λ = 0.3

  44. X a b c d e x1 x2 x3 x4 x5 x6 x7 x8 For x1: we have: Because the confidence assigned to e3 is below the threshold λ, then only two values remain: (e1, 0.48), (e2, 0.97). The value of attribute e assigned to x1 is: {(e1, 0.33), (e2, 0.67)}. Incomplete Information System S of type λ = 0.3

  45. Distributed Chase Algorithm (Chase 3)

  46. Let: - distributed autonomous information systems - information system for any (I - set of sites) - knowledge-base at site • set of (k, i)-rules (constructed at site k and sent to site i)

  47. Strategy for constructing knowledge-base and Algorithm Chase 3 Notation: qS=[a, b, c : d, e] - request by S for definitions of a, b, c with additional information that d, e are complete attributes in S.

  48. Global Ontology S2 S1 r1 r2 qS qS=[a, c, d : b] S KB KBS

  49. Global Ontology S2 S1 r1 r2 qS qS=[a, c, d : b] S r1 r2 KBS

  50. Assumption: Di - granularity level of values of attributes used in rules from Di may differ from the granularity level of values of attribute used in descriptions of objects in Chase 3 algorithm to be applicable to Sihas to be based on rules from Di satisfying the following two conditions: . • attribute value used in the decision part of a rule has the granularity level either equal to or finer than the granularity level of the corresponding attribute in Si . • the granularity level of any attribute used in the classification part of a rule is either equal or softer than the granularity level of the corresponding attribute in Si

More Related