
Solving Failing Queries *)

Solving Failing Queries *). Zbigniew W. Ras University of North Carolina Charlotte, N.C. 28223, USA ras@uncc.edu. Failing Query Problems. Problem 1 . Given S(A) with hierarchical attributes and query q(B) that returns an empty answer, how can one relax query’s constraints


Presentation Transcript


  1. Solving Failing Queries*) Zbigniew W. Ras University of North Carolina Charlotte, N.C. 28223, USA ras@uncc.edu

  2. Failing Query Problems. Problem 1. Given S(A) with hierarchical attributes and a query q(B) that returns an empty answer, how can one relax the query's constraints so that it returns a non-empty set of tuples? Assumption: S(A) is an information system based on attributes from A; q(B) is a query based on attributes from B. Query q(B) is not local for system S(A) if [B ⊄ A]. Problem 2. Given S(A), which represents one of the sites of a distributed autonomous information system, and a non-local query q(B) submitted to S(A), where A ∩ B ≠ ∅, how can q(B) be modified so that it can be answered?

  3. Failing Query Problem 1. [Cooperative Query Answering] [Papers by: Minker, Chu, Gaasterland, Demolombe, Muslea]
     Attribute hierarchies:
       age: young (18 … 29), middle-aged (30 … 60), old (61 … 80)
       salary: low (10k … 40k), medium (50k, 60k, 70k), high (80k … 100k)
     Example of a query: (age, 18) ∧ (salary, 40k)
     Possible relaxations:
       (age, young) ∧ (salary, 40k)
       (age, 18) ∧ (salary, low)
       (age, young) ∧ (salary, low)
     Preference for relaxation: [1 - age, 2 - salary]
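
The relaxation order on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: salaries are written in thousands, the ranges 18 … 29 and 10k … 40k are read as integer ranges, and the names `HIERARCHY`, `relaxations`, `first_nonempty`, and the two sample rows are hypothetical, not part of the presentation.

```python
from itertools import combinations

# Hierarchies from the slide (salary in thousands; integer ranges are an assumption).
HIERARCHY = {
    "age": {"young": range(18, 30), "middle-aged": range(30, 61), "old": range(61, 81)},
    "salary": {"low": range(10, 41), "medium": {50, 60, 70}, "high": range(80, 101)},
}

def generalize(attr, value):
    """Replace a leaf value by the label of the group that contains it."""
    for label, members in HIERARCHY[attr].items():
        if value in members:
            return label
    raise ValueError(f"{value!r} is not covered by the hierarchy of {attr}")

def relaxations(query, preference):
    """List relaxed queries: fewer generalized attributes first, then preference order."""
    result = []
    for size in range(1, len(preference) + 1):
        for attrs in combinations(preference, size):
            relaxed = dict(query)
            for attr in attrs:
                relaxed[attr] = generalize(attr, query[attr])
            result.append(relaxed)
    return result

def matches(row, query):
    """A label (str) matches any member of its group; a leaf value must be equal."""
    for attr, wanted in query.items():
        if isinstance(wanted, str):
            if row[attr] not in HIERARCHY[attr][wanted]:
                return False
        elif row[attr] != wanted:
            return False
    return True

def first_nonempty(rows, query, preference):
    """Try the original query, then each relaxation, until some rows match."""
    for candidate in [query] + relaxations(query, preference):
        answer = [row for row in rows if matches(row, candidate)]
        if answer:
            return candidate, answer
    return None, []

# Hypothetical data, used only to exercise the sketch.
ROWS = [{"age": 25, "salary": 40}, {"age": 18, "salary": 30}]
QUERY = {"age": 18, "salary": 40}
print(first_nonempty(ROWS, QUERY, ["age", "salary"]))
```

With preference [age, salary], the candidates are produced in the same order as the three relaxations listed on the slide.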

  4. Failing Query Problem 1. q = a[1,2]*c1 submitted to S1 fails (no objects in S1 satisfy q). Solution: q can be generalized by QAS to q1 = a1*c1, which matches objects x3 and x5 in S1. Question: Which of these two objects (x3 or x5) is closer to q? [Table: Information System S1] Attribute a is hierarchical; its structure in Lisp-like notation is a(a1(a[1,1], a[1,2]), a2(a[2,1], …)).

  5. Failing Query Problem 1. q = a[1,2]*c1 submitted to S1 fails (no objects in S1 satisfy q). Question: Which of these two objects (x3 or x5) is closer to q?
     Let k ≤ m. Then the distance is:
       δa(a[i(1), i(2), …, i(k)], a[j(1), j(2), …, j(m)]) = 1/2^n if [i(1) = j(1) ∧ … ∧ i(n) = j(n) ∧ [n = k ≤ m ∨ i(n+1) ≠ j(n+1)]], else 0
       δ(xi, xj) = δa + δb + δc + δe
     [Table: Information System S1]
     δ(q, x3) = ½ + 1 + 1 + 1 = 3½
     δ(q, x5) = ½ + 1 + 1 + 1 = 3½
     Result: both are OK.

  6. Failing Query Problem 1. [Muslea, KDD'04] On-line, query-guided algorithm for relaxing failing DNF queries.
     Example: A = {Price, CPU, Display, Weight}.
     Failing query q(A) = [Price ≤ $2000] ∧ [CPU ≥ 2.5GHz] ∧ [Display ≥ 17''] ∧ [Weight ≤ 3lbs].
     A randomly chosen small subset of the target DB is used to discover implicit relationships between values of the attributes used in the query.
     Discovered rules:
       r1 = [[Price ≤ $2900] ∧ [Display ≥ 18''] ∧ [Weight ≤ 4lbs] → [CPU ≥ 2.5GHz]]
       r2 = [[Price $3500] → [CPU ≥ 2.5GHz]]
       r3 = …….
     A nearest-neighbor technique is used to identify which rule is most similar to the failing query. Assume that r1 is such a rule.
     Relaxed query: [Price ≤ $2900] ∧ [CPU ≥ 2.5GHz] ∧ [Display ≥ 17''] ∧ [Weight ≤ 4lbs].

  7. Failing Query Problem 2. [Collaborative Query Answering] [Papers: Ras, Zemankova, Stolfo, Maitan, Zytkow, Dardzinska] Example of a non-local query. Database: Flights(airline; departure time; arrival time; departure airport; arrival airport). select * from Flights where airline = "Delta" and departure time = "morning" and departure airport = "Charlotte" and aircraft = "Boeing". The query is non-local because aircraft is not an attribute of Flights.

  8. Query Processing in Collaborative Systems. [Tables: System S1, System S]
     q = a1*b1*e1 submitted to S fails, because attribute e is not in S (clearly b[1,1] is also b1).
     Find definition of e1 in S1: b1 → e1; c1 → e1; a[1,2] → e1.
     q = a1*b1*e1 ⊇ a1*b1*(b1 + c1 + a[1,2]) = a1*b1 + a1*b1*c1 + a1*b1*a[1,2] = a1*b1.
     Objects y3, y4 satisfy the query q.

  9. Query Processing in Collaborative Systems. [Tables: System S, System S1]
     q = a[1,2]*b[1,1] submitted to S1 fails because of the granularity of b.
     Find definition of b[1,1] in S: a1*c2 → b[1,1].
     q = a[1,2]*b[1,1] ⊇ a[1,2]*a1*c2 = a[1,2]*c2.
     Objects x1, x2 satisfy the query q.

  10. Failing Query Problem 2

  11. Failing Query Problem 2. Query Processing in Incomplete IS.
      S = (X, A, V) is a partially incomplete information system of type λ, if the following two conditions hold:
      X is a set of objects, A is a set of attributes, Va is a set of values of attribute a, where a ∈ A, and V = ∪{Va : a ∈ A},
      • for any x ∈ X, a ∈ A, if aS(x) is defined, then [aS(x) ∈ Va or aS(x) = {(vi, pi): 1 ≤ i ≤ m}],
      • if [aS(x) = {(vi, pi): 1 ≤ i ≤ m}], then [Σ{i=1…m} pi = 1 and (∀i)(pi ≥ λ)].
      Also, if [aS(x) = v], then the value v has the same meaning as {(v, 1)}.
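
The two conditions can be read as a validity check on a single attribute value. The sketch below is a minimal illustration, assuming a weighted value is stored as a Python dict from values to weights and an undefined value as `None`. The function names and the requirement that every weighted value belongs to Va are assumptions, not taken from the slides.

```python
from fractions import Fraction as F

def normalize(value):
    """A plain value v has the same meaning as {(v, 1)}."""
    return value if isinstance(value, dict) else {value: F(1)}

def is_valid_value(value, domain, lam):
    """Check one attribute value against the conditions for type lambda."""
    if value is None:               # aS(x) is undefined: nothing to check
        return True
    if not isinstance(value, dict):
        return value in domain      # plain value must come from Va
    weights = list(value.values())
    return (
        set(value) <= set(domain)   # assumption: weighted values also come from Va
        and sum(weights) == 1       # the weights sum to 1
        and all(p >= lam for p in weights)  # every weight is at least lambda
    )

Va = {"a1", "a2", "a3"}
print(is_valid_value({"a1": F(1, 3), "a2": F(2, 3)}, Va, F(1, 4)))
print(is_valid_value({"a1": F(1, 3), "a2": F(2, 3)}, Va, F(1, 2)))
```

Exact fractions are used so that the test "weights sum to 1" is not affected by floating-point rounding.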

  12. Incomplete Information System. [Table: objects x1 … x8 described by attributes a, b, c, d, e]
      Queries: q1(a,b) = a1*b1, q2(a,b) = a1 + b1
      J(a1) = {(x1,1/3), (x3,1), (x5,2/3)}
      J(b1) = {(x1,2/3), (x2,1/3), (x4,1/2), (x5,1), (x7,1/4)}
      What about J(a1*b1) = J(a1) ⊗ J(b1), J(a1 + b1) = J(a1) ⊕ J(b1)?

  13. Interpretations for ⊗ and ⊕. Assume that J(a1) = {(xi, pi): i ∈ K} and J(b1) = {(xi, qi): i ∈ K}.
      Interpretation T0:
        J(a1) ⊗₀ J(b1) = {(xi, S1(pi, qi)): i ∈ K}, where S1(pi, qi) = [if max(pi, qi) = 1, then min(pi, qi), else 0]
        J(a1) ⊕₀ J(b1) = {(xi, S2(pi, qi)): i ∈ K}, where S2(pi, qi) = [if min(pi, qi) = 0, then max(pi, qi), else 1]
      Interpretation T1:
        J(a1) ⊗₁ J(b1) = {(xi, max{0, pi + qi - 1}): i ∈ K}
        J(a1) ⊕₁ J(b1) = {(xi, min{1, pi + qi}): i ∈ K}
      Interpretation T2:
        J(a1) ⊗₂ J(b1) = {(xi, [pi·qi]/[2 - (pi + qi - pi·qi)]): i ∈ K}
        J(a1) ⊕₂ J(b1) = {(xi, [pi + qi]/[1 + pi·qi]): i ∈ K}

  14. Interpretations for ⊗ and ⊕.
      Interpretation T3:
        J(a1) ⊗₃ J(b1) = {(xi, pi·qi): i ∈ K}
        J(a1) ⊕₃ J(b1) = {(xi, pi + qi - pi·qi): i ∈ K}
      Interpretation T4:
        J(a1) ⊗₄ J(b1) = {(xi, [pi·qi]/[pi + qi - pi·qi]): i ∈ K}
        J(a1) ⊕₄ J(b1) = {(xi, [pi + qi - 2·pi·qi]/[1 - pi·qi]): i ∈ K}
      Fuzzy Interpretation T5:
        J(a1) ⊗₅ J(b1) = {(xi, min{pi, qi}): i ∈ K}
        J(a1) ⊕₅ J(b1) = {(xi, max{pi, qi}): i ∈ K}
      Another possible interpretation T:
        J(a1) ⊗₃ J(b1) = {(xi, pi·qi): i ∈ K}
        J(a1) ⊕₅ J(b1) = {(xi, max{pi, qi}): i ∈ K}
      Interpretations T0, T5, T satisfy the property: a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)
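
As a rough illustration, several of these interpretations can be applied to the sets J(a1) and J(b1) from the Incomplete Information System slide. The sketch assumes that an object missing from one set has weight 0 there and that pairs with weight 0 are dropped from the result. The names `INTERPRETATIONS`, `combine`, and `is_distributive` are hypothetical. T4 is left out because its formulas need a convention for 0/0 that the slides do not give, and the distributivity check below is only run for T5, T, and T3 on a small grid of weights.

```python
from fractions import Fraction as F
from itertools import product

# (operation for *, operation for +) per interpretation, transcribed from the slides.
INTERPRETATIONS = {
    "T0": (lambda p, q: min(p, q) if max(p, q) == 1 else F(0),
           lambda p, q: max(p, q) if min(p, q) == 0 else F(1)),
    "T1": (lambda p, q: max(F(0), p + q - 1), lambda p, q: min(F(1), p + q)),
    "T2": (lambda p, q: p * q / (2 - (p + q - p * q)),
           lambda p, q: (p + q) / (1 + p * q)),
    "T3": (lambda p, q: p * q, lambda p, q: p + q - p * q),
    "T5": (min, max),
    "T": (lambda p, q: p * q, max),
}

def combine(ja, jb, op):
    """Apply op object by object; a missing object counts as weight 0 (assumption)."""
    result = {}
    for x in sorted(set(ja) | set(jb)):
        w = op(ja.get(x, F(0)), jb.get(x, F(0)))
        if w > 0:
            result[x] = w
    return result

def is_distributive(name, grid):
    """Check a*(b+c) == (a*b)+(a*c) for every triple of weights in grid."""
    conj, disj = INTERPRETATIONS[name]
    return all(conj(a, disj(b, c)) == disj(conj(a, b), conj(a, c))
               for a, b, c in product(grid, repeat=3))

J_A1 = {"x1": F(1, 3), "x3": F(1), "x5": F(2, 3)}
J_B1 = {"x1": F(2, 3), "x2": F(1, 3), "x4": F(1, 2), "x5": F(1), "x7": F(1, 4)}
GRID = [F(0), F(1, 4), F(1, 2), F(3, 4), F(1)]

for name in ("T1", "T3", "T5"):
    print(name, combine(J_A1, J_B1, INTERPRETATIONS[name][0]))
for name in ("T5", "T", "T3"):
    print(name, "distributive on grid:", is_distributive(name, GRID))
```

Under these assumptions, J(a1*b1) keeps x1 with weight 2/9 under T3 and 1/3 under T5, while T1 drops x1 entirely, which shows how much the answer to the same query depends on the chosen interpretation.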

  15. Failing Query Problem 2. Incomplete IS [S2 is finer than S1]. Assume:
      • S1, S2 are partially incomplete IS of type λ
      • The same objects are stored in both systems
      • The same attributes are used to describe objects
      • aS1(x) = {(a1i, p1i): 1 ≤ i ≤ m1}, aS2(x) = {(a2i, p2i): 1 ≤ i ≤ m2}

  16. Incomplete Information System. S2 is finer than S1 if:
      (∀x ∈ X)(∀a ∈ A)[card(aS1(x)) ≥ card(aS2(x))]
      (∀x ∈ X)(∀a ∈ A)[[card(aS1(x)) = card(aS2(x))] → [Σ{i≠j} |p2i - p2j| > Σ{i≠j} |p1i - p1j|]]
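
One literal reading of this definition is sketched below: for every object and attribute, the value in S2 must not have more alternatives than the value in S1, and when both have the same number of alternatives the weights in S2 must be more spread out (the sum of |pi - pj| over all ordered pairs i ≠ j must be strictly larger). Storing a system as a dict keyed by (object, attribute), and the names `spread`, `value_is_finer`, and `is_finer`, are assumptions made only for this example.

```python
from fractions import Fraction as F
from itertools import permutations

def spread(value):
    """Sum of |pi - pj| over all ordered pairs i != j of weights in one value."""
    return sum(abs(p - q) for p, q in permutations(value.values(), 2))

def value_is_finer(v1, v2):
    """Compare one attribute value of S1 (v1) with the same value in S2 (v2)."""
    if len(v1) != len(v2):
        return len(v1) > len(v2)      # S2 must not have more alternatives
    return spread(v2) > spread(v1)    # equal cardinality: S2 must be more decisive

def is_finer(s1, s2):
    """True if s2 is finer than s1 on every (object, attribute) pair of s1."""
    return all(value_is_finer(s1[key], s2[key]) for key in s1)

# Hypothetical systems over the same objects and attribute a.
S1 = {("x1", "a"): {"a1": F(1, 2), "a2": F(1, 2)},
      ("x2", "a"): {"a1": F(1, 3), "a2": F(2, 3)}}
S2 = {("x1", "a"): {"a1": F(3, 4), "a2": F(1, 4)},
      ("x2", "a"): {"a1": F(1)}}
print(is_finer(S1, S2), is_finer(S2, S1))
```

Because the second condition uses a strict inequality, this reading never reports a system as finer than itself.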

  17. S2 finer than S1. [Tables: systems S1 and S2, each describing objects x1 … x8 by attributes a, b, c, d, e]

  18. Failing Queries in Collaborative IS
      • Assume: query q = q(B) is submitted to S = (X, A, V), where:
        • B is the set of all attributes used in q
        • A ∩ B ≠ ∅
      • Attributes in B \ (A ∩ B) are foreign for S
      • Two information systems can collaborate if they agree on the ontology of some of their common attributes
      • The granularity of values of attributes used in a query q may differ from the granularity of values of the same attributes in S

  19. Failing Queries in Collaborative IS. Query q(B) can be processed at site S by discovering definitions of values of attributes from B \ (A ∩ B) at some of the remote sites for S. With each certain rule discovered at a remote site, a number of additional rules can also be discovered.

  20. Failing Query Problem 2. Example
      age (child (≤ 17), young (18, … , 29), middle-aged (30, … , 60), old (61, … , 80), senile (≥ 81))
      salary (low (0, … , 40K), medium (50K, … , 70K), high (80K, … , 100K), very-high (> 100K))
      (age, young) ∧ (salary, 40K)
      (age, young) ∧ (salary, low)
      (age, N) ∧ (salary, 40K)
      (age, N) ∧ (salary, low)

  21. Failing Queries in Collaborative IS. S = (X, A, V) is the client site, A = {a, b, d, …}, c ∉ A.
      Va = {a1, a2, a3}
      Vb = {b[1,1], b[1,2], b[1,3], b[2,1], b[2,2], b[2,3], b[3,1], b[3,2], b[3,3]}
      Vd = {d1, d2, d3}
      Semantics of hierarchical attributes {a, b, c, d} used by S and systems collaborating with S:
      • a(a1[a[1,1], a[1,2], a[1,3]], a2[a[2,1], a[2,2], a[2,3]], a3[a[3,1], a[3,2], a[3,3]])
      • b(b1[b[1,1], b[1,2], b[1,3]], b2[b[2,1], b[2,2], b[2,3]], b3[b[3,1], b[3,2], b[3,3]])
      • c(c1[c[1,1], c[1,2], c[1,3]], c2[c[2,1], c[2,2], c[2,3]], c3[c[3,1], c[3,2], c[3,3]])
      • d(d1[d[1,1], d[1,2], d[1,3]], d2[d[2,1], d[2,2], d[2,3]], d3[d[3,1], d[3,2], d[3,3]])

  22. S: a[i], b[i,j], d[i]. Assume: query q = a[i,1]*b[i]*c[i,3]*d[i] is submitted to S.
      q = a[i,1]*[b[i,1] + b[i,2] + b[i,3]]*c[i,3]*d[i] = [a[i,1]*b[i,1]*c[i,3]*d[i]] + [a[i,1]*b[i,2]*c[i,3]*d[i]] + [a[i,1]*b[i,3]*c[i,3]*d[i]]
      How to solve query q?
      1. Generalize a[i,1] to a[i] and c[i,3] to c. The query has a new form: q1 = a[i]*[b[i,1] + b[i,2] + b[i,3]]*d[i]
      2.a. Objects matching q1 may satisfy q.
      2.b. Generalizations decrease the chance that retrieved objects will match query q. Check what values of attributes a and c are implied by d[i]*b[i,1], d[i]*b[i,2], or d[i]*b[i,3] at remote sites for S, and whether any of these rules have high confidence and support.

  23. S: a[i], b[i,j], d[i].
      q = a[i,1]*[b[i,1] + b[i,2] + b[i,3]]*c[i,3]*d[i] = [a[i,1]*b[i,1]*c[i,3]*d[i]] + [a[i,1]*b[i,2]*c[i,3]*d[i]] + [a[i,1]*b[i,3]*c[i,3]*d[i]]
      How to solve query q?
      1. Generalize a[i,1] to a[i] and c[i,3] to c. The query has a new form: q1 = a[i]*b[i]*d[i] = [a[i]*b[i,1]*d[i]] + [a[i]*b[i,2]*d[i]] + [a[i]*b[i,3]*d[i]]
      2. Check what values of attributes a and c are implied by d[i]*b[i,1], d[i]*b[i,2], or d[i]*b[i,3] at remote sites for S, and whether any of these rules have high confidence and support.
      Assume that d[i]*b[i,1] → a[i,2] and d[i]*b[i,2] → c[i,3] are certain rules, extracted at a remote site for S.
      We get q = [a[i,1]*b[i,2]*d[i]] (local) + [a[i,1]*b[i,3]*c[i,3]*d[i]] (non-local).

  24. Failing Query Problem 2 q=q(a[3,1,3,2], b1, c2) Possible generalization: q1=q1(a3, b1, c2) Rules extracted at remote sites which define any of the values below a[3] will help in solving q. Rules describing values not belonging to {a[3,1], a[3,1,3], a[3,1,3,2]} are used to reduce the size of the query (to remove some conjuncts).

  25. Questions? Thank You
