
Computing Full Disjunctions


kylia


Presentation Transcript


  1. Computing Full Disjunctions

  2. Maximal Answers How to Avoid Losing Information When Joining Relations

  3. Partial Information is Lost • Dangling tuples disappear when joining relations, because the result has only complete tuples ⋈ =

  4. Maximal vs. Complete Answers ⋈ = ⊥ is the NULL value

  5. Obtaining Maximal Answers ⋈ ⋈ =

  6. Obtaining Maximal Answers ⋈ ⋈ =

  7. Obtaining Maximal Answers ⋈ ⋈ =

  8. ⋈ = • This tuple is subsumed by the third tuple, i.e., it has less information • This tuple is not in the result

  9. ⋈ = • This tuple is generated from two tuples that are not connected by a common attribute • This tuple is the result of a Cartesian product • This tuple is not in the result

  10. The Main Difficulty in Computing Maximal Answers Efficiently • How to generate maximal answers without generating subsumed tuples • Consider two algorithms • The naïve algorithm generates all the answers and then removes subsumed tuples • The second algorithm generates only maximal answers • In the worst case, both have an exponential runtime in the size of the input (i.e., the database and the query)
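
The post-processing step of the naïve algorithm can be sketched in Python as follows. This is only an illustration, not code from the presentation: it assumes every answer is a dict over the same attribute set, with `None` standing in for ⊥, and the names `subsumes` and `remove_subsumed` are hypothetical.

```python
def subsumes(u, v):
    """True if u agrees with v on every attribute where v is non-null."""
    return all(val is None or u.get(attr) == val for attr, val in v.items())

def remove_subsumed(answers):
    """Keep only the maximal answers (quadratic in the number of answers)."""
    # Drop exact duplicates first, so two equal answers do not eliminate each other.
    unique = []
    for row in answers:
        if row not in unique:
            unique.append(row)
    return [v for v in unique
            if not any(u != v and subsumes(u, v) for u in unique)]

# Hypothetical answers over attributes A, B, C; None plays the role of ⊥.
answers = [
    {"A": 1, "B": 2, "C": None},   # subsumed by the next answer
    {"A": 1, "B": 2, "C": 3},
    {"A": None, "B": 5, "C": 6},
]
maximal = remove_subsumed(answers)
```

Because this filter runs only after all answers exist, its cost grows with the number of subsumed answers, not only with the size of the final result.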

  11. How to Measure Efficiency? • When the output could be exponential in the size of the input, the runtime should be a function of the combined size of the input and the output • For example, this is the complexity measure of the algorithm for joining acyclic relations [Yannakakis 1981] • This complexity measure can discern that the naïve algorithm is not efficient

  12. Outerjoins

  13. Outerjoins do not generate FDs R3 R1 R2

  14. Outerjoins do not generate FDs R1 R2 R1 R2 R3 R3

  15. Outerjoins do not generate FDs R1 R2 (R1 R2) R3 R3

  16. Outerjoins do not generate FDs (R1 R2) R3 R1 R3 R2

  17. Outerjoins do not generate FDs (R1 R2) R3 R1 R3 R2 ?

  18. Tuple Sets

  19. Tuple Sets • R1,…,Rn are the given relations • A tuple set has at most one tuple from each relation • Let T be a tuple set • JCC(T) denotes that T is • Join consistent (no dangling tuples), and • Connected (no Cartesian product)
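
The JCC test can be sketched in Python as below. The representation is an assumption made for illustration: a tuple is a hypothetical `Tup` record holding its relation name and its attribute/value pairs, two tuples are join consistent when they agree on every shared attribute, and connectivity is checked over the graph whose edges link tuples that share an attribute. The helper names `mk`, `consistent`, `linked` and `jcc` are not from the presentation.

```python
from typing import NamedTuple

class Tup(NamedTuple):
    rel: str      # relation the tuple belongs to
    vals: tuple   # sorted (attribute, value) pairs

def mk(rel, **attrs):
    """Build a hashable tuple of relation `rel` from keyword attributes."""
    return Tup(rel, tuple(sorted(attrs.items())))

def consistent(a, b):
    """Two tuples are join consistent if they agree on every shared attribute."""
    if a == b:
        return True
    if a.rel == b.rel:          # a tuple set has at most one tuple per relation
        return False
    av, bv = dict(a.vals), dict(b.vals)
    return all(av[k] == bv[k] for k in av.keys() & bv.keys())

def linked(a, b):
    """True if the two tuples share at least one attribute name."""
    return bool(dict(a.vals).keys() & dict(b.vals).keys())

def jcc(tuples):
    """Join consistent and connected; the empty set is treated as JCC."""
    ts = list(set(tuples))
    if not all(consistent(a, b) for a in ts for b in ts):
        return False
    if not ts:
        return True
    reached, frontier = {ts[0]}, [ts[0]]
    while frontier:                      # graph search over shared attributes
        cur = frontier.pop()
        for other in ts:
            if other not in reached and linked(cur, other):
                reached.add(other)
                frontier.append(other)
    return len(reached) == len(ts)

# Hypothetical relations R1(A,B), R2(B,C), R3(C,D).
r1 = mk("R1", A=1, B=1)
r2 = mk("R2", B=1, C=1)
r2_other = mk("R2", B=3, C=2)
r3 = mk("R3", C=1, D=1)
```

With these tuples, `jcc({r1, r2, r3})` holds, `jcc({r1, r2_other})` fails because the tuples disagree on B, and `jcc({r1, r3})` fails because R1 and R3 share no attribute.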

  20. An Example of a Tuple Set Let T be the set of the red tuples JCC(T ) is true The red tuples are join consistent & connected

  21. Another Example JCC(T ) is false The red tuples are not join consistent

  22. A Third Example JCC(T ) is true

  23. But if we change the attribute B of the last relation to E, then T is not connected • The two rightmost tuples do not share any attribute with the left tuple

  24. Full Disjunctions

  25. Full Disjunctions • The full disjunction FD(R1,…, Rn) of R1,…, Rn consists of all T, s.t. T is a maximal JCC tuple set, i.e., T is • join-consistent, • connected, and • maximal • That is, there is no tuple t ∉ T, such that T ⋃ {t} is a JCC tuple set

  26. Finding Maximal JCC Tuple Sets • Choose a tuple from the first relation • Extend it to a maximal JCC tuple set T

  27. Finding other maximal JCC tuple sets without doing too much work • Choose a tuple from the first relation • Extend it to a maximal JCC tuple set T • Choose a tuple t not in T

  28. The Next Step • Choose a tuple from the first relation • Extend it to a maximal JCC tuple set T • Choose a tuple t not in T • Find a maximal subset of T ⋃ {t} that is JCC and contains t • Remove the other tuple from the relation containing t

  29. • Choose a tuple from the first relation • Extend it to a maximal JCC tuple set T • Choose a tuple t not in T • Find a maximal subset of T ⋃ {t} that is JCC and contains t • Remove the other tuple from the relation containing t • Remove the tuple that is not join consistent with t

  30. • Choose a tuple from the first relation • Extend it to a maximal JCC tuple set T • Choose a tuple t not in T • Find a maximal subset of T ⋃ {t} that is JCC and contains t • Remove the tuple that is not join consistent with t • Remove the tuple that is disconnected from t

  31. • Choose a tuple from the first relation • Extend it to a maximal JCC tuple set T • Choose a tuple t not in T • Find a maximal subset of T ⋃ {t} that is JCC and contains t • Remove the tuple that is disconnected from t

  32. • Choose a tuple from the first relation • Extend it to a maximal JCC tuple set T • Choose a tuple t not in T • Find a maximal subset of T ⋃ {t} that is JCC and contains t • This is the unique maximal subset of T ⋃ {t} that is JCC and contains t • This subset can be extended to another maximal JCC tuple set

  33. More Formally …

  34. Extending JCC Tuple Sets Problem: • We are given a set T of tuples that is join-consistent and connected • We would like to generate a maximal set of tuples Tm, such that • Tm is join consistent and connected, and • T ⊆ Tm

  35. Extending JCC Tuple Sets (cont'd) Solution: • A naïve, greedy approach will work! • Start with Tm=T • While there exists a tuple t, such that Tm⋃{t} is join consistent and connected, insert t into Tm • When no tuple can be added to Tm, stop and return Tm
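
The greedy extension above can be sketched in Python as follows. The tuple representation (`Tup`, `mk`) and the helpers `consistent`, `linked`, `jcc` and `extend` are hypothetical names introduced for this illustration. The sketch re-runs a full JCC test for every candidate, so it makes no attempt to match the running-time bounds discussed later in the presentation.

```python
from typing import NamedTuple

class Tup(NamedTuple):
    rel: str      # relation the tuple belongs to
    vals: tuple   # sorted (attribute, value) pairs

def mk(rel, **attrs):
    return Tup(rel, tuple(sorted(attrs.items())))

def consistent(a, b):
    """Agree on every shared attribute; distinct tuples of one relation never do."""
    if a == b:
        return True
    if a.rel == b.rel:
        return False
    av, bv = dict(a.vals), dict(b.vals)
    return all(av[k] == bv[k] for k in av.keys() & bv.keys())

def linked(a, b):
    return bool(dict(a.vals).keys() & dict(b.vals).keys())

def jcc(tuples):
    ts = list(set(tuples))
    if not all(consistent(a, b) for a in ts for b in ts):
        return False
    if not ts:
        return True
    reached, frontier = {ts[0]}, [ts[0]]
    while frontier:
        cur = frontier.pop()
        for other in ts:
            if other not in reached and linked(cur, other):
                reached.add(other)
                frontier.append(other)
    return len(reached) == len(ts)

def extend(T, db):
    """Greedily add tuples of `db` to T while the set stays JCC."""
    Tm, changed = set(T), True
    while changed:                 # repeat: a tuple may only connect after another is added
        changed = False
        for t in db:
            if t not in Tm and jcc(Tm | {t}):
                Tm.add(t)
                changed = True
    return frozenset(Tm)

# Hypothetical database over R1(A,B), R2(B,C), R3(C,D).
r1a, r1b = mk("R1", A=1, B=1), mk("R1", A=2, B=2)
r2a, r2b = mk("R2", B=1, C=1), mk("R2", B=3, C=2)
r3a, r3b = mk("R3", C=1, D=1), mk("R3", C=2, D=2)
db = [r1a, r1b, r2a, r2b, r3a, r3b]
extended = extend({r1a}, db)
```

The outer `while` loop matters when the database is scanned in an unlucky order: starting from `{r1a}` with `r3a` listed before `r2a`, the tuple `r3a` can only be added on a second pass, after `r2a` has connected it to the set.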

  36. Maximally Contained JCC Tuple Sets Problem: • We are given • a set of tuples T that is join consistent and connected, and • a tuple t (which is not necessarily in T) • We would like to generate a maximal set of tuples Tm, such that • Tm is join consistent and connected, and • t∈Tm⊆ T⋃{t}

  37. Uniqueness of Tm • Recall the two conditions: (1) Tm is join consistent and connected, (2) t ∈ Tm ⊆ T ⋃ {t} • Proposition: If T1 and T2 satisfy 1 and 2, then so does T1 ⋃ T2 • We conclude that there is a unique maximal set Tm that satisfies 1 and 2 • Tm is the union of all sets satisfying 1 and 2

  38. Finding Tm • Start with Tm = T ⋃ {t} • Remove from Tm all tuples that are not join consistent with t • Remove from Tm all tuples that are not reachable from t through a path (i.e., leave only the connected component that contains t) • Return Tm
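
The two removal steps can be sketched in Python as below. As before, the representation and the names `Tup`, `mk`, `consistent`, `linked` and `max_contained` are assumptions made for illustration. Treating two distinct tuples of the same relation as inconsistent also removes "the other tuple from the relation containing t".

```python
from typing import NamedTuple

class Tup(NamedTuple):
    rel: str      # relation the tuple belongs to
    vals: tuple   # sorted (attribute, value) pairs

def mk(rel, **attrs):
    return Tup(rel, tuple(sorted(attrs.items())))

def consistent(a, b):
    """Agree on every shared attribute; distinct tuples of one relation never do."""
    if a == b:
        return True
    if a.rel == b.rel:
        return False
    av, bv = dict(a.vals), dict(b.vals)
    return all(av[k] == bv[k] for k in av.keys() & bv.keys())

def linked(a, b):
    return bool(dict(a.vals).keys() & dict(b.vals).keys())

def max_contained(T, t):
    """Largest JCC subset of T ∪ {t} that contains t (T is assumed to be JCC)."""
    # Step 1: start with T ∪ {t} and drop tuples not join consistent with t.
    cand = {u for u in set(T) | {t} if consistent(u, t)}
    # Step 2: keep only the connected component that contains t.
    reached, frontier = {t}, [t]
    while frontier:
        cur = frontier.pop()
        for other in cand:
            if other not in reached and linked(cur, other):
                reached.add(other)
                frontier.append(other)
    return frozenset(reached)

# Hypothetical tuples over R1(A,B), R2(B,C), R3(C,D), R4(A,E).
r1a, r1b = mk("R1", A=1, B=1), mk("R1", A=2, B=2)
r2a = mk("R2", B=1, C=1)
r3a = mk("R3", C=1, D=1)
r4 = mk("R4", A=1, E=9)
T = {r1a, r2a, r3a}
```

For `t = r1b`, step 1 removes `r1a` (same relation) and `r2a` (disagrees on B), and step 2 removes `r3a`, which is consistent with `r1b` but shares no attribute with it, so only `{r1b}` remains. For `t = r4`, nothing is removed and the result is `T ⋃ {r4}`.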

  39. An Example t T

  40. An Example Disagree T t

  41. An Example T t

  42. An Example Tm

  43. The Algorithm

  44. Data Structures • We use two data structures for holding intermediate results: • Q: contains tuple sets that need to be printed • C: contains all sets that are already printed • Initially, C is empty and Q consists of an arbitrary maximal tuple set T • T can be obtained, for example, by maximally extending the empty tuple set

  45. The Algorithm • While Q is not empty: • Remove an element T from Q • Print T and insert it into C • For each tuple t in the database: • Generate the maximal tuple set Tm, such that t∈Tm⊆ T⋃{t} and JCC(Tm) • Maximally extend Tm • If Tm is neither in Q nor in C, then insert Tm into Q
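
The loop above can be sketched end to end in Python. Everything here is an illustrative assumption rather than the authors' implementation: the tuple representation, the helper names, and the use of a generator (`yield`) in place of printing. Hash sets stand in for the search structures over Q and C, and the naive `extend` does not aim for the per-iteration bounds stated in the presentation.

```python
from collections import deque
from typing import NamedTuple

class Tup(NamedTuple):
    rel: str      # relation the tuple belongs to
    vals: tuple   # sorted (attribute, value) pairs

def mk(rel, **attrs):
    return Tup(rel, tuple(sorted(attrs.items())))

def consistent(a, b):
    if a == b:
        return True
    if a.rel == b.rel:
        return False
    av, bv = dict(a.vals), dict(b.vals)
    return all(av[k] == bv[k] for k in av.keys() & bv.keys())

def linked(a, b):
    return bool(dict(a.vals).keys() & dict(b.vals).keys())

def component(cand, start):
    """Tuples of `cand` reachable from `start` through shared attributes."""
    reached, frontier = {start}, [start]
    while frontier:
        cur = frontier.pop()
        for other in cand:
            if other not in reached and linked(cur, other):
                reached.add(other)
                frontier.append(other)
    return reached

def jcc(ts):
    ts = set(ts)
    if not all(consistent(a, b) for a in ts for b in ts):
        return False
    return not ts or component(ts, next(iter(ts))) == ts

def extend(T, db):
    Tm, changed = set(T), True
    while changed:
        changed = False
        for t in db:
            if t not in Tm and jcc(Tm | {t}):
                Tm.add(t)
                changed = True
    return frozenset(Tm)

def max_contained(T, t):
    cand = {u for u in set(T) | {t} if consistent(u, t)}
    return frozenset(component(cand, t))

def full_disjunction(db):
    """Yield each maximal JCC tuple set of `db` exactly once."""
    db = list(db)
    if not db:
        return
    first = extend(set(), db)            # an arbitrary maximal tuple set
    Q, queued, C = deque([first]), {first}, set()
    while Q:
        T = Q.popleft()
        queued.discard(T)
        C.add(T)
        yield T                          # "print T"
        for t in db:
            Tm = extend(max_contained(T, t), db)
            if Tm not in queued and Tm not in C:
                Q.append(Tm)
                queued.add(Tm)

# Hypothetical database over R1(A,B), R2(B,C), R3(C,D).
r1a, r1b = mk("R1", A=1, B=1), mk("R1", A=2, B=2)
r2a, r2b = mk("R2", B=1, C=1), mk("R2", B=3, C=2)
r3a, r3b = mk("R3", C=1, D=1), mk("R3", C=2, D=2)
db = [r1a, r1b, r2a, r2b, r3a, r3b]
result = list(full_disjunction(db))
```

On this database the generator yields three tuple sets: `{r1a, r2a, r3a}`, `{r1b}` and `{r2b, r3b}`. Writing the loop as a generator lets a consumer stop after the first few results without paying for the rest.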

  46. The Algorithm Runs with Polynomial Delay • The outer loop prints one tuple set of the result in each iteration • The inner loop is repeated for each tuple of the database • Each iteration of the inner loop requires linear time in the size of the database • Testing whether Tm is neither in Q nor in C requires logarithmic time in the size of Q and C, i.e., linear time in the size of the database, since there are at most exponentially many tuple sets • The delay is quadratic

  47. Correctness of the Algorithm • Clearly, the algorithm prints only maximal JCC tuple sets • Moreover, no tuple set is printed more than once • It remains to show that every maximal JCC tuple set is printed by the algorithm

  48. Proof • Suppose, by way of contradiction, that S is a maximal JCC tuple set that is not printed by the algorithm • Let S' be a maximal tuple set, such that • S'⊆ S • JCC(S') • S' is contained in a tuple set that is printed by the algorithm • Let T be a set that is printed and contains S' • Note that S' is properly contained in S

  49. The Tuple t • Since S and S' are connected and S' ⊊ S, there exists a tuple t, such that • t ∈ S\S' and • JCC(S' ⋃ {t})

  50. When T is Printed… • Consider the iteration (of the while loop) when T is removed from Q • Consider the iteration (of the for loop) when t is chosen • The algorithm finds the (unique) maximal tuple set Tm, such that t ∈Tm ⊆ T⋃{t} and JCC(Tm) • Since S'⋃{t} ⊆ T⋃{t} and JCC(S' ⋃{t}), Tm contains S'⋃{t}
