Computing Full Disjunctions
Maximal Answers: How to Avoid Losing Information When Joining Relations
Partial Information Is Lost • Dangling tuples disappear when joining relations, because the result contains only complete tuples (figure: two relations and their join)
Maximal vs. Complete Answers (figure: the join of the relations and its maximal answers) • ⊥ is the NULL value
(figure: a join of three relations) • The indicated tuple is subsumed by the third tuple, i.e., it has less information • This tuple is not in the result
(figure: a join of three relations) • The indicated tuple is generated from two tuples that are not connected by a common attribute, i.e., it is the result of a Cartesian product • This tuple is not in the result
The Main Difficulty in Computing Maximal Answers Efficiently • How to generate maximal answers without generating subsumed tuples • Consider two algorithms • The naïve algorithm generates all the answers and then removes subsumed tuples • The second algorithm generates only maximal answers • In the worst case, both have an exponential runtime in the size of the input (i.e., the database and the query)
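The post-processing step of the naïve algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not the slides' algorithm: every answer is assumed to be already padded to a fixed attribute order, with Python's `None` standing in for ⊥, and only the removal of subsumed tuples is shown, not the generation of the answers. The names `subsumes` and `remove_subsumed` are hypothetical.

```python
# Hypothetical representation: an answer is a Python tuple over a fixed
# attribute order, with None standing in for the NULL value ⊥.

def subsumes(a, b):
    """True if a differs from b and agrees with every non-NULL value of b."""
    return a != b and all(y is None or x == y for x, y in zip(a, b))

def remove_subsumed(rows):
    """Naive post-processing: compare every answer with every other answer."""
    rows = list(dict.fromkeys(rows))  # drop exact duplicates, keep first-seen order
    return [b for b in rows if not any(subsumes(a, b) for a in rows)]

# Attributes (A, B, C)
answers = [(1, 2, None), (1, 2, 3), (None, 2, 3), (4, None, None)]
print(remove_subsumed(answers))  # [(1, 2, 3), (4, None, None)]
```

Because the number of generated answers can be exponential, most of the work in this approach may be spent on tuples that are later discarded.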
How to Measure Efficiency? • When the output could be exponential in the size of the input, the runtime should be a function of the combined size of the input and the output • For example, this is the complexity measure of the algorithm for joining acyclic relations [Yannakakis 1981] • This complexity measure can discern that the naïve algorithm is not efficient
Outerjoins Do Not Generate FDs • (figure, built up over five slides: relations R1, R2 and R3; first R1 and R2 are outerjoined, then the result is outerjoined with R3, and the outcome is set against R1, R2 and R3 with a question mark)
Tuple Sets • R1, …, Rn are the given relations • A tuple set has at most one tuple from each relation • Let T be a tuple set • JCC(T) denotes that T is • Join consistent (no dangling tuples), and • Connected (no Cartesian product)
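A minimal Python sketch of the JCC test, under assumptions that are not in the slides: a tuple is modelled as a hypothetical pair `(relation_name, sorted attribute/value pairs)` built by a helper `mk`, base tuples contain no NULLs, and two tuples are adjacent when they share an attribute name. For such tuples, join consistency of a set reduces to pairwise agreement on shared attributes, and connectivity is a graph search over the "shares an attribute" relation.

```python
from itertools import combinations

# Hypothetical representation: a tuple is (relation_name, ((attr, value), ...)),
# with the pairs sorted so that tuples are hashable and comparable.
def mk(rel, **values):
    return (rel, tuple(sorted(values.items())))

def consistent(t1, t2):
    """Join consistent: the two tuples agree on every attribute they share."""
    a1, a2 = dict(t1[1]), dict(t2[1])
    return all(a1[k] == a2[k] for k in a1.keys() & a2.keys())

def shares_attr(t1, t2):
    return bool(dict(t1[1]).keys() & dict(t2[1]).keys())

def component(start, ts):
    """Tuples of ts reachable from start through shared attributes (depth-first search)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in ts:
            if v not in seen and shares_attr(u, v):
                seen.add(v)
                stack.append(v)
    return seen

def jcc(ts):
    """JCC(T) for a collection of distinct tuples."""
    ts = list(ts)
    return (len({t[0] for t in ts}) == len(ts)                         # a tuple set
            and all(consistent(a, b) for a, b in combinations(ts, 2))   # join consistent
            and (not ts or len(component(ts[0], ts)) == len(ts)))       # connected

r1 = mk("R1", A=1, B=2)
r2 = mk("R2", B=2, C=3)
r3 = mk("R3", C=3, D=4)
print(jcc({r1, r2, r3}))              # True
print(jcc({r1, mk("R2", B=9, C=3)}))  # False: the tuples disagree on B
print(jcc({r1, mk("R4", E=5)}))       # False: no shared attribute
```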
An Example of a Tuple Set • Let T be the set of the red tuples (figure) • JCC(T) is true: the red tuples are join consistent and connected
Another Example • JCC(T) is false: the red tuples are not join consistent
A Third Example • JCC(T) is true
But if we change attribute B of the last relation to E, then T is not connected: the two rightmost tuples do not share any attribute with the left tuple
Full Disjunctions • The full disjunction FD(R1, …, Rn) of R1, …, Rn consists of all tuple sets T such that T is a maximal JCC tuple set, i.e., T is • join consistent, • connected, and • maximal, that is, there is no tuple t ∉ T such that T ⋃ {t} is a JCC tuple set
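The definition translates directly into an exponential brute-force computation, useful only for checking results on tiny inputs. This sketch assumes a hypothetical tuple representation, `(relation_name, sorted attribute/value pairs)`, and a helper `jcc` that tests the tuple-set, join-consistency and connectivity conditions; it enumerates every choice of at most one tuple per relation and keeps the maximal JCC ones. `full_disjunction_bruteforce` is a hypothetical name, not the algorithm of these slides.

```python
from itertools import combinations, product

# Hypothetical tuple representation: (relation_name, sorted ((attr, value), ...)).
def mk(rel, **values):
    return (rel, tuple(sorted(values.items())))

def consistent(t1, t2):
    a1, a2 = dict(t1[1]), dict(t2[1])
    return all(a1[k] == a2[k] for k in a1.keys() & a2.keys())

def shares_attr(t1, t2):
    return bool(dict(t1[1]).keys() & dict(t2[1]).keys())

def component(start, ts):
    seen, stack = {start}, [start]
    while stack:  # depth-first search over the "shares an attribute" graph
        u = stack.pop()
        for v in ts:
            if v not in seen and shares_attr(u, v):
                seen.add(v)
                stack.append(v)
    return seen

def jcc(ts):
    ts = list(ts)
    return (len({t[0] for t in ts}) == len(ts)
            and all(consistent(a, b) for a, b in combinations(ts, 2))
            and (not ts or len(component(ts[0], ts)) == len(ts)))

def full_disjunction_bruteforce(relations):
    """FD by definition: all maximal JCC tuple sets (exponential; tiny inputs only)."""
    all_tuples = [t for rel in relations.values() for t in rel]
    jcc_sets = set()
    # Pick at most one tuple (None = no tuple) from each relation.
    for choice in product(*([None] + list(rel) for rel in relations.values())):
        ts = frozenset(t for t in choice if t is not None)
        if ts and jcc(ts):
            jcc_sets.add(ts)
    # Keep T only if no tuple outside T can be added while preserving JCC.
    return {T for T in jcc_sets
            if not any(t not in T and jcc(T | {t}) for t in all_tuples)}

a1, a2 = mk("R1", A=1, B=2), mk("R1", A=5, B=6)
b = mk("R2", B=2, C=3)
c1, c2 = mk("R3", C=3, D=4), mk("R3", C=7, D=8)
fd = full_disjunction_bruteforce({"R1": [a1, a2], "R2": [b], "R3": [c1, c2]})
for T in fd:
    print(sorted(T))
```

On this made-up input the result has three tuple sets: {a1, b, c1}, plus the dangling tuples {a2} and {c2}, which a plain join would lose.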
Finding Maximal JCC Tuple Sets • Choose a tuple from the first relation • Extend it to a maximal JCC tuple set T
Finding Other Maximal JCC Tuple Sets without Doing Too Much Work • Choose a tuple t not in T • Find a maximal subset of T ⋃ {t} that is JCC and contains t
The Next Step (figure, built up over several slides) • Remove the other tuple from the relation containing t • Remove each tuple that is not join consistent with t • Remove the tuple that is disconnected from t • The result is the unique maximal subset of T ⋃ {t} that is JCC and contains t • This subset can be extended to another maximal JCC tuple set
Extending JCC Tuple Sets Problem: • We are given a set T of tuples that is join-consistent and connected • We would like to generate a maximal set of tuples Tm, such that • Tm is join consistent and connected, and • T ⊆ Tm
Extending JCC Tuple Sets (cont'd) Solution: • A naïve, greedy approach will work! • Start with Tm = T • While there exists a tuple t such that Tm ⋃ {t} is a JCC tuple set, insert t into Tm • When no tuple can be added to Tm, stop and return Tm
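The greedy extension can be sketched in Python as below. It assumes a hypothetical tuple representation, `(relation_name, sorted attribute/value pairs)`, and a hypothetical helper `jcc`; the loop rescans the database after every pass because a tuple that is disconnected from Tm at first may become connected once other tuples have been added. When a full pass adds nothing, no single tuple can be added, which is exactly the maximality condition.

```python
from itertools import combinations

# Hypothetical tuple representation: (relation_name, sorted ((attr, value), ...)).
def mk(rel, **values):
    return (rel, tuple(sorted(values.items())))

def consistent(t1, t2):
    a1, a2 = dict(t1[1]), dict(t2[1])
    return all(a1[k] == a2[k] for k in a1.keys() & a2.keys())

def shares_attr(t1, t2):
    return bool(dict(t1[1]).keys() & dict(t2[1]).keys())

def component(start, ts):
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in ts:
            if v not in seen and shares_attr(u, v):
                seen.add(v)
                stack.append(v)
    return seen

def jcc(ts):
    ts = list(ts)
    return (len({t[0] for t in ts}) == len(ts)
            and all(consistent(a, b) for a, b in combinations(ts, 2))
            and (not ts or len(component(ts[0], ts)) == len(ts)))

def extend_maximally(T, all_tuples):
    """Greedily add tuples while the set stays JCC; stop when no tuple fits."""
    Tm, added = set(T), True
    while added:
        # Rescan after every pass: a tuple that was disconnected earlier may
        # become connected once other tuples have been added.
        added = False
        for t in all_tuples:
            if t not in Tm and jcc(Tm | {t}):
                Tm.add(t)
                added = True
    return frozenset(Tm)

a1, a2 = mk("R1", A=1, B=2), mk("R1", A=5, B=6)
b = mk("R2", B=2, C=3)
c1, c2 = mk("R3", C=3, D=4), mk("R3", C=7, D=8)
db = [a1, a2, b, c1, c2]
print(extend_maximally({c1}, db) == {a1, b, c1})  # True: b joins first, then a1
print(extend_maximally({a2}, db) == {a2})         # True: nothing can be added
```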
Maximally Contained JCC Tuple Sets Problem: • We are given • a set of tuples T that is join consistent and connected, and • a tuple t (which is not necessarily in T) • We would like to generate a maximal set of tuples Tm such that • Tm is join consistent and connected, and • t ∈ Tm ⊆ T ⋃ {t}
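Uniqueness of Tm • (1) Tm is join consistent and connected • (2) t ∈ Tm ⊆ T ⋃ {t} • Proposition: If T1 and T2 satisfy 1 and 2, then so does T1 ⋃ T2 (both sets contain t, so their union is connected through t, and every tuple in T1 ⋃ T2 is join consistent with t and with the tuples of T) • We conclude that there is a unique maximal set Tm that satisfies 1 and 2 • Tm is the union of all sets satisfying 1 and 2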
Finding Tm • Start with Tm = T ⋃ {t} • Remove from Tm all tuples that are not join consistent with t (this removes the tuple of T from the relation of t, if it differs from t) • Remove from Tm all tuples that are not reachable from t through a path (i.e., leave only the connected component that contains t) • Return Tm
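The two removal steps can be sketched in Python as follows, assuming a hypothetical tuple representation, `(relation_name, sorted attribute/value pairs)`, in which all tuples of a relation have the same attributes. Under that assumption, a different tuple from t's own relation always disagrees with t on some attribute, so the first step also removes it. The function name `max_jcc_subset_containing` is hypothetical.

```python
# Hypothetical tuple representation: (relation_name, sorted ((attr, value), ...)).
def mk(rel, **values):
    return (rel, tuple(sorted(values.items())))

def consistent(t1, t2):
    a1, a2 = dict(t1[1]), dict(t2[1])
    return all(a1[k] == a2[k] for k in a1.keys() & a2.keys())

def shares_attr(t1, t2):
    return bool(dict(t1[1]).keys() & dict(t2[1]).keys())

def component(start, ts):
    """Tuples of ts reachable from start through shared attributes."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in ts:
            if v not in seen and shares_attr(u, v):
                seen.add(v)
                stack.append(v)
    return seen

def max_jcc_subset_containing(T, t):
    """The unique maximal Tm with t in Tm ⊆ T ∪ {t} and JCC(Tm); T is assumed JCC."""
    # Step 1: keep only the tuples of T ∪ {t} that are join consistent with t.
    cand = [u for u in set(T) | {t} if consistent(u, t)]
    # Step 2: keep only the connected component that contains t.
    return frozenset(component(t, cand))

a1 = mk("R1", A=1, B=2)
b = mk("R2", B=2, C=3)
c1 = mk("R3", C=3, D=4)
d = mk("R4", D=4)
t = mk("R2", B=2, C=9)
Tm = max_jcc_subset_containing({a1, b, c1, d}, t)
print(Tm == {t, a1})  # True
```

In this made-up example, b (same relation as t) and c1 disagree with t on C and are removed in the first step; d agrees with t vacuously but is reachable only through c1, so the second step removes it.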
An Example (figure, built up over four slides: the tuple t and the set T; the tuples that disagree with t; the resulting set Tm)
Data Structures • We use two data structures for holding intermediate results: • Q contains the tuple sets that still need to be printed • C contains all the tuple sets that have already been printed • Initially, C is empty and Q consists of an arbitrary maximal JCC tuple set T • T can be obtained, for example, by maximally extending the empty tuple set
The Algorithm • While Q is not empty: • Remove an element T from Q • Print T and insert it into C • For each tuple t in the database: • Generate the maximal tuple set Tm such that t ∈ Tm ⊆ T ⋃ {t} and JCC(Tm) • Maximally extend Tm • If Tm is neither in Q nor in C, insert Tm into Q
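The loop above can be sketched in Python as a generator that yields each tuple set where the slide says "print". This is a sketch under assumptions, not a definitive implementation: tuples use a hypothetical `(relation_name, sorted attribute/value pairs)` representation, Q is a `collections.deque`, and membership in Q and C is tested with Python sets (an extra set `queued` mirrors Q), which swaps hashing in for the logarithmic-time lookup used in the delay analysis. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical tuple representation: (relation_name, sorted ((attr, value), ...)).
def mk(rel, **values):
    return (rel, tuple(sorted(values.items())))

def consistent(t1, t2):
    a1, a2 = dict(t1[1]), dict(t2[1])
    return all(a1[k] == a2[k] for k in a1.keys() & a2.keys())

def shares_attr(t1, t2):
    return bool(dict(t1[1]).keys() & dict(t2[1]).keys())

def component(start, ts):
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in ts:
            if v not in seen and shares_attr(u, v):
                seen.add(v)
                stack.append(v)
    return seen

def jcc(ts):
    ts = list(ts)
    return (len({t[0] for t in ts}) == len(ts)
            and all(consistent(a, b) for a, b in combinations(ts, 2))
            and (not ts or len(component(ts[0], ts)) == len(ts)))

def extend_maximally(T, all_tuples):
    Tm, added = set(T), True
    while added:
        added = False
        for t in all_tuples:
            if t not in Tm and jcc(Tm | {t}):
                Tm.add(t)
                added = True
    return frozenset(Tm)

def max_jcc_subset_containing(T, t):
    cand = [u for u in set(T) | {t} if consistent(u, t)]
    return frozenset(component(t, cand))

def full_disjunction(relations):
    """Yield each maximal JCC tuple set exactly once."""
    all_tuples = [t for rel in relations.values() for t in rel]
    first = extend_maximally(frozenset(), all_tuples)  # extend the empty tuple set
    Q, queued, C = deque([first]), {first}, set()
    while Q:
        T = Q.popleft()
        queued.discard(T)
        C.add(T)
        yield T  # "print T"
        for t in all_tuples:
            Tm = extend_maximally(max_jcc_subset_containing(T, t), all_tuples)
            if Tm not in queued and Tm not in C:
                Q.append(Tm)
                queued.add(Tm)

a1, a2 = mk("R1", A=1, B=2), mk("R1", A=5, B=6)
b = mk("R2", B=2, C=3)
c1, c2 = mk("R3", C=3, D=4), mk("R3", C=7, D=8)
for T in full_disjunction({"R1": [a1, a2], "R2": [b], "R3": [c1, c2]}):
    print(sorted(T))
```

On this made-up input the generator yields {a1, b, c1} first and then the dangling tuples {a2} and {c2}, each exactly once.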
The Algorithm Runs with Polynomial Delay • The outer loop prints one tuple set of the result in each iteration • The inner loop is repeated for each tuple of the database • Each iteration of the inner loop requires linear time in the size of the database • Testing whether Tm is neither in Q nor in C requires time logarithmic in the size of Q and C, i.e., linear in the size of the database • Hence, the delay between consecutive printed tuple sets is quadratic in the size of the database
Correctness of the Algorithm • Clearly, the algorithm prints only maximal JCC tuple sets • Moreover, no tuple set is printed more than once • It remains to show that every maximal JCC tuple set is printed by the algorithm
Proof • Suppose, by way of contradiction, that S is a maximal JCC tuple set that is not printed by the algorithm • Let S' be a maximal tuple set such that • S' ⊆ S • JCC(S') • S' is contained in a tuple set that is printed by the algorithm • Let T be a set that is printed and contains S' • Note that S' is properly contained in S (otherwise S ⊆ T, and since S is a maximal JCC tuple set and T is JCC, S = T would have been printed)
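The Tuple t • Since S and S' are connected and S' ⊊ S, there exists a tuple t such that • t ∈ S \ S' and • JCC(S' ⋃ {t}) • (S' ⋃ {t} ⊆ S is join consistent because S is, and since S is connected, t can be chosen to share an attribute with a tuple of S') (figure: S' inside S)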
When T Is Printed… • Consider the iteration (of the while loop) when T is removed from Q • Consider the iteration (of the for loop) when t is chosen • The algorithm finds the (unique) maximal tuple set Tm such that t ∈ Tm ⊆ T ⋃ {t} and JCC(Tm) • Since S' ⋃ {t} ⊆ T ⋃ {t} and JCC(S' ⋃ {t}), Tm contains S' ⋃ {t} (Tm is the union of all sets satisfying these two conditions)