Fusion Trees

Advanced Data Structures. Aris Tentes.


Presentation Transcript


  1. Fusion Trees. Advanced Data Structures. Aris Tentes.

  2. Goal: Fixed Universe Successor Problem
     • We have a set of n numbers.
     • Each number has a length of at most log u bits (u = size of the fixed universe).
     • We want to perform the following operations in time better than O(log n):
        • Predecessor/Successor
        • Insertion/Deletion

  3. Model: Transdichotomous RAM
     • Memory is composed of words.
     • Each word has a length of w = log u bits.
     • Each item we store must fit in a word.
     • The following operations take constant time:
        • Addition, Subtraction
        • Multiplication, Division
        • AND, OR, XOR
        • Left/right shift
        • Comparison

  4. Main Idea
     • A fusion tree is a B-tree with fan-out B = (log n)^(1/5) and therefore has a height of O(log n / log log n).
     • If we find a way to determine, in constant time, where a query fits among the B keys of a node, then we have an O(log n / log log n) solution to our problem.

  5. In the Nodes
     • Suppose that the keys (K) in a node are x_1 < x_2 < ... < x_k.
     • If we view them as root-to-leaf paths in a binary tree, then we have the following picture: [figure not included in the transcript]
     • The black nodes are the branching nodes.
     • For k keys, there are exactly k-1 branching nodes.
     • However, some of them may be on the same level.
     • Thus, fewer than k bit positions are required to distinguish the x_i's.
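
As an illustration of the branching levels, here is a minimal Python sketch. It relies on the fact that every branching node of the binary tree is the point where two neighbouring keys (in sorted order) split, so the levels are the highest differing bits of adjacent keys. The function name `branching_bits` and the sample keys are hypothetical, and `int.bit_length` stands in for a constant-time most-significant-bit operation.

```python
def branching_bits(keys):
    """Return the sorted bit positions at which the (distinct) keys branch."""
    keys = sorted(keys)
    positions = set()
    for a, b in zip(keys, keys[1:]):
        # The highest bit in which two neighbouring keys differ is the level
        # of the tree node where their root-to-leaf paths separate.
        positions.add((a ^ b).bit_length() - 1)
    return sorted(positions)


# Four keys, but two of the three branching nodes share a level.
example_keys = [0b0001, 0b0010, 0b1001, 0b1010]
print(branching_bits(example_keys))  # [1, 3]
```

With four keys there are three branching nodes, yet only two distinct levels, which is the situation the last two bullets describe.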

  6. We construct the set B(K) of the branching levels (namely, the bit positions required to distinguish the keys).
     • Let B(K) = {b_1, ..., b_r} with b_1 < b_2 < ... < b_r and r < k.
     • Def.: PerfectSketch(x) = the bits of x extracted according to B(K), namely the bits of x at positions b_1, ..., b_r.
     • If we collect the perfect sketches of all k keys, then we can reduce the node representation to k r-bit strings.
     • That means that kr < k^2 bits would be sufficient. Less than a word!
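
A small Python sketch of the perfect sketch, under the assumption that the branching positions are already known. The name `perfect_sketch` and the sample data are hypothetical. The loop takes time proportional to the number of positions, so it only illustrates the definition and is not a constant-time method.

```python
def perfect_sketch(x, bit_positions):
    """Concatenate the bits of x found at bit_positions (given in ascending order)."""
    result = 0
    for out_pos, b in enumerate(bit_positions):
        result |= ((x >> b) & 1) << out_pos
    return result


example_keys = [0b0001, 0b0010, 0b1001, 0b1010]   # sorted keys of one node
example_positions = [1, 3]                        # their branching levels
print([perfect_sketch(x, example_positions) for x in example_keys])  # [0, 1, 2, 3]
```

The four 4-bit keys shrink to four distinct 2-bit strings, and their order is preserved.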

  7. However, computing PerfectSketch(x) in constant time is difficult. Therefore, we compute an approximation, called Sketch(x).
     • Sketch(x) contains the same bits as PerfectSketch(x), in the same order, with some extra 0's in between, but in consistent positions.
     • This is done by multiplying x by a number m; how m is chosen is shown later.

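  8. Firstly, we compute x' = x AND (2^(b_1) + ... + 2^(b_r)), leaving only the bits which correspond to B(K).
     • If m = 2^(m_1) + ... + 2^(m_r), then we observe that x'·m = Σ_i Σ_j x_(b_i) · 2^(b_i + m_j), where x_(b_i) denotes bit b_i of x.
     • All we need is to find an m such that:
        • All b_i + m_j are distinct (no collisions)
        • b_1 + m_1 < b_2 + m_2 < ... < b_r + m_r (to preserve order)
        • The b_i + m_i are concentrated in a small range ((b_r + m_r) - (b_1 + m_1) = O(r^4))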

  9. If we find such an m, then we compute Sketch(x) = ((x'·m) AND (2^(b_1 + m_1) + ... + 2^(b_r + m_r))) >> (b_1 + m_1), which is O(r^4) bits long.
     • Note that k sketches fit in a word.

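  10. Can we find such an m?
     • Firstly, we show how to find m'_1, ..., m'_r < r^3 such that b_i + m'_j ≢ b_p + m'_q (mod r^3) whenever j ≠ q.
     • Suppose we have found m'_1, ..., m'_(t-1) with the desired property.
     • We observe that m'_t + b_i ≡ m'_s + b_j (mod r^3) implies m'_t ≡ m'_s + b_j - b_i (mod r^3).
     • Thus we can choose m'_t to be the least residue not represented among the fewer than r^3 residues of the form m'_s + b_j - b_i (s < t).
     • Then, by adding suitable multiples of r^3, we obtain the final values m_i.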
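
The greedy choice of the multiplier can be sketched in Python as follows. This is an illustration under stated assumptions, not necessarily the exact construction used in the slides: positions are indexed from 0, the helper names `find_multiplier` and `sketch` are hypothetical, and the final exponents are placed by putting b_i + m_i into the i-th window of width r^3 above a base offset.

```python
def find_multiplier(bits):
    """bits: ascending branching positions b_0 < ... < b_(r-1).

    Returns exponents m_i such that all sums b_i + m_j are distinct,
    b_i + m_i is strictly increasing in i, and these r values span
    fewer than r**4 positions.
    """
    r = len(bits)
    r3 = r ** 3
    residues = []  # the m'_i, each below r**3
    for _ in range(r):
        # Residues that would collide (mod r**3) with an earlier choice.
        forbidden = {(ms + bj - bi) % r3
                     for ms in residues for bi in bits for bj in bits}
        residues.append(next(c for c in range(r3) if c not in forbidden))
    base = bits[-1]  # any base >= max(bits) keeps every m_i non-negative
    exps = []
    for i, (b, res) in enumerate(zip(bits, residues)):
        lo = base + i * r3 - b             # smallest shift reaching window i
        exps.append(lo + (res - lo) % r3)  # same residue, inside window i
    return exps


def sketch(x, bits, exps):
    """Approximate sketch: mask, multiply, mask again, right-justify."""
    x_masked = x & sum(1 << b for b in bits)
    m = sum(1 << e for e in exps)
    keep = sum(1 << (b + e) for b, e in zip(bits, exps))
    return ((x_masked * m) & keep) >> (bits[0] + exps[0])


example_bits = [1, 3]
example_exps = find_multiplier(example_bits)
example_keys = [0b0001, 0b0010, 0b1001, 0b1010]
print([sketch(x, example_bits, example_exps) for x in example_keys])  # [0, 1, 8, 9]
```

In the example the two sketch bits end up three positions apart, so the sketches contain extra zeros but keep the order of the keys.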

  11. The set of the sketched keys of a node is denoted by S(K).
     • Def.: We define the sketch of an entire node as follows: [...]

  12. Lemma
     • Suppose y is an arbitrary number and x_i an element of K (the set of keys). Let b_1 < ... < b_r be the elements of B(K) and b_m the most significant bit position in which PerfectSketch(y) and PerfectSketch(x_i) differ.
     • Assume that p > b_m is the most significant position in which y and x_i differ.
     • Then rank(y) in K is uniquely determined by the interval [b_j, b_(j+1)] containing p and the relative order of y and x_i.

  13. Using the previous lemma, we can reduce the computation of rank(y) in K to computing rank(Sketch(y)) in S(K).
     • Having computed rank(Sketch(y)), we have determined the predecessor Sketch(x_i) and the successor Sketch(x_(i+1)) of Sketch(y) in S(K).
     • If x_i ≤ y ≤ x_(i+1), then we are done.
     • Otherwise, of these two sketched keys we pick the one sharing the longest common prefix of significant bits with y and apply the previous lemma.
     • The resulting rank is read from a lookup table.

  14. Finding rank(Sketch(y)) in S(K)
     • Firstly, we compute [...]
     • Then the subtraction [...]
     • And finally [...]
     • Observing that [...].
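
A minimal Python sketch of one standard way to obtain such a rank with a single subtraction. The layout is an assumption and is not taken from the slides: each stored sketch occupies an s-bit field preceded by a 1 test bit, the query sketch is replicated into every field, and a final multiplication adds up the surviving test bits. The names `pack_node` and `rank_by_subtraction` are hypothetical.

```python
def pack_node(sketches, s):
    """Pack s-bit sketches into one integer, each with a leading 1 test bit."""
    node = 0
    for i, sk in enumerate(sketches):
        node |= ((1 << s) | sk) << (i * (s + 1))
    return node


def rank_by_subtraction(node, k, s, sketch_y):
    """Number of the k packed sketches that are strictly smaller than sketch_y.

    Assumes 0 <= sketch_y < 2**s and k < 2**(s + 1), so the counts fit in a field.
    """
    ones = sum(1 << (i * (s + 1)) for i in range(k))  # a 1 at the bottom of every field
    test_bits = ones << s                             # the test bit of every field
    diff = node - sketch_y * ones                     # k subtractions at once, no borrows cross fields
    survivors = diff & test_bits                      # test bit i stays 1 iff sketch_i >= sketch_y
    # Multiplying by `ones` accumulates all surviving test bits in field k-1.
    count = ((survivors * ones) >> ((k - 1) * (s + 1) + s)) & ((1 << (s + 1)) - 1)
    return k - count


example_sketches = [0, 1, 8, 9]        # sorted 4-bit sketches of one node
example_node = pack_node(example_sketches, 4)
print(rank_by_subtraction(example_node, 4, 4, 8))  # 2
```

Each field holds 2^s + sketch_i - sketch_y after the subtraction, which is always positive, so no borrow leaks into a neighbouring field and the test bit alone records the comparison.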

  15. A suitable multiplication sums these ones and gives the desired rank.
     • What remains is to find a way to compute, in constant time, the most significant bit in which two numbers u, v differ.
     • This problem reduces to finding the most significant one bit of u XOR v.
     • We want to compute msb(x).

  16. Lemma
     • We call a number x d-sparse if the positions of its one bits belong to a set of the form {a + d·i | 0 ≤ i < d}. Not all of these positions have to be occupied by ones.
     • If x is d-sparse, then there exist constants y, y' such that for z = (y·x) AND y' the i-th bit of z equals the bit of x at position a + d·i. Namely, z is a perfect compression of x.
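
One possible choice of the two constants, sketched in Python. The concrete y and y' below are an assumption derived for illustration and not necessarily those of the slides; the function name `compress_sparse` is hypothetical. The bit at position a + d·i is shifted by c - a - i·(d-1), which sends it to position c + i, and a final right shift by c (added here for readability) right-justifies the result.

```python
def compress_sparse(x, a, d, n):
    """Gather bits a, a+d, ..., a+(n-1)*d of a sparse x into bits 0..n-1.

    Assumes n <= d and that x has no one bits outside those positions.
    """
    assert 1 <= n <= d
    c = a + (n - 1) * (d - 1)      # target offset that keeps all shifts >= 0
    y = sum(1 << (c - a - i * (d - 1)) for i in range(n))
    y_prime = ((1 << n) - 1) << c  # selects positions c .. c+n-1
    return ((x * y) & y_prime) >> c


# Bits allowed at positions 1, 4, 7; here positions 1 and 7 are set.
print(bin(compress_sparse(0b10000010, a=1, d=3, n=3)))  # 0b101
```

Because d and d-1 are coprime and n ≤ d, all n² partial products land on different bit positions, so the multiplication produces no carries and the selected window contains exactly the wanted bits.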

  17. msb(x)
     • At first, consider a partitioning of the w bits of our word x into consecutive blocks of s bits. The computation is divided into two phases:
        • We find the leftmost block containing a one and extract this block.
        • We find the leftmost one in this extracted block.

  18. First Phase
     • Let [...] be the number which has ones precisely in the leftmost position of each block, namely [...] and [...]
     • We compute lead(x), in which the leftmost bit of each block is one iff x contains a one in this block. It is given by [...]
     • We observe that lead(x) is d-sparse, so we can apply the previous lemma and obtain compress(x).
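
A Python sketch of one well-known way to compute such a per-block summary with a constant number of word operations. The formula is an assumption chosen for illustration and may differ from the one on the slide; the name `lead` follows the slide, while the parameters `w` (word length) and `s` (block length) are passed explicitly.

```python
def lead(x, w, s):
    """Leading bit of each s-bit block is 1 iff that block of x is nonzero.

    Assumes w is a multiple of s and 0 <= x < 2**w.
    """
    low = sum(1 << i for i in range(0, w, s))  # lowest bit of every block
    c = low << (s - 1)                         # leading bit of every block
    rest = ((1 << w) - 1) ^ c                  # all non-leading positions
    # Adding `rest` carries into a block's leading position exactly when
    # one of the block's lower bits is set; the carry never leaves the block.
    carried = ((x & rest) + rest) & c
    return (x & c) | carried


print(hex(lead(0x0608, 16, 4)))  # 0x808
```

The result has ones only at positions s-1, 2s-1, 3s-1, ..., which are spaced s apart.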

  19. Let P be the set of the first w/s powers of two.
     • We compute b' = rank(compress(x)) in P, in the same way as before.
     • Note that b' identifies the block number (counting from the right) of the leftmost block of x containing a one.

  20. The position of the most significant one in lead(x) is f = s·b'.
     • To extract the desired block, we multiply by [...] and right-justify the significant portion.

  21. Second Phase
     • We want to find the position of the leftmost one in the extracted block.
     • As before, we do a rank computation of these s bits against the first s powers of two.
     • Now we have all the information needed to compute msb(x).
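
The two phases can be put together in a short Python sketch. It is a simplification under stated assumptions: plain loops stand in for the constant-time summary and rank computations, block indices are counted from 0, and the names `rank_among_powers` and `msb_two_phase` are hypothetical.

```python
def rank_among_powers(v, count):
    """Number of the powers 2**0 .. 2**(count-1) that are <= v."""
    return sum(1 for i in range(count) if (1 << i) <= v)


def msb_two_phase(x, w, s):
    """Position of the most significant one bit of x (0 < x < 2**w, s divides w)."""
    assert x > 0 and w % s == 0
    blocks = w // s
    block_mask = (1 << s) - 1
    # Phase 1: one summary bit per block, then a rank among powers of two
    # identifies the leftmost nonzero block.
    summary = 0
    for j in range(blocks):
        if (x >> (j * s)) & block_mask:
            summary |= 1 << j
    block_index = rank_among_powers(summary, blocks) - 1
    block = (x >> (block_index * s)) & block_mask
    # Phase 2: the same rank trick inside the extracted block.
    offset = rank_among_powers(block, s) - 1
    return block_index * s + offset


print(msb_two_phase(0x0608, 16, 4))  # 10
```

For a value v ≥ 1, exactly msb(v) + 1 powers of two are at most v, which is why a rank among powers of two yields the position of the leading one.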

  22. Conclusions
     • In the static case, the successor and predecessor problem is clearly solvable in O(log n / log log n) time, since this is the height of our B-tree and the computation in each node requires constant time (the data we need is precomputed).
     • In the dynamic case, the total time to update a node is [...]
     • The amortized number of node updates caused by an insertion/deletion in a B-tree is constant. Therefore, sorting requires O(n log n / log log n) time.
