Explore the challenges of storing and managing sets efficiently, covering deterministic dictionary construction and fast lookups. Learn about randomized and deterministic solutions, hash functions, collision avoidance, and methods for reducing construction time.
Uniform algorithms for deterministic construction of efficient dictionaries. Milan Ružić; IT University of Copenhagen; Faculty of Mathematics, University of Belgrade. ESA 2004 / ARCO 2005 presentation.
The dictionary problem • How to store a set $S \subseteq U$ and answer membership queries of the form “is $x \in S$?” • In the dynamic dictionary problem, $S$ may change over time. • Conditions: • Compute on a unit-cost RAM with word length $w$ and a standard instruction set, including multiplication and division. • Finite universe $U = \{0,1\}^w$. • Use space linear in $n = |S|$.
Randomized solutions • Started with a static dictionary with $O(n)$ expected construction time, using $\Theta(nw)$ random bits [Fredman, Komlós, Szemerédi ’82]. • Reached a dynamic dictionary with: • Constant search time. • Constant update time with probability $1 - O(n^{-c})$. • Use of only $O(\log n + \log w)$ random bits. [Dietzfelbinger et al. ’92] • However, what if: • random bits are not easily available, or • performance without a guarantee is unacceptable?
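To make the randomized starting point concrete, here is a minimal Python sketch of a two-level FKS-style static dictionary: a first-level hash splits the keys into $n$ buckets, and each bucket of size $b$ gets its own collision-free table. Assumptions (ours, not from the slides): keys are integers below the Mersenne prime $2^{61}-1$, hash functions have the form $(kx \bmod p) \bmod \mathit{size}$, the first level is retried until $\sum b^2 \le 4n$, and second-level tables have $2b^2$ slots. Every retry draws a fresh random word, which is where the random bits are spent. The names `fks_build`, `fks_contains` and `_h` are hypothetical.

```python
import random

P = (1 << 61) - 1  # Mersenne prime; keys are assumed to be integers in [0, P)


def _h(k: int, x: int, size: int) -> int:
    """FKS-style hash (k*x mod p) mod size."""
    return (k * x % P) % size


def fks_build(keys, rng=None):
    """Build a two-level static table; returns (k, n, tables)."""
    rng = rng or random.Random()
    keys = sorted(set(keys))
    n = max(1, len(keys))
    # Level 1: retry a random k until the squared bucket sizes sum to at most 4n.
    while True:
        k = rng.randrange(1, P)
        buckets = [[] for _ in range(n)]
        for x in keys:
            buckets[_h(k, x, n)].append(x)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break
    # Level 2: each bucket of size b gets 2*b^2 slots and a collision-free k_i.
    tables = []
    for bucket in buckets:
        if not bucket:
            tables.append((0, []))
            continue
        size = 2 * len(bucket) ** 2
        while True:
            ki = rng.randrange(1, P)
            slots = [None] * size
            for x in bucket:
                j = _h(ki, x, size)
                if slots[j] is not None:
                    break  # collision: draw a new k_i
                slots[j] = x
            else:
                break  # every key got its own slot
        tables.append((ki, slots))
    return k, n, tables


def fks_contains(table, x: int) -> bool:
    """Membership query: two hash evaluations, one comparison."""
    k, n, tables = table
    ki, slots = tables[_h(k, x, n)]
    return bool(slots) and slots[_h(ki, x, len(slots))] == x
```

Lookups probe exactly two cells. With the constants chosen above the second-level tables use at most $8n$ slots, and each retry loop succeeds with constant probability, which gives $O(n)$ expected construction time but no worst-case guarantee.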
The family of hash functions • Viewing the problem in a continuous setting: the family $H_{\mathbb{R}}$. • A sufficient condition for avoiding collisions:
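The slide's definition of $H_{\mathbb{R}}$ and its condition did not survive extraction. As an assumption for illustration only, take the multiplicative family $h_a(x) = \lfloor m \cdot \mathrm{frac}(a x) \rfloor$ with a real multiplier $a \in [0,1)$, and write $\|t\|$ for the distance from $t$ to the nearest integer. Then $\|a(x-y)\| \ge 1/m$ is a sufficient condition for $h_a(x) \ne h_a(y)$, because two values in the same bucket have fractional parts less than $1/m$ apart. The Python sketch below checks this with exact `Fraction` arithmetic; `h`, `dist_to_int` and `separated` are hypothetical names.

```python
from fractions import Fraction
from math import floor


def h(a: Fraction, x: int, m: int) -> int:
    """Assumed multiplicative hash floor(m * frac(a*x)), a bucket in [0, m)."""
    return floor(m * ((a * x) % 1))


def dist_to_int(t: Fraction) -> Fraction:
    """||t||: distance from t to the nearest integer."""
    f = t % 1  # fractional part in [0, 1), also for negative t
    return min(f, 1 - f)


def separated(a: Fraction, x: int, y: int, m: int) -> bool:
    """Sufficient condition: ||a*(x - y)|| >= 1/m implies h(a, x) != h(a, y)."""
    return dist_to_int(a * (x - y)) >= Fraction(1, m)
```

The condition is sufficient but not necessary: with $m = 4$ and $a = 1/8$, keys 1 and 2 give $\|a \cdot 1\| = 1/8 < 1/4$, yet they land in buckets 0 and 1.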
The set of good parameters • The set of multipliers that generate fewer than $m$ collisions on the set of $s$ differences has measure at least • We can calculate these measures using numbers of bounded precision. • The set of “good” parameters contains sufficiently large intervals; that is, there are “good” multipliers that can be represented by a constant number of machine words.
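Assuming the multiplicative family $h_a(x) = \lfloor m \cdot \mathrm{frac}(a x) \rfloor$ (our assumption, not the slide's lost formula), the measure of “bad” multipliers for one difference $d$, those with $\|a d\| < \delta$, can be computed exactly with rationals, because the bad set is periodic in $a$ with period $1/d$. Over $[0,1)$ each integer $d \ge 1$ contributes exactly $2\delta$; with $\delta = 1/m$ the expected number of uncertified differences among $s$ is $2s/m$, and Markov's inequality bounds the measure of multipliers with at least $m$ of them by $2s/m^2$. This is our derivation for the assumed family, not the bound stated on the slide; `near_int_measure` and `bad_measure` are hypothetical names.

```python
from fractions import Fraction
from math import floor


def near_int_measure(T: Fraction, delta: Fraction) -> Fraction:
    """Measure of {t in [0, T): ||t|| < delta}, for T >= 0 and 0 < delta <= 1/2."""
    whole = floor(T)
    u = T - whole  # fractional part of T
    # Each full unit period contributes 2*delta (near 0 and near 1);
    # the partial period [0, u) contributes the bad parts it overlaps.
    partial = min(u, delta) + max(Fraction(0), u - (1 - delta))
    return whole * 2 * delta + partial


def bad_measure(lo: Fraction, hi: Fraction, d: int, delta: Fraction) -> Fraction:
    """Measure of multipliers a in [lo, hi) with ||a*d|| < delta (0 <= lo <= hi, d >= 1)."""
    # Substitute t = a*d, measure in t-space, then rescale by 1/d.
    return (near_int_measure(hi * d, delta) - near_int_measure(lo * d, delta)) / d
```

All quantities are exact rationals, so no rounding enters the estimates; only the bit length of the interval endpoints grows.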
Finding a good function • Problem: Given a set of $s$ differences, deterministically find a multiplier $a$ that produces fewer than $m$ colliding differences. • Not all differences need to be explicitly stored in memory. • We use a bit-by-bit construction; sometimes several consecutive bits are set at once. • Choosing a bit is equivalent to choosing one half of the working interval. • Key observation: sets with relatively small support intervals are insignificant to the current choice.
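The bit-by-bit idea can be illustrated with the method of conditional expectations, a standard derandomization technique. This is a simplified stand-in, not the talk's algorithm: it ignores the three classes of differences and never sets several bits at once. Assume the multiplicative family $h_a(x) = \lfloor m \cdot \mathrm{frac}(a x) \rfloor$. Starting from $[0,1)$, each step halves the working interval and keeps the half with the smaller total measure of multipliers for which $\|a d\| < 2/m$, so the average count over the interval never exceeds its initial value $4s/m$. The doubled threshold $2/m$ is our own device: once the interval is shorter than $1/(m \cdot \max d)$, its left endpoint has at most $4s/m$ differences with $\|a d\| < 1/m$, so for $m > 4s$ it separates every pair. The result is a dyadic rational with about $\log_2(m \max d)$ bits. All function names are hypothetical, and the sketch needs $m \ge 4$.

```python
from fractions import Fraction
from math import floor


def near_int_measure(T: Fraction, delta: Fraction) -> Fraction:
    """Measure of {t in [0, T): ||t|| < delta}, for T >= 0 and 0 < delta <= 1/2."""
    whole = floor(T)
    u = T - whole  # fractional part of T
    return whole * 2 * delta + min(u, delta) + max(Fraction(0), u - (1 - delta))


def bad_measure(lo: Fraction, hi: Fraction, d: int, delta: Fraction) -> Fraction:
    """Measure of multipliers a in [lo, hi) with ||a*d|| < delta (0 <= lo <= hi, d >= 1)."""
    return (near_int_measure(hi * d, delta) - near_int_measure(lo * d, delta)) / d


def collisions(a: Fraction, diffs: list[int], m: int) -> int:
    """Count differences d with ||a*d|| < 1/m, i.e. those the sufficient condition cannot clear."""
    count = 0
    for d in diffs:
        f = (a * d) % 1
        if min(f, 1 - f) < Fraction(1, m):
            count += 1
    return count


def find_multiplier(diffs: list[int], m: int) -> Fraction:
    """Choose a in [0, 1) bit by bit via the method of conditional expectations."""
    if m < 4:
        raise ValueError("sketch assumes m >= 4 so that 2/m <= 1/2")
    delta = Fraction(2, m)  # stricter threshold used while searching
    bits = (m * max(diffs)).bit_length()  # ensures 2**-bits < 1 / (m * max(diffs))
    lo, length = Fraction(0), Fraction(1)
    for _ in range(bits):
        half = length / 2
        left = sum(bad_measure(lo, lo + half, d, delta) for d in diffs)
        right = sum(bad_measure(lo + half, lo + length, d, delta) for d in diffs)
        if right < left:
            lo += half  # the next bit of the multiplier is 1
        length = half  # otherwise the next bit is 0
    return lo
```

For example, with the six keys 3, 17, 42, 100, 255 and 1000 (15 differences) and $m = 64 > 4 \cdot 15$, `find_multiplier` returns a multiplier under which all six keys fall into distinct buckets. Each step costs $O(s)$ exact-arithmetic operations, so this sketch does not reach the construction times discussed in the talk.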
Three classes of differences • The recurrence for measure estimates: $\mu_1(p+1) + \mu_2(p+1) + E(p+1) \le \mu(p) + E(p)$ • Several bits are chosen at once when $D_{\mathrm{mid}}$. • The $O(w)$ term represents the total cost of finding the leftmost bits of the keys.
Reducing the construction time • We employ a multi-level hashing scheme. The number of levels can be set by adjusting the parameters $m$ and $s$. • The structure of the set of differences: • In the case of $O(1)$ lookup time we set nkn, m 4n and r n. • Note on evaluation: when the input consists of multi-word keys, full multiplication is usually not necessary.
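Assuming the multiplicative family $h_a(x) = \lfloor m \cdot \mathrm{frac}(a x) \rfloor$ with a bounded-precision multiplier $a = A/2^b$, evaluation needs only integer operations, since $\mathrm{frac}(Ax/2^b) = (Ax \bmod 2^b)/2^b$. Only the low $b$ bits of the product are used, which is one simple way to see why a full multi-word product need not be formed in this assumed setting; the talk's own evaluation scheme may differ. `h_exact` and `h_int` are hypothetical names.

```python
from fractions import Fraction
from math import floor


def h_exact(a: Fraction, x: int, m: int) -> int:
    """Reference definition: floor(m * frac(a*x)) with exact rationals."""
    return floor(m * ((a * x) % 1))


def h_int(A: int, b: int, x: int, m: int) -> int:
    """Same value for a = A / 2**b, using integer operations only.

    Assumes non-negative A, x, m. Only (A*x) mod 2**b is needed,
    so any product bits at or above 2**b can be dropped.
    """
    low = (A * x) & ((1 << b) - 1)  # (A*x) mod 2**b
    return (m * low) >> b  # floor(m * low / 2**b)
```

Because bits of the product at or above $2^b$ are discarded, keys that differ by a multiple of $2^b$ always share a bucket, so $b$ has to exceed $\log_2$ of the largest key difference.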