Uniform algorithms for deterministic construction of efficient dictionaries

Milan Ružić (IT University of Copenhagen; Faculty of Mathematics, University of Belgrade). ESA 2004 / ARCO 2005 presentation.

The dictionary problem
• How to store a set S ⊆ U and answer membership queries: “is x ∈ S?”
• In the dynamic dictionary problem, S may change over time.
• Conditions:
  • Compute on a unit-cost RAM with word length w and a standard instruction set, including multiplication and division.
  • Finite universe U = {0,1}^w.
  • Use space linear in n = |S|.

Randomized solutions
• Started with a static dictionary with O(n) expected construction time, using (nw) random bits [Fredman, Komlós, Szemerédi ’82].
• Reached a dynamic dictionary with:
  • Constant search time.
  • Constant update time with probability 1 − O(n^(−c)).
  • Use of only O(log n + log w) random bits.
  [Dietzfelbinger et al. ’92]
• However, what if:
  • random bits are not easily available, or
  • performance without a guarantee is unacceptable?
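For orientation, the two-level idea behind the static randomized dictionary can be sketched as follows. This is a minimal illustration, not the construction from the cited papers: the hash family ((a·x + c) mod P) mod m, the modulus P, the 4n threshold and the names build_fks and contains are all assumptions chosen for the sketch.

```python
import random

P = (1 << 61) - 1  # prime modulus (assumption); keys must lie in [0, P)

def h(params, x, m):
    # Textbook universal-style hash: ((a*x + c) mod P) mod m.
    a, c = params
    return ((a * x + c) % P) % m

def random_params(rng):
    return rng.randrange(1, P), rng.randrange(P)

def build_fks(keys, rng):
    """Two-level static table for distinct integer keys (illustrative sketch)."""
    n = len(keys)
    # Level 1: retry until the squared bucket sizes sum to at most 4n.
    while True:
        top = random_params(rng)
        buckets = [[] for _ in range(n)]
        for x in keys:
            buckets[h(top, x, n)].append(x)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break
    # Level 2: a bucket of size b gets a collision-free table of size b*b.
    tables = []
    for b in buckets:
        size = len(b) ** 2
        while True:
            inner = random_params(rng)
            slots = [None] * size
            for x in b:
                i = h(inner, x, size)
                if slots[i] is not None:
                    break            # collision: draw a new inner function
                slots[i] = x
            else:
                tables.append((inner, slots))
                break
    return top, tables

def contains(table, x):
    """Membership query using two hash evaluations."""
    top, tables = table
    if not tables:
        return False
    inner, slots = tables[h(top, x, len(tables))]
    return bool(slots) and slots[h(inner, x, len(slots))] == x
```

Both retry loops depend on random draws, which is exactly the dependence the deterministic construction in this talk aims to remove.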

Deterministic dictionaries with fast lookups

The family of hash functions
• Viewing the problem in a continuous setting - HR .
• A sufficient condition for avoiding collisions:
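The formulas on this slide did not survive, so the following is only a sketch under an assumed family: multiplicative hashing with a real multiplier a in [0, 1), mapping x to cell ⌊m · frac(a·x)⌋. For that assumed family, two keys whose difference d satisfies frac(a·d) ∈ [1/m, 1 − 1/m] cannot share a cell. The names cell and separated are hypothetical.

```python
from fractions import Fraction

def cell(a, x, m):
    """Assumed hash: floor(m * frac(a * x)), with a an exact rational in [0, 1)."""
    return int(m * ((a * x) % 1))

def separated(a, d, m):
    """Sufficient condition (for the assumed family): frac(a*d) in [1/m, 1 - 1/m]."""
    f = (a * d) % 1
    return Fraction(1, m) <= f <= 1 - Fraction(1, m)

# Small check: whenever the condition holds for d = |x - y|, the cells differ.
a, m = Fraction(5, 16), 4
for x in range(20):
    for y in range(x):
        if separated(a, x - y, m):
            assert cell(a, x, m) != cell(a, y, m)
```

The condition depends only on the difference d, which is why the later slides reason about a set of differences rather than about pairs of keys.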

The set of good parameters
• The set of multipliers that generate fewer than m collisions on the set of s differences has a guaranteed lower bound on its measure.
• The measures can be calculated with numbers of bounded precision.
• The set of “good” parameters contains sufficiently large intervals; that is, there are “good” multipliers that can be represented by a constant number of machine words.

Finding a good function
• Problem: given a set of s differences, deterministically find a multiplier a that produces fewer than m colliding differences.
• Not all differences need to be explicitly stored in memory.
• We use a bit-by-bit construction; sometimes several consecutive bits are set at once.
• Choosing a bit is equivalent to choosing one half of the working interval.
• Key observation: sets with relatively small support intervals are insignificant to the current choice.
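The interval-halving idea can be illustrated with a deliberately naive sketch. It assumes a multiplicative family in which a difference d is "bad" for a multiplier a in [0, 1) when frac(a·d) falls outside [1/m, 1 − 1/m]; this family, and the names bad_measure, is_bad and find_multiplier, are assumptions rather than the talk's definitions. Unlike the algorithm described above, the sketch stores every difference, recomputes all measures at each step, chooses one bit at a time, and uses exact rationals instead of bounded precision.

```python
from fractions import Fraction
from itertools import combinations
from math import floor

def bad_measure(d, m, lo, hi):
    """Exact measure of the multipliers in [lo, hi) that are bad for difference d."""
    eps = Fraction(1, m)
    def prefix(t):                      # bad measure inside [0, t)
        u = t * d
        whole = floor(u)
        f = u - whole
        return (whole * 2 * eps + min(f, eps) + max(0, f - (1 - eps))) / d
    return prefix(hi) - prefix(lo)

def is_bad(d, m, a):
    f = (a * d) % 1
    return f < Fraction(1, m) or f > 1 - Fraction(1, m)

def find_multiplier(keys, m):
    """Halve the working interval, always keeping the half with less bad measure."""
    diffs = [abs(x - y) for x, y in combinations(keys, 2)]
    bound = Fraction(2 * len(diffs), m)  # average number of bad differences on [0, 1)
    lo, hi = Fraction(0), Fraction(1)
    while sum(is_bad(d, m, lo) for d in diffs) > bound:
        mid = (lo + hi) / 2
        left = sum(bad_measure(d, m, lo, mid) for d in diffs)
        right = sum(bad_measure(d, m, mid, hi) for d in diffs)
        if left <= right:
            hi = mid                     # next bit is 0: keep the left half
        else:
            lo = mid                     # next bit is 1: keep the right half
    return lo

keys = [3, 17, 42, 58, 91, 120]
m = len(keys) ** 2
a = find_multiplier(keys, m)
cells = {int(m * ((a * x) % 1)) for x in keys}
```

Keeping the half with the smaller total bad measure never raises the average number of bad differences over the working interval, so the multiplier returned has at most 2s/m bad differences. With m = n² that bound is below 1, so in this usage example the keys land in distinct cells.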

Three classes of differences
• The recurrence for measure estimates: 1(p+1) + 2(p+1) + E(p+1) (p) + E(p)
• Several bits are chosen at once when Dmid.
• The O(w) term represents the total cost of finding the leftmost bits of keys.

Reducing the construction time
• We employ a multi-level hashing scheme. The number of levels can be set by adjusting the parameters m and s.
• The structure of the set of differences:
• In the case of O(1) lookup time we set nkn, m  4n and r n.
• Note on evaluation: when the input consists of multi-word keys, full multiplication is usually not necessary.
