Improving Bloom Filter Configuration for Lazy Transactional Memory
Mark Jeffrey and J. Gregory Steffan
ECE, University of Toronto
November 10, 2011
Parallel Programming is Hard
[Diagram: memory accesses by threads T1, T3, T2: Rd(a), Rd(a), Rd(x), Rd(b), Wr(c), Rd(a), Wr(a), Rd(a)]
Tools offload some burden of managing data accesses:
• Memory Race Replay
• Atomicity Violation Survival
• Transactional Memory
• Speculative Optimizations
Many tools are using Bloom filters
Mark Jeffrey, Improving Bloom Filter Configuration for Lazy TM
Bloom Filter
• Bit-vector-based data structure [1970]
  • offers fast set operations
  • in exchange for some imprecision
• Recently used to compare memory accesses
• With unconventional practices: Intersection (&)
We show new practices are inefficient! (in theory and empirically)
Bloom Filters in Concurrency Tools
Our propositions will improve parallelism!
Tracking Address-Set Conflicts
Address-Sets
[Diagram: memory accesses by threads T1, T3, T2: Rd(a), Rd(x), Rd(a), Rd(b), Wr(c), Wr(a), Rd(a), Rd(a)]
• Read Set: memory locations read, R_T1 = {a, b}
• Write Set: memory locations written, W_T1 = {a}
Burden: Address-Set Conflicts
[Diagram: memory accesses by threads T1, T3, T2: Rd(a), Rd(x), Rd(a), Rd(b), Wr(c), Wr(a), Rd(a), Rd(a)]
Conflicts
• address accesses are dependent
• independence -> parallelism!
• address conflicts -> no parallelism
Conflict Detection requires
• read and write set comparison
Lazy Conflict Detection
[Diagram: T1 performs Rd(a), Wr(b), Rd(c); T2 performs Rd(a), Rd(b); R1 = {a, c}, W1 = {b}]
• Detect conflicts at the end of a transaction
• Test address-sets for null-intersections
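The commit-time check can be sketched with exact Python sets before any Bloom filter is involved. This is a minimal illustration, assuming the common formulation in which the committing transaction's write set is compared with another transaction's read set; the function name `conflicts` and the empty write set for T2 are assumptions, while R1, W1 and T2's reads come from the slide.

```python
def conflicts(committer_writes, other_reads):
    """Return True when the two address sets share at least one address."""
    return not committer_writes.isdisjoint(other_reads)

# Address sets from the slide: T1 has R1 = {a, c} and W1 = {b}.
r1, w1 = {"a", "c"}, {"b"}
# From the diagram: T2 reads a and b. Its empty write set is an assumption.
r2, w2 = {"a", "b"}, set()

# T1 reaches its commit point: compare its write set with T2's read set.
t1_conflicts_with_t2 = conflicts(w1, r2)   # True: T1 wrote b, T2 read b
# Both transactions read a, but read-read sharing is not a conflict.
t2_conflicts_with_t1 = conflicts(w2, r1)   # False: T2 wrote nothing
```

The test is a null-intersection test: the transactions are independent exactly when the intersection is empty.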
Bloom Filters (BF)
Bloom Filter Background
[Diagram: address x is hashed by h() to a position in the bit vector]
• Bloom filter is a compact set representation
  • bit vector, much smaller than the address space
Bloom Filter Background
[Diagram: address y is hashed by h(); the answer is one of {Yes, No}]
Query for an address, y
Bloom Filter False Positives (FPs)
[Diagram: x has been inserted; is y in the set?]
• Encode a large address space into a bit-vector
  • response to query is actually No or Maybe
• False Positives: when "Maybe" is wrong
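A tiny single-hash filter makes the No/Maybe behaviour concrete. This is only a sketch: the 8-bit vector length and the modulo hash `h` are assumptions chosen so that the aliasing is easy to see, not the hash used in the talk.

```python
M = 8  # bit-vector length, far smaller than the address space (assumed value)

def h(addr):
    """Toy hash (assumption): map an integer address to a bit index."""
    return addr % M

def insert(bits, addr):
    """Return the bit vector with the bit for addr set."""
    return bits | (1 << h(addr))

def query(bits, addr):
    """Return 'Maybe' if the bit for addr is set, otherwise 'No'."""
    return "Maybe" if bits & (1 << h(addr)) else "No"

bits = insert(0, 3)               # the encoded set is {3}
answer_present = query(bits, 3)   # "Maybe", and 3 really is present
answer_absent = query(bits, 4)    # "No": 4 maps to a clear bit
answer_alias = query(bits, 11)    # "Maybe", but wrong: 11 % 8 == 3
```

The last query is a false positive: 11 was never inserted, but it hashes to the same bit as 3. A "No" answer is always correct.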
Partitioned Bloom Filter
[Diagram: x is hashed by h1(), h2(), …, hk(), one hash per partition of the bit vector]
• Insert an address, x:
  • k hash functions encode k bit indices to set
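The insert operation might look as follows in Python. This is a sketch under stated assumptions: the class name, the default sizes, and the salted SHA-256 hashes standing in for h1() … hk() are illustrative choices, and the matching query is included so the example is complete.

```python
import hashlib

class PartitionedBloomFilter:
    """Sketch: m bits split into k equal partitions, one hash per partition."""

    def __init__(self, m=64, k=4):
        assert m % k == 0, "m must split evenly into k partitions"
        self.m, self.k = m, k
        self.part = m // k      # bits per partition
        self.bits = 0           # bit vector stored as a Python int

    def _indices(self, addr):
        """Yield one bit index per partition (hash choice is an assumption)."""
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            offset = int.from_bytes(digest[:8], "big") % self.part
            yield i * self.part + offset

    def insert(self, addr):
        """Set the k bits selected by the k hash functions."""
        for idx in self._indices(addr):
            self.bits |= 1 << idx

    def query(self, addr):
        """False means definitely absent; True means maybe present."""
        return all((self.bits >> idx) & 1 for idx in self._indices(addr))

bf = PartitionedBloomFilter(m=64, k=4)
for addr in (0x1000, 0x1040, 0x2000):
    bf.insert(addr)
found = [bf.query(a) for a in (0x1000, 0x1040, 0x2000)]  # never a false negative
```

Each insertion sets exactly one bit in every partition, so three insertions set at most 12 bits here.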
Partitioned Bloom Filter
[Diagram: y is hashed by h1(), h2(), …, hk(); the answer is one of {Maybe, No}]
Query for an address, y
Probability of False Positives is well understood
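For reference, the textbook false-positive estimate for a partitioned filter is reproduced below. It assumes independent, uniformly distributed hash functions, m total bits split into k partitions of m/k bits each, and n inserted addresses; the symbol n is introduced here for illustration.

```latex
% Probability that one particular bit of a partition is set after n insertions
p_{\text{set}} = 1 - \left(1 - \frac{k}{m}\right)^{n}

% A query is a false positive when all k probed bits happen to be set
P_{\text{FP}} = \left(1 - \left(1 - \frac{k}{m}\right)^{n}\right)^{k}
             \approx \left(1 - e^{-kn/m}\right)^{k}
```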
Unconventional Bloom Filter Null-Intersection Tests
Bloom Filter Null-Intersection Tests
[Diagram: are any of a1, a2, a3, a4, a5 in the set?]
Two existing approaches:
• build a Queue of Queries (QoQ)
• combine queries into distinct Bloom filter
  • replace many queries with 1 intersection!
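The Queue of Queries approach can be sketched as a loop of ordinary membership queries. The filter layout (64 bits, 4 partitions), the hash functions, and all function names below are assumptions for illustration only.

```python
import hashlib

M, K = 64, 4            # total bits and number of hashes (assumed values)
PART = M // K           # bits per partition

def indices(addr):
    """Yield one bit index per partition (hash choice is an assumption)."""
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
        yield i * PART + int.from_bytes(digest[:8], "big") % PART

def make_filter(addrs):
    """Encode a set of addresses into one partitioned bit vector."""
    bits = 0
    for addr in addrs:
        for idx in indices(addr):
            bits |= 1 << idx
    return bits

def query(bits, addr):
    """False means definitely absent; True means maybe present."""
    return all((bits >> idx) & 1 for idx in indices(addr))

def qoq_maybe_overlap(queue, other_filter):
    """Queue of Queries: True if any queued address may be in the filter."""
    return any(query(other_filter, addr) for addr in queue)

write_queue = [0x10, 0x20]                  # exact addresses, kept in a queue
read_filter = make_filter([0x20, 0x30])     # the other set, as a Bloom filter
overlap = qoq_maybe_overlap(write_queue, read_filter)   # True: 0x20 is shared
```

The cost is one query per queued address, which is what the single-intersection approach tries to avoid.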
Partitioned BF Intersection
[Diagram: two partitioned filters combined with a bitwise AND (&); the answer is one of {Disjoint, Maybe Overlap}]
Do two sets share any elements?
Unpartitioned BF Intersection
[Diagram: two unpartitioned filters combined with a bitwise AND (&); the answer is one of {Disjoint, Maybe Overlap}]
Any asserted bits indicate set overlap
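Both intersection tests can be sketched in a few lines. The unpartitioned rule follows the slide: any set bit in the AND means "Maybe Overlap". The partitioned rule shown here is an assumption derived from how insertion works: a shared address sets the same bit in every partition of both filters, so an empty partition in the AND proves the sets are disjoint. Sizes, hashes, and names are illustrative.

```python
import hashlib

M, K = 64, 4                    # total bits and number of hashes (assumed)
PART = M // K                   # bits per partition
PART_MASK = (1 << PART) - 1     # mask selecting one partition

def _hash(i, addr):
    """Hash function number i (hash choice is an assumption)."""
    digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def build(addrs, partitioned):
    """Encode addrs; each hash indexes its own partition or the whole vector."""
    bits = 0
    for addr in addrs:
        for i in range(K):
            if partitioned:
                idx = i * PART + _hash(i, addr) % PART
            else:
                idx = _hash(i, addr) % M
            bits |= 1 << idx
    return bits

def unpartitioned_maybe_overlap(f1, f2):
    """Any bit set in the AND means the sets may overlap."""
    return (f1 & f2) != 0

def partitioned_maybe_overlap(f1, f2):
    """May overlap only if every partition of the AND has a set bit."""
    both = f1 & f2
    return all((both >> (i * PART)) & PART_MASK for i in range(K))

a, b = [0x10, 0x20], [0x20, 0x30]           # the sets share 0x20
shared_unpart = unpartitioned_maybe_overlap(build(a, False), build(b, False))
shared_part = partitioned_maybe_overlap(build(a, True), build(b, True))
```

Neither test can miss a real overlap, but both can report "Maybe Overlap" for disjoint sets. The partitioned test is stricter because it needs a coincidence in all k partitions rather than a single shared bit.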
Imprecision in BF Intersection
• Bloom filter was intended for fast Querying
• Recent systems use filter for Intersection
• Imprecision can produce False Set-Overlaps (FSO)
• We are the first to study Bloom filter FSOs
Our goal is to understand and improve Bloom filter intersection
Important Questions
When using BFs for testing null-intersection
• How do BF Intersection and QoQ compare?
  • theoretical study [SPAA ’11]
• Can we compromise?
  • new Bloom filter design
• Does theory work in practice?
  • empirical study
Bloom Filters for Null-Intersection Tests
How do BF Intersection and QoQ compare?
Definitions
Definitions
[Diagram: Bloom filter with hash functions h1(), h2(), …, hk()]
Probability of FSO [SPAA ’11]
[Diagrams: two filters, each with hashes h1, h2, …, hk; elements b1, b2, b3, b4, b5 tested for membership (∈?)]
• Unpartitioned BF Intersection
• Partitioned BF Intersection
• Queue of BF Queries
Comparing FSOs [SPAA ’11]
[Diagrams: two filters with hashes h1, …, hk; elements b1, b2, b3, b4 tested for membership (∈?)]
For any length m, and k > 1 hash functions,
• Queue of Queries gives the fewest false conflicts
• Partitioned intersection improves on Unpartitioned
Bloom Filters for Null-Intersection Tests
Can we compromise?
A new Bloom filter design
Batch-of-Bloom-filters (BoB)
[Diagram: x passes through hpre, then through h1 … hk into one of several Bloom filters]
BoB Intersection
[Diagram: two batches of Bloom filters combined with a bitwise AND (&); the answer is one of {Disjoint, Maybe Overlap}]
BoB: compromise between QoQ and Intersect
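One way to read the BoB diagrams is sketched below. It is an interpretation, not the authors' implementation: a pre-hash (hpre) picks one small filter in the batch, h1 … hk set bits inside that filter, and two batches are intersected slot by slot. The batch size, filter size, hashes, and all names are assumptions.

```python
import hashlib

BATCH = 4                       # small filters per batch (assumed value)
M, K = 32, 2                    # bits and hashes per small filter (assumed)
PART = M // K
PART_MASK = (1 << PART) - 1

def _hash(tag, addr):
    """Hash selected by tag (hash choice is an assumption)."""
    digest = hashlib.sha256(f"{tag}:{addr}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def bob_insert(batch, addr):
    """hpre picks one filter; h1..hk set one bit per partition inside it."""
    slot = _hash("pre", addr) % BATCH
    for i in range(K):
        batch[slot] |= 1 << (i * PART + _hash(i, addr) % PART)

def make_bob(addrs):
    """Build a batch of small partitioned filters from a set of addresses."""
    batch = [0] * BATCH
    for addr in addrs:
        bob_insert(batch, addr)
    return batch

def _filters_maybe_overlap(f1, f2):
    """Partitioned rule: every partition of the AND must have a set bit."""
    both = f1 & f2
    return all((both >> (i * PART)) & PART_MASK for i in range(K))

def bob_maybe_overlap(b1, b2):
    """Intersect slot by slot; one overlapping slot means maybe overlap."""
    return any(_filters_maybe_overlap(f1, f2) for f1, f2 in zip(b1, b2))

overlap = bob_maybe_overlap(make_bob([0x10, 0x20]), make_bob([0x20, 0x30]))
```

A shared address lands in the same slot of both batches, so comparing slot by slot cannot miss a real overlap. Addresses that land in different slots can no longer alias with each other, which is where the precision gain over a single intersection would come from under this reading.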
Bloom Filters for Null-Intersection Tests
Does theory work in practice?
Methodology
• Augment RingSTM with alternate BF configs [Spear et al. SPAA ’08]
  • unpartitioned Bloom filter intersection
• Stress BF configurations using STAMP bench
  • 8-core Intel Xeon with SSE2 ISA
  • 32-bit Linux 2.6.32-5-686
Performance Results: Labyrinth
[Plots: Execution Time and Aborts, each marked with a "Better" direction; annotation: 21% Speedup]
QoQ, BoB, part. intersect outperform baseline
Performance Results: Kmeans-low
[Plots: Execution Time and Aborts, each marked with a "Better" direction; annotation: >25% slowdown]
Querying overhead counteracts reduced aborts
Conclusion
Conclusion
Conflict detection often applies Bloom filters
• for fast set operations: y ∈ S and S1 ∩ S2
• unconventionally using BFs for null-intersection
Our recommendations (from theory & practice)
• strongly consider querying before intersection
• in hardware, consider intersecting BoBs
• build adaptive systems for application behaviors
Improving Bloom Filter Configuration for Lazy Transactional Memory
Thank you!
markj@eecg.toronto.edu