
Memory-efficient Data Management Policy for Flash-based Key-Value Store


Presentation Transcript


  1. Memory-efficient Data Management Policy for Flash-based Key-Value Store Wang Jiangtao 2013-4-12

  2. Outline • Introduction • Related work • Two works • BloomStore[MSST2012] • TBF[ICDE2013] • Summary

  3. Key-Value Store • KV store efficiently supports simple operations: Key lookup & KV pair insertion • Online Multi-player Gaming • Data deduplication • Internet services

  4. Overview of Key-Value Store • A KV store system should provide high access throughput (> 10,000 key lookups/sec) • Replaces traditional relational DBs thanks to its superior scalability & performance • Applications prefer KV stores for their simplicity and better scalability • Popular management (index + storage) solution for large volumes of records, often implemented through an index structure mapping Key -> Value

  5. Challenge • To meet the high throughput demand, the performance of both index access and KV pair (data) access is critical • Index access: search for the KV pair associated with a given “key” • KV pair access: get/put the actual KV pair • Available memory space limits the maximum number of stored KV pairs • An in-RAM index structure can only address the index access performance demand

  6. DRAM must be Used Efficiently • Example: 1 TB of data, 4 bytes of DRAM per key-value pair for the index • 32 B per pair (data deduplication) => 125 GB of index! • 168 B per pair (tweet) => 24 GB of index • 1 KB per pair (small image) => 4 GB of index • [Chart: index size (GB) vs. per key-value pair size (bytes)]
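
For reference, here is a quick back-of-the-envelope check of the index sizes quoted above, assuming the slide uses decimal units (1 TB = 10^12 bytes) and 4 bytes of DRAM per indexed pair:

```python
# Back-of-the-envelope check of the index sizes on the slide above.
DATA_SIZE = 10**12          # 1 TB of data (decimal)
BYTES_PER_ENTRY = 4         # DRAM spent on the index per key-value pair

for label, pair_size in [("dedup chunk", 32), ("tweet", 168), ("small image", 1024)]:
    n_pairs = DATA_SIZE // pair_size
    index_gb = n_pairs * BYTES_PER_ENTRY / 10**9
    print(f"{label} ({pair_size} B per pair): ~{index_gb:.0f} GB of index in DRAM")
```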

  7. Existing Approaches to Speed up Index & KV pair Accesses • Maintain the index structure in RAM to map each key to its KV pair on SSD • RAM size cannot scale up linearly with flash size • Keep the minimum index structure in RAM, while storing the rest of the index structure on SSD • The on-flash index structure must be designed carefully • Space is precious • Random writes are slow and bad for flash lifetime (wear-out)

  8. Outline • Introduction • Related work • Two works • BloomStore[MSST2012] • TBF[ICDE2013] • Summary

  9. Bloom Filter • A Bloom filter represents a set with a bit array and tests whether an element belongs to the set. Initially, every bit of the m-bit array is set to 0. The Bloom filter uses k mutually independent hash functions, each of which maps every element of the set to a position in {1, …, m}. For any element x, the position hi(x) produced by the i-th hash function is set to 1 (1 ≤ i ≤ k). Note that if a position is set to 1 more than once, only the first setting takes effect; later settings have no further impact. • False positive rate • Bloom filter parameter selection • Number of hash functions k, bit array size m, number of elements n • Reducing the false positive rate
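
A minimal sketch of the insert/lookup behaviour described above; the sizes and the SHA-256-based hash construction are illustrative choices, not taken from the papers.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit array."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = bytearray(m)              # one byte per bit, for readability

    def _positions(self, key: bytes):
        # Derive k positions in {0, ..., m-1} from one digest (illustrative).
        digest = hashlib.sha256(key).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1                # re-setting a set bit changes nothing

    def __contains__(self, key: bytes) -> bool:
        # May report a false positive, never a false negative (without deletions).
        return all(self.bits[pos] for pos in self._positions(key))
```

For example, after `bf = BloomFilter(); bf.add(b"key1")`, the test `b"key1" in bf` returns True, while an unrelated key is wrongly reported present with probability roughly (1 - e^(-kn/m))^k for n inserted elements, which is what the parameter choices for k, m, and n trade off.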

  10. FlashStore[VLDB2010] • Flash as a cache • Components • Write buffer • Read cache • Recency bit vector • Disk-presence bloom filter • Hash table index • Cons • 6 bytes of RAM per key-value pair

  11. SkimpyStash[SIGMOD2011] • Components • Write buffer • Hash table • Bloom filter • Hash table buckets are organized as linked lists • Each bucket stores only a pointer to the beginning of its linked list on flash • The linked lists themselves are stored on flash • Each pair holds a pointer to an earlier key in the log • Cons • Multiple flash page reads per key lookup • High garbage collection cost
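
A rough sketch of the bucket-to-flash-chain lookup idea described above; the record layout (key, value, pointer to an earlier record) and the read_record helper are assumptions for illustration, not SkimpyStash's actual on-flash format.

```python
def skimpy_lookup(bucket_head_offset, key, read_record):
    """Walk the on-flash linked list of one hash bucket.

    read_record(offset) -> (key, value, prev_offset) is assumed to issue one
    flash page read per record, which is why a lookup may need multiple
    flash page reads before the key (or the end of the chain) is found.
    """
    offset = bucket_head_offset          # RAM holds only this head pointer
    while offset is not None:
        k, v, prev = read_record(offset) # one flash read per hop
        if k == key:
            return v
        offset = prev                    # follow the pointer to an earlier record
    return None                          # key not present in this bucket
```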

  12. Outline • Introduction • Related work • Two works • BloomStore[MSST2012] • TBF[ICDE2013] • Summary

  13. BloomStore [MSST2012]

  14. Introduction • Key lookup throughput is the bottleneck for data-intensive applications • One approach: keep a large in-RAM hash table • Another: move the index structure to secondary storage (SSD) • Expensive random writes • High garbage collection cost • Larger storage space

  15. BloomStore • BloomStore Design • Extremely low amortized RAM overhead • High key lookup/insertion throughput • Components • KV pair write buffer • Active Bloom filter • covers the write buffer (one flash page of KV pairs) • Bloom filter chain • covers the many flash pages already written • Key-range partition • corresponds to one flash “block” • [Figure: BloomStore architecture]
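
A simplified sketch of one BloomStore partition as listed above, reusing the BloomFilter sketch from the Bloom filter slide; the one-Bloom-filter-per-flushed-page granularity and the in-memory page list are simplifying assumptions.

```python
class BloomStorePartition:
    """Illustrative model of one key-range partition.

    In RAM: a write buffer (about one flash page of KV pairs) and its active
    Bloom filter, plus a chain of Bloom filters for pages already flushed.
    On flash: the flushed pages of KV pairs (modelled here as dicts).
    """

    def __init__(self, page_capacity: int = 64):
        self.page_capacity = page_capacity
        self.write_buffer = {}             # buffered KV pairs
        self.active_bf = BloomFilter()     # summarizes keys in the write buffer
        self.bf_chain = []                 # one Bloom filter per flushed page (assumed)
        self.flash_pages = []              # flushed pages, oldest first

    def put(self, key: bytes, value) -> None:
        self.write_buffer[key] = value
        self.active_bf.add(key)
        if len(self.write_buffer) >= self.page_capacity:
            self._flush()

    def _flush(self) -> None:
        # Append the buffered pairs as a new flash page and move the active
        # Bloom filter to the end of the chain; start a fresh buffer and filter.
        self.flash_pages.append(dict(self.write_buffer))
        self.bf_chain.append(self.active_bf)
        self.write_buffer = {}
        self.active_bf = BloomFilter()
```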

  16. KV Store Operations • Key Lookup • Active Bloom filter • Bloom filter chain • Lookup cost

  17. Parallel lookup • Key Lookup • Read the entire BF chain • Bit-wise AND the resultant rows • High read throughput • [Figure: the bits at positions h1(ei), …, hk(ei) are read across all Bloom filters in parallel and combined with a bit-wise AND to find where ei may be stored]
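
A sketch of the row-wise AND lookup shown above, again reusing the BloomFilter sketch; the real design reads each bit row across the chain in one flash access, whereas this model simply loops over in-memory filters.

```python
def parallel_lookup(bf_chain, key: bytes):
    """Return the indices of Bloom filters in the chain that may contain `key`,
    newest first, by AND-ing one bit row per hash position across all filters."""
    if not bf_chain:
        return []
    n = len(bf_chain)
    candidates = (1 << n) - 1                # start with every filter as a candidate
    for pos in bf_chain[0]._positions(key):  # the same k positions apply to each filter
        row = 0
        for i, bf in enumerate(bf_chain):    # one bit per filter at this position
            if bf.bits[pos]:
                row |= 1 << i
        candidates &= row                    # bit-wise AND across the k rows
    return [i for i in range(n - 1, -1, -1) if candidates >> i & 1]
```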

  18. KV Store Operations • KV pair Insertion • KV pair Update • Append a new key-value pair • KV pair Deletion • Insert a null value for the key
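
Continuing the partition sketch above, updates and deletes can both be expressed as appends, with a delete writing a null value (tombstone); the kv_get() helper below is an assumption about how lookups resolve the newest version first.

```python
TOMBSTONE = None   # illustrative convention: a null value marks a deleted key

def kv_delete(partition: "BloomStorePartition", key: bytes) -> None:
    # Deletion is just an insertion of a null value for the key.
    partition.put(key, TOMBSTONE)

def kv_get(partition: "BloomStorePartition", key: bytes):
    # Check the write buffer first, then flushed pages from newest to oldest,
    # so the most recent version (or tombstone) of a key wins.
    if key in partition.active_bf and key in partition.write_buffer:
        value = partition.write_buffer[key]
        return None if value is TOMBSTONE else value
    for i in parallel_lookup(partition.bf_chain, key):
        page = partition.flash_pages[i]
        if key in page:                      # a Bloom filter hit may be a false positive
            value = page[key]
            return None if value is TOMBSTONE else value
    return None
```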

  19. Experimental Evaluation • Experiment setup • 1 TB SSD (PCIe) / 32 GB (SATA) • Workloads

  20. Experimental Evaluation • Effectiveness of the prefilter • Amortized RAM per KV pair is 1.2 bytes • Linux workload • Vx workload

  21. Experimental Evaluation • Lookup Throughput • Linux workload • H = 96 (BF chain length) • m = 128 (size of each BF) • Vx workload • H = 96 (BF chain length) • m = 64 (size of each BF) • With a prefilter

  22. TBF [ICDE2013]

  23. Motivation • Using flash as an extension of the cache is cost-effective • The desired cache size is otherwise too large for RAM alone • Goals: a caching policy that is memory-efficient • a replacement algorithm that achieves performance comparable to existing policies • a caching policy that is agnostic to the organization of data on the SSD

  24. Defects of the existing policies • Recency-based caching algorithms • Clock or LRU • Need an access data structure and index

  25. Defects of the existing policies • Recency-based caching algorithms • Clock or LRU • Need an access data structure and index

  26. System view • DRAM buffer • An in-memory data structure (Bloom filter) to maintain access information • No special index to locate key-value pairs • Key-value store • Provides an iterator operation to traverse the store • Write-through • [Figure: Key-Value cache prototype architecture]

  27. Bloom Filter with Deletion (BFD) • BFD • Removing a key from the SSD requires removing it from the filter • A Bloom filter with deletion • Delete by resetting the bits at the corresponding hash values for a subset of the hash functions • [Figure: deleting X1 from the filter]
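
A sketch of the deletion rule described above, applied to the BloomFilter class from earlier; which subset of hash functions gets reset (here simply the first one) is an assumption, and the comments note the false-negative risk the next slide mentions.

```python
def bfd_delete(bf: BloomFilter, key: bytes, reset_count: int = 1) -> None:
    """Delete `key` by clearing the bits at a subset of its hash positions.

    Clearing only some of the k positions limits the collateral damage, but
    other keys that share the cleared bits may now look absent (false
    negatives), while false positives remain possible as usual.
    """
    positions = list(bf._positions(key))
    for pos in positions[:reset_count]:      # reset a subset of the k positions
        bf.bits[pos] = 0
```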

  28. Bloom Filter with Deletion (BFD) • Flow chart • Tracks recency information • Cons • False positives • pollute the cache • False negatives • hurt the hit ratio

  29. Two Bloom sub-Filters (TBF) • Flow chart • Drops many elements in bulk • Flips the sub-filters periodically • Cons • Rarely-accessed objects may be kept • polluting the cache • Longer traversal length per eviction
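
A simplified sketch of the two-sub-filter idea: accesses are recorded in the current sub-filter, recency is checked against both, and a periodic flip discards the older sub-filter to drop many stale entries in bulk. The flip trigger and exact semantics here are assumptions for illustration, not the paper's precise algorithm.

```python
class TwoBloomFilters:
    """Illustrative recency tracker built from two Bloom sub-filters."""

    def __init__(self):
        self.current = BloomFilter()     # records recent accesses
        self.previous = BloomFilter()    # accesses from the previous period

    def record_access(self, key: bytes) -> None:
        self.current.add(key)

    def recently_used(self, key: bytes) -> bool:
        # An object counts as recently used if either sub-filter remembers it;
        # eviction candidates are the objects this test rejects.
        return key in self.current or key in self.previous

    def flip(self) -> None:
        # Called periodically: drop the older sub-filter wholesale instead of
        # deleting keys one by one, then start a fresh current sub-filter.
        self.previous = self.current
        self.current = BloomFilter()
```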

  30. Traversal cost • Key-value store traversal during eviction • Objects unmarked on insertion vs. marked on insertion • Marking on insertion produces longer stretches of marked objects • False positives also lengthen the traversal

  31. Evaluation • Experiment setup • Two 1 TB 7200 RPM SATA disks in RAID-0 • 80 GB Fusion-io ioDrive, PCIe x4 • A mixture of 95% read operations and 5% updates • Key-value pairs: 200 million (256 B each) • Bloom filter • 4 bits per marked object • one byte per object in TBF • Hash functions: 3

  32. Outline • Introduction • Related work • Two works • BloomStore[MSST2012] • TBF[ICDE2013] • Summary

  33. Summary • KV stores are particularly suitable for some special applications • Flash improves the performance of KV stores thanks to its faster access • Some index structures need to be redesigned to minimize RAM usage • Don’t just treat flash as a disk replacement

  34. Thank You!
