
Shredder GPU -Accelerated Incremental Storage and Computation

Pramod Bhatotia (MPI-SWS, Germany), Rodrigo Rodrigues (MPI-SWS, Germany), Akshat Verma (IBM Research-India). USENIX FAST 2012.




Presentation Transcript


  1. Shredder: GPU-Accelerated Incremental Storage and Computation
  Pramod Bhatotia§, Rodrigo Rodrigues§, Akshat Verma¶
  §MPI-SWS, Germany ¶IBM Research-India
  USENIX FAST 2012

  2. Handling the data deluge
  Data stored in data centers is growing at a fast pace.
  Challenge: how to store and process this data?
  • Key technique: redundancy elimination
  Applications of redundancy elimination:
  • Incremental storage: data de-duplication
  • Incremental computation: selective re-execution

  3. Redundancy elimination is expensive
  [Slide figure: pipeline — File → Chunking → Hashing → Matching (hash found? yes: duplicate; no: new chunk).]
  Content-based chunking [SOSP'01] slides a window over the file content and declares a chunk boundary wherever the window's fingerprint matches a marker pattern.
  For large-scale data, chunking easily becomes a bottleneck.
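The chunking step above can be sketched in a few lines of host-side C++. This is a minimal illustration, not Shredder's implementation: the paper uses Rabin fingerprinting, while the multiplicative rolling hash, window size, and marker mask below are illustrative stand-ins.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Content-defined chunking: a boundary is declared wherever the low bits
// of a rolling hash over a sliding window match a marker pattern (here:
// all zero). A simple multiplicative rolling hash stands in for Rabin
// fingerprints; arithmetic is implicitly mod 2^64.
std::vector<size_t> chunkBoundaries(const std::string& data,
                                    size_t window = 16,
                                    uint64_t mask = 0xFF) {
    std::vector<size_t> boundaries;
    const uint64_t base = 257;
    uint64_t basePow = 1;  // base^window, used to drop the outgoing byte
    for (size_t i = 0; i < window; ++i) basePow *= base;

    uint64_t hash = 0;
    for (size_t i = 0; i < data.size(); ++i) {
        hash = hash * base + (unsigned char)data[i];
        if (i >= window)  // remove the byte that left the window
            hash -= basePow * (unsigned char)data[i - window];
        // Marker found: this position ends a chunk.
        if (i >= window && (hash & mask) == 0)
            boundaries.push_back(i + 1);
    }
    return boundaries;
}
```

Because a boundary depends only on the bytes inside a small window, an insertion near the start of the file shifts the data but the downstream boundaries realign — which is why content-based chunking deduplicates well where fixed-size chunking does not.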

  4. Accelerate chunking using GPUs
  GPUs have been successfully applied to compute-intensive tasks.
  [Slide figure: storage servers ingesting 20 Gbps (2.5 GB/s) — can GPUs deliver a 2X speedup for chunking?]
  Using GPUs for data-intensive tasks presents new challenges.

  5. Rest of the talk
  Shredder design
  • Basic design
  • Background: GPU architecture & programming model
  • Challenges and optimizations
  Evaluation
  Case studies
  • Computation: incremental MapReduce
  • Storage: cloud backup

  6. Shredder basic design
  [Slide figure: a Reader on the CPU (host) reads data for chunking, a Transfer stage moves it to the GPU (device) where the chunking kernel runs, and the chunked data is returned to the Store on the host.]

  7. GPU architecture
  [Slide figure: the CPU (host) with host memory connects over PCI to the GPU (device); the device holds global memory and multi-processors 1..N, each with shared memory and several scalar processors (SPs).]

  8. GPU programming model
  [Slide figure: input is copied from host memory over PCI into device global memory; threads running on the multi-processors stage data through shared memory and write output back to device global memory, which is copied back to the host.]

  9. Scalability challenges
  • Host-device communication bottlenecks
  • Device memory conflicts
  • Host bottlenecks (see paper for details)

  10. Challenge #1: host-device communication bottleneck
  [Slide figure: Reader I/O fills main memory on the host; data is transferred over PCI to device global memory, where the chunking kernel runs.]
  With synchronous data transfer and kernel execution:
  • The cost of data transfer is comparable to kernel execution
  • For large-scale data it involves many data transfers

  11. Asynchronous execution
  [Slide figure: timeline — copy to buffer 1; then compute on buffer 1 while copying to buffer 2; then compute on buffer 2 while refilling buffer 1.]
  Pros:
  + Overlaps communication with computation
  + Generalizes to multi-buffering
  Cons:
  - Requires page-pinning of buffers at the host side
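The double-buffering timeline on this slide can be sketched as host-side C++. On a real GPU the background copy would be a cudaMemcpyAsync on its own stream; here std::async stands in for the copy engine, and the checksum "kernel" and function names are illustrative, not Shredder's code.

```cpp
#include <algorithm>
#include <future>
#include <numeric>
#include <vector>

// Double buffering: while the "kernel" computes on one buffer, the next
// chunk of input is copied into the other buffer in the background.
// (On a GPU: cudaMemcpyAsync + kernel launch on separate streams.)
std::vector<long> processDoubleBuffered(const std::vector<int>& input,
                                        size_t bufSize) {
    std::vector<long> results;
    std::vector<int> buf[2];

    auto copyChunk = [&](size_t offset) {
        size_t end = std::min(offset + bufSize, input.size());
        return std::vector<int>(input.begin() + offset, input.begin() + end);
    };

    size_t offset = 0;
    int cur = 0;
    buf[cur] = copyChunk(offset);  // fill the first buffer synchronously
    while (!buf[cur].empty()) {
        offset += buf[cur].size();
        // Start copying the next chunk into the other buffer...
        auto pending = std::async(std::launch::async, copyChunk, offset);
        // ...while computing on the current one (a checksum stands in
        // for the chunking kernel).
        results.push_back(
            std::accumulate(buf[cur].begin(), buf[cur].end(), 0L));
        buf[1 - cur] = pending.get();  // wait for the copy to finish
        cur = 1 - cur;                 // swap buffers
    }
    return results;
}
```

The structure generalizes to multi-buffering by replacing the two buffers with a ring, which is exactly the direction the next slide takes.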

  12. Circular ring of pinned memory buffers
  [Slide figure: data is memcpy'd from pageable buffers in host memory into a circular ring of pinned buffers, from which asynchronous copies feed device global memory.]

  13. Challenge #2: device memory conflicts
  [Slide figure: the same host-device pipeline; the bottleneck is now the chunking kernel's accesses to device global memory.]

  14. Accessing device memory
  [Slide figure: threads on SPs pay 400-600 cycles to reach device global memory, but only a few cycles to reach the multi-processor's shared memory.]

  15. Memory bank conflicts
  Un-coordinated accesses to global memory lead to a large number of memory bank conflicts.
  [Slide figure: threads on one multi-processor issuing scattered accesses to device global memory.]

  16. Accessing memory banks
  [Slide figure: interleaved memory — consecutive addresses are spread round-robin across banks 0-3 (addresses 0 and 4 in bank 0, 1 and 5 in bank 1, ...); the low-order address bits select the bank via chip enable, the high-order bits index the word within the bank.]
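The bank-selection arithmetic from the figure is easy to make concrete. This sketch uses 4 banks to match the slide; real GPU shared memory has 16 or 32 banks, and the function names are ours, not CUDA's.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr size_t kBanks = 4;  // matches the slide; real hardware: 16 or 32

// Interleaved memory: the low-order address bits select the bank,
// the high-order bits index the word within that bank.
size_t bankOf(size_t addr) { return addr % kBanks; }

// Worst per-bank collision count for one set of simultaneous accesses:
// 1 means conflict-free; k means the hardware serializes into k passes.
size_t conflictDegree(const std::vector<size_t>& addrs) {
    size_t hits[kBanks] = {0};
    size_t worst = 0;
    for (size_t a : addrs) worst = std::max(worst, ++hits[bankOf(a)]);
    return worst;
}
```

Stride-1 accesses such as {0, 1, 2, 3} touch four distinct banks and proceed in one pass, while a stride equal to the bank count, {0, 4, 8, 12}, lands every access in bank 0 and serializes fully. That is the cost the next slide's coalescing optimization avoids.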

  17. Memory coalescing
  [Slide figure: threads 1-4 read adjacent addresses in the same cycle, so their accesses to device global memory coalesce into one transaction before the data is staged into device shared memory.]
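The access pattern behind coalescing can be shown with plain index arithmetic. This is a CPU-side simulation of the cooperative load, with invented names; in a CUDA kernel the round loop would be the strided `for` over `threadIdx.x + r * blockDim.x` writing into a `__shared__` tile.

```cpp
#include <cstddef>
#include <vector>

// Simulates a cooperative, coalesced load of one tile into shared memory:
// in round r, thread t reads global index base + r*T + t, so each round's
// T addresses are consecutive and merge into a single memory transaction.
// Returns the per-round address lists so the pattern can be inspected.
std::vector<std::vector<size_t>> coalescedLoad(const std::vector<int>& global,
                                               size_t base, size_t tile,
                                               size_t T,
                                               std::vector<int>& shared) {
    shared.assign(tile, 0);
    std::vector<std::vector<size_t>> rounds;
    for (size_t r = 0; r * T < tile; ++r) {
        std::vector<size_t> addrs;
        for (size_t t = 0; t < T && r * T + t < tile; ++t) {
            size_t idx = r * T + t;
            shared[idx] = global[base + idx];  // thread t's read this round
            addrs.push_back(base + idx);
        }
        rounds.push_back(addrs);
    }
    return rounds;
}
```

After the tile sits in shared memory, each thread can walk its own contiguous slice of it sequentially, which is the layout the next slide describes: the expensive global-memory traffic is coalesced, and the per-thread sequential scanning happens against fast shared memory.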

  18. Processing the data
  [Slide figure: after the coalesced load, each of threads 1-4 processes its own contiguous region of device shared memory.]

  19. Outline
  Shredder design
  Evaluation
  Case studies

  20. Evaluating Shredder
  Goal: determine how Shredder works in practice
  • How effective are the optimizations?
  • How does it compare with multicores?
  Implementation:
  • Host driver in C++ and GPU code in CUDA
  • GPU: NVIDIA Tesla C2050 cards
  • Host machine: Intel Xeon with 12 cores
  (See paper for details)

  21. Shredder vs. multicores
  [Slide figure: throughput comparison — Shredder achieves roughly a 5X speedup over the multicore baseline.]

  22. Outline
  Shredder design
  Evaluation
  Case studies
  • Computation: incremental MapReduce
  • Storage: cloud backup
  (See paper for details)

  23. Incremental MapReduce
  [Slide figure: dataflow — read input → map tasks → reduce tasks → write output.]

  24. Unstable input partitions
  [Slide figure: the same MapReduce dataflow; with fixed-size partitioning, a small change to the input shifts the partition boundaries, so many map tasks see changed inputs and must re-execute.]

  25. GPU-accelerated Inc-HDFS
  [Slide figure: copyFromLocal passes the input file to the HDFS client, where Shredder performs content-based chunking to produce splits 1-3.]

  26. Related work
  GPU-accelerated systems
  • Storage: Gibraltar [ICPP'10], HashGPU [HPDC'10]
  • SSLShader [NSDI'11], PacketShader [SIGCOMM'10], ...
  Incremental computations
  • Incoop [SOCC'11], Nectar [OSDI'10], Percolator [OSDI'10], ...

  27. Conclusions
  GPU-accelerated framework for redundancy elimination
  • Exploits massively parallel GPUs in a cost-effective manner
  Shredder's design incorporates novel optimizations
  • More data-intensive than previous uses of GPUs
  Shredder can be seamlessly integrated with storage systems
  • To accelerate incremental storage and computation

  28. Thank You!
