Satisfying Data-Intensive Queries Using GPU Clusters

Satisfying Data-Intensive Queries Using GPU Clusters Haicheng Wu, Jeff Young Sudhakar Yalamanchili Computer Architecture and Systems Laboratory Center for Experimental Research in Computer Systems School of Electrical and Computer Engineering Georgia Institute of Technology Sponsors: AIC, AMD, LogicBlox Inc., National Science Foundation, NEC, NVIDIA

Application: Data Warehousing • On-line and off-line analysis • Retail analysis • Forecasting • Pricing • Etc… • Combination of relational data queries and computational kernels • Current applications process 1 to 50 TBs of data [1] • Techniques can be applied to other “Big Data” problems like irregular graphs, sorting …… LargeQty(p) <- Qty(q), q > 1000. …… [1] Independent Oracle Users Group. A New Dimension to Data Warehousing: 2011 IOUG Data Warehousing Survey.

Proposed System Model • Red Fox: Compilation and optimization of queries for GPUs • Remove need for application developer to optimize applications to run on GPUs • Oncilla: Global Address Space (GAS) layer • Create an API to simplify data movement and scheduling

Red Fox Compilation Flow Datalog Queries Primitives Library RA-to-PTX (nvcc + RA-Lib) LogicBlox Front-End Runtime Manager Kernel Weaver Translation Layer Back-End Language Front-End Query Plan PTX/Binary Kernel

Relational Algebra Primitives on GPUs Raw Performance (NVIDIA C2050) Fastest known for GPUs! • Multi-stage algorithm • Simple primitives are close to maximum performance • More complex primitives could show better performance with newer implementations (in progress with NVIDIA Research) Practical MAX

Red Fox: TPC-H Q1 Results • GPU computation scales well with problem size • Improved primitives could lead to further 10x speedup

Oncilla: Fabrics for Accelerator Clouds Companies: LogicBlox, NVIDIA, AIC • Goal: Transparent, efficient host memory aggregation across node for accelerators • Solution: Use Global Address Spaces (GAS) and commodity fabrics (HT, QPI, PCIe, 10GE, IB) • Support in-core databases using software from Red Fox project

Oncilla: Efficient Data Movement • Oncilla aims to combine support for multiple types of data transfer and CUDA-based optimizations under a simplified runtime. • Ex: “oncilla_malloc(2 GB, node2, gpumem)” • Enable application developers and schedulers to take advantage of high-performance GAS without needing to be experts in specialized hardware

Questions? For more information: Red Fox: H. Wu, G. Diamos, H. Cadambi, and S. Yalamanchili, “KernelWeaver: Automatically Fusing Database Primitives for Efficient GPU Computation,” MICRO, December 2012 http://gpuocelot.gatech.edu/projects/compiler-projects/ Oncilla: http://gpuocelot.gatech.edu/projects/compiler-projects/oncilla-gas-infrastructure/ J. Young, S. Yalamanchili, Commodity Converged Fabrics for Global Address Spaces in Accelerator Clouds,” HPCC, June, 2012

Satisfying Data-Intensive Queries Using GPU Clusters

Satisfying Data-Intensive Queries Using GPU Clusters

Presentation Transcript

Data Management: Queries

Processing Data Intensive Queries in Scientific Database Federations

Heterogeneous CPU/GPU co-processor clusters

Geometric Computations on GPU: Proximity Queries

Satisfying Strong Application Requirements in Data-Intensive Clouds

Data Intensive Cyberinfrastructure

Memento : Coordinated In-Memory Caching for Data-Intensive Clusters

Using Clusters

Data Intensive Scientific Compute Model for Multicore clusters

Large Data Visualization on Distributed Memory Multi-GPU Clusters

GP using GP GPU

Custom Data Queries

A Service for Data-Intensive Computations on Virtual Clusters

Queries for data modification: Action queries

Balancing Performance and Power Consumption in Data-Intensive Computing Clusters

Harvesting the Opportunity of GPU-Based Acceleration for Data-Intensive Applications

Using Action Queries

Data Queries

Granular Visibility Queries on the GPU

Processing Data-Intensive Queries in Petabyte-Scale Scientific Databases

Data Management: Queries

Large Data Visualization on Distributed Memory Multi-GPU Clusters