
Hardware Acceleration Using GPUs






Presentation Transcript


  1. Hardware Acceleration Using GPUs. M Anirudh. Guide: Prof. Sachin Patkar. VLSI Consortium, April 4, 2008.

  2. Advantages of Using Graphics Processors
  • Parallel architectures with many ALUs
  • High memory bandwidth
  • Cheap, fast, and scalable
  • A new generation roughly every two years
  • High Gflops/$
  Cons
  • No double precision yet (only single-precision floating-point operations)
  • Loss of precision (not fully IEEE 754 compliant)

  3. NVIDIA GeForce 8 Series Cards
  • Currently using the 8500GT to test our algorithms
  • The 8500GT has 16 processors, a theoretical peak floating-point performance of 28.8 Gflops, and a memory bandwidth of 12.8 GB/s
  • Scalable architecture
  • 8800 GT: 128 processors, ~350 Gflops, 86.4 GB/s
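The quoted peak figures follow from the processor count, clock, and bus width. A quick sanity check in Python, assuming a 900 MHz shader clock, one multiply-add (2 flops) per processor per cycle, and a 128-bit memory bus at an effective 800 MHz (these clock figures are not stated on the slide):

```python
# Sanity-check the quoted 8500GT numbers.
streaming_processors = 16
shader_clock_hz = 900e6          # assumed shader clock for the 8500GT
flops_per_cycle = 2              # one multiply-add per processor per cycle
peak_gflops = streaming_processors * shader_clock_hz * flops_per_cycle / 1e9
print(peak_gflops)               # 28.8, matching the slide

bus_width_bytes = 128 // 8       # 128-bit memory interface
effective_mem_clock_hz = 800e6   # DDR effective transfer rate (assumed)
bandwidth_gb_s = bus_width_bytes * effective_mem_clock_hz / 1e9
print(bandwidth_gb_s)            # 12.8 GB/s, matching the slide
```

The same formula with 128 processors at a ~1.35 GHz shader clock gives roughly the ~350 Gflops quoted for the high-end 8800-series part.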

  4. GeForce 8500GT Architecture
  [Block diagram: a thread scheduler feeding two multiprocessors, each with its own control logic, eight ALUs, and local memory; shared memory sits below the multiprocessors, backed by global memory.]

  5. Programming Model
  • Massively multi-threaded
  • Hierarchy: threads → warps → blocks → grid
  • Shared memory and global memory
  • Memory coalescing matters: effective bandwidth ranges from about 5 GB/s (uncoalesced) to 70 GB/s (coalesced)
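The thread hierarchy can be simulated on the CPU to show how each thread finds its data: a thread's global index is its block index times the block size plus its index within the block. The names below mirror CUDA's built-ins (`blockIdx`, `blockDim`, `threadIdx`); the launch loop and sizes are an illustrative sketch, not real CUDA:

```python
# Simulate a CUDA-style kernel launch: every (block, thread) pair runs
# the kernel, and each thread computes its own global element index.
def launch(grid_dim, block_dim, kernel, *args):
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    i = block_idx * block_dim + thread_idx   # global thread index
    if i < len(out):                         # bounds guard, as in real kernels
        out[i] = a[i] + b[i]

a = list(range(8))
b = list(range(8))
out = [0] * 8
launch(2, 4, vector_add_kernel, a, b, out)   # grid of 2 blocks, 4 threads each
print(out)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

On the GPU all these (block, thread) pairs run concurrently, and consecutive threads touching consecutive elements is exactly the access pattern that coalescing rewards.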

  6. Results
  • Matrix-vector operations are slow because of the cost of transferring data from host to device.
  • Matrix-matrix multiplication reaches 10 Gflops on the GPU, compared to 2+ Gflops on the CPU and 6 Gflops reported using BLAS. The NVIDIA 8800 card has been observed to reach up to 180 Gflops for matrix-matrix multiplication with optimized algorithms.
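The transfer bottleneck can be made concrete with arithmetic intensity: matrix-vector multiply does O(n²) flops on O(n²) transferred data, while matrix-matrix multiply does O(n³) flops on O(n²) data, so only the latter amortizes the host-to-device copy. A rough single-precision estimate (the size n and byte counts here are illustrative assumptions):

```python
# Flops per byte transferred, for single-precision (4-byte) operands.
n = 1024
bytes_per_float = 4

# Matrix-vector: ~2*n^2 flops; the n*n matrix dominates the transfer.
matvec_flops = 2 * n * n
matvec_bytes = (n * n + 2 * n) * bytes_per_float
matvec_intensity = matvec_flops / matvec_bytes

# Matrix-matrix: ~2*n^3 flops; three n*n matrices are transferred.
matmul_flops = 2 * n ** 3
matmul_bytes = 3 * n * n * bytes_per_float
matmul_intensity = matmul_flops / matmul_bytes

print(round(matvec_intensity, 2))  # ~0.5 flops per byte: transfer-bound
print(round(matmul_intensity, 1))  # ~170 flops per byte: compute can dominate
```

At ~0.5 flops per transferred byte, matrix-vector multiply cannot outrun even a few GB/s of PCIe bandwidth, which is consistent with the slow matrix-vector results above.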

  7. Conclusion
  • Most reported GPU performances are ~30–40% of the theoretical peak; these are still 5x–10x faster than the CPU
  • Considerable understanding and work are required to fully optimize code
  • Matrix-matrix operations are easily an order of magnitude faster than on the CPU
  Future Work
  • Develop optimized routines for LU decomposition, Cholesky factorization, conjugate gradient, etc.
  • Incorporate these routines into the DC Analyzer to both improve performance and handle larger data sizes.
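For reference, the conjugate-gradient method named in the future work reduces to repeated matrix-vector products and dot products, which is why it maps well to the GPU primitives above. A minimal plain-Python sketch on a tiny symmetric positive-definite system (the GPU version would run the matvec and dot products in parallel on large matrices):

```python
# Minimal conjugate-gradient solver for A x = b, A symmetric positive-definite.
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    n = len(b)
    x = [0.0] * n
    r = b[:]                    # residual: b - A*x with x = 0
    p = r[:]                    # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)    # exact solution is (1/11, 7/11)
```

Each iteration is dominated by one matrix-vector product, so the host-to-device transfer cost is paid once for the matrix and amortized over many iterations, unlike a single matrix-vector multiply.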
