This presentation by Charlene DiMeglio explores the use of GPUs for general-purpose high-performance computing, focusing on the need to maximize their untapped potential beyond graphics. It discusses various challenges, including high costs, memory management, and latency issues, and proposes solutions such as improved CUDA drivers and efficient scheduling techniques. By comparing performance metrics between GPUs and supercomputers, the presentation highlights how strategic utilization of GPGPUs can enhance speed and efficiency, making their use more practical for computational tasks.
Utilization of GPUs for General Computing Presenter: Charlene DiMeglio Paper: Aspects of GPU for General Purpose High Performance Computing, Suda, Reiji, et al.
Overview • Problem: • We want to use the GPU for things other than graphics, but the costs can be high • Solution: • Improve the CUDA drivers • Results: • Compared to a node of a supercomputer, the GPU is worth it • Conclusion: • These improvements make using GPGPUs more feasible
Problem: Need for Computation Power • Why GPUs? • GPUs are not being fully realized as a resource, often sitting idle when not being used for graphics • Better performance for less power compared to CPUs • What's the issue? Cost. • Efficient scheduling – timing data loads with their uses • Memory management – using the small amount of available memory effectively • Loads and stores – waiting for memory transfers, which take hundreds of cycles
Solutions • Brook+ by AMD, Larrabee by Intel • CUDA by NVIDIA • Greatest technological maturity at the time • The paper investigates the existing technology and suggests improvements (figure: Tesla GPU architecture – 30 multiprocessors, 8 streaming processors each, 16 KB shared memory per multiprocessor)
NVIDIA's Tesla C1060 GPU vs. Hitachi HA8000-tc/RS425 (T2K) Supercomputer • T2K – the fastest supercomputer in Japan at the time
Issues to Overcome • High SIMD vector length • Small main memory size • High register spill cost • No L2 cache, but rather read-only texture caches
Methods to Hide Latency • A CUDA compiler option limits the number of registers used per thread • 1 warp = a group of 32 threads in a block executing in lockstep (SIMD) • Maximizes the number of warps that can run at a time • Could cause spills • Variable-sized multi-round data transfer scheduling over PCI Express • PCI Express allows data transfers and GPU and CPU computation to occur in parallel • Allows for a constant flow of information • Reduces relative overhead to O(log x / x), compared with uniform scheduling's O(1/√x) (see the sketch below)
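A minimal CUDA sketch, not taken from the paper, of the two ideas on this slide: the kernel's __launch_bounds__ qualifier caps register pressure per thread the way nvcc's --maxrregcount option does globally, and two streams with geometrically growing transfer rounds let each round's PCI Express copy overlap the previous round's computation. The scale kernel, the chunk sizes, and all names are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical per-round computation. __launch_bounds__(256, 4) asks the
// compiler to fit at least 4 blocks of 256 threads per multiprocessor,
// which caps registers per thread (nvcc's --maxrregcount does this globally).
__global__ void __launch_bounds__(256, 4) scale(float* data, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= f;
}

int main() {
    const int total = 1 << 20;
    float *h_data, *d_data;
    cudaMallocHost((void**)&h_data, total * sizeof(float)); // pinned: required for async copies
    cudaMalloc((void**)&d_data, total * sizeof(float));
    for (int i = 0; i < total; ++i) h_data[i] = 1.0f;

    cudaStream_t streams[2];
    cudaStreamCreate(&streams[0]);
    cudaStreamCreate(&streams[1]);

    // Variable-sized rounds: start small so computation begins early, then
    // double each round, so only O(log x) rounds pay the fixed PCIe latency.
    int offset = 0, chunk = 4096, round = 0;
    while (offset < total) {
        int n = (total - offset < chunk) ? total - offset : chunk;
        cudaStream_t s = streams[round % 2]; // alternate streams so round k+1's
                                             // copy overlaps round k's kernel
        cudaMemcpyAsync(d_data + offset, h_data + offset, n * sizeof(float),
                        cudaMemcpyHostToDevice, s);
        scale<<<(n + 255) / 256, 256, 0, s>>>(d_data + offset, n, 2.0f);
        offset += n;
        chunk *= 2;
        ++round;
    }
    cudaDeviceSynchronize();
    cudaMemcpy(h_data, d_data, total * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h_data[%d] = %.1f\n", total - 1, h_data[total - 1]); // expect 2.0

    cudaStreamDestroy(streams[0]);
    cudaStreamDestroy(streams[1]);
    cudaFreeHost(h_data);
    cudaFree(d_data);
    return 0;
}
```

Starting with small rounds gets the GPU computing early; doubling the round size keeps the number of rounds, and hence the fixed per-round latency paid, logarithmic in the total data size.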
Methods to Hide Latency • If the computation time between communications > the communication latency, it is worth sending the data over to the GPU • Increasing bandwidth and message size makes the constant term in the overhead latency less significant • Efficient use of registers to prevent spills • Deciding where work should run, GPU vs. CPU (work sharing) • Minimizing divergent warps using the atomic operations found in CUDA (see the sketch below) • Divergent warps occur when the threads of a warp must execute both sides of a branch
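A hedged illustration of that last point, again not code from the paper: a stream-compaction kernel in which each qualifying thread claims an output slot with atomicAdd, so the divergent region shrinks to one atomic and one store instead of a longer per-thread branch. The kernel, data, and threshold are made up for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Compact all values above a threshold into a dense output array.
// Without the atomic, each thread would need a divergent path to find its
// output position; here the divergent region is one atomicAdd and a store.
__global__ void compact(const float* in, float* out, int* count,
                        int n, float threshold) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] > threshold) {
        int slot = atomicAdd(count, 1); // claim a unique output slot
        out[slot] = in[i];
    }
}

int main() {
    const int n = 1024;
    float h_in[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)(i % 10);

    float *d_in, *d_out;
    int* d_count;
    cudaMalloc((void**)&d_in, n * sizeof(float));
    cudaMalloc((void**)&d_out, n * sizeof(float));
    cudaMalloc((void**)&d_count, sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(int));

    compact<<<(n + 255) / 256, 256>>>(d_in, d_out, d_count, n, 5.0f);

    int h_count;
    cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    printf("kept %d of %d elements\n", h_count, n); // 408 values exceed 5 here

    cudaFree(d_in); cudaFree(d_out); cudaFree(d_count);
    return 0;
}
```

Note that the output order is arbitrary, since slots are claimed in whatever order the atomics resolve; the technique trades ordering for a shorter divergent path.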
Results • Variable-sized multi-round data transfer scheduling (figure: results as a function of the number of rounds)
Results • Use of atomic instructions in CUDA to minimize latency
Conclusion • CUDA gives programmers the ability to harness the power of the GPU for general-purpose uses. • The improvements presented make this option more feasible. • Strategic use of GPGPUs as a resource will improve speed and efficiency. • However, the presented material is mainly theoretical, without much strong data to back it up. • It offers more suggestions than implementations, aiming to promote GPGPU use.