

  1. Last time: Runtime infrastructure for hybrid (GPU-based) platforms • Task scheduling • Extracting performance models at runtime • Memory management • Asymmetric Distributed Shared Memory StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines, Cédric Augonnet, Samuel Thibault, and Raymond Namyst. TR-7240, INRIA, March 2010. [link] An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems, Isaac Gelado, Javier Cabezas, John Stone, Sanjay Patel, Nacho Navarro, Wen-mei Hwu, ASPLOS’10 [pdf]
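The scheduling idea recapped here (StarPU-style task scheduling driven by performance models extracted at runtime) can be sketched roughly as follows. This is a hypothetical illustration, not the StarPU API: the `Device`, `predict`, and `schedule` names are made up, and the "performance model" is reduced to a calibrated relative speed per device.

```python
# Illustrative sketch (not the StarPU API): map each task to the
# processing unit with the earliest predicted completion time, using
# per-device performance models calibrated at runtime.

class Device:
    def __init__(self, name, speed):
        self.name = name
        self.speed = speed      # relative throughput, measured at runtime
        self.ready_at = 0.0     # time when the device becomes free

    def predict(self, task_cost):
        # runtime performance model: baseline cost scaled by device speed
        return task_cost / self.speed

def schedule(tasks, devices):
    """Greedy earliest-finish-time mapping (HEFT-like)."""
    mapping = []
    for cost in tasks:
        best = min(devices, key=lambda d: d.ready_at + d.predict(cost))
        best.ready_at += best.predict(cost)
        mapping.append((cost, best.name))
    return mapping

devices = [Device("cpu", 1.0), Device("gpu", 8.0)]
print(schedule([8.0, 8.0, 1.0], devices))
# large tasks go to the fast GPU; the small task fills the idle CPU
```

The point of the runtime-extracted model is that `speed` is not hard-coded: it is refined from observed executions, so the same scheduler adapts to whatever heterogeneous hardware it runs on.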


  2. Today: • Bridging runtime and language support • ‘Virtualizing GPUs’ Achieving a Single Compute Device Image in OpenCL for Multiple GPUs, Jungwon Kim, Honggyu Kim, Joo Hwan Lee, Jaejin Lee, PPoPP’11 [pdf] Supporting GPU Sharing in Cloud Environments with a Transparent Runtime Consolidation Framework, Vignesh T. Ravi et al., HPDC 2011 (best paper!)
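The core idea of the first paper (a "single compute device image") is that the runtime presents many GPUs as one OpenCL device: it partitions a single NDRange into per-device sub-ranges, runs each on a different GPU, and merges the output buffer. A minimal sketch of that partition-and-merge logic, with made-up names (`run_subrange`, `run_on_virtual_device` are illustrative stand-ins, not the OpenCL API):

```python
# Sketch of a "single compute device image": one kernel launch is
# transparently split into contiguous sub-ranges, one per device.

def partition(global_size, num_devices):
    """Split a 1-D NDRange into contiguous (offset, size) sub-ranges."""
    base, rem = divmod(global_size, num_devices)
    ranges, offset = [], 0
    for d in range(num_devices):
        size = base + (1 if d < rem else 0)
        ranges.append((offset, size))
        offset += size
    return ranges

def run_subrange(kernel, src, offset, size):
    # Stand-in for enqueueing the kernel on one device with a global
    # work offset (what clEnqueueNDRangeKernel's offset argument allows).
    return [kernel(src[i]) for i in range(offset, offset + size)]

def run_on_virtual_device(kernel, src, num_devices):
    out = []
    for offset, size in partition(len(src), num_devices):
        out.extend(run_subrange(kernel, src, offset, size))
    return out

src = list(range(10))
assert run_on_virtual_device(lambda x: x * x, src, 3) == [x * x for x in src]
```

The hard parts the paper actually addresses, elided here, are detecting which buffer regions each sub-range reads and writes so that only the needed data is copied to each GPU.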

  4. Context: clouds shift to support HPC applications • initially, tightly coupled applications were not suited for cloud platforms • today • Chinese cloud with 40 Gbps InfiniBand • Amazon HPC instances • GPU instances: Amazon, Nimbix • Challenge: make GPUs shared resources in the cloud.

  5. Challenge: make GPUs a shared resource in the cloud. • Why do this? • GPUs are costly resources • Multiple VMs on a node with a single GPU • Increase utilization • app level: some apps might not use GPUs much • kernel level: some kernels can be collocated

  6. Two streams • How? • Evaluate … • opportunities • gains • overheads

  7. 1. The ‘How?’ • Preamble: Concurrent kernels are supported by today’s GPUs • Each kernel can execute a different task • Tasks can be mapped to different streaming multiprocessors (using thread-block configuration) • Problem: concurrent execution limited to the set of kernels invoked within a single processor context • Past virtualization solutions • API rerouting / intercept library
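The "API rerouting / intercept library" approach mentioned above works by interposing on the application's GPU API calls and forwarding them to a consolidation runtime that decides actual placement, so kernels from different VMs can share GPUs despite living in different processor contexts. A toy sketch of the interception pattern, with all names (`real_launch`, `ConsolidationRuntime`, the round-robin policy) invented for illustration; real systems interpose on the CUDA/OpenCL dynamic library instead:

```python
# Illustrative API-interception sketch: the wrapper ignores the device
# the caller asked for and lets a consolidation runtime place the kernel.

import functools

launch_log = []

def real_launch(kernel_name, grid, device):
    # Stand-in for the vendor's actual kernel-launch entry point.
    launch_log.append((kernel_name, grid, device))

class ConsolidationRuntime:
    def __init__(self, num_gpus):
        self.num_gpus = num_gpus
        self.next = 0

    def place(self):
        dev = self.next                     # trivial round-robin policy
        self.next = (self.next + 1) % self.num_gpus
        return dev

runtime = ConsolidationRuntime(num_gpus=2)

def intercept(launch_fn):
    @functools.wraps(launch_fn)
    def wrapper(kernel_name, grid, device=None):
        # Reroute: the runtime, not the caller, chooses the GPU.
        return launch_fn(kernel_name, grid, runtime.place())
    return wrapper

launch = intercept(real_launch)   # what the application actually calls
launch("matmul", grid=(64, 64))
launch("reduce", grid=(32, 1))
print(launch_log)
```

Because the interception happens at the library boundary, the application is unmodified ("transparent" in the HPDC'11 paper's title); only the placement policy inside the runtime needs to know about sharing.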


  9. 1. The ‘How?’ • Architecture


  11. 2. Evaluation – The opportunity • Key assumption: under-utilization of GPUs • Sharing • Space-sharing • Kernels occupy different SMs • Time-sharing • Kernels time-share the same SMs (benefits from hardware support for context switches) • Note: resource conflicts may prevent this • Molding – change the kernel configuration (different number of thread blocks / threads per block) to improve collocation
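Molding, as described above, reshapes a kernel's launch configuration while keeping its total thread count fixed, so that two kernels can fit into the GPU's resource budget and space-share. A simplified sketch of that feasibility check; the block budget here is a single made-up number, whereas a real GPU limits resident blocks, threads, registers, and shared memory per SM:

```python
# Hedged sketch of "molding": enumerate alternative (blocks, threads-
# per-block) shapes with the same total thread count, then look for a
# pair of shapes whose combined block count fits a (made-up) GPU budget.

def moldings(total_threads, max_tpb=1024):
    """All (blocks, threads_per_block) shapes covering total_threads,
    trying power-of-two block sizes from 32 (one warp) up to max_tpb."""
    shapes = []
    tpb = 32
    while tpb <= max_tpb:
        if total_threads % tpb == 0:
            shapes.append((total_threads // tpb, tpb))
        tpb *= 2
    return shapes

def can_collocate(threads_a, threads_b, sm_count=16, blocks_per_sm=8):
    budget = sm_count * blocks_per_sm        # total resident blocks
    for blocks_a, _ in moldings(threads_a):
        for blocks_b, _ in moldings(threads_b):
            if blocks_a + blocks_b <= budget:
                return (blocks_a, blocks_b)  # a feasible collocation
    return None

print(can_collocate(4096, 8192))
```

The trade-off the paper evaluates is exactly this: molding may slow each kernel down individually (fewer blocks means less latency hiding), but collocating two molded kernels can still raise overall throughput when each alone under-utilizes the GPU.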

  12. 2. Evaluation – The gains

  13. 2. Evaluation – The overheads

  14. Discussion • Limitations • Hardware support

  15. OpenCL vs. CUDA • http://ft.ornl.gov/doku/shoc/level1 • http://ft.ornl.gov/pubs-archive/shoc.pdf
