An Execution Model for Heterogeneous Multicore Architectures
Gregory Diamos, Andrew Kerr, and Sudhakar Yalamanchili
Computer Architecture and Systems Laboratory
Center for Experimental Research in Computer Systems
School of Electrical and Computer Engineering
Georgia Institute of Technology
Software Challenges of Heterogeneity
• Programming Model
• Execution Model
• Portability
• Performance
System Space
• System size and configuration: single GPU, multicore CPU, multi-GPU + multicore CPU, multi-node
• Level of abstraction: Runtime Execution Model (Harmony) layered over Runtime Translation of Data-Parallel IR (Ocelot)
Scalable Portable Execution – Harmony Runtime
• Example application structure (Cap Model 3): readInputs(); computeInvariants(); for all chunks { simulateChunk(); } generateResults(); with inputs and outputs streamed through memory one chunk at a time (see the sketch below)
• Transparent scheduling and execution management of kernels and chunks by the Harmony run-time
• Heterogeneous targets: CPUs and accelerators, each with a FIFO of pending work, local memory, cache, and DMA, connected by a network (e.g., HyperTransport, QPI, PCIe)
• Binary compatibility across system sizes
• Minimize/avoid retuning and porting applications as accelerators are added
• Advanced optimizations: speculation, performance prediction, kernel fusion
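A minimal CUDA sketch of the chunked-kernel pattern above, assuming a placeholder per-chunk computation; the kernel body, chunk sizes, and launch loop are illustrative and stand in for application code that a runtime such as Harmony would schedule transparently across CPUs and accelerators. This is not the Harmony API itself.

#include <cstdio>
#include <vector>

// Illustrative per-chunk kernel: scales one chunk of the data in place.
// Stands in for the slide's simulateChunk(); the real computation is application-specific.
__global__ void simulateChunk(float* data, int chunkSize) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < chunkSize) {
        data[i] *= 2.0f;   // placeholder "simulation" step
    }
}

int main() {
    const int numChunks = 4;
    const int chunkSize = 1 << 20;

    // readInputs() / computeInvariants() from the slide would run here on the host.
    std::vector<float> host(numChunks * chunkSize, 1.0f);

    float* dev = nullptr;
    cudaMalloc(&dev, host.size() * sizeof(float));
    cudaMemcpy(dev, host.data(), host.size() * sizeof(float), cudaMemcpyHostToDevice);

    // "for all chunks { simulateChunk(); }": each chunk is an independent kernel launch,
    // which is the unit a runtime such as Harmony could map onto CPUs and accelerators.
    for (int c = 0; c < numChunks; ++c) {
        int threads = 256;
        int blocks = (chunkSize + threads - 1) / threads;
        simulateChunk<<<blocks, threads>>>(dev + c * chunkSize, chunkSize);
    }
    cudaDeviceSynchronize();

    // generateResults(): copy back and inspect.
    cudaMemcpy(host.data(), dev, host.size() * sizeof(float), cudaMemcpyDeviceToHost);
    printf("first element after simulation: %f\n", host[0]);
    cudaFree(dev);
    return 0;
}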
Emerging Environment
• Language front ends: Datalog (status: Summer 2009, with Prof. Nate Clark) and CUDA/OpenCL, both lowering to a common Kernel IR (a data-parallel kernel of the kind this IR represents is sketched below)
• Run time (Harmony); status: single node / multi-GPU
• Back ends: Ocelot emulator (status: test and debug), LLVM interface to supported ISAs (MIPS, SPARC, x86, etc.) (status: in progress, Fall 2009), CUDAJIT (Prof. H. Kim), and a GPGPU simulator
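A minimal data-parallel kernel of the kind the CUDA/OpenCL front end lowers to the Kernel IR; the SAXPY computation and the managed-memory host code are illustrative assumptions, not part of the toolchain above. The point is that the same compiled kernel can then be emulated or retargeted by Ocelot without changing the source.

#include <cstdio>

// Illustrative data-parallel kernel (SAXPY). Compiling it with nvcc yields the
// kind of kernel-level IR shown in the figure, which Ocelot can emulate or
// translate to other targets.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 16;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // expect 5.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}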
Emerging HVM Platform Architecture (with K. Schwan and A. Gavrilovska)
Problem Scaling – Risk Analysis Application
• Measured execution times (chart)
• GPU interactive overhead dominates
• With the latest CPUs (2x faster) and GPUs (4x faster), the GPU advantage should grow by about 2x
GPU Compilation Flow
• Abstract syntax tree (Datalog clauses): clauses are mapped to execution units and grouped into execution groups spanning multiple GPUs (EUs)
• Predicates are mapped to data structures
• Execution units are mapped to algorithms (compute kernels)
• Runtime mapping of kernels to GPU and CPU cores (see the sketch below)
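A hedged sketch of this flow for a single clause, assuming an illustrative column layout for an edge predicate and a hypothetical selectEdges kernel; the data structures and kernels that the actual compiler generates are not specified here.

#include <cstdio>

// Illustrative layout: the predicate edge(X, Y) stored as two column arrays
// ("Predicates are mapped to data structures"). Real layouts may differ.
struct EdgeRelation {
    int* x;
    int* y;
    int  count;
};

// Illustrative kernel for a clause such as result(X) :- edge(X, 5):
// one thread per tuple, flagging tuples whose second column equals the constant
// ("Execution units are mapped to algorithms (compute kernels)").
__global__ void selectEdges(EdgeRelation edges, int constant, int* flags) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < edges.count) {
        flags[i] = (edges.y[i] == constant) ? 1 : 0;
    }
}

int main() {
    const int n = 8;
    EdgeRelation edges;
    edges.count = n;
    cudaMallocManaged(&edges.x, n * sizeof(int));
    cudaMallocManaged(&edges.y, n * sizeof(int));
    int* flags;
    cudaMallocManaged(&flags, n * sizeof(int));

    for (int i = 0; i < n; ++i) { edges.x[i] = i; edges.y[i] = i % 3 + 4; }

    // The runtime would decide whether this kernel runs on a GPU or CPU core;
    // here it is simply launched on the default CUDA device.
    selectEdges<<<1, n>>>(edges, 5, flags);
    cudaDeviceSynchronize();

    for (int i = 0; i < n; ++i)
        if (flags[i]) printf("result(%d)\n", edges.x[i]);

    cudaFree(edges.x); cudaFree(edges.y); cudaFree(flags);
    return 0;
}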