Understanding CUDA Programming Basics for Efficient Execution
This lecture review covers the CUDA programming execution model, the basic structure of a CUDA program, and memory management with cudaMalloc, cudaMemcpy, and cudaFree. It explains __global__ functions, kernel launch configuration using <<<grid, block>>>, thread organization, and how a thread finds its own ID and the total number of threads. It also covers the built-in three-component variables threadIdx, blockIdx, gridDim, and blockDim, and how to compute a global thread ID from them.
Presentation Transcript
Lecture 19 review
• CUDA programming execution model
• CUDA program basic structure
• cudaMalloc, cudaMemcpy, cudaFree
• __global__, myKernel<<<grid, block>>>(arg,…)
• CUDA thread organization
• How to find my id, and number of threads?
• dim3 threadIdx, blockIdx, gridDim, blockDim
• How to compute global id from these variables?
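The review points above can be tied together in one small program. What follows is a minimal sketch, not code from the lecture: the names vecAdd and runVecAdd, the vector-add task, and the block size of 256 are all assumptions made for illustration. It assumes a one-dimensional grid of one-dimensional blocks, in which case a thread's global id is blockIdx.x * blockDim.x + threadIdx.x and the total number of threads is gridDim.x * blockDim.x.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel (illustration only): c = a + b.
// Assumes a 1D grid of 1D blocks.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int gid      = blockIdx.x * blockDim.x + threadIdx.x; // my global id
    int nthreads = gridDim.x * blockDim.x;                // number of threads
    // Stride by the thread count so any n is covered by any launch size.
    for (int i = gid; i < n; i += nthreads)
        c[i] = a[i] + b[i];
}

// Hypothetical host helper: runs vecAdd on n elements and returns the
// number of mismatching results, or -1 if a CUDA call fails.
int runVecAdd(int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    // 1. Allocate device memory.
    float *da = NULL, *db = NULL, *dc = NULL;
    bool ok = cudaMalloc((void **)&da, bytes) == cudaSuccess
           && cudaMalloc((void **)&db, bytes) == cudaSuccess
           && cudaMalloc((void **)&dc, bytes) == cudaSuccess;

    // 2. Copy inputs from host to device.
    ok = ok && cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice) == cudaSuccess
            && cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        // 3. Launch: enough 256-thread blocks to cover n elements.
        dim3 block(256);
        dim3 grid((n + block.x - 1) / block.x);
        vecAdd<<<grid, block>>>(da, db, dc, n);
        ok = cudaGetLastError() == cudaSuccess;
    }

    // 4. Copy the result back (this cudaMemcpy waits for the kernel).
    ok = ok && cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    int mismatches = 0;
    if (ok)
        for (int i = 0; i < n; ++i)
            if (hc[i] != ha[i] + hb[i]) ++mismatches;

    // 5. Release device and host memory.
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return ok ? mismatches : -1;
}

int main()
{
    printf("mismatches: %d\n", runVecAdd(1000));
    return 0;
}
```

Assuming the file is saved as vecadd.cu, it can be built with nvcc vecadd.cu -o vecadd. For 2D or 3D launches the same idea applies per dimension, using the .y and .z components of the built-in variables.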