End Semester Project for course Parallel Computing

“Cellular Automata using NVIDIA CUDA” and “Bridging the Gap between MPJ Express and CUDA”

Team members:
• Bibrak Qamar, NUST-2007-BIT9-105
• Jahanzaib Maqbool, NUST-2007-BIT9-118
• Bilawal Sarwar, NUST-2007-BIT9-11
• Muhammad Imran, NUST-2007-BIT9-127
• Mahreen Nadeem, NUST-2007-BIT9-

System Specification
• Name: CUDA-TESTBED
• Processor: Intel(R) Xeon(R) CPU W3520 @ 2.67 GHz (quad-core)
• Cores: 4
• Hardware threads per core: 2
• GPUs: 2 × NVIDIA GTX 285
• Memory: 8 GB

NVIDIA GTX 285
• GPU Engine Specs:
• CUDA Cores: 240
• Graphics Clock: 648 MHz (the core clock)
• Processor Clock: 1476 MHz (the shader clock, also called the “hot clock”)
• Memory Specs:
• Memory Clock: 1242 MHz
• Standard Memory: 1 GB GDDR3
• Memory Interface Width: 512-bit
• Memory Bandwidth: 159.0 GB/s

Implementation
• Game of Life on CUDA
• Fish and Shark on CUDA
• Matrix Multiplication on a GPU-Accelerated Cluster using MPJ Express

Cellular Automata: Fish and Shark Execution Flow
• Initialize device
• Allocate device-side and host-side memory
• Populate cells
• Copy from host to device
• Loop in Display()
• Draw cells
• Execute kernel
• Copy result back to host
• End loop
• Free memory
• End program
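The execution flow above can be sketched in plain Java as a driver loop. This is only an illustration, not the project's code: the class `AutomatonDriver`, the `Kernel` interface, and the random 0/1 population are hypothetical, and ordinary Java arrays stand in for the host and device buffers that the real program would manage with calls such as cudaMalloc and cudaMemcpy.

```java
import java.util.Random;

// Hypothetical driver that mirrors the slide's execution flow on the CPU.
class AutomatonDriver {
    // Stand-in for the GPU kernel: reads one board, writes the next one.
    interface Kernel {
        void step(byte[] in, byte[] out, int width, int height);
    }

    static byte[] run(int width, int height, int generations, long seed, Kernel kernel) {
        // "Allocate memory": two boards, one read and one written per generation.
        byte[] current = new byte[width * height];
        byte[] next = new byte[width * height];

        // "Populate cells": assumed here to be a random 0/1 fill.
        Random rng = new Random(seed);
        for (int i = 0; i < current.length; i++) {
            current[i] = (byte) (rng.nextBoolean() ? 1 : 0);
        }

        // "Loop in Display()": one kernel execution per generation.
        for (int g = 0; g < generations; g++) {
            // Drawing the cells would happen here in the real program.
            kernel.step(current, next, width, height);
            // Swap the boards so the result becomes the next input.
            byte[] tmp = current;
            current = next;
            next = tmp;
        }
        // Java's garbage collector replaces the explicit "Free memory" step.
        return current;
    }
}
```

Swapping the two boards after each step keeps every generation reading from a complete, unmodified copy of the previous one, which is the reason the kernel writes to a separate result board.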

Kernel function
• Get the thread's X and Y indices (ThreadID.X and ThreadID.Y)
• Fetch the neighbors' states
• Decide the cell's fate
• Write the result to the resultant cellular board
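As a rough CPU-side illustration of these kernel steps, the Java sketch below applies the standard Game of Life rule to one cell. The class and method names are hypothetical, and the wrap-around (toroidal) border is an assumption, since the slides do not say how edge cells are handled. In the CUDA version each (x, y) pair would come from the thread indices instead of the two loops in `step`.

```java
// Hypothetical CPU reference for the per-cell kernel logic (Game of Life rule).
class LifeKernelSketch {
    // Work that one GPU thread would do for the cell at (x, y).
    static byte decideFate(byte[] board, int width, int height, int x, int y) {
        int live = 0;
        // Fetch the eight neighbors; edges wrap around (an assumption).
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                if (dx == 0 && dy == 0) {
                    continue; // skip the cell itself
                }
                int nx = (x + dx + width) % width;
                int ny = (y + dy + height) % height;
                live += board[ny * width + nx];
            }
        }
        // Decide fate: survive with 2 or 3 live neighbors, be born with exactly 3.
        boolean alive = board[y * width + x] == 1;
        boolean nextAlive = alive ? (live == 2 || live == 3) : (live == 3);
        return (byte) (nextAlive ? 1 : 0);
    }

    // Sequential stand-in for launching one thread per cell.
    static void step(byte[] in, byte[] out, int width, int height) {
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Write the result to the resultant board, never to the input.
                out[y * width + x] = decideFate(in, width, height, x, y);
            }
        }
    }
}
```

For example, a horizontal row of three live cells in the middle of a 5×5 board turns into a vertical row of three after one call to `step`.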

Execution Graph

The maximum global memory throughput we achieved was 95 GB/s.
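To put the 95 GB/s figure in context, the small Java sketch below recomputes the card's theoretical peak bandwidth from the listed memory specs and the fraction of that peak that was reached. The class and method names are hypothetical; the only outside fact used is that GDDR3 transfers data twice per memory clock.

```java
// Hypothetical helper for comparing achieved throughput with theoretical peak.
class BandwidthSketch {
    // Peak bandwidth in GB/s = clock (Hz) x transfers per clock x bus width (bytes).
    static double peakBandwidthGBps(double memoryClockMHz, int busWidthBits, int transfersPerClock) {
        double bytesPerTransfer = busWidthBits / 8.0;
        return memoryClockMHz * 1e6 * transfersPerClock * bytesPerTransfer / 1e9;
    }

    // Fraction of the peak that a measured throughput represents.
    static double fractionOfPeak(double achievedGBps, double peakGBps) {
        return achievedGBps / peakGBps;
    }
}
```

With the GTX 285 values (1242 MHz, 512-bit, 2 transfers per clock) the peak comes out at about 159 GB/s, matching the listed spec, so 95 GB/s is roughly 60% of the theoretical maximum.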

Height Plot: Fish and Shark, 1300 generations with display

Speedup against sequential CPU version

Average speedup = 878.91×

Matrix Multiplication on a GPU-Accelerated Cluster using MPJ Express
• Algorithm
• Use MPJ Express to distribute data
• Call the cudaMatMultiply function
• Allocate device memory
• Execute kernel
• Copy results back
• Gather results at root

We used JCUDA, a Java binding for NVIDIA CUDA (www.Jcuda.org).
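The distribute, compute, and gather pattern of this algorithm can be sketched in plain Java without any cluster or GPU. Everything here is an assumption made for illustration: the class and method names are hypothetical, the row-block partitioning of A is one common choice that the slides do not confirm, a simple loop over ranks stands in for the MPJ Express processes, and `multiplyBlock` stands in for the GPU call.

```java
import java.util.Arrays;

// Hypothetical single-process simulation of the distributed algorithm.
class BlockMatMulSketch {
    // Stand-in for the GPU call: multiplies a block of rows of A by all of B.
    // Matrices are stored row-major; B is n x n, aBlock is rows x n.
    static float[] multiplyBlock(float[] aBlock, float[] b, int rows, int n) {
        float[] c = new float[rows * n];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < n; j++) {
                float sum = 0f;
                for (int k = 0; k < n; k++) {
                    sum += aBlock[i * n + k] * b[k * n + j];
                }
                c[i * n + j] = sum;
            }
        }
        return c;
    }

    // Simulates scatter, per-worker compute, and gather at the root.
    static float[] distributedMultiply(float[] a, float[] b, int n, int workers) {
        if (workers <= 0 || n % workers != 0) {
            throw new IllegalArgumentException("n must be divisible by workers");
        }
        int rowsPerWorker = n / workers;
        float[] c = new float[n * n];
        for (int rank = 0; rank < workers; rank++) {
            int offset = rank * rowsPerWorker * n;
            // "Distribute data": each worker receives its block of rows of A.
            float[] aBlock = Arrays.copyOfRange(a, offset, offset + rowsPerWorker * n);
            // "Execute kernel": the worker computes its rows of the product.
            float[] cBlock = multiplyBlock(aBlock, b, rowsPerWorker, n);
            // "Gather results at root": blocks are placed back in rank order.
            System.arraycopy(cBlock, 0, c, offset, cBlock.length);
        }
        return c;
    }
}
```

Splitting A by rows means each worker needs the whole of B but only its own slice of A, and the result blocks can be concatenated in rank order with no further reduction.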
