Team Members: Tyler Drake, Robert Wrisley, Kyle Von Koepping, Justin Walsh. Faculty Advisors: Computer Science – Prof. Sanjay Rajopadhye; Electrical & Computer Engineering – Prof. Olivera Notaros.
Project Goals: To develop parallel versions of applications that run on a graphics card and to measure their performance. • Started with a simple Matrix Multiply program. • We intend to develop at least one or two additional applications and to pursue an analysis of hardware optimizations. • Develop a process for tuning applications & hardware that other developers can easily reuse.
Tyler Drake – Computer Science major • Robert Wrisley – Computer Science/Computer Engineering dual major • Kyle Von Koepping – Electrical Engineering major • Justin Walsh – Computer Science/Computer Engineering dual major • Shared coding responsibilities • Enables comparison and greater understanding for all team members • Possibly divide responsibilities for the second half of the project
Moore’s Law • Transistor densities on single-core processors have doubled approximately every 18–24 months. • This trend has held since Gordon Moore first observed it in 1965 and is expected to continue for several more years. • This natural trend became the standard performance goal for hardware companies.
Limits of Moore’s Law • There is an ultimate limit to Moore’s Law: transistors will soon approach atomic scale. • Moore’s Law does not apply to Random Access Memory (RAM) speeds or hard-drive seek times (the so-called Memory Wall). • The redesign of processor architectures isn’t driven directly by Moore’s Law, but by the fact that these and other factors have not kept up with its growth rate.
The Graphics Card • The CPU (or CPUs) is not the only processor found in a personal computer. • The graphics card carries a graphics processing unit (GPU). • The GPU is specifically designed to render 3D models onto a 2D display. • It is built for floating-point computation with a highly parallel architecture.
CUDA • Engineers have begun to exploit the highly parallel architecture of the GPU for general applications. • Graphics companies encourage general-purpose computing on the GPU (GPGPU). • Nvidia has developed CUDA (Compute Unified Device Architecture). • Because CUDA is based on the C language, programmers can shift to developing on the GPU with relative ease.
What Have We Been Doing? • Learning about CUDA • NVIDIA CUDA guides • Lecture slides from University of Illinois, Urbana-Champaign • Papers from various academic groups • University of Illinois, Urbana-Champaign • Tokyo Institute of Technology • University of California at Berkeley • Learning to write parallel programs in CS475 using MPI & OpenMP • Writing simple programs using CUDA and observing performance • Matrix Multiply
Results and Optimizations • Results • Achieved 131 GFLOPS on a GTX 280 with N = 1024; the GTX 280’s theoretical peak is 933 GFLOPS. • Optimizations • Tiling the result matrix into smaller sub-matrices and having each thread block compute one sub-matrix reduces the amount of data each thread block must load. • This helps hide memory latency.
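The tiling scheme described above can be sketched as a CUDA kernel. This is a generic illustration, not the team's actual code: the TILE size, the kernel name, and the assumption that N is a multiple of TILE (true for N = 1024) are ours.

```cuda
#define TILE 16

// Each thread block computes one TILE x TILE sub-matrix of C.
// Each tile of A and B is loaded into shared memory once per block
// instead of once per thread, cutting global-memory traffic by
// roughly a factor of TILE.
__global__ void matMulTiled(const float *A, const float *B, float *C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < N / TILE; ++t) {
        // Cooperative load: each thread fetches one element of each tile.
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();  // wait until the whole tile is in shared memory

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // don't overwrite tiles still being read
    }
    C[row * N + col] = acc;  // assumes N is a multiple of TILE
}
```

The two `__syncthreads()` barriers are what make the shared tiles safe to reuse across loop iterations.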
Significant Lessons Learned and Other Useful Notes • Memory • Memory on the graphics card must be allocated from the main program running on the CPU • Graphics-card memory is explicitly managed by the programmer • CUDA is an “extension” to C, not a separate language • Similar in spirit to MPI, OpenMP, etc.
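The allocate/copy/launch/copy-back pattern described above can be sketched with the standard CUDA runtime calls (cudaMalloc, cudaMemcpy, cudaFree). The kernel and variable names here are hypothetical; this is an illustration of the pattern, not the project's code.

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical kernel, just so the launch below has a target.
__global__ void scale(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    const int n = 1024;
    size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);          // host (CPU) memory
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, bytes);                      // allocate on the graphics card
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);  // CPU -> GPU

    scale<<<(n + 255) / 256, 256>>>(d, n);      // launch from the host

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);  // GPU -> CPU
    printf("h[0] = %f\n", h[0]);

    cudaFree(d);                                // programmer frees GPU memory
    free(h);
    return 0;
}
```

Nothing here is automatic: every allocation, transfer, and free on the device side is the programmer's responsibility.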
Where is our project headed? Increasing problem complexity • Some are no longer “Pleasantly Parallel” • Higher degree of kernel analysis • Moving to more dynamic programs
Additional programs being written for the GPU include: • Scan: prefix-sum computation where the ith element of the output is the sum of the i-1 elements before it • Knapsack: profit maximization given a capacity and a list of items with their weights & profits • Matrix Multiply for still larger matrices • Triangular Matrix Multiplication
Potential Applications Mandelbrot Set • Pleasantly parallel, familiar • Easily scalable
Potential Applications Ray Tracing • Very computationally intensive • Feasible for non-realtime computations • Very dynamic, due to recursion • High degree of realism
Potential Applications Examples of images generated by Ray Tracing
Potential Applications Hidden Markov Models • Clear parallelism • Wide range of applications
Potential Applications Uses of Hidden Markov Models
To develop a more complex application for the GPU and optimize its performance • To analyze hardware optimizations and evaluate the performance gains • To develop a process for future programmers that yields the best performance increases with the minimum development effort • Please Note: These goals are tentative and subject to change.
Moore’s Law is now being applied to cores per processor instead of transistors per core. • Multi-core machines offer the next generation of performance enhancements, and they are already here! • GPUs provide massively parallel architectures that programmers can exploit to see phenomenal performance gains.
• Learning to use the CUDA library and some of its nuances. • Have achieved good performance on our Matrix Multiply attempts. • Completing CUDA versions of the Scan and Knapsack problems. • Next, move on to a more complex application. • Researching hardware optimizations that can further enhance performance on GPUs. • Develop a combined approach for future application programmers to follow.
$50 spent on a CUDA-compatible graphics card. • We’d like to thank Prof. Dan Connors for the use of his machines with Nvidia GTX 280 graphics cards. • This gave us free access to a consistent platform on which to run our code and the sample code. • We don’t project any major costs next semester, except perhaps some materials for our E-Days presentation.