
Team Programming Project


Presentation Transcript


  1. Team Programming Project Byunghyun (Byung) Jang Ph.D student Northeastern University Jul. 26 2009 CRA-W/CDC Careers in High Performance Systems (CHiPS) Mentoring Workshop July 25-27 2009 National Center for Supercomputing Applications (NCSA) at University of Illinois at Urbana-Champaign (UIUC)

  2. Some words about me
  • 4th year Ph.D student
  • Born and raised in South Korea
  • 34 years old (never too late to learn)
  • B.S. in mechanical engineering and M.S. in computer science
  • Full time engineer at Samsung Electronics for 3 years
  • GPGPU
  • Internship at AMD and fellowship from AMD
  • Happy

  3. Goals
  • Understand General Purpose Computing on GPU (a.k.a. GPGPU)
  • Experience CUDA GPU programming
  • Understand how massively multi-threaded parallel programming works
  • Think about solving a problem in a parallel fashion
  • Experience the tremendous computational power of the GPU
  • Experience the challenges in efficient parallel programming

  4. Outline
  • Application 1: Image Rotation
    • Introduction and Design (15 min)
    • Preparation (5 min): installing the skeleton code, compile test, image view test
    • Hands-on Programming (30 min): replace ??? with your own CUDA code
  • Application 2: Histogram
    • Introduction and Design (15 min)
    • Preparation (5 min): installing the skeleton code, compile test
    • Hands-on Programming (40 min): replace ??? with your own CUDA code
  • Conclusion

  5. Application 1: Image Rotation - Introduction -
  • Rotate an image by a given angle
  • A basic feature in image processing applications
  [Figure: original input image and the rotated output image]

  6. Application 1: Image Rotation - Introduction -
  • What the application does:
    Step 1. Compute a new location according to the rotation angle (trigonometric computation)
    Step 2. Read the pixel value at the original location
    Step 3. Write that pixel value to the new location computed in Step 1
  • Create the same number of threads as there are pixels
  • Each thread takes care of moving one pixel
  • Our goals are
    • To understand how to use the GPU for data parallelism
    • To know how to map threads to data
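  For reference, a per-pixel rotation kernel could look like the minimal sketch below. This is not the workshop skeleton's actual code; the kernel name, parameter names, and the choice to rotate about the image center are illustrative assumptions.

     // Hypothetical sketch, not the skeleton's actual code: one thread per pixel,
     // rotating about the image center by `angle` (in radians).
     __global__ void rotateKernel(const float *d_in, float *d_out,
                                  int width, int height, float angle)
     {
         int x = blockIdx.x * blockDim.x + threadIdx.x;   // this thread's pixel column
         int y = blockIdx.y * blockDim.y + threadIdx.y;   // this thread's pixel row
         if (x >= width || y >= height) return;

         // Step 1: rotate the coordinate about the image center
         float cx = width  / 2.0f, cy = height / 2.0f;
         float xr = x - cx,        yr = y - cy;
         int xNew = (int)(xr * cosf(angle) - yr * sinf(angle) + cx);
         int yNew = (int)(xr * sinf(angle) + yr * cosf(angle) + cy);

         // Steps 2 and 3: read the original pixel and write it to the new location
         if (xNew >= 0 && xNew < width && yNew >= 0 && yNew < height)
             d_out[yNew * width + xNew] = d_in[y * width + x];
     }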

  7. Application 1: Image Rotation - Design -
  [Figure: thread mapping for a 512 x 512 image. The image is tiled by 8 x 8 thread blocks, giving a 64 x 64 grid of blocks from Thread Block (0, 0) to Thread Block (63, 63); each thread handles one pixel.]
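  A host-side launch matching this mapping might look like the following sketch, assuming the rotateKernel signature sketched above and device pointers d_in and d_out allocated elsewhere:

     // Illustrative launch: 8 x 8 threads per block, 64 x 64 blocks
     // covering the 512 x 512 image.
     dim3 block(8, 8);
     dim3 grid(512 / block.x, 512 / block.y);   // 64 x 64 thread blocks
     rotateKernel<<<grid, block>>>(d_in, d_out, 512, 512, angle);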

  8. Application 1: Image Rotation - Preparation -
  1. Deploy the skeleton code in the proper directory
     [..@ac ~]$ cp /tmp/projects.tar ./
     [..@ac ~]$ cp /tmp/cuda.pdf ./
     [..@ac ~]$ tar -xf projects.tar
  2. Request a cluster node for interactive use for 2 hours
     [..@ac ~]$ qsub -I -l walltime=02:00:00
  3. Compile
     [..@ac ~]$ cd PROJECTS/projects/ImageRotation
     [..@ac ~]$ make clean
     [..@ac ~]$ make
     To use printf() for debugging, run “make emu=1” instead of “make”
  4. Execute
     [..@ac ~]$ ./ImageRotation
  5. Convert the image from “pgm” to “jpg” format
     [..@ac ~]$ convert data/lena_out.pgm data/lena_out.jpg
  6. Download “lena_out.jpg” to your laptop to view it
  Download for your future reference

  9. Application 1: Image Rotation - Hands-on Programming -
  • Replace ??? in the skeleton code with your own CUDA code
  • Refer to the hints and comments in the skeleton code
  • Talk to me if you have any questions or are done
  • Try to finish by 2:30 pm
  • Help others if you finish early

  10. Application 2: Histogram - Introduction -
  • Shows how often each pixel intensity value occurs in the image
  • A commonly used analysis tool in image processing and data mining applications
  [Figure: input image and its output histogram; x-axis: intensity from 0 (black) to 255 (white), y-axis: number of pixels]

  11. Application 2: Histogram - Introduction -
  • The serial implementation looks like:
       data[DATA_COUNT];                        // input data
       histogram[BIN_COUNT];                    // histogram data
       for (int i = 0; i < BIN_COUNT; i++)
           histogram[i] = 0;                    // initialization
       for (int i = 0; i < DATA_COUNT; i++)
           histogram[ data[i] ]++;              // update the corresponding bin
  • Access to data[] is sequential, but access to histogram[] is random, depending on the data value
  • Therefore, we will use fast shared memory to store a per-block sub-histogram (s_hist[]), because shared memory handles random memory access much more efficiently than global memory does
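  To make the idea concrete, a simplified per-block kernel could look like the sketch below. This is not the skeleton's code: it uses atomicAdd() on a single shared-memory sub-histogram per block, whereas the workshop design on the next two slides gives every thread its own sub-histogram to avoid collisions. The names blockHistogram, d_data, and d_result are assumptions; BIN_COUNT = 64 follows the slides.

     #define BIN_COUNT 64

     // Simplified sketch (not the skeleton's actual code): each block builds its
     // own sub-histogram in shared memory with atomicAdd(), then writes it to
     // global memory; a separate merge step (slide 13) combines the block results.
     // Assumes each data value is already a bin index in [0, BIN_COUNT).
     __global__ void blockHistogram(const unsigned char *d_data, int dataCount,
                                    unsigned int *d_result)
     {
         __shared__ unsigned int s_hist[BIN_COUNT];       // per-block sub-histogram

         // Cooperatively zero the shared bins.
         for (int bin = threadIdx.x; bin < BIN_COUNT; bin += blockDim.x)
             s_hist[bin] = 0;
         __syncthreads();

         // Grid-stride loop: each thread counts a strided slice of the input.
         for (int i = blockIdx.x * blockDim.x + threadIdx.x;
              i < dataCount;
              i += blockDim.x * gridDim.x)
             atomicAdd(&s_hist[d_data[i]], 1u);
         __syncthreads();

         // Write this block's sub-histogram out to be merged later.
         for (int bin = threadIdx.x; bin < BIN_COUNT; bin += blockDim.x)
             d_result[blockIdx.x * BIN_COUNT + bin] = s_hist[bin];
     }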

  12. Application 2: Histogram - Design -
  • The structure of the shared memory would look like the following
  • Notice that shared memory is per thread block and limited in size
  [Figure: data[DATA_COUNT] partitioned into chunks of 64 data elements, which are accumulated into per-block shared-memory sub-histograms s_hist[]]

  13. Application 2: Histogram - Design -
  • Merging the per-thread histograms into a per-block histogram
  [Figure: each block keeps THREAD_N = 192 per-thread sub-histograms of BIN_COUNT = 64 bins in shared memory s_hist[]; they are merged into one BIN_COUNT-bin histogram per block, written to d_result[] (one sub-histogram per thread block), and finally combined into the final histogram]
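  The last step, combining the per-block sub-histograms in d_result[] into the final histogram, could be a small second kernel along these lines (an illustrative sketch; the kernel and parameter names are assumptions, and BIN_COUNT is as defined above):

     // Illustrative merge step: one thread per bin sums that bin across all
     // per-block sub-histograms stored in d_result[].
     __global__ void mergeHistograms(const unsigned int *d_result, int blockCount,
                                     unsigned int *d_histogram)
     {
         int bin = blockIdx.x * blockDim.x + threadIdx.x;
         if (bin >= BIN_COUNT) return;

         unsigned int sum = 0;
         for (int b = 0; b < blockCount; b++)
             sum += d_result[b * BIN_COUNT + bin];
         d_histogram[bin] = sum;                 // final histogram value for this bin
     }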

  14. Application 2: Histogram - Preparation -
  1. Compile
     [..@ac ~]$ cd PROJECTS/projects/Histogram
     [..@ac ~]$ make clean
     [..@ac ~]$ make
     To use printf() for debugging, run “make emu=1” instead of “make”
  2. Execute
     [..@ac ~]$ ./Histogram
  3. Check the output message
     “*** TEST FAILED”: something is wrong
     “*** TEST PASSED”: you got it

  15. Application 2: Histogram - Hands-on Programming -
  • Replace ??? in the skeleton code with your own CUDA code
  • Refer to the hints and comments in the skeleton code
  • Talk to me if you have any questions or are done
  • Try to finish by 3:30 pm
  • Help others if you finish early

  16. Conclusions
  • What we’ve learned throughout the two projects
    • Understood massively parallel computing on the GPU
    • Experienced what CUDA programming looks like
    • Understood how to explicitly program hardware resources
    • Understood the importance and challenges of parallel programming
    • Experienced solving a problem in a massively parallel fashion
  • The GPU is the platform of choice for data-parallel, computationally intensive applications
  • In a few years, we are likely to see many people buying a new graphics card to increase their desktop’s computing performance, not its 3D game performance

  17. Thank you!
