
On Dynamic Load Balancing on Graphics Processors



Presentation Transcript


  1. On Dynamic Load Balancing on Graphics Processors Daniel Cederman and Philippas Tsigas Chalmers University of Technology

  2. Overview • Motivation • Methods • Experimental evaluation • Conclusion

  3. The problem setting Work Offline Task Task Task Task Task Task Task Online Task Task Task Task

  4. Static Load Balancing Processor Processor Processor Processor

  5. Static Load Balancing Processor Processor Processor Processor Task Task Task Task

  6. Static Load Balancing Processor Processor Processor Processor Task Task Task Task

  7. Static Load Balancing Processor Processor Processor Processor Task Task Task Task Subtask Subtask Subtask Subtask

  8. Static Load Balancing Processor Processor Processor Processor Task Task Task Task Subtask Subtask Subtask Subtask

  9. Dynamic Load Balancing Processor Processor Processor Processor Task Task Task Task Subtask Subtask Subtask Subtask

  10. Task sharing • Check condition: work done? If yes, done. • Acquire task: try to get a task from the task set. Got task? If no, retry. • Perform task. • New tasks? If yes, add them to the task set; if no, continue.
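The loop on the task sharing slide can be sketched on the host in C++. This is only an illustration under assumptions: `Task`, `TaskSet` and `run_task_sharing` are hypothetical names, `std::thread` workers stand in for thread blocks, the task set is a mutex-guarded vector, and the "work done?" check is implemented with a counter of tasks that are queued or in progress. None of this is taken from the presentation.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical task: a subtree of the given depth that still has to be split.
struct Task { int depth; };

// Shared task set, guarded by a mutex (assumption: any shared container works here).
class TaskSet {
public:
    void add(Task t) {
        std::lock_guard<std::mutex> guard(m_);
        tasks_.push_back(t);
    }
    bool try_get(Task& out) {
        std::lock_guard<std::mutex> guard(m_);
        if (tasks_.empty()) return false;
        out = tasks_.back();
        tasks_.pop_back();
        return true;
    }
private:
    std::mutex m_;
    std::vector<Task> tasks_;
};

// Runs the slide's loop on `workers` threads and returns the number of leaf tasks.
long run_task_sharing(int root_depth, int workers) {
    TaskSet set;
    std::atomic<long> pending{1};   // tasks that are queued or being performed
    std::atomic<long> leaves{0};
    set.add(Task{root_depth});

    auto worker = [&] {
        while (pending.load() > 0) {                 // check condition: work done?
            Task t{0};
            if (!set.try_get(t)) {                   // got task? no, retry
                std::this_thread::yield();
                continue;
            }
            if (t.depth == 0) {                      // perform task
                leaves.fetch_add(1);
            } else {                                 // new tasks? add them
                pending.fetch_add(2);                // count children before publishing them
                set.add(Task{t.depth - 1});
                set.add(Task{t.depth - 1});
            }
            pending.fetch_sub(1);                    // this task is finished
        }
    };

    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return leaves.load();
}
```

The children are counted in `pending` before the parent is counted as finished, so the counter cannot reach zero while work is still outstanding. A call such as `run_task_sharing(10, 4)` splits a depth-10 tree across four workers.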

  11. System Model • CUDA • Global Memory • Gather and scatter • Compare-And-Swap • Fetch-And-Inc • Multiprocessors • Maximum number of concurrent thread blocks [Diagram: Global Memory; three Multi-processors; nine Thread Blocks]

  12. Synchronization • Blocking • Uses mutual exclusion to only allow one process at a time to access the object. • Lockfree • Multiple processes can access the object concurrently. At least one operation in a set of concurrent operations finishes in a finite number of its own steps. • Waitfree • Multiple processes can access the object concurrently. Every operation finishes in a finite number of its own steps.
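The difference between the blocking and non-blocking categories can be illustrated with a shared counter, using the two primitives named in the system model (Compare-And-Swap and Fetch-And-Inc). The sketch below is host-side C++ with `std::atomic`, not CUDA, and the type and function names are hypothetical. Whether `fetch_add` is actually wait-free depends on the hardware providing a native atomic add, which is assumed here.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Blocking: mutual exclusion lets only one thread at a time touch the value.
struct BlockingCounter {
    std::mutex m;
    long value = 0;
    void inc() { std::lock_guard<std::mutex> guard(m); ++value; }
};

// Lock-free: a CAS loop. A CAS that fails because the value changed means
// some other thread completed its increment, so the system as a whole progresses.
struct CasCounter {
    std::atomic<long> value{0};
    void inc() {
        long seen = value.load();
        // On failure compare_exchange_weak stores the current value into `seen`.
        while (!value.compare_exchange_weak(seen, seen + 1)) {}
    }
};

// Fetch-and-inc: a single atomic instruction with no retry loop
// (wait-free only if the hardware implements it natively; an assumption).
struct FetchIncCounter {
    std::atomic<long> value{0};
    void inc() { value.fetch_add(1); }
};

// Hypothetical helper: increments one counter from several threads.
template <class Counter>
long hammer(int threads, int per_thread) {
    Counter c;
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i)
        pool.emplace_back([&] { for (int k = 0; k < per_thread; ++k) c.inc(); });
    for (auto& th : pool) th.join();
    return c.value;
}
```

All three variants produce the same final count; they differ in what happens to the other threads when one thread is delayed in the middle of `inc()`.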

  13. Load Balancing Methods • Blocking Task Queue • Non-blocking Task Queue • Task Stealing • Static Task List

  14. Blocking queue Free TB 1 Head TB 2 Tail TB n

  15. Blocking queue Free TB 1 Head TB 2 Tail TB n

  16. Blocking queue Free TB 1 Head TB 2 T1 Tail TB n

  17. Blocking queue Free TB 1 Head TB 2 T1 Tail TB n

  18. Blocking queue Free TB 1 Head TB 2 T1 Tail TB n
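A blocking queue of the kind drawn on these slides might look like the following host-side C++ sketch. It reads the "Free" box as a lock flag acquired with Compare-And-Swap, which is an interpretation of the figure rather than something the slides state. The class name `BlockingQueue`, the fixed-size ring buffer and the `int` task type are likewise assumptions made for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Array-based FIFO queue protected by a spin lock built from compare-and-swap.
class BlockingQueue {
public:
    explicit BlockingQueue(std::size_t capacity) : slots_(capacity) {}

    // Returns false if the queue is full.
    bool enqueue(int task) {
        lock();
        bool ok = tail_ - head_ < slots_.size();
        if (ok) { slots_[tail_ % slots_.size()] = task; ++tail_; }
        unlock();
        return ok;
    }

    // Returns false if the queue is empty.
    bool dequeue(int& task) {
        lock();
        bool ok = head_ != tail_;
        if (ok) { task = slots_[head_ % slots_.size()]; ++head_; }
        unlock();
        return ok;
    }

private:
    void lock() {
        bool expected = false;
        // Spin until the flag flips from free (false) to taken (true).
        while (!locked_.compare_exchange_weak(expected, true)) expected = false;
    }
    void unlock() { locked_.store(false); }

    std::atomic<bool> locked_{false};
    std::size_t head_ = 0;   // next slot to dequeue from
    std::size_t tail_ = 0;   // next slot to enqueue into
    std::vector<int> slots_;
};
```

Every operation serializes on the one flag, so a worker that is delayed while holding it stalls all others; that is the property the non-blocking variants avoid.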

  19. Non-blocking Queue TB 1 TB 1 Head TB 2 TB 2 T1 T2 T3 T4 Tail TB n • Reference: P. Tsigas and Y. Zhang, A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems [SPAA 01]

  20. Non-blocking Queue TB 1 TB 1 Head TB 2 TB 2 T1 T2 T3 T4 Tail TB n

  21. Non-blocking Queue TB 1 TB 1 Head TB 2 TB 2 T1 T2 T3 T4 Tail TB n

  22. Non-blocking Queue TB 1 TB 1 Head TB 2 TB 2 T1 T2 T3 T4 Tail TB n

  23. Non-blocking Queue TB 1 TB 1 Head TB 2 TB 2 T1 T2 T3 T4 T5 Tail TB n

  24. Non-blocking Queue TB 1 TB 1 Head TB 2 TB 2 T1 T2 T3 T4 T5 Tail TB n
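To give a feel for how head and tail indices can be advanced with Compare-And-Swap instead of a lock, here is a host-side C++ sketch of a bounded array queue. It is explicitly not the Tsigas and Zhang algorithm referenced on the slides: it is a different, commonly used design based on per-slot sequence numbers (usually attributed to Dmitry Vyukov), swapped in because it is short. It uses no mutex, but it is not strictly lock-free in the sense defined on the synchronization slide. The class name and the `int` task type are assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class BoundedQueue {
public:
    // Capacity must be a power of two so that `pos & mask_` wraps correctly.
    explicit BoundedQueue(std::size_t capacity)
        : cells_(capacity), mask_(capacity - 1) {
        assert(capacity >= 2 && (capacity & mask_) == 0);
        for (std::size_t i = 0; i < capacity; ++i) cells_[i].seq.store(i);
    }

    bool enqueue(int task) {
        std::size_t pos = tail_.load();
        for (;;) {
            Cell& c = cells_[pos & mask_];
            std::intptr_t diff = static_cast<std::intptr_t>(c.seq.load()) -
                                 static_cast<std::intptr_t>(pos);
            if (diff == 0) {                       // slot is free for this position
                if (tail_.compare_exchange_weak(pos, pos + 1)) {
                    c.task = task;
                    c.seq.store(pos + 1);          // publish to consumers
                    return true;
                }                                  // CAS failed: pos was reloaded, retry
            } else if (diff < 0) {
                return false;                      // queue is full
            } else {
                pos = tail_.load();                // another producer took the slot
            }
        }
    }

    bool dequeue(int& task) {
        std::size_t pos = head_.load();
        for (;;) {
            Cell& c = cells_[pos & mask_];
            std::intptr_t diff = static_cast<std::intptr_t>(c.seq.load()) -
                                 static_cast<std::intptr_t>(pos + 1);
            if (diff == 0) {                       // slot holds data for this position
                if (head_.compare_exchange_weak(pos, pos + 1)) {
                    task = c.task;
                    c.seq.store(pos + mask_ + 1);  // free the slot for the next lap
                    return true;
                }
            } else if (diff < 0) {
                return false;                      // queue is empty
            } else {
                pos = head_.load();                // another consumer took the slot
            }
        }
    }

private:
    struct Cell { std::atomic<std::size_t> seq; int task; };
    std::vector<Cell> cells_;
    std::size_t mask_;
    std::atomic<std::size_t> tail_{0};
    std::atomic<std::size_t> head_{0};
};
```

Several workers can call `enqueue` and `dequeue` at the same time; a worker whose CAS fails simply retries at the index another worker has just published.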

  25. Task stealing T1 TB 1 TB 2 T3 T2 TB n • Reference: N. S. Arora, R. D. Blumofe, C. G. Plaxton, Thread Scheduling for Multiprogrammed Multiprocessors [SPAA 98]

  26. Task stealing T1 T4 TB 1 TB 2 T3 T2 TB n

  27. Task stealing T1 T4 T5 TB 1 TB 2 T3 T2 TB n

  28. Task stealing T1 T4 TB 1 TB 2 T3 T2 TB n

  29. Task stealing T1 TB 1 TB 2 T3 T2 TB n

  30. Task stealing TB 1 TB 2 T3 T2 TB n

  31. Task stealing TB 1 TB 2 T2 TB n
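The stealing policy shown in these slides (each thread block works on its own tasks and takes from another block only when it runs dry) can be sketched as below. This is a simplified host-side C++ illustration: the deques are protected by a mutex, so it is not the non-blocking deque of Arora, Blumofe and Plaxton, and the names `WorkerDeque` and `acquire` as well as the round-robin choice of victim are assumptions.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// One deque per worker. The owner pushes and pops at the back;
// thieves take from the front, i.e. the oldest task.
class WorkerDeque {
public:
    void push(int task) {
        std::lock_guard<std::mutex> guard(m_);
        tasks_.push_back(task);
    }
    bool pop(int& task) {                 // called by the owner
        std::lock_guard<std::mutex> guard(m_);
        if (tasks_.empty()) return false;
        task = tasks_.back();
        tasks_.pop_back();
        return true;
    }
    bool steal(int& task) {               // called by other workers
        std::lock_guard<std::mutex> guard(m_);
        if (tasks_.empty()) return false;
        task = tasks_.front();
        tasks_.pop_front();
        return true;
    }
private:
    std::mutex m_;
    std::deque<int> tasks_;
};

// Try the worker's own deque first, then visit the other workers in round-robin order.
bool acquire(std::vector<WorkerDeque>& deques, std::size_t self, int& task) {
    if (deques[self].pop(task)) return true;
    for (std::size_t k = 1; k < deques.size(); ++k) {
        if (deques[(self + k) % deques.size()].steal(task)) return true;
    }
    return false;                         // nothing left anywhere
}
```

Because owner and thieves work at opposite ends, they only contend when a deque is nearly empty, which is the case the non-blocking deque from the reference is designed to handle without a lock.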

  32. Static Task List In T1 T2 T3 T4

  33. Static Task List In T1 TB 1 T2 TB 2 T3 TB 3 T4 TB 4

  34. Static Task List In Out T1 TB 1 T2 TB 2 T3 TB 3 T4 TB 4

  35. Static Task List In Out T1 TB 1 T5 T2 TB 2 T3 TB 3 T4 TB 4

  36. Static Task List In Out T1 TB 1 T5 T2 TB 2 T6 T3 TB 3 T4 TB 4

  37. Static Task List In Out T1 TB 1 T5 T2 TB 2 T6 T3 TB 3 T7 T4 TB 4
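The In/Out lists on these slides suggest a pass-based scheme: every task in the input list is assigned to a thread block up front, newly created tasks are appended to an output list, and the output becomes the input of the next pass. The host-side C++ sketch below follows that reading, which is an interpretation of the figures. The function name, the use of `fetch_add` to reserve output slots, the strided assignment of tasks to workers, and the toy task (an integer depth that splits in two until it reaches zero) are all assumptions.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Processes a tree of tasks in passes and returns the number of leaf tasks.
long run_static_lists(int root_depth, std::size_t blocks) {
    std::vector<int> in{root_depth};
    long leaves = 0;

    while (!in.empty()) {
        std::vector<int> out(in.size() * 2);      // worst case: every task spawns two
        std::atomic<std::size_t> out_count{0};
        std::atomic<long> pass_leaves{0};

        // Each worker handles a fixed, precomputed share of the input list.
        auto block = [&](std::size_t first) {
            for (std::size_t i = first; i < in.size(); i += blocks) {
                if (in[i] == 0) { pass_leaves.fetch_add(1); continue; }
                std::size_t slot = out_count.fetch_add(2);   // reserve two output slots
                out[slot] = in[i] - 1;
                out[slot + 1] = in[i] - 1;
            }
        };

        std::vector<std::thread> pool;
        for (std::size_t b = 0; b < blocks; ++b) pool.emplace_back(block, b);
        for (auto& th : pool) th.join();          // end of pass: all workers are done

        out.resize(out_count.load());
        leaves += pass_leaves.load();
        in.swap(out);                             // output list becomes next input list
    }
    return leaves;
}
```

The input list is read-only during a pass and the only shared write is the slot reservation on the output list, so no queue or lock is needed; the cost is that work created during a pass cannot start until the next pass.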

  38. Octree Partitioning • Bandwidth bound

  39. Octree Partitioning • Bandwidth bound

  40. Octree Partitioning • Bandwidth bound

  41. Octree Partitioning • Bandwidth bound

  42. Four-in-a-row • Computation intensive

  43. Graphics Processors • 9600GT: 8 Multiprocessors, 57 GB/sec bandwidth • 8800GT: 14 Multiprocessors, 57 GB/sec bandwidth

  44. Blocking Queue – Octree/9600GT

  45. Blocking Queue – Octree/8800GT

  46. Blocking Queue – Four-in-a-row

  47. Non-blocking Queue – Octree/9600GT

  48. Non-blocking Queue – Octree/8800GT

  49. Non-blocking Queue - Four-in-a-row

  50. Task stealing – Octree/9600GT
