
Compiler-Directed Power Density Reduction in NoC-Based Multi-Core Designs

International Symposium on Quality Electronic Design, 03/27-29, 2006, San Jose. Compiler-Directed Power Density Reduction in NoC-Based Multi-Core Designs. Sri Hari Krishna Narayanan, Mahmut Kandemir, Ozcan Ozturk. Embedded Mobile Computing Center (EMC2), The Pennsylvania State University.


Presentation Transcript


  1. International Symposium on Quality Electronic Design, 03/27-29, 2006, San Jose • Compiler-Directed Power Density Reduction in NoC-Based Multi-Core Designs • Sri Hari Krishna Narayanan, Mahmut Kandemir, Ozcan Ozturk • Embedded Mobile Computing Center (EMC2), The Pennsylvania State University

  2. Introduction to the Problem • Increasing transistor counts and rising clock frequencies lead to increased power dissipation. • Increased scaling coupled with increased power dissipation has led to increased power density. • Increased power density leads to rising thermal problems, which require solutions.

  3. Solutions to Thermal Issues in Multiprocessor Environments • Dynamic Thermal Management • Heo et al., ISLPED 2003 • Activity migration between two processors. • Shang et al., MICRO 2003 • Communication is routed away from a potential hotspot. • Upon a thermal emergency, communication is throttled.

  4. Problems with the current solutions • Repeated suspension of execution or communication leads to performance loss. • So it is beneficial to reduce the number of suspensions. • How? • Reduce the number of thermal emergencies by reducing the power density. • Reduce the density by changing which processors are active and how much computation they perform within certain bounds.

  5. Default Mapping • Performance oriented • Active processors are close to each other. • Lower communication cost. • Higher power density. • More thermal emergencies. • We propose to change this mapping into a temperature-aware one. [Figure labels: Code, Default Code Mapping Module, Default (performance oriented) Mapping] Code shown in the figure:

      #include <stdio.h>

      #define N 5000
      #define ITER 1

      int du1[N], du2[N], du3[N];
      int au1[N][N][2], au2[N][N][2], au3[N][N][2];
      int a11=1, a12=-1, a13=-1;
      int a21=2, a22=3, a23=-3;
      int a31=5, a32=-5, a33=-2;
      int l;
      int sig = 1;

      int main()
      {
        int kx; int ky; int kz;

        /* mp_numthreads() is not declared on the slide; it is assumed to
           come from the platform's multiprocessing runtime. */
        printf("Thread:%d\n", mp_numthreads());

        /* Initialization loop */
        for (kx = 0; kx < N; kx = kx + 1) {
          for (ky = 0; ky < N; ky = ky + 1) {
            for (kz = 0; kz <= 1; kz = kz + 1) {
              au1[kx][ky][kz] = 1;
              au2[kx][ky][kz] = 1;
              au3[kx][ky][kz] = 1;
            }
          }
        }
      } /* main */

  6. Integer Linear Programming Model • Phase 1 • Increases the bounding box of the active processors, given a communication cost limit, and hence reduces the overall power density. [Figure panels: Initial, After Phase 1]
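The Phase 1 idea can be illustrated with a small calculation. The sketch below assumes that overall power density is total power divided by the area of the bounding box around the active processors; the slides imply this relationship but do not state the formula, and the `Tile` type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile coordinate on a 2D mesh (not from the slides). */
typedef struct { int x; int y; } Tile;

/* Area, in tiles, of the smallest axis-aligned rectangle that covers
 * every active tile. */
int bounding_box_area(const Tile *active, size_t n)
{
    assert(active != NULL && n > 0);
    int min_x = active[0].x, max_x = active[0].x;
    int min_y = active[0].y, max_y = active[0].y;
    for (size_t i = 1; i < n; i++) {
        if (active[i].x < min_x) min_x = active[i].x;
        if (active[i].x > max_x) max_x = active[i].x;
        if (active[i].y < min_y) min_y = active[i].y;
        if (active[i].y > max_y) max_y = active[i].y;
    }
    return (max_x - min_x + 1) * (max_y - min_y + 1);
}

/* Assumed model: overall power density = total power / bounding-box area.
 * power[i] is the power drawn by the processor on active[i]. */
double overall_power_density(const Tile *active, const double *power, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += power[i];
    return total / (double)bounding_box_area(active, n);
}
```

Under this model, four processors packed into a 2x2 block cover 4 tiles, while the same four placed at the corners of a 4x4 mesh cover 16 tiles, so the same total power yields a quarter of the density.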

  7. Integer Linear Programming Model • Constraints * • The number of active processors remains constant. • The communication between active processors in the new mapping has to stay under the sum of the old communication and the relaxation allowed. • The area of the bounding box must be maximized. * Exact mathematical expressions are given in the paper.
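The exact constraints are in the paper, so the following is only a sketch of how a candidate Phase 1 mapping could be checked against the first two bullets. It assumes a cost model of traffic volume times hop count, a traffic matrix supplied by the caller, and an absolute relaxation amount; none of these details, nor the names `comm_cost` and `phase1_feasible`, come from the slides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tile coordinate on a 2D mesh (not from the slides). */
typedef struct { int x; int y; } Tile;

/* Assumed cost model: hop count between two tiles (Manhattan distance). */
static int hops(Tile a, Tile b)
{
    return abs(a.x - b.x) + abs(a.y - b.y);
}

/* Total communication cost of a mapping. map[i] is the tile running task i;
 * traffic is an n*n row-major matrix, traffic[i*n + j] = volume from i to j. */
long comm_cost(const Tile *map, const long *traffic, size_t n)
{
    long cost = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            cost += traffic[i * n + j] * hops(map[i], map[j]);
    return cost;
}

/* True if no two tasks share a tile, so n tasks occupy n processors. */
static bool tiles_distinct(const Tile *map, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (map[i].x == map[j].x && map[i].y == map[j].y)
                return false;
    return true;
}

/* Checks the two feasibility bullets: the active-processor count is
 * unchanged, and the new cost stays under old cost + relaxation.
 * The strict "<" follows the slide's wording ("under"); the paper's
 * exact inequality may differ. */
bool phase1_feasible(const Tile *old_map, const Tile *new_map,
                     const long *traffic, size_t n, long relaxation)
{
    if (!tiles_distinct(new_map, n))
        return false;
    return comm_cost(new_map, traffic, n)
         < comm_cost(old_map, traffic, n) + relaxation;
}
```

A solver or search procedure would then pick, among the feasible mappings, the one with the largest bounding box; that maximization step is not shown here.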

  8. Phase 1 Mapping • Overall density is reduced. • Communication cost is increased. [Figure labels: Code, Default Code Mapping Module, Default (performance oriented) Mapping, ILP Module, Phase 1, Overall power density reduced mapping. The code in the figure repeats the listing from slide 5.]

  9. Integer Linear Programming Model • Phase 2 • Given the reduced overall power density mapping from Phase 1, a new mapping with reduced local power density is generated. [Figure panels: After Phase 1, After Phase 2]

  10. Integer Linear Programming Model • Constraints * • Each old active processor that has high power density is split. • Each split processor performs the same communication as the old processor. • The area of the bounding box remains constant. • The total power spent within the bounding box is minimized by minimizing the communication path. * Exact mathematical expressions are given in the paper.

  11. [Figure labels: Code, Default Code Mapping Module, Default (performance oriented) Mapping, ILP Module, Phase 1, Overall power density reduced mapping, Phase 2, Thermal aware mapping. The code in the figure repeats the listing from slide 5.]

  12. Implementation • HotSpot • Temperature estimation tool • Developed by Skadron at UVa • T(i+ ) = HS(T(i), floorplan, power,cycles,) • Shutdown • Any processor or router that is too hot must be turned off to allow cooldown. • Profiling data listed in the figure: cycle times, chunk sizes, processor energy, communication, router energy. [Figure labels: Code, Profiling, HotSpot + Shutdown. The code in the figure repeats the listing from slide 5.]

  13. Algorithm
      1. Initially mark processors as being active.
      2. While (all execution is not completed) {
         2.a Time_Taken = Time_Taken + 1
         2.b If a processor was active
             2.b.i Reduce the chunks that it has to execute by 1
         2.c Calculate the new current temperature for all processors:
             T(i+ ) = HS(T(i), floorplan, power,cycles,)
         2.d If a processor is too hot
             2.d.i Mark it as inactive
         2.e If a router is too hot
             2.e.i Mark all processors communicating through it as inactive
         2.f Determine all the active processors and routers for the next scheduling step.
         }
      3. Return Time_Taken
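A simplified version of this loop can be sketched in C. Two things here are assumptions rather than the authors' implementation: the HotSpot call is replaced by a toy linear heating and cooling rule (`toy_next_temp`), and routers (step 2.e) are left out, so only processors are tracked. A processor is assumed to resume once its temperature is back at or below the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PROCS 64  /* arbitrary cap chosen for this sketch */

/* Toy stand-in for HotSpot (an assumption, NOT the real tool): an active
 * processor heats by `heat` per step, an idle one cools by `cool`,
 * never dropping below `ambient`. */
static double toy_next_temp(double t, bool active,
                            double heat, double cool, double ambient)
{
    double next = active ? t + heat : t - cool;
    return next < ambient ? ambient : next;
}

/* Returns the number of scheduling steps needed to finish all chunks.
 * chunks[] and temp[] are updated in place. Routers are not modeled. */
long simulate(int *chunks, double *temp, size_t n,
              double limit, double heat, double cool, double ambient)
{
    assert(n <= MAX_PROCS && cool > 0.0 && ambient <= limit);
    bool active[MAX_PROCS];
    long time_taken = 0;

    /* 1. Initially mark processors that have work as active. */
    for (size_t i = 0; i < n; i++)
        active[i] = chunks[i] > 0;

    /* 2. Loop until all execution is completed. */
    for (;;) {
        bool done = true;
        for (size_t i = 0; i < n; i++)
            if (chunks[i] > 0) done = false;
        if (done) break;

        time_taken++;                                   /* 2.a */
        for (size_t i = 0; i < n; i++) {
            if (active[i]) chunks[i]--;                 /* 2.b */
            temp[i] = toy_next_temp(temp[i], active[i], /* 2.c */
                                    heat, cool, ambient);
        }
        /* 2.d and 2.f: a processor that is too hot sits out the next
         * step; any other processor with remaining work runs. */
        for (size_t i = 0; i < n; i++)
            active[i] = chunks[i] > 0 && temp[i] <= limit;
    }
    return time_taken;                                  /* 3. */
}
```

With one processor, 3 chunks, a start and ambient temperature of 40, `heat` 15, `cool` 10, and a limit of 60, the processor overheats after its second chunk and sits out one step, so `simulate` returns 4. With a limit high enough that no shutdown occurs, it returns 3, which is the kind of performance gap that fewer thermal emergencies are meant to close.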

  14. NoC Multi-core Model • Routers are roughly 1/5th the area of the processors • Processors communicate using x-y routing • Used to estimate the cost of communication
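For reference, x-y routing on a 2D mesh moves a message along the x dimension until the column matches the destination, then along y. The sketch below records the tiles visited and returns the hop count, which is the quantity a communication cost estimate would use; the `Tile` type and the `xy_route` signature are hypothetical, not taken from the slides.

```c
#include <stddef.h>

/* Hypothetical tile coordinate on a 2D mesh (not from the slides). */
typedef struct { int x; int y; } Tile;

/* Writes each tile reached after leaving src into path[] (the destination
 * is the last entry) and returns the hop count. Returns -1 if path[] has
 * room for fewer than the required number of entries. */
int xy_route(Tile src, Tile dst, Tile *path, size_t cap)
{
    size_t hops = 0;
    Tile cur = src;

    /* First travel along x until the column matches. */
    while (cur.x != dst.x) {
        cur.x += (dst.x > cur.x) ? 1 : -1;
        if (hops == cap) return -1;
        path[hops++] = cur;
    }
    /* Then travel along y until the row matches. */
    while (cur.y != dst.y) {
        cur.y += (dst.y > cur.y) ? 1 : -1;
        if (hops == cap) return -1;
        path[hops++] = cur;
    }
    return (int)hops;
}
```

Routing from (0,0) to (2,1) visits (1,0), (2,0), (2,1) for 3 hops. The hop count always equals the Manhattan distance between the two tiles, so a cost estimate can skip building the path when only the distance is needed.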

  15. Parameters used

  16. Benchmarks Used

  17. Results – Thermal Emergencies

  18. Results - Performance

  19. Conclusions • Dynamic thermal management leads to suspension of execution. • We propose a novel compiler-directed mechanism to reduce occurrences of thermal emergencies. • By reducing the number of thermal emergencies, performance is improved.

  20. Thank you!
