Estimation and optimization of the energy consumption of asynchronous circuits

K. Slimani, Y. Remond, A. Sirianni, G. Sicard, L. Fesquet, M. Renaudin

Outline
• Introduction
• Objectives
• The TAST design flow
• Principles
• Static estimation
• Dynamic estimation
• Features of the Trace tool
• Optimizations
• Example application
• Conclusion and perspectives

Introduction
Objectives:
• Estimate the energy consumption of a circuit at the HDL level
  • Before synthesis
  • Shorter design time
• Profile the circuit
  • Identify the most frequently activated parts of the circuit
• Optimize
  • Exploit the profiling results
  • Act on the architecture and on the data encoding

The TAST design flow
[Figure: block diagram of the TAST design flow. Blocks shown: CHP Code; TAST Compiler (CHP Compiler, Energy estimator, DTL1 Compliance Checker, Simulation Model Generator); Petri Net; Synthesizable Petri Net; TAST Model Generator; Behavioral Asynchronous VHDL Model; VHDL Custom Libraries for Simulation; VHDL Simulator; TAST Synthesizer (µP2 Flavor Synthesizer, QDI Flavor Synthesizer); VHDL Gate Level Netlist; Custom Cell Libraries; Std Cell Libraries; Standard Design Flow Back-end Tools; Reports]

Static estimation
CHP code of the multiplexer:

    Component multiplexer
    Port ( cmd      : in  DI DR ;
           Input0mn : in  DI MR[m][n] ;
           Input1mn : in  DI MR[m][n] ;
           Outputmn : out DI MR[m][n] )
    Begin
      process P_multiplexer
      Port ( cmd      : in  DI DR ;
             Input0mn : in  DI MR[m][n] ;
             Input1mn : in  DI MR[m][n] ;
             Outputmn : out DI MR[m][n] )
      begin
        [ cmd ? G ;
          @[ G=0 => Input0mn ? x ; label_1 : Outputmn ! x ; break
             G=1 => Input1mn ? x ; label_2 : Outputmn ! x ; break
          ] ;
          loop
        ] ;
      end ; -- process
    end ; -- component

Graphical representation of the Petri net:
[Figure: Petri net of the multiplexer, places P0 to P8. After "skip", the repetition "[ ]" performs "cmd ? G"; the guarded choice "@[ ]" then branches on G0 ("Input0mn ? x" followed by "label_1 : Outputmn ! x") or on G1 ("Input1mn ? x" followed by "label_2 : Outputmn ! x"); both branches end at "break".]

Static estimation
Dependency equations and synthesized circuit:
[Figure: synthesized multiplexer circuit built from Muller C-elements (C) and C-elements with reset (C_R). Signals shown: Input000 to Input0(m-1)(n-1), Input100 to Input1(m-1)(n-1), G0, G1, Output00 to Output(m-1)(n-1), and the acknowledges Input0ack, Input1ack, Cmdack, Outputack. Four regions of the circuit are numbered (1) to (4).]

Static estimation, step 1:
[Figure: synthesized multiplexer circuit with regions (1) to (4)]
Cost1 = n*muller(2) + n*muller_R(2) + Fork(m*n) + Fork(G*m*n)

Static estimation, step 2:
[Figure: synthesized multiplexer circuit with regions (1) to (4)]
Cost1 = n*muller(2) + n*muller_R(2) + Fork(m*n) + Fork(G*m*n)
Cost2 = n*OR(G)

Static estimation, step 3:
[Figure: synthesized multiplexer circuit with regions (1) to (4)]
Cost1 = n*muller(2) + n*muller_R(2) + Fork(m*n) + Fork(G*m*n)
Cost2 = n*OR(G)
Cost3 = n*NOR(m) + muller(n)

Static estimation, step 4:
[Figure: synthesized multiplexer circuit with regions (1) to (4)]
Cost1 = n*muller(2) + n*muller_R(2) + Fork(m*n) + Fork(G*m*n)
Cost2 = n*OR(G)
Cost3 = n*NOR(m) + muller(n)
Cost4 = AND(G)

Static estimation
• Total static cost = sum of the partial costs
• Mutual exclusion and additivity
Applied to a four-phase protocol:
Static cost = 2*(Cost1 + Cost2 + Cost3 + Cost4)
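The four partial costs and the four-phase total can be sketched in a few lines of Python. This is only an illustration: the slides give no numeric gate costs, so every primitive below is a hypothetical placeholder (one transition per gate output, one per forked wire), and G is assumed to be the number of guards of the choice. The factor 2 is taken directly from the formula above; it is presumably the set phase plus the return-to-zero phase of the protocol.

```python
# Hypothetical unit costs (NOT from the slides): each gate counts as one
# transition, and a fork counts one transition per driven wire.
def muller(k):      # k-input Muller C-element
    return 1

def muller_r(k):    # k-input Muller C-element with reset
    return 1

def fork(k):        # fork driving k wires
    return k

def or_gate(k):
    return 1

def nor_gate(k):
    return 1

def and_gate(k):
    return 1

def static_cost_mux(m, n, g):
    """Static cost of the multiplexer for MR[m][n] data and g guards."""
    cost1 = n * muller(2) + n * muller_r(2) + fork(m * n) + fork(g * m * n)
    cost2 = n * or_gate(g)
    cost3 = n * nor_gate(m) + muller(n)
    cost4 = and_gate(g)
    # Factor 2 as in the slide's four-phase formula.
    return 2 * (cost1 + cost2 + cost3 + cost4)

if __name__ == "__main__":
    print(static_cost_mux(m=2, n=2, g=2))
```

Replacing the placeholder functions with calibrated per-gate figures would leave `static_cost_mux` unchanged, which is the point of keeping the cost expression symbolic in m, n and G.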

Dynamic estimation
• The Petri net is traversed during simulation
• The static costs are accumulated along the way
[Figure: Petri net of the multiplexer (places P0 to P8), with Cost(label_1) attached to "label_1 : Outputmn ! x" and Cost(label_2) attached to "label_2 : Outputmn ! x"]
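A minimal sketch of this accumulation, assuming a simulation that reports which labelled instruction fires on each pass through the multiplexer loop. The function names, the command sequence and the per-label costs are all hypothetical; only the principle (add the static cost of a label each time the traversal reaches it) comes from the slide.

```python
from collections import Counter

def fired_labels(cmd_values):
    """Yield the label executed for each value read on channel cmd.

    Mirrors the guarded choice of the multiplexer: G=0 runs label_1,
    G=1 runs label_2.
    """
    for g in cmd_values:
        yield "label_1" if g == 0 else "label_2"

def dynamic_cost(labels, static_cost):
    """Accumulate the static cost of every label that fires."""
    totals = Counter()
    for label in labels:
        totals[label] += static_cost[label]
    return totals

# Hypothetical static costs, in transitions per execution of the label.
STATIC_COST = {"label_1": 44, "label_2": 44}

if __name__ == "__main__":
    totals = dynamic_cost(fired_labels([0, 1, 0, 0]), STATIC_COST)
    print(dict(totals), sum(totals.values()))
```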

Features of the Trace tool
Trace -label
[Charts: energy consumption of the labelled instructions (number of transitions per label) and activity of the labelled instructions (activity per label), for label_1 to Label_4]
Trace -process
[Charts: energy consumption of the processes (number of transitions per process) and activity of the processes (activity per process), for process_1 to process_3]

Features of the Trace tool
Trace -component
[Charts: energy consumption of the components (number of transitions per component) and activity of the components (activity per component), for composant_1 and composant_2]
Trace -all
[Charts: energy consumption of the whole design (number of transitions) and activity of the whole design, shown alongside the per-component, per-process and per-label figures]
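The four reporting levels can be read as the same transition counts grouped at different levels of the hierarchy. The sketch below illustrates that reading only; it is not the Trace tool's implementation, and the record layout, the function name `trace_report` and the numbers are invented for the example.

```python
from collections import defaultdict

# Hypothetical records: (component, process, label, transitions).
RECORDS = [
    ("component_1", "process_1", "label_1", 10),
    ("component_1", "process_1", "label_2", 5),
    ("component_1", "process_2", "label_3", 3),
    ("component_2", "process_3", "label_4", 2),
]

# Position of the grouping key inside a record for each reporting level.
KEY_INDEX = {"component": 0, "process": 1, "label": 2}

def trace_report(records, level):
    """Sum transition counts per label, process, component, or overall."""
    totals = defaultdict(int)
    for record in records:
        key = "all" if level == "all" else record[KEY_INDEX[level]]
        totals[key] += record[3]
    return dict(totals)

if __name__ == "__main__":
    for level in ("label", "process", "component", "all"):
        print(level, trace_report(RECORDS, level))
```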

Optimizations
Data encoding:
• MR[2][2]: 2 wires switch
• MR[4][1]: 1 wire switches
Unbalance the choice structures:
[Figure: four-input multiplexer (Input0 to Input3, cmd, Output) restructured according to the probability of cmd, from most probable (+) to least probable (-)]
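The wire counts above follow from the encoding if MR[m][n] is read as n digits, each coded 1-of-m (an assumption consistent with the "base m, digit n" wording used in the application slides). Under that reading, exactly one wire per digit switches for each data item, so a higher radix means fewer switching wires for the same number of bits. The helper below is a hypothetical illustration of that arithmetic.

```python
def digits_needed(bits, m):
    """Number of 1-of-m digits needed to carry `bits` bits of data.

    Assumes m is a power of two, so each digit carries log2(m) bits.
    """
    if m < 2 or m & (m - 1):
        raise ValueError("m must be a power of two, at least 2")
    bits_per_digit = m.bit_length() - 1
    if bits % bits_per_digit:
        raise ValueError("bits must be a multiple of log2(m)")
    return bits // bits_per_digit

def wires_switching(bits, m):
    """One wire switches per 1-of-m digit."""
    return digits_needed(bits, m)

def total_wires(bits, m):
    """Total data wires of the MR[m][n] channel: m wires per digit."""
    return m * digits_needed(bits, m)

if __name__ == "__main__":
    for m in (2, 4):
        print(m, wires_switching(2, m), total_wires(2, m))
```

For 2 bits, both encodings use 4 data wires, but MR[2][2] switches 2 of them and MR[4][1] only 1. For 32 bits the same helper gives n = 32 at m = 2 and n = 16 at m = 4.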

Optimizations
Split the channels:
[Figure: four-input multiplexer (Input0 to Input3, Output) controlled by a single cmd channel, compared with a version in which the command is split into separate channels cmd0, cmd1 and cmd2]

Application
[Figure: architecture of the example. ENVIRONMENT; Component Register, containing the processes Register R0, Register R1, Register R2 and Register R3; Component ALU, containing the Logical process (Label_and, Label_or) and the Arithmetic process (Label_add)]

Application: base m = 2, digits n = 32
Program:
    R2 = R0 AND R1
    R3 = R0 OR R2
    R1 = R0 AND R3
    R3 = R0 AND R2
    R2 = R0 AND R1
    R3 = R0 AND R2
    R1 = R0 AND R3
    R3 = R0 ADD R1
    R2 = R0 OR R2
    R3 = R0 AND R3
    R1 = R0 AND R1
Trace -label (number of transitions), total = 1316:
    Label_and    896
    Label_or     224
    Label_add    196
Trace -process (number of transitions), total = 12701:
    R0           3223
    R1           3124
    R2           2519
    R3           2519
    logical      1120
    arithmetic    196

Application: base m = 4, digits n = 16
Program: the same eleven instructions as for m = 2 (eight AND, two OR, one ADD)
Trace -label (number of transitions), total = 956:
    Label_and    640
    Label_or     160
    Label_add    156
Reduction: 38% (1316 / 956 ≈ 1.38; measured against the m = 2 total, the saving is about 27%)
Trace -process (number of transitions), total = 9701:
    R0           2596
    R1           2167
    R2           1991
    R3           1991
    logical       800
    arithmetic    156
Reduction: 31% (12701 / 9701 ≈ 1.31; measured against the m = 2 total, the saving is about 24%)
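The figures on the two application slides can be cross-checked with a short script. The data are copied from the slides; the helper names are made up for the check. It shows that the label totals sum to the reported totals, that an AND and an OR cost the same number of transitions per instruction, and that the reported 38% and 31% correspond to the difference divided by the m = 4 value rather than by the m = 2 value.

```python
from collections import Counter

# Operators of the eleven-instruction program, in order.
PROGRAM_OPS = ["AND", "OR", "AND", "AND", "AND", "AND",
               "AND", "ADD", "OR", "AND", "AND"]

# Trace -label totals from the slides, keyed by base m.
LABEL_TOTALS = {
    2: {"AND": 896, "OR": 224, "ADD": 196},
    4: {"AND": 640, "OR": 160, "ADD": 156},
}
# Trace -process grand totals from the slides, keyed by base m.
PROCESS_TOTALS = {2: 12701, 4: 9701}

OP_COUNTS = Counter(PROGRAM_OPS)

def per_instruction(m):
    """Transitions per executed instruction for each operator."""
    return {op: total / OP_COUNTS[op] for op, total in LABEL_TOTALS[m].items()}

def relative_to_new(old, new):
    """Difference as a fraction of the new (smaller) value."""
    return (old - new) / new

def relative_to_old(old, new):
    """Difference as a fraction of the old (larger) value."""
    return (old - new) / old

if __name__ == "__main__":
    for m in (2, 4):
        print(m, sum(LABEL_TOTALS[m].values()), per_instruction(m))
    old, new = sum(LABEL_TOTALS[2].values()), sum(LABEL_TOTALS[4].values())
    print(round(100 * relative_to_new(old, new)), round(100 * relative_to_old(old, new)))
    old, new = PROCESS_TOTALS[2], PROCESS_TOTALS[4]
    print(round(100 * relative_to_new(old, new)), round(100 * relative_to_old(old, new)))
```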

Conclusion and perspectives
Conclusion
• A tool has been developed that estimates activity and energy consumption at the HDL level (before synthesis)
• The tool supports profiling, to locate the most active areas of the circuit
• Optimizations are guided by the tool and applied very early in the design flow
Perspectives
• Continue developing the tool and calibrate it
• Demonstrate the benefits of the approach by building an asynchronous microprocessor

Thank you for your attention!

Application
• Average cost of an AND or OR operator:
  • base m = 2: Cost(avg) = n*(muller(2) + 3/4*OR(3))
  • base m = 4: Cost(avg) = n*(muller(2) + 6/16*OR(3) + 9/16*OR(9))
• Average cost of an ADD operator:
  • base m = 2: Cost(avg) = n*(2*muller(2) + 2*OR(2))
  • base m = 4: Cost(avg) = n*(2*muller(2) + 3*OR(2) + 1.125*OR(3) + 0.25*OR(4))
• Cost of reading a register (Nreg = number of registers):
  Cost = n*muller(2) + n*AND(2) + Fork(m*n) + n*OR(Nreg) + n*NOR(m) + muller(n) + AND(Nreg) + Fork(Nreg)
• Cost of writing a register:
  Cost = n*muller(2) + Fork(m*n) + n*NOR(m) + muller(n) + AND(Nreg) + m*n*Fork(Nreg)
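One plausible reading of the AND/OR coefficients, offered as an interpretation rather than the authors' derivation: for a 1-of-m digit, each output wire is the OR of all input combinations that produce that output value, and each combination is taken as equally likely. Counting combinations per output value then reproduces the fractions above (an output reached by a single combination needs no OR gate). The function below is a hypothetical helper for that count.

```python
from collections import Counter
from fractions import Fraction
from operator import and_, or_

def or_fanin_distribution(m, op):
    """Map OR fan-in -> probability that an OR of that size fires.

    Enumerates all m*m input digit pairs, assumed equiprobable, and groups
    them by the output digit op(a, b). Outputs produced by a single pair
    are skipped because they need no OR gate.
    """
    pairs_per_output = Counter(op(a, b) for a in range(m) for b in range(m))
    distribution = Counter()
    for pair_count in pairs_per_output.values():
        if pair_count > 1:
            distribution[pair_count] += Fraction(pair_count, m * m)
    return dict(distribution)

if __name__ == "__main__":
    for m in (2, 4):
        print(m, or_fanin_distribution(m, and_), or_fanin_distribution(m, or_))
```

For m = 2 this gives OR(3) with probability 3/4, and for m = 4 it gives OR(9) with probability 9/16 and OR(3) with probability 6/16, for both the bitwise AND and the bitwise OR.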
