The Era of Many-Module SoC: Revisiting the NoC Mapping Problem. Technion – Israel Institute of Technology. Isask’har (Zigi) Walter, Israel Cidon, Avinoam Kolodny, Daniel Sigalov. December, 2009.
SoC Revolution [Figure: a bus-based system and a NoC-based system, each with PE1 to PE6; in the NoC-based system the PEs are attached to routers (R)]
SoC Evolution [Figure: a large NoC made of many routers (R)]
Processor Evolution [Figure: single-core CPU; dual-core (CPU1, CPU2, cache); quad-core (CPU1 to CPU4 with several caches)]
The Era of Many-Module SoC • What will such chips look like? • Most likely: power still important, highly parallel, IP reuse, ease of design and verification • High certainty: large number of modules, NoC interconnect • Applications: totally unknown
Future SoCs - Observation #1 • Special-purpose cores replace general-purpose processors • Power considerations • Processing pipes are getting longer [Figure: a task graph (Task 1 to Task 5) running on a general-purpose CPU with memory, and the same tasks running on specialized cores (DSP, pre-processor, CPU, GPU) with several memories]
Future SoCs - Observation #2 [Figure: a large NoC of routers (R) whose attached modules are still unknown] • Large diversity: all modules are unique? • Highly regular: classes of replicated cores, standard modules (DSP, HW accelerators, cache banks, etc.)
The Era of Many-Module SoC • Observation #1: increased use of specialized cores; pipes are getting longer • Observation #2: replication of processing elements • How is the design flow affected? • This work: mapping of the NoC
Outline • The Era of Many Module SoC • Revisiting the Mapping Problem • Cross-Entropy Optimization • Evaluation
NoC Mapping • Given: traffic pattern(s), i.e. a set (or sets) of pair-wise bandwidth requirements and timing constraints; routing; topology • Goal: find an efficient mapping of cores to tiles [Figure: several alternative placements of PE1 to PE9 on a grid of tiles]
Mapping Optimization • An important design step • Mapping affects power and performance! • A difficult problem! • Often heuristic algorithms are used • Common optimization goals • Minimize (dynamic) power • Minimize power + maximize performance • Minimize power subject to performance constraints
Modeling • Typical modeling • Power and latency proportional to distance • Cost function:
Calculating Mapping Cost [Figure: a communication graph over PE1 to PE6 with flow bandwidths 30 and 100, evaluated under two candidate mappings, π1 and π2]
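The cost formula itself did not survive on the slide. A minimal sketch, assuming the commonly used model in which each flow contributes its bandwidth times the hop distance between the tiles of its endpoints, i.e. Cost(π) = Σ BW(i,j) · dist(π(i), π(j)), consistent with "power and latency proportional to distance". The flow endpoints, the tile coordinates, and the names `manhattan` and `mapping_cost` are hypothetical; only the bandwidths 30 and 100 come from the slide.

```python
def manhattan(a, b):
    """Hop distance between two tiles given as (row, col) on a mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mapping_cost(flows, placement):
    """Sum of bandwidth * hop distance over all flows.

    flows:     {(src_core, dst_core): bandwidth}
    placement: {core: (row, col)}  (the mapping pi)
    """
    return sum(
        bw * manhattan(placement[src], placement[dst])
        for (src, dst), bw in flows.items()
    )


# Hypothetical flows; the endpoints are assumptions for illustration.
flows = {("PE1", "PE2"): 30, ("PE1", "PE5"): 100}

# Two candidate mappings that differ in where PE2 and PE5 are placed.
pi1 = {"PE1": (0, 0), "PE2": (0, 1), "PE5": (1, 1)}
pi2 = {"PE1": (0, 0), "PE2": (1, 1), "PE5": (0, 1)}

cost_pi1 = mapping_cost(flows, pi1)  # 30*1 + 100*2
cost_pi2 = mapping_cost(flows, pi2)  # 30*2 + 100*1
```

Under this model π2 is cheaper because the heavier flow travels fewer hops.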
Motivation - Example #1 [Figure: a communication graph connecting PE1 to PE4 with MEM1 and MEM2, every flow with bandwidth 1, and its optimal mapping π1 on a grid of tiles]
Motivation - Example #1 (cont.) • Let the mapping algorithm assign the flows! [Figure: PE1 to PE4 communicating with two interchangeable memories (2*MEM)] • Optimal mapping (π2) • Cost(π2)=7, compared with Cost(π1)=9
Motivation - Example #1 (cont.) [Figure: the resulting flow assignment between PE1 to PE4 and MEM1, MEM2 under mapping π2] • Cost(π2)=7 • The mapping algorithm should be aware of replicated modules!
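A minimal sketch of what "replica-aware" can mean in the cost evaluation, assuming the bandwidth-times-hops model and a simple rule that sends each flow to the nearest member of the destination class. The layout, the names `replica_aware_cost` and `classes`, and the nearest-replica rule are assumptions for illustration; the sketch ignores load balancing between replicas and does not reproduce the slide's costs of 9 and 7.

```python
def manhattan(a, b):
    """Hop distance between two tiles given as (row, col) on a mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def replica_aware_cost(flows, placement, classes):
    """Bandwidth * hops, where a destination may be a class of replicas.

    flows:     {(src_core, dst): bandwidth}; dst is a core or a class name
    placement: {core: (row, col)}
    classes:   {class_name: [replicated cores]}
    A flow to a class is routed to the closest replica (assumed rule).
    """
    total = 0
    for (src, dst), bw in flows.items():
        candidates = classes.get(dst, [dst])  # a plain core is its own class
        total += bw * min(
            manhattan(placement[src], placement[c]) for c in candidates
        )
    return total


# Hypothetical 2x3 layout with two identical memories.
placement = {
    "PE1": (0, 0), "MEM1": (0, 1), "PE2": (0, 2),
    "PE3": (1, 0), "MEM2": (1, 1), "PE4": (1, 2),
}

# Classic formulation: every flow is bound to one specific memory.
fixed_flows = {("PE1", "MEM1"): 1, ("PE2", "MEM2"): 1,
               ("PE3", "MEM1"): 1, ("PE4", "MEM2"): 1}
fixed_cost = replica_aware_cost(fixed_flows, placement, classes={})

# Extended formulation: flows target the class "MEM".
class_flows = {("PE1", "MEM"): 1, ("PE2", "MEM"): 1,
               ("PE3", "MEM"): 1, ("PE4", "MEM"): 1}
flexible_cost = replica_aware_cost(
    class_flows, placement, classes={"MEM": ["MEM1", "MEM2"]}
)
```

For the same placement, freeing the flow-to-replica binding lowers the cost from 6 to 4 in this toy layout.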
Classic Performance Constraints • Pair-wise point-to-point requirements • For example, in a 4-module system:
Motivation - Example #2 [Figure: a task graph over PE1 to PE4 carrying two streams, Stream 1 and Stream 2]
Example #2 – Pair-wise req. [Figure: the task graph over PE1 to PE4 annotated with pair-wise requirements (Req=2, Req=1, Req=1, Req=1), and the candidate placements of PE1 to PE4] • No feasible mapping!
Application-Level Requirements [Figure: the same task graph (Req=2, Req=1, Req=1, Req=1) with the requirements attached to Stream 1 and Stream 2, and a placement of PE1 to PE4] • A feasible mapping does exist! • It’s better to work with the application-level requirements
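A minimal sketch of why an end-to-end requirement can admit mappings that pair-wise requirements reject, assuming latency is modeled as hop count. The placement, the limits, and the function names are hypothetical and do not reproduce the slide's figure: an end-to-end budget of 3 hops is split here into pair-wise limits of 1 and 2.

```python
def manhattan(a, b):
    """Hop distance between two tiles given as (row, col) on a mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def meets_pairwise(path, placement, pair_limits):
    """Every consecutive pair on the stream must meet its own hop limit."""
    return all(
        manhattan(placement[a], placement[b]) <= pair_limits[(a, b)]
        for a, b in zip(path, path[1:])
    )


def meets_end_to_end(path, placement, limit):
    """Only the total hop count along the whole stream is constrained."""
    total = sum(
        manhattan(placement[a], placement[b])
        for a, b in zip(path, path[1:])
    )
    return total <= limit


stream = ["PE1", "PE2", "PE3"]                           # a processing pipe
placement = {"PE1": (0, 0), "PE2": (0, 2), "PE3": (1, 2)}  # hops: 2, then 1

# Pair-wise limits obtained by splitting the 3-hop budget as 1 + 2.
pair_limits = {("PE1", "PE2"): 1, ("PE2", "PE3"): 2}

pairwise_ok = meets_pairwise(stream, placement, pair_limits)  # False: 2 > 1
e2e_ok = meets_end_to_end(stream, placement, limit=3)         # True: 2+1 <= 3
```

Any mapping that satisfies the pair-wise limits also satisfies the end-to-end budget they sum to, so the end-to-end form can only enlarge the feasible set.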
This Work • Find efficient mappings by extending the formulation of the mapping problem • Adding degrees of freedom • Degree of freedom #1 • Leverage existence of replicated modules • Degree of freedom #2 • Replace p2p constraints with end-to-end, application-level requirements
Modifying the Formulation (1) • Leverage existence of replicated modules • Allow the mapping algorithm to allocate flows to the best replicated module
Modifying the Formulation (2) • Replace p2p constraints with end-to-end, application-level requirements [Figure: P2P timing requirements versus E2E timing requirements] • In this paper: applied to synthetic task graphs • Also applied to a real application
Outline • The Era of Many Module SoC • Revisiting the Mapping Problem • Cross-Entropy Optimization • Evaluation
Cross Entropy Optimization • Modern optimization heuristic • Good at combinatorial optimization problems • Akin to evolutionary algorithms • Generation of new solutions is based on sampling and estimation • Inherently a global search method • Reduced risk of getting trapped in a local minimum
Cross Entropy Optimization • Step 1: Given an initial parameter vector v=v0, sample a random population of K solutions x1,x2,…,xK from the distribution given by f(x;v) • Step 2: Evaluate the costs S(xi), i=1,…,K • Step 3: Using the ρK (0<ρ<1) elite (lowest-cost) samples, obtain a new density function f(x;v) by calculating a new vector v via Maximum Likelihood (ML) estimation • Step 4: Repeat steps 1-3 with the new vector v until the maximum number of iterations is reached or no improvement is obtained for a predefined number of iterations • For example: • Generate 10 random mappings: π1, π2, …, π10 • Find the 3 lowest-cost mappings, e.g. π2, π5, π7 • Examine those 3 best mappings: for each tile, calculate the probability that core PEi is mapped to that tile • Update the probabilities accordingly
CE Example • Initially every core is equally likely on every tile: Prob(TileX ← PEj)=0.25 for each tile X in {A, B, C, D} and each core PEj, j=1,…,4 [Figure: ten random mappings π1 to π10 of PE1 to PE4 onto TileA, TileB, TileC, TileD]
Updating Probabilities • Elite mappings: π2, π5, π3 [Figure: on tiles A B / C D, the three elite mappings are (PE1 PE2 / PE3 PE4), (PE1 PE2 / PE4 PE3) and (PE1 PE4 / PE3 PE2)] • Prob(TileA ← PE1)=1 • Prob(TileB ← PE2)=2/3, Prob(TileB ← PE4)=1/3 • Prob(TileC ← PE3)=2/3, Prob(TileC ← PE4)=1/3 • Prob(TileD ← PE2)=1/3, Prob(TileD ← PE3)=1/3, Prob(TileD ← PE4)=1/3 • The following iteration uses these updated probabilities • Gradually, the probabilities converge to 0/1
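The sampling and update steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tile-by-tile sampling that keeps each mapping a permutation, the smoothing factor `alpha`, the default parameter values, and the toy cost function are all assumptions. A mapping is a list in which position t holds the index of the core placed on tile t.

```python
import random


def sample_mapping(probs, rng):
    """Draw one mapping tile by tile; each core is used at most once."""
    free = list(range(len(probs)))
    mapping = []
    for tile in range(len(probs)):
        weights = [probs[tile][c] for c in free]
        if sum(weights) == 0:  # no mass left on the free cores: go uniform
            weights = [1.0] * len(free)
        core = rng.choices(free, weights=weights, k=1)[0]
        mapping.append(core)
        free.remove(core)
    return mapping


def update_probabilities(elite, n):
    """probs[tile][core] = fraction of elite mappings with core on tile."""
    counts = [[0] * n for _ in range(n)]
    for mapping in elite:
        for tile, core in enumerate(mapping):
            counts[tile][core] += 1
    return [[c / len(elite) for c in row] for row in counts]


def cross_entropy_map(cost, n, samples=50, rho=0.2, alpha=0.7,
                      iters=100, patience=10, seed=0):
    """Cross-entropy search over mappings of n cores to n tiles."""
    rng = random.Random(seed)
    probs = [[1.0 / n] * n for _ in range(n)]  # uniform start
    best, best_cost, stale = None, float("inf"), 0
    n_elite = max(1, int(rho * samples))
    for _ in range(iters):
        population = sorted(
            (sample_mapping(probs, rng) for _ in range(samples)), key=cost
        )
        new = update_probabilities(population[:n_elite], n)
        # Smoothed update (assumed): blend new estimate with the old one.
        probs = [[alpha * new[t][c] + (1 - alpha) * probs[t][c]
                  for c in range(n)] for t in range(n)]
        top_cost = cost(population[0])
        if top_cost < best_cost:
            best, best_cost, stale = population[0], top_cost, 0
        else:
            stale += 1
            if stale >= patience:  # no improvement for `patience` rounds
                break
    return best, best_cost


# Toy problem (hypothetical): 4 cores on a 2x2 mesh, bandwidth * hops cost.
tiles = [(0, 0), (0, 1), (1, 0), (1, 1)]
flows = {(0, 1): 5, (1, 2): 3, (2, 3): 5, (0, 3): 1}


def cost(mapping):
    pos = {core: tiles[t] for t, core in enumerate(mapping)}
    return sum(bw * (abs(pos[s][0] - pos[d][0]) + abs(pos[s][1] - pos[d][1]))
               for (s, d), bw in flows.items())


best, best_cost = cross_entropy_map(cost, n=4)

# The update step applied to the three elite mappings of the slide
# (tiles A, B, C, D; core index 0 stands for PE1).
slide_probs = update_probabilities([[0, 1, 2, 3], [0, 1, 3, 2], [0, 3, 2, 1]], 4)
```

With the three elite mappings of the slide, `update_probabilities` yields 1 for PE1 on TileA, 2/3 for PE2 on TileB, and 1/3 for each of PE2, PE3 and PE4 on TileD, matching the listed values. The smoothing keeps every probability strictly positive, which is one common way to avoid premature convergence.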
Outline • The Era of Many Module SoC • Revisiting the Mapping Problem • Cross-Entropy Optimization • Evaluation
Evaluation • Scenario • 6x6 mesh NoC • Synthetic, randomized SoC • Task graphs (and task-to-core mapping) • Varying number of replicated modules • Varying timing constraints • (Real application in DATE10 paper) • Compare with best cost of classic mapping • Averaging multiple runs
Accounting for Replication • “Class”: a group of identical PEs • Total number of replicated cores = {number of classes} * {class size} [Plot: cost reduction [%] versus total number of replicated modules [%]]
Application-Level Requirements • SoCs with a pipeline data path and background P2P traffic • Varying pipeline slack • Different amounts of background constraints [Plot: cost reduction [%] versus pipeline slack]
Conclusions and Future Work • We are entering the era of the “many-module SoC” • Extend the mapping formulation to account for classes of replicated modules and application-level requirements • Meaningful power savings • Mapping is only one example: what about routing, task assignment, link design, topology selection?
The Era of Many-Module SoC • QNoC Research Group • Thank you! Questions? zigi@tx.technion.ac.il