Exploring Design Space for 3D Clustered Architectures
Manu Awasthi, Rajeev Balasubramonian
University of Utah
3D Technologies
• Multiple layers of active devices
• Vertical interconnects between layers
[Figure: 2D chip (one device layer on silicon) next to a 3D chip (device layers 1 and 2 joined by a vertical interconnect, labeled "Very small, ~10 µm"). Courtesy: K. Bernstein, IBM]
Benefits of 3D
• Reduction of global interconnect
• Delay/power reduction
• Bandwidth
• Mixed-technology integration
Previous Proposals
• Previously in 3D…
  • Break and stack (folding) [Puttaswamy et al.]
  • Vertical stacking of active devices
  • Reduced intra-block latency
• All are active: HEAT!!!
[Figure: RegFile, break and stack]
An alternative approach?
• Prudent stacking can:
  • Improve performance
  • Result in a better thermal profile
[Figure: 2D chip next to a 3D chip with Die 0 and Die 1]
Clustered Architectures
• Centralized front-end
  • I-Cache & D-Cache
  • LSQ, Rename, Decode
  • Branch Predictor
• Clustered back-end
  • Issue Queue
  • Regfile, FUs
• Higher clock frequency, high ILP!!
[Figure: front-end, L1 D cache, clusters, crossbar/router]
Decentralized Cache Banks
• Possibly better performance
[Figure: multiple L1 D cache banks]
Decentralized Cache Banks
• Replicated cache banks
[Figure: multiple L1 D cache banks]
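One way to picture replicated banks is a toy model in which every cluster holds a full copy of the cached data. The sketch below assumes that copies are kept identical by sending each store to every bank; the slide does not spell out that mechanism, and the class and method names are hypothetical.

```python
class ReplicatedBanks:
    """Toy model: each cluster owns a full replica of the L1 data cache.

    Assumption (not stated on the slide): every store is sent to all
    replicas so that the copies never diverge.
    """

    def __init__(self, num_clusters: int):
        self.banks = [dict() for _ in range(num_clusters)]
        self.store_messages = 0  # counts bank writes caused by stores

    def store(self, addr: int, value: int) -> None:
        # Write the value into every replica.
        for bank in self.banks:
            bank[addr] = value
        self.store_messages += len(self.banks)

    def load(self, cluster: int, addr: int):
        # A load is served by the issuing cluster's own bank.
        return self.banks[cluster].get(addr)


if __name__ == "__main__":
    cache = ReplicatedBanks(num_clusters=8)
    cache.store(0x40, 7)
    print(cache.load(3, 0x40), cache.store_messages)
```

The trade-off this illustrates: loads stay local to a cluster, while each store costs one write per replica.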
Decentralized Cache Banks
• Word-interleaved cache banks
[Figure: two L1 D cache banks, one holding even words and one holding odd words]
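Word interleaving with an even bank and an odd bank can be sketched as a mapping from address to bank index. The word size below is an assumption (the slide does not give one), and `bank_for_address` is a hypothetical helper name.

```python
WORD_BYTES = 8  # assumed word size; not stated on the slide
NUM_BANKS = 2   # even-word bank and odd-word bank, as in the figure


def bank_for_address(addr: int,
                     word_bytes: int = WORD_BYTES,
                     num_banks: int = NUM_BANKS) -> int:
    """Return the index of the bank that holds the word containing addr."""
    word_index = addr // word_bytes
    return word_index % num_banks


if __name__ == "__main__":
    for addr in (0x00, 0x04, 0x08, 0x10, 0x18):
        bank = bank_for_address(addr)
        kind = "even" if bank == 0 else "odd"
        print(f"addr {addr:#04x} -> bank {bank} ({kind} words)")
```

Unlike replication, each word lives in exactly one bank, so the bank an access goes to depends on its address.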
Outline
• Introduction
• Motivation
• 3D Architectures
• Clustered Architectures
• Proposals
• Results
• Conclusions
Architecture 1: Cache-on-cluster
[Figure: clusters and cache banks on Die 0 and Die 1, connected by intra-die and inter-die interconnect]
Architecture 2: Cluster-on-cluster
[Figure: clusters and cache banks on Die 0 and Die 1, connected by intra-die and inter-die interconnect]
Architecture 3: Staggered
[Figure: clusters and cache banks on Die 0 and Die 1, connected by intra-die and inter-die interconnect]
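The three arrangements differ in which blocks sit next to each other on a die and which sit above each other across dies. A rough way to reason about this is to count intra-die (horizontal) hops and inter-die (vertical) hops between two blocks. The sketch below is only an illustration: the coordinates, the per-hop costs, and the names `Block` and `wire_cost` are all hypothetical and do not come from the slides.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical position of a cluster or cache bank."""
    x: int    # column on the die
    y: int    # row on the die
    die: int  # 0 or 1


def wire_cost(a: Block, b: Block,
              horizontal_cost: float = 1.0,
              vertical_cost: float = 0.1) -> float:
    """Cost of reaching b from a: horizontal hops plus vertical hops.

    The default costs are made-up placeholders; they only encode the
    assumption that an inter-die hop is cheaper than an intra-die hop.
    """
    horizontal_hops = abs(a.x - b.x) + abs(a.y - b.y)
    vertical_hops = abs(a.die - b.die)
    return horizontal_hops * horizontal_cost + vertical_hops * vertical_cost


if __name__ == "__main__":
    cluster = Block(0, 0, 0)
    bank_above = Block(0, 0, 1)   # stacked directly over the cluster
    bank_beside = Block(1, 0, 0)  # neighbor on the same die
    print(wire_cost(cluster, bank_above), wire_cost(cluster, bank_beside))
```

Under these placeholder costs, a block stacked directly above another is cheaper to reach than a same-die neighbor, which is the intuition behind choosing what to stack on what.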
Experimental Setup
• Framework
  • SimpleScalar, Wattch and HotSpot 3.0
  • Wire model: 8X global metal plane
• Benchmarks
  • SPEC 2K, single-threaded
• Processor configuration
  • 8 clusters
  • 64 kB L1 I/D caches, 2-way set-associative
  • L1 data cache word-interleaved or replicated
  • 2D centralized cache as the base case
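For reference, the number of sets in a set-associative cache follows from its size, associativity, and line size. The slide gives the first two (64 kB, 2-way) but not the line size, so the 32-byte line used below is an assumption, and `cache_sets` is a hypothetical helper.

```python
def cache_sets(size_bytes: int, associativity: int, line_bytes: int) -> int:
    """Number of sets = (total lines) / (lines per set)."""
    total_lines = size_bytes // line_bytes
    return total_lines // associativity


if __name__ == "__main__":
    SIZE_BYTES = 64 * 1024  # from the slide: 64 kB
    ASSOCIATIVITY = 2       # from the slide: 2-way set-associative
    LINE_BYTES = 32         # assumption: line size is not given on the slide
    print(cache_sets(SIZE_BYTES, ASSOCIATIVITY, LINE_BYTES))
```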
Base Case Performance
[Chart: performance of the 2D configurations, with the best-case 2D configuration marked]
The 3D Effect
[Chart: 3D replicated vs. 2D centralized]
The 3D Effect
[Chart: 3D word-interleaved (WI) vs. 2D centralized]
Comparisons
• 12% improvement for best-case 3D vs. best-case 2D
[Chart: best-case 2D, best-case 3D replicated (Rep), and best-case 3D word-interleaved (WI)]
Thermal Analysis
• Wattch for power numbers
• HotSpot 3.0 for thermal model (grid)
  • 500x500 grid resolution
• Interconnect power modeling
  • Attributed to functional units
  • 8X plane wires
  • Router + crossbar modeled as a separate entity
Thermal Profiles
[Figure: peak temperature of the hottest on-chip unit, in degrees Celsius]
Conclusions
• Wire delays are critical to performance
  • Some are more important than others
• Prudent block stacking
  • Performance improvement of up to 12% over 2D
  • WI banks + Arch 3 (3D)
  • Better thermal profiles compared to folding
4-Cluster Arrangements
[Figure: clusters and cache banks on Die 0 and Die 1, with intra-die horizontal wires and inter-die vertical wires: (a) Arch-1 (cache-on-cluster), (b) Arch-2 (cluster-on-cluster), (c) Arch-3 (staggered)]