
Amalgam: a Reconfigurable Processor for Future Fabrication Processes



Presentation Transcript


  1. Amalgam: a Reconfigurable Processor for Future Fabrication Processes
     Nicholas P. Carter, University of Illinois at Urbana-Champaign

  2. Performance = f(architecture, implementation)
     [Figure: operation schedules (LD, ADD, MUL, ST) for 1-D IDCT computations plotted against time]

  3. Efficient Implementation
     • Everything you give up in clock rate you have to make back in architectural efficiency
     • Wire delay is the major limiting factor in system architectures today
     • Wires get slower relative to transistors as fabrication processes improve
     • Programmable processors are moving to deeper pipelines
     • It is not enough just to prevent wires from making reconfigurable logic slower

  4. Amalgam
     [Figure: block diagram showing DRAM, a multi-banked cache, an interconnection network, four programmable clusters (PClusters), and four reconfigurable clusters (RClusters)]

  5. Reconfigurable Cluster Design
     • 4 register banks
       – 8 registers/bank
     • 4 reconfigurable logic segments
       – 8 rows x 32 logic blocks (LBs) per segment
     • Array control unit (ACU)
     • Network interface
     • Counter-clockwise flow of computation through the cluster
     [Figure: cluster floorplan with the network interface, the ACU, and four register banks alternating with four logic segments]
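
The per-cluster totals implied by these parameters are easy to lose track of. The sketch below simply multiplies them out; the class and field names are hypothetical and do not come from Amalsim or the Amalgam design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconfigClusterParams:
    """Per-cluster resources from slide 5 (names are hypothetical)."""
    register_banks: int = 4
    registers_per_bank: int = 8
    logic_segments: int = 4
    rows_per_segment: int = 8
    lbs_per_row: int = 32

    @property
    def total_registers(self) -> int:
        # 4 banks x 8 registers/bank
        return self.register_banks * self.registers_per_bank

    @property
    def lbs_per_segment(self) -> int:
        # 8 rows x 32 logic blocks per row
        return self.rows_per_segment * self.lbs_per_row

    @property
    def total_lbs(self) -> int:
        # 4 segments x logic blocks per segment
        return self.logic_segments * self.lbs_per_segment


params = ReconfigClusterParams()
print(params.total_registers)  # 32
print(params.lbs_per_segment)  # 256
print(params.total_lbs)        # 1024
```

Under these parameters each reconfigurable cluster holds 32 registers and 1,024 logic blocks.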

  6. Reconfigurable Clock Rates

  7. Unpipelined Critical Path
     • Latches in logic blocks are the only resource for pipelining
     • Vertical and horizontal wires carry data between logic blocks
     • Wires have heavy loads, making them slower than their length would indicate
     • Effect on clock rate varies significantly with fabrication process
     [Figure: critical path from a logic block (LB) flip-flop through horizontal and vertical wires, a register bank, and more wires to the next LB flip-flop]

  8. Supporting Pipelining
     • Goal: make logic block delay the limiting factor on clock rate
     • Add configurable latches at each wire intersection
     • Problem: different paths may have different latencies
     • Add retiming buffers at logic block inputs/outputs
     • Add network queues to reduce synchronization overhead

  9. Pipelined Critical Path
     • Delay of individual wires < logic block delay in all processes studied
     • Add configurable pipeline latches at junctions between wires
     • Pipeline latches also added on carry chains within rows
     [Figure: the same critical path with flip-flops (FF) inserted at the junctions between horizontal and vertical wires on either side of the register bank]
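
Placing latches at wire junctions can be sketched as a greedy stage-splitting problem. This is a minimal sketch under simplifying assumptions: every path element (logic block, wire, register bank) has a fixed delay, a latch may sit at any boundary between elements, and latch overhead is ignored. The function names and delay values are hypothetical.

```python
def unpipelined_cycle_time(path_delays):
    """Without wire latches, the whole LB-to-LB path must fit in one cycle."""
    return sum(path_delays)


def insert_latches(path_delays, cycle_budget):
    """Greedily split a path into stages no longer than cycle_budget.

    A new stage starts (a latch is inserted at that junction) whenever adding
    the next element would exceed the budget. Latch delay is ignored.
    """
    stages = [[]]
    stage_delay = 0.0
    for d in path_delays:
        if d > cycle_budget:
            # Mirrors the slide's condition: each wire must be faster than an LB
            raise ValueError("element slower than the cycle budget")
        if stages[-1] and stage_delay + d > cycle_budget:
            stages.append([])  # latch at this junction
            stage_delay = 0.0
        stages[-1].append(d)
        stage_delay += d
    return stages


# Hypothetical delays in FO4: LB, HWIRE, VWIRE, Bank, HWIRE, VWIRE
path = [5.0, 2.0, 2.5, 4.0, 2.0, 2.5]
lb_delay = path[0]
stages = insert_latches(path, cycle_budget=lb_delay)
print(unpipelined_cycle_time(path))              # 18.0
print(len(stages), max(sum(s) for s in stages))  # 4 5.0
```

With latches at the junctions, the cycle time drops to the logic block delay (the goal stated on slide 8), whereas the unpipelined path would need a cycle more than three times as long under these made-up numbers.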

  10. Retiming Buffers
     • 5-deep chain of latches added to each logic block input
     • Similar structure added to LB output
     • Can “borrow” up to two cycles of additional delay from an adjacent input
     • Total pipeline register overhead = 17%
     [Figure: chains of flip-flops (FF) on the logic block inputs and output]
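
Balancing path latencies with retiming chains amounts to delaying every early input until the latest one arrives. This is a minimal sketch assuming per-input arrival times in cycles are known. The borrowing rule (an input may take up to two spare latches from one neighbouring input, index +/- 1) is an assumed reading of the slide, and all names are hypothetical.

```python
CHAIN_DEPTH = 5  # latches per logic block input (slide 10)
MAX_BORROW = 2   # extra cycles borrowable from an adjacent input (slide 10)


def retiming_delays(arrivals):
    """Extra cycles each input must wait so that all inputs line up."""
    target = max(arrivals)
    return [target - a for a in arrivals]


def fits_retiming_buffers(arrivals):
    """Check whether the required delays fit in the per-input latch chains.

    Assumed borrowing rule: an input needing more than CHAIN_DEPTH cycles may
    use up to MAX_BORROW spare latches from one neighbour (index +/- 1), and
    each neighbour lends to at most one input.
    """
    need = retiming_delays(arrivals)
    spare = [CHAIN_DEPTH - n for n in need]  # negative if the input overflows
    lent = [False] * len(need)
    for i, n in enumerate(need):
        extra = n - CHAIN_DEPTH
        if extra <= 0:
            continue  # fits in its own chain
        if extra > MAX_BORROW:
            return False
        for j in (i - 1, i + 1):
            if 0 <= j < len(need) and not lent[j] and spare[j] >= extra:
                lent[j] = True
                spare[j] -= extra
                break
        else:
            return False  # no neighbour could lend enough latches
    return True


print(retiming_delays([2, 4, 3]))        # [2, 0, 1]
print(fits_retiming_buffers([0, 6, 6]))  # True: borrows 1 latch
print(fits_retiming_buffers([0, 8, 8]))  # False: needs 3 extra
```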

  11. Register Queues
     [Figure: original architecture vs. register queues. In both, the network delivers WRITE R8, Val1 and WRITE R8, Val2 to the register file; the original architecture needs a synchronization message between the writes, while a register queue on R8 (with an EMPTY status) buffers them.]
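
The difference between the two designs can be mimicked in software. This is a minimal sketch assuming a register queue behaves like a bounded FIFO per register (the depth of 4 is hypothetical) and that reading an empty queue or writing a full one stalls, modelled here as an exception. The class names are illustrative, not from the Amalgam design.

```python
from collections import deque


class PlainRegister:
    """Original design: a second write overwrites the first, so the sender
    must wait for a synchronization message before writing again."""

    def __init__(self):
        self.value = None

    def write(self, v):
        self.value = v

    def read(self):
        return self.value


class QueuedRegister:
    """Register queue: writes are buffered in FIFO order, reads consume them."""

    def __init__(self, depth=4):  # depth is a hypothetical parameter
        self.depth = depth
        self.fifo = deque()

    @property
    def empty(self):
        return not self.fifo

    def write(self, v):
        if len(self.fifo) >= self.depth:
            raise RuntimeError("queue full: writer would stall")
        self.fifo.append(v)

    def read(self):
        if self.empty:
            raise RuntimeError("queue empty: reader would stall")
        return self.fifo.popleft()


r8 = PlainRegister()
r8.write("Val1")
r8.write("Val2")
print(r8.read())  # Val2 (Val1 lost without synchronization)

q8 = QueuedRegister()
q8.write("Val1")
q8.write("Val2")
print(q8.read(), q8.read(), q8.empty)  # Val1 Val2 True
```

Because the queue absorbs back-to-back writes, the sender no longer needs a round-trip synchronization message before its second write.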

  12. Implementing Pipelined Apps.
     • Logical vs. physical pipelining
       – Logical: program-visible, uses the array and registers
       – Physical: visible only to the ACU, uses pipeline registers on wires and retiming buffers
     • Take advantage of the decoupling provided by queues
     • Applications use the same reconfigurable logic configurations in different fab. processes
       – Only the FSM in the ACU changes
       – Applications to portability and to managing intra-die variation

  13. Experimental Methodology
     • Programs simulated using Amalsim
       – Sets each cluster's clock rate independently
     • Benchmarks: IDCT, Rijndael, DNA comparison
       – Fine-grained version of each benchmark performs one computation
       – Medium-grained version performs four independent computations
     • Programmable cluster clock rates based on ITRS
       – Stages limited to 7 FO4 delays, slightly more aggressive than ITRS
     • Logic block latencies and wire lengths taken from a circuit-level design of the reconf. cluster in 180nm CMOS
       – Convert logic block delay to FO4, then scale by the FO4 delay of each fabrication process
       – Scale wire length for each fabrication process, simulate wire delay in SPICE
       – Pipeline so that the reconf. cluster cycle time is determined by logic block delay
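
The FO4 conversion step is simple arithmetic. This is a minimal sketch assuming a logic delay stays constant when measured in FO4 units across processes; the picosecond values are hypothetical placeholders, not figures from the study, and the helper names are illustrative.

```python
STAGE_LIMIT_FO4 = 7.0  # programmable pipeline stage limit (slide 13)


def to_fo4(delay_ps, fo4_ps):
    """Express an absolute delay in FO4 units of a given process."""
    return delay_ps / fo4_ps


def scale_logic_delay(delay_ps_ref, fo4_ps_ref, fo4_ps_target):
    """Carry a logic delay from the reference process to a target process,
    assuming it stays constant when measured in FO4 delays."""
    return to_fo4(delay_ps_ref, fo4_ps_ref) * fo4_ps_target


def programmable_cycle_ps(fo4_ps):
    """Programmable cluster cycle time with stages limited to 7 FO4."""
    return STAGE_LIMIT_FO4 * fo4_ps


# Hypothetical numbers, for illustration only
LB_DELAY_180NM_PS = 900.0  # logic block delay measured at 180nm
FO4_180NM_PS = 90.0
FO4_TARGET_PS = 25.0

lb_target = scale_logic_delay(LB_DELAY_180NM_PS, FO4_180NM_PS, FO4_TARGET_PS)
print(to_fo4(LB_DELAY_180NM_PS, FO4_180NM_PS))  # 10.0
print(lb_target)                                # 250.0
print(programmable_cycle_ps(FO4_TARGET_PS))     # 175.0
```

Wire delays are deliberately left out of this sketch: the methodology scales wire lengths and simulates them in SPICE rather than scaling them in FO4 units.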

  14. Pipelined Clock Rates

  15. Fine-Grained Benchmark Perf.
     • Reconfigurable version maintains about a 20% performance improvement over programmable in all fab. processes
     • Pipelining provides only a small benefit
     • Majority of the speedup comes from the reduction in memory references

  16. Medium-Grain Benchmark Perf.
     • Pipelined architecture sees a 2.6x performance improvement over programmable
     • Unpipelined architecture shows only a minor improvement over programmable
     • Greater parallelism means more ability to tolerate memory delays

  17. Limit Studies
     • Hypothesis: reduced memory operations account for much of the benefit for small tasks
       – Study the limit where memory latency = 1
       – Also test the theory that streaming benchmarks have enough parallelism to cover latency
     • Understand how much the clock rate of the reconfigurable unit affects performance
       – Model the reconfigurable unit at the same clock rate as the programmable clusters
       – Completely unreasonable for unpipelined
       – Might indicate what industry could do with pipelined

  18. Unpipelined Fine-Grained
     • Removing memory latencies makes programmable performance similar to reconfigurable
     • Latency of reconfigurable clusters has a large impact on performance: there is no parallelism to cover it

  19. Pipelined Fine-Grained
     • Results similar to unpipelined
     • Benefit still comes mostly from memory reduction

  20. Unpipelined Medium-Grain
     • Eliminating memory latencies really helps programmable
     • Latency of reconf. logic is an even bigger problem
     • Programmable clusters can exploit parallelism through their pipelines

  21. Pipelined Medium-Grain
     • Impact of the memory system on reconfigurable performance is very small
     • Less benefit from increasing the reconfigurable cluster clock rate
     • With even small amounts of parallelism, throughput becomes more important than latency
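
The latency/throughput trade-off can be made concrete with the usual pipeline timing model: the first result appears after the latency, and each later one follows one initiation interval after the previous. This is a minimal sketch with made-up latencies and initiation intervals, not measured Amalgam numbers; `total_cycles` is a hypothetical helper.

```python
def total_cycles(n_tasks, latency, initiation_interval):
    """Cycles to finish n independent tasks on a pipelined unit: the first
    result appears after `latency`, each later one `initiation_interval` after."""
    if n_tasks <= 0:
        return 0
    return latency + (n_tasks - 1) * initiation_interval


# Hypothetical units: A is deeply pipelined (long latency, high throughput),
# B has half the latency but accepts a new task only every 20 cycles.
for n in (1, 3, 4):
    a = total_cycles(n, latency=40, initiation_interval=10)
    b = total_cycles(n, latency=20, initiation_interval=20)
    print(n, a, b)
# 1 40 20
# 3 60 60
# 4 70 80
```

With one task (the fine-grained case) the lower-latency unit wins; by four independent tasks (the medium-grained case) the better-pipelined unit finishes first.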

  22. Future Directions
     • ASIC-like performance with programmable systems
       – ASICs typically get 100x better performance per unit area than microprocessors
     • Application-specific memory systems in a programmable chip
       – Transform memory references into communication
       – Create a natural division of programs into regular and irregular blocks

  23. Conclusion
     • Reconfigurable computing must provide both speedup from custom logic and high clock rates to succeed
     • Amalgam does this by limiting and tolerating wire delay at multiple levels
       – Clustered architecture
       – Segmented reconfigurable unit
       – Pipelined wire delays
     • Result: 2.6x speedup over an 8-way CMP in current and future fabrication processes
