
Class Presentation Of Advance VLSI Course

Class Presentation Of Advance VLSI Course. Presented by: Ali Shahabi. Major reference: Architecture and Circuit Techniques for a Reconfigurable Memory Block.


Presentation Transcript


  1. Class Presentation Of Advance VLSI Course • Presented by: Ali Shahabi • Major reference: Architecture and Circuit Techniques for a Reconfigurable Memory Block, by Ken Mai, Ron Ho, Elad Alon, Dean Liu, Younggon Kim, Dinesh Patil, and Mark Horowitz • Presentation date: 2004/12/30

  2. Custom ASICs are expensive • High design complexity • High non-recurring engineering costs • Need high-volume, high-profit market • Hard to modify or fix

  3. Reconfigurable computing • Growing interest in reconfigurable solutions – FPGAs – Structured ASICs – Coarse-grain architectures • Reconfigurable computing characteristics – Low non-recurring engineering costs – Good performance and efficiency – Reconfigurability overheads

  4. FPGA with hardwired blocks • CLBs (CLB: Configurable Logic Block) [1]

  5. Coarse-grain architecture • Chip multi-processor • Compute, memory, interconnect, control • Reconfigure tile and global network [1]

  6. Current memory systems • Traditional emphasis on compute side – Memory system important • FPGAs – Fine grain with sizable overheads – Use CLBs for extra functionality – Slow compared to cutting-edge SRAMs • Coarse-grain architectures – Large grain – Low flexibility

  7. Design goal Low overhead, fast, reconfigurable memory system • Reconfigure along natural SRAM partition boundaries – Add hardwired blocks for extra functionality • Modern SRAM circuit techniques – Pulse-mode self-resetting logic – Replica timing paths • Design targets – Cache – FIFO

  8. Reconfigurable memory system • Array of homogeneous memory mats • Each memory has a port into the interconnect • Mat size chosen based on natural partition boundary • Small inter-mat control network [1]

  9. Smart Memories chip [2]

  10. Tile floor plan [2]

  11. Sample configuration: caches • Mats configured as tag or data • Direct mapped or set-associative caches • Use inter-mat control network to pass hit/miss [1]

  12. Mats configured as 2-way set-associative cache [2]

  13. Sample configuration: FIFOs • Data FIFOs, instruction store, and scratchpad • Completely self-contained FIFOs • A single FIFO can be smaller or larger than 1 mat [1]

  14. Multi-porting • Some configurations need >1 access per cycle – Cache tag with snooping – FIFOs with independent read and write ports • Multi-porting each cell is expensive – Multiple ports not always needed • Run memory system faster than processor – Time multiplex single-port – Memory cycle = 10 fan-out of 4 inverter delays
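The time-multiplexing idea on this slide can be sketched as a small behavioral model in Python. This is not the chip's circuit: the class name `VirtualMultiPortMat`, the request tuples, and the default ratio of two memory cycles per processor cycle are assumptions made for illustration only.

```python
class VirtualMultiPortMat:
    """Behavioral sketch: a single-ported memory that serves several
    accesses per processor cycle by running on a faster memory clock."""

    def __init__(self, depth=512, mem_cycles_per_proc_cycle=2):
        # The ratio is an assumed value, not a figure taken from the slides.
        self.cells = [0] * depth
        self.slots = mem_cycles_per_proc_cycle

    def processor_cycle(self, requests):
        """requests: list of ("read", addr) or ("write", addr, value).
        Each request occupies one memory cycle on the single physical port."""
        if len(requests) > self.slots:
            raise ValueError("more accesses than memory cycles available")
        results = []
        for req in requests:  # one single-port access per memory cycle
            if req[0] == "read":
                results.append(self.cells[req[1]])
            elif req[0] == "write":
                self.cells[req[1]] = req[2]
                results.append(None)
            else:
                raise ValueError(f"unknown operation {req[0]!r}")
        return results
```

In this model a FIFO with independent read and write ports would issue one write and one read in the same processor cycle, and both would be served back to back through the one physical port.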

  15. Virtual multi-porting [1]

  16. Mat latency • Total mat latency = 2 cycles – 20 FO4 – SRAM access = 1 cycle – Peripheral logic = 1 cycle • Fully pipelined – Accepts one access every cycle

  17. Added features [1]

  18. Meta-data [1]

  19. Mat details • 2KB SRAM array – 512 x 36b logical, 128 x 144b physical – 32b main data, 4b meta-data – Scan tunable replica bitline
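The logical and physical array shapes above imply four 36-bit words per 144-bit physical row. A minimal Python sketch of that arithmetic follows. The address mapping (low address bits select the word within a row) and the position of the meta-data bits at the top of the word are assumptions, since the slides do not specify either.

```python
LOGICAL_WORDS = 512
WORD_BITS = 36            # 32b main data + 4b meta-data
PHYSICAL_ROWS = 128
ROW_BITS = 144
WORDS_PER_ROW = ROW_BITS // WORD_BITS  # 4 words share one physical row


def physical_location(logical_addr):
    """Map a logical word address to (row, word-within-row).
    Assumed mapping: the low address bits pick the word inside the row."""
    if not 0 <= logical_addr < LOGICAL_WORDS:
        raise ValueError("address out of range")
    return divmod(logical_addr, WORDS_PER_ROW)


def split_word(word36):
    """Split a 36-bit word into (meta, data).
    Assumed layout: meta-data in the top 4 bits, main data in the low 32."""
    return (word36 >> 32) & 0xF, word36 & 0xFFFFFFFF


# Both views describe the same number of bits, and the main data alone is 2KB.
assert LOGICAL_WORDS * WORD_BITS == PHYSICAL_ROWS * ROW_BITS
assert LOGICAL_WORDS * 32 // 8 == 2048
```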

  20. Meta-data bits meta-data data 32b 4b • Cache: valid, dirty, LRU, cache coherence state • FIFO: valid • Special operations – Gang – Read modify write

  21. Gang operation meta-data data mask clear set • Can gang set or clear columns of meta-data bits • Single cycle operation [1]

  22. Gang operation meta-data data mask clear set • Can gang set or clear columns of meta-data bits • Single cycle operation [1]
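A gang operation can be modeled in Python as one masked update applied to the meta-data of every word at once. The function name `gang_op` and the choice of bit 0 as the valid bit in the usage note are hypothetical. The loop below is only a software stand-in for what the slide describes as a single-cycle hardware operation.

```python
def gang_op(meta_bits, mask, op):
    """Set or clear the meta-data columns selected by `mask` in every word.

    meta_bits: list of 4-bit integers, one per word in the mat.
    mask:      4-bit integer, 1 = column takes part in the operation.
    op:        "set" or "clear".
    """
    mask &= 0xF
    if op == "set":
        return [m | mask for m in meta_bits]
    if op == "clear":
        return [m & ~mask & 0xF for m in meta_bits]
    raise ValueError(f"unknown gang operation {op!r}")
```

For example, if bit 0 were used as the valid bit, `gang_op(meta, 0b0001, "clear")` would invalidate every line of a cache in one step.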

  23. Meta-data bit cell [1]

  24. Meta-data bit cell [1]

  25. Read modify write mdata data [1]

  26. Read modify write: read mdata data [1]

  27. Read modify write: modify mdata data [1]

  28. Read modify write: write mdata data [1]
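The three phases named on these slides (read, modify, write) can be sketched in Python as follows. The slides do not say how the new meta-data value is computed, so the `modify` callback is a hypothetical placeholder supplied by the caller.

```python
def read_modify_write(meta, addr, modify):
    """Apply a read-modify-write to the 4-bit meta-data of one word.

    meta:   list of 4-bit integers, one per word.
    modify: caller-supplied function old_meta -> new_meta (hypothetical).
    Returns the value that was read.
    """
    old = meta[addr]              # read phase
    new = modify(old) & 0xF       # modify phase, kept to 4 bits
    meta[addr] = new              # write phase
    return old
```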

  29. RMW decoder circuits [1]

  30. RMW decoder circuits [1]

  31. RMW decoder circuits: read [1]

  32. RMW decoder circuits: modify [1]

  33. RMW decoder circuits: write [1]

  34. RMW decoder circuits: write [1]

  35. Timing [1]

  36. PLA • Reconfigurable NOR-NOR PLA • 1st NOR plane = ternary-CAM • 2nd NOR plane = SRAM [1]
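At the functional level, a PLA whose first plane is a ternary CAM and whose second plane is an SRAM can be sketched in Python as below. The model ignores the inversions of the NOR-NOR circuit and only captures "match product terms, then OR the stored output rows". The class and method names are hypothetical; the default sizes follow the testchip figures given later in the deck (16 AND terms, 6 inputs, 4 outputs).

```python
class TernaryPLA:
    """Functional sketch: ternary-match plane followed by an OR of SRAM rows."""

    def __init__(self, n_inputs=6, n_outputs=4, n_terms=16):
        self.in_mask = (1 << n_inputs) - 1
        self.out_mask = (1 << n_outputs) - 1
        self.n_terms = n_terms
        self.terms = []  # each entry: (value, care, out_bits)

    def add_term(self, value, care, out_bits):
        """care bit = 1 means the input bit must equal the value bit;
        care bit = 0 means don't care (the ternary state)."""
        if len(self.terms) >= self.n_terms:
            raise ValueError("no free product terms")
        self.terms.append((value & self.in_mask,
                           care & self.in_mask,
                           out_bits & self.out_mask))

    def evaluate(self, inputs):
        out = 0
        for value, care, out_bits in self.terms:
            if (inputs ^ value) & care == 0:  # first plane: ternary match
                out |= out_bits               # second plane: OR stored row
        return out
```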

  37. PLA: 1st NOR plane [1]

  38. PLA: normal delay chain [1]

  39. PLA: early reset-off delay chain [1]

  40. PLA: 2nd NOR plane [1]

  41. Pointer logic [1]

  42. Pointer logic For FIFO configurations we add pointer logic • 4 pointer/stride pairs – 11b pointer – 4b stride Pointer cells are 2-ported [1]
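A pointer/stride pair and a FIFO built from two such pairs might be sketched in Python as follows. The bit widths come from the slide; everything else is an assumption: the class names, wrapping the pointer at 2^11, reducing it modulo the mat depth to index the array, and tracking occupancy with a counter.

```python
POINTER_BITS = 11
STRIDE_BITS = 4


class PointerStride:
    """One pointer/stride pair: the pointer advances by `stride` per access."""

    def __init__(self, pointer=0, stride=1):
        if not 0 <= pointer < (1 << POINTER_BITS):
            raise ValueError("pointer does not fit in 11 bits")
        if not 0 <= stride < (1 << STRIDE_BITS):
            raise ValueError("stride does not fit in 4 bits")
        self.pointer, self.stride = pointer, stride

    def advance(self):
        """Return the current pointer, then step it (assumed wrap at 2**11)."""
        current = self.pointer
        self.pointer = (self.pointer + self.stride) % (1 << POINTER_BITS)
        return current


class MatFifo:
    """FIFO sketch using one pair as head and one as tail."""

    def __init__(self, depth=512):
        self.mem = [None] * depth
        self.depth = depth
        self.head = PointerStride()
        self.tail = PointerStride()
        self.count = 0  # occupancy counter is an assumption of this sketch

    def push(self, value):
        if self.count == self.depth:
            raise OverflowError("FIFO full")
        self.mem[self.tail.advance() % self.depth] = value
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("FIFO empty")
        self.count -= 1
        return self.mem[self.head.advance() % self.depth]
```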

  43. Write buffer [1]

  44. Write buffer Pipeline writes for single-cycle cache writes • On write, data mat stores incoming data in WB • Tag check – Cache miss → WB entry is invalidated – Cache hit → WB entry is allowed to write • Writes into data mat on next write • On every write, the WB and mat are both active [3]
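The write-buffer behavior listed on this slide can be sketched as a small Python model. The class name and the `hit` argument (standing in for the tag-check result from the tag mat) are hypothetical, and forwarding buffered data on reads is an added assumption that the slide does not mention.

```python
class DataMatWithWriteBuffer:
    """Behavioral sketch of a data mat with a one-entry write buffer (WB)."""

    def __init__(self, depth=512):
        self.cells = [0] * depth
        self.wb = None  # pending (addr, data), or None when invalid

    def write(self, addr, data, hit):
        # The previously buffered entry is written into the array now,
        # so the WB and the mat are both active on every write.
        if self.wb is not None:
            pending_addr, pending_data = self.wb
            self.cells[pending_addr] = pending_data
        # Incoming data goes into the WB; a cache miss invalidates the entry.
        self.wb = (addr, data) if hit else None

    def read(self, addr):
        # Assumption of this sketch: reads see a valid buffered entry.
        if self.wb is not None and self.wb[0] == addr:
            return self.wb[1]
        return self.cells[addr]
```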

  45. Comparator [1]

  46. Comparator • Maskable comparator – Can mask out any combination of meta-data bits – Can mask out the main data as a chunk • Example use: cache tag compare – Want to check valid state of line (in meta-data) – Want to check tag itself (in main data)
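The maskable compare described above can be sketched in Python as follows. The function name, the `meta_care` convention (1 = bit takes part in the compare), and the use of bit 0 as the valid bit in the usage note are assumptions for illustration.

```python
def masked_compare(stored_meta, stored_data, ref_meta, ref_data,
                   meta_care=0xF, compare_data=True):
    """Return True when the stored word matches the reference.

    meta_care:    4-bit value; meta-data bits with 0 are masked out.
    compare_data: False masks out the 32-bit main data as one chunk.
    """
    meta_ok = ((stored_meta ^ ref_meta) & meta_care & 0xF) == 0
    data_ok = (not compare_data) or \
        ((stored_data ^ ref_data) & 0xFFFFFFFF) == 0
    return meta_ok and data_ok
```

For a cache tag compare with an assumed valid bit at position 0, a hit would be `masked_compare(meta, tag, 0b0001, requested_tag, meta_care=0b0001)`: the valid bit must be set, the tag must match, and the other meta-data bits are ignored.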

  47. Putting it all together [1]

  48. Testchip • 0.18µm 6M TSMC • 3mm x 3.3mm die • 4 memory blocks • Low swing crossbar • Test vector storage • 1.1GHz (10 FO4) • 1.8V, room temp. [1]
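The clock figure on this slide can be related to the FO4 numbers quoted earlier in the deck with a little arithmetic, sketched here in Python. The function name is hypothetical; the inputs are the 1.1GHz clock, the 10 FO4 memory cycle, and the 2-cycle mat latency stated in the slides, and the results are derived values rather than measurements.

```python
def timing_summary(clock_hz=1.1e9, fo4_per_cycle=10, latency_cycles=2):
    """Derive cycle time, implied FO4 delay, and mat latency from the slides' figures."""
    cycle_ps = 1e12 / clock_hz                 # one memory cycle in picoseconds
    fo4_ps = cycle_ps / fo4_per_cycle          # implied delay of one FO4 inverter
    latency_ns = latency_cycles * cycle_ps / 1000.0
    return cycle_ps, fo4_ps, latency_ns


cycle_ps, fo4_ps, latency_ns = timing_summary()
print(f"cycle = {cycle_ps:.0f} ps, FO4 = {fo4_ps:.0f} ps, latency = {latency_ns:.2f} ns")
```

Under these figures one cycle is roughly 909 ps, which implies an FO4 delay of about 91 ps and a 2-cycle (20 FO4) mat latency of about 1.8 ns.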

  49. Testchip mat details • 2KB SRAM array – 512 x 36b logical, 128 x 144b physical – 32b main data, 4b meta-data • 16 AND-term PLA – 6 inputs, 4 outputs • 4 pointer/stride pairs – 11b pointer – 4b stride

  50. Mat area breakdown (mm²) • 32% mat area in peripheral logic [1]
