
CS184a: Computer Architecture (Structure and Organization)



  1. CS184a: Computer Architecture (Structure and Organization) Day 10: January 31, 2003 Compute 2:

  2. Last Time • LUTs • area • structure • big LUTs vs. small LUTs with interconnect • design space • optimization

  3. Today • Cascades • ALUs • PLAs

  4. Last Time • Larger LUTs • Less interconnect delay • General: Larger compute blocks • Minimize interconnect • Large LUTs • Not efficient for typical logic structure

  5. Different Structure • How can we have “larger” compute nodes (less general interconnect) without paying huge area penalty of large LUTs?

  6. Small LUTs capture structure What structure does a small-LUT-mapped netlist have? Structure in subgraphs

  7. Structure • LUT sequences ubiquitous

  8. Hardwired Logic Blocks Single Output

  9. Hardwired Logic Blocks Two outputs

  10. Delay Model • Tcascade = T(3LUT) + T(mux)

  11. Options

  12. Chung & Rose Study [Chung & Rose, DAC ’92]

  13. Cascade LUT Mappings [Chung & Rose, DAC ’92]

  14. ALU vs. Cascaded LUT?

  15. 4-LUT Cascade ALU

  16. ALU vs. LUT ? • Compare/contrast • ALU • Only subset of ops available • Denser coding for those ops • Smaller • …but interconnect dominates

  17. Parallel Prefix LUT Cascade? • Can compute LUT cascade in O(log(N)) time? • Can compute mux cascade using parallel prefix? • Can make mux cascade associative?

  18. Parallel Prefix Mux cascade • How can a mux transform S → mux-out? • A=0, B=0 → mux-out=0 • A=1, B=1 → mux-out=1 • A=0, B=1 → mux-out=S • A=1, B=0 → mux-out=/S

  19. Parallel Prefix Mux cascade • How can a mux transform S → mux-out? • A=0, B=0 → mux-out=0 (Stop = S) • A=1, B=1 → mux-out=1 (Generate = G) • A=0, B=1 → mux-out=S (Buffer = B) • A=1, B=0 → mux-out=/S (Invert = I)
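The four cases above can be checked mechanically. A minimal Python sketch (the function names `mux` and `classify` are illustrative, not from the slides) that classifies a mux with fixed data inputs into the Stop/Generate/Buffer/Invert classes:

```python
# A 2:1 mux with select S: out = A when S=0, out = B when S=1.
def mux(a, b, s):
    return b if s else a

def classify(a, b):
    """Classify the transform S -> mux-out for fixed data inputs (a, b)."""
    out0, out1 = mux(a, b, 0), mux(a, b, 1)
    if (out0, out1) == (0, 0): return 'S'   # Stop:     out = 0
    if (out0, out1) == (1, 1): return 'G'   # Generate: out = 1
    if (out0, out1) == (0, 1): return 'B'   # Buffer:   out = S
    return 'I'                              # Invert:   out = /S

# The four cases from the slide:
assert classify(0, 0) == 'S'
assert classify(1, 1) == 'G'
assert classify(0, 1) == 'B'
assert classify(1, 0) == 'I'
```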

  20. Parallel Prefix Mux cascade • How can 2 muxes transform the input? • Can I compute the two-mux transform from the single-mux transforms?

  21. SSS SGG SBS SIG Two-mux transforms • GSS • GGG • GBG • GIS • BSS • BGG • BBB • BII • ISS • IGG • IBI • IIB

  22. Generalizing mux-cascade • How can N muxes transform the input? • Is mux transform composition associative?
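Both questions have affirmative answers, which can be demonstrated with a small Python sketch (names like `stage`, `compose`, and `serial` are illustrative): representing each mux as a one-bit map and composing maps is associative, so a parallel-prefix tree can evaluate the whole cascade; here a fold over the composed transforms is checked against a sequential walk of the cascade.

```python
from functools import reduce
import random

def stage(a, b):
    # With data inputs fixed at (a, b), the mux maps chain input x -> t[x].
    return (a, b)

def compose(t1, t2):
    # Apply t1 first, then t2; composition of one-bit maps is associative.
    return (t2[t1[0]], t2[t1[1]])

def serial(stages, x0):
    # Reference: walk the mux cascade one stage at a time.
    x = x0
    for a, b in stages:
        x = b if x else a
    return x

random.seed(0)
for _ in range(100):
    stages = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(8)]
    # reduce() models the reduction an O(log N)-depth prefix tree performs.
    t = reduce(compose, [stage(a, b) for a, b in stages])
    for x0 in (0, 1):
        assert t[x0] == serial(stages, x0)
```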

  23. Parallel Prefix Mux-cascade

  24. Commercial Devices

  25. Xilinx XC4000 CLB

  26. Xilinx Virtex-II

  27. Altera Stratix

  28. Programmable Logic Arrays (PLAs)

  29. PLA • Directly implement flat (two-level) logic • O = a*b*c*d + !a*b*!d + b*!c*d • Exploit substrate properties that allow wired-OR
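A minimal Python sketch of two-level (AND-OR) evaluation for the example expression above (the `AND_PLANE` encoding and `pla` function are illustrative, not a hardware model):

```python
# O = a*b*c*d + !a*b*!d + b*!c*d
# Each product term is a list of (variable, polarity) literals; the AND
# plane evaluates the terms, and the OR plane merges them (wired-OR).
AND_PLANE = [
    [('a', 1), ('b', 1), ('c', 1), ('d', 1)],   # a*b*c*d
    [('a', 0), ('b', 1), ('d', 0)],             # !a*b*!d
    [('b', 1), ('c', 0), ('d', 1)],             # b*!c*d
]

def pla(inputs):
    """inputs: dict like {'a': 1, 'b': 0, ...}; returns O."""
    products = [all(inputs[v] == p for v, p in term) for term in AND_PLANE]
    return int(any(products))   # wired-OR of the product terms

print(pla({'a': 1, 'b': 1, 'c': 1, 'd': 1}))  # 1 (first term fires)
print(pla({'a': 0, 'b': 0, 'c': 0, 'd': 0}))  # 0 (no term fires)
```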

  30. Wired-or • Connect a series of inputs to a wire • Any of the inputs can drive the wire high

  31. Wired-or • Implementation with Transistors

  32. Programmable Wired-or • Use some memory function to programmably connect (or disconnect) wires to the OR • Fuse:

  33. Programmable Wired-or • Gate-memory model

  34. Diagram Wired-or

  35. Wired-or array • Build into array • Compute many different or functions from set of inputs

  36. Combine or-arrays into PLA • Combine two or (nor) arrays to produce a PLA (and-or array): Programmable Logic Array

  37. PLA • Can implement each and on single line in first array • Can implement each or on single line in second array

  38. PLA • Efficiency questions: • Each and/or is linear in total number of potential inputs (not actual) • How many product terms between arrays?

  39. PLAs • Fast implementations for large ANDs or ORs • Number of p-terms can be exponential in number of input bits • for the most complicated functions • not exponential for many functions • Can use arrays of small PLAs • to exploit structure • like the arrays of small memories we saw last time

  40. PLA Product Terms • Can be exponential in number of inputs • E.g. n-input xor (parity function) • When flattened to two-level logic, requires exponential product terms • 2-input: a*!b + !a*b • 3-input: a*!b*!c + !a*b*!c + !a*!b*c + a*b*c • …and shows up in important functions • Like addition…
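The exponential growth is easy to verify by enumeration; a short Python sketch (the helper name `parity_terms` is illustrative) that collects the minterms of n-input XOR, i.e. all input patterns with an odd number of 1s:

```python
from itertools import product

# Flattening n-input XOR (parity) to two-level logic: every input
# pattern with an odd number of 1s becomes a product term, so the
# sum-of-products form needs 2**(n-1) terms -- exponential in n.
def parity_terms(n):
    return [bits for bits in product((0, 1), repeat=n) if sum(bits) % 2 == 1]

for n in range(1, 6):
    assert len(parity_terms(n)) == 2 ** (n - 1)

print([len(parity_terms(n)) for n in (2, 3, 4)])  # [2, 4, 8]
```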

  41. PLAs vs. LUTs? • Look at Inputs, Outputs, P-Terms • minimum area (one study, see paper) • K=10, N=12, M=3 • A(PLA 10,12,3) comparable to 4-LUT? • 80-130%? • 300% on ECC (structure LUT can exploit) • Delay? • Claim 40% fewer logic levels (4-LUT) • (general interconnect crossings) [Kouloheris & El Gamal/CICC’92]

  42. PLA

  43. PLA and Memory

  44. PLA and PAL PAL = Programmable Array Logic

  45. Conventional/Commercial FPGA Altera 9K (from databook)
