
New Opportunities with Platform Based Design

New Opportunities with Platform Based Design. Frank Vahid Associate Professor Dept. of Computer Science and Engineering University of California, Riverside Also with the Center for Embedded Computer Systems at UC Irvine http://www.cs.ucr.edu/~vahid


Presentation Transcript


  1. New Opportunities with Platform Based Design Frank Vahid Associate Professor Dept. of Computer Science and Engineering University of California, Riverside Also with the Center for Embedded Computer Systems at UC Irvine http://www.cs.ucr.edu/~vahid This research has been supported by the National Science Foundation, NEC, Trimedia, and Triscend Frank Vahid, UC Riverside

  2. How Much is Enough?

  3. How Much is Enough? Perhaps a bit small

  4. How Much is Enough? Reasonably sized

  5. How Much is Enough? Probably plenty big

  6. How Much is Enough? More than typically necessary

  7. How Much is Enough? Very few people could use this

  8. How Much Custom Logic is Enough? 1993: ~1 million logic transistors. Perhaps a bit small

  9. How Much Custom Logic is Enough? 1996: ~5-8 million logic transistors. Reasonably sized

  10. How Much Custom Logic is Enough? 1999: ~10-50 million logic transistors. Probably plenty big

  11. How Much Custom Logic is Enough? 2002: ~100-200 million logic transistors. More than typically necessary

  12. How Much Custom Logic is Enough? • Point of diminishing returns • 32-bit ARM: ~30K transistors • MPEG decoder: ~1M transistors • Other examples • Fast cars (> 100 mph) • High-resolution digital cameras (> 4M pixels) • Disk space • Even IC performance • 1993: 1M logic transistors; 2008: > 1 billion logic transistors • Perhaps very few people could design this

  13. Very Few Companies Can Design High-End ICs • Designer productivity is growing at a slower rate than IC capacity, creating a design productivity gap • 1981: 100 designer-months → ~$1M • 2002: 30,000 designer-months → ~$300M [Figure: logic transistors per chip (millions) and designer productivity (K transistors per staff-month), 1981 to 2009, both on log scales; the gap between IC capacity and productivity widens] Source: ITRS’99
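The two cost figures on this slide imply a roughly constant cost per designer-month. What grew was the amount of design effort. The quick Python check below assumes that cost scales linearly with designer-months. The per-month rate it prints is derived from the slide's own numbers and is not an independent figure.

```python
# Figures from the slide (ITRS'99): (designer-months, approximate cost in $).
designs = {
    1981: (100, 1_000_000),
    2002: (30_000, 300_000_000),
}

def cost_per_designer_month(months: int, cost: int) -> float:
    """Implied cost of one designer-month, assuming cost is linear in effort."""
    return cost / months

for year, (months, cost) in designs.items():
    rate = cost_per_designer_month(months, cost)
    print(f"{year}: {months:>6} designer-months, ~${rate:,.0f} per designer-month")

# Growth in design effort between the two data points.
effort_growth = designs[2002][0] / designs[1981][0]
print(f"Design effort grew ~{effort_growth:.0f}x")
```

Both years work out to about $10K per designer-month. The ~300x rise in cost therefore comes almost entirely from the ~300x rise in effort.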

  14. Meanwhile, ICs Themselves are Costlier • And take longer to fabricate • While market windows are shrinking • Fewer than 1,000 of 10,000 ASIC designs have volumes that justify fabrication in 0.13-micron technology • Source: DAC’01 panel on embedded programmable logic

  15. Summarizing So Far... • Transistors are less scarce • ICs are big enough, fast enough • ICs take more time and money to design and fabricate • While market windows are shrinking • So designers buy pre-fabricated system-level ICs: platforms

  16. Trend Towards Pre-Fabricated Platforms: ASSPs • ASSP: application-specific standard product • Domain-specific pre-fabricated IC • e.g., digital camera IC • ASIC: application-specific IC • ASSP revenue > ASIC • ASSP design starts > ASIC • Design start = a unique IC design • Ignores the quantity produced of that IC • ASIC design starts decreasing • Due to strong benefits of using pre-fabricated devices • Source: Gartner/Dataquest, September ’01

  17. Will High-End ICs Still Be Made? • Yes • But they are moving out of reach of mainstream designers; the point is that mainstream designers likely won’t be making them • Very high-volume or very high-cost products • Platforms are one such product (high volume) • Need to be highly configurable to adapt to different applications and constraints

  18. Configurable Platform Design: Cache • ARM920T: caches consume half of total power (Segars 01) • M*CORE: unified cache consumes half of total power (Lee/Moyer/Arends 99) [Figure: pre-fabricated platform IC (a pre-designed system-level architecture) containing a uP, a DSP, two L1 caches, an FPGA, a JPEG decoder, and peripherals]

  19. Best Cache Architecture for Embedded Systems • Not clear • Huge variety among popular embedded processors • What’s the best associativity, line size, total size?

  20. Cache Associativity • Direct mapped cache • Certain bits “index” into cache • Remaining “tag” bits compared • Set associative cache • Multiple “ways” • Fewer index bits, more tag bits, simultaneous comparisons • More expensive, but better hit rate [Figure: 6-bit addresses A = 00 0 000, B = 01 0 000, C = 10 0 000, D = 11 0 000. In a direct mapped (1-way set associative) cache, 2 tag bits are followed by 4 index bits (0000), so all four addresses conflict for one line. In a 2-way set associative cache, 3 tag bits and 3 index bits (000) let two of them, e.g., C (tag 100) and D (tag 110), reside in the same set.]
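The Python sketch below makes the index/tag split concrete by replaying the slide's four example addresses. It rests on three assumptions: each address is treated as one cache block (no line-offset bits), replacement is LRU, and the class name `SetAssociativeCache` is hypothetical.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Block-granular cache model with LRU replacement (illustrative only)."""

    def __init__(self, num_sets: int, ways: int):
        assert num_sets > 0 and num_sets & (num_sets - 1) == 0, "power of two"
        self.num_sets = num_sets
        self.ways = ways
        self.index_bits = num_sets.bit_length() - 1
        # One OrderedDict per set: keys are tags, order tracks recency (LRU first).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def split(self, addr: int) -> tuple[int, int]:
        """Return (tag, index): the low index_bits select the set, the rest is the tag."""
        return addr >> self.index_bits, addr & (self.num_sets - 1)

    def access(self, addr: int) -> bool:
        tag, index = self.split(addr)
        lines = self.sets[index]
        if tag in lines:                  # compare against every way in the set
            lines.move_to_end(tag)        # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)     # evict the least recently used tag
        lines[tag] = None
        return False

# The slide's example addresses (6 bits each).
A, B, C, D = 0b000000, 0b010000, 0b100000, 0b110000
trace = [C, D] * 4

direct = SetAssociativeCache(num_sets=16, ways=1)  # 2 tag bits, 4 index bits
two_way = SetAssociativeCache(num_sets=8, ways=2)  # 3 tag bits, 3 index bits
for addr in trace:
    direct.access(addr)
    two_way.access(addr)

print("D split (tag, index):", direct.split(D), two_way.split(D))
print("direct-mapped misses:", direct.misses, "| 2-way misses:", two_way.misses)
```

Alternating between C and D misses on every access in the direct-mapped cache, because both map to index 0000. The 2-way cache misses only on the first touch of each address.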

  21. Cache Associativity • Reduces miss rate, thus improving performance • Impact on power and energy? • (Energy = Power × Time)

  22. Associativity is Costly • Associativity improves hit rate, but at the cost of more power per access • Are the power savings from reduced misses outweighed by the increased power per hit? [Figures: energy-per-access breakdown for an 8 Kbyte, 4-way set associative cache (dynamic power only); energy per access for an 8 Kbyte cache]
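The question on this slide reduces to a simple dynamic-energy model:

total energy = accesses × energy per access + misses × energy per miss

The Python sketch below plugs purely hypothetical numbers into that model. They are not the measured data from these experiments. The point is that the lower-energy organization can change from one program to another.

```python
# All numbers below are hypothetical, chosen only to illustrate the trade-off.
ACCESSES = 1_000_000          # cache accesses per program run
ENERGY_PER_MISS_NJ = 20.0     # energy to service one miss (next-level memory + stall)
ENERGY_PER_ACCESS_NJ = {      # dynamic energy per access: more ways => more reads
    "direct-mapped": 0.2,
    "4-way": 0.5,
}
MISSES = {                    # misses per program for each organization
    "typical": {"direct-mapped": 12_000, "4-way": 8_000},
    "conflict-heavy": {"direct-mapped": 40_000, "4-way": 9_000},
}

def total_energy_nj(cache: str, misses: int) -> float:
    """Every access pays the per-access energy; each miss adds the miss energy."""
    return ACCESSES * ENERGY_PER_ACCESS_NJ[cache] + misses * ENERGY_PER_MISS_NJ

def best_cache(program: str) -> str:
    """Organization with the lowest modeled total energy for one program."""
    return min(ENERGY_PER_ACCESS_NJ,
               key=lambda c: total_energy_nj(c, MISSES[program][c]))

for program, misses in MISSES.items():
    for cache, m in misses.items():
        print(f"{program:15} {cache:14} {total_energy_nj(cache, m) / 1e6:.2f} mJ")
    print(f"{program:15} lowest energy: {best_cache(program)}")
```

With these assumed values, the cheaper per-access cost of the direct-mapped cache wins on the "typical" program. The 4-way cache's fewer misses win on the "conflict-heavy" one.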

  23. Associativity and Energy • The best-performing cache is not always the lowest-energy cache [Figure annotation: significantly poorer energy]

  24. Associativity Dilemma • Direct mapped cache • Good hit rate on most examples • Low power per access • But poor hit rate on some examples • High power due to many misses • Four-way set-associative cache • Good hit rate on nearly all examples • But high power per access • Overkill for most examples, thus wasting energy • Dilemma: Design for the average or worst case?

  25. Associativity Dilemma • Obviously not a clear choice

  26. Our Solution: Configurable Cache • Can be configured as 4, 2, or 1 way • Ways can be concatenated • Size can also be configured • By shutting down ways • Saves static power (leakage) [Figure: example addresses C and D with ways concatenated; one address bit selects the way]

  27. Configurable Cache Design: Way Concatenation (4, 2 or 1 way) • Small area and performance overhead [Figure: address split into tag (a31 to a11), index (a10 to a5), and line offset (a4 to a0). A configuration circuit takes a11, a12 and configuration registers reg0 and reg1 and produces signals c0 to c3 for four banks of 6x64 tag and data arrays. Other labels: bitline, column mux, sense amps, mux driver, data output, critical path.]
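A behavioral sketch of the bank selection may help. It uses the address fields labeled on the slide: tag a31 to a11, index a10 to a5, line offset a4 to a0. It also assumes four banks whose enables correspond to c0 to c3.

The mode-to-bank mapping is a hypothetical one. In it, a11 picks a pair of banks in 2-way mode and a12:a11 pick a single bank in 1-way mode. The real circuit encodes the mode in reg0/reg1, and its exact gate equations are not given here.

```python
def bank_enables(ways: int, addr: int) -> list[bool]:
    """Bank-enable signals (c0..c3) for one access in the given mode.

    Hypothetical mapping: 4-way reads all banks; 2-way uses a11 to select a
    pair of banks; 1-way uses a12:a11 to select a single bank.
    """
    a11 = (addr >> 11) & 1
    a12 = (addr >> 12) & 1
    if ways == 4:
        return [True, True, True, True]
    if ways == 2:
        return [a11 == 0, a11 == 0, a11 == 1, a11 == 1]
    if ways == 1:
        bank = (a12 << 1) | a11
        return [bank == i for i in range(4)]
    raise ValueError("ways must be 4, 2, or 1")

def split_address(addr: int) -> tuple[int, int, int]:
    """(tag, index, offset) using the slide's fields: a31-a11, a10-a5, a4-a0."""
    return addr >> 11, (addr >> 5) & 0x3F, addr & 0x1F

addr = 0x1A2C  # arbitrary example address
for ways in (4, 2, 1):
    enabled = bank_enables(ways, addr)
    print(f"{ways}-way: {sum(enabled)} bank(s) read {enabled}")
print("tag, index, offset:", split_address(addr))
```

In the concatenated modes, bits a12 and a11 act as extra index bits. Fewer banks are then read per access, which is the source of the dynamic-energy saving.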

  28. Configurable Cache Experiments • Motorola PowerStone benchmark g3fax • Way concatenation outperforms both the 4-way and the direct mapped cache

  29. Configurable Cache Experiments • Chart baseline: 100% = 4-way conventional cache • Configurable cache with both way concatenation and way shutdown was best on average • Considered programs from Powerstone, MediaBench, and Spec2000 • And it was superior on every benchmark

  30. Configurable Cache Experiments – Line Size Too • Chart baseline: 100% = 4-way conventional cache; csb = concatenate-plus-shutdown cache • Best line size also differs per example • Our cache can be configured for line sizes of 16, 32, or 64 bytes • 64 is usually best, but 16 is much better in a couple of cases • A configurable cache with way concatenation, way shutdown, and variable line size can save a lot of energy
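Picking among these configurations for a given program is a small design-space exploration. The Python sketch below evaluates every valid combination exhaustively on an address trace. It varies enabled size (way shutdown), associativity (way concatenation), and line size.

The sketch makes several assumptions:
- four 2 KB banks forming an 8 KB cache;
- LRU replacement;
- a synthetic trace;
- a made-up energy model.

The function names are hypothetical, and this is not the tool behind the results above.

```python
from collections import OrderedDict
from itertools import product

BANK_BYTES = 2048  # assumption: four 2 KB banks forming an 8 KB cache

def count_misses(trace, size_bytes, ways, line_bytes):
    """LRU set-associative cache simulation; returns the number of misses."""
    num_sets = size_bytes // (ways * line_bytes)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)            # hit: mark most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)     # evict least recently used
            lines[tag] = None
    return misses

def energy(trace, size_bytes, ways, line_bytes):
    """Made-up energy model in arbitrary units (not the one behind the results above)."""
    misses = count_misses(trace, size_bytes, ways, line_bytes)
    per_access = 1.0 * ways                   # dynamic: one read per way compared
    per_miss = 50.0 + 0.5 * line_bytes        # next-level access plus line transfer
    static = 0.001 * size_bytes * len(trace)  # leakage grows with enabled capacity
    return len(trace) * per_access + misses * per_miss + static

def valid_configs():
    """Enabled size (way shutdown) x associativity (concatenation) x line size."""
    for size, ways, line in product((2048, 4096, 8192), (1, 2, 4), (16, 32, 64)):
        if ways * BANK_BYTES <= size:         # no more ways than enabled banks
            yield size, ways, line

def tune(trace):
    """Exhaustively pick the configuration with the lowest modeled energy."""
    return min(valid_configs(), key=lambda cfg: energy(trace, *cfg))

# Synthetic trace: ten sequential passes over a 3 KB array of 4-byte words.
trace = [addr for _ in range(10) for addr in range(0, 3072, 4)]
size, ways, line = tune(trace)
print(f"best: {size // 1024} KB, {ways}-way, {line}-byte lines")
```

Exhaustive search is cheap here because there are only 18 valid configurations. A much larger space would call for a heuristic search instead.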

  31. Configurable Platform Use • Platforms increasingly come with on-chip FPGA • Can we use that FPGA to improve software performance and energy? [Figure: pre-fabricated platform IC with uP, L1 cache, DSP, FPGA, JPEG decoder, and peripherals]

  32. Commercial Single-Chip Microprocessor/FPGA Platforms • Triscend E5: based on an 8-bit 8051 CISC core • 10 Dhrystone MIPS at 40 MHz • 60 Kbytes on-chip RAM • Up to 40K logic gates • Costs only about $4 (in volume) [Figure: Triscend E5 chip with configurable logic, 8051 processor plus other peripherals, and memory]

  33. Single-Chip Microprocessor/FPGA Platforms • Atmel FPSLIC • Field-Programmable System-Level IC • Based on AVR 8-bit RISC core • 20 Dhrystone MIPS • 5k-40k configurable logic gates • On-chip RAM (20-36Kb) and EEPROM • $5-$10 Courtesy of Atmel

  34. Single-Chip Microprocessor/FPGA Platforms • Triscend A7 chip • Based on ARM7 32-bit RISC processor • 54 Dhrystone MIPS at 60 MHz • Up to 40k logic gates • On-chip cache and RAM • $10-$20 in volume Courtesy of Triscend

  35. Single-Chip Microprocessor/FPGA Platforms • Altera’s Excalibur EPXA 10 • ARM (922T) hard core • ~200 Dhrystone MIPS at ~200 MHz • Devices range from ~200k to ~2 million programmable logic gates Source: www.altera.com

  36. Single-Chip Microprocessor/FPGA Platforms • Xilinx Virtex-II Pro • PowerPC based • 420 Dhrystone MIPS at 300 MHz • 1 to 4 PowerPCs • 4 to 16 gigabit transceivers • 12 to 216 multipliers • 3,000 to 50,000 logic cells • 200K to 4M bits RAM • 204 to 852 I/O • $100-$500 (>25,000 units) • Up to 16 serial transceivers • 622 Mbps to 3.125 Gbps [Figure: die with PowerPCs and configurable logic, courtesy of Xilinx]

  37. Single-Chip Microprocessor/FPGA Platforms • Why wouldn’t future microprocessor chips include some amount of on-chip FPGA?

  38. Single-Chip Microprocessor/FPGA Platforms • Lots of silicon area taken up by configurable logic • As discussed earlier, less of an issue every year • Smaller area no longer necessarily means higher yield (lower costs) • Previously, a smaller die meant more dies per wafer • But dies are becoming pad (pin) limited in nanoscale technologies • Configurable logic is typically used for peripherals, glue logic, etc. • We have investigated another use...

  39. Software Improvements using On-Chip Configurable Logic • Partitioned critical software loops onto the on-chip FPGA for several benchmarks • Most time is spent in one or two loops • Extensive simulation results for 8051 and MIPS • For Powerstone (PS), MediaBench (MB), and NetBench (NB)
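Amdahl's law explains why moving just one or two loops can be enough. It is a standard model and is not stated on the slides. Suppose the moved loops account for a fraction f of run time and run s times faster in the FPGA. The overall speedup is then:

speedup = 1 / ((1 - f) + f / s)

The Python sketch below uses hypothetical numbers and ignores processor/FPGA communication overhead.

```python
def overall_speedup(loop_fraction: float, loop_speedup: float) -> float:
    """Amdahl's law: only the fraction of run time spent in the moved loops speeds up."""
    if not 0.0 <= loop_fraction <= 1.0:
        raise ValueError("loop_fraction must be in [0, 1]")
    return 1.0 / ((1.0 - loop_fraction) + loop_fraction / loop_speedup)

# Hypothetical profile: critical loops take 80% of run time, FPGA runs them 10x faster.
print(f"{overall_speedup(0.8, 10):.2f}")
# Upper bound if the loops took zero time: limited by the remaining 20% of software.
print(f"{overall_speedup(0.8, float('inf')):.2f}")
```

The bound 1 / (1 - f) shows that the profile matters more than raw hardware speed. When one or two loops dominate run time, accelerating just those loops captures most of the available gain.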

  40. Software Improvements using On-Chip Configurable Logic • Average speedup of 3.2 and energy savings of 34%, obtained with only 10,500 gates on average

  41. Speedup Gained with Relatively Few Gates • Created several partitioned versions of each benchmark • Most speedup gained with the first 20,000 gates • Surprisingly few gates • Stitt, Grattan and Vahid, Field-Programmable Custom Computing Machines (FCCM) 2002 • Stitt and Vahid, IEEE Design and Test, Dec. 2002 • J. Villarreal, D. Suresh, G. Stitt, F. Vahid and W. Najjar, Design Automation for Embedded Systems, 2002 (to appear)
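A greedy partitioner working under a gate budget can illustrate the diminishing returns described here. The Python sketch below is not the partitioning method used in the cited papers. The loop names, time fractions, gate counts, and per-loop hardware speedups are all hypothetical. Communication overhead between processor and FPGA is ignored.

```python
# Hypothetical per-loop profile: (name, fraction of run time, gates needed, hw speedup)
LOOPS = [
    ("loop_a", 0.55, 8_000, 10.0),
    ("loop_b", 0.20, 6_000, 8.0),
    ("loop_c", 0.05, 9_000, 6.0),
    ("loop_d", 0.03, 12_000, 5.0),
]

def speedup(selected):
    """Generalized Amdahl's law for several independently accelerated loops."""
    remaining = 1.0 - sum(f for _, f, _, _ in selected)
    accelerated = sum(f / s for _, f, _, s in selected)
    return 1.0 / (remaining + accelerated)

def partition(budget_gates):
    """Greedy: take loops in order of run time saved per gate while they fit."""
    def saved_per_gate(loop):
        _, f, gates, s = loop
        return (f - f / s) / gates
    chosen, used = [], 0
    for loop in sorted(LOOPS, key=saved_per_gate, reverse=True):
        if used + loop[2] <= budget_gates:
            chosen.append(loop)
            used += loop[2]
    return chosen

for budget in (0, 10_000, 20_000, 40_000):
    chosen = partition(budget)
    print(f"{budget:>6} gates: {[name for name, *_ in chosen]} "
          f"speedup {speedup(chosen):.2f}")
```

With this made-up profile, the first 20,000 gates capture the two dominant loops. Doubling the budget adds only a modest further gain.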

  42. Software Improvements using On-Chip Configurable Logic – Verified through Physical Measurement • Performed physical measurements on Triscend A7 and E5 devices • Similar results (even a bit better) [Figure: Triscend A7 development board with the A7 IC]

  43. Other Types of Configurability • Microprocessor (other researchers) • VLIW configurations • Voltage scaling • Peripherals • e.g., JPEG decoder with different precisions • Bus topology • Etc. [Figure: platform IC with uP, L1 cache, DSP, FPGA, JPEG decoder, and peripherals]

  44. Conclusions • Trend is away from semi-custom IC fabrication • Pressures encourage buying pre-fabricated platforms • Platforms must be highly configurable • To be useful for a variety of applications, and hence mass-produced • We have discussed • Software speedup/energy benefits of on-chip configurable logic: 3x speedups and 34% energy savings with only ~10,000 gates • Creating a highly configurable cache architecture: 40% energy savings compared to a conventional cache • Designing highly configurable platforms, and facilitating their use with good exploration tools, can help enable platform-based design • See http://www.cs.ucr.edu/~vahid for more information
