This presentation outlines a reconfigurable signal processing IC with an embedded FPGA and multi-port flash memory, aimed at addressing industry trends and enhancing system integration. The system architecture includes an extensible MPU, memory subsystem, and programmable I/O, offering a simple model for programmers. The reconfigurable core combines a processor with e-FPGA, optimized for system performance and energy efficiency, demonstrated in applications like an embedded face recognition system. The project emphasizes faster project turnaround, lower risks, and the usage of reconfigurable silicon fabrics. The architecture involves a customizable flash memory subsystem with modular 2Mb modules and N independent ports for efficient data handling. The platform approach enables system applications compilation and configuration, with focus on performance, energy efficiency, and overall system design flow, showcasing the benefits of adaptable hardware co-processors.
A Reconfigurable Signal Processing IC with embedded FPGA and Multi-Port Flash Memory
M. Borgatti, L. Calì, G. De Sandre, B. Forêt, D. Iezzi, F. Lertora, G. Muzzi, M. Pasotti, M. Poles, P.L. Rolandi
STMicroelectronics - Central R&D - Italy
Outline of Presentation
• Project motivation and background
• System architecture
• Reconfigurable core
• Memory subsystem
• System performance
• Application example: embedded face recognition system
• Energy efficiency, measurements
• SoC integration and design flow
• System 2 RTL and RTL 2 Layout
• Summary
Project motivation and background
• Conflicting industry trends
• Economics of system integration
• Even more complex SoC
• More integration
• Cost effectiveness and performance (per unit)
• Increasing design complexity and risks
• Increasing NREs
• Shorter time-to-market and product life
• Strong need for:
  • Faster project turnaround
  • Lower risk
  • Usage of re-configurable silicon fabrics
Project motivation and background
• Pragmatic approach proposed:
  • Reconfigurable architecture
  • Joins a statically extensible processor with e-FPGA
  • Tight connection to Flash memory subsystem
  • Open architecture with flexible programmable I/O
  • Programmable platform approach
  • Simple model for programmers
Programmable Platform Approach
[Figure: programmable platform diagram. Labels: System Applications Family; System Application; Platform Compilation; Application Compilation; Config.; Proc + e-FPGA; Silicon process + Enabling technologies; Programmable platform]
System Architecture
[Figure: system block diagram. Labels: Extensible MPU; 8KB I$; 8KB D$; 48 kB SRAM; bus bridge; 64 bit AHB BUS; M/S AHB I/F; DMA & FPGA Prog. I/F; e-FPGA; Instr. Ext.; Inst. Ext I/F; INTs; Flash Mem with FP, CP and DP ports; Buffer I/F; 1kB Buffer; AHB/APB Bridge; 64 bit APB BUS; GP I/O; I/O registers; General Purpose I/O Lines; I2C Master; I2C BUS]
e-FPGA Purposes
• Processor ISA extensions
  • Simplest programmer’s model
  • Specific interface to the MPU datapath
  • Impact on processor performance
  • Impact on processor energy efficiency
  • Efficiency limited by instruction stream decoding
• Bus-mapped co-processor
  • Maximum benefits in speed/power
• Flexible I/O
e-FPGA – Microprocessor interface
[Figure: e-FPGA to microprocessor interface. Labels: e-FPGA clock; Microprocessor clock; Clock Ctrl; Instruction; Decode; Pipe Control; Register File; Instruction extension; Result; Other FPGA purposes]
Flash Memory Architecture
[Figure: flash memory block diagram. Labels: four 2Mb modules (#0 to #3), each with a 128-bit link; DFT; PMA; Power Block; 128-bit Memory Sub-System Crossbar; µP I/F (8-bit µP); Data Port (DP, 64 bit); Code Port (CP, 64 bit); FPGA Port (FP, 32 bit)]
Flash Memory Subsystem
• Modular approach
  • Customizable array of N independent 2Mb modules
  • 3 content-specific ports (CP, DP, FP)
• HW support for filesystem implementation (DP)
  • Defrag
  • Compression
  • Virtual erase
• 2Mb module features:
  • 128-bit I/O
  • 40 ns access time (400 MB/s peak throughput)
  • Power management and arbitration
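The 400 MB/s peak figure follows from the module's word width and access time: one 128-bit word every 40 ns. A minimal sketch of that arithmetic, with an illustrative function name that is not part of the presentation:

```python
def peak_throughput_mb_s(width_bits: int, access_time_ns: int) -> float:
    """Peak throughput in MB/s (1 MB = 10**6 bytes) for one word per access."""
    bytes_per_access = width_bits // 8
    # bytes per nanosecond equals GB/s; multiply by 1000 to get MB/s
    return bytes_per_access * 1000 / access_time_ns


# One 2Mb module: 128-bit I/O, 40 ns access time (values from the slide)
module_peak = peak_throughput_mb_s(128, 40)
print(module_peak)
```

This is a per-module peak; it says nothing about sustained rates, which depend on arbitration between the ports.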
System Memory Hierarchy
• AHB peak throughput: 800 MB/s
• e-FPGA: 400 MB/s (50 MB/s sustained)
• Total aggregate peak: 1.2 GB/s
[Figure: memory hierarchy diagram. Labels: 32-bit uP Register File; AHB Bridge; 64-bit AHB Bus; 32-bit FPGA P I/F; 64-bit CP I/F; 64-bit DP I/F; DMA; 64-bit Port CP; 32-bit Port FP; 64-bit Port DP; 512-B Buffer; 2 x 64- + 1 x 32-bit Memory Port I/Fs; 6x4 128-bit Crossbar; 4 x Flash Memory Controller Logic; 4 x 16384 x 128-bit Memory Module]
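The peak figures above are consistent with simple width-times-clock arithmetic. The sketch below assumes a 100 MHz clock (the frequency quoted on the processing-time slide) and one transfer per cycle on each interface; both are assumptions, and the reading of the 1.2 GB/s total as the sum of the AHB and e-FPGA paths is an inference rather than something the slides state.

```python
def bus_peak_mb_s(width_bits: int, clock_mhz: int) -> int:
    """Peak MB/s for a bus moving one word per clock cycle (assumed)."""
    return (width_bits // 8) * clock_mhz


CLOCK_MHZ = 100  # assumption: taken from the "Processing Time @ 100 MHz" slide

ahb_peak = bus_peak_mb_s(64, CLOCK_MHZ)    # 64-bit AHB bus
fpga_peak = bus_peak_mb_s(32, CLOCK_MHZ)   # 32-bit e-FPGA port
aggregate_peak = ahb_peak + fpga_peak      # inferred composition of the total

print(ahb_peak, fpga_peak, aggregate_peak)
```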
Application Example: Face Recognition
• Target application:
  • Recognize a face out of twenty
  • Low-resolution images from CMOS cameras
• Potential applications:
  • Low-cost smart toys
  • Advanced human-machine interfaces
  • Color CMOS camera processors
• Image preprocessing: Bayer filter
• Face location: based on Hough transform
• Face recognition: line-based
  • Recognition rates over 90%
  • Scale-invariant
  • Tolerant to changes in illumination intensity
Processor Extension (I)
• 8-issue, 8-bit L2 distance
• Complexity:
  • 23 8-bit OPS
  • 6 64-bit OPS
• 1 GOPS peak throughput
• Distance computation
• 10k equiv. ASIC gates
• Mapped to e-FPGA
[Figure: distance extension datapath. Labels: ‘8’, ’16’; Processor Load Unit; 4-segm. (x2); subtract, multiply and add operators; 64-bit register; Result]
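As a software reference for what an 8-lane, 8-bit distance step computes, the sketch below unpacks eight unsigned bytes from each 64-bit operand and accumulates the squared differences. It is a behavioral model under assumptions: the byte order, the function names, and the choice to accumulate the squared distance (leaving the root to a separate step) are illustrative and are not taken from the presentation.

```python
def unpack_u8x8(word: int) -> list[int]:
    """Split a 64-bit word into eight unsigned 8-bit lanes (lane 0 = low byte)."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(8)]


def l2_squared_step(a_word: int, b_word: int, acc: int = 0) -> int:
    """Add the squared differences of eight byte pairs to an accumulator."""
    for a, b in zip(unpack_u8x8(a_word), unpack_u8x8(b_word)):
        diff = a - b          # subtract stage
        acc += diff * diff    # multiply and accumulate stages
    return acc


# Bytes 1..8 against all zeros: 1 + 4 + 9 + ... + 64
print(l2_squared_step(0x0807060504030201, 0))
```

Calling the function repeatedly with the previous result as `acc` models streaming a longer vector through the extension eight bytes at a time.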
Processor Extension (II)
• Fixed-point square root kernel
• Complexity:
  • 12 32-bit OPS
• 2k equiv. ASIC gates
• Mapped to e-FPGA
[Figure: square root datapath. Labels: Root; Remainder; Number; Result; operators +1, >>1, <<2, >>30, >>2, +, −, >, +2, <<1]
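The shifts by 2 and by 30 in the diagram are consistent with a classic shift-and-subtract square root that consumes two bits of a 32-bit operand per iteration. The sketch below shows that textbook scheme for an integer input; it is an assumption about the kind of kernel involved, not a reconstruction of the authors' datapath, and the function name is illustrative.

```python
import math


def isqrt32(number: int) -> int:
    """Floor square root of a 32-bit unsigned integer, two input bits per step."""
    number &= 0xFFFFFFFF
    root = 0
    remainder = 0
    for _ in range(16):
        # Bring the next two most significant bits into the remainder
        remainder = (remainder << 2) | ((number >> 30) & 0x3)
        number = (number << 2) & 0xFFFFFFFF
        root <<= 1
        trial = (root << 1) + 1   # candidate subtrahend: 2 * root + 1
        if remainder >= trial:
            remainder -= trial
            root += 1
    return root


print(isqrt32(25), isqrt32(1_000_000))
```

For a fixed-point operand with an even number of fractional bits, the same routine applies to the raw integer, and the result carries half as many fractional bits.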
Performance: Processing Time @ 100 MHz
Energy Efficiency vs. Flexibility
[Figure: energy efficiency (MOPS/mW, log scale from 0.1 to 1000) plotted against flexibility (coverage). Regions: Dedicated HW; FPGA-mapped co-processors; uP + FPGA instructions; ASIPs, DSPs; Embedded processors. Annotation: Energy-Flexibility Gap. From: Zhang et al., ISSCC 2000]
Performance: Energy Efficiency
SoC Integration
[Figure: design flow diagram. Labels: Functional model (untimed); Partitioning / I/F Synthesis / Refinement; uP ISS; Cycle Accurate Simulation; Performance Analysis; Libraries; HW/SW; VHDL (e-FPGA) Inst. Ext.; Verilog HW (RTL): uP, AHB/APB Bus, Peripherals; C; Soft Hardware (eFPGA); SW Apps; eFPGA mapping; eFPGA HARD MACRO]
[Figure: design flow diagram. Labels: CPU core, IPs; Interface RTL code; Flash; RAM; eFPGA core; Inst. Ext.; Coproc.; I/O I/F; Synthesis; Floorplanning / P&R; Static Timing Analysis, Dynamic Verification; Con. Mapping (P&R); Netlist + Timing Database; FPGA Timing DB; Bit-stream; Static Timing Analysis (SoC + eFPGA); Silicon fab]
Chip Layout
[Figure: chip layout. Labels: DFT; 1MB FLASH Memory; Flash Ports; Buffers; Embedded FPGA; 32b uP + AHB & APB + 250k GATES; 8+8 KB I$ + D$; TAGS; 48 KB SRAM BUFFER]
Summary
• e-FPGAs allow architectural tradeoffs for reconfigurable embedded systems:
  • Processor ISA extensions
  • Bus-mapped co-processor
  • Flexible I/O
• Modular, content-specific, multiport e-Flash
• Performance figures:
  • Up to 10x speedup
  • Up to 9x energy reduction
  • Dynamic reconfiguration in 500 µs
• Specific design flow for system and RTL
Acknowledgements: The authors thank all the colleagues of the NVM-DP Dept., A. Maurelli, F. Piazza and L. Fumagalli.