
Soft Core Viterbi Decoder


Presentation Transcript


  1. Soft Core Viterbi Decoder
     EECS 290A Project
     Dave Chinnery, Rhett Davis, Chris Taylor, Ning Zhang

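  2. High Level Architecture
     (Block diagram; the block labels did not survive extraction.) Per-block % gates / % area / % power for the seven blocks, in the order extracted:
     • 4% / 1% / 4%
     • 23% / 36% / 29%
     • 38% / 8% / 21%
     • 2% / 1% / 4%
     • 0% / 48% / 18%
     • 18% / 4% / 15%
     • 9% / 2% / 8%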

  3. Branch & Path Metric Generation
     (Diagram: hard-wired path metric interconnect between the ACS units.)
     • Branch metric computation is apparently implemented with a CORDIC block (840 MUXes, 58 adders & flip-flops, 32 15-bit buses)
     • Branch metrics are hard-wired to each ACS unit
     • Path metrics are stored in the ACS units
     • Each ACS unit handles 16 states
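The slides do not give the code rate, the soft-decision width, or the internals of the CORDIC-based block. The sketch below therefore shows only the conventional soft-decision branch metric as a point of reference. It is not the decoder's actual circuit, and everything in it is an assumption: a rate-1/2 code, 3-bit unsigned soft inputs (0 = confident "0", 7 = confident "1"), and a simple absolute-difference distance. A rate-1/2 trellis step has only four distinct branch metrics, one per 2-bit codeword.

```python
from typing import Dict, Tuple

SOFT_BITS = 3                    # assumed soft-decision width (not stated in the slides)
MAX_SOFT = (1 << SOFT_BITS) - 1  # 7 = most confident "1", 0 = most confident "0"


def branch_metrics(r0: int, r1: int) -> Dict[Tuple[int, int], int]:
    """Metric for every rate-1/2 codeword (c0, c1); smaller means more likely.

    Each received soft value is compared with the ideal soft value of the
    expected code bit (0 for a 0 bit, MAX_SOFT for a 1 bit).
    """
    metrics: Dict[Tuple[int, int], int] = {}
    for c0 in (0, 1):
        for c1 in (0, 1):
            metrics[(c0, c1)] = abs(r0 - c0 * MAX_SOFT) + abs(r1 - c1 * MAX_SOFT)
    return metrics


# A fairly confident "1" followed by a weak "0":
print(branch_metrics(6, 2))  # {(0, 0): 8, (0, 1): 11, (1, 0): 3, (1, 1): 6}
```

Each step produces only these four values, so computing them once and wiring them to every ACS unit avoids recomputing them for each state.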

  4. ACS Architecture
     (Diagram labels: 8x9 SRAM, PMU, PML, BMU, BML, Add, Compare-Select, pipeline register, MUX.)
     • Each ACS unit stores 32 path metrics
     • Only two SRAMs are active at a time
     • Across all four ACS units, each path metric is stored twice
     • SRAM accounts for 88% of the area and 27% of the power of each ACS unit
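The add-compare-select (ACS) update can be sketched in software as follows. This is an illustration, not the unit's actual partitioning or datapath. The slides imply 64 states (four ACS units × 16 states), and that suggests a constraint length K = 7 code. The generator polynomials 171/133 (octal), the state-numbering convention, and the 3-bit soft inputs are all assumptions. The sketch also omits path-metric normalization, which hardware needs to keep the stored metrics bounded.

```python
from typing import List, Sequence, Tuple

MAX_SOFT = 7  # assumed 3-bit soft decisions: 0 = strong "0", 7 = strong "1"


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def acs_step(path_metrics: Sequence[float], received: Sequence[int], K: int = 7,
             gens: Tuple[int, ...] = (0o171, 0o133)) -> Tuple[List[float], List[int]]:
    """One add-compare-select update over all 2**(K-1) trellis states.

    Assumed state convention: the newest input bit is the MSB of the state,
    so state n is entered with input bit n >> (K-2) from the two
    predecessors ((n & mask) << 1) | b, b in {0, 1}.
    Returns the new path metrics and one decision bit (the winning b) per state.
    """
    n_states = 1 << (K - 1)
    mask = (1 << (K - 2)) - 1
    new_pm: List[float] = [0.0] * n_states
    decisions: List[int] = [0] * n_states
    for n in range(n_states):
        u = n >> (K - 2)                      # input bit that leads into state n
        best, best_b = float("inf"), 0
        for b in (0, 1):
            pred = ((n & mask) << 1) | b
            reg = (u << (K - 1)) | pred       # encoder shift-register contents
            expected = [parity(reg & g) for g in gens]
            # Add: predecessor path metric + soft-decision branch metric.
            bm = sum(abs(r - c * MAX_SOFT) for r, c in zip(received, expected))
            cand = path_metrics[pred] + bm
            # Compare-select: keep the smaller metric (ties go to b = 0).
            if cand < best:
                best, best_b = cand, b
        new_pm[n] = best
        decisions[n] = best_b
    return new_pm, decisions


INF = float("inf")
# K = 3 example: start in state 0 and receive a confident (1, 1) codeword.
pm, dec = acs_step([0, INF, INF, INF], received=[7, 7], K=3, gens=(0o7, 0o5))
print(pm, dec)  # [14, inf, 0, inf] [0, 0, 0, 0]
```

The decision bits written each step are the only information the traceback memory needs.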

  5. Traceback Architecture
     (Diagram labels: pipeline register, MUX, SRAM, decision bits, Next_ramin, traceback memory unit, 192 decision bits, decision bits out.)
     • Traceback memory unit: 22% area, 20% power
     • Finite state machine: 11% area, 13% power
     • The state-machine blocks are simply large sum-of-products combinational networks (351 gates each)
     • Each memory unit contains a 16x64 SRAM and logic (192 MUXes, 128 flip-flops)
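Traceback walks the stored decision bits backward in time to recover the decoded bits. The sketch below shows only the algorithm. It traces a whole block in one pass, whereas a hardware unit like the one above typically uses a fixed traceback depth and rotating memory banks. The state convention is an assumption: the newest input bit is the MSB of the state, and each decision bit is the LSB of the winning predecessor.

```python
from typing import List, Sequence


def traceback(decisions: Sequence[Sequence[int]], final_state: int, K: int) -> List[int]:
    """Recover input bits by following stored decision bits backward.

    decisions[t][n] is the decision bit stored for state n at step t, i.e.
    the LSB of the predecessor that won the compare-select.
    """
    mask = (1 << (K - 2)) - 1
    state = final_state
    bits: List[int] = []
    for step in reversed(decisions):
        bits.append(state >> (K - 2))                # input bit that entered this state
        state = ((state & mask) << 1) | step[state]  # move to the surviving predecessor
    bits.reverse()
    return bits


# Decisions for the K = 3 path 0 -> 2 -> 1 -> 2 -> 3 (inputs 1, 0, 1, 1);
# only entries on the surviving path matter in this example.
decisions = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(traceback(decisions, final_state=3, K=3))  # [1, 0, 1, 1]
```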

  6. Design Flow
     Synthesis & Module Generation → Pre-Layout Verification & Analysis → Floor Planning → Place & Route → Post-Layout Verification & Analysis
     • Design Compiler synthesis script (from Mentor/Inventra)
     • SRAM generator (from Norman Walker)
     • VHDL gate-level simulations (timing verification, switching activity annotation)
     • PowerMill simulations (SRAM, core)
     • Design Compiler, Power Compiler (static timing, power analysis)
     • Floor planning (Preview)
     • Place & route (Silicon Ensemble)
     • Interconnect parasitic extraction ("report simcap")
     • PowerMill simulations, PathMill static analysis
     • Design Compiler, Power Compiler (static timing, power analysis with back-annotated interconnect parasitics)

  7. Synthesis and SRAM Generation
     • Synthesis with Synopsys Design Compiler
     • Constraint: 66 kHz clock (a period of about 15 µs, effectively unconstrained)
     • Bottom-up synthesis of 62 VHDL entities
     • Low-power SRAM generator (from Pleiades)
        • Very large sense amps and control logic
        • Optimized for power and speed at low supply voltages
        • Word length limited to a power of 2

  8. Simulation Models
     • Behavioral C
        • Parameterized, bit-true, and fast
        • Used for system-level design and BER simulations
     • Behavioral VHDL
        • Parameterized, bit-true, and cycle-true
        • Used for structural simulations and as the test-bench reference
     • RTL VHDL (synthesis quality)
        • Synthesizable, crafted for specific parameters and implementation structure
        • Used for synthesis
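A parameterized, bit-true reference model for BER simulation needs an encoder to produce the transmitted code bits. The slides do not name the code, so the following is only an illustrative sketch. It uses integer-only arithmetic and is parameterized by constraint length and generator polynomials. The defaults (K = 7, generators 171/133 octal, the widely used rate-1/2 code) are an assumption, chosen because 64 states match the four ACS units × 16 states.

```python
from typing import List, Sequence, Tuple


def conv_encode(bits: Sequence[int], K: int = 7,
                gens: Tuple[int, ...] = (0o171, 0o133)) -> List[int]:
    """Feed-forward convolutional encoder of rate 1/len(gens).

    The shift register holds the current input in bit K-1 and the K-1
    previous inputs below it (most recent first); each generator polynomial
    selects the taps whose XOR forms one output bit.
    """
    state = 0  # previous K-1 inputs, most recent in the MSB
    out: List[int] = []
    for u in bits:
        reg = (u << (K - 1)) | state
        for g in gens:
            out.append(bin(reg & g).count("1") & 1)  # parity of the tapped bits
        state = reg >> 1  # shift: drop the oldest input
    return out


# Small textbook-style check with K = 3, generators 7 and 5 (octal).
print(conv_encode([1, 0, 1, 1], K=3, gens=(0o7, 0o5)))  # [1, 1, 1, 0, 0, 0, 0, 1]
```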

  9. BER Simulation Results

  10. SRAM
     • Simulation tools: TimeMill & PowerMill
     • Parameters
        • 66 MHz clock
        • 2.5 V supply
        • Randomly generated test vectors
     • Results
        • Power analysis
        • Timing analysis

  11. SRAM: Power Numbers
     • SRAM used for the ACS unit: 8 words × 9 data bits

     Without parasitic extraction:
     Operation         Avg. (µA)   Avg. (mW)   Avg. (pJ)
     Read activity     663.73      1.659       24.885
     Write activity    563.21      1.408       21.120
     Read/Write        612.29      1.530       22.950

     With parasitic extraction:
     Operation         Avg. (µA)   Avg. (mW)   Avg. (pJ)
     Read activity     949.89      2.3747      35.6205
     Write activity    772.830     1.9320      28.980
     Read/Write        851.42      2.1285      31.9275

  12. SRAM: Power Numbers
     • SRAM used for the traceback unit: 16 words × 64 data bits

     Operation         Avg. (µA)   Avg. (mW)   Avg. (pJ)
     Read activity     2170.7      5.4267      81.4005
     Write activity    1893.4      4.7335      71.0025
     Read/Write        2086.9      5.2172      78.2580
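The power and energy columns in the SRAM tables follow directly from the average current. Power is current × 2.5 V. The pJ column equals the mW column × 15 ns, which corresponds to about 66.7 MHz and is close to the 66 MHz simulation clock. The short calculation below reproduces the conversion. Its 15 ns cycle time is inferred from the tabulated numbers, not stated in the slides. The tables appear to round the mW value before multiplying, so full-precision results differ slightly in the last digits.

```python
VDD_V = 2.5        # supply voltage from the SRAM simulation setup
T_CYCLE_NS = 15.0  # cycle time that reproduces the pJ column (inferred, ~66.7 MHz)


def power_mw(current_ua: float) -> float:
    """Average power: µA × V = µW, divided by 1000 to get mW."""
    return current_ua * VDD_V / 1000.0


def energy_pj(p_mw: float) -> float:
    """Energy per cycle: mW × ns = pJ."""
    return p_mw * T_CYCLE_NS


# (SRAM, operation, average current in µA) taken from the tables above
rows = [
    ("ACS", "read", 663.73),
    ("ACS, extracted", "read", 949.89),
    ("Traceback", "read", 2170.7),
]
for sram, op, i_ua in rows:
    p = power_mw(i_ua)
    print(f"{sram:15s} {op:5s} {p:7.4f} mW {energy_pj(p):8.3f} pJ")
```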

  13. SRAM: Timing Numbers
     • Delays measured: setup time, hold time, and data resolution (the time needed for the data at an address to become stable)

     SRAM              Setup (ns)   Hold (ns)   Data resolution (ns)
     ACS SRAM          ~1           ~2          ~1.8
     Traceback SRAM    ~1           ~2          ~5

  14. Place and Route
     • Floor planning of the Viterbi SRAM macro cells and standard cells was done in Preview, and Silicon Ensemble was used for routing.
     • Total SRAM macro cell area was 1.58 mm² (1.08 mm² with 9x8 SRAMs)
     • Area of the 16 9x8 bit SRAM macro cells: 0.052 mm² each, 62% larger than required, because 16x8 bit SRAMs were used (the SRAM generator output had been verified only for powers of 2)
     • Area of the 3 16x64 bit SRAM macro cells: 0.25 mm² each
     • Area of the standard cells: 1.02 mm² (0.35 mm² from the DEF file)
     • Final chip area was 4.0 mm² (original estimate 2.5 mm²)
     • Parasitics for timing simulation were extracted from the final routed nets in Silicon Ensemble.
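The macro-cell figures can be cross-checked against the 1.58 mm² SRAM total and the final die size. The calculation below uses the 1.02 mm² standard-cell figure rather than the 0.35 mm² DEF-file figure. The resulting "fraction of die" is a derived number computed here, not one reported in the slides.

```python
ACS_SRAMS, ACS_SRAM_MM2 = 16, 0.052   # 9x8 bit cells (built as 16x8)
TB_SRAMS, TB_SRAM_MM2 = 3, 0.25       # 16x64 bit cells
STD_CELL_MM2 = 1.02                   # standard-cell area (not the 0.35 mm² DEF figure)
DIE_MM2 = 1.25 * 3.2                  # final 1.25 mm x 3.2 mm die

sram_mm2 = ACS_SRAMS * ACS_SRAM_MM2 + TB_SRAMS * TB_SRAM_MM2
cell_mm2 = sram_mm2 + STD_CELL_MM2

print(f"SRAM macros:     {sram_mm2:.3f} mm^2")       # 1.582 mm^2
print(f"SRAM + std cell: {cell_mm2:.3f} mm^2")       # 2.602 mm^2
print(f"Fraction of die: {cell_mm2 / DIE_MM2:.0%}")  # 65%
```

Roughly a third of the 4.0 mm² die is therefore left for routing, which fits the congestion reported during place and route.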

  15. Wiring Statistics
     • Six metal layers; layers 5 and 6 carry power and ground, respectively
     • Power and ground lines alternate, spaced 100 µm apart horizontally and vertically
     • About 6200 nets and 46,114 vias
     • Total wire lengths:
        • Metal 1: 3,293 µm
        • Metal 2: 458,440 µm
        • Metal 3: 510,517 µm
        • Metal 4: 218,023 µm
        • Metal 5: 96,882 µm signal, 38,400 µm power
        • Metal 6: 8,660 µm signal, 37,500 µm ground
        • Signal wire length: 685 mm horizontal, 611 mm vertical, 1296 mm total
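The per-layer totals add up to the summary figures. The signal wiring alone sums to about 1296 mm, and the power and ground straps on M5/M6 add another 75.9 mm. The 685 mm / 611 mm split matches the even layers (M2, M4, M6) and the odd layers (M1, M3, M5) exactly after rounding. That suggests even layers were routed horizontally and odd layers vertically, but this is an inference from the numbers, not something the slides state.

```python
# Signal wire length per metal layer, in µm (from the wiring statistics)
signal_um = {1: 3_293, 2: 458_440, 3: 510_517, 4: 218_023, 5: 96_882, 6: 8_660}
supply_um = {5: 38_400, 6: 37_500}  # power (M5) and ground (M6)

total_mm = sum(signal_um.values()) / 1000
even_mm = sum(v for layer, v in signal_um.items() if layer % 2 == 0) / 1000
odd_mm = sum(v for layer, v in signal_um.items() if layer % 2 == 1) / 1000
supply_mm = sum(supply_um.values()) / 1000

print(f"signal total: {total_mm:.1f} mm")   # 1295.8 mm
print(f"even layers:  {even_mm:.1f} mm")    # 685.1 mm
print(f"odd layers:   {odd_mm:.1f} mm")     # 610.7 mm
print(f"power/ground: {supply_mm:.1f} mm")  # 75.9 mm
```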

  16. Final Placement and Routing
     • Significant routing congestion at the 16x64 bit SRAM outputs, due to the Silicon Ensemble grid size of 1 µm (white and light-blue wires in the layout plot)
     • At least 6 nets remained unroutable, even with a 12 mm² chip area
     • Final size was 1.25 mm × 3.2 mm (4 mm²), with 9 unroutable nets
     • Silicon Ensemble violation reports did not identify which nets were unroutable; they flagged only problems with ground and power connections

  17. Static Timing Checks
     • All timing checks were performed with Design Compiler's report_timing command
     • Parasitic capacitances were back-annotated with the set_load command
     • No RC parasitics were annotated
     • No SRAM model was used for timing checks
     • The critical path ran from the ACS control logic, through a PM output MUX select signal (in an ACS unit), and through the following ACS unit
     • Checks were performed at 2.5 V

  18. Static Power Checks
     • All power checks were performed with Design Compiler's report_power command
     • Switching activity was measured for every output port (transition counts over a 16,000-cycle simulation)
     • Back-annotation was performed with SAIF files
     • No SRAM model was used for power checks (SRAM power was added in manually)
     • Checks were performed at 2.5 V with a 60 MHz clock

  19. Delay and Energy Scaling

  20. Performance Results (fixed throughput requirement of 100 ksps)

  21. Summary
     • Performance in intended operation (100 ksps)
        • Clock speed: 1.6 MHz
        • Power dissipation: 0.14 mW
        • Power density: 34.9 µW/mm²
     • Cost
        • Die size: 4 mm²
        • Design effort: 30 work days
     • Predictability and portability
        • Mentor/Inventra predictions vs. measured results
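The summary figures fit together as the arithmetic below shows. Here "sps" is read as samples per second, since the slides do not expand the abbreviation. A 1.6 MHz clock at 100 ksps gives 16 clock cycles per sample, the same number as the states handled by each ACS unit. One plausible reading, not stated in the slides, is that each unit updates one state per cycle. The quoted 34.9 µW/mm² agrees with 0.14 mW / 4 mm² to within the rounding of the 0.14 mW figure.

```python
THROUGHPUT_SPS = 100e3  # intended throughput: 100 ksps
CLOCK_HZ = 1.6e6        # clock speed at that throughput
POWER_MW = 0.14         # power dissipation
DIE_MM2 = 4.0           # die size

cycles_per_sample = CLOCK_HZ / THROUGHPUT_SPS       # clock cycles per sample
energy_nj = POWER_MW * 1e-3 / THROUGHPUT_SPS * 1e9  # joules per sample -> nJ
density_uw_per_mm2 = POWER_MW * 1e3 / DIE_MM2       # mW -> µW, per mm²

print(cycles_per_sample)                     # 16.0
print(f"{energy_nj:.2f} nJ/sample")          # 1.40 nJ/sample
print(f"{density_uw_per_mm2:.1f} uW/mm^2")   # 35.0 uW/mm^2
```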
