
Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits






Presentation Transcript


  1. Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits
     Christopher LaFrieda and Rajit Manohar, Computer Systems Laboratory, Cornell University

  2. Outline
     • Motivation / Background
     • Contributions
     • Relaxed Quasi Delay-Insensitive (RQDI)
     • RQDI Voltage Scaling
     • RQDI Two Phase Circuits
     • Results
     • Summary

  3. Motivation: How Does Dynamic Power Scale?
     • α – activity factor (1x)
     • N – total number of transistors (2x)
     • CL – average load capacitance per transistor (0.7x)
     • Vdd – doesn’t scale well anymore
       • Scaled by 17-20% from 130nm to 65nm.
       • Scaled by 10% at 45nm and 5.5% at 32nm.
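The slide's equation did not survive in the transcript. As a rough sketch, assuming the standard dynamic power expression P = α · N · CL · Vdd² · f (an assumption, not taken from the slide), the factors listed above can be combined as follows. The function name and the choice of Vdd reductions are illustrative. The slide does not say whether the 17-20% figure is per generation, so the first case is only an illustration.

```python
# Per-generation scaling factors quoted on the slide.
ACTIVITY = 1.0      # alpha: activity factor (1x)
TRANSISTORS = 2.0   # N: total number of transistors (2x)
LOAD_CAP = 0.7      # C_L: average load capacitance per transistor (0.7x)


def dynamic_power_scale(vdd_reduction, freq_scale=1.0):
    """Relative dynamic power, ASSUMING P = alpha * N * C_L * Vdd^2 * f.

    vdd_reduction is the fractional drop in supply voltage (0.10 for 10%).
    freq_scale defaults to 1.0, i.e. fixed frequency.
    """
    vdd_scale = 1.0 - vdd_reduction
    return ACTIVITY * TRANSISTORS * LOAD_CAP * vdd_scale ** 2 * freq_scale


for label, cut in [("Vdd -20%", 0.20), ("Vdd -10%", 0.10), ("Vdd -5.5%", 0.055)]:
    print(f"{label}: power scales by {dynamic_power_scale(cut):.3f}x")
```

Under these assumptions a 20% supply reduction keeps power just below 1x (about 0.90x), while the 10% and 5.5% reductions let it grow to about 1.13x and 1.25x at fixed frequency.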

  4. Motivation: Power Scaling With Fixed Frequency

  5. Motivation: Process Variations Getting Worse
     • Process variation in 65nm:
       • FO4 delays across corners: FF is 70% faster than SS.
     • Circuits need to be robust w.r.t. process variations.
     • QDI is a logical place to start.

  6. Background: QDI – WCHB Buffer
     Simple buffer. Neutrality is checked in the pull-up stack of the C-element. Timing assumption?

  7. RQDI: Staticizer Timing Assumption I
     Data is neutral and enable is high.

  8. RQDI: Staticizer Timing Assumption II
     Data is neutral and enable is high. Data becomes valid, which sets _R0 low. If the R0 inverter is slow, R0 will remain low.

  9. RQDI: Staticizer Timing Assumption III
     Data is neutral and enable is high. Data becomes valid, which sets _R0 low. If the R0 inverter is slow, R0 will remain low. Nothing is fighting the weak feedback, so _R0 can go high.

  10. RQDI: Half Cycle Timing Assumption
     • The half cycle timing assumption (HCTA): a small amount of combinational logic (1-2 transitions) will always switch within one half cycle of a process.
     • There is a 4.5x timing margin (at 18 transitions per cycle).
     • With worst case corners, the margin is 2.7x in 65nm.
     • Wire delays make the assumption even more conservative.
     • QDI has an HCTA in staticizers.
     • RQDI allows them everywhere.
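The 4.5x figure can be reproduced with simple arithmetic: an 18-transition cycle has a 9-transition half cycle, and the assumed logic takes at most 2 transitions. The sketch below uses hypothetical function names. The corner derating is an assumption: dividing by the 1.7x FF/SS spread quoted on the process variation slide gives a value close to the slide's 2.7x, but the transcript does not say how that number was derived.

```python
def hcta_margin(transitions_per_cycle, logic_transitions=2):
    """Timing margin of the half cycle timing assumption, in multiples.

    The logic must finish within half a cycle, so the margin is the
    half-cycle length divided by the number of logic transitions.
    """
    half_cycle = transitions_per_cycle / 2
    return half_cycle / logic_transitions


def derated_margin(margin, corner_spread=1.7):
    """ASSUMPTION: worst case puts the logic at the slow corner while the
    rest of the cycle runs at the fast corner (FF 70% faster than SS)."""
    return margin / corner_spread


nominal = hcta_margin(18)
print(nominal)                            # 4.5
print(round(derated_margin(nominal), 2))  # 2.65
```

With a single logic transition instead of two, the same arithmetic gives a 9x margin, so 4.5x is the tighter end of the 1-2 transition range.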

  11. RQDI: HCHB Template
     N tracks neutrality. Check N+, but assume N- happens in the first half cycle. Two transition latency. 14 transition cycle time. Validity must be checked by the pull-down.

  12. RQDI Voltage Scaling: Scaling Scenarios
     Two possible scenarios for voltage scaling. Top: mismatched slack. The lower pipeline can run slower. Bottom: token limited loop. Latency through the loop should be minimal, but cycle time can scale. In some applications these can’t be avoided.
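To see why cycle time can scale in a token limited loop, consider a deliberately simplified first-order model (an assumption, not taken from the slides): a ring's period is the larger of the time a token needs to travel around the loop, divided by the number of tokens, and the local cycle time of a stage. The hole-limited regime is ignored, and all names and numbers below are illustrative.

```python
def ring_period(stages, forward_latency, tokens, local_cycle_time):
    """First-order period of a ring pipeline, in transitions.

    ASSUMPTION: throughput is limited either by tokens circulating
    around the loop or by the slowest stage's local cycle time.
    """
    token_limited = stages * forward_latency / tokens
    return max(token_limited, local_cycle_time)


# Hypothetical ring: 20 stages, two-transition latency per stage, one token.
for cycle_time in (18, 36, 50):
    period = ring_period(20, 2, 1, cycle_time)
    print(f"local cycle time {cycle_time}: ring period {period}")
```

In this example the loop latency is 40 transitions, so slowing the local cycle time from 18 to 36 leaves the ring period unchanged at 40. Only when the local cycle time exceeds the loop latency per token (50 here) does throughput drop. Increasing the forward latency, by contrast, would slow the ring immediately.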

  13. RQDI Voltage Scaling: Slack Mismatch In An FPGA
     Logic blocks (LB) for logic. Switch boxes (SB) for routing. Limited routing resources. Imperfect slack matching. Can scale voltage on blue path.

  14. RQDI Voltage Scaling: DVHB, the Dual Voltage Template
     Data rails are full swing. Acknowledges are low swing. Latency remains constant through voltage scaling. Cycle time can be adjusted through voltage scaling.

  15. RQDI Two Phase Circuits: Two Phase Buffer (HCFB2P)
     An HCTA exists on the right pair of XORs. Two transition latency. Seven transition cycle time. Twice the area of a WCHB. However, it can replace two stages.

  16. RQDI Two Phase Circuits: Two Phase In An FPGA
     Replace routing (SB) with two phase logic. Logic (LB) remains four phase. Phase converters are placed around logic blocks. Routing makes up over half the area in an asynchronous FPGA, so power savings can be large. (Figure: width-N switch.)
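The appeal of two phase routing comes from wire activity: a four phase (return-to-zero) handshake toggles each wire twice per token, while a two phase (transition signalling) handshake toggles it once. The sketch below counts transitions on a single request/acknowledge pair. That pair is a simplification standing in for the data rails and enable of the actual circuits, and the function name is hypothetical.

```python
def count_transitions(tokens, phases):
    """Count wire transitions needed to transfer `tokens` tokens.

    phases=4: return-to-zero handshake (req up, ack up, req down, ack down).
    phases=2: transition signalling (req toggles, ack toggles).
    """
    if phases not in (2, 4):
        raise ValueError("phases must be 2 or 4")
    req, ack = 0, 0
    transitions = 0
    for _ in range(tokens):
        for _ in range(phases // 2):
            req ^= 1            # sender toggles the request wire
            ack ^= 1            # receiver toggles the acknowledge wire
            transitions += 2
        if phases == 4:
            # A four phase handshake always returns to the neutral state.
            assert (req, ack) == (0, 0)
    return transitions


print(count_transitions(10, phases=4))  # 40
print(count_transitions(10, phases=2))  # 20
```

Halving the transitions per token halves how often the long routing wires are charged and discharged, at the cost of converters wherever two phase routing meets four phase logic.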

  17. RQDI Two Phase Circuits: Converters
     • Need to convert between two phase (for routing) and four phase (for logic).
     • The 4:2 converter is 3x larger than a WCHB.
     • The 2:4 converter is 3.25x larger than a WCHB.

  18. Experimental Setup
     Simulated in HSpice with a 65nm bulk technology. Circuits are sized to the drive strength of a 20/10 lambda inverter.

  19. Results: HCHB – Energy Per Cycle
     HCHB consumes 32% less energy than PCHB. HCHB consumes 36% less energy than PCEHB. Slight frequency improvement. Negligible latency penalty.

  20. Results: HCHB – Total Transistor Area
     Despite the additional transistors to check validity, HCHB is smaller. HCHB is about 20% smaller than PCHB. HCHB is about 15% smaller than PCEHB.

  21. Results: DVHB – Low Voltage vs. Dual Voltage

  22. Results: HCFB2P Switch – Energy Reduction vs. WCHB
     Wider switches mean larger MUXes and larger PCs. The associated caps switch half as much. Over 50% reduction in power, due to replacing two stages.

  23. RQDI Two Phase Circuits: Results – Area Overhead
     Typically, there are about 8 stages of 4-wide switches between logic blocks, and the area overhead is 15%. With direct connections, there are about 10 stages with an overhead of 10%.

  24. Summary
     • RQDI allows half cycle timing assumptions outside of staticizers.
     • With RQDI, we can simplify the PCHB logic template. The resulting template, HCHB, consumes 32% less energy.
     • The dual voltage logic template can be used to adjust the dynamic slack of a stage. This allows us to save energy with a minimal throughput penalty in token limited loops.
     • Replacing the routing in an FPGA with two phase logic can reduce energy consumption by 50%. Using the RQDI two phase buffer and converters will achieve this with a 10-15% area overhead.

  25. Questions?
