
Kaijie Wu and Ramesh Karri CAD Lab Department of Electrical Engineering Polytechnic University

Algorithm Level RE-computing with Shifted Operands: A Register Transfer Level Concurrent Error Detection Technique. Kaijie Wu and Ramesh Karri, CAD Lab, Department of Electrical Engineering, Polytechnic University (kwu03@utopia.poly.edu, ramesh@india.poly.edu).


Presentation Transcript


  1. Algorithm Level RE-computing with Shifted Operands: A Register Transfer Level Concurrent Error Detection Technique. Kaijie Wu and Ramesh Karri, CAD Lab, Department of Electrical Engineering, Polytechnic University (kwu03@utopia.poly.edu, ramesh@india.poly.edu)

  2. Outline • Review time-redundancy-based CED techniques • Describe ARESO: operation, checking ratio, benefits and drawbacks • Integrate pipelining with ARESO • Summary of ARESO overhead • Examples and experimental results • Conclusion

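  3. RE-computing with Shifted Operands (RESO) • 1. Perform the basic computation: operands X3..X0 and Y3..Y0, zero-extended, are added in bit slices +0 to +4, giving result 1 = Z4 Z3 Z2 Z1 Z0. • 2. Repeat the computation with both operands shifted left by 1 bit (X3..X0 0 and Y3..Y0 0), giving result 2 = Z3 Z2 Z1 Z0 0, in which each bit Zi is produced by bit slice i+1. • 3. Compare the results: a comparator (C) checks result 2, shifted back by 1 bit, against result 1 and raises Error on a mismatch.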
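The three RESO steps above can be sketched as a bit-level model in Python. This is a minimal sketch under stated assumptions, not the authors' hardware. It assumes a ripple-carry adder with one extra bit slice per shifted position, so that both passes keep their carry-out (the 5-slice figure drops Z4 in the shifted pass). It also uses a hypothetical fault model in which one slice's sum output is stuck at 0 or 1 while the carry chain stays correct. The names `ripple_add` and `reso_detects` are illustrative only.

```python
def ripple_add(x: int, y: int, width: int, faulty_slice=None, stuck_value=0) -> int:
    """Add x and y on a `width`-slice ripple-carry adder.

    Hypothetical fault model: if faulty_slice is given, the sum output of
    that bit slice is stuck at stuck_value; the carry chain stays fault-free.
    """
    carry, result = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        s = a ^ b ^ carry                      # full-adder sum
        carry = (a & b) | (carry & (a ^ b))    # full-adder carry-out
        if i == faulty_slice:
            s = stuck_value                    # inject the stuck-at fault
        result |= s << i
    return result


def reso_detects(x: int, y: int, n_bits: int, k: int = 1,
                 faulty_slice=None, stuck_value=0) -> bool:
    """Return True if RESO flags an error for x + y with a k-bit shift."""
    width = n_bits + 1 + k                     # room for the carry-out in both passes
    r1 = ripple_add(x, y, width, faulty_slice, stuck_value)            # step 1
    r2 = ripple_add(x << k, y << k, width, faulty_slice, stuck_value)  # step 2
    return r1 != (r2 >> k)                     # step 3: shift back and compare


if __name__ == "__main__":
    print(reso_detects(11, 6, n_bits=4))                                  # False
    print(reso_detects(11, 6, n_bits=4, faulty_slice=2, stuck_value=1))  # True
```

The faulty slice produces bit i in the first pass but bit i-k in the second, so after the shift-back the two results are never corrupted at the same position. In this simplified model, every stuck output that corrupts result 1 is therefore flagged.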

  4. Fault detection capability of RESO • With a k-bit shift, RESO can detect errors in: • all bitwise logical operations, when failures are confined to k adjacent bit slices; • arithmetic operations in ripple-carry and carry-lookahead adders, when failures are confined to k-1 adjacent bit slices (k > 1); • arithmetic operations in a group carry-lookahead adder, when failures are confined to one group, where each group i consists of a (k-1)-bit adder and the circuits for group-carry generate Gi, group-carry propagate Pi, and group carry-in Ci. • Up to k errors in a bit slice of an array multiplier can be detected by shifting one of the operands by at most log2(2k+1) bits.
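The first claim in the list above (bitwise logical operations, with failures confined to k adjacent bit slices) can be checked exhaustively on a small model. This minimal sketch assumes an AND unit whose faulty slices have their outputs stuck at fixed values. That stuck-at model is only one of the failure types the claim covers. `bitwise_and`, `reso_logic_detects`, and `check_window_claim` are hypothetical names.

```python
def bitwise_and(x: int, y: int, width: int, stuck: dict) -> int:
    """AND x and y on `width` independent bit slices.

    `stuck` maps a slice index to the value its output is stuck at
    (hypothetical fault model); all other slices compute correctly.
    """
    out = 0
    for i in range(width):
        bit = (x >> i) & (y >> i) & 1
        out |= stuck.get(i, bit) << i
    return out


def reso_logic_detects(x: int, y: int, n_bits: int, k: int, stuck: dict) -> bool:
    width = n_bits + k                          # extra slices hold the shifted operands
    r1 = bitwise_and(x, y, width, stuck)
    r2 = bitwise_and(x << k, y << k, width, stuck)
    return r1 != (r2 >> k)                      # shift back and compare


def check_window_claim(n_bits: int, k: int) -> bool:
    """True if every fault confined to k adjacent slices that corrupts x & y is flagged."""
    width = n_bits + k
    for start in range(width - k + 1):          # each window of k adjacent slices
        window = range(start, start + k)
        for pattern in range(1 << k):           # every stuck-at combination in the window
            stuck = {s: (pattern >> j) & 1 for j, s in enumerate(window)}
            for x in range(1 << n_bits):
                for y in range(1 << n_bits):
                    wrong = bitwise_and(x, y, width, stuck) != (x & y)
                    if wrong and not reso_logic_detects(x, y, n_bits, k, stuck):
                        return False
    return True


if __name__ == "__main__":
    print(all(check_window_claim(4, k) for k in (1, 2, 3)))  # True
```

The check passes because a window of k slices can never contain both slice j and slice j+k. The bit compared against a corrupted bit therefore always comes from a healthy slice.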

  5. Comparison [Figure: four versions of an example CDFG built from additions (+) and multiplications (*), with C marking comparators.] • No CED: (a) example CDFG • Logic-level CED: (b) duplication; (c) RESO, RERO, REDWC, etc. • Algorithm-level CED: (d) algorithm-level time redundancy

  6. Algorithm Level Re-Computing with Shifted Operands (ARESO) • Does not use fault-tolerant logic operators • Performs checking operations at the register transfer level • Supports trade-offs among hardware overhead, performance penalty, and error detection latency

  7. Operation of ARESO [Figure: RTL data path with two inputs, three shift registers, an Indicator signal, a comparator (C), the Output, and an Error signal.]

  8. ARESO: Checking Ratio (R) [Figure: input samples Input 1, Input 2, ..., Input R each take L cycles, followed by Sh Input R, the recomputation of Input R with shifted operands.] • R = 1: check all results • L = number of clock cycles per iteration
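The checking-ratio schedule can be modeled in a few lines. This is a minimal sketch under stated assumptions. Based on the timing diagrams, it assumes that after every R normal iterations the R-th input is recomputed with its operands shifted left by k bits. The 4-tap integer FIR used as the algorithm, its coefficients `TAPS`, and the single-bit fault injection through `corrupt_slot` are all hypothetical. Because an FIR is linear, shifting every input left by k multiplies the output by 2^k, so shifting the recomputed result back should reproduce the stored one.

```python
from typing import List, Optional, Sequence, Tuple

TAPS = (3, -1, 4, 2)  # hypothetical FIR coefficients


def fir(window: Sequence[int]) -> int:
    """Integer FIR output for one window of input samples."""
    return sum(h * x for h, x in zip(TAPS, window))


def areso_schedule(num_inputs: int, R: int) -> List[Tuple[int, bool]]:
    """Issue order as (input index, shifted?) pairs: one shifted recomputation per R inputs."""
    order = []
    for i in range(num_inputs):
        order.append((i, False))
        if (i + 1) % R == 0:
            order.append((i, True))   # recompute the R-th input of the group
    return order


def areso_run(samples: Sequence[Sequence[int]], R: int, k: int = 1,
              corrupt_slot: Optional[int] = None) -> List[int]:
    """Run the schedule and return the indices of inputs whose check failed.

    corrupt_slot (hypothetical) flips bit 2 of the result computed in that issue slot.
    """
    stored, flagged = {}, []
    for slot, (idx, shifted) in enumerate(areso_schedule(len(samples), R)):
        window = [v << k for v in samples[idx]] if shifted else list(samples[idx])
        y = fir(window)
        if slot == corrupt_slot:
            y ^= 1 << 2                        # transient single-bit error
        if not shifted:
            stored[idx] = y                    # keep the normal result for checking
        elif (y >> k) != stored[idx]:          # shift back and compare
            flagged.append(idx)
    return flagged


if __name__ == "__main__":
    samples = [[1, 2, 3, 4], [5, 6, 7, 8], [2, 0, 1, 3], [4, 4, 4, 4]]
    print(areso_schedule(4, 2))
    print(areso_run(samples, R=2))                   # []
    print(areso_run(samples, R=2, corrupt_slot=1))   # [1]
    print(areso_run(samples, R=2, corrupt_slot=0))   # []
```

With R = 1, every input is rechecked (the "check all results" case). A larger R issues fewer shifted iterations. However, as the `corrupt_slot=0` case shows, a larger R also leaves some results unchecked.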

  9. ARESO features • Good points • Fewer comparisons are needed • Increasing the checking ratio reduces the time overhead • Compared with straightforward duplication, area overhead is reduced • Bad points • Large error detection latency, (R+1) × L cycles

  10. Integrating ARESO with Pipelining • Reduces error detection latency • Example with L = 18 and R = 2: • Basic ARESO: detection latency = (R+1) × L = 54 cycles • Pipelined ARESO with initiation interval I_ARESO = 6: detection latency = R × I_ARESO + L = 30 cycles [Figure: basic ARESO issues Input 1, Input 2, and Shifted Input 2 at cycles 0, 18, and 36 and detects at cycle 54; pipelined ARESO issues them at cycles 0, 6, and 12 and detects at cycle 30.]
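The two detection-latency formulas for basic and pipelined ARESO can be written as plain functions. This minimal sketch only restates them; the function names are hypothetical. It checks the functions against the L = 18, R = 2 example.

```python
def basic_areso_latency(R: int, L: int) -> int:
    """R normal iterations plus one shifted iteration, run back to back."""
    return (R + 1) * L


def pipelined_areso_latency(R: int, L: int, I_areso: int) -> int:
    """Inputs are issued every I_areso cycles, so the shifted copy of input R
    starts R * I_areso cycles after input 1 and needs L more cycles."""
    return R * I_areso + L


if __name__ == "__main__":
    print(basic_areso_latency(2, 18))          # 54
    print(pipelined_areso_latency(2, 18, 6))   # 30
```

The same expression reproduces the FIR detection latencies reported in the experimental results. For example, ARESO-2 gives (2 × 8 + 24) × 50 ns = 2000 ns.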

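  11. ARESO: Throughput • Throughput: the number of results that come from non-shifted inputs per clock cycle (= 1/I for a pipelined design without ARESO with initiation interval I) • To maintain this throughput, the initiation interval of the pipelined ARESO design should be I_ARESO = R × I / (R + 1), because each group of R useful results now occupies R + 1 issue slots [Figure: the pipelined design without ARESO issues Input 1 and Input 2 at cycles 0 and 12; the pipelined design with ARESO (R = 1) issues Input 1, Shifted Input 1, Input 2, and Shifted Input 2 at cycles 0, 6, 12, and 18.]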
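The throughput condition can be checked with exact rational arithmetic. This minimal sketch assumes that shifted iterations occupy issue slots of the same pipeline, so (R + 1) × I_ARESO = R × I. The function names are hypothetical. When R × I / (R + 1) is not an integer, a real design would need an integer interval no larger than it; this sketch does not handle that case.

```python
from fractions import Fraction


def areso_initiation_interval(I: int, R: int) -> Fraction:
    """I_ARESO that keeps the non-shifted throughput at 1/I.

    Every R useful results cost R + 1 issue slots (R normal, 1 shifted),
    so (R + 1) * I_ARESO must equal R * I.
    """
    return Fraction(R * I, R + 1)


def useful_throughput(I_areso, R: int) -> Fraction:
    """Results from non-shifted inputs per clock cycle."""
    return Fraction(R, R + 1) / I_areso


if __name__ == "__main__":
    for R in (1, 2, 3):
        print(f"R={R}: I_ARESO={areso_initiation_interval(12, R)}")
```

With I = 12 this gives 6, 8, and 9 cycles for R = 1, 2, and 3. These are the I_ARESO values used in the FIR experiments.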

  12. ARESO Design Tradeoffs

  13. Error detection capability of ARESO • All RESO-detectable permanent faults are detected • The transient-fault detection capability varies with R (the checking ratio) and D (the number of data outputs affected by the fault) • When 1 ≤ R ≤ D, 100% of RESO-detectable transient faults are detected • When D < R, 100 × (D/R)% of RESO-detectable transient faults are detected
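The transient-fault coverage formula can be cross-checked with a small simulation. This minimal sketch rests on several assumptions. A transient fault corrupts D consecutive non-shifted outputs. The shifted recomputation itself is fault-free. The rechecked output is the R-th of each group, and every arrival phase is equally likely. The function names are hypothetical.

```python
def areso_transient_coverage(R: int, D: int) -> float:
    """Fraction of RESO-detectable transient faults caught (slide 13 formula)."""
    if R < 1 or D < 1:
        raise ValueError("R and D must be positive")
    return 1.0 if R <= D else D / R


def simulated_coverage(R: int, D: int) -> float:
    """Try every arrival phase of a transient that corrupts D consecutive
    non-shifted outputs; it is caught if one of them is rechecked."""
    caught = 0
    for start in range(R):                       # one checking period covers all phases
        if any(i % R == R - 1 for i in range(start, start + D)):
            caught += 1
    return caught / R


if __name__ == "__main__":
    print(areso_transient_coverage(4, 1), simulated_coverage(4, 1))  # 0.25 0.25
```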

  14. FIR Filter Example: overhead (50 ns clock)

| | FIR (I=12, L=23) | ARESO-1 FIR (I_ARESO=6, R=1, L=24) | ARESO-2 FIR (I_ARESO=8, R=2, L=24) |
|---|---|---|---|
| Multipliers | 8×8→14, 9×8→17 | 10×10→16, 10×10→17, 10×10→19 | 10×10→16, 10×10→19 |
| Adders | 2 (19×19→19) | 3 (21×21→21) | 2 (21×21→21) |
| Register bits | 419 | 963 | 750 |
| Combinational area (unit cells) | 4051 | 6960 (+71.8%) | 5483 (+35.3%) |
| Sequential area (unit cells) | 4983 | 11506 (+130.9%) | 8635 (+73.3%) |
| Total area (unit cells) | 9034 | 18466 (+104.4%) | 14118 (+56.3%) |
| Detection latency (ns) | - | (6+24)×50 = 1500 | (2×8+24)×50 = 2000 |

Percentages give the overhead relative to the FIR design without ARESO. ARESO-2 needs 23.5% less total area than ARESO-1 (equivalently, ARESO-1 is 30.8% larger than ARESO-2), at the expense of a 33.3% increase in error detection latency.
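The percentages for the 50 ns FIR example are plain ratios of the reported unit-cell counts and latencies. The sketch below recomputes them from those numbers. It also shows that the 30.8% figure is measured relative to the ARESO-2 area. The dictionary names are only illustrative.

```python
def overhead_pct(area: float, baseline: float) -> float:
    """Overhead of `area` relative to `baseline`, in percent."""
    return 100.0 * (area - baseline) / baseline


TOTAL_AREA = {"FIR": 9034, "ARESO-1": 18466, "ARESO-2": 14118}   # unit cells, 50 ns clock
LATENCY_NS = {"ARESO-1": (1 * 6 + 24) * 50, "ARESO-2": (2 * 8 + 24) * 50}

if __name__ == "__main__":
    for name in ("ARESO-1", "ARESO-2"):
        print(name, round(overhead_pct(TOTAL_AREA[name], TOTAL_AREA["FIR"]), 1))
    a1, a2 = TOTAL_AREA["ARESO-1"], TOTAL_AREA["ARESO-2"]
    print(round(overhead_pct(a1, a2), 1))        # ARESO-1 relative to ARESO-2: 30.8
    print(round(100.0 * (a1 - a2) / a1, 1))      # saving relative to ARESO-1: 23.5
    print(round(overhead_pct(LATENCY_NS["ARESO-2"], LATENCY_NS["ARESO-1"]), 1))  # 33.3
```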

  15. FIR Filter Example: Schedule [Figure: ARESO schedule of the FIR filter, showing multiplications *1 to *17, additions +1 to +16, a "read inputs" step, and a "test checking ratio counter" step.] • 17 multiplications, 16 additions • ARESO with checking ratio R = 2, I_ARESO = 8 clock cycles, L = 24 clock cycles, and a 50 ns clock cycle • ARESO constraints were incorporated into Synopsys BC synthesis scripts • Resulting hardware: two 21×21→21 adders, one 10×10→16 multiplier, and one 10×10→19 multiplier • Detection latency of 2000 ns

  16. FIR using multi-cycle operations (30 ns clock)

| | FIR (I=12, L=36) | ARESO-1 FIR (I_ARESO=6, R=1, L=36) | ARESO-3 FIR (I_ARESO=9, R=3, L=36) |
|---|---|---|---|
| Combinational area (unit cells) | 5318 | 8666 (+63.0%) | 6868 (+29.1%) |
| Sequential area (unit cells) | 7898 | 14410 (+82.5%) | 10637 (+34.7%) |
| Total area (unit cells) | 13216 | 23076 (+74.6%) | 17505 (+32.5%) |
| Detection latency (ns) | - | (6+36)×30 = 1260 | (3×9+36)×30 = 1890 |

ARESO-3 needs 24.1% less total area than ARESO-1 (equivalently, ARESO-1 is 31.8% larger than ARESO-3), at the expense of a 50% increase in error detection latency.

  17. FIR using chained operations (100 ns clock)

| | FIR (I=12, L=24) | ARESO-1 FIR (I_ARESO=6, R=1, L=24) | ARESO-2 FIR (I_ARESO=8, R=2, L=24) |
|---|---|---|---|
| Combinational area (unit cells) | 4186 | 6912 (+65.1%) | 5491 (+31.2%) |
| Sequential area (unit cells) | 4983 | 11044 (+121.6%) | 8910 (+78.8%) |
| Total area (unit cells) | 9169 | 17956 (+95.8%) | 14401 (+57.1%) |
| Detection latency (ns) | - | (6+24)×100 = 3000 | (2×8+24)×100 = 4000 |

ARESO-2 needs 19.8% less total area than ARESO-1 (equivalently, ARESO-1 is 24.7% larger than ARESO-2), at the expense of a 33.3% increase in error detection latency.

  18. Conclusions • Compared with straightforward duplication, ARESO-R designs have area overheads in the range of 30% to 100%. • The detection latency of ARESO-R increases with the checking ratio R. • For a given throughput, the area overhead decreases as the checking ratio R increases. • ARESO constraints were incorporated into Synopsys BC.
