Enhancing Random Access Scan for Soft Error Tolerance in Digital Circuits
This presentation examines how Random Access Scan (RAS) can improve the tolerance of digital circuits to soft errors caused by the operating environment. Compared with traditional Serial Scan (SS), RAS reduces test time, test volume, and test power, and it also improves reliability. A new scan-out structure is proposed that further improves error tolerance. In a circuit with N flip-flops, the soft error rate (SER) of RAS can be nearly 1/N of the SER of SS.
Presentation Transcript
Enhancing Random Access Scan for Soft Error Tolerance. Fan Wang* and Vishwani D. Agrawal, Department of Electrical and Computer Engineering, Auburn University, AL 36849. *Now with Juniper Networks, Inc., Sunnyvale, CA 94086. 42nd IEEE Southeastern Symposium on System Theory, March 2010.
Motivation for This Work • Recent work on random access scan (RAS) has shown its advantages over serial scan (SS) in reducing test time, test volume, and test power. • The RAS structure can also improve fault tolerance in both normal function mode and test mode.
Outline • Background • Review of RAS design • Soft error tolerance of RAS • A new scan-out structure • Further enhancing error tolerance using RAS structure • Conclusion
Soft Errors • Soft errors are caused by the operating environment. • They are not due to permanent hardware faults. • Soft errors are intermittent or random, which makes their testing unreliable. • One way to deal with soft errors is to make hardware robust: • Capable of detecting soft errors • Capable of correcting soft errors • Both measures are probabilistic
Effect on Digital Circuit [Figure: charged particles striking the combinational logic and the flip-flops of a sequential circuit with input IN, output OUT, and clock CK.] Source: M. Nicolaidis (Editor), Soft Errors in Modern Electronic Systems, Springer, 2010.
Random Access Scan (RAS) • Testing requires that flip-flops be controllable and observable. Two methods are: • Serial scan (SS) using a shift register • Random access scan (RAS) using memory-like addressing • RAS reduces both test application time and test power, which are otherwise conflicting objectives in SS. • Previous and current publications on RAS: • Ando, COMPCON-80 • Wagner, COMPCON-83 • Ito, DAC-90 • Bushnell & Agrawal, textbook, pp. 484-485 • Mudlapur et al., ITC-05 • Saluja et al., VLSI Design-04, ITC-05, ATS-05, VLSI Design-06, VLSI Design-10.
Background • Error-tolerant computing techniques are classified by the level at which they are applied: • Device-level error tolerance techniques either increase the device critical charge or decrease the collected charge to reduce SER. • Circuit- or system-level error tolerance techniques include error detection and correction (EDAC) codes and time/space redundancy.
BISER Design With C-Element S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, “Robust System Design with Built-In Soft-Error Resilience,” Computer, vol. 38, no. 2, pp. 43-52, February 2005.
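As a rough illustration of why a C-element helps, the sketch below models a generic two-input Muller C-element in Python. Its output follows the inputs only when both inputs agree and otherwise holds its previous value, so an upset in one of two redundant latch copies does not propagate. The class name `CElement` and the non-inverting polarity are assumptions made for illustration; the circuit in the cited paper may differ in detail.

```python
class CElement:
    """Behavioral model of a two-input Muller C-element (non-inverting, assumed)."""

    def __init__(self, initial=0):
        self.out = initial  # value kept while the inputs disagree

    def update(self, a, b):
        # Inputs agree: pass the common value. Inputs disagree: hold the old output.
        if a == b:
            self.out = a
        return self.out


c = CElement(initial=0)
print(c.update(1, 1))  # both latch copies hold 1, so the output becomes 1
print(c.update(1, 0))  # one copy is upset, so the output holds its value
print(c.update(0, 0))  # both copies agree on 0, so the output becomes 0
```

A single upset flips only one input, so the two inputs disagree and the stored output is retained until the copies agree again.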
The “Toggle” RAS Flip-Flop [Figure: RAS flip-flop (RAS-FF) with latches M and S and a MUX, placed between combinational logic blocks; signals Data 0, Data 1, Clock, Control, x, y, and an output bus. A row decoder and a column decoder each drive √nff lines, selected by an address of log2(nff) bits.]
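The memory-like addressing in this figure can be sketched as splitting a log2(nff)-bit address into a row part and a column part, each selecting one of √nff lines. The following Python sketch is only an illustration: the function names `decode_address` and `one_hot`, the choice of high bits for the row and low bits for the column, and the restriction to square arrays are all assumptions, not details taken from the slides.

```python
import math


def decode_address(addr, n_ff):
    """Split a flip-flop address into (row, column) select indices.

    Assumes n_ff is an even power of two (4, 16, 64, ...), so the array is
    square with sqrt(n_ff) row lines and sqrt(n_ff) column lines.
    """
    side = math.isqrt(n_ff)
    if side * side != n_ff or side & (side - 1) != 0:
        raise ValueError("n_ff must be an even power of two")
    if not 0 <= addr < n_ff:
        raise ValueError("address out of range")
    # High address bits pick the row, low bits pick the column (assumed split).
    row, col = divmod(addr, side)
    return row, col


def one_hot(index, width):
    """Decoder output: exactly one of `width` select lines is active."""
    return [1 if i == index else 0 for i in range(width)]


n_ff = 16
row, col = decode_address(9, n_ff)
print(row, col)         # 2 1
print(one_hot(row, 4))  # [0, 0, 1, 0]
print(one_hot(col, 4))  # [0, 1, 0, 0]
```

Only the flip-flop at the intersection of the active row line and the active column line is selected, which is what lets a RAS cell be read or written without disturbing the others.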
Natural Soft Error Tolerance of RAS • In SS, a soft error can be induced in any scan flip-flop (SFF) as test data is shifted through the chain to the output. • In RAS, the result is affected only when the error is induced in the selected RAS cell.
SER Analysis • For a 4-cell RAS structure, the SER is • N · (A + Δ) · P · α_f • For a 4-cell SS structure, the SER is • 4N · A · P · α_f • Where • N is the particle flux in particles · cm⁻² · s⁻¹ • A is the sensitive area per FF in cm² • P is the probability of an SET per strike in a FF • Δ is the average area overhead (routing, decoder, etc.) per FF to implement RAS • α_f is a temporal derating factor between 0 and 1
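The two expressions above can be compared numerically with a short Python sketch. The function names `ser_ras` and `ser_ss` are hypothetical, the flux is called `flux` in the code to keep it apart from the cell count `n_cells`, and every parameter value is a made-up placeholder chosen only for illustration, not data from the presentation.

```python
def ser_ras(flux, area, delta, p_set, alpha_f):
    """SER of a RAS structure: only the selected cell is exposed."""
    return flux * (area + delta) * p_set * alpha_f


def ser_ss(n_cells, flux, area, p_set, alpha_f):
    """SER of a serial-scan chain of n_cells flip-flops."""
    return n_cells * flux * area * p_set * alpha_f


# Placeholder values for illustration only (not from the presentation).
flux = 1.0e-2    # particle flux, particles / (cm^2 * s)
area = 1.0e-8    # sensitive area per FF, cm^2
delta = 0.2e-8   # RAS area overhead per FF, cm^2
p_set = 0.1      # probability of an SET per strike
alpha_f = 0.5    # temporal derating factor

ras = ser_ras(flux, area, delta, p_set, alpha_f)
ss = ser_ss(4, flux, area, p_set, alpha_f)
print(ras / ss)  # equals (area + delta) / (4 * area)
```

The flux, P, and α_f cancel in the ratio, leaving (A + Δ) / (4A) for four cells. When the overhead Δ is small relative to A, the ratio approaches 1/4, and for a chain of n cells it approaches 1/n.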
Fault Tolerant Design Using BISER-RAS [Figure: three registers R1, R2, and R3 with copies R11-R13, R21-R23, and R31-R33, built from RAS flip-flops (FF11-FF14) and C-elements; signals x1-x3, y1-y4, and ck; outputs go to the next level.]
Conclusion • The RAS design has a natural soft error tolerance capability that is inherent in its unique structure and operation. • In a circuit with N FFs, the SER of RAS can be nearly 1/N of the SER of SS. • For the ISCAS89 benchmarks, BISER-RAS saves on average 20.51% hardware over BISER applied to SS, and TMR-RAS saves on average 179.28% over TMR-SS.