Emerging Systems. Course No. 5: Expanding Bio-Inspiration: Towards Reliable MuxTree Memory Arrays, Part 2. "Politehnica" University of Timisoara. Winter Semester 2010.
Presentation Outline • Chapter 1: Bio-Inspired Reliability (with a plea for bio-inspiration and a comparison between artificial Embryonics cells and the stem cells from biology) • Chapter 2: A Bird's Eye View Over Faults (includes fault tolerance motivation, causes of unexpected soft errors, and a description of the physical phenomena involved) • Chapter 3: Embryonics and SEUs (particularities of the project, datapath model in memory structures, and reliability analysis)
Chapter 3: Embryonics and SEUs (1) Current state-of-the-art • Bio-inspired memory for Embryonics genome storage • Genome storage is critical: • it drives the actual hardware (polymerase and ribosomic genome) • it contains instructions on how additional hardware will be driven (operative genome) • No memory protection mechanisms exist currently • Protection is both desirable and feasible
Chapter 3: Embryonics and SEUs (1) 3.1. Error-Type Distribution • "By far the most common type of chip failure is a soft error of a single cell on a chip" • Multiple bit flips: 1 to 7% of the total soft fails recorded • Double bit flips: under 5% of the total events • 2 cases of quadruple bit-flip events witnessed; predicted rate of 1 in 65 years per device
Chapter 3: Embryonics and SEUs (2) 3.2. Datapath Model for Memory Structures • 3D matrix: M rows and N columns of physically identical storage molecules, each holding F 1-bit memory cells • Data circulates synchronously
Chapter 3: Embryonics and SEUs (3) 3.2. Datapath Model for Memory Structures • For each $L_{i,j}$ a vicinity $V(L_{i,j}) = \{L_{x,y}, L_{i,j}, L_{z,w}\}$ is defined • Data shifting process:
Chapter 3: Embryonics and SEUs (4) 3.2. Datapath Model for Memory Structures • Useful for error injection testing
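The datapath model and its use for error injection can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: each molecule is treated as an independent ring of F bits that rotates by one position per clock tick, and the names `make_array`, `shift_step`, `inject_seu` and `count_differences` are hypothetical helpers, not part of the Embryonics project.

```python
import random

def make_array(m, n, f, seed=0):
    """Build an M x N array of molecules, each a list of F random bits."""
    rng = random.Random(seed)
    return [[[rng.randint(0, 1) for _ in range(f)] for _ in range(n)]
            for _ in range(m)]

def shift_step(array):
    """One synchronous tick: every molecule rotates its bits by one position.

    Assumption: data circulates inside each molecule only (a simple ring).
    """
    return [[mol[-1:] + mol[:-1] for mol in row] for row in array]

def inject_seu(array, i, j, k):
    """Return a copy of the array with bit k of molecule (i, j) flipped."""
    copy = [[mol[:] for mol in row] for row in array]
    copy[i][j][k] ^= 1
    return copy

def count_differences(a, b):
    """Count the bit positions where two arrays of the same shape differ."""
    return sum(x != y
               for row_a, row_b in zip(a, b)
               for mol_a, mol_b in zip(row_a, row_b)
               for x, y in zip(mol_a, mol_b))

golden = make_array(2, 3, 4)           # M=2, N=3, F=4 (arbitrary sizes)
faulty = inject_seu(golden, 1, 2, 0)   # single-event upset in one cell
```

Comparing `shift_step(golden)` with `shift_step(faulty)` shows that, in this model, an injected upset keeps circulating with the data until some coding scheme detects or corrects it.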
Chapter 3: Embryonics and SEUs (5) 3.3. Reliability Analysis • Basic assumption: failures are exponentially distributed inside a molecule • Similar assumptions have been found to work well
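For reference, the exponential assumption corresponds to the standard textbook relations below, with a constant failure rate $\lambda$. These are the generic forms, not necessarily the exact expressions shown on the original slide.

```latex
\begin{align}
  R(t) &= e^{-\lambda t}
       && \text{(probability of surviving up to time } t\text{)} \\
  F(t) &= 1 - R(t) = 1 - e^{-\lambda t}
       && \text{(probability of failure by time } t\text{)} \\
  \mathrm{MTTF} &= \int_0^{\infty} R(t)\,dt = \frac{1}{\lambda}
       && \text{(mean time to failure)}
\end{align}
```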
Chapter 3: Embryonics and SEUs (6) 3.4. Error Coding • Failure situations: • Single failure; recovery by parity-based coding • Double failure; core affected by at least one error, at most two errors on the same row; recovery by Hamming-like codes • Multiple failure; same as the previous case, likelihood found to be minimal • Terminal failure; too many faults, cannot be recovered • No failures detected; either normal operation or an undetectable combination of errors; recovery measures are either not required or cannot be determined
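As an illustration of single-error recovery with a Hamming-like code, here is a minimal Python sketch of the textbook Hamming(7,4) code. The slides do not say which code or word size the project uses, so the code choice, the bit layout and the function names are all assumptions made for the example.

```python
def hamming74_encode(data):
    """Encode 4 data bits as the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(codeword):
    """Return (corrected codeword, error position 1..7, or 0 if none found)."""
    c = codeword[:]
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3   # syndrome read as a binary number
    if position:
        c[position - 1] ^= 1          # flip the bit the syndrome points at
    return c, position

def hamming74_decode(codeword):
    """Extract the 4 data bits from a (corrected) codeword."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]

word = [1, 0, 1, 1]
sent = hamming74_encode(word)
received = sent[:]
received[4] ^= 1                      # simulate a single bit flip
fixed, where = hamming74_correct(received)
recovered = hamming74_decode(fixed)   # equals the original data word
```

A code of this kind corrects any single flipped bit per codeword. Two flips in the same codeword produce a non-zero syndrome that points at the wrong bit, which is why the double-failure case needs a stronger code.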
Chapter 3: Embryonics and SEUs (7) 3.4. Error Coding • Strategies for tolerating faults in Embryonics • Fault tolerance at the molecular level. Advantage: faulty molecules can be isolated using the transparent reconfiguration process; Disadvantage: a considerable portion of the molecular core is taken up by redundant coding • Fault tolerance at the macro-cell level. Advantage: separate macro-cells for redundant coding and additional logic; Disadvantage: the reconfiguration process is quite difficult due to the lack of addressing
Chapter 3: Embryonics and SEUs (8) 3.5.1. Macro-Cell Level, Classic SEC
Chapter 3: Embryonics and SEUs (9) 3.5.1. Macro-Cell Level, Classic SEC
Chapter 3: Embryonics and SEUs (10) 3.5.1. Macro-Cell Level, Protochip SEC • Faults in a row are superimposed onto a protochip • In each protochip, the failure types form independent Poisson processes • a: the probability of a type A failure
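The Poisson view of failure types can be illustrated with a short Python sketch. It relies on a standard property: if faults arrive as a Poisson process with a given total rate and each fault is independently of type A with probability a, then type A faults alone form a Poisson process with rate a times the total. The second type "B", the numeric rates and the function names are hypothetical, chosen only for the example.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson variable with this mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def prob_at_most(k_max, rate, t):
    """Probability that at most k_max faults occur within time t."""
    mean = rate * t
    return sum(poisson_pmf(k, mean) for k in range(k_max + 1))

def thinned_rates(total_rate, type_probs):
    """Split one Poisson fault stream into independent per-type streams."""
    return {name: p * total_rate for name, p in type_probs.items()}

# Hypothetical numbers, for illustration only.
total_rate = 1e-3                       # faults per hour in one protochip
type_probs = {"A": 0.9, "B": 0.1}       # a = 0.9 for type A failures
rates = thinned_rates(total_rate, type_probs)

t = 500.0                               # hours
p_one = prob_at_most(1, rates["A"], t)  # at most one type A fault by time t
p_two = prob_at_most(2, rates["A"], t)  # at most two type A faults by time t
```

Under the simplifying assumption that a single-error-correcting scheme survives one fault and a double-error-correcting scheme survives two, `p_one` and `p_two` give a rough feel for the gap between the two schemes.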
Chapter 3: Embryonics and SEUs (11) 3.5.1. Macro-Cell Level, Protochip SEC
Chapter 3: Embryonics and SEUs (12) 3.5.2. Macro-Cell Level, Protochip DEC • a: the probability of a type A failure
Chapter 3: Embryonics and SEUs (13) 3.5.2. Macro-Cell Level, Protochip DEC
Chapter 3: Embryonics and SEUs (14) 3.6. Molecular Level • Molecular failure rate λ known
Chapter 3: Embryonics and SEUs (15) 3.6. Molecular Level • Reliability stays above 90% for 28.4 million hours (SEC) vs. 63.3 million hours (DEC) • Reliability drops to 50% after 89.8 million hours (SEC) vs. 154.5 million hours (DEC)
Chapter 3: Embryonics and SEUs (16) 3.7. Conclusions • Final expressions of R and MTTF quite complicated • Failure rate λ essentially empirical • determined through extensive measurements • may be affected by aggressive environments • constant → variable
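The step from a constant to a variable failure rate can be written with the general textbook form of the reliability function, in which a time-dependent rate $\lambda(\tau)$ is integrated over the mission time. This is the generic relation, not an expression taken from the slides.

```latex
\begin{equation}
  R(t) = \exp\!\left( -\int_0^{t} \lambda(\tau)\, d\tau \right),
  \qquad
  \lambda(\tau) \equiv \lambda \;\Rightarrow\; R(t) = e^{-\lambda t}
\end{equation}
```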
Chapter 3: Embryonics and SEUs (17) 3.7. Conclusions • Unfortunately, there is no accurate model for cosmic rays • Understanding the causes of soft fails and modeling them is an active field of research • Soft fails are stochastic in nature
Chapter 3: Embryonics and SEUs (18) 3.7. Conclusions • Different macro-cell configurations were considered; they may prove too small for real applications • Classic reliability analysis is difficult and is based on non-stochastic parameters • Protochip-based analysis gives similar results and is better suited to other influences (such as cosmic rays)