This presentation explores selective replication, a lightweight technique for mitigating soft errors in modern processors. It introduces soft errors, their causes, and their implications for reliability. The technique replicates only the most vulnerable instructions, which improves reliability with little performance degradation and small area overhead. The results show a 60% reduction in Failures In Time (FIT) with less than 4% performance degradation. The presentation concludes with a comparison to related approaches and potential future work.
Selective Replication: A Lightweight Technique for Soft Errors Xavier Vera, Jaume Abella, Javier Carretero and Antonio Gonzalez
Outline of Presentation • Introduction • Soft Errors • Selective Replication Technique • Implementation of Selective Replication • Coverage and Results of Selective Replication • Related work • Conclusion
Introduction • Causes of Soft Errors • Why the name "soft error"? • Mean Time To Failure (MTTF) • Failures In Time (FIT)
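The two metrics on this slide are linked by a fixed conversion: one FIT is one failure per 10^9 device-hours, so for a constant failure rate the MTTF in hours is 10^9 / FIT. A minimal Python sketch of that conversion follows. The function names and the example FIT value are illustrative assumptions, not figures from the presentation.

```python
HOURS_PER_BILLION = 1e9      # FIT counts failures per 10^9 device-hours
HOURS_PER_YEAR = 24 * 365    # ignoring leap years


def fit_to_mttf_hours(fit):
    """Convert a FIT rate to MTTF in hours (constant failure rate assumed)."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return HOURS_PER_BILLION / fit


def mttf_hours_to_fit(mttf_hours):
    """Convert an MTTF in hours back to a FIT rate."""
    if mttf_hours <= 0:
        raise ValueError("MTTF must be positive")
    return HOURS_PER_BILLION / mttf_hours


# Hypothetical example value, chosen only to illustrate the arithmetic.
example_fit = 1000.0
example_mttf_years = fit_to_mttf_hours(example_fit) / HOURS_PER_YEAR
```

The conversion is symmetric, so `mttf_hours_to_fit(fit_to_mttf_hours(x))` returns `x` up to floating-point rounding.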
Introduction • Which structures to protect • FIT(i) = RawErrorRate × TVF × AVFw(i) • Architectural Vulnerability Factor (AVF) • Architecturally Correct Execution (ACE)
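The per-structure formula above can be evaluated directly to see which structures dominate the failure rate. The sketch below is illustrative only: the structure names and all numeric values are hypothetical, TVF is treated simply as a derating factor between 0 and 1 because the slide does not expand it, and summing per-structure FIT values into a total is a common convention assumed here rather than something the slide states.

```python
def structure_fit(raw_error_rate, tvf, avf):
    """FIT(i) = RawErrorRate x TVF x AVF(i) for one structure."""
    for name, value in (("tvf", tvf), ("avf", avf)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    if raw_error_rate < 0:
        raise ValueError("raw_error_rate must be non-negative")
    return raw_error_rate * tvf * avf


# Hypothetical inputs: (raw error rate in FIT, TVF, AVF) per structure.
structures = {
    "issue_queue": (100.0, 0.5, 0.3),
    "rob": (200.0, 0.5, 0.1),
}

per_structure = {name: structure_fit(*params) for name, params in structures.items()}
total_fit = sum(per_structure.values())

# Rank structures by their FIT contribution, most vulnerable first.
ranking = sorted(per_structure, key=per_structure.get, reverse=True)
```

A ranking like this is one way to approach the "which structures to protect" question: protection is most valuable where the product of raw error rate and vulnerability factors is largest.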
Soft Errors • Scaling of the SER Problem • SER Processor • Implementation of the SMT Processor
Selective Replication Technique • What is Selective Replication? • Protecting the Frontend - Map table and free list protection - Logic protection • Protecting the Backend
Selective Replication Technique • Instruction vs. Vulnerability [minAVF + (maxAVF − minAVF)/25 × (i − 1), minAVF + (maxAVF − minAVF)/25 × i] • Implementing the Vulnerability Threshold
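The interval expression above divides the range between minAVF and maxAVF into 25 equal-width buckets, with i running from 1 to 25. The Python sketch below computes the bounds of bucket i and the bucket a given vulnerability value falls into. The function names are hypothetical, and the threshold rule (replicate an instruction when its vulnerability is at or above the threshold) is an assumption about how the threshold is applied, since the slide does not spell it out.

```python
NUM_BUCKETS = 25  # the slide divides the AVF range into 25 intervals


def bucket_bounds(i, min_avf, max_avf, n=NUM_BUCKETS):
    """Return (low, high) for interval i, with i in 1..n."""
    if not 1 <= i <= n:
        raise ValueError("bucket index out of range")
    width = (max_avf - min_avf) / n
    return (min_avf + width * (i - 1), min_avf + width * i)


def bucket_index(avf, min_avf, max_avf, n=NUM_BUCKETS):
    """Return the 1-based interval that contains avf.

    A value on a shared boundary is assigned to the upper interval,
    except max_avf itself, which stays in the last interval.
    """
    if max_avf <= min_avf:
        raise ValueError("max_avf must exceed min_avf")
    if not min_avf <= avf <= max_avf:
        raise ValueError("avf outside [min_avf, max_avf]")
    width = (max_avf - min_avf) / n
    return min(int((avf - min_avf) / width) + 1, n)


def select_for_replication(avfs, threshold):
    """Assumed policy: replicate instructions at or above the threshold."""
    return [avf >= threshold for avf in avfs]
```

For example, with minAVF = 0 and maxAVF = 1, a vulnerability of 0.5 falls into bucket 13, because each bucket is 0.04 wide.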
Implementation of Selective Replication • Widened ROB Validation • Selective Queue • Priority Selection (P-Sel)
Implementation of Selective Replication • Commit • Branch Misprediction • Vulnerability of the Added Hardware
Coverage and Results of Selective Replication • Functional units are protected • Logic in charge of performing the computation • Latches protected by re-execution • ROB and AGUs are not protected
Coverage and Results of Selective Replication • FIT reduction • Performance slowdown • Power and Area overhead
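The FIT reduction and the performance slowdown listed above are both relative measures against an unprotected baseline. The sketch below shows how they might be computed from baseline and protected measurements. The function names are hypothetical, and the inputs in the example are placeholders chosen to match the headline percentages, not measured data from the presentation.

```python
def fit_reduction(baseline_fit, protected_fit):
    """Fraction of the baseline FIT removed by protection."""
    if baseline_fit <= 0:
        raise ValueError("baseline_fit must be positive")
    return 1.0 - protected_fit / baseline_fit


def slowdown(baseline_time, protected_time):
    """Relative increase in execution time caused by protection."""
    if baseline_time <= 0:
        raise ValueError("baseline_time must be positive")
    return protected_time / baseline_time - 1.0


def mttf_gain(reduction):
    """MTTF multiplier implied by a FIT reduction (MTTF is proportional to 1/FIT)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must lie in [0, 1)")
    return 1.0 / (1.0 - reduction)


# Placeholder inputs: a 60% FIT reduction and a 4% slowdown.
reduction = fit_reduction(baseline_fit=100.0, protected_fit=40.0)
cost = slowdown(baseline_time=1.00, protected_time=1.04)
gain = mttf_gain(reduction)
```

Under these placeholder inputs, a 60% FIT reduction corresponds to an MTTF that is 2.5 times longer than the baseline.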
Coverage and Results of Selective Replication • SQ Size • Vulnerability Threshold • QoS
Related Work • Replicating register values into unused registers • Reis proposed replicating instructions at the compiler level • IBM G5 replicates the frontend and the execution engine
Conclusion • The article presents a new approach to improving the reliability of modern processors by selectively replicating the most vulnerable instructions • It achieves a 60% FIT reduction with less than 4% performance degradation and small area and complexity overhead