Designing a Fast and Adaptive Error Correction Scheme for Increasing the Lifetime of Phase Change Memories Rudrajit Datta and Nur A. Touba Computer Engineering Research Center Dept. of Electrical and Computer Engineering University of Texas at Austin
Introduction • Challenges for traditional memories • Scalability • Device leakage • Retention time • Phase Change Memories (PCM) – a possible substitute • Non-volatile • Amenable to process scaling • High density – 4x DRAM [Seznec 10]
Phase Change Memories • Crystalline state • Low resistance – ‘1’ • Amorphous state • High resistance – ‘0’ • State changes are thermally induced • Scalable • Disadvantages • Limited endurance: cells degrade relatively quickly • ~10⁷ writes per cell [Ferreira 10] • Slow writes • Using PCM in place of DRAM requires improving PCM reliability [Fantini 06]
Previous Work • Hybrid PCM/DRAM [Zhang 09] • OS-level paging scheme • BCH code correcting up to 7 errors • Slow to decode • Spreading/minimizing PCM writes • [Ferreira 10] – minimize PCM writes • [Lee 09] – buffer reorganization and partial writes
Previous Work • Solutions so far are architectural • None uses a novel error correction code (ECC) • PCM error rate is an increasing function of time • Depends on the number of writes per cell • Very different from traditional DRAM • Permanent errors accumulate over time
Proposed Scheme • Adaptive error correction • OS monitors the number of errors corrected • Signals the memory controller • Controller increases the number of check bits • Physical line size of memory unchanged • More check bits, fewer data bits • Main-memory-to-cache bandwidth affected • Cache line size decreases gradually • Minimal performance impact • Orthogonal Latin Square (OLS) codes used • Fast – single-step decoding • Modular
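The OS-to-controller signalling can be pictured with a minimal Python sketch. The slides do not say what triggers the switch, so this assumes a single global mode and a hypothetical cumulative threshold on corrected errors; the class names and the threshold are illustrative only. The 256 and 192 data bits per line come from the numerical example later in the deck.

```python
from dataclasses import dataclass

REGULAR_DATA_BITS = 256   # Words 1-4 of a line under the regular OLS code
ENHANCED_DATA_BITS = 192  # Words 1-3; the freed 64 bits hold extra check bits


@dataclass
class MemoryController:
    """Hypothetical controller holding the current ECC mode."""
    enhanced: bool = False

    def set_enhanced(self) -> None:
        # One-way switch: the line keeps its physical size, but more of it
        # becomes check bits, so fewer data bits reach the cache.
        self.enhanced = True

    @property
    def data_bits_per_line(self) -> int:
        return ENHANCED_DATA_BITS if self.enhanced else REGULAR_DATA_BITS


@dataclass
class OsErrorMonitor:
    """Hypothetical OS hook that counts corrected errors reported by the ECC."""
    controller: MemoryController
    threshold: int      # assumed switch point, not taken from the slides
    corrected: int = 0

    def on_corrected_errors(self, n: int) -> None:
        self.corrected += n
        if not self.controller.enhanced and self.corrected >= self.threshold:
            self.controller.set_enhanced()  # "signals memory controller"


controller = MemoryController()
monitor = OsErrorMonitor(controller, threshold=1000)
monitor.on_corrected_errors(999)
print(controller.data_bits_per_line)  # 256
monitor.on_corrected_errors(1)
print(controller.data_bits_per_line)  # 192
```

A real policy might track errors per page or per line instead of globally; the sketch only shows the direction of the signal and its effect on usable data bits.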
Proposed Scheme [Figure: memory line layout. Regular ECC: Word 1, Word 2, Word 3, Word 4 + OLS check bits. Enhanced ECC: Word 1, Word 2, Word 3 + OLS check bits; Words 1-3 are transferred into the cache.]
Proposed Scheme [Figure: ECC datapath. Write path: Data → Regular or Enhanced Check-bit Generator (selected by a signal from the OS) → Main Memory, stored as Information Bits + Check Bits. Read path: Information Bits and Check Bits → Regular / Enhanced Check-bit Generator → Corrected Data.]
Orthogonal Latin Square Codes • Latin square • m × m array • Each row and each column is a permutation of the digits 0, 1, …, m−1 • Orthogonal Latin squares • When two squares are superimposed, each ordered pair of digits appears exactly once • m² data bits, 2tm check bits, t-error correctable [Hsiao 70]
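To make the "single-step decode" claim concrete, here is a minimal Python sketch of a t-error-correcting OLS code with m² data bits in an m × m grid and 2t groups of m check bits: rows, columns, and 2t−2 Latin squares. Because any two data bits share at most one check, a data bit can be decoded in one step by majority vote: flip it if more than t of its 2t checks fail. The sketch assumes m is prime, so that the squares L_a(i, j) = (a·i + j) mod m for a = 1, …, m−1 are mutually orthogonal; power-of-two sizes such as the 256-bit line (m = 16) need a GF(2^k) construction, which is not shown. The function names are hypothetical.

```python
import random


def latin_maps(m, t):
    """Check-index functions, one per group (2t in total): row, column,
    and 2t-2 Latin squares L_a(i, j) = (a*i + j) mod m (m assumed prime)."""
    if 2 * t - 2 > m - 1:
        raise ValueError("prime m gives at most m-1 orthogonal Latin squares")
    maps = [lambda i, j: i, lambda i, j: j]
    for a in range(1, 2 * t - 1):
        maps.append(lambda i, j, a=a: (a * i + j) % m)
    return maps


def ols_encode(data, m, t):
    """Return the 2*t*m check bits for m*m data bits (row-major order)."""
    check = [0] * (2 * t * m)
    for g, f in enumerate(latin_maps(m, t)):
        for i in range(m):
            for j in range(m):
                check[g * m + f(i, j)] ^= data[i * m + j]
    return check


def ols_decode(data, check, m, t):
    """One-step majority decoding of the data bits, all from one syndrome."""
    syndrome = [c ^ r for c, r in zip(check, ols_encode(data, m, t))]
    maps = latin_maps(m, t)
    corrected = list(data)
    for i in range(m):
        for j in range(m):
            fails = sum(syndrome[g * m + f(i, j)] for g, f in enumerate(maps))
            if fails > t:  # majority of this bit's 2t checks disagree
                corrected[i * m + j] ^= 1
    return corrected


m, t = 5, 2  # 25 data bits, 2*t*m = 20 check bits
random.seed(0)
data = [random.randint(0, 1) for _ in range(m * m)]
check = ols_encode(data, m, t)
received = list(data)
received[3] ^= 1   # inject two data-bit errors
received[17] ^= 1
assert ols_decode(received, check, m, t) == data
```

The decode loop is written sequentially, but every bit's vote uses only the shared syndrome, which is why hardware can evaluate all of them in parallel.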
Adaptive ECC • Increase the number of check bits per line • Break the line into smaller segments • Segment sizes based on the number of data bits • Apply ECC separately to each segment • Constraint – original line size unchanged • $(\text{Data} + \text{ECC})_{\text{original}} = \sum_{\text{segments}} (\text{Data}_{\text{segment}} + \text{ECC}_{\text{segment}})$ • Overall error tolerance increases
Adaptive ECC [Figure: three line layouts. Regular: Words 1-4 + ECC_OLS. Enhanced ECC: Words 1-3 + ECC_OLS. Enhanced adaptive ECC: Segments 1-4, each with its own check bits ECC1-ECC4.]
Adaptive ECC – Numerical example • Original configuration • 3-error-correcting OLS code on a 256-bit line – 256 data + 96 check bits = 352 bits total • Corrects all patterns of 3 or fewer errors • Increased check bits • 25% of the data bits (64) now store ECC – 192 data bits remain • Two 64-bit data segments • Four 16-bit data segments • Check bits: 352 − 192 = 160 • 3-error-correcting OLS on each 64-bit segment • 2-error-correcting OLS on each 16-bit segment
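The bit budget in this example can be checked against the line-size constraint with a few lines of Python. This assumes that "k-bit OLS" on the slide means a t = k error-correcting OLS code, which uses 2tm check bits for m² data bits [Hsiao 70]; the helper name is hypothetical.

```python
import math


def ols_check_bits(data_bits, t):
    """Check bits of a t-error-correcting OLS code on m*m data bits: 2*t*m."""
    m = math.isqrt(data_bits)
    if m * m != data_bits:
        raise ValueError("OLS codes need a square number of data bits")
    return 2 * t * m


# Original line: one t = 3 OLS code over 256 data bits.
original = 256 + ols_check_bits(256, 3)

# Enhanced line: (data bits, t) for each segment.
segments = [(64, 3), (64, 3), (16, 2), (16, 2), (16, 2), (16, 2)]
data_bits = sum(d for d, _ in segments)
check_bits = sum(ols_check_bits(d, t) for d, t in segments)

print(original, data_bits, check_bits, data_bits + check_bits)  # 352 192 160 352
```

The 64-bit segments use 48 check bits each and the 16-bit segments 16 each, so the enhanced layout fills exactly the original 352 bits.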
Adaptive ECC – Numerical example • Enhanced ECC configuration corrects • 99.97% of 3-error patterns • 99.73% of 4-error patterns • … • A small fraction of 14-error patterns • Segmented ECC boosts error tolerance
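Why a segmented line can still handle some 14-error patterns becomes clearer from a simple counting model, sketched here in Python. It assumes errors land uniformly at random over all 352 stored bits and counts only the patterns in which every segment sees at most its own t errors, so that each OLS decoder is guaranteed to succeed. OLS decoding can also correct some patterns beyond t, so this is a lower bound and need not match the percentages above; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb, isqrt

# (data bits, t) per segment, from the enhanced configuration above.
CONFIG = [(64, 3), (64, 3), (16, 2), (16, 2), (16, 2), (16, 2)]
# Each segment stores d data bits plus 2*t*sqrt(d) OLS check bits.
SEGMENTS = [(d + 2 * t * isqrt(d), t) for d, t in CONFIG]


def guaranteed_fraction(k, segments=SEGMENTS):
    """Fraction of all k-error patterns in which no segment exceeds its t."""
    # poly[e] = ways to place e errors in the segments processed so far,
    # with at most t errors in each (a product of truncated polynomials).
    poly = [1]
    for n, t in segments:
        seg = [comb(n, e) for e in range(t + 1)]
        new = [0] * (len(poly) + t)
        for a, x in enumerate(poly):
            for b, y in enumerate(seg):
                new[a + b] += x * y
        poly = new
    total_bits = sum(n for n, _ in segments)
    good = poly[k] if k < len(poly) else 0
    return Fraction(good, comb(total_bits, k))


for k in (3, 4, 14, 15):
    print(k, float(guaranteed_fraction(k)))
```

Under this model the largest correctable count is the sum of the per-segment t values, 3 + 3 + 4 × 2 = 14, which is consistent with only a small fraction of 14-error patterns being corrected.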
Results Error tolerance (number of errors / number of bits × 100) for varying memory sizes
Results Percentage of memory lines still operational versus number of injected errors, over 100,000 experiments
Results SPEC2006 Benchmarks
Results SPEC2006 Benchmark – bzip2
Conclusion • Novel error correction scheme for PCM • Fast • Adaptive • Graceful decrease in memory capacity • Increases PCM lifetime • Time before switching to enhanced ECC is on the order of years