
Assuring Application-level Correctness Against Soft Errors Jason Cong and Karthik Gururaj


Presentation Transcript


  1. Assuring Application-level Correctness Against Soft Errors Jason Cong and Karthik Gururaj

  2. Motivation • Soft errors – issue for correct operation of CMOS circuits • Problem becomes more severe – ITRS 2009 • Smaller device sizes • Low supply voltages • Effect of soft errors on circuits • Karnik 2004, Nguyen 2003 • Effect of soft errors on software and processors • Li et al 2005, Wang et al 2004

  3. Motivation • Traditional notion of correctness • Every last bit of every variable in a program should be correct • Referred to as numerical correctness • Application-level correctness • Several applications can tolerate a degree of error • Image viewer, video decoding etc • However, there exist critical instructions even in such applications • Example: state machine in video decoder

  4. Motivation • Goal: Detect all “critical” instructions in the program • Protect “critical” instructions in the program against soft errors • Using duplication

  5. Outline • Motivation • Definition of critical instructions • Program representation • Static analysis to detect critical instructions • Profiling and runtime monitoring • Results

  6. Outline • Motivation • Definition of critical instructions • Program representation • Static analysis to detect critical instructions • Profiling and runtime monitoring • Results

  7. Defining critical instructions • Elastic outputs – program outputs which can tolerate a certain amount of error • Media applications – image, video etc. • Heuristics – Support vector machine • Characterizing quality of elastic outputs – Fidelity metric • Example: PSNR (peak signal-to-noise ratio) for JPEG, bit error rate

  8. Defining critical instructions • Given application A: • I is the input to the application • A set of outputs Oc – numerical correctness required • A set of elastic outputs O • Fidelity metric F(I,O) for elastic outputs • T – threshold for acceptable output • An execution of A is said to satisfy application-level correctness if: • All outputs ∈ Oc are numerically correct • F(I,O) ≥ T for elastic outputs • Nmin – the minimum number of elements of O that need to be erroneous for F(I,O) to fall below T
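The two conditions on this slide can be written as a single predicate. The sketch below is only an illustration: the function name, the dictionary representation of Oc, and the availability of a fault-free ("golden") copy of the exact outputs are assumptions, not part of the presentation.

```python
from typing import Any, Callable, Mapping, Sequence


def satisfies_application_level_correctness(
    exact_outputs: Mapping[str, Any],    # observed values of Oc (hypothetical layout)
    golden_exact: Mapping[str, Any],     # fault-free values of Oc (assumed available)
    program_input: Any,                  # I
    elastic_outputs: Sequence[float],    # O
    fidelity: Callable[[Any, Sequence[float]], float],  # F(I, O)
    threshold: float,                    # T
) -> bool:
    """Return True when both conditions from the slide hold."""
    # Condition 1: every output in Oc must be numerically correct.
    if dict(exact_outputs) != dict(golden_exact):
        return False
    # Condition 2: the elastic outputs must meet the fidelity threshold.
    return fidelity(program_input, elastic_outputs) >= threshold
```

Any fidelity metric with the signature F(I, O) can be plugged in, for example a PSNR computed against a reference decode of the input.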

  9. Example: JPEG decoder • PSNR of 35dB is assumed to be good quality • MSE = 20.56 • Using 8-bit pixel values (MAX=255), • Max error = 255 • For a 1024x768 pixel image, Nmin ~ 251
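The arithmetic behind these figures can be reproduced from the standard PSNR definition, PSNR = 10·log10(MAX² / MSE). The sketch below assumes the worst case in which every erroneous pixel is off by the full 255; the function names are illustrative. Under that assumption it yields about 249 pixels, close to the slide's approximate figure of 251 (the slide does not state which rounding or error model it used).

```python
import math


def mse_for_psnr(psnr_db: float, max_value: int = 255) -> float:
    """Invert PSNR = 10 * log10(MAX^2 / MSE) to get the MSE at a given PSNR."""
    return max_value ** 2 / 10 ** (psnr_db / 10)


def n_min(psnr_db: float, num_pixels: int, max_value: int = 255) -> int:
    """Fewest pixels that must be wrong for PSNR to reach the threshold.

    Assumption: each erroneous pixel carries the maximum possible error,
    so it adds max_value^2 to the total squared error.
    """
    squared_error_budget = mse_for_psnr(psnr_db, max_value) * num_pixels
    return math.ceil(squared_error_budget / max_value ** 2)


mse = mse_for_psnr(35)                 # about 20.56, matching the slide
pixels_needed = n_min(35, 1024 * 768)  # about 249 under the stated assumption
```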

  10. Defining critical instructions • An instruction X is said to be critical if • X affects one of the outputs in Oc (numerical correctness required) OR • X affects at least Nmin elements of the elastic outputs O

  11. Outline • Motivation • Definition of critical instructions • Program representation • Static analysis to detect critical instructions • Profiling and runtime monitoring • Results

  12. Program representation • LLVM compiler infrastructure • LLVM intermediate representation • Weighted program dependence graph (PDG) – G

  13. Example • LLVM IR – 3-address code

  14. Example • PDG – based on LLVM IR

  15. Example • Node for computing X

  16. Example • Node for computing X • Node (out_i) to compute C[Z]+X • Node (so) to store C[Z]+X into array output

  17. Example • Node for computing X • Node (so) to store C[Z]+X into array output • Edge to represent dependence between X and out_i • Edge to represent dependence between out_i and so

  18. Assigning edge weights • Edge weight u→v – how many instances of node v are affected by 1 instance of u? • Example: • X outside the loop, out_i inside the loop: edge weight N • Nodes out_i and so are in the same basic block: edge weight 1
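A minimal sketch of the two cases above, assuming each PDG node is annotated with the number of dynamic instances it executes (1 outside the loop, N inside it). The `edge_weight` helper and the dictionary layout are hypothetical; the slides do not say how the weights are computed in the general case, and the concrete trip count 64 is only a stand-in for N.

```python
def edge_weight(instances_u: int, instances_v: int) -> int:
    """Instances of v affected by one instance of u (illustrative rule only)."""
    # If v runs more often than u (u is outside a loop that contains v),
    # one instance of u feeds every instance of v; otherwise assume 1.
    return max(1, instances_v // instances_u)


N = 64  # assumed loop trip count, for illustration
instances = {"X": 1, "out_i": N, "so": N}
edges = [("X", "out_i"), ("out_i", "so")]

# Weighted PDG stored as {(u, v): weight}.
weights = {(u, v): edge_weight(instances[u], instances[v]) for u, v in edges}
```

Here `weights[("X", "out_i")]` is N and `weights[("out_i", "so")]` is 1, as in the slide's example.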

  19. Outline • Motivation • Definition of critical instructions • Program representation • Static analysis to detect critical instructions • Profiling and runtime monitoring • Results

  20. Static analysis for detecting critical instructions • Find how many instances of output O are affected by node x • propagate(x →v) is the number of instances of v that are affected by an instance of x

  21. Example • propagate(u→v) initialized to edge weight for all edges (u →v) • propagate(X →out_i) = N • w(out_i →so) = 1 • propagate(X →so) = propagate(X →out_i) * w(out_i →so) • More formally
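The formal definition from this slide is not part of the transcript. As a rough illustration of the chaining rule in the example, the sketch below multiplies edge weights along a dependence path. It assumes an acyclic PDG and, where several paths exist, takes the largest product; both choices are assumptions (summing over paths would be a more conservative alternative), and `propagate_to` is a hypothetical name.

```python
from functools import lru_cache
from typing import Dict, List, Tuple


def propagate_to(weights: Dict[Tuple[str, str], int], source: str, target: str) -> int:
    """Estimate how many instances of `target` one instance of `source` affects."""
    # Build an adjacency list: node -> [(successor, edge weight), ...].
    succs: Dict[str, List[Tuple[str, int]]] = {}
    for (u, v), w in weights.items():
        succs.setdefault(u, []).append((v, w))

    @lru_cache(maxsize=None)
    def reach(node: str) -> int:
        if node == target:
            return 1
        # Multiply the weight of each outgoing edge by the rest of the path;
        # a node with no path to the target contributes 0.
        return max((w * reach(v) for v, w in succs.get(node, [])), default=0)

    return reach(source)


# The slide's example with an assumed trip count N = 64:
example = {("X", "out_i"): 64, ("out_i", "so"): 1}
affected = propagate_to(example, "X", "so")  # N * 1
```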

  22. Outline • Motivation • Definition of critical instructions • Program representation • Static analysis to detect critical instructions • Profiling and runtime monitoring • Results

  23. Profiling and runtime monitoring • Static analysis is conservative in nature • May produce overly pessimistic results • Main reason – edge weights are initialized too high • Profiling with test inputs to estimate edge weights

  24. Example • Assume static analysis overestimates edge weight between sc and c_z • Profiling gives value of 1 • Node sc is likely non-critical (LNC) • Contrast this with node X which is static critical

  25. Profiling and runtime monitoring • Likely critical instructions – duplicated and checked in software • Using the SWIFT method proposed by Reis et al 2005 • Likely non-critical instructions – monitored using lightweight runtime monitoring technique • Static non-critical instructions – no error checking
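One way to read the three protection levels is as a decision over two estimates of how many elastic output elements an instruction affects: one from static edge weights and one from profiled edge weights. The sketch below follows that reading; the enum, the function name, and the exact comparison against Nmin are assumptions, not the authors' implementation.

```python
from enum import Enum


class Protection(Enum):
    DUPLICATE_AND_CHECK = "likely critical"           # software duplication (SWIFT-style)
    LIGHTWEIGHT_MONITOR = "likely non-critical"       # runtime monitoring only
    NONE = "static non-critical"                      # no error checking


def classify(static_count: int, profiled_count: int, n_min: int,
             affects_exact_output: bool = False) -> Protection:
    """Pick a protection level from static and profiled propagation counts."""
    # Anything feeding an output in Oc, or still reaching Nmin elastic
    # elements after profiling, is treated as critical.
    if affects_exact_output or profiled_count >= n_min:
        return Protection.DUPLICATE_AND_CHECK
    # Static analysis flagged it, but profiling suggests the estimate was
    # pessimistic: keep it under lightweight monitoring.
    if static_count >= n_min:
        return Protection.LIGHTWEIGHT_MONITOR
    # Even the conservative static estimate stays below Nmin.
    return Protection.NONE
```

With Nmin = 249, an instruction with a static count of 1000 and a profiled count of 1 would be monitored rather than duplicated under this sketch.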

  26. Outline • Motivation • Definition of critical instructions • Program representation • Static analysis to detect critical instructions • Profiling and runtime monitoring • Results

  27. Results • Benchmarks from MediaBench, SPEC, MiBench • Simics/GEMS simulation infrastructure

  28. Static instruction classification • Significant number of instructions are non-critical • Profiling helps to determine likely non-critical instructions

  29. Comparison with previous work • Significant savings over the approach proposed by Thaker et al. • Their approach protects all instructions which compute memory addresses and control flow

  30. Conclusion • Static + dynamic technique for detecting critical instructions • Detect several non-critical instructions • Reduce overall energy by 25%
