1 / 56

HUMBOLDT-UNIVERSITÄT ZU BERLIN INSTITUT FÜR INFORMATIK

HUMBOLDT-UNIVERSITÄT ZU BERLIN INSTITUT FÜR INFORMATIK. DEPENDABLE SYSTEMS Vorlesung 8 FAULT-TOLERANT SOFTWARE Wintersemester 99/00 Leitung: Prof. Dr. Miroslaw Malek www.informatik.hu-berlin.de/~rok/ftc. OBJECTIVES. TO INTRODUCE MEASURES OF SOFTWARE RELIABILITY AND COMPLEXITY

santos
Télécharger la présentation

HUMBOLDT-UNIVERSITÄT ZU BERLIN INSTITUT FÜR INFORMATIK

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. HUMBOLDT-UNIVERSITÄT ZU BERLIN INSTITUT FÜR INFORMATIK DEPENDABLE SYSTEMS Vorlesung 8 FAULT-TOLERANT SOFTWARE Wintersemester 99/00 Leitung: Prof. Dr. Miroslaw Malek www.informatik.hu-berlin.de/~rok/ftc FTC - VIII - FTS - 0

  2. OBJECTIVES • TO INTRODUCE MEASURES OF SOFTWARE RELIABILITY AND COMPLEXITY • TO INTRODUCE TECHNIQUES FOR RELIABLE AND FAULT-TOLERANT SOFTWARE DESIGN FTC - VIII - FTS - 1

  3. CONTENTS RELIABILITY MEASURES COMPLEXITY MEASURES EXCEPTION HANDLING RECOVERY BLOCKS N-VERSION PROGRAMMING FTC - VIII - FTS - 2

  4. RELIABILITY MODELSJELINSKI - MORANDA MODEL • ASSUMPTION: • A HAZARD RATE FOR FAILURES IS A PIECEWISE CONSTANT FUNCTION AND IS PROPORTIONAL TO THE REMAINING NUMBER OF ERRORS  • z(t) = C [ N - (i - 1) ] • C - PROPORTIONALITY CONSTANT • N - THE NUMBER OF ERRORS INITIALLY IN THE PROGRAM • z(t) IS TO BE APPLIED IN THE INTERVAL BETWEEN DETECTION OF ERROR (i -1) AND DETECTION OF ERROR i • R(t) = EXP { - C [N-(i-1) t] } • MTTF = 1/ { C [N-(i-1)] } FTC - VIII - FTS - 3

  5. SHOOMAN'S MODEL (SIMILAR TO JELINSKI-MORANDA'S) • ASSUMPTIONS: • FAILURE RATE (HAZARD FUNCTION) IS PROPORTIONAL TO THE NUMBER OF REMAINING ERRORS. ERRORS ARE REMOVED AS SOON AS THEY ARE DISCOVERED • R(t) = EXP { -C[ET/IT - EC/IT] t } • ET - THE NUMBER OF ERRORS INITIALLY PRESENT IN THE PROGRAM • IT - THE NUMBER OF STATEMENTS IN THE PROGRAM • EC - THE NUMBER OF ERRORS CORRECTED SO FAR BOTH MODELS ARE ALMOST IDENTICAL FTC - VIII - FTS - 4

  6. SCHNEIDEWIND'S APPROACH • SCHNEIDEWIND APPROACHED SOFTWARE RELIABILITY MODELING FROM EMPIRICAL VIEWPOINT. HE RECOMMENDED INVESTIGATION OF DIFFERENT RELIABILITY FUNCTIONS SUCH AS THE EXPONENTIAL, NORMAL, GAMMA AND WEIBULL AND CHOOSING THE DISTRIBUTION THAT BEST FITS THE PARTICULAR PROJECT IN QUESTION. FTC - VIII - FTS - 5

  7. MUSA'S MODEL(EXECUTION TIME MODEL) • ASSUMPTIONS: • EXECUTION TIME (PROCESS TIME) IS THE BASE FOR SOFTWARE RELIABILITY THEORY INSTEAD OF CALENDAR TIME. z(t) IS PROPORTIONAL TO THE NUMBER OF REMAINING ERRORS. EXECUTION TIME RATE OF CHANGE OF THE NUMBER OF FAULTS CORRECTED IS PROPORTIONAL TO THE HAZARD RATE. • R(t) = EXP { [ -fK(N-m)] t } • t - THE EXECUTION TIME • f - AVERAGE INSTRUCTION EXECUTION TIME DIVIDED BY THE TOTAL NUMBER OF INSTRUCTIONS. • K - PROPORTIONALITY CONSTANT • N -THE NUMBER OF ERRORS INITIALLY PRESENT IN THE PROGRAM • m -THE NUMBER OF ERRORS CORRECTED SO FAR FTC - VIII - FTS - 6

  8. BAYESIAN MODELLITTLEWOOD'S MODEL • ASSUMPTIONS: • EXPONENTIAL DISTRIBUTION OF FAILURES WITH RESPECT TO OPERATING TIME OF THE PROGRAM • INDEPENDENT ERROR DISTRIBUTION • FAILURE RATE PARAMETER IS A RANDOM PROCESS • ERROR IS REMOVED UPON OCCURRENCE • z(t) = (N-i)a / (b + t + t1) • a, b - PARAMETERS OF THE GAMMA DISTRIBUTION • t - TIME ELAPSED SO FAR • R(t) = {1- [(b+t) / (b + t0 + t1)]b +[(b + t0) / (b + t0 + t1 + t)] a} N-i • t1 - EXECUTION TIME FTC - VIII - FTS - 7

  9. CONCLUSIONS • SOFTWARE RELIABILITY MEASURES CAN AT PRESENT BE ACHIEVED WITH PRETTY GOOD ACCURACY IF PROGRAMMING TEAM HAS A SUBSTANTIAL TRACK DATA AND LOTS OF RELIABILITY DATA TO SUPPORT IT • NO STANDARD OR WIDELY ACCEPTED RELIABILITY MODEL EXISTS • LOTS OF EXPERIMENTAL DATA ARE NEEDED • SOFTWARE RELIABILITY MEASUREMENTS SHOULD BE AUTOMATED AND BUILT IN ANY LARGE SOFTWARE PROJECT • PROGRAM COMPLEXITY MEASURES MAY ALSO HELP IN EVALUATION OF SOFTWARE RELIABILITY FTC - VIII - FTS - 8

  10. SOFTWARE COMPLEXITY MEASURES • COMPLEXITY MEASURES SHOULD REFLECT THE DEGREE OF DIFFICULTY INHERENT IN DESIGNING AND IMPLEMENTING INTRICATE DATA STRUCTURES, ALGORITHMS, PROGRAM STRUCTURES, AND INTERMODULAR COMMUNICATION SCHEMES • OVER 300 DIFFERENT SOFTWARE METRICS HAVE BEEN DEFINED AND, TO SOME DEGREE, EXAMINED FOR VALIDITY AND USEFULNESS • TYPICAL MEASURE: • LINES OF CODE • TYPICAL INDUSTRY STANDARDS: • FROM 0.3 TO 2 BUGS PER 1000 LINES OF CODE • TYPICAL REALITY: • ONE BUG PER 100 LINES OF CODE • WORST CASE SEEN: • 2.2 BUGS PER INSTRUCTION • QUALITY ASSURANCE COSTS: • FROM 2% to 80% OF THE TOTAL SOFTWARE DEVELOPMENT COST FTC - VIII - FTS - 9

  11. A STATISTICAL APPROACH TO PROGRAM COMPLEXITY • THE STATISTICAL APPROACH IS BASED ON THE VIEWPOINT THAT THE COMPLEXITY OF THE PROGRAM IS RELATED TO THE NUMBER OF OPERATORS (e.g., +, -, *, /, >, <, IF THEN ELSE, DO WHILE) AND OPERANDS (VARIABLES AND CONSTANTS*) • EXAMPLE: • A FORTRAN PROGRAM WHICH USES NEWTON'S METHOD TO SOLVE FOR THE SQUARE ROOT OF A NUMBER • THE FOLLOWING (SEE NEXT PAGE), IS A TYPICAL PROGRAM TO EVALUATE THE SQUARE ROOT (B) OF A NUMBER (X) * CHARACTER STRING VARIABLES AND CHARACTER STRING CONSTANTS MUST ALSO BE COUNTED FTC - VIII - FTS - 10

  12. READ(5,1)X 1 FORMAT(F10.5) A = X/2 2 B = (X/A + A)/2 C = B - A IF(C.LT.0)C = -C IF(C.LT.10.E-6)GO TO 3 A = B GO TO 2 3 WRITE (6,1)B STOP END 1. THIRTEEN DISTINCT OPERATORS (OPERATOR TYPES): READ, FORMAT, =, /, ( ), +, -, .LT., IF, GO TO, WRITE, STOP, AND END 2. TWENTY-FOUR TOTAL OPERATORS: FIVE =; THREE/; TWO EACH OF ( ), -, .LT., IF, AND GO TO; AND ONE EACH OF THE OTHER SIX 3. ELEVEN DISTINCT OPERANDS (OPERAND TYPES): X, F 0.5, A, 2, B, C, 10.E - 6, 0, AND THE LABELS 1, 2, AND 3 4. TWENTY-FIVE TOTAL OPERANDS: FIVE EACH OF A AND C, FOUR B; THREE X; TWO 2; AND ONE EACH OF THE OTHER SIX THE PROGRAM FTC - VIII - FTS - 11

  13. ESTIMATION OF PROGRAM LENGTH A PRIORI • ESTIMATE THE NUMBER OF OPERATOR TYPES n1. • ESTIMATE THE NUMBER OF DISTINCT OPERANDS n2 BY COUNTING INPUT VARIABLES, OUTPUT VARIABLES, INTERMEDIATE VARIABLES, AND CONSTANTS WHICH WILL BE NEEDED. • ESTIMATE THE PROGRAM LENGTH N = t(0.5772 + lnt) (ZIPF'S LAW) • WHERE t = n1+n2 FTC - VIII - FTS - 12

  14. ESTIMATION OF PROGRAM LENGTH FOR SEVERAL EXAMPLES From M. L. Shooman FTC - VIII - FTS - 13

  15. HALSTEAD'S MEASURE N = n1 LOG n1 + n2 LOG n2 • N - PROGRAM LENGTH (TOTAL NUMBER OF OPERATORS AND OPERANDS) • PROGRAM VOCABULARY • n1 - NUMBER OF OPERATOR TYPES • n2 - NUMBER OF OPERAND TYPES • HALSTEAD AND ZIPF'S LENGTH ARE CLOSELY RELATED SINCE t = n1+n2 THEN  N = (n1 + n2) [0.5772 + LN (n1 + n2)] FTC - VIII - FTS - 14

  16. GRAPH THEORETICAL COMPLEXITY MEASURES(CONTROL FLOW COMPLEXITY) MCCABE'S CYCLOMATIC COMPLEXITY • CYCLOMATIC NUMBER (NULLITY) FOR A STRONGLY CONNECTED GRAPH IS: m(G) = E - N + K • m(G) - THE CYCLOMATIC NUMBER FOR GRAPH G • E - THE NUMBER OF DIRECTED EDGES (ARCS) IN G • N - THE NUMBER OF VERTICES IN G • K - THE NUMBER OF SEPARATE PARTS (COMPONENTS) IN G (K = 1 FOR A SINGLE PROGRAM) • m(G) CORRESPONDS TO: • THE NUMBER OF FUNDAMENTAL CIRCUITS IN GRAPH G • THE NUMBER OF BASIC PATHS THAT CAN BE COMBINED TO MAKE UP ANY POSSIBLE CIRCUIT ON THE GRAPH • THE NUMBER OF CHORDS (EDGES FORMING FUNDAMENTAL CIRCUITS FROM A SPANNING TREE OF GRAPH G)  • m(G) IS ALSO A GOOD SOFTWARE COMPLEXITY MEASURE FTC - VIII - FTS - 15

  17. a 1 Added phantom b edge (to make a graph 2 strongly-connected) c 3 d 10 4 e 13 12 g 5 11 f 6 g 7 h 8 i EXAMPLE McCABE NUMBER IS m(G) = e - n + 1=13 - 10 + 1= 4 In practice all modules in top-down structured design should have m(G) = 10 FTC - VIII - FTS - 16

  18. HSU AND MALEK'S CUMULATIVE MINIMAL CUT-SETS MEASURE (MCS) • MCS IS A SUM OF THE NUMBER OF EDGES CUT AFTER EVERY DECISION BLOCK MCS=5 MCS=6 MCS=7 MCS=8 (a) (b) (c) (d) S-graphs. FTC - VIII - FTS - 17

  19. RELIABLE SOFTWARE DESIGN • FAULT ANALYSIS - use fault trees • FAULT DETECTION - use methods (as shown in Testing Techniques Unit) • DAMAGE CONFINEMENT AND ASSESSMENT • RECOVERY • FAULT TREATMENT AND CONTINUED SERVICE FTC - VIII - FTS - 18

  20. FAULT ANALYSIS (1) • Fault characterization  • Faults are detected when an operation violates its specification • Three modes of operation: • The operation may return • An exception may be raised • The operation may fail to terminate FTC - VIII - FTS - 19

  21. FAULT ANALYSIS (2) • OPERATION TERMINATES • BY RETURNING AND • THE POSTCONDITION IS SATISFIED (CORRECT OUTCOME) • THE POSTCONDITION IS NOT SATISFIED • WHEN AN EXCEPTION IS RAISED • BY THE OPERATION AND • THE EXCEPTION CONDITION IS SATISFIED (CORRECT OUTCOME) • THE EXCEPTION CONDITION IS NOT SATISFIED • BY ANOTHER OPERATION AND • THE EXCEPTION CONDITION IS SATISFIED • THE EXCEPTION CONDITION IS NOT SATISFIED • OPERATION FAILS TO TERMINATE AND • THE OPERATION MAKES NO FURTHER CHANGES TO ITS OBJECT • THE OPERATION CHANGES ITS OBJECT REPEATEDLY FTC - VIII - FTS - 20

  22. OPERATIONS OUTCOME(Fig. 1 I. Durham's Dissertation, CMU, 1986)DAMAGE CONFINEMENT AND ASSESSMENT • CONFINE THE DAMAGE • SPREAD DURING THE TIME BETWEEN THE ERRONEOUS TRANSITION AND THE DETECTION OF THE ERROR • PLACE CONSTRAINTS ("FENCES") ON SYSTEM INFORMATION FLOW • ATOMIC ACTIONS • ASSESS THE DAMAGE • BEFORE STARTING RECOVERY • BASED ON PREDICTABLE PROCESS INTERACTIONS FTC - VIII - FTS - 21

  23. T 1 P1 T T 3 2 P2 T TIME PRESENT ATOMIC ACTIONS • GENERALIZATION OF A TRANSACTION FROM DATABASE DESIGN • NO INTERACTIONS BETWEEN THE ACTIVITY OF A GROUP OF COMPONENTS AND THE REST OF A SYSTEM DURING THE ACTIVITY • PARALLELISM INCREASES SPREAD OF DAMAGE • PREVENT UNWANTED INFORMATION FLOW • FIREWALLS (FENCES) • LIMIT SPREAD OF DAMAGE EVEN IF FAULTS ARE PRESENT TPRESENT - time of error detection  T3 - time when process P2 starts autonomous action. (If a fault occurs in P2 after T3 then P2 has no impact on P1)  If a fault occurs in P1 then both P1 and P2 should be recovered FTC - VIII - FTS - 22

  24. HARDWARE ENFORCED ATOMIC ACTIONS • MESSAGE PASSING SYSTEMS • SIGNAL EXCEPTION IF ATTEMPT FROM WITHIN ATOMIC ACTION TO SEND OR RECEIVE A MESSAGE FROM OUTSIDE OF ATOMIC ACTION • COMMUNICATIONS THROUGH SHARED OBJECTS • CONSTRAIN EXECUTION SEQUENCES BY ALLOWING ONLY PROCESSES INVOLVED IN ATOMIC ACTION TO EXECUTE • USE SEPARATE PROCESSORS • CONSTRAIN DATA ACCESS BY ISOLATING OBJECTS WITHIN AN ATOMIC ACTION FROM OTHER OBJECTS • LOCKS, SEMAPHORES, MONITORS • INTERRUPT DISABLING • DATABASE LOCKING • PROTECTION MECHANISMS FTC - VIII - FTS - 23

  25. PROTECTION MECHANISMS • LIST-BASED SCHEMES • ASSOCIATE WITH EACH POTENTIALLY SHAREABLE OBJECT A LIST OF AUTHORIZED USERS; CHECK EACH ACCESS FOR AUTHORIZATION • OBJECT • PROCESS X • PROCESS Y • PROCESS Z • EASY TO ENFORCE • COSTLY RUN-TIME EFFORT • TICKET-BASED SCHEMES • ASSOCIATE WITH EACH PROCESS A LIST OF RIGHTS TO ACCESS AN OBJECT; CHECK EACH PROCESS FOR ITS ACCESS TICKET • CAPABILITY-BASED • READ, WRITE, EXECUTE CAPABILITIES • PROCESS • OBJECT X • OBJECT Y • OBJECT Z • EFFICIENT RUN-TIME EFFORT • DIFFICULT TO ENFORCE • PROCESS MIGHT INCORRECTLY GIVE TICKET AWAY FTC - VIII - FTS - 24

  26. RECOVERY • ROLL-FORWARD (FORWARD ERROR RECOVERY) • REAL-TIME SYSTEM (OLD DATA ARE OBSOLETE) • DAMAGE IS PREDICTABLE • PERFORMANCE IS MORE IMPORTANT THAN CORRECTNESS • ROLL-BACK (BACKWARD ERROR RECOVERY) • UNPREDICTABLE DAMAGE • MORE EXPENSIVE • CORRECTNESS IS MORE IMPORTANT THAN PERFORMANCE FTC - VIII - FTS - 25

  27. ROLL-BACK TECHNIQUES • CHECKPOINTING • AUDIT TRAILS (AUDIT LOGS) • FILE CHECKPOINTS • TRANSACTION PROCESSING • RECOVERY CACHES (SPECIAL TYPE OF CHECKPOINT TO MINIMIZE STORAGE) • OBJECTIVE: • REPLACEMENT OF CURRENT FAULTY STATE WITH PREDEFINED CORRECT STATE FTC - VIII - FTS - 26

  28. FAULT-TOLERANT SOFTWARE DEFINITIONS • FAULT-TOLERANT SOFTWARE IS CONCERNED WITH TECHNIQUES WHICH ENABLE A SYSTEM TO TOLERATE SOFTWARE FAULTS • SOFTWARE FAULT TOLERANCE IS CONCERNED WITH TECHNIQUES DESIGNED TO TOLERATE THE EFFECTS OF FAULTS IN THE UNDERLYING (HARDWARE) INTERPRETER. THOSE TECHNIQUES ARE IMPLEMENTED IN SOFTWARE MAIN FAULT-TOLERANT SOFTWARE TECHNIQUES  • EXCEPTION HANDLING • RECOVERY BLOCKS (ROLL-BACK) • N-VERSION PROGRAMMING • ROLL-FORWARD FTC - VIII - FTS - 27

  29. EXCEPTION • AN INDICATION THAT SOMETHING OUT OF THE ORDINARY HAS OCCURRED • SHOULD BE DETECTED AND RAISED • EXAMPLES • NONEXISTENT MEMORY ACCESS (OR OTHER PROTECTION VIOLATION) • DIVISION BY ZERO • UNDERFLOW OR OVERFLOW • SYSTEM CRASH • TIMEOUT FTC - VIII - FTS - 28

  30. EXCEPTION MECHANISMS • PROVIDE CAPABILITY IN THE PROGRAM TO EXPLICITLY RAISE AN EXCEPTION IF THE PROGRAM DETECTS AN ERRONEOUS STATE • RAISE (EXCEPTION NAME, OTHER PARAMETERS) • ALLOW A SYSTEM ROUTINE TO SIGNAL THE RAISING OF AN EXCEPTION BY A CALLING PROGRAM • SIGNAL (EXCEPTION NAME, OTHER PARAMETERS) • ALLOW USER-DEFINED EXCEPTIONS • EACH PROCEDURE SHOULD HAVE TWO EXITS: • A STANDARD EXIT AND AN EXCEPTION EXIT FTC - VIII - FTS - 29

  31. EXCEPTION EXAMPLES • OV IS THE OVERFLOW EXCEPTION; BACKWARD ERROR RECOVERY IS EXPLICITLY PROGRAMMED • ADD PROCEDURE I = I + J + K PROC P SIGNALS OW; BEGIN I = I + J [OV -> SIGNAL OW]; /*EXPLICITLY PROGRAM */ I = I + K [OV -> I = I - J; SIGNAL OW]; /*BACKWARD RECOVERY*/ END; • PROPAGATION OF THE EXCEPTION IS MASKED BY THE PROCEDURE; THE PROCEDURE PROVIDES STANDARD SERVICE IN SPITE OF THE EXCEPTION WHICH OCCURRED IN THE CALLED ROUTINE PROC CREATE (Z: POSITIVE.INTEGER) SIGNALS NO.SPACE; BEGIN CALL M1.ALLOCATE(Z) [DISK.OVERFLOW -> CALL M2.ALLOCATE(Z) [DISK.OVERFLOW -> RECOVER RECOVERY POINT; SIGNAL NO.SPACE]]; . . . END; FTC - VIII - FTS - 30

  32. THE RECOVERY BLOCK SCHEME (1) • PRIMARY MODULE • DEBUGGED AND TESTED IRREDUNDANT SOFTWARE WHICH HOPEFULLY MEETS SPECIFICATIONS • ACCEPTANCE TEST • MEASURES FOR ERROR DETECTION IN THE FORM OF A CHECK IN THE REASONABLENESS OF THE CALCULATED RESULTS • STRUCTURE • PRIMARY MODULE ACCEPTANCE TEST FTC - VIII - FTS - 31

  33. THE RECOVERY BLOCK SCHEME (2) • ACCEPTANCE TEST SHOULD RAISE AN EXCEPTION IF THE STATE OF THE SYSTEM IS NOT ACCEPTABLE • THE MODULE IS SAID TO BE FAILED IF ANY EXCEPTION IS RAISED OR SIGNALED BY THE ACCEPTANCE TEST • RECOVERY POINT IS ESTABLISHED AT THE BEGINNING OF THE PRIMARY MODULE • ALTERNATE MODULE IS A STANDBY SOFTWARE TO BE EXECUTED IF PRIMARY MODULE FAILS FTC - VIII - FTS - 32

  34. THE SYSTEM WILL NOW CONSIST OF: • ESTABLISH RECOVERY POINT • PRIMARY MODULE • ACCEPTANCE TEST • ALTERNATE MODULE • ACCEPTANCE TEST • ADDITIONAL ALTERNATE MODULES CAN BE PROVIDED TO ENSURE BETTER SOFTWARE FAULT TOLERANCE FTC - VIII - FTS - 33

  35. RECOVERY BLOCK STRUCTURE ENSURE < ACCEPTANCE TEST > BY < PRIMARY MODULE > ELSE BY < ALTERNATE MODULE 1 > ELSE BY < ALTERNATE MODULE 2 > ELSE BY < ALTERNATE MODULE N > ELSE ERROR • NOTICE THAT THE RECOVERY BLOCK SCHEME IS INDEPENDENT OF • PROGRAMMING STYLE • METHODOLOGY • LANGUAGE FTC - VIII - FTS - 34

  36. EXAMPLE ENSURE A[j+1]  A[j] FOR j=1,2,...,n-1 BY SORT A USING QUICKSORT ELSE BY SORT B USING HEAPSORT ELSE BY SORT C USING BUBBLESORT ELSE ERROR FTC - VIII - FTS - 35

  37. OPTIMUM INTERVAL BETWEEN RECOVERY POINTS • OPTIMUM INTERVAL BETWEEN RECOVERY POINTS SHOULD BE ESTABLISHED ON THE BASIS OF • COMPLEXITY MEASURES OR • 2tT • WHERE t IS TIME TAKEN TO ESTABLISH A RECOVERY POINT • T IS THE MEAN TIME BETWEEN RECOVERY POINT RESTORATIONS • IF RECOVERY POINTS ARE TOO FREQUENT THEN HIGH PROPORTION OF TIME WILL BE SPENT RECORDING RECOVERY DATA. IF THE POINTS ARE TOO RARE THEN IT MAY TAKE A LONG TIME TO RECOVER. FTC - VIII - FTS - 36

  38. i REQUEST REQUEST i + 1 PM AM 7 ms 3 ms 10 ms ONE SECOND CASE STUDY - THE DEADLINE MECHANISM • THE DEADLINE MECHANISM IS INTENDED TO SUPPORT SOFTWARE FAULT TOLERANCE IN REAL-TIME APPLICATIONS WHERE A PROGRAM HAS TO SATISFY REQUESTS FOR SERVICE WITH A GIVEN TIME LIMIT (DEADLINE) OR ELSE SYSTEM FAILURE IS LIKELY TO ENSUE EXAMPLE: NAVIGATION PROGRAM OUTLINE EVERY SECOND WITHIN TEN MS CALCULATE BY READ SENSORS; CALCULATE NEW POSITION PRIMARY MODULE(PM)  ELSE BY APPROXIMATE NEW POSITION FROM OLD POSITION ALTERNATE MODULE(AM) TIMING CONSTRAINTS FOR NAVIGATION PROGRAM • PRIMARY FIRST THEN ALTERNATE IF NECESSARY (OPTIMISTIC SCHEME) OR • ALTERNATE FIRST AND THEN PRIMARY (PESSIMISTIC SCHEME) FTC - VIII - FTS - 37

  39. TECHNIQUES FOR THE ACCEPTANCE TESTS • REASONABLENESS TESTS • TIMING TESTS (TIMEOUTS) • REVERSAL TESTS • REPLICATION TESTS • ACCEPTANCE TEST HAS A MAJOR IMPACT ON THE EFFECTIVENESS OF A RECOVERY BLOCK METHOD • COMPLETENESS AND THEREFORE THE SIZE OF THE ACCEPTANCE TEST DEPENDS ON THE SYSTEM REQUIREMENTS  • DIFFICULT PROBLEM: • TEST TIME AND STORAGE OVERHEAD VERSUS COMPLETENESS • ACCEPTANCE TEST OVERHEADS ARE USUALLY BETWEEN 3% AND 40% FTC - VIII - FTS - 38

  40. THE N-VERSION PROGRAMMING • N-VERSIONS OF A PROGRAM (N>1), WHICH HAVE BEEN INDEPENDENTLY DESIGNED TO SATISFY A COMMON SPECIFICATION, ARE EXECUTED AND THEIR RESULTS COMPARED BY VOTING MECHANISM • VERSIONS MAY BE WRITTEN BY DIFFERENT PROGRAMMERS AND/OR DIFFERENT ALGORITHMS MAY BE USED • IMPLEMENTATIONS MAY BE ON: • A SINGLE PROCESSOR • MULTIPLE HOMOGENEOUS PROCESSORS • MULTIPLE HETEROGENEOUS PROCESSORS FTC - VIII - FTS - 39

  41. DRIVER PROGRAM • THE DRIVER PROGRAM (CONTROL PROGRAM) IS NEEDED • THE DRIVER PROGRAM IS RESPONSIBLE FOR: • INVOKING EACH OF THE VERSIONS • WAITING FOR THE VERSIONS TO COMPLETE THEIR EXECUTION • COMPARING AND ACTING UPON THE N SETS OF RESULTS FTC - VIII - FTS - 40

  42. VOTING CHECK • VOTING CHECK IS PERFORMED BY THE DRIVER PROGRAM OR A HARDWARE VOTER • IT SHOULD BE APPLICATION-INDEPENDENT BUT IN PRACTICE IT VERY OFTEN NEEDS "INEXACT" COMPARISONS DUE TO THE ROUNDOFF ERRORS IN e.g., ARITHMETIC OPERATIONS • SOME FORM OF RANGE CHECK CAN BE USED BUT IN GENERAL "INEXACT VOTING" REQUIREMENT IS A PRETTY DIFFICULT PROBLEM FTC - VIII - FTS - 41

  43. CASE STUDYONN-VERSION PROGRAMMING(NVP) • DESIGN DIVERSITY APPLIES TO BOTH: • HARDWARE AND SOFTWARE  • NVP PROMISES • TOLERANCE OF DESIGN FAULTS (WHICH ARE USUALLY THE MOST DIFFICULT TO DETECT) • Based on: A. Avizienis and J.P.J. Kelly, "Fault Tolerance by Design Diversity: Concepts and Experiments", Computer, August, 1984 FTC - VIII - FTS - 42

  44. N-VERSION PROGRAMMING(NVP) • TWO REQUIREMENTS FOR SUCCESSFUL NVP (N-VERSION PROGRAMMING) AND/OR NMR (N-MODULAR REDUNDANCY) • THE CONSISTENCY OF INITIAL CONDITIONS AND INPUTS FOR ALL N COMPUTATIONS MUST BE ASSURED • A RELIABLE DECISION ALGORITHM THAT DETERMINES A SINGLE DECISION RESULT FROM THE MULTIPLE RESULTS MUST BE PROVIDED • TIME-SPACE TRADEOFFS • SEQUENTIAL VS. CONCURRENT • EXECUTION OF N VERSIONS FTC - VIII - FTS - 43

  45. NVP - EXPERIMENT (1) TWO QUESTIONS: • WHAT IS REQUIRED IN TERMS OF • FORMAL SPECIFICATIONS • CHOICES OF PROBLEMS • NATURE OF ALGORITHMS • TIMING CONSTRAINTS (SYNCHRONIZATION) • DECISION ALGORITHMS TO MAKE NVP FEASIBLE? • WHAT METHODS SHOULD BE USED TO COMPARE THE COST AND THE EFFECTIVENESS OF THE NVP APPROACH TO: • SINGLE VERSION PROGRAMMING AND THE RECOVERY BLOCK METHOD? FTC - VIII - FTS - 44

  46. NVP - EXPERIMENT (2) • 30 PROGRAMMERS WERE SELECTED • STEPS OF EXPERIMENT • RECRUITING • TEACHING OBJ AND PDL (SPECIFICATION LANGUAGES) • EXAMINING AND RANKING • ASSIGNING THE PROBLEM • EVALUATING PROGRAMS • PROBLEM (SPECS WRITTEN IN OBJ, PDL AND ENGLISH) • AIRPORT SCHEDULER (OPERATION OF AN AIRPORT, ARRIVING AND DEPARTING FLIGHTS, SEAT RESERVATION) • 100 TRANSACTIONS WITH N-VERSION • CROSS-CHECK AFTER EACH TRANSACTION FTC - VIII - FTS - 45

  47. NVP - EXPERIMENT (3) • OBJ SPECIFICATION WAS 13 PAGES LONG • PDL SPECIFICATION WAS 74 PAGES LONG • ENGLISH SPECIFICATION WAS 10 PAGES LONG • AFTER 4 WEEKS 18 WORKING PROGRAM VERSIONS OF THE AIRPORT SCHEDULER WERE RECEIVED. PROGRAMS PASSED THE STANDARD VIABILITY TEST • 11 OF 18 PROGRAMS ABORTED ON INVALID INPUT • (VERY DANGEROUS IF NVP IS IMPROPERLY IMPLEMENTED; IT MAY STOP THE SYSTEM) • THEREFORE AN ACCEPTANCE TEST WITH RECOVERY WAS ADDED TO ALL 11 PROGRAMS FTC - VIII - FTS - 46

  48. NVP - EXPERIMENT (4) • 18 PROGRAMS • 7 OBJ • 5 PDL • 6 ENGLISH • Table 1. Characteristics of the 18 versions of programs written in OBJ, PDL, and English by UCLA study participants. For each program version, the number of PL/1 statements used in the program (PL/1 STMTS), the number of procedures used (NO. PROCS), the compile time (COMPILE TIME), the execution time for the 100 transaction test case (EXEC. TIME), and the program size in bytes (SIZE BYTES) are shown. The time unit is a machine-unit second, which is a measure of time, and other resources, such as I/O needs. FTC - VIII - FTS - 47

  49. NVP – EXPERIMENT (Table 1) NAME OF PL/1 NO. COMPILE EXEC. SIZE VERSION STMTS PROCS TIME TIME BYTES OBJ1 423 22 15.14 3.89 37600 OBJ2 400 28 11.35 3.96 28048 OBJ3 398 17 7.42 4.33 30904 OBJ4 328 14 8.62 4.77 29920 OBJ5 455 14 14.79 3.10 32304 OBJ6 243 16 4.71 2.70 20960 OBJ7 336 23 8.30 4.92 34808 PDL1 455 27 16.96 3.16 24928 PDL2 501 33 19.58 19.68 29656 PDL3 242 19 4.31 4.09 27360 PDL4 437 39 16.31 2.84 30896 PDL5 217 11 4.26 4.30 26440 ENG1 260 21 4.75 3.33 27552 ENG2 372 19 12.41 3.89 37792 ENG3 385 30 8.12 2.41 20648 ENG4 689 25 28.23 2.94 26864 ENG5 481 15 8.76 2.42 24056 ENG6 387 12 19.23 3.99 24656 FTC - VIII - FTS - 48

  50. TEST RESULTS (Table 2) • Test results for individual versions of the 18 programs written in OBJ, PDL, and English OK COSMETIC GOOD DETECTED UNDET. VERSION POINTS ERRORS OK OR COS. ERRORS ERRORS OBJ1 73 0 73 2 25 OBJ2 71 18 89 8 3 OBJ3 67 11 78 4 18 OBJ4 69 3 72 8 20 OBJ5 67 12 79 0 21 OBJ6 46 0 46 0 54 OBJ7 52 17 69 7 24 PDL1 59 2 61 1 38 PDL2 54 2 56 32 12 PDL3 95 0 95 4 1 PDL4 45 28 73 0 27 PDL5 94 0 94 5 1 ENG1 74 12 86 0 14 ENG2 67 27 94 0 6 ENG3 97 1 98 0 2 ENG4 30 5 35 25 40 ENG5 55 6 61 0 39 ENG6 53 3 56 9 35 •OK - EXPECTED RESULT; •COSMETIC-BAD FORMAT •DETECTED (D) ERROR - IF PROGRAM FAILED AT TYPOS, SPELLING; •UNDETECTED (U or U*) •IF DETECTED BY COMPARISON WITH OTHER VERSIONS U-DISTINCT U*-SIMILAR •GOOD =OK+COSMETICS(G) FTC - VIII - FTS - 49

More Related