
“Revisiting Fault Diagnosis Agreement in a New Territory”

“Revisiting Fault Diagnosis Agreement in a New Territory”. S. C. Wang and K. Q. Yan, Operating Systems Review, April 2004, pp. 41–61. An extension of the Byzantine Generals algorithm, and hot off the press.


Presentation Transcript


  1. “Revisiting Fault Diagnosis Agreement in a New Territory”
     • S. C. Wang and K. Q. Yan
     • Operating Systems Review, April 2004, pp. 41–61
     • An extension of the Byzantine Generals algorithm, and hot off the press

  2. Agreement Problem
     • In the Byzantine Generals problem, a commanding general issues an “order”, and all loyal lieutenant generals must agree on the same order.
     • A related subproblem is the consensus problem: each processor has its own initial value and must communicate with all other processors so that the healthy processors reach a common value.

  3. Consensus constraints
     • All the healthy processors agree on a common value (Consensus).
     • If ALL the processors share a common initial value v_i, then all the healthy processors must agree on v_i.
     Most protocols for solving Byzantine Agreement or consensus are fault-masking protocols: they reach consensus without the faults affecting the outcome.
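The two consensus constraints can be written as a small check. This is a minimal sketch and not code from the paper; the function name and the dictionary representation of initial and decided values are assumptions made for illustration.

```python
def satisfies_consensus(initial, decided, healthy):
    """Check the two consensus constraints from the slide.

    initial, decided: dict mapping processor id -> value (hypothetical layout).
    healthy: set of processor ids considered healthy.
    """
    decided_values = {decided[p] for p in healthy}

    # Constraint 1: all healthy processors agree on one common value.
    agreement = len(decided_values) == 1

    # Constraint 2: if ALL processors start with the same value v_i,
    # the healthy processors must decide on exactly that value.
    initial_values = set(initial.values())
    if len(initial_values) == 1:
        validity = decided_values == initial_values
    else:
        validity = True  # no common initial value, nothing to enforce

    return agreement and validity
```

For example, with initial values `{1: 0, 2: 0, 3: 0}` and healthy processors 1 and 2 both deciding 0, the check passes no matter what processor 3 decides.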

  4. Fault Diagnosis Agreement (FDA)
     The goal is to enable each healthy processor to detect and locate the faulty components in the distributed system.
     • ALL the healthy processors identify a common set of faulty components in the process of reaching consensus (Agreement).
     • No healthy component is falsely detected as faulty by any healthy processor (Fairness).
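The Agreement and Fairness conditions can be checked in the same style. This is an illustrative sketch only; the function name, the component labels such as "L12", and the set-based representation are assumptions, not notation from the paper.

```python
def satisfies_fda(detected, healthy_processors, truly_faulty):
    """Return (agreement, fairness) for a fault-diagnosis outcome.

    detected: dict mapping processor id -> set of components it reports faulty.
    healthy_processors: iterable of healthy processor ids.
    truly_faulty: set of components that really are faulty.
    """
    reports = [frozenset(detected[p]) for p in healthy_processors]

    # Agreement: every healthy processor reports the same faulty set.
    agreement = len(set(reports)) == 1

    # Fairness: no healthy processor blames a component that is healthy.
    fairness = all(report <= truly_faulty for report in reports)

    return agreement, fairness
```

A call such as `satisfies_fda({1: {"L12"}, 2: {"L13"}}, [1, 2], {"L12"})` fails both conditions: the two reports differ, and "L13" is blamed although it is healthy.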

  5. Paper assumes dual failure mode on the network
     • Most previous papers assume that only processors can be faulty and that the network is fault-free.
     • Here we assume that the processors are fault-free and that the network may be faulty.
     • Most other papers also assume that faults are malicious only. Here we assume dual failure:
       • Malicious faults (a random value is sent), and
       • Dormant faults (no value is sent, as in a crash, or a stuck-at value is sent).
     Assume that a healthy processor can detect components with dormant faults.
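The dual failure mode can be illustrated with a toy link model. This is a sketch under stated assumptions: binary values, a hypothetical `transmit` function, and `None` standing for "no value arrived". None of these names come from the paper.

```python
import random

def transmit(value, fault=None, stuck_at=None, rng=random):
    """Deliver `value` over one link, applying the link's fault mode.

    fault: None (healthy link), "malicious", or "dormant".
    stuck_at: for a dormant link, the fixed value it always delivers;
              None models a crashed link that delivers nothing.
    """
    if fault is None:
        return value                # healthy link: value arrives unchanged
    if fault == "malicious":
        return rng.choice([0, 1])   # a random binary value is delivered
    if fault == "dormant":
        return stuck_at             # nothing (None) or a stuck-at value
    raise ValueError(f"unknown fault mode: {fault!r}")
```

A receiver can recognize the crash case directly because nothing arrives, which matches the slide's assumption that dormant faults are detectable; a malicious link may deliver a plausible but wrong value.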

  6. Assumptions
     • A synchronous distributed system whose processors are reliable during protocol execution
     • Faults such as a crash, a stuck-at value, noise, or an intruder may interfere with message transmission
     • An n-processor fully connected network with m malicious faults and d dormant faults, where m <= ceiling[(n-d-3)/2]
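The fault bound is easy to evaluate for concrete numbers. The sketch below computes it exactly as written on the slide; the function names are hypothetical.

```python
import math

def max_malicious(n, d):
    """Largest m allowed by the slide's bound m <= ceiling[(n-d-3)/2]."""
    return math.ceil((n - d - 3) / 2)

def within_bound(n, m, d):
    """True if m malicious and d dormant faults satisfy the bound for n processors."""
    return m <= max_malicious(n, d)
```

With n = 5 and d = 1, the bound gives ceiling(1/2) = 1, so one malicious fault alongside one dormant fault is within the stated limit, while two malicious faults are not.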

  7. Dual Fault Detection Consensus (DFDC) Algorithm
     • Three phases:
       • Message exchange phase
       • Decision making phase
       • Fault detection phase
     • The message exchange and decision making phases are similar to OM(1) in the Byzantine Generals paper. They produce a matrix of information at each processor, MAT_i, which is used to construct a majority vector, MAJ_i.
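The step from MAT_i to MAJ_i can be sketched as follows. The slide does not give the matrix layout, so this example assumes that row k of MAT_i holds the vector relayed by processor k and that MAJ_i takes a strict majority down each column. The helper names and the sample matrix are hypothetical.

```python
from collections import Counter

def majority(values):
    """Return the value held by a strict majority of `values`, else None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None

def majority_vector(mat):
    """Column-wise strict majority of a matrix given as a list of rows."""
    return [majority(list(column)) for column in zip(*mat)]

# Hypothetical MAT_1 for five processors with initial values 0, 0, 0, 1, 1.
# The single flipped entry in the fourth row stands in for a corrupted message.
mat_1 = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
maj_1 = majority_vector(mat_1)  # the flipped entry is outvoted in its column
```

Under this layout a single corrupted entry is masked by the other four entries in its column, which is the fault-masking behaviour the OM(1)-style exchange is meant to provide.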

  8. Fault detection phase
     • Each processor sends its MAT_i to every other processor. Each healthy processor i uses the received matrices to find the faults:
       • Take the majority value in each position across the received matrices to get FDMAT_i.
       • If no majority exists for the (i,j)th position, use the negative value of the (i,j)th position of the MAT_j that was sent.
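The position-wise vote that builds FDMAT can be sketched as below. Two points are assumptions rather than statements from the paper: values are binary, and the "negative value" in the fallback rule is read as the binary complement of the entry in the matrix sent by processor j. The function name `fdmat` is hypothetical.

```python
from collections import Counter

def fdmat(mats):
    """Combine the matrices received from all n processors into one matrix.

    mats[k] is the n-by-n matrix (list of rows) received from processor k.
    """
    n = len(mats)
    result = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Vote over the (i, j) entry of every received matrix.
            votes = Counter(mats[k][i][j] for k in range(n))
            value, count = votes.most_common(1)[0]
            if count > n / 2:
                result[i][j] = value
            else:
                # Assumed reading of the fallback rule: binary complement
                # of the (i, j) entry in the matrix sent by processor j.
                result[i][j] = 1 - mats[j][i][j]
    return result
```

With four all-zero matrices and one entry flipped in a single matrix, the vote restores 0 at that position. With a 2-2 split at a position, the fallback rule decides the entry instead.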

  9. [Figure: five-processor example with initial values P1=0, P2=0, P3=0, P4=1, P5=1; legend: malicious faulty, dormant faulty]

  10. [Figure: the same example, showing MAT_1 and MAJ_1]

  11. [Figure: the same example, showing MAT_2,3 and MAJ_2,3]

  12. [Figure: the same example, showing MAT_4 and MAJ_4]

  13. [Figure: the same example, showing MAT_5 and MAJ_5]

  14. Fault detection phase with processor P1
      [Figure: the MATs from P1, P2, P3, P4, and P5, and the resulting FDMAT]
