Fault-Tolerant Pulse Synchronization Jennifer L. Welch Texas A&M University


Presentation Transcript


  1. Fault-Tolerant Pulse Synchronization Jennifer L. Welch Texas A&M University Dagstuhl: September 2008

  2. Pulse Synchronization
     • Given a set of nodes in a distributed system that pulse (or fire) repeatedly, how can we get them to fire periodically at the same times?
     • [Figure: unsynchronized]

  3. Pulse Synchronization
     • Given a set of nodes in a distributed system that pulse (or fire) repeatedly, how can we get them to fire periodically at the same times?
     • [Figure: synchronized]

  4. Why Pulse Synchronization?
     • Understand natural phenomena:
       • fireflies flashing
       • crickets chirping
       • electrically synchronous pacemaker cells
     • In computer networks:
       • scheduling duty cycles in sensor networks
       • achieving clock synchronization (nodes have a common idea of increasing values)

  5. Firing Oscillators
     • Mirollo and Strogatz (1990)
       • mathematical model of a "population of identical integrate-and-fire oscillators"
       • describe a simple algorithm: when an oscillator fires, it instantaneously causes the others to jump ahead toward their next firing times according to a certain function
       • show mathematically under what conditions the system converges to synchronous firing
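
A minimal Python sketch of the jump rule described on this slide. The logarithmic state curve, its curvature `b`, the pulse strength `eps`, and all function names are illustrative assumptions, not taken from the slides. Cascades (the extra pulse from an oscillator that is pulled over the threshold) are ignored for brevity.

```python
import math


def state(phase, b=3.0):
    """Assumed concave state curve: maps phase in [0, 1] to state in [0, 1]."""
    return math.log(1.0 + (math.exp(b) - 1.0) * phase) / b


def phase_of(x, b=3.0):
    """Inverse of state(): the phase at which the state equals x."""
    return (math.exp(b * x) - 1.0) / (math.exp(b) - 1.0)


def jump(phase, eps=0.1, b=3.0):
    """Phase of an oscillator right after it hears another one fire.

    Its state is raised by eps. If that reaches the threshold 1, the
    oscillator fires as well, modelled here by returning phase 0.0.
    """
    x = state(phase, b) + eps
    return 0.0 if x >= 1.0 else phase_of(x, b)


def fire_next(phases, eps=0.1, b=3.0):
    """Advance every oscillator to the next firing, then apply the jumps."""
    top = max(phases)          # the oscillator(s) closest to firing
    dt = 1.0 - top             # time until that firing
    return [0.0 if p == top else jump(p + dt, eps, b) for p in phases]
```

With these default parameters, repeatedly calling `fire_next` on two oscillators that start at phases 0.2 and 0.9 ends with both at phase 0 together, which is the synchronous firing the slide refers to.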

  6. M&S Model
     • [Figure; surviving label: "skip cycle"]

  7. Sensor Networks
     • Werner-Allen et al. (2005)
       • "Reachback Firefly Algorithm" (RFA): adapt M&S ideas to sensor networks under realistic communication assumptions
       • since nodes don’t receive messages instantaneously, collect observations during each cycle and then adjust the cycle immediately after each firing
     • Photo source: http://animals.howstuffworks.com/insects/firefly-info.htm
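
A rough Python sketch of the "collect during the cycle, adjust after firing" idea. The class name, the `gain` parameter, and the linear jump rule (`gain` times the phase at which a firing was heard) are assumptions made for illustration; the slide does not specify the adjustment function.

```python
class ReachbackNode:
    """Hypothetical node that defers its reaction to heard firings."""

    def __init__(self, period=1.0, gain=0.05):
        self.period = period
        self.gain = gain     # assumed strength of each jump
        self.heard = []      # phases at which firings were heard this cycle

    def hear(self, phase):
        """Record a neighbor's firing; do not adjust anything yet."""
        self.heard.append(phase)

    def on_fire(self):
        """Called when this node fires.

        Replays the recorded observations in order, accumulates the jumps
        the node would have made, and returns the phase at which the next
        cycle should start (0.0 means an unshortened cycle).
        """
        advance = 0.0
        for t in sorted(self.heard):
            effective = min(t + advance, self.period)
            advance += self.gain * effective   # assumed jump rule
        self.heard = []
        return min(advance, self.period)
```

A node that heard nothing during a cycle starts its next cycle at phase 0, so an isolated node keeps its natural period.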

  8. RFA Algorithm
     • [Figure; surviving label: "skip cycle"]

  9. What About Fault Tolerance?
     • Daliot, D. Dolev, and Parnas (2003, 2008):
       • adapt ideas about biological fault tolerance in such systems and apply them to networks
       • when a node has heard about the firing of some number of other nodes (by receiving messages), it compares the sum to a threshold function to decide whether to fire
       • proved to be self-stabilizing and tolerant of Byzantine faults in up to a third of the nodes

  10. Alternative Approach to Fault Tolerance?
      • Modify RFA, which collects data during a cycle and then uses it to update the cycle
      • Apply approximate agreement ideas from D. Dolev, Lynch, Pinter, Stark and Weihl (1983, 1986), previously applied to clock synchronization (Welch & Lynch, 1984, 1988)
      • Eliminate outliers and then perform RFA calculations on the remainder to modify the cycle
      • Might provide a simpler solution than DDP (Daliot, D. Dolev, and Parnas)

  11. Fault-Tolerant Averaging
      • [DLPSW] fault-tolerant outlier-elimination method:
        • works for problems in which nodes have numerical values as inputs and want to output numerical values, such as approximate agreement and clock synchronization
        • to tolerate f Byzantine failures:
          • eliminate the f largest and f smallest values
          • for agreement-type problems, apply some kind of averaging function to the remaining values
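
The elimination step above can be sketched in a few lines of Python. The function names are hypothetical, and the two averaging choices shown (mean and midpoint of the surviving values) are only examples of "some kind of averaging function".

```python
def trim_extremes(values, f):
    """Drop the f smallest and f largest values; needs more than 2f inputs."""
    if len(values) <= 2 * f:
        raise ValueError("need more than 2f values to tolerate f faults")
    ordered = sorted(values)
    return ordered[f:len(ordered) - f]


def fault_tolerant_mean(values, f):
    """Mean of the values that survive trimming."""
    kept = trim_extremes(values, f)
    return sum(kept) / len(kept)


def fault_tolerant_midpoint(values, f):
    """Midpoint of the range of the values that survive trimming."""
    kept = trim_extremes(values, f)
    return (kept[0] + kept[-1]) / 2
```

For example, with `f = 1` the inputs `[100, 1, 2, 3, -50]` are trimmed to `[1, 2, 3]`, so a single wildly wrong value on either side cannot drag the result outside the range of the other values.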

  12. Applying FTA Idea to RFA
      • Identifying outliers:
        • values are from a bounded range, not unbounded, so need to worry about wrap-around (cf. S. Dolev & Welch, 1995, 2004)
      • What to do with remaining values?
        • currently just doing the original RFA calculation
        • maybe something cleverer can be done
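
One possible way to handle the wrap-around concern, sketched in Python. This is an illustrative heuristic and not the method from the slides or from the cited work: cut the circle of phases at its largest empty gap, unroll the values into a line, and then trim f values from each end. The function name and the `period` parameter are assumptions.

```python
def trim_circular(phases, f, period=1.0):
    """Trim f values from each end of phases that live on a circle.

    Returns the surviving values in unrolled form, so a value that
    wrapped past the cut appears as value + period.
    """
    n = len(phases)
    if n <= 2 * f:
        raise ValueError("need more than 2f values to tolerate f faults")
    s = sorted(p % period for p in phases)
    # gaps[i] is the forward distance from s[i] to its circular successor
    gaps = [(s[(i + 1) % n] - s[i]) % period for i in range(n)]
    cut = max(range(n), key=gaps.__getitem__)   # largest empty arc
    start = (cut + 1) % n
    unrolled = [s[(start + k) % n] for k in range(n)]
    # values that wrapped around are shifted up by one period
    unrolled = [v if v >= unrolled[0] else v + period for v in unrolled]
    return unrolled[f:n - f]
```

With phases `[0.95, 0.98, 0.02, 0.05, 0.6]` and `f = 1`, plain sorting would discard 0.02 and 0.98 and keep the stray 0.6, whereas this sketch discards 0.6 and keeps the cluster around the wrap-around point.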

  13. Preliminary Results
      • Discrete event simulation considering two kinds of faults:
        • no jump (faulty node never changes its cycle)
        • random jump (faulty node changes its cycle by a random amount after each firing)
      • Appears that
        • original RFA has some tolerance to these kinds of faults in that it still converges
        • FT-RFA has better periodicity (after convergence, time between firings is closer to 1)

  14. Still To Do
      • Understand what is going on
      • Mathematical analysis to show convergence
        • maybe it doesn't converge; could try techniques from [DW] to get a more consistent set of firings
      • Comparison with [DDP]
      • Lower bound for pulse synchronization on number of nodes to tolerate f faulty nodes?
        • does the known result for clock synch carry over?
      • Extension to multihop?
        • known lower bounds on required connectivity for clock synch are probably relevant

  15. Acknowledgments
      • Radu Stoleru
      • Keerthi Deconda
