Heavy-ion Fault Injections in the Time-triggered Communication Protocol


Presentation Transcript


  1. Heavy-ion Fault Injections in the Time-triggered Communication Protocol • Håkan Sivencrona, SP • Per Johannessen, Volvo Car Corporation • Mattias Persson & Jan Torin, Chalmers University of Technology

  2. Agenda • Objective • Time-triggered Protocol • Membership Agreement • Communication Failures • Heavy-ion Fault Injections • Experimental Set-up • Results • Discussion • Conclusions LADC 2003 São Paulo, Brazil

  3. Objective • Validate the fault hypothesis and fault handling mechanisms of a specific implementation of TTP/C • Use the results to improve TTP/C and time-triggered systems in general • Gain experience with safety-critical broadcast buses using fault injection (FI) techniques • Explore new failure modes of time-triggered communication

  4. Time-Triggered Protocol • Time Division Multiple Access (TDMA) • For safety-critical applications • Fault tolerance is mainly implemented as redundant hardware and software mechanisms • Fault hypothesis: tolerate any single fault • Services: deterministic message sending, clock synchronization, membership service, clique avoidance
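
The TDMA idea on this slide can be sketched as follows. The slide gives no schedule, so the node names and the 250 µs slot length below are purely illustrative assumptions; the point is only that the permitted sender follows from global time alone.

```python
# Hypothetical TDMA round: node names and slot length are illustrative
# assumptions, not values from the presentation.
SLOT_US = 250                    # assumed slot length in microseconds
SCHEDULE = ["A", "B", "C", "D"]  # one sending slot per node per round


def slot_owner(global_time_us: int) -> str:
    """Return the node that is allowed to transmit at the given global time."""
    round_len = SLOT_US * len(SCHEDULE)
    offset = global_time_us % round_len  # position inside the current round
    return SCHEDULE[offset // SLOT_US]
```

Because every node derives the sender from the shared time base, a frame that arrives outside its sender's slot can be rejected without inspecting its content.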

  5. Membership Agreement • Gives a consistent system state • All nodes have a membership vector • The cluster’s membership vector includes the nodes that have the same global state • Every node is represented by a unique bit in the vectors in all nodes
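
A minimal sketch of a membership vector with one bit per node, as the slide describes. The helper names and the explicit equality check in `agree` are assumptions for illustration; the slide does not say how a TTP/C controller compares vectors.

```python
def set_member(vector: int, node: int) -> int:
    """Mark `node` as a member by setting its bit."""
    return vector | (1 << node)


def clear_member(vector: int, node: int) -> int:
    """Remove `node` from the membership by clearing its bit."""
    return vector & ~(1 << node)


def is_member(vector: int, node: int) -> bool:
    """Check whether the bit for `node` is set."""
    return bool((vector >> node) & 1)


def agree(local: int, received: int) -> bool:
    """Two nodes share a membership view only if their vectors are identical."""
    return local == received
```

For example, a four-node cluster with all nodes present has the vector `0b1111`; if one receiver drops node 2 its vector becomes `0b1011`, and `agree` fails against any node that still holds `0b1111`.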

  6. Communication Failures • A node stops transmitting messages: application fault; controller crash/failure • Message interference in the physical layer: permanent or temporarily persistent; transient • Asymmetric message interpretation: Byzantine; omission-inconsistent • … and the system behavior depends on the application

  7. Heavy-ion Fault Injection • Californium-252 source that emits high-energy heavy ions (>> 1 MeV) • Causes so-called single event upsets (SEUs) and other effects in the CMOS device • Can affect locations not accessible with other methods • Only statistically reproducible • Low controllability • …

  8. Experimental Set-up • System with 4 to 9 nodes with similar message schedules • Software that monitors and detects discrepancies

  9. Fault Injection Results • Null frame: no transmission, e.g. fail silence • Checksum (CRC) errors: the message has the right format but the wrong content • Invalid frame: a message that may or may not be readable but is not valid to use, either in the time domain or in the value domain • Time discrepancies: timing that is close to unacceptable
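
The frame categories above can be sketched as a receiver-side check. Everything here is a simplifying assumption: the `Frame` type, the acceptance window, and the use of `zlib.crc32` as a stand-in checksum are hypothetical and are not taken from TTP/C. Invalid frames in the value domain (for example, coding violations) are not modelled.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    payload: bytes
    crc: int         # checksum carried by the frame
    arrival_us: int  # arrival time relative to the slot start


WINDOW_US = (10, 20)  # hypothetical acceptance window, in microseconds


def classify(frame: Optional[Frame]) -> str:
    """Classify one slot's observation from a single receiver's point of view."""
    if frame is None:
        return "null frame"                   # nothing was transmitted
    lo, hi = WINDOW_US
    if not lo <= frame.arrival_us <= hi:
        return "invalid frame (time domain)"  # outside the receive window
    if zlib.crc32(frame.payload) != frame.crc:
        return "CRC error"                    # right format, wrong content
    return "correct frame"
```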

  10. CNI-register Error Log Files [Figure: log file excerpts; labels: error diagnosis field, invalid frame flagged, correct frame received]

  11. Example of Logged Data

  12. Results: Fail Silence Violations • Approximately 12% of all faults went undetected by the FI-node, resulting in a fail silence violation • More than 90% of these were CRC faults • The rest were invalid frames • Approximately 0.1% of all faults were SOS (slightly-off-specification) messages, mainly invalid frames in the time domain
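
The percentages on this slide combine as follows. This is only arithmetic on the rounded figures quoted above ("approximately 12%", "more than 90%"), so the results are bounds, not measured values.

```python
from fractions import Fraction

UNDETECTED = Fraction(12, 100)     # share of all faults not detected by the FI-node
CRC_SHARE_MIN = Fraction(90, 100)  # lower bound on the CRC share of those

# Lower bound on undetected CRC faults as a share of all injected faults.
crc_min = UNDETECTED * CRC_SHARE_MIN
# Upper bound on undetected invalid frames as a share of all injected faults.
invalid_max = UNDETECTED - crc_min

print(float(crc_min), float(invalid_max))
```

Read this way, undetected CRC faults account for at least 10.8% of all injected faults, and undetected invalid frames for at most 1.2%.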

  13. Fault Injection Results in Cluster • A node stops transmitting messages: the FI-node is silent • Message interference: babbling idiot (needed a manual reset of the system); reintegration • Asymmetric interpretation of messages: asymmetric timing faults (SOS faults in the time domain); asymmetric value faults (SOS faults in the value domain) • … and the system behavior depends on the protocol implementation and the application

  14. Asymmetric value failure scenario

  15. Asymmetric timing failure scenario [Figure: time deviation in microticks from own clock, Node 7 and Node 2]

  16. Cluster Size Comparisons

  17. Concerns: Membership vs. Asymmetry • Faulty node remains undetected in case of SOS faults • Applications within the minority partition – system safety? • Protocol membership gives a brittle system • Reintegration – a possible hazard
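
How an SOS timing fault can split a cluster is sketched below, assuming each receiver has a slightly shifted acceptance window because its clock differs a little. The node names, window values, and the simple head count in `minority` are hypothetical; this illustrates the partition only and is not the TTP/C clique avoidance algorithm.

```python
from typing import Dict, Set, Tuple


def partition(arrival: int, windows: Dict[str, Tuple[int, int]]) -> Tuple[Set[str], Set[str]]:
    """Split receivers into those that accept and those that reject a frame."""
    accept = {n for n, (lo, hi) in windows.items() if lo <= arrival <= hi}
    reject = set(windows) - accept
    return accept, reject


def minority(accept: Set[str], reject: Set[str]) -> Set[str]:
    """Return the smaller group; on a tie this sketch returns the rejecting group."""
    return accept if len(accept) < len(reject) else reject


# Hypothetical acceptance windows in microticks, one per receiver.
WINDOWS = {"N1": (100, 200), "N2": (102, 202), "N3": (99, 199), "N4": (101, 201)}

# A frame arriving at the very edge is accepted by N2 only.
accepted, rejected = partition(202, WINDOWS)
```

Here the sender is not faulty from N2's point of view, so N2 ends up alone in a minority partition even though its own hardware behaved correctly, which is the safety concern the slide raises.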

  18. Discussion • Active star coupler • Modified membership agreement protocol • Algorithms to detect and handle SOS failures [Figure: two layered stacks (application, communication protocol with membership, physical layer); label: dependability increase]

  19. Conclusions: TTP/C • Partitioning due to asymmetric faults should be resolved more smoothly, and perhaps not by forced reintegration • Stronger fault containment regions are needed • A larger system/cluster is more resilient against SOS faults

  20. General Conclusions • Heavy-ion fault injection is efficient in stressing silicon designs to arbitrary failure modes • High-integrity systems must handle asymmetric and Byzantine faults • Coverage against arbitrary faults is the only realistic approach for safety-critical systems, but it is difficult to achieve

  21. Questions? Thank you for listening!
