
Assessment of Reliability/Dependability – COTS Components

Assessment of Digital Equipment for Safety and High Integrity Applications – Session 4 of 6. Assessment of Reliability/Dependability – COTS Components. Thuy Nguyen and Ray Torok, Joint IAEA-EPRI Workshop on Modernization of Instrumentation and Control Systems in NPPs, 3-6 October 2006.


Presentation Transcript


  1. Assessment of Digital Equipment for Safety and High Integrity Applications – Session 4 of 6: Assessment of Reliability/Dependability – COTS Components. Thuy Nguyen and Ray Torok, Joint IAEA-EPRI Workshop on Modernization of Instrumentation and Control Systems in NPPs, 3-6 October 2006, Vienna, Austria

  2. Commercial Off-the-Shelf (COTS) Components are Attractive
     • Many advantages
        - Proven track record
        - Lower vendor costs
        - More available
        - Opportunity to standardize
        - Features
        - …
     • However, for applications critical to safety or power production, we want assurance of high quality/dependability
        - This is problematic for digital equipment, and even more so for COTS
     • Don't forget: other industries have this problem too
     • The alternative, developing new equipment from scratch, is even worse for safety and dependability

  3. Review: Digital "Issues"
     • New behaviors and failure modes
     • Greater complexity
     • Human-machine interface
     • Software (real-time)
     • Quality
     • Limited testability
     • Common mode failure
     • Flaws are 'designed in'
     • 'Like-for-like' replacement not generally possible

  4. Assessment of COTS Components is Problematic
     • COTS components are usually "evolutionary"
        - Variable development process
        - Reliance on the expertise of individuals
        - Variable documentation, not up to nuclear safety expectations
        - Operating history used to detect/fix problems
        - Still, the end product can be highly dependable
     • A strong development process is considered important for digital equipment
     • Vendor cooperation is needed to 'look inside the box' and understand design features, defensive measures and failure modes
     • We can't (and don't want to) force vendors to use nuclear safety standards
     • We want to find and credit all evidence of high dependability

  5. Establishing Assurance of Quality / Dependability

  6. Tests and Evaluations Do Not Add Quality; They Seek to Confirm Its Existence
     • Environmental qualification: temperature, humidity, seismic, electromagnetic compatibility (EMC), etc.
     • Functional & challenge testing
     • Review of vendor processes & documentation
        - Software development
        - Configuration management
        - Corrective actions
        - Manufacturing
     • Review and credit use of standards and third-party certifications as appropriate (TÜV, IEC, IEEE, ISO, etc.), with verification

  7. Tests and Evaluations, cont'd
     • Operating history assessment (mostly non-nuclear)
        - Relevance
        - Extent
        - Success
        - Evidence / documentation
     • Critical design review
        - Software/hardware architectures
        - Failure modes
        - Abnormal behaviors
     • Grade effort based on complexity and safety significance
     • Base judgment on preponderance of evidence
     • Want "reasonable assurance" (there are no guarantees)

  8. EPRI 'COTS Guidelines' for Digital Equipment
     • EPRI TR-106439, Guideline on Evaluation and Acceptance of Commercial Grade Digital Equipment for Nuclear Safety Applications, October 1996
        - Endorsed by the NRC in a Safety Evaluation Report (SER), July 1997
     • EPRI TR-107339, Evaluating Commercial Digital Equipment for High Integrity Applications: A Supplement to EPRI Report TR-106439, December 1997
        - More detailed, 'how-to' guidance
     • EPRI 1011710, Handbook for Evaluating Critical Digital Equipment and Systems, November 2005
        - Update based on lessons learned

  9. Popular Components for Evaluation
     • Smart transmitter
     • Single-loop controller
     • Positioners for air-operated valves
     • Circuit breaker trip controller

  10. General Results of EPRI Component Evaluations
     • Evolutionary development
     • Experienced development team
     • Good manufacturing controls
     • Successful operating history
     • Software development documentation lacking
     • "Continue to run" design philosophy
     • Limited diagnostics
     • Failed parts of EMC tests

  11. Lessons Learned – Selecting Devices and Vendors
     • The purchase price is a small fraction of the overall cost of qualification (don't select a device based on price)
     • Establish acceptable failure modes and abnormal behaviors before selecting candidate devices
     • If possible, select the simplest device that will do the job
     • Costs for qualification will depend on:
        - The extent to which commercial testing and/or certifications can be credited
        - What is required to extend device capabilities beyond commercial specifications (e.g., an EMC filter)
        - The complexity of the device
        - The extent and relevance of the device's operating history
        - The level of involvement and cooperation of the device vendor

  12. Lessons Learned – Project Planning
     • Avoid special application requirements or configurations that do not follow the manufacturer's recommendations
     • Establish an appropriate level of QA for control of the device, testing, and V&V of test equipment
     • Define and budget for mitigation of problems that may be encountered during testing
     • Establish a method for maintaining qualification

  13. Lessons Learned – Vendor/Device On-site Review
     • Review vendor design and development documents before the visit to streamline and focus the on-site review
     • Ensure the review team has appropriate experience and expertise
     • Expect shortcomings in the critical design review (CDR) and plan compensating measures
     • Develop a matrix of the critical attributes and their methods of verification before the on-site review
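The critical-attribute matrix recommended above can be kept as a small data structure, which makes it easy to see, before the visit, which attributes still lack a planned verification method. This is a minimal sketch under stated assumptions: the method names in `METHODS`, the helpers `gaps` and `render`, and the example attributes are all hypothetical and are not taken from the EPRI guidance.

```python
from dataclasses import dataclass, field

# Hypothetical set of verification methods; rename to match the project's plan.
METHODS = ("test", "design_review", "vendor_docs", "operating_history", "certification")


@dataclass
class CriticalAttribute:
    name: str
    methods: set = field(default_factory=set)  # planned verification methods


def gaps(matrix):
    """Names of attributes that have no recognised verification method yet."""
    return [a.name for a in matrix if not (a.methods & set(METHODS))]


def render(matrix):
    """Plain-text grid: one row per attribute, 'X' where a method is planned."""
    width = max(len(a.name) for a in matrix)
    rows = [" " * width + " | " + " | ".join(METHODS)]
    for a in matrix:
        cells = " | ".join(("X" if m in a.methods else "").center(len(m)) for m in METHODS)
        rows.append(a.name.ljust(width) + " | " + cells)
    return "\n".join(rows)


# Illustrative entries only; real attributes come from the device's safety function.
matrix = [
    CriticalAttribute("Response time", {"test"}),
    CriticalAttribute("Behavior on loss of power", {"test", "design_review"}),
    CriticalAttribute("Watchdog coverage", {"design_review", "vendor_docs"}),
    CriticalAttribute("EMC immunity", set()),
]
print(render(matrix))
print("Unverified:", gaps(matrix))
```

Running `gaps` before the on-site review flags attributes (here, EMC immunity) for which the team has not yet decided how evidence will be obtained.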

  14. Lessons Learned – EMC Qualification
     • Investigate and, if possible, credit vendor testing for the CE Mark, European EMC Directives, etc.
     • Ensure the test equipment is immune to the EMI levels expected during device qualification testing
     • Identify potential device vulnerabilities through informal testing
     • Fully understand the test laboratory's capabilities and the expertise of its personnel
     • Plan and budget for fixes as failures are encountered

  15. Evaluation of Programmable Logic Controller (PLC) Platforms
     • Apply the same COTS evaluation techniques
     • Added complexity increases difficulty
     • The vendor should take the lead
     • Three platforms have been "pre-qualified" by the US regulator (NRC):
        - Siemens Teleperm XS
        - Invensys/Triconex Tricon
        - Westinghouse Common Q
     • Others are considering pre-qualification

  16. Assessment of Digital Equipment for Safety and High Integrity Applications – Session 4 of 6: Inter-Channel / Inter-System Data Communications. Thuy Nguyen and Ray Torok, Joint IAEA-EPRI Workshop on Modernization of Instrumentation and Control Systems in NPPs, 3-6 October 2006, Vienna, Austria

  17. Data Communication in Digital I&C Systems
     • Advanced digital I&C architectures may feature data communication between:
        - Redundant divisions of I&C systems important to safety
        - I&C systems of different safety classes
     • Objective: improve error detection and fault tolerance
     • May concern:
        - Digital upgrade of obsolete analog I&C systems
        - Digital I&C in new plants

  18. IEEE Standard 603-1998
     • Standard Criteria for Safety Systems for Nuclear Power Generating Stations
     • Independence and physical separation between the redundant channels of a safety system
        - The failure of one channel cannot adversely affect the ability of redundant channels to perform the necessary safety functions
        - Credible failures in, and consequential actions by, other systems cannot adversely affect the ability of the safety system to perform its intended safety functions

  19. Data Communication and Digital Common Cause Failures (CCF)
     • Potential for digital CCF due to possible:
        - Failure of data communication links
        - Uncommon (but correct) modes of data communication links
        - These could trigger concurrent digital failures of redundant divisions or multiple systems
        - Error propagation through data communication
     • Identification of susceptibilities to digital CCF
        - Diversity guideline of BTP-19 and NUREG/CR-6303: 7 forms of diversity
        - Complementary approach based on the analysis of defensive measures (EPRI D3 technical report TR-1002835)

  20. Defensive Measures for Data Communication
     • Fault-tolerant overall digital architecture
        - Single failure criterion
        - Multiple data communication links
        - Defensive measures against CCF of multiple links
        - One-way data communication gateways
     • Reliable data communication links
        - Prevention of data communication failures and CCF
        - Stable data communication conditions
     • Communicating stations tolerant to:
        - Data communication link failures
        - Transmission of erroneous data
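One property behind the one-way gateways listed above is that the safety side only ever sends: it never waits for, or reads anything from, the lower-class side, so the receiver's state cannot influence it. The sketch below illustrates that interface discipline in software only; in practice the one-way property is often enforced physically (for example, a transmit-only connection). The class `OneWayGateway` and its methods are hypothetical names, not a real product API.

```python
from types import MappingProxyType
from typing import Mapping, Tuple


class OneWayGateway:
    """Hypothetical one-way gateway from a safety division to a lower safety class."""

    def __init__(self) -> None:
        self._seq = 0
        self._snapshot: Mapping[str, float] = MappingProxyType({})

    def publish(self, values: Mapping[str, float]) -> None:
        """Safety side only: store a copy and return; no acknowledgement is awaited."""
        self._seq += 1
        self._snapshot = MappingProxyType(dict(values))  # defensive copy, read-only view

    def read(self) -> Tuple[int, Mapping[str, float]]:
        """Lower-class side only: latest sequence number and a read-only snapshot."""
        return self._seq, self._snapshot
    # Deliberately no method lets the lower-class side send data back.
```

Because `publish` copies the data and `read` hands out a read-only view, neither later changes on the safety side nor write attempts on the receiving side can alter what was transmitted.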

  21. Simplified Example of Fault-Tolerant Overall Architecture
     [Figure: architecture diagram showing voting & priority logic and one-way gateways to lower safety classes]

  22. Simplified Example of Fault-Tolerant Overall Architecture, cont'd
     [Figure: Divisions A, B, C and D, each with elements labeled A to D and 1 to 4]
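The four-division architecture above relies on voting logic so that a single failed division neither prevents nor spuriously causes actuation. A minimal 2-out-of-4 sketch follows, assuming each division delivers a Boolean trip demand or `None` when it is detected as unavailable. Treating `None` as a trip vote is an assumed, conservative convention for this illustration; real systems may instead degrade the logic (e.g., to 2-out-of-3 for a bypassed channel). The priority logic shown in the figure is not modeled.

```python
from typing import Optional, Sequence


def vote_2oo4(demands: Sequence[Optional[bool]], required: int = 2) -> bool:
    """Return True (actuate) when at least `required` of four divisions vote to trip.

    Each entry is one division's trip demand, or None when that division is
    detected as unavailable; None is counted as a trip vote (assumed fail-safe
    convention for this sketch).
    """
    if len(demands) != 4:
        raise ValueError("expected one demand per division (A, B, C, D)")
    votes = sum(1 for d in demands if d is None or d)
    return votes >= required
```

With this logic, one spurious demand is outvoted, and a genuine demand still actuates even if any single division fails to demand it, which is the single-failure behavior the architecture aims for.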

  23. Preventing Data Communication Failure
     • Application of rigorous development standards
        - Low level of residual faults
     • As few internal states as possible
        - Facilitates testing and recovery
     • Transparency to plant conditions
        - Data communication links transparent to transmitted data values
        - Stable data communication rates and conditions
     • Protection against failures of communicating stations
        - Station failures cannot affect communication link behavior beyond acknowledgement and transmission of their availability / unavailability
     • Detection & correction, or signaling, of data transmission errors
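A common way to implement the detection and signaling of transmission errors listed above is a checksum appended to each message. The sketch below uses a CRC-32 from Python's standard `zlib` module; the choice of CRC-32 and the framing (payload followed by a 4-byte big-endian CRC) are assumptions for illustration, since an actual safety protocol would select its error code based on the required residual error rate. The functions `encode` and `decode` are hypothetical names.

```python
import struct
import zlib
from typing import Optional


def encode(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def decode(frame: bytes) -> Optional[bytes]:
    """Return the payload if its CRC-32 matches; None signals a transmission error."""
    if len(frame) < 4:
        return None
    payload = frame[:-4]
    (received_crc,) = struct.unpack(">I", frame[-4:])
    return payload if zlib.crc32(payload) == received_crc else None
```

A CRC of this kind only detects errors (all single-bit errors and all bursts up to 32 bits long); correction would require a forward error-correcting code, and otherwise the receiver simply discards the frame and treats the data as invalid.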

  24. Preventing Data Communication CCF
     • Different applications and operating conditions
        - Communicating stations, data messages, cycle time, ...
        - Influence conditions need to be identified, and differences / similarities need to be assessed
     • Same data communication platform
        - Design measures can be taken to reduce the likelihood of CCF due to faults in the data communication platform
        - Overall design, software, hardware

  25. Stable Data Communication Conditions
     • Deterministic cyclic functioning of communication links
        - Fixed cycle time
        - For each cycle, a fixed number of messages of fixed length and fixed semantics, in a fixed order
     • Fixed number and identity of communicating stations
        - Station withdrawal and reinsertion do not affect the predetermined cyclic behavior
        - Station states (availability / unavailability) transmitted at each cycle
     • Fixed role for each communicating station
        - With respect to each message (send / receive / ignore)
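The stable conditions above amount to a schedule fixed at design time, against which every received cycle can be checked. The sketch below encodes a hypothetical four-message schedule, derives each station's fixed role per message, and lists any deviation of a received cycle. It assumes that a withdrawn station's slot is still delivered as a fixed-length placeholder flagged `available=False`, so withdrawal does not change the cyclic pattern; that representation, and all names used (`SCHEDULE`, `RECEIVERS`, `role`, `check_cycle`, the station "GW"), are illustrative assumptions.

```python
from typing import NamedTuple, Sequence


class Slot(NamedTuple):
    msg_id: int
    sender: str
    length: int  # fixed payload length in bytes


class Message(NamedTuple):
    msg_id: int
    sender: str
    available: bool  # sender's availability state, sent every cycle
    payload: bytes


# Hypothetical schedule, fixed at design time and identical in every cycle.
SCHEDULE = (Slot(1, "A", 8), Slot(2, "B", 8), Slot(3, "C", 8), Slot(4, "D", 8))
# Hypothetical receivers per message; "GW" stands for a receive-only gateway.
RECEIVERS = {1: {"B", "C", "D", "GW"}, 2: {"A", "C", "D", "GW"},
             3: {"A", "B", "D", "GW"}, 4: {"A", "B", "C", "GW"}}


def role(station: str, msg_id: int) -> str:
    """Fixed role of a station for a given message: send, receive or ignore."""
    for slot in SCHEDULE:
        if slot.msg_id == msg_id and slot.sender == station:
            return "send"
    return "receive" if station in RECEIVERS.get(msg_id, set()) else "ignore"


def check_cycle(messages: Sequence[Message]) -> list:
    """List every deviation of one received cycle from the fixed schedule."""
    errors = []
    if len(messages) != len(SCHEDULE):
        errors.append(f"expected {len(SCHEDULE)} messages, got {len(messages)}")
    for slot, msg in zip(SCHEDULE, messages):
        if (msg.msg_id, msg.sender) != (slot.msg_id, slot.sender):
            errors.append(f"slot {slot.msg_id}: got message {msg.msg_id} from {msg.sender}")
        elif len(msg.payload) != slot.length:
            errors.append(f"message {msg.msg_id}: length {len(msg.payload)}, expected {slot.length}")
    return errors
```

Because nothing in the schedule depends on plant conditions or on which stations are present, the traffic pattern is the same in every cycle, which is what makes it testable and keeps unusual communication modes from arising.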

  26. Tolerance to Failures of Data Communication Links
     • Multiple communication links in diverse operating conditions
        - Reflecting overall redundancy, separation and diversity in the I&C architecture
     • Identification and characterization of failure modes of communication links
     • Detection of communication link failures by stations
     • Safety-classified stations can perform their safety functions or reach a safe state even when communication links fail
     • Protection of stations against communication link failures
        - Failures cannot affect station behavior beyond the required actions
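Detection of link failures by the receiving station, and falling back to a safe state when all links are lost, can be sketched with a per-link monitor. This sketch assumes each message carries a sequence number: a link is declared failed after a configurable number of cycles with no new message, and a repeated sequence number is treated as a frozen sender. The names `LinkMonitor`, `select_input`, `max_missed_cycles` and `safe_value` are hypothetical, and what counts as a "safe value" depends on the safety function.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class LinkMonitor:
    """Hypothetical per-link monitor, updated once per receiving-station cycle."""
    max_missed_cycles: int = 3
    last_seq: Optional[int] = None
    value: Optional[float] = None
    missed: int = 0

    def update(self, msg: Optional[Tuple[int, float]]) -> None:
        """msg is (sequence_number, value), or None if nothing valid arrived this cycle."""
        if msg is not None and msg[0] != self.last_seq:
            self.last_seq, self.value = msg
            self.missed = 0
        else:
            # Silence, or a repeated sequence number (frozen sender), counts as a miss.
            self.missed += 1

    @property
    def failed(self) -> bool:
        return self.value is None or self.missed >= self.max_missed_cycles


def select_input(links: Sequence[LinkMonitor], safe_value: float) -> float:
    """Use the first healthy redundant link; otherwise fall back to a safe value."""
    for link in links:
        if not link.failed:
            return link.value
    return safe_value
```

Below the failure threshold the last good value is reused, so the tolerated staleness is bounded by `max_missed_cycles` times the cycle time; the threshold would have to be chosen against the safety function's response-time requirement.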

  27. Tolerance to Transmission of Erroneous Data
     • Plausibility checks of data received through communication links
     • Erroneous data caused by a single postulated failure and received through communication links cannot prevent a safety-classified station from performing its safety functions
        - May cause safe failures
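Plausibility checks, and the idea that rejected data may cause a safe failure rather than block a safety function, can be illustrated with a range check plus a rate-of-change check on a received value. The sketch assumes a high-pressure trip in which implausible data is replaced by a trip vote; the thresholds, the class `PlausibilityCheck` and the function `trip_vote` are hypothetical illustrations, not values or names from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlausibilityCheck:
    """Hypothetical range and rate-of-change check on one received signal."""
    low: float
    high: float
    max_step: float  # largest credible change between two consecutive cycles
    last_good: Optional[float] = None

    def accept(self, value: float) -> bool:
        # NaN fails both comparisons below, so it is rejected by the range check.
        if not (self.low <= value <= self.high):
            return False
        if self.last_good is not None and abs(value - self.last_good) > self.max_step:
            return False
        self.last_good = value
        return True


def trip_vote(check: PlausibilityCheck, received: float, setpoint: float) -> bool:
    """High-trip vote; implausible data forces a trip vote (a safe failure)."""
    if not check.accept(received):
        return True
    return received >= setpoint
```

The design choice here is deliberately conservative: a rejected value never suppresses a trip, at the cost of possible spurious trip votes, which downstream voting logic can absorb.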

  28. Conclusion
     • Appropriate defensive measures can provide reasonable assurance that data communication between redundant channels or between safety and non-safety systems will not trigger digital CCF
     • Measures are to be taken within the data communication subsystems, within safety-classified stations, and at the interface between communication subsystems and stations
