Advancing Medical Equipment Maintenance using RCM Methodology. Malcolm G. Ridgway, Ph.D., CCE, Senior Vice President, Technology Management Masterplan, Inc., Chatsworth, California
How A Machine Fails: Traditional / Classical Concept (Pre-1945)
First Generation Maintenance (Pre-1945): Maintenance was, like the machines, relatively simple. The primary maintenance strategy was “keep it looking sharp” and “run to failure”. The primary maintenance tool was an oily rag.
How A Machine Fails: Second Generation Concept, the “Bath Tub” Curve
Second Generation Maintenance (1945-60): Maintenance became, like the machines, a little more complex, because the economic consequences of unreliable machines had become more serious. The maintenance strategy was fixed-interval overhauls. PM was still relatively primitive, more of a craft than a science, and based on the manufacturer’s experience-based (?) recommendations.
Third Generation Maintenance (1960s): Maintenance became, like the machines, considerably more complex. The civil aviation industry became the driver of machine reliability because of the FAA’s concerns for public safety. 1960 – The FAA established a task force, which became known as the Maintenance Steering Group (MSG). 1968 – A landmark document (MSG-1) revolutionized the maintenance business and made the 747 viable.
How Machines Really Fail: Third Generation Concept (based on FAA data)
In the case of aircraft components • Only 6% show a wear-out failure (Type B) pattern • And only 14% have a random failure (Type E) pattern Whereas • 72% show an infant mortality (Type F) characteristic
The Famous Moment of Enlightenment in the 1960s… ...About Scheduled Maintenance
How This New Approach To Maintenance Made Jumbo Jets Economically Feasible • DC8: required the scheduled overhaul of 339 items and 4M man-hours of maintenance prior to its 20,000-hour inspection • DC10: required the scheduled overhaul of 7 items and 66K man-hours of maintenance prior to its 20,000-hour inspection • The DC10 is 3X larger, more complex, and 200X more reliable than the DC8 • The “event” rate of the DC8 is 60 per million takeoffs; the “event” rate of the DC10 is 0.3 per million takeoffs.
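The ratios behind these comparisons can be checked directly from the figures on the slide. A minimal Python sketch; the dictionary keys are illustrative names, not terms from the source:

```python
# Figures quoted on the slide for the DC8 and the DC10.
dc8 = {"overhaul_items": 339, "man_hours": 4_000_000, "events_per_million_takeoffs": 60.0}
dc10 = {"overhaul_items": 7, "man_hours": 66_000, "events_per_million_takeoffs": 0.3}


def ratio(key: str) -> float:
    """Return how many times larger the DC8 figure is than the DC10 figure."""
    return dc8[key] / dc10[key]


for key in ("overhaul_items", "man_hours", "events_per_million_takeoffs"):
    print(f"{key}: DC8/DC10 = {ratio(key):.1f}")
```

The DC10 needed roughly 48 times fewer scheduled overhaul items and about 61 times fewer man-hours, and its event rate is 200 times lower, which is where the “200X more reliable” figure comes from.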
The 1970s: Introduction of the systems approach to maintenance. 1974 – DOD contracted with United Airlines to document the maintenance processes being used by the civil aviation industry, and directed that the new approach be labeled Reliability-Centered Maintenance (RCM). 1978 – Publication of the book “Reliability-Centered Maintenance” by Stanley Nowlan and Howard Heap.
Explosive growth of RCM during the 80s & 90s • The military adopts RCM for its ships (including its nuclear submarines) and its aircraft • NASA joins in with its Shuttle Program • The utility industry adopts RCM for many of its power stations, including its nuclear power plants • 1982 – MSG-3 rev 2 Type Certification for the 757/767
What Exactly Is Reliability-Centered Maintenance? • Uses processes based on modern reliability analyses • Considers the entire system: equipment; accessories; user; maintainer; environment; utilities; & the patient • Focuses on maintaining the device’s function with minimum downtime and acceptable levels of safety • Uses FMEA to define what can go wrong and why • Uses precise effectiveness metrics and criteria for whether or not proactive maintenance is cost-effective • If interval-based maintenance is feasible, it provides precise formulas for what the intervals should be
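As one illustration of the kind of interval formula RCM can provide (this particular formula is a standard textbook approximation, not one given in the source): if a protective function with a hidden failure mode has a constant failure rate and is tested every T hours, it spends on average about T / (2 × MTBF) of its time in an undetected failed state. Holding that unavailability to an acceptable level U gives T ≈ 2 × U × MTBF, valid when T is much shorter than the MTBF. The MTBF and U values below are hypothetical.

```python
def failure_finding_interval(mtbf_hours: float, allowed_unavailability: float) -> float:
    """Approximate failure-finding (PVST) interval, in hours.

    Assumes a constant failure rate and an interval much shorter than the MTBF,
    so the mean fraction of time spent in an undetected failed state is about
    interval / (2 * MTBF).
    """
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    if not 0 < allowed_unavailability < 1:
        raise ValueError("allowed unavailability must be between 0 and 1")
    return 2.0 * allowed_unavailability * mtbf_hours


# Hypothetical example: an alarm circuit with an MTBF of 50,000 hours, where we
# accept that it may be in a hidden failed state 1% of the time.
interval = failure_finding_interval(mtbf_hours=50_000, allowed_unavailability=0.01)
print(f"Test roughly every {interval:.0f} hours (about {interval / 8760:.2f} years)")
```

Halving the acceptable unavailability halves the interval, which is why safety-critical hidden functions end up tested more often.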
Benefits (claimed to result) from using RCM • Increased reliability: 50-70% reduction in repairs • Increased availability: 25-50% reduction in downtime • Greater maintenance cost-effectiveness • Improved levels of safety • Longer useful life of maintained items • Creation of comprehensive maintenance databases
Current Joint Commission Standards • Standard EC.02.04.01: The hospital manages medical equipment risks. • Elements of Performance for EC.02.04.01: The hospital identifies the activities, in writing, for maintaining, inspecting, and testing for all medical equipment on the inventory. Note: Hospitals may use different strategies for different items, as appropriate. For example, strategies such as predictive maintenance, reliability-centered maintenance, interval-based inspections, corrective maintenance, or metered maintenance may be selected to ensure reliable performance.
Reality Check • Maintenance (particularly PM) is an issue of declining importance relative to several other equipment issues (such as use errors and network connectivity) • But we are still dedicating an estimated 3000 FTEs (costing about $300M/year) to our PM programs • We could (and should) be doing something more productive and more valuable with these resources!
Key PM Issues 1. We still do not have a good consensus on what we mean by the term “PM”, or even why we do it! 2. Although the Joint Commission has allowed us to exclude “non-critical” devices from our PM programs since 1989, we still don’t have a rational definition for a non-critical/non-life-support device. 3. We don’t have any good methods for justifying the PM intervals that we use. 4. The PM procedures that most of us use could be improved.
What Causes Equipment To Fail? (1) • Progressive wear or deterioration of a component part • Random failure of a component part • Poor fabrication or assembly of the hardware • Poor design of the system (hardware or processes) • Subjecting the device to physical stress outside its design tolerances • Exposing the device to environmental stress outside its design tolerances
What Causes Equipment To Fail? (2) • Incorrect set-up or operation of the device by the user • The use of a wrong or defective accessory • Poor or incomplete initial set-up or installation, or a poor-quality previous repair • Human interference with the device, including (possibly) earlier intrusive PM • Of the causes listed on these two slides, only the first (progressive wear or deterioration) and possibly the last (human interference, including intrusive PM) could be classed as maintenance-related failures
Hidden failures • Equipment failures are either likely to be noticed (they are evident, i.e., overt) or they are hidden. • Ideally, devices that are safety-critical or downtime-critical and that have hidden failure modes (i.e., failures that are unlikely to be noticed by the “operating crew”) should be provided with special protection mechanisms. • Devices that are safety-critical or downtime-critical and that have hidden failure modes, but lack reliable special protection mechanisms, should be subjected to appropriate performance and safety testing.
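The testing criterion above combines three conditions, which can be written as a single rule. A minimal sketch; the `FailureMode` structure, its field names, and the example failure modes are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One failure mode of a device, described by the three properties the slide uses."""
    name: str
    safety_or_downtime_critical: bool
    hidden: bool               # unlikely to be noticed by the "operating crew"
    reliable_protection: bool  # covered by a reliable special protection mechanism


def needs_pvst(mode: FailureMode) -> bool:
    """True when the failure mode is critical, hidden, and not reliably protected."""
    return mode.safety_or_downtime_critical and mode.hidden and not mode.reliable_protection


modes = [
    FailureMode("defibrillator delivers low energy", True, True, False),
    FailureMode("cracked display bezel", False, False, False),
    FailureMode("over-temperature fault, automatic shut-down fitted", True, True, True),
]
for m in modes:
    print(f"{m.name}: {'PVST' if needs_pvst(m) else 'no PVST required by this rule'}")
```

The protection mechanism's own failure modes (for example, a power cut-off that does not operate at its pre-set temperature) would be entered as separate failure modes, which is the “multiple failures” concern.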
Special Protection Mechanisms • Operator warning devices • Automatic shut-down devices • Automatic relief devices • Dual components for functional redundancy • Guard mechanisms • Special concern = “multiple failures” = failure modes within the protection mechanisms
PM Basics – Why do we do it? • PM should address: • Failures that result from the degradation of the device’s non-durable parts and • Detecting the presence of hidden failures. • PM cannot and does not prevent all types of equipment failures. • There are several other, more common, causes of device failure. • Very important PM issue = hidden failures of any special protection mechanisms
What does PM achieve? • PM prevents some equipment failures and the associated downtime. • It creates a certain (usually unspecified) level of confidence that the devices tested are safe (because they are not in a hidden failed state).
Indirect benefits of PM programs • Finding failed or damaged devices that have not been reported as needing to be repaired • Periodically confirming that the devices are actually still present in the facility • Providing some level of comfort and security that everything possible is being done to maximize the level of equipment safety.
What does PM not achieve? • PM cannot and does not prevent all equipment failures, only those that would have resulted from the degradation of the device’s non-durable parts. • PM cannot and does not mitigate the most common causes of adverse equipment-related accidents.
The Bottom Line on PM • With respect to: • reducing the downtime of downtime-critical equipment, and • eliminating the most common causes of adverse equipment-related incidents and accidents, • even a well-implemented PM program provides only relatively limited value, and it also has a cost • The more we can optimize the program and quantify the benefits, the easier it will be to balance the value gained from a well-implemented PM program against its cost
Better PM terminology • True preventive maintenance (TPM) = inspecting, cleaning, lubricating, adjusting or replacing the device’s non-durable parts… (aka scheduled restoration, scheduled discard tasks or predictive maintenance - JIT remediation via Condition Monitoring) • Performance verification and/or safety testing (PVST) = functional testing to detect hidden failures … (aka failure-finding tasks)
TPM = True Preventive Maintenance …is the inspection, cleaning, lubricating, adjustment or replacement of a device’s non-durable parts. Non-durable parts are those components of the device that have been identified either by the device manufacturer or by general industry experience as needing periodic attention, or being subject to functional deterioration and having a useful lifetime less than that of the complete device. Examples include filters, batteries, cables, bearings, gaskets, and flexible tubing.
Predictive Maintenance… …involves direct monitoring of some variable that will provide a reliable early warning that a non-durable part is about to fail (aka Condition Monitoring). An example might be using an oil contaminant sensor in your car’s engine lubricant to turn on a dashboard warning light telling you when it is time to change your oil. At the moment this particular PM strategy probably has more potential in the physical plant area than in the biomedical area. Physical plant examples include using vibration analysis to warn of bearing wear, and using infrared scanning to detect overheating in electrical switchgear.
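A condition-monitoring rule reduces to watching a measured variable and raising a warning when it crosses a limit. A minimal sketch, assuming a single monitored variable with a fixed alert level and requiring several consecutive readings above it so that one noisy spike does not trigger a warning; the function name, the bearing-vibration readings, and the 7.1 mm/s alert level are hypothetical values for illustration, not recommended settings:

```python
from typing import Optional, Sequence


def first_warning(readings: Sequence[float], alert_level: float,
                  consecutive: int = 3) -> Optional[int]:
    """Return the index at which `consecutive` readings in a row have reached
    `alert_level`, or None if that never happens."""
    run = 0
    for i, value in enumerate(readings):
        # Count the current streak of readings at or above the alert level.
        run = run + 1 if value >= alert_level else 0
        if run >= consecutive:
            return i
    return None


# Hypothetical bearing-vibration trend (mm/s), one reading per inspection.
vibration_mm_s = [2.1, 2.3, 2.2, 7.5, 2.4, 3.0, 4.8, 7.2, 7.6, 8.1, 8.9]
idx = first_warning(vibration_mm_s, alert_level=7.1, consecutive=3)
print("Schedule bearing replacement at reading", idx)
```

The isolated 7.5 mm/s spike is ignored; the warning fires only once the upward trend is sustained, which is the early warning the slide describes.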
PVST = Performance Verification and Safety Testing …is functional testing to detect hidden failures. Examples of hidden failures include: Defibrillators that are delivering significantly less energy than they are set to deliver; heart rate alarms that do not alarm at the set threshold, and protective power cut-offs on hypo-hyperthermia machines that do not operate at the pre-set cut-off temperature.
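A PVST check of this kind compares a measured value against the set value and a tolerance. A minimal sketch for the defibrillator example, assuming a tolerance of ±15% of the selected energy or ±3 J, whichever is larger; that tolerance is an assumption for illustration, and the applicable limit should come from the manufacturer’s service literature or the relevant standard. The analyzer readings are hypothetical.

```python
def energy_within_tolerance(selected_j: float, delivered_j: float,
                            pct: float = 0.15, abs_j: float = 3.0) -> bool:
    """True if delivered energy is within the larger of pct * selected or abs_j.

    The default tolerance values are assumptions, not taken from the source.
    """
    allowed = max(pct * selected_j, abs_j)
    return abs(delivered_j - selected_j) <= allowed


# Hypothetical analyzer readings: (selected energy, measured delivered energy) in joules.
readings = [(200, 192), (200, 160), (10, 12.5)]
results = []
for selected, delivered in readings:
    ok = energy_within_tolerance(selected, delivered)
    results.append(ok)
    print(f"set {selected} J, delivered {delivered} J: {'pass' if ok else 'HIDDEN FAILURE'}")
```

The second reading is the hidden failure the slide describes: the device appears to work normally, but it delivers 40 J less than it is set to deliver.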
Special features of the ASHE format • The procedure number as a “universal product code” • Separation of the TPM and PVST tasks • Use of the Note box for concise reporting • User tasks disclaimer
Repair Call Cause Coding Cat 1. Are the device and its accessories still working properly and safely? If yes, this is a Category 1 failure (aka use error; “cannot duplicate”). Cat 2. Is the device itself OK, with the problem due to the use of a wrong or defective accessory or to a problem in a connected network? If … Cat 3. Is the problem due to physical stress? If … Cat 4. Is there evidence that this problem could be the result of a poor initial installation or an incomplete repair of a previous problem (a “run on”)? If … Cat 5. Is there evidence that the failure was due to an out-of-tolerance ambient environmental condition?
Repair Call Cause Coding Cat 6. Is there evidence that the failure is due to a battery problem? If yes, … Cat 7. Is there evidence that the failure was due to a lack of preventive maintenance? If yes, … Cat 8. Is there evidence that the failure was caused by human interference, e.g., earlier intrusive PM? If … Cat 9. Is there any reason to believe that the failure was due to general wear and tear? If yes, … Cat 0. The cause of failure is unknown (cannot be categorized).
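The cause-coding questions can be read as an ordered checklist. A minimal sketch, assuming the questions are asked in the order listed and the first “yes” answer sets the code (the slides elide what follows each “If …”, so that ordering rule is an assumption); the answer keys are hypothetical field names:

```python
# Questions in slide order; the slides number them 1-9, with 0 for "unknown".
CAUSE_QUESTIONS = [
    (1, "device_and_accessories_work_properly"),  # use error / "cannot duplicate"
    (2, "wrong_or_defective_accessory_or_network"),
    (3, "physical_stress"),
    (4, "poor_installation_or_incomplete_repair"),
    (5, "out_of_tolerance_environment"),
    (6, "battery_problem"),
    (7, "lack_of_pm"),
    (8, "human_interference"),  # e.g. earlier intrusive PM
    (9, "general_wear_and_tear"),
]


def cause_code(answers: dict[str, bool]) -> int:
    """Return the category of the first question answered 'yes', else 0 (unknown)."""
    for code, key in CAUSE_QUESTIONS:
        if answers.get(key, False):
            return code
    return 0


# Hypothetical repair call: the fault was confirmed and traced to the battery.
call = {"device_and_accessories_work_properly": False, "battery_problem": True}
print("Cause code:", cause_code(call))
```

Counting the codes over a year of repair calls gives the distribution of failure causes for each device type.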
Some types of devices will benefit more than others from receiving PM: (1) Those with non-durable parts • Identify all possible PM-preventable failure modes by examining each TPM task listed in the PM procedure. • Perform a PM Risk Analysis: rank each failure mode according to the Level of Severity of its potential adverse consequences (LOS score). • Estimate the MTBF to rank the Likelihood of Failure (LOF score), i.e., how far out is the knee on the Type B failure curve? • Multiply the LOS score by the LOF score to determine the device’s PM Risk Score.
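The scoring steps above can be sketched in Python. This assumes the severity (LOS) and likelihood (LOF) scores each run from 1 to 4, consistent with a maximum risk score of 16, and takes the device’s PM Risk Score as the highest score among its failure modes; the source does not say how multiple failure modes are combined, so that choice, the `PmFailureMode` structure, and the example tasks and scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PmFailureMode:
    """A PM-preventable failure mode tied to one TPM task (hypothetical structure)."""
    tpm_task: str  # the TPM task in the PM procedure that addresses this mode
    los: int       # Level of Severity score, assumed 1 (minor) to 4 (most severe)
    lof: int       # Likelihood of Failure score, assumed 1 (least) to 4 (most likely)

    def __post_init__(self) -> None:
        for name in ("los", "lof"):
            value = getattr(self, name)
            if not 1 <= value <= 4:
                raise ValueError(f"{name} must be between 1 and 4, got {value}")

    @property
    def risk_score(self) -> int:
        # Multiply the severity score by the likelihood score.
        return self.los * self.lof


def device_pm_risk_score(modes: list[PmFailureMode]) -> int:
    """Device-level score, taken here (an assumption) as the worst failure-mode score."""
    if not modes:
        raise ValueError("at least one failure mode is required")
    return max(mode.risk_score for mode in modes)


# Hypothetical failure modes for a hypothetical device.
modes = [
    PmFailureMode("replace air filter", los=2, lof=3),
    PmFailureMode("replace battery", los=4, lof=2),
    PmFailureMode("inspect flexible tubing", los=3, lof=1),
]
for mode in modes:
    print(f"{mode.tpm_task}: {mode.los} x {mode.lof} = {mode.risk_score}")
print("Device PM Risk Score:", device_pm_risk_score(modes))
```

Taking the maximum means a single high-risk TPM task is enough to keep the device in the PM program; summing the scores instead would favor devices with many low-risk tasks.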
Classifying the Level of Severity (LOS) of any likely adverse consequences from (1) any non-durable parts-related failures
Adverse consequences of (overt) equipment failures. Three different kinds of consequences: • Adverse safety consequences • Life-threatening (LOS = 4), safety-major concern (LOS = 3), safety-moderate concern (LOS = 2), safety-only minor concern • Adverse operational consequences (uptime) • Uptime-critical (LOS = 4), uptime-major concern (LOS = 3), uptime-moderate concern (LOS = 2), etc. • Adverse non-operational consequences (cost of repair) • Very high cost of repair (LOS = 4), high cost of repair (LOS = 3), moderate cost of repair (LOS = 2), etc.
Adverse consequences of (overt) equipment failures. Economic consequences: • Uptime-critical devices (LOS = 4) • Sophisticated imaging devices, such as CT scanners • Uptime-major concern devices (LOS = 3) • Key devices with little or no back-up, such as large central sterilizers and automated lab analyzers • High and very high cost of repair devices (LOS = 3 and 4) • Specialized devices, such as lasers, some sterilizers, some ventilators, etc.
Classifying the Likelihood of Failure (LOF) of (1) any non-durable parts
RCM Risk Score: Compounding Level of Severity (LOS) and Likelihood of Failure (LOF) • 12-16 = Critical risk • 6-9 = “Worth doing”
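Because the score is a product of two small integers, only a few values can occur, and the two bands cover every score from 6 up. A minimal sketch, assuming LOS and LOF each run from 1 to 4 (the slide shows a maximum of 16 but not the scales themselves); the slide does not name a band below 6, so the label used for it here is a placeholder:

```python
def risk_band(los: int, lof: int) -> str:
    """Classify an LOS x LOF product into the bands shown on the slide."""
    for name, value in (("LOS", los), ("LOF", lof)):
        if not 1 <= value <= 4:
            raise ValueError(f"{name} must be between 1 and 4, got {value}")
    score = los * lof
    if score >= 12:
        return "Critical risk"
    if score >= 6:
        return "Worth doing"
    return "Below the 'worth doing' threshold"  # band not named on the slide


# Every product of two 1-4 scores, so we can see which values are possible.
scores = sorted({los * lof for los in range(1, 5) for lof in range(1, 5)})
print(scores)
```

No product of two 1-4 scores equals 5, 7, 10, 11, 13, 14 or 15, so the bands 12-16 and 6-9 leave no gaps above the threshold.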
Some types of devices will benefit more than others from receiving PM: (2) Those with hidden failure modes • Identify all possible hidden failure modes by examining each PVST task listed in the PM procedure • Perform a PM Risk Analysis. Rank each hidden failure mode according to the Level of Severity of its potential adverse consequences (LOS Score). • Rank the Likelihood of Failure of each hidden failure (LOF Score) by reviewing data on the “yield” of previous PVST testing (# of HFs/device-year) • Multiply the LOS Score by the LOF Score to determine the device’s PM Risk Score.
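The PVST “yield” is the number of hidden failures found divided by the device-years of testing. A minimal sketch with hypothetical records; the thresholds for converting a yield into an LOF score are not reproduced in this text, so the sketch stops at the yield:

```python
def pvst_yield(hidden_failures_found: int, devices: int, years: float) -> float:
    """Hidden failures found per device-year of PVST testing."""
    device_years = devices * years
    if device_years <= 0:
        raise ValueError("need a positive number of device-years")
    return hidden_failures_found / device_years


# Hypothetical records: 120 devices of one type tested over 3 years, 9 hidden failures found.
y = pvst_yield(hidden_failures_found=9, devices=120, years=3)
print(f"{y:.3f} hidden failures per device-year")
```

Here 9 hidden failures in 360 device-years is a yield of 0.025, or one hidden failure every 40 device-years.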
Classifying the Level of Severity (LOS) of any likely adverse consequences from (2) any hidden failures