Failure Prevention and Recovery Chapter 19
Summary
• What is failure?
• Why do failures happen?
• How do we measure failure?
• How are failures detected and analyzed?
• How can operations improve their reliability?
• How should operations recover from failure?
What is failure? At its simplest, ‘failure’ is when something does not work as it should. If the shop assistant who sells you an item of clothing ‘fails’ to tell you that it should be dry-cleaned, that is technically a failure. In operations management, however, the term usually denotes a more dramatic event: something stops doing what it should do. So a piece of material fails, or a process fails.
Why do operations fail? Operations fail for several reasons:
• Design failures
• Facilities failures
• People (staff) failures
• Supplier failures
• Customer failures
• Environmental failures
A. Design failures
A design may look fine on paper, but its limitations become clear only in real circumstances. Design failures arise in two situations:
• A characteristic of demand is miscalculated or overlooked, so the process cannot cope with demand. For example, a process is designed to manufacture 3 televisions per hour, but demand requires 7 televisions per hour.
• Unexpected circumstances arise. For example, the product size in the design turns out to differ from the size demanded.
[Figure: Why systems fail. Failures inside the operation (design failures, facilities failures, staff failures), together with supply failures and customer failures.]
B. Facilities failure
Failures of machines, equipment, buildings, and fittings.
C. People failure
People failures come in two types, errors and violations:
• Errors are mistakes in judgment (for example, running a motorbike on reserve petrol).
• Violations are acts that are contrary to the operating procedure (for example, a driver who skips engine oil changes, causing major engine problems).
D. Supplier failure
Failure in the delivery or quality of goods and services supplied to the operation (for example, the band booked by a hotel fails to turn up).
E. Customer failure
Misuse by customers of the products and services the operation produces.
F. Environmental disruption-related failure
All causes of failure that lie outside the operation, for example hurricanes, floods, lightning, temperature, fire, crime, theft, and terrorism.
Measuring failures
Most failures can be traced back to some kind of human failure. For example:
• A machine failure may result from poor design or poor maintenance.
• A delivery failure may result from someone's error in managing the supply schedule.
• A customer's mistake may happen because no one instructed the customer.
So failures can be controlled to an extent, and an organization can learn from them. That is why failures are also described as opportunities. There are three main ways of measuring failure:
• Failure rate: how often a failure occurs.
• Reliability: the chance that a failure will occur.
• Availability: the amount of useful operating time available.
Measuring failure rate (FR)
Failure rate is the number of failures occurring over a period of time, or as a percentage of the products tested. For example, the failure rate of an airport security system can be measured by counting its security breaches.
FR (%) = (number of failures ÷ total number of products tested) × 100
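The failure rate formula above can be sketched in a few lines of Python. The function name and the sample figures (4 failures among 200 products tested) are illustrative assumptions, not data from the chapter.

```python
def failure_rate_percent(failures: int, tested: int) -> float:
    """Failure rate as a percentage of the products tested."""
    if tested <= 0:
        raise ValueError("tested must be a positive number of products")
    if not 0 <= failures <= tested:
        raise ValueError("failures must lie between 0 and tested")
    # Multiply first so simple cases stay exact in floating point.
    return failures * 100 / tested


# Hypothetical figures: 4 failures among 200 products tested.
print(failure_rate_percent(4, 200))  # 2.0
```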
Failure over time: the ‘bath-tub’ curve
• Failure is a function of time: at different stages of its life, the probability that something will fail is different. The curve that describes failure probability over time is called the ‘bath-tub’ curve. According to this curve, the probability of failure is high at the beginning and at the end of the life cycle.
There are three distinct stages:
• The ‘infant-mortality’ or ‘early-life’ stage, when early failures occur because of defective parts or improper use.
• The ‘normal-life’ stage, when the failure rate is usually low and reasonably constant.
• The ‘wear-out’ stage, when the failure rate increases as the item approaches the end of its working life.
[Figure: How failure is measured. The bath-tub curve plots failure rate against time through the ‘infant-mortality’ stage, the normal-life stage, and the wear-out stage.]
Measuring reliability
Reliability measures the ability of a system, product, or service to perform as expected over time.
Rs = R1 × R2 × R3 × … × Rn
where Rs is the reliability of the system and R1 … Rn are the reliabilities of its components. This assumes that a failure in any single component causes the whole system to fail. So the more components a system has, the lower its reliability will be.
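A minimal sketch of the system reliability formula, assuming every component must work for the system to work. The component reliabilities used here (0.99 each) are hypothetical.

```python
from math import prod


def system_reliability(component_reliabilities):
    """Rs = R1 x R2 x ... x Rn for components that must all work."""
    values = list(component_reliabilities)
    if any(not 0.0 <= r <= 1.0 for r in values):
        raise ValueError("each reliability must lie between 0 and 1")
    return prod(values)


# Hypothetical system: identical components, each 99% reliable.
five_parts = system_reliability([0.99] * 5)
ten_parts = system_reliability([0.99] * 10)
print(five_parts > ten_parts)  # True: more components, lower reliability
```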
MTBF (mean time between failures)
MTBF = operating hours ÷ number of failures
• Availability
The degree to which the operation is ready to work. An operation is not available if it has either failed or is being repaired following a failure.
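The MTBF calculation can be sketched as below. The slide gives no formula for availability, so this sketch assumes the commonly used one, availability = MTBF ÷ (MTBF + MTTR), where MTTR (mean time to repair) is a quantity introduced here and not mentioned on the slide. All figures are hypothetical.

```python
def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures = operating hours / number of failures."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to compute MTBF")
    return operating_hours / failures


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Assumed formula: MTBF / (MTBF + MTTR), MTTR = mean time to repair."""
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("times must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures: 1000 operating hours, 4 failures, 10 hours per repair.
hours_between_failures = mtbf(1000, 4)
print(hours_between_failures)  # 250.0
print(round(availability(hours_between_failures, 10), 3))
```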
Failure prevention and recovery
There are three sets of activities that relate to failure:
1. Understanding what failures are occurring in the operation and why they are occurring.
2. Finding ways to reduce the chance of failure and to minimize the consequences of failure.
3. Making plans and procedures that help the organization recover from failures when they occur.
[Figure: The three tasks of failure prevention and recovery. Failure detection and analysis (finding out what is going wrong and why); improving system reliability (stopping things going wrong); recovery (coping when things do go wrong).]
Mechanisms to detect failure
There are six techniques for detecting failure:
• In-process checks: employees check that the service is acceptable during the process itself (for example, in restaurants).
• Machine diagnostic checks: a machine is tested by putting it through a prescribed sequence of activities (for example, in computer servicing).
• Point-of-departure interviews: staff check, formally or informally, that the service has been satisfactory.
• Phone surveys: used to solicit opinions about products or services.
• Focus groups: groups of customers are brought together to uncover problems or attitudes towards products or services.
• Complaint feedback cards and questionnaires: many organizations use them to collect views about products or services.
Failure analysis
Failure analysis is about understanding why a failure has occurred.
• Accident investigation: specially trained staff analyze the causes of accidents (for example, airplane or road accidents).
• Failure traceability: making sure that a failure can be traced back to its origin (finding proof or evidence).
• Complaint analysis: analyzing the complaints received.
• Critical incident technique (CIT), or critical incident analysis: finding out from customers which factors satisfied and dissatisfied them.
How failure is detected and analyzed
Failure detection mechanisms include:
• in-process checks
• machine-diagnostic checks
• point-of-departure interviews
Failure analysis procedures include:
• accident investigation
• failure mode and effect analysis
• fault-tree analysis
Failure mode and effect analysis
Failure mode and effect analysis (FMEA) identifies the features of a product, service, or process that are critical in determining the effect of failures. It is a way of identifying failures before they happen by working through a checklist procedure built around three questions:
• What is the likelihood that failure will occur?
• What would the consequence of the failure be?
• How likely is such a failure to be detected before it affects the customer?
Based on these three questions, a risk priority number (RPN) is calculated for each potential cause of failure, so that the causes can be prioritized. There are seven steps involved in this (page 629).
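A rough sketch of the RPN calculation, assuming the common convention that the RPN is the product of three ratings (probability of failure, severity, and difficulty of detection), each on a 1 to 10 scale where a higher detection rating means the failure is harder to detect. The scale, the cause names, and the ratings are all hypothetical and do not come from the seven steps referenced above.

```python
def risk_priority_number(occurrence: int, severity: int, detection: int) -> int:
    """RPN = occurrence x severity x detection (assumed 1-10 ratings)."""
    for rating in (occurrence, severity, detection):
        if not 1 <= rating <= 10:
            raise ValueError("ratings are assumed to lie on a 1-10 scale")
    return occurrence * severity * detection


# Hypothetical causes with made-up (occurrence, severity, detection) ratings.
causes = {
    "cause A": (2, 8, 3),
    "cause B": (5, 4, 6),
    "cause C": (1, 9, 2),
}

# Highest RPN first: these causes would get attention first.
ranked = sorted(
    causes,
    key=lambda name: risk_priority_number(*causes[name]),
    reverse=True,
)
for name in ranked:
    print(name, risk_priority_number(*causes[name]))
```

Ranking by RPN is only a way of ordering attention; the ratings themselves are judgments.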
[Figure: Failure mode and effect analysis. For each failure (as against normal operation), the probability of failure, the degree of severity of its consequence (its effect on the customer), and the likelihood of detection combine into the risk priority number.]
Fault-tree analysis
This is a logical procedure that starts with a failure or potential failure and works backwards to identify all its possible causes and origins.
[Figure: Fault-tree analysis for below-temperature food being served to customers. Top event: food served to customer is below temperature. Events shown: food is cold, plate is cold, oven malfunction, timing error by chef, ingredients not defrosted, cold plate used, plate warmer malfunction, plate taken too early from warmer. Key: AND node, OR node.]
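A fault tree can be sketched as nested AND/OR gates over basic events. The event names below come from the slide, but the way they are grouped under gates is an assumption made for illustration, since the gate layout of the original figure is not recoverable from the text.

```python
# Assumed layout: each node is either a basic event (a string) or a tuple
# of the form (gate, child, child, ...), where gate is "AND" or "OR".
FOOD_IS_COLD = ("OR", "oven malfunction", "timing error by chef",
                "ingredients not defrosted")
PLATE_IS_COLD = ("OR", "cold plate used", "plate warmer malfunction",
                 "plate taken too early from warmer")
FOOD_BELOW_TEMPERATURE = ("OR", FOOD_IS_COLD, PLATE_IS_COLD)


def occurs(node, basic_events):
    """Return True if the event at this node occurs, given the basic events."""
    if isinstance(node, str):
        return node in basic_events
    gate, *children = node
    results = [occurs(child, basic_events) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")


print(occurs(FOOD_BELOW_TEMPERATURE, {"timing error by chef"}))  # True
print(occurs(FOOD_BELOW_TEMPERATURE, set()))                     # False
```

Working backwards from the top event, as the procedure describes, corresponds to walking this structure from the root down to the basic events.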
Improving process reliability
Operations managers are responsible for preventing failures. There are four ways of doing this:
• Designing out fail points
• Building in redundancy
• Fail-safing
• Maintenance
a. Design out fail points
Fail points can be designed out through proper product or service design, quality planning and control, and process control.
b. Redundancy
Building redundancy into an operation means having a back-up system in case of failure (for example, back-up systems in an airplane, a second kidney, or two red rear lights on a car).
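The effect of a back-up system can be sketched with the standard redundancy relationship: the reliability of a component with its back-up equals the reliability of the component plus the reliability of the back-up multiplied by the probability that the component fails. This formula is not on the slide, the two parts are assumed to fail independently, and the reliabilities used (0.9 and 0.8) are hypothetical.

```python
def reliability_with_backup(primary: float, backup: float) -> float:
    """R(a+b) = Ra + Rb x (1 - Ra), assuming independent failures."""
    for r in (primary, backup):
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reliability must lie between 0 and 1")
    return primary + backup * (1.0 - primary)


# Hypothetical figures: a 90% reliable component with an 80% reliable back-up.
combined = reliability_with_backup(0.9, 0.8)
print(round(combined, 2))  # 0.98
```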
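c. Fail-safing
• The concept comes from Japanese methods of operations improvement. It is known in Japan as poka-yoke, from yokeru (to prevent) and poka (inadvertent errors). Poka-yokes are simple devices or systems built into a process to prevent mistakes that would lead to failure.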
Poka-yoke (fail-safing): examples
• A 3.5-inch diskette cannot be inserted unless it is oriented correctly. [Image: this is as far as a disk can be inserted upside-down.] This feature, along with the fact that the diskette is not square, prevents incorrect orientation. It is a control method.
• Warning lights and chimes alert the driver to potential problems. These devices employ a control method and a warning method.
• Filing cabinets can fall over if too many drawers are pulled out. For some filing cabinets, opening one drawer locks all the rest, reducing the chance of the cabinet tipping. It is a control method.
• The window in an envelope is not only a labour-saving device. It also prevents the contents intended for one person being inserted into an envelope addressed to another. It is a control method.
Examples of poka-yoke techniques: page 633
d. Maintenance
Maintenance is how organizations try to avoid failure by taking care of their physical facilities.
Benefits of maintenance
1. Enhanced safety
2. Increased reliability
3. Higher quality
4. Lower operating costs
5. Longer life span
6. Higher end value (facilities can be sold second-hand)
Three basic approaches to maintenance
• Run to breakdown (RTB): operate until something fails, then carry out maintenance.
• Preventive maintenance: eliminate or reduce the chance of failure by servicing the facilities at planned intervals.
• Condition-based maintenance: perform maintenance only when the facilities require it. It is applicable to expensive facilities.
[Figure: A mixture of maintenance approaches is often used, in a motor car, for example. Labels: use preventive maintenance; use run-to-breakdown maintenance; use condition-based monitoring maintenance.]
Total productive maintenance
Total productive maintenance (TPM) is productive maintenance carried out by all employees through small-group activities. In this sense, productive maintenance means maintenance management.
Five goals of TPM: page 538, paragraph 2
Reliability-centered maintenance
• This is another maintenance method, in which different types of maintenance are used for different parts of a process.
[Figure: One part in one process can have several different failure modes, each of which requires a different approach. Example: the cutters in a shredding process. Cutter ‘wear out’ failure pattern (failures against time): the solution is preventive maintenance before the end of useful life. Cutter ‘shake loose’ failure pattern (failures against time): the solution is to ensure correct fitting through training.]
Recovery
The activities designed to cope with failures once they have occurred are known as recovery.
Failure planning
The procedures that allow the operation to recover from failure are called failure planning.
[Figure: The stages in failure planning. Discover (what's happened, what consequences); Act (inform, contain, follow up); Learn (find root cause, engineer out); Plan (analyze failure, plan recovery).]
Business continuity procedures
These procedures help the operation to avoid or recover from failures and keep the business going (page 643).