Condition-Based Maintenance will not stop your Machines from Failing
The ultimate goal for anyone in industrial maintenance should be to gain the optimum, full life of our machine assets. To do this we need to make changes to our current maintenance processes or at least the way a lot of plants are doing things. I see a lot of companies have some variation of a condition-based maintenance program but are scratching their heads on why they still get machine failures. They are not wrong in doing condition-based maintenance but that alone will not stop your machines from failing. First, let me explain why condition monitoring works.
The premise behind condition-based maintenance is that most failures give some warning that they are about to occur.
The P to F Interval
This warning is called a potential failure and is defined as an identifiable physical condition which indicates that a functional failure is either about to occur or in the process of occurring.
Functional failure is defined as the inability of an item to meet a specified performance standard. The P to F Interval is a well-known illustration (see Figure 1 below).
There are many different techniques to measure and detect potential failures. You choose what’s best for you and your machine. For instance, if you had a slow turning gearbox you may use oil analysis. The most popular instruments to measure potential failures are for vibration, ultrasound, oil analysis and temperature but there are more.
The sooner a potential failure can be detected, the longer the P-F interval can be. Longer P-F intervals mean that inspections need to be done less often and more importantly, more time is acquired to take whatever action is needed to avoid the consequences of the failure.
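To illustrate how the P-F interval drives inspection frequency, here is a minimal sketch. It assumes a commonly quoted rule of thumb, not stated in this article, that you inspect at no more than half the P-F interval so that a potential failure is caught with time left to act. The function names and the 0.5 default are illustrative assumptions.

```python
def inspection_interval_days(pf_interval_days: float, fraction: float = 0.5) -> float:
    """Return a suggested inspection interval in days.

    `fraction` is an assumed rule of thumb (half the P-F interval),
    not a value taken from any standard.
    """
    if pf_interval_days <= 0:
        raise ValueError("P-F interval must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return pf_interval_days * fraction


def inspections_per_year(pf_interval_days: float, fraction: float = 0.5) -> float:
    """Return how many inspections per year the suggested interval implies."""
    return 365 / inspection_interval_days(pf_interval_days, fraction)


# A longer P-F interval means fewer inspections per year.
short_pf = inspections_per_year(90)    # technique detects 90 days before failure
long_pf = inspections_per_year(180)    # technique detects 180 days before failure
print(f"90-day P-F: {short_pf:.1f} inspections/year")
print(f"180-day P-F: {long_pf:.1f} inspections/year")
```

Doubling the P-F interval halves the number of inspections in this sketch, which is the point made above: earlier detection buys both fewer inspections and more time to plan the repair.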
Does doing this type of Condition-Based Maintenance or Condition Monitoring work?
Yes, because you can avoid the downtime and maybe save some money.
Failure comes at us in many ways and obviously we have many ways to combat it. If you detect the potential failure early enough (and it can be months and months before the actual failure), it means that you can avoid the breakdown. You can schedule an outage to do a repair or maintenance. It’s not a breakdown, the machine hasn’t stopped, it’s not downtime. This is cost avoidance: the plant avoids the production it would have lost to unplanned downtime. Avoid the downtime, control the outage, schedule the maintenance work. It’s a win.
Think about secondary damage. The seal might go in a gearbox and cost $1,000 to replace (costs are guesses, for argument’s sake). If you don’t catch it and the bearing gets contaminated, it becomes an overhaul of the gearbox for, say, $5,000. But if the bearing seizes onto a shaft, now you have to replace the shaft and maybe more.
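The arithmetic behind that example can be sketched in a few lines. The two dollar figures are the guesses used above; the stage names and the function are hypothetical, and no figure is given for the seized-shaft case because the article does not put a number on it.

```python
# Guessed repair costs from the gearbox example, in dollars.
REPAIR_COST_BY_STAGE = {
    "seal_replacement": 1_000,   # fault caught early
    "gearbox_overhaul": 5_000,   # bearing contaminated before the fault was caught
}


def cost_avoided(detected_stage: str, avoided_stage: str,
                 costs: dict = REPAIR_COST_BY_STAGE) -> int:
    """Return the secondary-damage cost avoided by repairing at the earlier stage."""
    return costs[avoided_stage] - costs[detected_stage]


saving = cost_avoided("seal_replacement", "gearbox_overhaul")
print(f"Catching the seal early avoids ${saving:,} of secondary damage")
```

This counts repair cost only. Lost production from an unplanned stop would come on top of it.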
The cost of secondary damage can be huge so yes, Condition Monitoring does work and done right, it saves you lots of time and money. But, there is a problem with Condition Monitoring and it’s the same for Predictive Maintenance as well. We still have machine failure.
Root Cause Analysis and Defect Elimination is a Must
A definition of insanity is to do the same thing over and over and expect a different result. If we just keep replacing bearings and don’t figure out what’s causing the failure, are we crazy?
Are we guilty of only repairing the effect and not finding the cause? To only fix the fault/effect is reactive maintenance. A condition-based maintenance program, or any program, needs a defect elimination process. This is usually done through a root cause analysis, which is the process of defining, understanding and solving a problem.
This fishbone diagram (see Figure 2) is a basic tool used in root cause analysis. In my day we called it breakdown analysis. We know that the “effect” is that the machine is down, but what is the true “cause” of the failure?
The process was to set up a cross-functional team so that we could brainstorm the cause of the failure. That’s a good idea, but you must make sure that you have people who have direct knowledge of the process being examined, not just bodies that represent a department. Then we had a step-by-step process to drill down in order to find the true cause of the failure.
But this was just one tool we used. We also used the “5 Whys” method, which was my favourite. It’s simply to ask the question “why” enough times until you get down to the root cause of the issue. Of course, you don’t have to limit yourself to asking only five questions; you ask as many as necessary.
These are only two of the tools that are available. There are others such as Failure Modes and Effects Analysis (FMEA). Whatever you use, the point is that you need defect elimination as part of your maintenance processes.
Defect Elimination is the removal of that cause, which will give you a longer life for your machine assets. The idea is making sure ‘you fix forever, rather than forever fixing’. When something fails, you make sure it does not recur, so that over time you reduce the number of failures and increase your uptime.
After defect elimination, whether you are overhauling, repairing or redesigning, you are re-installing the machine. For this you need to implement precision maintenance skills and techniques.
Precision Maintenance is simple: it means to work to a recognized standard, a set of tolerances that you and your team agree on. The tighter the tolerance, the better the result. But you cannot have a tolerance that you cannot measure.
Precision maintenance means “up-skilling” your people: getting the right tools but also the right training. It’s machinery acceptance standards, precision balancing, alignment, base flatness standards, the removal of machine stress and more. See Figures 3 and 4 for some examples.
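As a rough sketch of working to an agreed tolerance, the snippet below checks two things: that the tolerance can actually be measured with the instrument on hand, and that a reading falls inside it. The class, the function names and every number are hypothetical illustrations, not values from any standard or OEM specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    name: str
    limit: float  # maximum acceptable absolute value
    unit: str


def is_measurable(tol: Tolerance, instrument_resolution: float) -> bool:
    """A tolerance is only usable if the instrument resolves finer than the limit."""
    return instrument_resolution < tol.limit


def within_tolerance(tol: Tolerance, measured: float) -> bool:
    """Return True if the measured value is inside the agreed tolerance."""
    return abs(measured) <= tol.limit


# Hypothetical team-agreed alignment tolerance and readings.
offset = Tolerance(name="shaft offset", limit=0.05, unit="mm")

print(is_measurable(offset, instrument_resolution=0.01))  # True: 0.01 mm resolves 0.05 mm
print(is_measurable(offset, instrument_resolution=0.10))  # False: instrument too coarse
print(within_tolerance(offset, measured=0.03))            # True: inside the limit
print(within_tolerance(offset, measured=-0.08))           # False: outside the limit
```

Recording the result of each check alongside the measured value is one simple way to document that the job was commissioned to the standard.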
Most importantly – it’s commissioning to a standard and documenting the process.
The answer to the “why” that was asked in root cause analysis above is usually found in Precision Maintenance.
Controlling Factors in the Life of a Machine
A machine’s design can have an effect on the machine’s life. However, in maintenance, very often we have to live with the design we have been given. Let’s say it’s a pump that was under-designed for the application; this would mean that the pump would begin life in a functional failure state because it does not meet requirements. So obviously, the design has to be done right, otherwise a redesign is inevitable. A review of the machine’s design should be a must in any breakdown analysis.
A machine is overhauled (or repaired, usually on the run) many, many times throughout its life. It’s extremely important that it be done correctly. Many companies will contract this work out because they do not have the facilities (e.g. a clean room) to do the work correctly, as one of the biggest issues during overhaul is contamination. When a machine is overhauled, the most important aspect is that the OEM specifications for machine fits are maintained. The goal is to make it new again.
Installation is the key. It’s the most critical thing for all machines. A well-designed machine or a well-overhauled machine can be ruined by poor installation practices. The installation must be done to a standard such as ANSI/ASA S2.75-2017/Part 1, or to the OEM specification if it’s better.
Commissioning is actually a continuation of the installation. In fact, it should start with a review of the installation documentation. I think it should be done by a group other than the one that did the installation, such as the reliability group. Each machine is different, so we cannot publish a list of what to do, but all of the OEM operating procedures should be followed. When the button is pushed to start the machine, this is where you should be taking measurements for thermal expansion (offline to running), so we know if correction is necessary before putting the machine into service.
When the machine is online, different parameters should be measured, such as temperature, sound and vibration, as part of your condition-based maintenance program. These measurements are the benchmarks against which you will compare the new measurements you take throughout the life of the machine. Changes from these results mean the machine is deteriorating. However, if you have done a good job at understanding the root causes and using precision maintenance techniques in the areas that you can control, this should be because the machine is worn out and has had a good long life.
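Comparing new readings against the commissioning benchmarks could be sketched as follows. The parameter names, the readings and the 25% alert threshold are all hypothetical; in practice the threshold would come from your own standards or the OEM.

```python
def percent_change(baseline: float, current: float) -> float:
    """Return the change from baseline as a percentage of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100


def flag_deviations(baseline: dict, current: dict,
                    threshold_pct: float = 25.0) -> dict:
    """Return parameters whose change from baseline exceeds the threshold.

    `threshold_pct` is an assumed alert level used for illustration only.
    """
    flagged = {}
    for name, base_value in baseline.items():
        if name not in current:
            continue  # parameter not measured on this round
        change = percent_change(base_value, current[name])
        if abs(change) > threshold_pct:
            flagged[name] = round(change, 1)
    return flagged


# Hypothetical benchmark taken at commissioning and a later reading.
commissioning = {"vibration_mm_s": 2.0, "temperature_c": 60.0}
latest = {"vibration_mm_s": 3.0, "temperature_c": 63.0}

print(flag_deviations(commissioning, latest))
```

With these made-up numbers, vibration has risen 50% and is flagged, while the 5% rise in temperature stays under the threshold.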