Around the world, concepts for improving production throughput are often referred to collectively as “reliability.” But what does this term actually mean? The answer is not as simple as it may seem.

Once a plant decides to introduce a reliability plan, the actual steps agreed upon by the staff are affected by the definition used, because different interpretations of “reliability” can lead the site in totally different directions. Therefore, before taking on this type of endeavor, managers must concur on many fronts, including what tasks are considered reliability issues, how such tasks will improve profitability, and who (if anyone) should be involved in the reliability project. The absence of such answers may very well limit the potential of future improvements. But before you can make improvements, you must change the way you think. Let's take a look at a few examples to illustrate the transformation in mind-set that's necessary to redefine reliability.

Paradigm No. 1: Reliability means fewer breakdowns.

A common definition of reliability is that equipment suffers fewer breakdowns. Under this definition, improving reliability means identifying issues and repairing equipment before the operations department notices anything is wrong. Condition monitoring takes center stage, and unplanned stoppages decrease. Although the operations department certainly appreciates the shift from unplanned stoppages to planned outages, maintenance still incurs the cost of components and labor required to reinstate the equipment's functionality. Frustratingly, therefore, maintenance costs and labor requirements change very little, if at all.

Further analysis of this situation shows the equipment still needs to be replaced or repaired at the same frequency. So while production reliability does benefit, no practical equipment reliability has been achieved. The labor and material required for repairing the equipment stay largely unaffected, and any savings from reduced consequential damage are usually consumed by the additional inspections required. This shows that a more refined definition of reliability is required: one that includes not only reliability of production between shutdowns but also reliability of equipment (i.e., less need for shutdowns to fix the equipment). Maximizing the life of equipment means fewer breakdowns and planned shutdowns, lower maintenance costs, lower labor requirements, and fewer spares.

Using this definition, the reliability concept should encompass actions that incrementally improve the equipment life currently being attained (e.g., lubrication, cleanliness, alignment, balancing), which, in turn, increases the mean time between failures (MTBF). It becomes apparent from this definition that actions such as condition monitoring are not related to reliability but rather to the minimization of mean time to repair (MTTR).
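The difference between the two levers can be illustrated with the standard steady-state relation, availability = MTBF / (MTBF + MTTR). The sketch below is only an illustration: the function names, the 8,000 running hours per year, and all hour figures are assumptions, not data from any site.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def interventions_per_year(mtbf_hours: float,
                           running_hours_per_year: float = 8000) -> float:
    """Expected number of repairs per year (running hours are an assumption)."""
    return running_hours_per_year / mtbf_hours


# Hypothetical baseline: one failure per 1,000 running hours, 10 hours to repair.
baseline = availability(1000, 10)

# Lever A: repair twice as fast (MTTR halves, failure frequency unchanged).
faster_repair = availability(1000, 5)

# Lever B: equipment lasts twice as long (MTBF doubles, repair time unchanged).
longer_life = availability(2000, 10)

print(f"baseline      {baseline:.4f}, {interventions_per_year(1000):.0f} repairs/yr")
print(f"faster repair {faster_repair:.4f}, {interventions_per_year(1000):.0f} repairs/yr")
print(f"longer life   {longer_life:.4f}, {interventions_per_year(2000):.0f} repairs/yr")
```

With these assumed figures both levers give the same availability, but only the longer equipment life halves the number of repairs, and with it the parts and labor those repairs consume.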

Redefining Paradigm No. 1: Reliability means less need for intervention.

Paradigm No. 2: Reliability is used to determine equipment performance.

Site management teams are quite aware that equipment is not the only consideration for maintenance. Health and safety, environmental concerns, information management, and planning/scheduling are just a few of the other issues that need to be considered part of normal business. When all of the other aspects that need to be managed are taken into account in maintenance planning, it becomes apparent that reliability is not just the ability to maintain the functionality of equipment but the requirement for all maintenance processes to function properly.

Every time a process needs intervention from employees, cost is incurred. Labor is valuable, so a lower labor requirement in any of these processes is desirable. This is achieved by making the process more reliable. Therefore, reliability expands from referring just to equipment to the entire business.

When you realize that a task taking 10 min. a day adds up to one working week per year, it becomes more important to measure intervention instead of simply production impact. That is a week the person could be using to address other issues.
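The arithmetic behind the 10-minute figure can be checked with a short sketch. The 240 working days per year and the 40-hour working week are assumptions chosen for illustration; a site would substitute its own calendar.

```python
WORKING_DAYS_PER_YEAR = 240   # assumption: roughly 48 weeks of 5 days
HOURS_PER_WORKING_WEEK = 40   # assumption


def weeks_lost_per_year(minutes_per_day: float,
                        working_days: int = WORKING_DAYS_PER_YEAR,
                        hours_per_week: float = HOURS_PER_WORKING_WEEK) -> float:
    """Convert a small daily intervention into working weeks per year."""
    hours_per_year = minutes_per_day * working_days / 60
    return hours_per_year / hours_per_week


# 10 min x 240 days = 2,400 min = 40 hr = 1 working week
print(weeks_lost_per_year(10))
```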

Redefining Paradigm No. 2: Reliability can be used to determine performance of all activities.

Paradigm No. 3: Reliability practices belong to the plant floor.

Traditionally, reliability tasks are believed to be the responsibility of engineers and tradespeople. This may be the case when the term reliability relates only to equipment, including related tasks such as alignment, lubrication, and precision maintenance. Plant managers typically do not see how those tasks relate to their level.

When the reasons that initiatives do not get implemented are studied, however, it becomes apparent that management causes most of the problems. Insufficient communication about what engineers are doing and why reduces the buy-in of other employees. Not creating enough time or not checking the quality of tasks can also affect how fast a task can be completed. These initiatives must be driven by the managers to ensure the organization is delivering all the support team members require in order to implement their tasks properly. Without senior management support and involvement, reliability initiatives will undoubtedly struggle.

Redefining Paradigm No. 3: Reliability practices belong to the boardroom.

Actions speak louder than words

Implementation of a reliability program affects all personnel and starts with the business needs of the facility (e.g., production volume, costs, and employee satisfaction). The issues affecting these objectives need to be clearly understood and prioritized. For example, assume a gearbox failed last night and caused 10 hr of downtime. Of course, the temptation is to investigate it. But how does this failure, which occurs every 6 yr, compare to the most frequent type of equipment failure on the same site (e.g., motors)?

The most common reason rotating equipment fails is bearing issues. So how much loss results from bearings compared to a specific equipment type? A gap in the site planning process may result in each job taking 10 min. longer than required. How does this loss, which results in less work being done, compare to the production and cost loss from bearings? Furthermore, how does communication loss compare to planning loss, and, if this is an issue, what actions are currently underway to address it?

When there is no way to compare the losses for the above examples, or the site has not tried comparing them, confusion and disagreement set in. Because different people will have different passions, the result is multiple initiatives clashing for the same limited resources and money. When this happens, progress in all initiatives slows down.
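One way to make such losses comparable is to put them on a common annual scale. The sketch below is a simplification that treats downtime hours and labor hours as the same unit; a site would more likely convert each to money first. The gearbox figures (10 hr, once every 6 yr) and the 10 min. per job come from the examples above. The motor figures and the number of jobs per year are hypothetical, as are the class and field names.

```python
from dataclasses import dataclass


@dataclass
class Loss:
    name: str
    hours_per_event: float
    events_per_year: float

    @property
    def hours_per_year(self) -> float:
        """Annualized loss, so rare and frequent events can be compared."""
        return self.hours_per_event * self.events_per_year


losses = [
    Loss("gearbox failure", hours_per_event=10, events_per_year=1 / 6),
    Loss("motor failures", hours_per_event=4, events_per_year=12),      # hypothetical
    Loss("planning gap", hours_per_event=10 / 60, events_per_year=2000),  # job count hypothetical
]

# Largest annual loss first.
ranked = sorted(losses, key=lambda loss: loss.hours_per_year, reverse=True)

for loss in ranked:
    print(f"{loss.name:16s} {loss.hours_per_year:7.1f} hr/yr")
```

Under these assumed numbers, the dramatic gearbox failure ranks last. The point is the ranking method, not the specific values.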

Once reliability is adopted as a measure of overall business loss, it's a lot easier to get management support for reliability initiatives, which is critical for project success. By raising the definition of reliability, it's easier to see how the lack of reliability of equipment or processes contributes to the different key performance indicators or overall equipment effectiveness.

Redefining reliability

One of the key issues in adopting this new philosophy is the difficulty in defining a reliability problem. The first question asked by management is typically, “What is causing the gap in overall equipment effectiveness or cost?” The first-level answer is typically easy to see: availability might be the reason effectiveness is low. The lower and more detailed the level at which a response is required, however, the more difficult it becomes to provide that response, due to lack of data.
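The first-level answer can be sketched using the commonly used definition of overall equipment effectiveness as the product of availability, performance, and quality. The factor values below are hypothetical and the function name is an assumption; the sketch shows only how the weakest first-level factor is found, not the deeper causes behind it.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """Overall equipment effectiveness as the product of its three factors."""
    for value in (availability, performance, quality):
        if not 0.0 <= value <= 1.0:
            raise ValueError("each factor must be a fraction between 0 and 1")
    return availability * performance * quality


# Hypothetical site figures, expressed as fractions.
factors = {"availability": 0.80, "performance": 0.95, "quality": 0.98}

score = oee(**factors)
weakest = min(factors, key=factors.get)

print(f"OEE = {score:.4f}, weakest factor: {weakest}")
```

Finding that availability is the weakest factor is the easy part. Explaining which failures or delays make up that availability loss requires the detailed records discussed next.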

A site will most likely be able to describe the most obvious availability losses, but is the most frequent type of equipment failure known? It's human nature for people to focus on what they know, which is normally equipment failures, rather than identifying major contributors to loss, such as a notoriously unreliable item like communication.

Before anything gets addressed on a plant site, it generally becomes apparent that the documentation of loss is unreliable. Improving failure codes and understanding what is slowing people down will give a far greater understanding of where focus should be directed. The most successful facilities (see the sidebar, Secrets to the Most Successful Sites) are the ones with a systematic approach to improvement.

Reliability is one topic in which implementation falters due to lack of time. Time is created, however, by understanding that many current tasks are not delivering as many results as the reliability initiatives could, or that some of the initiatives being pursued are focused on issues with too long a payback period. Luckily, you can typically free up time by simply reprioritizing your company's business needs.

Kleine is a programs manager with ABB Process Automation, South Asia Service, Rotorua, New Zealand. He can be reached at barry.kleine@nz.abb.com.


Sidebar: Secrets to the Most Successful Sites

Start with a business need — Key financial variables that affect the site should be identified. These variables are the ones that will make the largest difference to the profit margin of the company. For example, it's important to understand clearly whether the goal is to decrease total maintenance cost or maintenance cost per unit produced. Many examples are available in which a reduction in spending led to a loss several times that amount through lower production caused by decreased reliability. The site strategy needs to emphasize the few variables where the focus should be kept.

Develop management support for the concept before it is started — Many processes are scheduled to be implemented on sites because other sites are doing it or because someone believes they will add value. Unless the senior managers are sold on the fact that these processes are critical to achieving objectives, little focus will be put on them — and they will take many times longer than necessary to implement. Excessive time to implement should be considered a loss, because it means people are not available to work on other initiatives. For this reason, fewer initiatives should be implemented simultaneously — and the ones chosen should be those managers drive and show a personal interest in.

Establish how reliability can satisfy the business needs — As mentioned earlier, there is a distinct difference between reliability (reducing the need for intervention) and consequence minimization (fixing it faster). Many people are passionate about repairing, so focus can easily turn to these issues, and reliability gets neglected. It's critical to understand that improving reliability reduces both time and cost to repair. Most other initiatives address only one or the other at a time.

Select reliability improvements based on their ability to deliver quantifiable business benefits — Many good reliability topics are chosen, but when a given task is selected, often little or no data is provided in terms of a business case or the return on investment predicted. Reliability initiatives are best focused on the most frequent issues, as these are the ones that will give the fastest evidence of improvement. Addressing an issue that only occurs every 5 yr will need another 5 yr to see any benefit.

Sustain momentum by publishing improvements — It's generally accepted that interest in an initiative will halve if no improvement is seen within three months. As the results are observed, these need to be published and shared across the organization to maintain commitment — not only by the team members, but also by senior management. Lack of evidence will result in people looking for alternative initiatives before the current ones gain traction.

Maintain quality — If a process is agreed upon, it's important to follow the process. When things get busy, there is a tendency to try to take shortcuts. This results in poorer results, fewer published improvements, and a drop in interest. Managers must show interest in the quality of work to maintain the standard.