Downtime Numbers You Can Count On

You know your maintenance team gets equipment back online fast, so why do the downtime numbers look so bad? Your job may rest on the correct answer.

You may be doing all the right things to get failed equipment up and running fast. You may be superefficient when equipment is down for preventive maintenance. You may have been doing such a good job for so long that equipment failures are rare. Yet, management says the downtime numbers are not acceptable.

So how can you look better when you’re already doing “all the right things”? Start by looking at the components of downtime (see sidebar, right). An equipment malfunction sets off a whole chain of events. First, the equipment stops operating. The operator notices the malfunction and contacts maintenance. Then, maintenance restores the equipment to service and notifies the operator. Finally, maintenance observes the equipment for proper operation.

What often happens is the operator shuts off the equipment for a setup change and calls in maintenance about a noncritical concern. Remember: The time it takes from when the operator shuts off the equipment to the time the tech leaves counts as downtime. What was the actual downtime due to equipment failure? None! Yet, the report to higher management shows maintenance just isn’t keeping the equipment available for production.

How do you correct this situation so your managers get a true picture? The answer requires true organization and communication. We’ll assume you already repair and maintain equipment fast enough. But in case there’s room for improvement, the sidebar, on page 60, gives you some tips.

For the purpose of this article, we’ll assume you have a Windows-based Computerized Maintenance Management System (CMMS). We’ll also assume you have a spreadsheet program, so you can copy information from one application to another for making graphs and charts. If you don’t have these tools, you can still follow the concepts. If you implement these suggestions without a computer, you may want to limit them to only your worst downtime areas. These suggestions worked for a 63-person maintenance department in a 7x24 operation, and the case histories are from that same operation.

Reduce response time. It may take you 10 min to diagnose and repair a motor controller problem, but the motor waits 12 hr for you to begin repairs. So, the motor is still down 12 hr and 10 min. The plant mentioned above experienced this same problem. Operators would fill out a “work order” form and drop it off on a “compost pile” that sat on the desk of one of the maintenance supervisors. That supervisor was on the plant floor most of the time, and didn’t always sift through every paper on his desk daily. Nor did he put those papers there. They’d just pile up while he was “out with the troops.” You can figure out what happened, as a result.
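
The arithmetic behind the motor example shows where the downtime actually goes. Here is a minimal sketch in Python using the figures above; the variable names are illustrative only and do not come from any CMMS.

```python
from datetime import timedelta

# Figures from the motor controller example above.
wait_for_response = timedelta(hours=12)  # motor sits idle before repairs begin
repair_time = timedelta(minutes=10)      # time to diagnose and repair

total_downtime = wait_for_response + repair_time

# Dividing one timedelta by another yields a plain float ratio.
wait_share = wait_for_response / total_downtime

print(f"Total downtime: {total_downtime}")
print(f"Share of downtime spent waiting: {wait_share:.1%}")
```

Almost all of the downtime in this case is response time, so faster repairs alone would barely change the number.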

The plant engineer ordered radios for everyone in maintenance, as well as for each area shift supervisor (production). When an operator had a motor go up in smoke or some other mishap occurred, the operator would tell the supervisor, who would announce the mishap via radio to the maintenance crew leader for that section of the plant. It was a big plant: Each area had its own repair crew, and each crew had a leader. The crew leader answered production maintenance calls by radio, then went to the scene to start repairs. If the leader needed help, he or she would call another crew member. This system gave an immediate response to the production department.

For maximum efficiency, you should have a simple response procedure that’s well communicated to everyone. If you write it out, your procedure should take no more than a few sentences. This plant’s procedure read: “Any operator who has equipment problems should immediately contact the production supervisor, who will call maintenance on the radio. That shift’s crew leader will immediately send someone, or go personally, to the location of the complaint.”

That’s simple, direct, and effective. This type of response works well, because a “first response” person can carry a minimum of equipment to scope out the nature of the problem and possibly fix it in minutes. When a person or small crew follows up on a “first-response” person’s visit, they need to do so with a full ensemble of tools and test equipment so they aren’t running back and forth “to the shop” while equipment is down and awaiting repair. In addition to your response procedure, you need troubleshooting and repair procedures for specific equipment. These will, of course, be as detailed as necessary.

Document every response. Most maintenance technicians would rather have a root canal than spend much time doing paperwork. So, why subject them to long forms that bombard them with meaningless questions? What will you do with the information? And while under pressure to get equipment running again, who has the time to fill out excessive paperwork accurately?

The cure is to use an extremely simple form, asking for only the information your CMMS needs for tracking purposes. This plant used a spreadsheet that had two rows per response call, and had enough pairs of rows to last a typical day. The first row was for “As Found” information: time arrived, time equipment went out of service, type of equipment, and the problem. The second row was for “As Left”: time equipment was ready for service, action taken, and time left.

This is minimal information, but from it, the maintenance department built profiles of where the hot spots were and what problems needed most attention. The sheet’s bottom 2 in. or so had a listing of codes (same ones used in the CMMS, plus a few extra ones), so the technicians didn’t have much writing or guessing to do. All times were in the 24-hr format. Time studies on the form showed a typical 12 sec to fill out the first row (while talking to the operator) and 11 sec to fill out the second row.
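
If you keep the form in software rather than on paper, the two rows map naturally onto one small record. The following is a rough sketch, not the plant's actual layout: the field names, the code values, and the `HH:MM` entry format are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

TIME_FORMAT = "%H:%M"  # 24-hr clock; the colon-separated layout is an assumption


def parse_time(text: str) -> datetime:
    """Parse a 24-hr entry such as '14:05' (same-day times only)."""
    return datetime.strptime(text, TIME_FORMAT)


@dataclass
class FieldReport:
    # Row 1: "As Found"
    time_arrived: datetime
    time_out_of_service: datetime
    equipment_code: str  # hypothetical code, e.g. "MTR"
    problem_code: str    # hypothetical code, e.g. "OVLD"
    # Row 2: "As Left"
    time_ready: datetime
    action_code: str     # hypothetical code, e.g. "RESET"
    time_left: datetime


report = FieldReport(
    time_arrived=parse_time("13:40"),
    time_out_of_service=parse_time("13:25"),
    equipment_code="MTR",
    problem_code="OVLD",
    time_ready=parse_time("13:52"),
    action_code="RESET",
    time_left=parse_time("14:00"),
)
print(report)
```

Using short codes for equipment, problem, and action keeps each entry to a few keystrokes, which mirrors the intent of the paper form.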

Gather the right facts. Notice the two different sets of times in the form just mentioned. Previously, the maintenance people recorded “time left” as the end of downtime, but this was inaccurate. They often stayed to watch the equipment for several minutes of operation, so the time spent watching it counted as downtime even though the equipment was running. The plant engineer therefore broke out “watch time,” the interval between the time the equipment was ready for service and the time the technician left, simply by using a formula in a spreadsheet after importing data from the CMMS. Custom macros made data manipulation an automated process that approached “one-click” convenience.
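
The same split can be sketched outside a spreadsheet. The example below is a hedged illustration, not the plant engineer's actual formula. It assumes the four times are entered as 24-hr `HH:MM` strings, and the function names and sample times are invented for the example.

```python
from datetime import datetime, timedelta


def minutes_between(start: str, end: str) -> float:
    """Minutes from start to end, both given as 24-hr 'HH:MM' strings.

    Assumption: if end is earlier than start, the call ran past midnight.
    """
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta.total_seconds() / 60


def split_times(out_of_service: str, arrived: str, ready: str, left: str) -> dict:
    """Break one repair call into its component intervals, in minutes."""
    return {
        "response": minutes_between(out_of_service, arrived),
        "repair": minutes_between(arrived, ready),
        "downtime": minutes_between(out_of_service, ready),      # true downtime
        "watch": minutes_between(ready, left),                   # equipment is running
        "old_style_downtime": minutes_between(out_of_service, left),
    }


times = split_times("13:25", "13:40", "13:52", "14:00")
print(times)
```

In this made-up call, ending downtime at “time left” would report 35 min, while the equipment was actually unavailable for 27 min. The difference is the 8 min of watch time.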

Some facts will show you where you need to improve your maintenance system, which is why the form had a space for “gripes.” The plant engineer, maintenance supervisors, and CMMS administrator all shared the data entry, so all of them handled field reports. When someone came across a gripe, such as “could not identify failed component due to poor markings,” the person would mark the gripe with a highlighter and put the field report in the plant engineer’s in box. After an ad hoc meeting, someone would implement a solution almost right away.

Maintenance did a safety inspection with each repair call. Technicians would look for signs of tampering with safety switches, frayed cables, missing equipment guards, etc. They documented any discrepancies. Each of these resulted in a personal investigation by the responsible maintenance supervisor and/or plant engineer. They carved this policy in stone after a serious injury and the resulting finger-pointing. The maintenance logs showed repeated attention to an interlock switch, which the operating supervisors kept disconnecting to “save time.” Following the plant manager’s investigation, the maintenance department got carte blanche authority to take any equipment out of service at any time for safety violations.

Analyze the facts. To improve your ability to fix the most critical items, you must ask certain questions of your data. What kinds of downtime are you incurring? Can you eliminate operator error with equipment modification or training? Can you eliminate an operator through automation or some other modification? Do you have patterns of repetition on some shifts, but not others? What happens when downtime occurs? What problems are the technicians encountering in the field? What training, test equipment, tools, or fixtures might help? What procedures need to change?

To improve your image with management, you must present the right data. In effect, you must use the right facts as skillfully as an army must use the right weapons. You must concentrate your forces in the right places, and always think in terms of production bottlenecks.

It’s better to have 100 hr of downtime on noncritical equipment and no downtime on critical equipment, than 10 hr of downtime on critical equipment and 10 hr of downtime on noncritical equipment. Why? It comes down to the number of units produced per day. When you submit your downtime reports, don’t just show hours here and there. Also show units per day and revenue gained by your improved maintenance responses and preventive maintenance programs.
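
A quick calculation makes the comparison concrete. This sketch rests on two loud assumptions: only critical (bottleneck) equipment limits output, and the bottleneck runs at a hypothetical 50 units per hour. Neither figure comes from the plant in this article.

```python
UNITS_PER_HOUR = 50  # hypothetical bottleneck throughput


def units_lost(critical_hr: float, noncritical_hr: float,
               units_per_hour: float = UNITS_PER_HOUR) -> float:
    """Estimate lost production for a mix of downtime hours.

    noncritical_hr is deliberately ignored: under the stated assumption,
    noncritical equipment has enough slack that its downtime costs no output.
    """
    return critical_hr * units_per_hour


scenario_a = units_lost(critical_hr=0, noncritical_hr=100)
scenario_b = units_lost(critical_hr=10, noncritical_hr=10)

print(f"100 hr noncritical, 0 hr critical: {scenario_a} units lost")
print(f"10 hr noncritical, 10 hr critical: {scenario_b} units lost")
```

Under these assumptions, the scenario with 100 total hours of downtime loses nothing, while the one with 20 total hours loses 500 units. A report showing only hours would hide that difference.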

Back to our plant example. The plant engineer frequently got hammered for excessive downtime, so he put the CMMS information covering operator error into Pareto charts, which rank causes from most to least frequent so the biggest contributors stand out. Then he distributed the individual charts to the appropriate operations managers, who quickly realized they didn’t want to see these charts at the weekly staff meeting. Almost overnight, many of the operator errors ceased to happen. Consequently, overall downtime decreased.
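
The data behind such a chart is a sorted tally with a running percentage. Here is a minimal sketch, assuming the operator-error calls have been exported as a list of cause codes. The codes and counts are invented for illustration and are not the plant's data.

```python
from collections import Counter

# Hypothetical cause codes, one per operator-error call.
calls = ["SETUP", "JAM", "SETUP", "ESTOP", "SETUP",
         "JAM", "WRONG_MODE", "SETUP", "JAM", "SETUP"]


def pareto_table(causes):
    """Return (cause, count, cumulative %) rows, most frequent cause first."""
    counts = Counter(causes).most_common()
    total = sum(n for _, n in counts)
    rows = []
    running = 0
    for cause, n in counts:
        running += n
        rows.append((cause, n, round(100 * running / total, 1)))
    return rows


for cause, n, cumulative in pareto_table(calls):
    print(f"{cause:<12}{n:>3}{cumulative:>8}%")
```

In this made-up sample, two causes account for 80% of the calls, which is the kind of concentration a Pareto chart is meant to expose.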

This same plant engineer came under fire for excessive downtime on a single, very important line. At the staff meeting, he pulled out his charts and showed the numbers. A machine had been down for 11 hr, due to a simple PLC problem. The operators on two shifts simply parked in the cafeteria and never said a word. Neither did the operations supervisors. The maintenance department, which had recently initiated a daily visit to that line, noticed it was down, and the maintenance log showed the arrival time and the time back in service. Total repair time was about 5 min. The maintenance technician then spent 20 min trying to find an operator, and finally ran the machine himself while his boss located an operator. They documented all of this.

The outcome was very different than it would have been under the old system. The company fired three operators and demoted a supervisor. The maintenance tech received a raise at his annual review the following month. Talk about timing!