Even the most experienced troubleshooter must rely on a systematic process to make sense of today's complex control circuits. At a high level, a good troubleshooting process is simple. Here's the one we'll follow:

  1. Investigate the symptoms.

  2. Identify the possible causes.

  3. Test possible causes.

  4. Follow through by correcting the problem, monitoring the operation, determining the root cause, and completing any required documentation.

Step 1: Investigate the symptoms

Make sure you understand the system. Find any available documentation, including online versions. Look for schematics, piping and instrumentation diagrams (P&IDs), and loop sheets. Talk to the operators and anyone else familiar with the operation. Look up operations and maintenance records, as well as control and configuration parameters. Some of this information may be available from the programmable logic controller (PLC), the distributed control system (DCS), or other online databases.

Because you often don't know where the problem lies, you must keep the big picture in mind. You can break up even the most complex system into five elements:

  1. Process controller — most often involving a microprocessor.

  2. Input field devices — sensors of some type that monitor the process.

  3. Output field devices — receive a command signal from a control element; examples are drives, valves, and alarms.

  4. Connectivity elements — wires, cables, and buses.

  5. Process material(s).

And let's not forget the sixth element: the people who can affect the process and its control system.

Because the wiring and the inputs and outputs (I/Os) are the most vulnerable elements in a system, be sure to locate and identify these elements in the problem circuit. As you talk to people and review information, look for a recurrence or pattern. If you see a pattern, is it related to shift changes, process changes, or any other recurring event? Use your judgment on when to quit gathering information, but be sure the data displayed by the human-machine interface (HMI) match what the operator tells you.
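
One way to test for a shift-change pattern is to bin fault timestamps by shift and count how many land shortly after a changeover. This is a minimal sketch, assuming a hypothetical schedule of three 8-hour shifts changing at 06:00, 14:00, and 22:00, with fault times pulled from an alarm or maintenance log; the shift names, the 30-minute window, and the sample log are all placeholders to adjust for your plant.

```python
from collections import Counter
from datetime import datetime

# Hypothetical schedule: shift changes at 06:00, 14:00 and 22:00.
SHIFT_CHANGES = (6, 14, 22)


def shift_of(ts: datetime) -> str:
    """Name of the (hypothetical) shift on duty at time ts."""
    if 6 <= ts.hour < 14:
        return "day"
    if 14 <= ts.hour < 22:
        return "swing"
    return "night"


def minutes_since_shift_change(ts: datetime) -> int:
    """Minutes elapsed since the most recent shift change."""
    now = ts.hour * 60 + ts.minute
    return min((now - h * 60) % (24 * 60) for h in SHIFT_CHANGES)


def summarize(fault_times, window_min=30):
    """Count faults per shift and how many fell soon after a shift change."""
    per_shift = Counter(shift_of(t) for t in fault_times)
    near_change = sum(1 for t in fault_times
                      if minutes_since_shift_change(t) <= window_min)
    return per_shift, near_change


# Hypothetical fault log
faults = [datetime(2024, 5, 1, 6, 5), datetime(2024, 5, 1, 14, 12),
          datetime(2024, 5, 2, 6, 20), datetime(2024, 5, 2, 10, 0)]
per_shift, near_change = summarize(faults)
print(per_shift, near_change)  # Counter({'day': 3, 'swing': 1}) 3
```

Three of four faults falling within half an hour of a changeover would be worth raising with the operators on both shifts.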

Step 2: Identify possible causes

Analyze the system with an open mind, systematically ruling out components and functional elements as unlikely trouble spots. Start by following the logic from input through output. What happens in the cause/effect chain? Compare the current symptoms with the action that the specified decision logic or control algorithm should produce. As you rule out some process elements, start building and prioritizing a list of the most likely causes, and plan tests that will confirm or eliminate each one.
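
Comparing symptoms with the specified logic can be as simple as evaluating that logic against the input states the controller currently sees and checking the result against what the equipment is actually doing. The interlock below is hypothetical (a pump that should run only when started, with adequate level and no high-pressure trip), and the observed state is assumed to come from a run-feedback contact or a direct look at the equipment.

```python
def expected_pump_run(start_cmd: bool, level_ok: bool, pressure_trip: bool) -> bool:
    """Hypothetical specified logic for a pump interlock."""
    return start_cmd and level_ok and not pressure_trip


def compare_with_logic(inputs: dict, observed_running: bool) -> str:
    expected = expected_pump_run(**inputs)
    if expected == observed_running:
        # The controller is doing what the logic says for the inputs it sees,
        # so question whether those inputs reflect the real process.
        return "consistent: check whether the inputs reflect reality"
    # Logic and equipment disagree: suspect the program, the output,
    # the output wiring, or the output field device.
    return "mismatch: follow the chain from logic to output"


snapshot = {"start_cmd": True, "level_ok": True, "pressure_trip": False}
print(compare_with_logic(snapshot, observed_running=False))
# mismatch: follow the chain from logic to output
```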

You can usually dismiss simultaneous, unrelated problems as too unlikely. If you can link the problem to one likely cause, do so; at this stage, don't look for multiple, interrelated causes. Your first priority is to get the operation back up and running. Tackle complex situations after a quick fix gets things going. Just don't forget to use your company's work procedures to flag the job as still open, because operations people often mistake a quick fix for a permanent solution.

As you prioritize possible causes, go back to your sources of information. Maintenance records can help you decide if one component has been much more trouble-prone than another. Construction work in the area would lead you to suspect damaged cabling rather than an I/O board failure, because cabling running through the plant is more likely to suffer damage than is an I/O board inside a cabinet.
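
Prioritizing from maintenance history and current conditions can be sketched as a simple weighted score. The failure counts and the context multiplier (for example, boosting field cabling while construction is under way nearby) are hypothetical inputs you would set from your own records and judgment; this is an illustration, not a standard scoring method.

```python
def rank_causes(failure_counts, context_weights=None):
    """Return candidate causes ordered from most to least likely.

    failure_counts:  past failures per component, e.g. from maintenance records.
    context_weights: optional multipliers for current conditions.
    Adding 1 to each count lets a component with no failure history
    still be boosted by current conditions.
    """
    context_weights = context_weights or {}
    score = {c: (n + 1) * context_weights.get(c, 1.0)
             for c, n in failure_counts.items()}
    return sorted(score, key=score.get, reverse=True)


history = {"I/O board": 3, "transmitter": 2, "field cabling": 1}  # hypothetical
print(rank_causes(history))
# ['I/O board', 'transmitter', 'field cabling']
print(rank_causes(history, {"field cabling": 3.0}))  # construction nearby
# ['field cabling', 'I/O board', 'transmitter']
```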

Step 3: Test possible causes

When you've narrowed your list of probable causes down to a manageable size, you can begin testing. If the process is running, do the tests that don't interrupt operations first. Quick and easy tests can save you time in eliminating potential causes, so do those early in your troubleshooting. In many cases, you need to look at, listen to, or feel specific components. When working around or with energized equipment, don't take chances with safety. Always follow established and required procedures.
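
The ordering rule in the paragraph above (while the process runs, tests that leave it alone come first, then the quickest) can be written as a sort key. The check names and time estimates are hypothetical, and the ordering does not replace your plant's safety procedures.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    minutes: float            # rough time estimate
    interrupts_process: bool  # would this check disturb a running process?


def order_checks(checks, process_running=True):
    """Non-disruptive checks first (while running), then quickest first."""
    def key(c):
        # False sorts before True, so checks that leave the process alone lead.
        return (process_running and c.interrupts_process, c.minutes)
    return sorted(checks, key=key)


plan = [Check("bump the control loop", 15, True),
        Check("scope the signal at the FTA", 20, False),
        Check("look and listen at the valve", 5, False)]
for c in order_checks(plan):
    print(c.name)
# look and listen at the valve
# scope the signal at the FTA
# bump the control loop
```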

Inputs and outputs are usually the first place you should look for problems. Most inputs and outputs fall into one of two broad categories: they are either discrete devices with two states (on or off) or analog devices that send or receive continuously varying signals.

Common discrete devices include limit switches, solenoid valves, indicators, and alarms. When PLCs send signals to a master PLC or a DCS, they also count as discrete devices. Common analog devices include resistance temperature detectors (RTDs), thermocouples, transmitters (pressure, level, temperature, flow, etc.), control valves, analytical field devices (like pH sensors), and variable-speed drives.

Discrete field devices typically use low-voltage DC, commonly 24 V. A variation from the expected on or off voltage usually indicates a problem. Some drift is acceptable, but a deviation of more than 5% to 10% in either direction, at either end of the range, calls for a closer look.
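
The 5% to 10% rule of thumb can be expressed as a percentage of the signal span. This sketch assumes a 24 V DC discrete signal whose expected levels are 24 V (on) and 0 V (off), and measures drift against the full span so that the off state, whose nominal value is zero, can be checked the same way. The 5% default limit is an assumption; tune it within the 5% to 10% range for your system.

```python
def drift_pct(measured_v, expected_v, span_v=24.0):
    """Deviation from the expected level, as a percentage of the span."""
    return abs(measured_v - expected_v) / span_v * 100.0


def needs_closer_look(measured_v, state_on, span_v=24.0, limit_pct=5.0):
    """True if a discrete signal has drifted past the chosen limit."""
    expected_v = span_v if state_on else 0.0
    return drift_pct(measured_v, expected_v, span_v) > limit_pct


print(needs_closer_look(23.2, state_on=True))   # False (about 3.3%)
print(needs_closer_look(22.0, state_on=True))   # True  (about 8.3%)
print(needs_closer_look(2.0, state_on=False, limit_pct=10.0))  # False (about 8.3%)
```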

Use an oscilloscope to check a discrete signal. Rise and fall times that are noticeably slow rather than nearly instantaneous usually indicate a fault in the sensor itself, typically sticking contacts in a mechanical switch or an impending failure in a solid-state device. High levels that are not flat usually indicate loose ground connections, ground loops, or improper shield connections. Low levels that are not flat tend to be noisier than non-flat high levels and usually indicate a grounding or shield problem; a noisy low level can also indicate an improperly wired field device.
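
If your scope can export sampled data, both checks described above can be approximated in a few lines: a 10%-to-90% rise time for the edge, and the peak-to-peak variation on a level that should be flat. This is a minimal sketch; the sample values and interval are hypothetical, and the rise time is resolved only to one sample interval because the code does not interpolate between samples.

```python
def rise_time(samples, dt, low, high):
    """10%-90% rise time of the first rising edge, in the units of dt.

    Returns None if the signal never crosses both thresholds.
    """
    t10_level = low + 0.1 * (high - low)
    t90_level = low + 0.9 * (high - low)
    t10 = None
    for i, v in enumerate(samples):
        if t10 is None and v >= t10_level:
            t10 = i * dt
        if t10 is not None and v >= t90_level:
            return i * dt - t10
    return None


def peak_to_peak(samples):
    """Variation on a level that should be flat (ripple or noise)."""
    return max(samples) - min(samples)


dt = 1e-6  # hypothetical 1 microsecond sample interval
edge = [0.0, 0.0, 3.0, 12.0, 20.0, 23.5, 24.0, 24.0]
high_level = [23.9, 24.1, 23.8, 24.0]
print(rise_time(edge, dt, low=0.0, high=24.0))  # about 3e-06 s
print(peak_to_peak(high_level))                 # about 0.3 V
```

For a fall time, the same function can be applied to the mirrored waveform `[low + high - v for v in samples]`, which turns a falling edge into a rising one.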

If a measurement suddenly drops to the minimum or jumps to the maximum of its range, you most likely have a sensor, wiring, or other I/O problem that should be relatively easy to find. The best place to check is often the field termination assembly (FTA), because a measurement there divides the process loop roughly in half and tells you whether to keep looking on the field side or the controller side.
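
The difference between a sudden jump to the end of the range and a slow creep can be checked automatically, and the FTA reading then tells you which half of the loop to pursue. The thresholds below (within 1% of either end, a jump of at least 25% of span between samples) are hypothetical, and the half-split helper assumes an input loop, where a good reading at the FTA clears the field device and field wiring.

```python
def is_pegged(values, lo, hi, end_pct=1.0, jump_pct=25.0):
    """True if the latest reading jumped suddenly to either end of the range."""
    span = hi - lo
    prev, last = values[-2], values[-1]
    at_end = (last <= lo + span * end_pct / 100
              or last >= hi - span * end_pct / 100)
    sudden = abs(last - prev) >= span * jump_pct / 100
    return at_end and sudden


def half_to_search(signal_good_at_fta: bool) -> str:
    """For an input loop, a measurement at the FTA splits the loop in two."""
    if signal_good_at_fta:
        return "controller side: system cabling, I/O card, configuration"
    return "field side: field wiring and the field device"


print(is_pegged([52.0, 51.5, 100.0], lo=0.0, hi=100.0))  # True: sudden jump
print(is_pegged([2.0, 1.5, 0.5], lo=0.0, hi=100.0))      # False: gradual
print(half_to_search(False))  # field side: field wiring and the field device
```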

More gradual changes could indicate more complex problems, such as a change in valve stiction (static friction), a subtle change in the process materials, or drift in instrument calibration. Your job will be much easier if you are working with a DCS, because you can pull up the history of each loop and look for changes over time.
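
With loop history available, one simple way to quantify a gradual change is a least-squares slope. The example assumes a hypothetical history of the controller output needed to hold a steady setpoint, sampled weekly, and a hypothetical limit of 2% per 30 days. A steady rise could be consistent with increasing valve stiction, a process change, or calibration drift; the slope alone will not tell you which.

```python
def slope_per_day(days, values):
    """Ordinary least-squares slope of values versus time (units per day)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    return sxy / sxx


def is_drifting(days, values, limit_per_30_days):
    """Flag a trend whose projected change over 30 days exceeds the limit."""
    return abs(slope_per_day(days, values)) * 30 > limit_per_30_days


days = [0, 7, 14, 21, 28]                  # hypothetical weekly snapshots
output_pct = [40.0, 40.7, 41.4, 42.1, 42.8]
print(round(slope_per_day(days, output_pct), 3))             # 0.1 (% per day)
print(is_drifting(days, output_pct, limit_per_30_days=2.0))  # True
```

On Python 3.10 or later, `statistics.linear_regression` computes the same slope.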

Sometimes, the only way to test a circuit is to see how the system reacts to a manual input. When working with PLCs, you are “forcing the contacts.” When working with continuous process control loops, you are “bumping the system.” If you can't manually force the system to respond to your input, you probably have a problem with the outputs. If the outputs respond properly to manual inputs, you can probably eliminate outputs and look more closely at the field transmitters, proximity switches, and other related input devices.
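
When an output device has position feedback, the check in the paragraph above can be made concrete: force a few manual output values and see whether the feedback follows. This sketch assumes a hypothetical control valve with a position feedback signal in percent and a 2% tolerance; without feedback, you would judge the response by observing the device directly.

```python
def output_follows(commanded_pct, feedback_pct, tol_pct=2.0):
    """True if feedback tracked every manually forced output value."""
    return all(abs(c - f) <= tol_pct
               for c, f in zip(commanded_pct, feedback_pct))


def where_to_look(output_ok: bool):
    if not output_ok:
        # The system would not respond to manual commands.
        return ["output card", "output wiring", "final element (drive, valve)"]
    # Outputs respond properly, so turn to the input side.
    return ["field transmitters", "proximity switches", "other input devices"]


forced = [20.0, 50.0, 80.0]
feedback = [20.5, 49.0, 79.2]   # hypothetical readings after each step
print(where_to_look(output_follows(forced, feedback))[0])  # field transmitters
```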

Be careful when testing the process this way. Forcing contacts, adjusting timers and counters, changing set points, or tinkering with loop tuning parameters or the control program is risky business that can have disastrous results. Coordinate closely with the process operator. Be sure you know which limits the process can tolerate so you don't make the system unstable or prone to crash.
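
One way to respect those limits is to check every manual change against bounds agreed with the operator before it is sent. The limits and maximum step below are hypothetical placeholders, not recommended values, and a software check is no substitute for coordinating with the operator.

```python
def planned_bump(current, step, low_limit, high_limit, max_step):
    """Return the new output value, or raise if the bump breaks the agreed limits."""
    if abs(step) > max_step:
        raise ValueError(f"step {step} exceeds agreed maximum {max_step}")
    target = current + step
    if not low_limit <= target <= high_limit:
        raise ValueError(f"target {target} outside {low_limit}..{high_limit}")
    return target


print(planned_bump(45.0, 5.0, low_limit=30.0, high_limit=60.0, max_step=10.0))  # 50.0
try:
    planned_bump(58.0, 5.0, low_limit=30.0, high_limit=60.0, max_step=10.0)
except ValueError as err:
    print(err)  # target 63.0 outside 30.0..60.0
```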

Step 4: Follow through

Follow through by carefully replacing faulty parts, monitoring the operation, and documenting what you did according to your plant's requirements. If your action was a quick fix to get equipment up and running, follow your plant's root-cause analysis procedure to get to the bottom of the problem.

Some organizations insist that you ask the “five whys” before closing out a maintenance action: ask why the problem occurred, then ask why of each answer in turn, at least five times, to make sure the problem has been thought through to its root.
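
A five-whys record can be kept as a simple chain in which each answer becomes the subject of the next question. The class below is a hypothetical sketch (not a standard tool), and the example chain is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FiveWhys:
    problem: str
    answers: list = field(default_factory=list)

    def why(self, answer: str) -> "FiveWhys":
        """Record the answer to 'why?' asked about the previous step."""
        self.answers.append(answer)
        return self

    def ready_to_close(self, minimum: int = 5) -> bool:
        return len(self.answers) >= minimum

    def root_cause(self):
        return self.answers[-1] if self.answers else None


record = (FiveWhys("Pump tripped")
          .why("Motor overloaded")
          .why("Bearing seized")
          .why("Bearing ran without lubricant")
          .why("Lubrication task was skipped")
          .why("Task was missing from the PM schedule"))
print(record.ready_to_close())  # True
print(record.root_cause())      # Task was missing from the PM schedule
```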

All the sophisticated equipment and software in the world is useless if the troubleshooters who use it don't follow a systematic process and make full use of the tools available. Take the time to understand what you are doing. Don't be afraid to ask for training if you need it. Then, do your troubleshooting methodically all the way through to the root cause. In the end, you will have the respect of management and your peers.

Coleman is director of education for PRIMEDIA Workplace Learning's Industrial Services Group, Chattanooga, TN. Bowden and Frank are instructional designers.