Protecting Critical Systems: The Role of the Engineer

As the blackout of 2003 demonstrated, businesses are more dependent on their critical systems than ever before. In fact, many companies have significantly increased the number of systems they now deem business-critical. But the role of the critical support infrastructure — primarily power and environmental control systems — in ensuring the safe and continuous operation of those systems is still often underappreciated. As a result, some organizations are still not as prepared as they should be to deal with events of this magnitude.

It’s no surprise that critical support systems don’t get the attention they deserve. Infrastructure, whether roads and bridges or power systems, only gets noticed when it fails or is on the brink of failure. When it comes to the support system infrastructure, these failures often go unnoticed until business operations start to fail. That’s a situation few businesses can afford. To make matters worse, two trends are causing critical system support to become even less of a priority.

The first is the increased role IT managers now have in specifying, purchasing, and managing critical support systems. In many cases, they’ve usurped the role of the facilities manager as the primary decision maker for power and environmental protection systems. This is a natural consequence of the shift toward rack-based systems for virtually every IT application. As a result of this shift in responsibility, support system decisions are more frequently made by individuals who have limited technical experience with these systems and for whom these issues are a secondary concern.

The second, more recent, development is the emergence and promotion of “pre-engineered” solutions for room-scale protection. These systems are being marketed to the IT community with the promise that they will eliminate the need for engineering expertise in the specification and installation of support systems, regardless of the size or criticality of the application. That has some engineers concerned, and with good reason. It reduces the role of the engineer or consultant to that of a product specifier, grossly underestimating the value engineers bring to the process.

Engineers play a critical role in support system design that goes beyond matching products to applications. They educate clients about support system issues, provide the technical expertise to avoid common problems, help ensure scalability and serviceability, and can even serve as a bridge between facility and IT management.

The engineer as teacher. Talk to any consulting engineer involved in data center design work and he’ll say his top priority is making sure his clients understand the role of the support system and the relationship between cost and availability. Consultants focus first on educating clients about different protection schemes and their costs, strengths, and vulnerabilities.

“Initial cost is always a factor,” says Gil Martin, chief electrical engineer at RDK Engineers, a Massachusetts-based building systems engineering firm. “But we want to make sure our clients understand where they are vulnerable. They may be willing to live with increased exposure to some risks, but it is critical to the decision making process that they have a clear understanding of those risks.”

“How important is the data?” says Michael Bertollo, projects director at Comp-u-Site Design, a consulting-engineering firm that specializes in facility infrastructure systems. “That’s really the driver in making decisions about support systems, and it’s the first issue we address with our clients.”

Bertollo also emphasizes the importance of considering operation and service issues in the design stage. “What is it going to be like to live with this system? A design may look great on paper until you consider what’s going to happen when a third-shift operator has to make a split-second decision in the middle of the night. That’s when the simplicity of a good design pays off.”

Donald Chamberland, manager of design services at Johnson Controls Power Technology, says there are also critical issues beyond power that are easy to overlook.

“How will the systems be cooled?” he says. “Is the fire protection system appropriate for sensitive electronics? We try to make sure our clients have considered all the issues involved in protecting critical systems.”

Avoiding common problems. “Most of the problems we see when performing forensics investigations on failed systems could have been avoided,” Martin says. “There just wasn’t enough attention paid to technical issues related to the infrastructure early in the development of a site.”

The problems engineers most often identify at troubled sites are grounding problems, failure to isolate critical systems from general building systems, harmonics, and inadequate cooling. In addition, space and management problems often occur when a room grows rack by rack and designers fail to plan for the future.

“We often see remote sites that never expected to house critical data develop a small data center one piece at a time as their role in the organization becomes more important,” Chamberland says. “Before they know it they have seven or eight mission-critical racks, and each rack has its own UPS. It’s not uncommon to find all the UPSs connected to a common bus or to find the batteries dead on half of them.”

“Small or remote sites may think they can pay less attention to support systems than large data centers,” Bertollo says. “Certainly the scale of the systems is different, but if the data is critical and availability requirements are high, the issues are essentially the same.”

Adding third-party credibility. The equipment manufacturer may not always be the best source for answers to questions such as: Is the equipment being proposed suitable for the application? Is there a better solution? Have service issues been adequately addressed? A consulting engineer can play a valuable role in evaluating the credibility and applicability of the solutions and claims that competing manufacturers make.

Ensuring serviceability and scalability. Capacity is one of the key issues the design of computer support systems must address. How much is needed initially, and how will additional capacity be added when it is needed? The experience and expertise of an engineer can be extremely valuable in addressing these issues.

“You need to strike a balance between providing some room for growth versus designing a system that will never operate above 40% capacity,” Bertollo says. “We focus on scalable designs that meet initial capacity requirements while allowing capacity to be added with minimum disruption to the room.”

“Data center space considerations are often overlooked in considerations about future capacity,” Chamberland says. “This is one reason we generally recommend locating the UPS outside the data center. It gives you greater flexibility to add capacity without consuming valuable data center floor space.”

“A service strategy is almost as important as the system design,” Bertollo says. “For business critical systems, I caution clients against a service program that ships replacement parts overnight and expects the client to do the service. A local factory-trained service specialist is preferable. Either way, an appropriate service strategy needs to be developed to ensure the equipment the client is investing in works the way it is supposed to when it is supposed to.”

“Space allocation, maintainability, and system expansion all need to be considered early in the process,” Martin says. “We want to make sure clients don’t paint themselves into a corner.”

Facilitating communications and coordination. Friction often exists between facility managers and IT specialists. IT management may believe that facility managers don’t understand and aren’t responsive to their needs. Facility managers may feel that facility issues aren’t adequately considered when new IT systems are specified. Engineers who are experienced in data center design can serve as a liaison between these two groups. In a sense, they speak the language of both the facility manager and the IT manager and can facilitate increased communications and cooperation between the two.

There can be value in the “pre-engineering” of support systems for some applications. But pre-engineering doesn’t eliminate the need for actual engineering involvement in the design of critical support systems. Getting an engineer involved early in the development of a site helps users ensure their protection system meets budget and application requirements, and it helps prevent common problems that can reduce system availability and result in costly rework. It almost always results in a simpler, more effective support system.

Stoll is vice president of power marketing at Liebert, Columbus, Ohio.