Maintaining 24/7 operations is an ongoing quest for those who manage and run critical-process facilities. Facility owners often enlist consultants to conduct reliability studies, which go by a variety of names, including site surveys, power quality reviews, and single-point failure studies. Regardless of what they're called, these studies help identify and solve problems that leave a site vulnerable to outages. Here's a look at six essential elements of a quality survey, plus some insights into cost and preparation.

The starting point for any reliability study should be documentation. It's best to supply or recreate up-to-date, as-built drawings for the consultants to review. Many times, problems are readily apparent from the documentation alone. Just make sure the drawings are CAD-based so you can update them and document any changes.

At this point, you'll also want to consider the scope of the study. Do you need a review of the whole facility, or just certain systems? Many companies limit their studies to electrical and mechanical systems, but your choice depends on your goals. If your goal is zero downtime for any system at any time, then the study must be more global. But, if your goal is limited to one or two problem systems, then a limited study might be more appropriate.

Finally, decide up front how far into the future the study will extend. If the owners plan to operate the facility indefinitely, then it makes sense to address issues that might crop up in eight to ten years. If, however, they plan on replacing the facility in the next three to five years, then a long-term approach may not be necessary.

As for the study itself, you'll want to make sure the final analysis meets the criteria outlined in the remainder of this article. If it does, you'll have a reliability study that's beneficial and cost-effective.

  1. Review the maintainability/expandability of existing systems. All the components of critical-process or service facilities should be designed for concurrent maintenance and operations. Anything you can't maintain without a shutdown is a serious vulnerability. For example, you need the ability to add a chiller or cooling tower without shutting down the facility. That's why most modern facilities use a modular approach for designing critical systems.

  2. Identify and prioritize areas of concern. The level of detail is one of the big differences in reliability studies. Some document every aspect of a facility, whether it's pertinent or not. For example, I've seen studies that contained thermographic profiles of underfloor airflows for data centers that weren't having cooling problems. Facility owners want the most for their money, so make sure the details provided are commensurate with your areas of concern.

  3. Recommend solutions or suggest appropriate alternatives. Recommendations should be as specific as possible, fully explained (with drawings included), and achievable. Measures that are not practical should be presented as alternatives, not recommended solutions.

  4. Define the risk and level of downtime associated with each recommendation. Solutions that could cause or require an outage should be clearly defined, along with the risks and possible impacts. For example, if the study recommends doing electrical work in energized switchgear, the facility's engineers need to know all possible scenarios. An outage may not occur during the work, but if it does, the results could be catastrophic.

  5. Provide order-of-magnitude cost estimates and life-cycle cost-benefit analyses for the identified solutions. If the study suggests any large capital improvements, it also should include corresponding life-cycle cost-benefit analyses. I've seen studies that recommended installing numerous variable-frequency drives without any cost-benefit analyses whatsoever. As an owner, I'd like to know if the payback is two years or twenty years, even if I'm confident the recommendations will improve the system's efficiency.

  6. Summarize everything in an easy-to-understand document. A well-written executive summary explains the study's findings in nontechnical terms. Too many summaries are written specifically for engineers. Anyone, including nontechnical personnel, should be able to understand the findings and recommendations included in the report.
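
To make the payback question in item 5 concrete, here is a minimal Python sketch of the arithmetic an owner might expect to see behind a recommendation. It is only an illustration, not a method prescribed by any particular consultant: the function names, the dollar figures, the 10-year horizon, and the 8% discount rate are all assumptions chosen for the example.

```python
# Illustrative only: every figure below is a hypothetical assumption,
# not data from an actual reliability study.

def simple_payback_years(capital_cost, annual_savings):
    """Years needed for annual savings to recover the up-front cost."""
    if annual_savings <= 0:
        return float("inf")  # the project never pays for itself
    return capital_cost / annual_savings


def net_present_value(capital_cost, annual_savings, years, discount_rate):
    """Discounted savings over the study horizon, minus the up-front cost."""
    discounted_savings = sum(
        annual_savings / (1 + discount_rate) ** year
        for year in range(1, years + 1)
    )
    return discounted_savings - capital_cost


if __name__ == "__main__":
    # Two hypothetical variable-frequency drive retrofits with the same cost
    # but very different annual energy savings.
    options = {
        "VFD retrofit A": {"capital_cost": 40_000, "annual_savings": 20_000},
        "VFD retrofit B": {"capital_cost": 40_000, "annual_savings": 2_000},
    }
    for name, figures in options.items():
        payback = simple_payback_years(**figures)
        npv = net_present_value(**figures, years=10, discount_rate=0.08)
        print(f"{name}: payback {payback:.1f} years, 10-year NPV ${npv:,.0f}")
```

Both hypothetical retrofits improve efficiency, but only the first recovers its cost within the assumed horizon, which is exactly the distinction a study should spell out. A full life-cycle analysis would also account for maintenance, energy-price changes, and equipment life.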



Now that you know what you should expect from a reliability study, you're probably wondering about cost. The cost depends on the scope of the study, the size of the facility, and the complexity of the facility's systems. It also depends on whether documentation is available, because recreating as-builts is time-consuming and expensive.

Another variable is the consultant's involvement after the study's completion. Some clients end the relationship once the consultant turns the report in. Others involve the consultant in the design and implementation phase. Decisions such as these can significantly impact the cost of a study.

Reliability studies range in cost from $10,000 to $200,000. Higher costs, however, don't necessarily mean higher-quality surveys; some of the best studies I've seen aren't the most expensive ones. Still, you're paying for specialized expertise, and it isn't cheap. Compared to the costs of facilities and outages, though, reliability studies are a bargain.

When you or your facility's owners decide the time is right to invest in a reliability study, rest assured: there are plenty of quality firms that can meet these expectations. Just remember that it's up to you to implement the recommendations. Too often, these studies are simply filed away, only to be reopened when the next failure occurs.

Ron Hughes is the principal of the California Data Center Design Group, located in Sacramento, Calif. You can reach him at rhughes@cdcdg.com or through his Web site at www.cdcdg.com.