Article 708: Critical Operations Power Systems
Focusing on the complex interplay among mechanical and electrical systems needed to support critical infrastructure, Art. 708 is one of four new Articles added to the 2008 NEC. Reaching deep into the conception, commissioning, and management of such systems, it references eight other standards, annexes, and additional explanatory material in the NFPA portfolio of documents that are bound together — either in whole or in part — by their interlocking emphasis on power security.
In an initial attempt to develop the content for this new Article, Technical Panel 20 debated whether the instruction from the NFPA Standards Council to do more about power security at the building premises level was a directive about an occupancy or a system. As a special occupancy, it belonged in Chapter 5. As a system, however, it belonged in Chapter 7.
In an environment that requires a game-changer to get public safety departments and facility managers thinking fast about the survivability of critical power systems, the point is moot. Survivability requires a facility (a designated critical operations area or DCOA) and a system (critical operations power system or COPS). The desired result does not happen overnight, and electrical professionals cannot simply specify reliability from a table, as we would refer to Table 250.66 when sizing a grounding electrode conductor (see Reliability vs. Availability).
Reliability assessments based on probabilistic methods provide more consistent results, reflecting both the condition of existing equipment and the basis for design. Manufacturing philosophies such as “total-quality management” (TQM) and “six sigma” were widely deployed years ago — often with good results. Technical Panel 20 contemplated a similar deployment in Art. 708. In fact, as a method for improving availability using parametric models to derive design values of reliability from operational values, TQM is cited in new Annex F.
Addressing risk assessment
Adopting jurisdictions will now have to prepare and document a risk assessment. As per 708.4(A), “In critical operations power systems, risk assessment shall be performed to identify hazards, the likelihood of their occurrence, and the vulnerability of the electrical system to those hazards.”
As with other NFPA documents, such as NFPA 110, “Standard for Emergency and Standby Power Systems,” which requires a written log of generator testing, technical committees assume that mandatory documentation forces facility managers to do certain things that would not otherwise be done if a written record were not required. Local public safety organizations may find that just putting people around a table to do a risk assessment adds value to existing, everyday routines — not just during times of operational discontinuity.
One simple method of producing a risk assessment is as follows:
A) Identify control boundaries of the system. Is the facility stand-alone or part of a multi-function building? For example, is it shelter in place, off-site, or system + system?
B) Identify every conceivable disaster for the facility and location. Examples appear in Sec. A.5.3.2 of NFPA 1600, “Standard on Disaster/Emergency Management and Business Continuity Programs.”
C) For each disaster, determine a relative probability (1 = less likely; 5 = more likely). This stage may entail research specific to location, industry, and other organizational characteristics.
D) Assess the business impact of each disaster (1 = moderate; 5 = severe). Consider the effect upon public safety officers, hazardous material handlers, dispatch personnel, etc. — anyone who would be involved in disaster recovery functions.
E) Assess your ability to respond to each disaster in light of resource availability. Does your organization have the requisite resources either in-house or through third parties? (1 = sufficient resources; 5 = absence of resources).
F) Calculate a risk score for each disaster by multiplying the values from steps C, D, and E. Normalize the figures, and rank accordingly.
An example of one county's risk assessment appears in the Table above.
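Steps C through F lend themselves to a short script. The following is a minimal sketch only: the hazard names and ratings are hypothetical (they are not the county's figures), the Hazard class and rank_hazards function are illustrative names, and dividing by the highest raw score is just one assumed way to normalize, since the method above does not prescribe one.

```python
from dataclasses import dataclass


@dataclass
class Hazard:
    name: str
    probability: int  # step C: 1 = less likely, 5 = more likely
    impact: int       # step D: 1 = moderate, 5 = severe
    resources: int    # step E: 1 = sufficient resources, 5 = absence of resources


def rank_hazards(hazards):
    """Return (name, raw score, normalized score) tuples, highest risk first."""
    for h in hazards:
        for value in (h.probability, h.impact, h.resources):
            if not 1 <= value <= 5:
                raise ValueError(f"{h.name}: ratings must be between 1 and 5")
    # Step F: multiply the three ratings for each hazard.
    raw = {h.name: h.probability * h.impact * h.resources for h in hazards}
    # Assumed normalization: divide by the highest raw score in the list.
    top = max(raw.values())
    ranked = sorted(raw.items(), key=lambda item: item[1], reverse=True)
    return [(name, score, score / top) for name, score in ranked]


# Hypothetical ratings for illustration only.
hazards = [
    Hazard("Ice storm", 4, 3, 2),
    Hazard("Earthquake", 1, 4, 4),
    Hazard("Utility outage", 5, 4, 3),
]

for name, score, normalized in rank_hazards(hazards):
    print(f"{name:15s} raw={score:3d} normalized={normalized:.2f}")
```

Because the ratings are ordinal judgments, the resulting ranking is best read as a way to order the discussion around the table, not as a measured probability.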
Another more sophisticated method for preparing a documented risk analysis appears in Sec. A.5.3 of the Explanatory Material in Annex A of NFPA 1600, where failure mode and fault-tree analysis are described. The validity of more sophisticated studies will be dependent upon the competence of the expert agency, on the credentials of the team, and on the depth of the team's analysis.
Availability and reliability
Risk assessments should be considered from two perspectives: basic reliability and mission reliability. Basic reliability is an all-series model correlated with a single component or part. Mission reliability can be a series, parallel, standby redundant, or complex network. Both are separate but companion products that are essential to quantify the reliability of a system adequately. The incorporation of redundancies and alternate modes of operation to improve mission availability invariably decreases basic reliability because it increases the demand for maintenance and support.
This is a subtle idea that would be brought to light with further analysis. You can see the basic idea here by looking at how two systems can be identically available but anything but identical. (See Availability Comparison.)
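A worked example may help show why redundancy pulls the two measures in opposite directions. It assumes independent failures and a hypothetical reliability of 0.95 for each of two identical generators over the mission duration; the symbols are introduced here for illustration only.

```latex
% Series model: every element must work.
R_{\text{series}} = \prod_{i=1}^{n} R_i
% Parallel (redundant) model: the group fails only if every element fails.
R_{\text{parallel}} = 1 - \prod_{i=1}^{n} (1 - R_i)

% Two generators, either of which can carry the load, R_1 = R_2 = 0.95:
R_{\text{mission}} = 1 - (1 - 0.95)^2 = 0.9975
% Basic reliability counts any failure, because each one demands maintenance:
R_{\text{basic}} = 0.95 \times 0.95 = 0.9025
```

Under these assumptions, the second generator raises mission reliability from 0.95 to 0.9975 while lowering basic reliability from 0.95 to 0.9025, because there are now two machines that can fail and need support.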
Annex F in NFPA 1600 presents failure mode and effects analysis (FMEA) and fault-tree analysis (FTA). The broad contours of the process are as follows:
- Develop an equipment tree.
- Conduct a first run FMEA.
- Assign maintenance focus levels based on criticality, applying RCM decision logic.
- Identify maintenance tasks.
- Re-run the FMEA analysis that matches the availability of maintenance funding.
The number of discrete mechanical and electrical components in a DCOA requires system reduction to reduce the nested series-parallel complexity of numerous components into smaller groups of critical components, each of which will have its own reliability metric. As with short-circuit calculations, load flow, and other standard power system calculations, much depends upon the assumptions engineers make, the accuracy of input data, and how they simplify complex sub-systems in order to apply the analytic or iterative tools (Figure).
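System reduction of this kind can be sketched as a recursive collapse of nested series and parallel groups into a single figure. The example below is illustrative only: it assumes independent component failures, the reduce_block function and the block layout are hypothetical, and the reliability values are invented, not PREP or IEEE 493 data.

```python
from math import prod


def reduce_block(block):
    """Collapse a nested series/parallel block into one reliability value.

    A block is either a number (a component's reliability over the mission
    duration) or a tuple of ("series" or "parallel", [sub-blocks]).
    Component failures are assumed to be independent.
    """
    if isinstance(block, (int, float)):
        if not 0.0 <= block <= 1.0:
            raise ValueError("reliability must be between 0 and 1")
        return float(block)
    kind, children = block
    values = [reduce_block(child) for child in children]
    if kind == "series":
        # Every element in the group must work.
        return prod(values)
    if kind == "parallel":
        # The group fails only if every element fails.
        return 1.0 - prod(1.0 - v for v in values)
    raise ValueError(f"unknown block type: {kind!r}")


# Hypothetical values for illustration only.
utility_feed = ("series", [0.98, 0.999])    # utility source, service transformer
generator_feed = ("series", [0.95, 0.995])  # generator, feeder breaker
common_gear = 0.9995                        # transfer switch and switchboard
system = ("series", [("parallel", [utility_feed, generator_feed]), common_gear])

print(f"System reliability: {reduce_block(system):.6f}")
```

Each intermediate group (the utility feed, the generator feed, the redundant pair) gets its own reliability figure along the way, which is the point of the reduction. The result is only as good as the independence assumption and the input data.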
Commercial software that is available to do the risk assessment can get no more granular than the level of detail for which information is available and for which failure rate (or equivalent data) can be applied. It is here, where you have to enter a “user-defined library” of data to prepare the fault tree, that you finally realize the importance of the U.S. Army Corps of Engineers Power Reliability Enhancement Program (PREP) databases. (See What Matters Gets Measured.)
Avoiding a crisis
This would seem to be a lot of work if you are a Tier I public safety department and simply want to install your first onsite generator. Considering that the cost of constructing electrical systems in the majority of general occupancies ranges from $10 to $100 per square foot, it's not unreasonable to assume that the first cost of constructing a DCOA will run from $100 to $1,000 per square foot. Much depends upon site-specific factors, such as whether the DCOA is freestanding or part of a multi-function building and how many “nines” of availability the mitigation strategy requires. The cost of consulting services from expert agencies that specialize in business continuity facilities could be correlated along these lines. Keep in mind that the cost of A/E services is typically less than 1% of a general occupancy facility life-cycle cost. The cost of an A/E to design a DCOA may be double that amount. Be careful not to exaggerate the need for local adaptation to site-specific factors, however; a core principle of availability lies in standardization.
The NEC was first developed based on a concern for fire safety. In fact, the NFPA itself was founded by insurance companies that needed to manage losses due to fire. Data was essential in efforts to increase the practical safeguarding of persons and property from hazards arising from the use of electricity. Department of Homeland Security officials warn that more than 90% of organizations that suffer a significant data loss, for example, will be out of business within two years of the incident.
The profit-driven power security effort that has taken place in the business continuity industry needs to be conveyed into the public safety sector. The statistical foundation provided by the U.S. Army Corps of Engineers PREP databases provides the electrical industry with a basis for transforming “opinions” about power security into the realm of science. The PREP reliability databases are the only ones of their kind, providing a comprehensive consolidated source of reliability information for all types of facilities from the service transformer in. The PREP program will continue to study the incremental cost of additional nines of availability. This data will find its way to IEEE for use by electrical professionals who will continue to guide facility management decisions needed to protect our homeland.
Anthony is a senior electrical engineer at the University of Michigan in Ann Arbor, Mich. He represents the Association of Higher Education Facility Officers on Code Panel 1 of the NEC. Arno is director of the Command, Control, Communications, Computer, Intelligence, Surveillance and Reconnaissance (C4ISR) Group at Einhorn, Yaffee and Prescott Mission Critical Facilities in Whitesboro, N.Y., and is chairman of the IEEE committee that produces the “Gold Book,” IEEE 493 - Recommended Practice for the Design of Reliable Industrial and Commercial Power Systems. Stoyas is chief of the Special Mission Office's Power Reliability Enhancement Program at the U.S. Army Corps of Engineers in Fort Belvoir, Va. He is a member of Code Panel 20 that developed Art. 708. He also represents the military facilities industry on NFPA 70B.
Sidebar: What Matters Gets Measured
Nestled in the U.S. Army's Ft. Belvoir complex outside of Washington, D.C., a small team of engineers in the U.S. Army Corps of Engineers (USACE) Special Missions Office (SMO) has labored on mission-critical facility design, operation, and maintenance for many years. In the early '80s, Secretary of Defense Caspar W. Weinberger signed a memorandum directing the Secretary of the Army, in coordination with the secretaries of the other military departments, to initiate investigative efforts, develop design criteria and standards, and propose solutions and implementation plans for the modernization of electrical power and systems at selected command, control, communications, computer, intelligence, surveillance, and reconnaissance (C4ISR) facilities. The Secretary of the Army, in turn, assigned this project to the U.S. Army Corps of Engineers, Power Reliability Enhancement Program (PREP).
Initially, PREP used survey data and criteria found in the 1991 edition of IEEE 493 because it was the best resource available at the time. Availability data on the pertinent factors (e.g., cause and type of failures, maintenance procedures, repair method, etc.) is necessary to characterize the performance of electrical equipment in service. The statistical samples, produced by volunteer IEEE committees, were few and not suitable for military missions. PREP funded a data collection effort similar to the effort undertaken by the aerospace defense industry.
Concurrently, PREP was conducting analyses for various critical-mission customers throughout the Department of Defense. When PREP started using the data, it became obvious that additional data would be needed to reflect new technologies, the difference between calendar and operational time, etc., in many more facilities. It was also observed that new equipment was exhibiting significant increases in availability, with corresponding decreases in required maintenance and the occurrence of failures. Information was obtained on a variety of commercial and industrial facility types (including office buildings, hospitals, water treatment facilities, prisons, utilities, factories, universities, and bank computer centers) with varying degrees of maintenance quality. To provide the best reliability numeric, PREP workgroups recognized that technology was improving reliability as the data collection window was moving.
The PREP data collection effort, which rolled out into the latest editions of NFPA 70B, “Recommended Practice for Electrical Equipment Maintenance,” and the IEEE “Gold Book,” is the culmination of a 24,000 man-hour effort to collect reliability, availability, and maintenance (RAM) data on 239 power generation, power distribution, and HVAC items, including gas turbine generators.
Additional technical manuals covering power security are available at the U.S. Army Corps of Engineers' Web site at http://www.army.mil/usapa/eng/index.html.
Sample risk assessment from the emergency management division of Washtenaw County, Mich. Note that earthquakes in southeastern Michigan are ranked relatively low and that infrastructure hazards are relatively high.
Sidebar: Reliability vs. Availability
Annex F contains a description of reliability and availability. For the purpose of this article, we refine and add to the use of these terms as follows:
reliability (lowercase “r”) — a number, typically expressed as a percentage that reflects the probability and frequency of failures and is expressed as a probability over a given duration of time cycles.
Reliability (uppercase “R”) — A term used in common language that reflects the overall state of a system (the Fine Print Note of Sec. 700.12, for example).
Availability — Always measured in terms of percentage of uptime vs. downtime; the closer to 100% the better.
Sidebar: Availability Comparison
As Annex F in the 2008 NEC indicates, the availability of a critical operations power system (COPS) is measured by the percentage of time that the system is in service. Given a specified level of availability, the reliability and maintainability requirements are then derived based on that availability requirement. Using these equations, you can compare two hypothetical systems. (MTBF = mean time between failures; MTTR = mean time to repair)
Although both systems' availability metrics are the same, the systems are not equal.
Alpha: MTBF/(MTBF+MTTR) = 500/(500+0.5) = 0.999001
Bravo: MTBF/(MTBF+MTTR) = 20000/(20000+20) = 0.999001
An outage for one-half hour in system Alpha could be satisfied by a UPS, chilled water storage system, or other means. An outage in system Bravo for 20 hours, depending upon mission requirements, may be unacceptable regardless of the frequency.
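The comparison can be reproduced with a few lines of code. This is a sketch under stated assumptions: the MTBF and MTTR figures are taken to be in hours, the 8,760-hour year and the outages-per-year estimate are additions for illustration, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760  # assumed calendar year, in hours


def availability(mtbf_hours, mttr_hours):
    """Availability as MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def outages_per_year(mtbf_hours, mttr_hours):
    """Average number of failure-and-repair cycles in one calendar year."""
    return HOURS_PER_YEAR / (mtbf_hours + mttr_hours)


# (MTBF, MTTR) pairs from the Alpha and Bravo equations, assumed to be hours.
systems = {"Alpha": (500, 0.5), "Bravo": (20000, 20)}

for name, (mtbf, mttr) in systems.items():
    a = availability(mtbf, mttr)
    n = outages_per_year(mtbf, mttr)
    print(f"{name}: availability={a:.6f}, outages/year={n:.2f}, "
          f"hours per outage={mttr}, downtime hours/year={n * mttr:.2f}")
```

Both systems accumulate the same expected downtime per year. What differs is how that downtime is packaged: many short outages for Alpha, a rare but long one for Bravo.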