RELIABILITY OF SMART GRID SYSTEMS WITH WARM STANDBY SPARES AND IMPERFECT COVERAGE
Rui Peng (1), Ola Tannous (2), Liudong Xing (2) and Min Xie (3)
(1) University of Science & Technology Beijing, Beijing, China
(2) University of Massachusetts Dartmouth, Dartmouth, MA, U.S.A.
(3) Department of Systems Engineering & Engineering Management, City University of Hong Kong, Hong Kong
Keywords: Binary Decision Diagram, Fault Tolerance, Imperfect Fault Coverage, Reliability, Smart Grids, Warm
Standby Spares.
Abstract: This paper models the reliability of a smart grid system with warm standby spares and imperfect fault
coverage using binary decision diagrams (BDD). To meet stringent reliability requirements, a smart grid
system must be designed with fault tolerance. Warm standby SParing (WSP) is an important fault tolerance
technique that trades off energy consumption against recovery time. In WSP, the standby units have
different failure rates before and after they are used to replace the on-line faulty units. Furthermore, a
component failure may propagate through the grid and cause the whole system to fail if the failure is
uncovered. Existing work on systems with warm standby spares and imperfect fault coverage is restricted
to special cases, such as assuming exponential failure time distributions for all components or considering
only one spare. The BDD approach proposed in this paper overcomes these limitations. Examples
illustrate its application.
1 INTRODUCTION
It is crucial for a smart grid system to be designed
with fault tolerance in order to reach high reliability
(Coll-Mayor et al., 2004; Iwayemi et al., 2010; Wu
and Zhou, 2011). There are different techniques to
achieve fault tolerance in grid systems, typically hot,
cold and warm standby sparing to adapt to different
situations (Tannous et al., 2011b). Hot standby
SParing (HSP) is used as a failover mechanism to
provide reliability in system configurations. The hot
spare is active and connected as part of a working
system. This type of sparing is generally used for
applications for which the recovery time is critical.
For Cold standby SParing (CSP) the spare unit is
powered up only when the online unit fails and
needs to be replaced. CSP is typically used for
applications for which energy consumption is
critical. Warm standby SParing (WSP) trades off
energy consumption against recovery time: each
spare component is partially powered up while the
primary component is operational and is fully
powered up only after the primary component fails.
For WSP systems, the standby units have
time-dependent failure behavior: they have different
failure rates or, in general, different time-to-failure
distributions before and after they are used to
replace the on-line faulty units.
Existing approaches for analyzing the reliability
of systems with warm standby spares include
Markov-based methods, simulation-based methods,
and combinatorial methods. The Markov methods
suffer from the well-known state space explosion
problem (Ke et al., 2008a) and are typically
applicable to exponential time-to-failure
distributions for the system components. The
simulation-based methods, for instance, Monte-
Carlo simulations, are usually computationally
expensive and time-consuming, especially when
results of high accuracy are desired (Ke et al.,
2008b). A combinatorial approach was proposed by
Lee et al. (2009), which enumerates all the minimal
cut sets or sequences, and then applies the
inclusion/exclusion formula to calculate the system
reliability. Enumerating the minimal cut
sets/sequences and expanding the inclusion/exclusion
formula make the complexity of the method
doubly exponential. Another combinatorial approach
based on binary decision diagrams (BDD) was
proposed (Tannous et al., 2011a) to evaluate the
reliability of WSP systems without consideration of
imperfect fault coverage.
Peng R., Tannous O., Xing L. and Xie M.
RELIABILITY OF SMART GRID SYSTEMS WITH WARM STANDBY SPARES AND IMPERFECT COVERAGE.
DOI: 10.5220/0003947300610066
In Proceedings of the 1st International Conference on Smart Grids and Green IT Systems (SMARTGREENS-2012), pages 61-66
ISBN: 978-989-8565-09-9
Copyright © 2012 SCITEPRESS (Science and Technology Publications, Lda.)
Even when a smart grid is designed with
adequate redundancy, a single uncovered failure
may propagate through the system and lead to
overall system failure (Pepyne, 2007; Dobson et al.,
2007; Aranya and Marija, 2011). This phenomenon is
known as imperfect fault coverage (IFC); see
Bouricius et al. (1969), Arnold (1973) and Xing
(2007). Because of imperfect fault coverage, the
system reliability cannot be increased without limit
by adding redundancy (Amari et al.,
2004; Myers, 2008; Levitin, 2008; Peng et al.,
2011). The simple and efficient algorithm (SEA) is a
well-known approach for incorporating imperfect
coverage into combinatorial methods (Amari et al.,
1999; Levitin et al., 2012). However, SEA works
only for systems with static redundancy and
not for systems with time or sequential
dependency. Some researchers have studied the
availability of a WSP system with a general repair
distribution and imperfect coverage (Ke et al., 2008a;
Ke et al., 2008b; Ke et al., 2010), but those works are
limited to systems with only one spare. Other studies
of WSP with imperfect coverage are restricted to the
case where the failure time of each component
follows an exponential distribution (Lee et al., 2009;
Hsu et al., 2009; Ke et al., 2008c).
Incorporating imperfect fault coverage is a
challenging task, especially for the reliability
analysis of a smart grid system with warm standby
spares, which is complex to start with. In this work,
the BDD method (Tannous et al., 2011a) is extended
to study the reliability of a smart grid system with
WSP when imperfect fault coverage exists. New
rules are introduced for the BDD construction
and the system unreliability evaluation in order to
capture the effect of imperfect coverage and the
time dependency of failures. The proposed approach
is general and can be applied to any dynamic
system, in particular WSP systems, with components
subject to imperfect coverage. It is not limited to
WSP with only one spare but works equally well for
WSP with n spares having any time-to-failure distribution.
Section 2 introduces the background of BDD.
Section 3 presents the procedures of the BDD-based
approach. A grid system with one warm standby
spare and a grid system with two warm standby
spares are presented in Section 4 to illustrate the
proposed method. Section 5 summarizes the paper
and points out some future directions.
2 BINARY DECISION DIAGRAM
The binary decision diagram (BDD) was initially
developed as a tool for validating VLSI circuitry
design by Bryant (Bryant, 1986). The BDD method
provides an efficient and exact way to analyze static
fault trees. In general, BDD requires less
computational time than other existing fault tree
reliability analysis methods as shown by many
studies (Chang et al., 2005; Xing and Dugan, 2002;
Yeh et al., 2002). BDD uses Shannon decomposition
for its directed acyclic graph:
\[
f = x \cdot f_{x=1} + \bar{x} \cdot f_{x=0} = \mathrm{ite}(x, f_{x=1}, f_{x=0}) \tag{1}
\]
where f is a Boolean expression over a set of
Boolean random variables X, and x is a member
of X. The two terminal nodes labelled “1” and “0” in
the BDD represent the system being in the failure
and operational states respectively. The advantage of
this method is that the two sub-expressions are
disjoint. Therefore, the total failure probability of the
system can be calculated as the sum of the
probabilities of all the disjoint paths that lead to the
sink node "1". These paths represent all combinations
of the failure and non-failure of components that lead
to the entire system failure.
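As a minimal illustration of this path-sum evaluation, the following Python sketch applies the Shannon decomposition recursively to a small static BDD. The Node class, the function name failure_probability, the example structure function (a AND b) OR c, and the component failure probabilities are all assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Node:
    """Non-terminal BDD node ite(var, high, low)."""
    var: str
    high: "Bdd"  # child followed when the component has failed (var = 1)
    low: "Bdd"   # child followed when the component is working (var = 0)


Bdd = Union[Node, int]  # the terminals are the integers 0 and 1


def failure_probability(node: Bdd, q: Dict[str, float]) -> float:
    """Probability of reaching terminal 1, summed over the disjoint paths."""
    if isinstance(node, int):
        return float(node)
    p_fail = q[node.var]
    return (p_fail * failure_probability(node.high, q)
            + (1.0 - p_fail) * failure_probability(node.low, q))


# Hypothetical static example: the system fails if (a AND b) OR c fails.
c = Node("c", 1, 0)
b = Node("b", 1, c)
root = Node("a", b, c)

# Assumed component failure probabilities at the mission time.
q = {"a": 0.1, "b": 0.2, "c": 0.05}
unreliability = failure_probability(root, q)
print(unreliability)
```

Because the two branches of every node are mutually exclusive, the recursion never counts a combination of component states twice, which is the property the path-sum argument relies on.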
The BDD is generated via a bottom-up traversal
of the fault tree, applying the following
manipulation rules (Bryant, 1986):
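\[
G \diamond H = \mathrm{ite}(x, G_1, G_2) \diamond \mathrm{ite}(y, H_1, H_2) =
\begin{cases}
\mathrm{ite}(x, G_1 \diamond H_1, G_2 \diamond H_2) & \mathrm{index}(x) = \mathrm{index}(y) \\
\mathrm{ite}(x, G_1 \diamond H, G_2 \diamond H) & \mathrm{index}(x) < \mathrm{index}(y) \\
\mathrm{ite}(y, G \diamond H_1, G \diamond H_2) & \mathrm{index}(x) > \mathrm{index}(y)
\end{cases} \tag{2}
\]
where G and H represent two Boolean expressions
corresponding to the traversed sub-fault trees. The
logical operation (AND, OR) is represented by $\diamond$.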
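The manipulation rules (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the tuple encoding (index, high, low), the helper names ite, index, apply_op and evaluate, and the example function are assumptions, not the data structures of Bryant's package or of the paper.

```python
import math
import operator
from functools import lru_cache


def ite(x, high, low):
    """Build ite(x, high, low), dropping a node whose two children coincide."""
    return high if high == low else (x, high, low)


def index(node):
    """Ordering index of the top variable; terminals 0 and 1 sort last."""
    return node[0] if isinstance(node, tuple) else math.inf


def apply_op(op, g, h):
    """Combine the BDDs g and h with a Boolean operator, following rules (2)."""
    @lru_cache(maxsize=None)
    def rec(g, h):
        if isinstance(g, int) and isinstance(h, int):
            return op(g, h)                    # both operands are terminals
        x, y = index(g), index(h)
        if x == y:                             # same top variable
            return ite(x, rec(g[1], h[1]), rec(g[2], h[2]))
        if x < y:                              # x comes first in the ordering
            return ite(x, rec(g[1], h), rec(g[2], h))
        return ite(y, rec(g, h[1]), rec(g, h[2]))
    return rec(g, h)


def evaluate(node, assignment):
    """Follow one path through the BDD for a given 0/1 assignment."""
    while isinstance(node, tuple):
        x, high, low = node
        node = high if assignment[x] else low
    return node


# Single-variable BDDs with indices 0 < 1 < 2, then f = (a AND b) OR c.
a, b, c = (0, 1, 0), (1, 1, 0), (2, 1, 0)
f = apply_op(operator.or_, apply_op(operator.and_, a, b), c)
print(f)
```

The memoisation through lru_cache is a design choice of this sketch: it keeps each pair of sub-diagrams from being combined more than once during a single call.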
3 THE BDD-BASED APPROACH
The BDD method is usually applied to static systems,
so additional rules are needed to
capture the time dependency of warm spare
failures and the effect of imperfect fault coverage in
the smart grid system.
3.1 BDD Construction
An individual component is represented by a BDD as
shown in Figure 1.
Figure 1: The representation of primary and spare components in BDD.
The BDD is constructed iteratively by combining the BDDs
representing A, $S_1$, …, $S_n$ in sequence. Here A denotes the
primary component and $S_i$ the i-th spare; the subscripts L and G
denote a local (covered) and a global (uncovered) failure, and
$(\alpha)$ and $(\lambda)$ indicate a failure in the standby state and
after being powered up, respectively. Besides the basic Shannon
decomposition rules represented by (2), the
following additional rules need to be applied:
1. Since the system fails in case of any global
failure regardless of the status of any other
component, the right child of $X_G$ (which can
be $A_G$, $S_{iG}(\alpha)$, or $S_{iG}(\lambda)$) is always 1.
2. If the primary doesn’t fail, only a global
failure of subsequent spares can cause the
system to fail. Thus, following the left
branch of $S_{iL}(\lambda)$ or $A_L$, there will only be
$S_{jG}(\alpha)$, where j > i.
3. $S_{jG}(\alpha)$ cannot exist if all the components
before it (A, $S_1$, ..., and $S_{j-1}$) have failed
locally. Indeed, when A, $S_1$, ..., and $S_{j-1}$
have all failed locally, $S_j$ either has already
failed locally or is powered up. In either
case, $S_{jG}(\alpha)$ cannot occur.
3.2 System Unreliability Evaluation
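The unreliability of the smart grid system with warm
standby spares can be evaluated as the sum of the
probabilities of all the disjoint paths from the root
node to the sink node "1" in the BDD model.
Specifically, we have to distinguish three kinds of
sequences: $(S_L \to P_L)$, $(P_L \to S_G)$, and $(P_L \to S_L)$.
\[
\Pr\{S_L \to P_L\} = \Pr\{S_L(\alpha) \to (\neg P_G \cdot P_L)\}
= \int_0^t \int_0^{\tau_1} f_{P_L(\lambda)}(\tau_1)\, f_{S_L(\alpha)}(\tau_2)\, d\tau_2\, d\tau_1 \tag{3}
\]
\[
\Pr\{P_L \to S_G\} = \Pr\{(\neg P_G \cdot P_L) \to (\neg S_L(\alpha) \cdot S_G(\lambda))\}
= \int_0^t \int_{\tau_1}^{t} f_{P_L(\lambda)}(\tau_1) \Big(1 - \int_0^{\tau_1} f_{S(\alpha)}(\tau)\, d\tau\Big) f_{S_G(\lambda)}(\tau_2 - \tau_1)\, d\tau_2\, d\tau_1 \tag{4}
\]
\[
\Pr\{P_L \to S_L\} = \Pr\{(\neg P_G \cdot P_L) \to (\neg S_L(\alpha) \cdot \neg S_G(\lambda) \cdot S_L(\lambda))\}
= \int_0^t \int_{\tau_1}^{t} f_{P_L(\lambda)}(\tau_1) \Big(1 - \int_0^{\tau_1} f_{S(\alpha)}(\tau)\, d\tau\Big) f_{S_L(\lambda)}(\tau_2 - \tau_1)\, d\tau_2\, d\tau_1 \tag{5}
\]
where “$\neg$” denotes logical negation.
The probability density function f can follow any
distribution.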
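The sequence probabilities (3) to (5) can be evaluated numerically for arbitrary densities. The sketch below uses a plain midpoint rule and, purely as an assumption for the usage example, exponential failure times in which a failure is local with probability c and global with probability 1 - c. The function names, the coverage factors and all numerical values are hypothetical.

```python
import math


def integrate(g, a, b, n=200):
    """Composite midpoint rule for the integral of g over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def pr_spare_then_primary(f_pl, f_sl_alpha, t):
    """Eq. (3): spare fails locally in standby, then the primary fails locally."""
    return integrate(lambda t1: f_pl(t1) * integrate(f_sl_alpha, 0.0, t1), 0.0, t)


def pr_primary_then_spare(f_pl, f_s_alpha, f_sx_lambda, t):
    """Eqs. (4)/(5): primary fails locally at t1, the spare has survived
    standby until t1, is powered up, and then fails before time t.
    f_sx_lambda is the global (4) or local (5) density of the active spare."""
    def integrand(t1):
        survived_standby = 1.0 - integrate(f_s_alpha, 0.0, t1)
        fails_when_active = integrate(lambda t2: f_sx_lambda(t2 - t1), t1, t)
        return f_pl(t1) * survived_standby * fails_when_active
    return integrate(integrand, 0.0, t)


def exp_density(rate, weight=1.0):
    """Assumed (sub-)density: weight * rate * exp(-rate * x)."""
    return lambda x: weight * rate * math.exp(-rate * x)


# Illustrative parameters (assumptions, not values from the paper).
T = 1.0
LAM_P, ALPHA, LAM_S = 0.5, 0.1, 0.4   # primary, standby and active spare rates
C_P, C_S = 0.9, 0.8                   # assumed coverage factors

f_pl = exp_density(LAM_P, C_P)
p3 = pr_spare_then_primary(f_pl, exp_density(ALPHA, C_S), T)
p4 = pr_primary_then_spare(f_pl, exp_density(ALPHA), exp_density(LAM_S, 1 - C_S), T)
p5 = pr_primary_then_spare(f_pl, exp_density(ALPHA), exp_density(LAM_S, C_S), T)
print(p3, p4, p5)
```

Any other density (for example Weibull or lognormal) can be passed in place of exp_density, since the integration routine makes no assumption about the distribution.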
4 ILLUSTRATIVE EXAMPLES
This section considers a smart grid system with one
warm standby spare and a smart grid system with
two warm standby spares for illustration.
4.1 Warm Standby with One Spare
The BDD for a WSP with one spare can be
constructed by combining the BDD of the primary
and the BDD of the spare, as shown in Figure 2.
RELIABILITYOFSMARTGRIDSYSTEMSWITHWARMSTANDBYSPARESANDIMPERFECTCOVERAGE
63
Figure 2: BDD of a WSP with one spare.
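The system unreliability can be obtained by
adding up the probabilities of all the paths leading to
the 1-terminal as
\[
\begin{aligned}
UR ={}& \Pr(A_G) + \Pr\{\neg A_G \cdot A_L \cdot [S_L(\alpha) + \neg S_L(\alpha) \cdot (S_G(\lambda) + \neg S_G(\lambda) \cdot S_L(\lambda))]\} \\
&+ \Pr[\neg A_G \cdot \neg A_L \cdot S_G(\alpha)] \\
={}& \int_0^t f_{A_G(\lambda)}(\tau)\, d\tau
+ \int_0^t f_{A_L(\lambda)}(\tau_1) \Big[ \int_0^{\tau_1} f_{S_L(\alpha)}(\tau_2)\, d\tau_2 \\
&+ \Big(1 - \int_0^{\tau_1} f_{S(\alpha)}(\tau)\, d\tau\Big) \cdot \int_{\tau_1}^{t} f_{S(\lambda)}(\tau_2 - \tau_1)\, d\tau_2 \Big] d\tau_1 \\
&+ \int_0^t f_{S_G(\alpha)}(\tau_1) \Big[1 - \int_0^{\tau_1} f_{A(\lambda)}(\tau_2)\, d\tau_2\Big] d\tau_1
\end{aligned} \tag{6}
\]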
where UR denotes the system unreliability.
Due to imperfect fault coverage, the unreliability
of a WSP with one spare may be even higher than
the unreliability of the system with only the primary
component, if the global failure rate of the spare is
high. In real applications, it is advisable to take the
cost and the uncertainty of the global failure rate and
other parameters into consideration.
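The effect described above can be reproduced with a short numerical sketch that sums the three groups of terms in (6): global failure of the primary, local failure of the primary followed by loss of the spare, and global failure of the spare in standby while the primary is still working. The exponential failure times, the coverage factors c_a and c_s (probability that a failure is local), the function names and every numerical value are assumptions chosen only to illustrate the point.

```python
import math


def integrate(g, a, b, n=200):
    """Composite midpoint rule for the integral of g over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def exp_density(rate, weight=1.0):
    """Assumed (sub-)density: weight * rate * exp(-rate * x)."""
    return lambda x: weight * rate * math.exp(-rate * x)


def unreliability_one_spare(t, lam_a, c_a, alpha, lam_s, c_s, n=200):
    """Sum of the terms of Eq. (6) under the exponential/coverage assumption."""
    f_ag, f_al, f_a = (exp_density(lam_a, 1 - c_a), exp_density(lam_a, c_a),
                       exp_density(lam_a))
    f_sl_alpha, f_sg_alpha = exp_density(alpha, c_s), exp_density(alpha, 1 - c_s)
    f_s_alpha, f_s_lam = exp_density(alpha), exp_density(lam_s)

    # Primary fails globally.
    primary_global = integrate(f_ag, 0.0, t, n)

    # Primary fails locally at t1; the spare is already lost or fails later.
    def local_primary(t1):
        spare_lost_in_standby = integrate(f_sl_alpha, 0.0, t1, n)
        spare_alive = 1.0 - integrate(f_s_alpha, 0.0, t1, n)
        spare_fails_active = integrate(lambda t2: f_s_lam(t2 - t1), t1, t, n)
        return f_al(t1) * (spare_lost_in_standby + spare_alive * spare_fails_active)
    primary_local = integrate(local_primary, 0.0, t, n)

    # Spare fails globally in standby while the primary is still working.
    spare_global = integrate(
        lambda t1: f_sg_alpha(t1) * (1.0 - integrate(f_a, 0.0, t1, n)), 0.0, t, n)

    return primary_global + primary_local + spare_global


T = 1.0
ur_primary_only = 1.0 - math.exp(-0.1 * T)  # no spare: any failure is fatal
ur_good_spare = unreliability_one_spare(T, lam_a=0.1, c_a=0.99,
                                        alpha=0.05, lam_s=0.1, c_s=0.99)
ur_poor_spare = unreliability_one_spare(T, lam_a=0.1, c_a=0.99,
                                        alpha=0.5, lam_s=0.5, c_s=0.2)
print(ur_primary_only, ur_good_spare, ur_poor_spare)
```

With these assumed numbers the well-covered spare lowers the unreliability, whereas the spare with a high global failure rate raises it above that of the primary alone.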
4.2 Warm Standby with Two Spares
The BDD for a WSP with two spares can be
obtained by combining the BDD in Figure 2 with
that of one additional warm standby spare, as shown in Figure 3.
Figure 3: The BDD of a WSP with two spares.
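According to the BDD, the system unreliability
can be evaluated as
\[
\begin{aligned}
UR ={}& \Pr(A_G) + \Pr\{\neg A_G \cdot A_L \cdot S_{1L}(\alpha) \cdot [S_{2L}(\alpha) + \neg S_{2L}(\alpha) \cdot (S_{2G}(\lambda) + \neg S_{2G}(\lambda) \cdot S_{2L}(\lambda))]\} \\
&+ \Pr(\neg A_G \cdot A_L \cdot \neg S_{1L}(\alpha) \cdot \{S_{1G}(\lambda) + \neg S_{1G}(\lambda) \cdot S_{1L}(\lambda) \cdot [S_{2L}(\alpha) \\
&\qquad + \neg S_{2L}(\alpha) \cdot (S_{2G}(\lambda) + \neg S_{2G}(\lambda) \cdot S_{2L}(\lambda))]\}) \\
&+ \Pr(\neg A_G \cdot A_L \cdot \neg S_{1L}(\alpha) \cdot \neg S_{1G}(\lambda) \cdot \neg S_{1L}(\lambda) \cdot S_{2G}(\alpha)) \\
&+ \Pr(\neg A_G \cdot \neg A_L \cdot S_{1G}(\alpha)) \\
&+ \Pr(\neg A_G \cdot \neg A_L \cdot \neg S_{1G}(\alpha) \cdot S_{2G}(\alpha)) \\
={}& \int_0^t f_{A_G(\lambda)}(\tau)\, d\tau \\
&+ \int_0^t \int_0^{\tau_1} f_{A_L(\lambda)}(\tau_1)\, f_{S_{1L}(\alpha)}(\tau_2) \Big[ \int_0^{\tau_1} f_{S_{2L}(\alpha)}(\tau_3)\, d\tau_3 \\
&\qquad + \Big(1 - \int_0^{\tau_1} f_{S_2(\alpha)}(\tau)\, d\tau\Big) \cdot \int_{\tau_1}^{t} f_{S_2(\lambda)}(\tau_3 - \tau_1)\, d\tau_3 \Big] d\tau_2\, d\tau_1 \\
&+ \int_0^t \int_{\tau_1}^{t} f_{A_L(\lambda)}(\tau_1) \Big(1 - \int_0^{\tau_1} f_{S_1(\alpha)}(\tau)\, d\tau\Big) \Big\{ f_{S_{1G}(\lambda)}(\tau_2 - \tau_1) \\
&\qquad + f_{S_{1L}(\lambda)}(\tau_2 - \tau_1) \Big[ \int_0^{\tau_2} f_{S_{2L}(\alpha)}(\tau_3)\, d\tau_3 \\
&\qquad + \Big(1 - \int_0^{\tau_2} f_{S_2(\alpha)}(\tau)\, d\tau\Big) \cdot \int_{\tau_2}^{t} f_{S_2(\lambda)}(\tau_3 - \tau_2)\, d\tau_3 \Big] \Big\} d\tau_2\, d\tau_1 \\
&+ \int_0^t \int_{\tau_1}^{t} f_{A_L(\lambda)}(\tau_1)\, f_{S_{2G}(\alpha)}(\tau_2) \Big[1 - \int_0^{\tau_1} f_{S_1(\alpha)}(\tau)\, d\tau\Big] \Big[1 - \int_0^{\tau_2 - \tau_1} f_{S_1(\lambda)}(\tau_3)\, d\tau_3\Big] d\tau_2\, d\tau_1 \\
&+ \int_0^t f_{S_{1G}(\alpha)}(\tau_1) \Big[1 - \int_0^{\tau_1} f_{A(\lambda)}(\tau_2)\, d\tau_2\Big] d\tau_1 \\
&+ \int_0^t f_{S_{2G}(\alpha)}(\tau_1) \Big[1 - \int_0^{\tau_1} f_{A(\lambda)}(\tau_2)\, d\tau_2\Big] \Big[1 - \int_0^{\tau_1} f_{S_{1G}(\alpha)}(\tau_3)\, d\tau_3\Big] d\tau_1
\end{aligned} \tag{7}
\]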
Similarly, the WSP with two spares does not
necessarily have a lower unreliability than the WSP
with only one spare, or even with no spare, because of
the propagation of global failures. An extreme case is
one in which the primary is perfect and the spare
components fail only globally. Even when two spares are
preferred, the system unreliability is influenced by
the order of the two spares and the primary
component. In real applications, parameters such as
component costs, failure time distributions, and global
failure rates should be estimated, and sensitivity
analysis is also required.
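The extreme case mentioned above, and the influence of the order of the spares, can be explored with a small Monte Carlo experiment. The sketch below is independent of the BDD evaluation and rests on several assumptions made only for illustration: exponential failure times, a coverage factor c per unit (probability that a failure is local), spares used in list order, and memoryless restart of a spare when it is powered up. All names and numbers are hypothetical.

```python
import math
import random


def draw_life(rate, rng):
    """Exponential lifetime; a zero rate means the unit never fails."""
    return rng.expovariate(rate) if rate > 0 else math.inf


def system_fails(units, t, rng):
    """Simulate one mission of length t; return True if the system fails.

    units[0] is the primary and units[1:] are the spares in order of use.
    Each unit is (alpha, lam, c): standby rate, active rate, coverage.
    """
    # Standby failure time and type (True = global) of every spare.
    standby = [(draw_life(alpha, rng), rng.random() >= c)
               for alpha, _, c in units[1:]]
    active, now = 0, 0.0
    while True:
        _, lam, c = units[active]
        fail_time = now + draw_life(lam, rng)
        is_global = rng.random() >= c
        horizon = min(fail_time, t)
        # A global failure of any spare still waiting in standby is fatal.
        for s_time, s_global in standby[active:]:
            if s_global and s_time < horizon:
                return True
        if fail_time >= t:
            return False              # the active unit outlives the mission
        if is_global:
            return True               # uncovered failure of the active unit
        # Covered failure: switch to the next spare that is still alive.
        now = fail_time
        active += 1
        while active < len(units) and standby[active - 1][0] < now:
            active += 1
        if active == len(units):
            return True               # all spares are exhausted


def estimate_unreliability(units, t, runs=20000, seed=1):
    """Fraction of simulated missions that end in system failure."""
    rng = random.Random(seed)
    return sum(system_fails(units, t, rng) for _ in range(runs)) / runs


# Extreme case: a perfect primary and spares that fail only globally.
PRIMARY = (0.0, 0.0, 1.0)
SPARE = (0.5, 0.5, 0.0)
ur0 = estimate_unreliability([PRIMARY], 1.0)
ur1 = estimate_unreliability([PRIMARY, SPARE], 1.0)
ur2 = estimate_unreliability([PRIMARY, SPARE, SPARE], 1.0)
print(ur0, ur1, ur2)
```

Under these assumptions every added spare only adds a new way for the system to fail, so the estimated unreliability grows with the number of spares. Reordering the entries of units gives a quick way to study the effect of the spare order.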
5 CONCLUSIONS
This paper studies the reliability of a smart grid
system with warm standby spares and the existence
of imperfect fault coverage. For warm standby
sparing, the standby units have different failure rates
before and after they are used to replace the on-line
faulty units. Furthermore a component failure may
propagate through the grid system and cause the
whole system to fail if the failure is uncovered. It is
a challenging task to incorporate imperfect fault
coverage into systems with warm standby spares.
The existing approaches are restricted to special
cases, such as assuming exponential failure
distributions for all the system components or
limiting the number of spares to one. A BDD-based
approach is proposed, and procedures for BDD
construction and system unreliability evaluation are
presented and illustrated. The approach works for warm
standby systems with n spares having arbitrary
time-to-failure distributions.
REFERENCES
Amari, S. V., Dugan, J.B., Misra, R.B., 1999. A separable
method for incorporating imperfect fault-coverage into
combinatorial model. IEEE Transactions on
Reliability 48 (3), 267–274.
Amari, S., Pham, H., Dill, G., 2004. Optimal design of k-
out-of-n: G subsystems subjected to imperfect fault-
coverage. IEEE Transactions on Reliability 53, 567-
575.
Aranya, C., Marija, D., 2011. Control and Optimization
Methods for Electric Smart Grids. Springer-Verlag,
New York.
Arnold, T. F., 1973. The concept of coverage and its effect
on the reliability model of a repairable system. IEEE
Transactions on Computers 22 (3), 325–339.
Bouricius, W. G., Carter, V., Schneider, P. R., 1969.
Reliability modeling techniques for self-repairing
computer systems. In Proceedings of the 24th
National Conference, 295-309.
Bryant, R., 1986. Graph based algorithms for Boolean
function manipulation. IEEE Transactions on
Computers 35 (8), 677-691.
Chang, Y. R., Amari, S. V., Kuo, S. Y., 2005. OBDD-
based evaluation of reliability and importance
measures for multistate systems subject to imperfect
fault coverage. IEEE Transactions on Dependable and
Secure Computing 2 (4), 336-347.
Coll-Mayor, D., Picos, R., Garcia-Moreno, E., 2004. State
of the art of the virtual utility: the smart distributed
generation network. International Journal of Energy
Research 28 (1), 65-80.
Dobson, I., Carreras, B. A., Lynch, V. E., Newman, D.E.,
2007. Complex systems analysis of series of
blackouts: cascading failure, critical points, and self-
organization. Chaos 17 (2), 026103.
Hsu, Y., Lee, S., Ke, J., 2009. A repairable system with
imperfect coverage and reboot: Bayesian and
asymptotic estimation. Mathematics and Computers in
Simulation 79 (7), 2227-2239.
Iwayemi, A., Yi, P., Liu, P., Zhou, C., 2010. A perfect
power demonstration system. Innovative Smart Grids
Technologies (ISGT) 2010, 1-7.
Ke, J., Huang, H., Lin, C., 2008a. A redundant repairable
system with imperfect coverage and fuzzy parameters.
IEEE Transactions on Reliability 57 (4), 595-606.
Ke, J., Lee, S., Hsu, Y., 2008b. Bayesian analysis for a
redundant repairable system with imperfect coverage.
Applied Mathematical Modelling 32 (12), 2839-2850.
Ke, J., Lee, S., Hsu, Y., 2008c. On a repairable system
with detection, imperfect coverage and reboot:
Bayesian approach. Simulation Modelling Practice
and Theory 16 (3), 353-367.
Ke, J., Su, Z., Wang, K., 2010. Simulation inferences for
an availability system with general repair distribution
and imperfect fault coverage. Simulation Modelling
Practice and Theory 18 (3), 338-347.
Lee, S., Ke, J., Hsu, Y., 2009. Bayesian assessing for a
repairable system with standby imperfect switching
and reboot delay. International Journal of Systems
Science 40 (11), 1149-1159.
Levitin, G., 2008. Optimal structure of multi-state systems
with uncovered failures. IEEE Transactions on
Reliability 57 (1), 140-148.
Levitin, G., Ng, S. H., Peng, R., Xie, M., 2012. Reliability
of systems subjected to imperfect fault coverage.
Chapter 6 in Stochastic Reliability and Maintenance
Modeling -Essays in Honor of Professor Shunji Osaki
on his 70th Birthday (T. Dohi and T. Nakagawa, eds.),
Springer. (to appear).
Myers, A., 2008. Achievable limits on the reliability of k-
out-of-n: G systems subject to imperfect fault
coverage. IEEE Transactions on Reliability 57 (2),
349-354.
Peng, R., Levitin, G., Xie, M., Ng, S. H., 2011. Reliability
modeling and optimization of multi-state systems with
multi-fault coverage. To appear in MMR2011.
Pepyne, D. L., 2007. Topology and cascading line outages
RELIABILITYOFSMARTGRIDSYSTEMSWITHWARMSTANDBYSPARESANDIMPERFECTCOVERAGE
65
in power grids. Journal of Systems Science and
Systems Engineering 16 (2), 202-221.
Tannous, O., Xing, L., Dugan, J. B., 2011a. Reliability
Analysis of Warm Standby Systems using Sequential
BDD. Proc. of The 57th Annual Reliability &
Maintainability Symposium, FL, USA, January 2011.
Tannous, O., Xing, L., Peng, R., Xie, M., Ng, S. H.,
2011b. Redundancy allocation for series-parallel
warm-standby systems. The IEEE International
Conference on Industrial Engineering and
Engineering Management, Singapore, December
2011.
Wu, D., Zhou, C., 2011. Fault-tolerant and scalable key
management for smart grids. IEEE Transactions on
Smart Grids 2 (2), 375-381.
Xing, L., 2007. Reliability evaluation of phased-mission
systems with imperfect fault coverage and common-
cause failures. IEEE Transactions on Reliability 56
(1), 58-68.
Xing, L., Dugan, J. B., 2002. Analysis of generalized
phased-mission systems reliability, performance and
sensitivity. IEEE Transactions on Reliability 51 (2),
199–211.
Yeh, F., Lu, S., Kuo, S., 2002. OBDD-based evaluation of
k-terminal network reliability. IEEE Transactions on
Reliability 51 (4), 443–451.
SMARTGREENS2012-1stInternationalConferenceonSmartGridsandGreenITSystems
66