in our illustrations that follow. Consequently, early
work (Alt, 1985; Doganaksoy et al., 1991) required
improvement. Subsequent work under normal theory
considered joint distributions of all subsets of vari-
ables (Mason et al., 1995; Chua and Montgomery,
1992; Murphy, 1987). However, this results in a
combinatorial explosion of possible subsets for even
a moderate number of variables. In (Rencher, 1993)
and (Runger et al., 1996) an approach based on con-
ditional distributions was used that resulted in feasi-
ble computations, again for normally distributed data.
Only one metric was calculated for each variable.
Furthermore, in (Runger et al., 1996) a number of rea-
sonable geometric approaches were defined and these
were shown to result in equivalent metrics. Still, one
metric was computed for each variable. This idea is
summarized briefly in a following section. Although
there are cases where the feasible approaches used in
(Rencher, 1993) and (Runger et al., 1996) are not suf-
ficient, they are effective in many instances, and the
results indicate when further analysis is needed. This
is illustrated in a following section.
The method proposed here is a simple, computa-
tionally feasible approach that can be shown to gen-
eralize the normal-theory methods in (Rencher, 1993)
and (Runger et al., 1996). Consequently, it has the advantage of equivalence to a traditional solution under traditional assumptions, yet provides a computationally and conceptually simple extension. In Section 2 a summary is provided of the use of an artificial contrast with supervised learning to generate a control region. In Section 3 the metric used for contributions is presented. The following section presents illustrative examples.
2 CONTROL REGION DESIGN
Modern data collection techniques facilitate the col-
lection of in-control data. In practice, the joint distri-
bution of the variables for the in-control data is un-
known and rarely as well-behaved as a multivariate
normal distribution. If specific deviations from standard operating conditions are not specified a priori, learning the control region is a type of unsupervised learning problem. An elegant technique proposed by Hwang et al. (2004) can be used to transform the unsupervised learning problem into a supervised one by using an artificial reference distribution. This is summarized briefly as follows.
Suppose f(x) is the unknown probability density function of the in-control data, and f_0(x) is a specified reference density function. Combine the original data set x_1, x_2, ..., x_N sampled from f(x) with a random sample of equal size N drawn from f_0(x). If we assign y = -1 to each sample point drawn from f(x) and y = 1 to each point drawn from f_0(x), then learning the control region can be viewed as solving a two-class classification problem. Points whose predicted y is -1 are assigned to the control region and classified into the "standard" or "on-target" class. Points with predicted y equal to 1 are classified into the "off-target" class.
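As a concrete illustration, the following sketch builds such a training set in Python. The in-control sample is simulated here as a stand-in for historical process data, and the uniform reference density and sample sizes are our own assumptions rather than choices prescribed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for historical in-control data drawn from the unknown f(x);
# in practice this comes from the process itself, not a simulator.
N, p = 200, 2
X_incontrol = rng.standard_normal((N, p))

# Reference sample of equal size N from a specified f0(x); a uniform
# density over the data's bounding box is one common choice (assumed here).
lo, hi = X_incontrol.min(axis=0), X_incontrol.max(axis=0)
X_reference = rng.uniform(lo, hi, size=(N, p))

# Label in-control points y = -1 and reference points y = +1, turning
# the unsupervised control-region problem into two-class classification.
X_train = np.vstack([X_incontrol, X_reference])
y_train = np.concatenate([-np.ones(N), np.ones(N)])
```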
For a given point x, the expected value of y is
$$\mu(x) = E(y \mid x) = p(y = 1 \mid x) - p(y = -1 \mid x) = 2p(y = 1 \mid x) - 1.$$
Then, according to Bayes' Theorem,
$$p(y = -1 \mid x) = \frac{p(x \mid y = -1)\, p(y = -1)}{p(x)} = \frac{p(x \mid y = -1)\, p(y = -1)}{p(x \mid y = -1)\, p(y = -1) + p(x \mid y = 1)\, p(y = 1)} = \frac{f(x)}{f(x) + f_0(x)} \qquad (1)$$
where we assume p(y = 1) = p(y = -1) for the training data, which means that in estimating E(y|x) we use the same sample size for each class. Therefore, an estimate of the unknown density f(x) is obtained as
$$\hat{f}(x) = \frac{1 - \hat{\mu}(x)}{1 + \hat{\mu}(x)} \, f_0(x), \qquad (2)$$
where f_0(x) is the known reference probability density function of the random data and \hat{\mu}(x) is learned from the supervised algorithm. Also, the odds are
$$\frac{p(y = -1 \mid x)}{p(y = 1 \mid x)} = \frac{f(x)}{f_0(x)}. \qquad (3)$$
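Equation (2) translates directly to code. A minimal sketch, where mu_hat and f0_vals are hypothetical names for the learned values of E(y|x) and the reference density evaluated at the same points:

```python
def density_estimate(mu_hat, f0_vals):
    """Eq. (2): recover f(x) from the learned mean mu_hat = E(y|x)
    and the known reference density f0(x) at the same points."""
    # In practice mu_hat is kept strictly above -1 to avoid division by zero.
    return (1.0 - mu_hat) / (1.0 + mu_hat) * f0_vals
```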
The assignment is determined by the value of \hat{\mu}(x). A data point x is assigned to the class with density f(x) when \hat{\mu}(x) < v, and to the class with density f_0(x) when \hat{\mu}(x) > v, where v is a parameter that can be used to adjust the error rates of the procedure.
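The rule itself is a one-line threshold. In this sketch the default v = 0 is our own symmetric choice, not a value fixed by the procedure:

```python
import numpy as np

def assign_class(mu_hat, v=0.0):
    """mu_hat < v: on-target class (y = -1, density f);
    otherwise: off-target class (y = +1, density f0)."""
    return np.where(mu_hat < v, -1, 1)
```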
Any supervised learner is a potential candidate to build the model. In our research, a Regularized Least Squares Classifier (RLSC) (Cucker and Smale, 2001) is employed as the specific classifier. Squared-error loss is used with a quadratic penalty term on the coefficients (from the standardization, the intercept is zero). Radial basis functions are used at each observed point with a common standard deviation. That is, the mean of y is estimated from
$$\mu(x) = \beta_0 + \sum_{j=1}^{n} \beta_j \exp\!\left(-\tfrac{1}{2}\,\|x - x_j\|^2 / \sigma^2\right) = \beta_0 + \sum_{j=1}^{n} \beta_j K_\sigma(x, x_j) \qquad (4)$$