
attractive for model validation. A test would
compare a sample of observations taken from the
target population against a sample of predictions
taken from the model. Not surprisingly, a number of
statistical tools have been applied to validation
problems. For example, Freese (1960) introduced an
accuracy test based on the standard $\chi^2$ test.
Ottosson and Håkanson (1997) used $R^2$ and compared it
with a so-called highest-possible $R^2$ computed from
predictions on common units (parallel time-compatible
sets). Jans-Hammermeister and McGill (1997) used an
F-statistic-based lack-of-fit test. Landsberg et al.
(2003) used $R^2$ and relative mean bias. Bartelink
(1998) graphed field data and predictions with
confidence intervals. Finally, Alewell and Manderscheid
(1998) used $R^2$ and the normalized mean absolute
error (NMAE).
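As a concrete illustration of two of these agreement measures, the following is a minimal sketch of $R^2$ against observations and one common definition of NMAE; the exact formulas vary between the cited studies, so these definitions are assumptions rather than the cited authors' originals:

```python
import numpy as np

def r_squared(obs, pred):
    """Coefficient of determination of model predictions against observations,
    computed relative to the 1:1 line (one common choice in model validation)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    ss_res = np.sum((obs - pred) ** 2)        # residual sum of squares
    ss_tot = np.sum((obs - obs.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

def nmae(obs, pred):
    """Normalized mean absolute error: mean absolute prediction error
    scaled by the mean of the observations (one common form of NMAE)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(pred - obs)) / np.mean(obs))
```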
In practice, simulations are usually validated by
considering not one but several output measures
(e.g., expected waiting time, expected queue length,
etc.). In this case, one could in principle validate the
simulation for each output measure individually, as
discussed previously. However, these output
measures will in general be dependent. In some
cases, it may be possible to model this dependence
explicitly – e.g., using a multivariate normal
distribution. The aim of this study was to develop
and use criteria that permit an objective comparison
of different models with the observed field data and
with each other. A given model that describes the
system significantly better will be declared the
‘valid’ model, while the others will be rejected. The
term ‘valid’ is used here in the sense that any model
that could not be proven invalid would be a valid
model for the system.
Real plants are, in general, time-varying for
various reasons, such as plant operating-point
changes, component aging, equipment wear, and heat-
and material-transfer degradation effects.
In this paper, we propose an effective technique
for the validation of simulation models (static or
dynamic) that performs a uniformly most powerful
invariant (UMPI) test to compare a real-process data
set with the data sets of several simulation models.
2 TESTING THE VALIDITY OF A
SIMULATION MODEL
Suppose that we desire to validate the kth
multivariate stationary-response simulation model of
an observable process that has p response variables.
Let $x_{ij}(k)$ and $y_{ij}$ be the ith observation of the jth
response variable of the kth model and of the process
under study, respectively. It is assumed that all
observation vectors, $x_i(k) = (x_{i1}(k), \ldots, x_{ip}(k))'$ and
$y_i = (y_{i1}, \ldots, y_{ip})'$, $i = 1(1)n$, are independent of
each other, where n is the number of paired
observations. Let $z_i(k) = x_i(k) - y_i$, $i = 1(1)n$, be the
paired comparisons, leading to a series of vector
differences. Thus, for testing the validity of a
simulation model of a real, observable process, a
sample of n independent observation vectors
$Z(k) = (z_1(k), \ldots, z_n(k))$ can be obtained and used.
Each sample $Z(k)$, $k \in \{1, \ldots, m\}$, is declared to be a
realization of a specific stochastic process with
unknown parameters.
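In code, forming the difference sample is a one-line operation; the following is a minimal sketch, assuming the model outputs and process observations are available as paired n-by-p arrays (the names x_k and y are illustrative, not from the paper):

```python
import numpy as np

def difference_sample(x_k, y):
    """Form the paired-comparison sample Z(k) with rows z_i(k) = x_i(k) - y_i.

    x_k : (n, p) array; row i is the model-k observation vector x_i(k).
    y   : (n, p) array; row i is the process observation vector y_i.
    """
    x_k, y = np.asarray(x_k, float), np.asarray(y, float)
    assert x_k.shape == y.shape, "model and process samples must be paired"
    return x_k - y  # row i holds the difference vector z_i(k)
```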
In this paper, for testing the validity of the kth
simulation model of a real, observable process, we
propose a statistical approach that is based on the
generalized maximum likelihood ratio. In using
statistical hypothesis testing to test the validity of a
simulation model under a given experimental frame
and for an acceptable range of accuracy consistent
with the intended application of the model, we have
the following hypotheses:
$H_0(k)$: the kth model is valid for the acceptable
range of accuracy under a given experimental frame;
$H_1(k)$: the kth model is invalid for the acceptable
range of accuracy under a given experimental frame.    (1)
There are two possibilities for making a wrong
decision in statistical hypothesis testing. The first
one, the type I error, is accepting the alternative
hypothesis $H_1(k)$ when the null hypothesis $H_0(k)$ is
actually true; the second one, the type II error, is
accepting the null hypothesis when the alternative
hypothesis is actually true. In model validation, the
first type of wrong decision corresponds to rejecting
the validity of the model when it is actually valid,
and the second type corresponds to accepting the
validity of the model when it is actually invalid. The
probability of making the first type of wrong decision
will be called the model builder’s risk ($\alpha(k)$), and
the probability of making the second type will be
called the model user’s risk ($\beta(k)$). Thus, for fixed
n, the problem is to construct a test of the null
hypothesis
$H_0(k)$: $z_i(k) \sim N_p(0, Q(k))$, $\forall i = 1(1)n$,    (2)
where $Q(k)$ is a positive-definite covariance matrix,
versus the alternative
$H_1(k)$: $z_i(k) \sim N_p(a(k), Q(k))$, $\forall i = 1(1)n$,    (3)
where $a(k) = (a_1(k), \ldots, a_p(k))' \neq (0, \ldots, 0)'$ is a mean
vector. The parameters $Q(k)$ and $a(k)$ are unknown.
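This is the classical problem of testing a zero mean vector in a multivariate normal distribution with unknown covariance, for which the likelihood-ratio statistic is Hotelling’s one-sample $T^2$; that test is also known to be uniformly most powerful among invariant tests for this problem, matching the UMPI framing above. The following is a minimal sketch of such a test applied to the difference sample $Z(k)$ (the function name and significance level are illustrative choices, not from the paper):

```python
import numpy as np
from scipy.stats import f as f_dist

def hotelling_t2_test(z, alpha=0.05):
    """One-sample Hotelling T^2 test of H0: E[z_i] = 0 versus H1: E[z_i] != 0
    for i.i.d. p-variate normal vectors z_i with unknown covariance (needs n > p).

    z : (n, p) array whose ith row is the difference vector z_i(k).
    Returns (t2, p_value, reject), where reject is the decision at level alpha.
    """
    z = np.asarray(z, float)
    n, p = z.shape
    z_bar = z.mean(axis=0)                       # sample mean vector
    s = np.cov(z, rowvar=False)                  # unbiased sample covariance S
    t2 = n * z_bar @ np.linalg.solve(s, z_bar)   # T^2 = n * z_bar' S^{-1} z_bar
    # Under H0, (n - p) / (p * (n - 1)) * T^2 follows an F(p, n - p) distribution.
    f_stat = (n - p) / (p * (n - 1)) * t2
    p_value = float(f_dist.sf(f_stat, p, n - p))
    return t2, p_value, p_value < alpha
```

Rejecting here corresponds to declaring the kth model invalid at the model builder’s risk $\alpha(k)$; applied in turn to the samples $Z(1), \ldots, Z(m)$, the test screens each candidate model against the observed process.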
It will be noted that the result of Theorem 1
given below can be used to obtain a test for
hypotheses of the form $H_0$: $z_i(k)$ follows