4 STUDY
The method used to detect the need for a model update was to determine when the model's average prediction error in the neighborhood of a new observation (APEN) differed too much from zero relative to the number of similar past cases. When there are plenty of accurately predicted similar past cases, the APEN is always near zero. As the number of similar past cases decreases, the sensitivity of the APEN to measurement variation increases, even in situations where the actual model is accurate. In other words, the sensitivity of the APEN is negatively correlated with the number of neighbors. The actual updating is also time-dependent: the model is updated when too many observations within a certain time interval have an APEN value that differs significantly from zero.
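As a minimal sketch of the quantity being monitored, the APEN of a new observation can be computed as the mean prediction error over its similar past cases. The text does not specify how similarity is measured, so the radius-based neighborhood in the Python sketch below is an illustrative assumption:

    import numpy as np

    def apen(x_new, X_past, errors_past, radius=1.0):
        """Average prediction error in the neighborhood (APEN) of a new
        observation. The neighborhood is taken, as an assumption, to be
        all past cases within `radius` of x_new in the input space."""
        dists = np.linalg.norm(X_past - x_new, axis=1)
        mask = dists <= radius
        n_neighbors = int(mask.sum())
        if n_neighbors == 0:
            return np.nan, 0      # no similar past cases
        return float(errors_past[mask].mean()), n_neighbors

The function also returns the neighborhood size, since the exception limit defined next depends on it.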
A limit, called the exception limit, was introduced to detect the need to update the model. The limit defines how high the absolute value of the average prediction error of a neighborhood, |APEN|, has to be in relation to the size of the neighborhood before it can be considered an exception. This design was introduced to avoid the sensitivity issues of the APEN described above. In practice, if the size of the neighborhood was 500 (the area is well known), prediction errors higher than 8 were defined as exceptions, whereas with a neighborhood of size 5 the error had to be over 50. The values of the prediction errors used were chosen by relating them to the average predicted deviation, $\bar{\hat{\sigma}}_i$ (≈ 14.4). The predicted deviations were obtained with a regression model (Juutilainen and Röning, 2006). The limit is shown in Figure 1.
[Figure 1: Exception limit. Average estimation error as a function of the size of the neighborhood.]
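Since the text quotes only two points of the limit curve (a limit of 50 at neighborhood size 5 and 8 at size 500), the exceedance test can be sketched with a power-law interpolation between these points as a stand-in for the curve of Figure 1; the interpolation form is an assumption, not the curve actually used:

    import numpy as np

    # Stand-in for the exception-limit curve of Figure 1; the power-law
    # interpolation through (5, 50) and (500, 8) is an assumption.
    _A = np.log(50 / 8) / np.log(500 / 5)   # exponent, roughly 0.40
    _C = 50 * 5 ** _A                       # scale fixed by limit(5) = 50

    def exception_limit(n_neighbors):
        return _C * n_neighbors ** (-_A)

    def is_exception(apen_value, n_neighbors):
        # An |APEN| above the size-dependent limit counts as an exception.
        return abs(apen_value) > exception_limit(n_neighbors)

By construction, exception_limit(5) ≈ 50 and exception_limit(500) ≈ 8, matching the values quoted above.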
A second limit, the update limit, was defined to be exceeded when 10 percent of the average prediction errors of the neighborhoods within a certain time interval exceeded the exception limit. The chosen interval was 1000 observations, which represents measurements from approximately one week of production. Thus, the model was retrained every time 100 of the preceding 1000 observations exceeded the exception limit.
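A minimal sketch of this sliding-window rule, with the window length and threshold taken from the text:

    from collections import deque

    WINDOW = 1000        # observations, roughly one week of production
    UPDATE_COUNT = 100   # 10 percent of the window

    recent_flags = deque(maxlen=WINDOW)

    def needs_update(exception_flag):
        """Record whether the latest observation was an exception and
        report whether 100 of the preceding 1000 observations were."""
        recent_flags.append(bool(exception_flag))
        return len(recent_flags) == WINDOW and sum(recent_flags) >= UPDATE_COUNT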
The study was started by training the parameters of the regression model with the first 50,000 observations (approximately one year). After that, the trained model was used to compute the APENs of new observations. The point where the update limit was exceeded for the first time was located, and the parameters of the model were updated using all the data acquired by then. The study proceeded by assessing the reliability of the model after each update and repeating these steps iteratively until the whole data set had been processed.
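Combining the sketches above, the whole procedure can be summarized in a hypothetical driver loop; train_model and the predict() interface are placeholders, and the stored errors are not recomputed after retraining, a simplification of the actual study:

    import numpy as np

    def run_study(X, y, train_model, initial_size=50_000):
        """Train on the first observations, monitor the APEN of each new
        one, and retrain on all data acquired so far whenever the update
        limit is exceeded. train_model and predict() are assumed."""
        model = train_model(X[:initial_size], y[:initial_size])
        past_errors = []                 # errors of already-seen new cases
        for i in range(initial_size, len(X)):
            err = y[i] - model.predict(X[i])
            value, n = apen(X[i], X[initial_size:i], np.asarray(past_errors))
            past_errors.append(err)
            flag = n > 0 and is_exception(value, n)
            if needs_update(flag):
                model = train_model(X[:i + 1], y[:i + 1])  # all data so far
        return model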
5 RESULTS
First, the reliability of the model with and without updating was studied. Figure 2 shows the average prediction errors of the respective neighborhoods. Figure 2(a) shows the model's performance without updating. The straight line at the beginning represents the training data, and the rest of the curve shows the proportional share of exceedances of the exception limit, in other words, the percentage of APENs that exceeded the exception limit. For example, it can be seen that at the end of the curve the APEN of every fourth new observation per week has exceeded the exception limit. The vertical lines mark the points of iteration where the update limit was exceeded. Figure 2(b), in turn, presents the reliability of the model when the parameters are re-estimated at the times indicated by the vertical lines; its curve represents the error rate of the re-trained model.
The positive effect of the updates on the average prediction errors of the neighborhoods can be clearly seen in Figure 2. It is therefore evident that the model's performance improves when the developed updating strategy is applied.
Although the reliability information of the model is very useful, the effect of the update on the actual prediction error was considered more important. Two different goodness criteria were used to compare the actual prediction errors of the updated models. Both criteria emphasize rare events, because those are the cases where the model is needed the most. Both report a weighted average of the absolute prediction errors, but with different weighting schemes.
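Both criteria can thus be expressed with one shared helper, differing only in the weights passed to it; this is a sketch of the common form, not of the specific weighting schemes:

    import numpy as np

    def weighted_abs_error(errors, weights):
        """Weighted average of absolute prediction errors; the two
        goodness criteria differ only in the choice of weights."""
        w = np.asarray(weights, dtype=float)
        return float(np.sum(w * np.abs(errors)) / np.sum(w))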
The idea of the first goodness criterion is to emphasize all products approximately equally. This means that in the calculation of the average prediction error, the weights of observations