
has been processed. If no matching model is found for the measurement identifier, no action is taken and the next measurement is processed.
Once the batch has been processed and the resulting predictions returned, the web application server compares the predictions against the measurements. A database stores a set of thresholds for each model set and each measurement identifier. Each measurement is compared to its prediction (if one exists) to determine whether the difference exceeds a threshold set using the prediction visualisation interface. If so, an error message is streamed back to the instrument. The measurement is also compared against the absolute maximum and minimum values expected. An error message is streamed if the value falls outside this range, even when the difference between the prediction and the measurement is within the threshold.
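The two checks described above can be sketched as follows. This is a minimal illustration, not the system's actual code; the function name and error strings are hypothetical.

```python
def check_measurement(value, prediction, threshold, abs_min, abs_max):
    """Return a list of error messages for one measurement.

    Hypothetical helper illustrating the two checks described above:
    a per-measurement threshold on the |measurement - prediction|
    difference, plus an absolute expected range.
    """
    errors = []
    # Compare against the prediction, if a model exists for this identifier.
    if prediction is not None and abs(value - prediction) > threshold:
        errors.append("value deviates from prediction beyond threshold")
    # The absolute range check applies even when the prediction agrees.
    if not (abs_min <= value <= abs_max):
        errors.append("value outside expected absolute range")
    return errors
```

Note that a measurement with no matching model can still trigger the absolute range check, mirroring the behaviour described above.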
The Prediction Visualisation Display (Figure 2) is used to fine-tune the thresholds. Selected batches of measurements are coloured according to their distance from the predicted values; the threshold values can be adjusted and the effect on error detection visualised. The underlying predicted values can be viewed by moving the mouse over a measurement. The interface also enables different model sets to be tested: new models can be added to the models database and chosen from the drop-down list box. The current implementation builds a model tree for each measurement, but the architecture allows a variety of predictive models to be used.
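The colouring step might be bucketed as in the sketch below. The colour names and the two-band scheme are illustrative assumptions, not taken from the original interface; the point is that re-running this mapping over a historical batch shows the effect of adjusting the threshold.

```python
def colour_for(value, prediction, threshold):
    """Map a measurement to a display colour by its distance from the
    prediction, relative to the current threshold (colour scheme is
    hypothetical, for illustration only)."""
    if prediction is None:
        return "grey"       # no model exists for this identifier
    distance = abs(value - prediction)
    if distance <= threshold:
        return "green"      # within the threshold
    if distance <= 2 * threshold:
        return "amber"      # marginally beyond the threshold
    return "red"            # well beyond the threshold
```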
The flow of data for the Prediction Visualisation Display is as described above, except that the user interface supplies the measurements to be predicted from a database of historical measurements. The web application server builds a web page, colouring measurements according to their distance from the predicted values and according to whether they lie within the expected minimum and maximum values. Caution is required when interpreting results in this system. For example, suppose the system colours a given measurement as beyond the threshold from its prediction. This does not necessarily imply that the measurement itself is incorrect; the deviation may be due to an incorrect value for another measurement that is used to predict the given one. Our emphasis is on alerting errors; the operator has the task of identifying which measurement(s) caused the problem.
5 CONCLUSION AND FUTURE WORK
We have described a system for alerting errors to
operators of an instrumentation system. The system
uses predictive models trained on historical data to
provide a baseline value for a measurement against
which an actual measurement value is compared.
A display tool is used to generate the thresholds
(margins of error) for the predictions, and visualise
their accuracy on historical data.
After a period of time it might be necessary to re-
train the models using the data that has become
available in the interim. The re-training process is
somewhat cumbersome but necessary for many
regression techniques because they are not usually
incremental.
A recent development in the field of data mining
is the idea of learning from data streams. A data
stream is generated from a large, potentially infinite,
data source. A learning algorithm capable of
learning models from a stream must be able to learn
models in a finite amount of memory by looking at
the data only once. These constraints favour incremental methods, but exclude those whose models grow without bound.
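The constant-memory, single-pass requirement can be illustrated with a toy online learner. This is a deliberately simple stand-in (stochastic gradient descent on one weight and a bias), not the model trees used in the system described above.

```python
class OnlineLinearModel:
    """Minimal single-pass learner: one weight and one bias, updated
    by stochastic gradient descent. Memory use is constant no matter
    how many measurements arrive, and each example is seen only once,
    satisfying the stream-learning constraints described above."""

    def __init__(self, lr=0.1):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # Gradient step on the squared error of this single example.
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err

# Learn y = 2x from a stream without ever storing past examples.
model = OnlineLinearModel(lr=0.1)
for i in range(1, 2000):
    x = (i % 10) / 10.0       # inputs cycle through 0.0 .. 0.9
    model.update(x, 2.0 * x)
```

By contrast, a batch regression technique would need the whole history in memory to re-fit, which is exactly the re-training burden noted above.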
Finally, the model database contains serialised
model trees, which can be quite large. This may
limit the number of models that can fit into memory.
Further work is needed to explore regression algorithms that produce more compact serialised models.
AN INSTRUMENT CONTROL SYSTEM USING PREDICTIVE MODELLING