namely true and false. The inner zone adjacent to
that frontier, specifically that of the edges and
vertices of the polyhedral surface, is the risk
decision zone or the caution zone, and it is here that
the frontier must be redefined. A single vector that
accumulates several variables whose values do not
exceed the hazard maximums but lie near them, as
would be the case with state vectors in the caution
zone, may belong – in principle, at the judgment of
the expert – to a category other than the one its
position with respect to the polyhedral frontier
would assign it (figure 1).
Figure 1: 2-dimensional depiction of natural frontier.
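As an illustration of the zones just described, the following Python sketch locates a state vector relative to the polyhedral frontier. The function name zone, the relative margin of 10% and the rule that at least two variables must lie near their hazard maximums (that is, near an edge or vertex of the frontier) are assumptions made for the example, not values taken from this study. Variables are assumed non-negative with positive hazard maximums.

```python
from typing import Sequence


def zone(state: Sequence[float],
         hazard_max: Sequence[float],
         margin: float = 0.1,
         min_near: int = 2) -> str:
    """Locate a state vector relative to the polyhedral frontier.

    hazard_max[i] is the hazard maximum of variable i (assumed positive).
    margin is a hypothetical relative width of the band next to each face.
    min_near is the hypothetical number of variables that must fall in
    that band for the vector to lie near an edge or vertex.
    """
    if len(state) != len(hazard_max):
        raise ValueError("state and hazard_max must have the same length")
    # Exceeding any single hazard maximum places the vector outside the frontier.
    if any(x > m for x, m in zip(state, hazard_max)):
        return "hazard"
    # Count the variables that are inside the frontier but close to their maximum.
    near = sum(1 for x, m in zip(state, hazard_max) if x >= (1.0 - margin) * m)
    # Several near-limit variables at once: edge or vertex region, i.e. caution zone.
    if near >= min_near:
        return "caution"
    return "safe"
```

A vector reported as "caution" is exactly the kind of case whose final class would be left to the expert, and later to the learned model, rather than to the polyhedral frontier.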
Having modelled the problem in this way, we
considered which method should be used to solve it,
and we decided to rule out conventional approaches
based on analytic mathematical models – i.e., a
formula to determine risk – due mainly to the large
degree of subjectivity that experts exercise in
assessing risk.
Consequently, we decided to use one of the
existing systems with the capacity for supervised
inductive learning. The system should learn from
state vectors that reflect past situations that have
been classified by an expert according to the risk
they entailed. The classification model provided by
the system would then induce a classification for
state vectors not necessarily included in the
learning process; that is, it would trace the new
frontier in the caution zone from the expert's
decisions on past state vectors.
An activity at a given instant in maritime work
will be identified with a state vector to which a
Boolean class variable is added, with the possible
values true or false. The new state vector is
n-dimensional, where n-1 is the number of variables
defined to assess the risk in that activity and the
n-th is the special class variable. An example or
case is a specific state vector. The measurements
that generate examples are commonly taken at
one-hour intervals. Examples used to train the
system carry a class variable value that classifies
each as true, a situation of high risk, or false,
when the risk is low or at least
acceptable. Classification of these examples will
have been performed – or at least supervised – by an
expert. From a database whose entries are vectors of
this type, learning systems extract models that
enable subsequent classification of new cases. Models
are abstractions of the structural patterns that
distinguish vectors classified in one class from those
classified in another; that is, systems will learn to
distinguish high-risk situations from low-risk ones
by using the knowledge accumulated in the learning
process and retained as a model.
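The representation just described can be sketched in Python as follows. The Example type, the two variables in the usage lines and their values are hypothetical, and the nearest-neighbour rule is only a stand-in used to show the train-then-classify cycle; it is not one of the systems evaluated in this paper.

```python
from dataclasses import dataclass
from math import dist
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Example:
    variables: Tuple[float, ...]  # the n-1 risk variables of the state vector
    high_risk: bool               # the n-th, Boolean class variable


def fit_nearest_neighbour(
    examples: Sequence[Example],
) -> Callable[[Tuple[float, ...]], bool]:
    """Return a model that copies the class of the closest training example."""
    if not examples:
        raise ValueError("at least one classified example is required")
    stored = list(examples)  # the retained "model" is simply the stored cases

    def classify(variables: Tuple[float, ...]) -> bool:
        nearest = min(stored, key=lambda e: dist(e.variables, variables))
        return nearest.high_risk

    return classify


# Made-up examples, as if classified by an expert (values are illustrative only).
training = [
    Example((5.0, 0.5), False),
    Example((30.0, 3.0), True),
]
model = fit_nearest_neighbour(training)
new_case_is_high_risk = model((28.0, 2.5))
```

The point of the sketch is the data flow: classified state vectors go in, a model comes out, and the model then assigns a class to vectors that were not part of the learning process.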
The abundance of learning systems means that
multiple solutions or models are possible; usually
more than one per system, as these offer parameters
that, according to their settings, make the system
produce different solutions. An important task is
to decide which learning system and which set of
parameters to use, in addition to studying the
suitability of the variables used and perhaps
reducing or increasing their number; in short,
a good job of data mining is needed (Witten et al.,
2005).
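One common way to compare systems and parameter settings is k-fold cross-validation, sketched below in plain Python. This is an assumption about how such a comparison could be organised, not a description of the procedure followed in this study; the k-nearest-neighbour learner, its neighbours parameter and the data are hypothetical stand-ins.

```python
from math import dist
from typing import Callable, Dict, Sequence, Tuple

Row = Tuple[Tuple[float, ...], bool]         # (risk variables, class)
Model = Callable[[Tuple[float, ...]], bool]  # classifies one state vector
Learner = Callable[[Sequence[Row]], Model]   # builds a model from examples


def cross_val_accuracy(learner: Learner, rows: Sequence[Row], k: int = 3) -> float:
    """Mean accuracy of a learner over k interleaved folds."""
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    scores = []
    for fold in range(k):
        test = [r for i, r in enumerate(rows) if i % k == fold]
        train = [r for i, r in enumerate(rows) if i % k != fold]
        model = learner(train)
        hits = sum(model(x) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / k


def knn_learner(neighbours: int) -> Learner:
    """Stand-in learning system with a single parameter, `neighbours`."""
    def learn(train: Sequence[Row]) -> Model:
        stored = list(train)

        def model(x: Tuple[float, ...]) -> bool:
            closest = sorted(stored, key=lambda r: dist(r[0], x))[:neighbours]
            votes = sum(1 for _, y in closest if y)
            return 2 * votes >= len(closest)  # ties resolved as high risk
        return model
    return learn


def select_best(candidates: Dict[str, Learner], rows: Sequence[Row], k: int = 3) -> str:
    """Name of the candidate with the highest cross-validated accuracy."""
    return max(candidates, key=lambda name: cross_val_accuracy(candidates[name], rows, k))


# Illustrative one-variable data set (made up for the example).
rows = [((1.0,), False), ((2.0,), False), ((3.0,), False),
        ((8.0,), True), ((9.0,), True), ((10.0,), True)]
candidates = {"1-NN": knn_learner(1), "4-NN": knn_learner(4)}
best = select_best(candidates, rows)
```

The same harness accepts any learner with this interface, so two different systems, or one system under several parameter settings, can be ranked on identical folds.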
Following these considerations, discussions and
the pertinent tests, we decided to pre-select two
systems of supervised inductive learning for trials
and a more thorough comparison in our problem:
these were C4.5 (Quinlan, 1993) and Support Vector
Machines (SVMs, hereinafter) (Cortés, Vapnik,
1995), (Cristianini, Shawe-Taylor, 2004).
Conceptually, these systems are quite different:
while the first is based on a heuristic approach, the
second is grounded in a complete mathematical theory
that explains its method. We will now provide a brief
description of each.
3.1 The C4.5 System
C4.5 is a traditional machine learning system that
nevertheless remains fully valid (Jaudet et al., 2005)
and needs no introduction. For this paper, its main
feature is that it expresses the knowledge learned in
an explicit form, by means of a decision tree or
classification rules; in both cases, these are
comparable to the experience of an expert in the
field, an aspect of the utmost interest to us. C4.5
works with both qualitative and quantitative
variables and is robust to noise. C4.5 generates a
decision tree incrementally; each new level arises
from a variable that is selected for its importance
in determining the class.
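The importance measure that C4.5 uses to select a variable is the gain ratio (Quinlan, 1993): the information gain of the split divided by the entropy of the split itself. The following Python sketch computes it for a qualitative variable only; C4.5 additionally handles quantitative variables through threshold splits, which are not shown. The function names and the sample variables are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of discrete values."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[bool]) -> float:
    """Gain ratio of a qualitative variable with respect to the class."""
    if len(values) != len(labels) or not labels:
        raise ValueError("values and labels must be non-empty and of equal length")
    total = len(labels)
    # Partition the class labels by the value the variable takes.
    groups: Dict[Hashable, List[bool]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected class entropy after splitting on the variable.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Entropy of the split itself; penalises variables with many values.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Made-up data: sea state separates the classes, work shift does not.
labels = [True, True, False, False]
sea_state = ["rough", "rough", "calm", "calm"]
shift = ["day", "night", "day", "night"]
ranking = sorted(
    {"sea_state": gain_ratio(sea_state, labels),
     "shift": gain_ratio(shift, labels)}.items(),
    key=lambda item: item[1],
    reverse=True,
)
```

At each level the variable with the highest gain ratio would be placed at the node, and the procedure would then be repeated on each resulting subset of examples.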
[Figure 1 labels: origin O; axes X and Y; X-safety margin; Y-safety margin; polyhedral frontier; natural frontier.]
ICEIS 2009 - International Conference on Enterprise Information Systems