The objective here is to handle the high-
dimensional, complex data that is common in mod-
ern sensed systems, and still detect change points that
might occur in only one variable (or a few) among
hundreds. Consequently, we present a two-phased ap-
proach. In the first phase we identify the attributes re-
sponsible for the change point. With a much smaller
subset of attributes to work with in the second phase,
simpler methods can be used to identify the time(s)
at which the change(s) occur. The first phase uses a
novel transformation of the problem to one of super-
vised learning. Such a transformation was explored
by Li et al. (2006). The work here adds a second
phase, uses a much more powerful feature selection
algorithm, and provides a more challenging example.
In Section 2 the change-point problem is transformed
to a supervised learning problem. Section 3 discusses
feature selection. Section 4 provides a realistic exam-
ple.
2 CHANGE POINTS WITH
SUPERVISED LEARNING
A supervised learning model requires a response or
target variable for the learning. However, no ob-
vious target is present in a change-point problem.
Still, a key element of a data stream is the time at-
tribute that provides an ordering for the measured vec-
tors. In a stationary data stream without any change
points, no relationship is expected between time and
the measured attributes. Conversely, if the distribu-
tion changes over time, such change should allow for
a relationship to be detected between the measured at-
tributes and time (Li et al., 2006). Consequently, our
approach is to attempt to learn a model to predict time
from the measurements in the data stream
t = g(x_1, ..., x_p)    (1)
where t is the time of an observation vector and g()
is our learned model. If the time attribute can be
predicted, then the distribution of the measurement
vectors must change over time. Attributes scored as
important by this model constitute the subset of
important variables. Consequently, phase one of our analysis can be
completed from this model and its interrogation. Any
number of change points can occur in this framework.
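As an illustrative sketch of phase one (not the implementation used here; all names and parameters are invented for the example), the following simulates a stream in which one of five attributes shifts its mean at an unknown time, and scores each attribute by the variance reduction in the time index achieved by its best single decision-tree split:

```python
import random

random.seed(0)

# Hypothetical simulated stream: 200 time steps, 5 attributes;
# only attribute 2 shifts its mean from 0 to 2 at time 120.
T, p, change_t, shifted = 200, 5, 120, 2
X = [[random.gauss(2.0 if (j == shifted and t >= change_t) else 0.0, 1.0)
      for j in range(p)] for t in range(T)]
times = list(range(T))

def stump_importance(xs, ts):
    """Variance reduction in ts achieved by the best single split on xs."""
    pairs = sorted(zip(xs, ts))
    t_sorted = [t for _, t in pairs]
    n = len(t_sorted)
    tot = sum(t_sorted)
    tot2 = sum(t * t for t in t_sorted)
    base = tot2 / n - (tot / n) ** 2          # variance of ts before splitting
    best = s = s2 = 0.0
    for i in range(1, n):                     # split between positions i-1 and i
        s += t_sorted[i - 1]
        s2 += t_sorted[i - 1] ** 2
        rn = n - i
        left_var = s2 / i - (s / i) ** 2
        right_var = (tot2 - s2) / rn - ((tot - s) / rn) ** 2
        best = max(best, base - (i * left_var + rn * right_var) / n)
    return best

scores = [stump_importance([row[j] for row in X], times) for j in range(p)]
print(scores)
```

In this toy setting the shifted attribute dominates the scores, because a split near its post-change mean separates late observations from early ones; the ensemble learners discussed below refine this same idea.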
A more direct approach might attempt to model
each attribute as a function of time, such as x_j = g_j(t)
for j = 1, 2, ..., p. However, separate models do
not use the relationships among the variables. A
change might break the relationships between vari-
ables without a significant difference in any variable
individually. Common examples in data streams de-
pict points that are not unusual for any attribute indi-
vidually, but jointly depict an important change.
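A small invented numerical example of this effect: two attributes whose relationship breaks at the change point while each marginal mean stays near zero, so per-variable monitoring sees little but a joint statistic reacts strongly.

```python
import random

random.seed(1)

# Before the change, x2 closely tracks x1; afterwards x2 is
# independent of x1. Each marginal mean stays near zero throughout.
T, change_t = 400, 200
rows = []
for t in range(T):
    x1 = random.gauss(0.0, 1.0)
    x2 = x1 + random.gauss(0.0, 0.2) if t < change_t else random.gauss(0.0, 1.0)
    rows.append((x1, x2))

def mean(v):
    return sum(v) / len(v)

before, after = rows[:change_t], rows[change_t:]

# Marginal means barely move across the change point ...
mean_shift = max(abs(mean([r[k] for r in before]) - mean([r[k] for r in after]))
                 for k in (0, 1))

# ... but the joint statistic x2 - x1 changes its spread dramatically.
def diff_var(block):
    d = [x2 - x1 for x1, x2 in block]
    return mean([v * v for v in d]) - mean(d) ** 2

print(mean_shift, diff_var(before), diff_var(after))
```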
Any monotonic function of time can be used as
the target attribute for the learner. The identity function
used here is a simple choice, and other functions
can be used to highlight or degrade the detection of
change points in different time periods. Also, any one
of many supervised learners can be applied. Our goal
is to detect a subset of important variables, and this
goal drives the choice of learner that follows.
Because we are most interested in an abrupt
change in the mean of one or more attributes in the
data stream it is sensible to use a supervised learner
that can take advantage of such an event in the sys-
tem. Furthermore, the phase one objective is to iden-
tify the important variables. Consequently, decision
trees are used as the base learners because they can
effectively use a mean change in only one or few pre-
dictor attributes. They also have intrinsic measures of
variable importance. Ensembles of decision trees are
used to improve the measure of variable importance
for the phase one objective.
3 FEATURE SELECTION
If an attribute changes over time, it should be more
useful to predict time than an attribute that is sta-
tistically stable. Consequently, the phase to iden-
tify changed attributes is based on a feature selection
method for a supervised learner. There are several ap-
proaches such as filter, wrapper, and embedded meth-
ods. An overview of feature selection was provided
by Guyon and Elisseeff (2003) and other pub-
lications in the same issue. Also see (Liu and Yu,
2005). The feature selection phase needs to process
hundreds of attributes and potentially detect a contri-
bution of a few to the model to predict time. Fur-
thermore, in the type of applications of interest here,
the attributes are often related (redundant). Conse-
quently, the effect of one attribute on the model can
be masked by another. Moderate to strong interactive
effects are also expected among the attributes. Conse-
quently, a feature selection method needs high sensitivity
and the ability to handle masking and interactive
effects. We use a feature selection method based on
ensembles of decision trees.
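A loose sketch of why an ensemble can help with masking (a single decision stump stands in for a full tree, and all data are simulated for illustration): with two redundant noisy copies of the changing signal, any one greedy split credits only one copy, but tallying the winning attribute over bootstrap replicates spreads credit across the redundant copies while the stationary attribute never wins.

```python
import random

random.seed(2)

# Two redundant noisy copies of the changing signal (attributes 0, 1)
# plus one stationary attribute (attribute 2).
T, change_t = 200, 100
sig = [1.0 if t >= change_t else 0.0 for t in range(T)]
X = [[s + random.gauss(0, 0.3), s + random.gauss(0, 0.3),
      random.gauss(0, 1)] for s in sig]
times = list(range(T))

def stump_gain(xs, ts):
    """Best single-split variance reduction of ts when ordered by xs."""
    pairs = sorted(zip(xs, ts))
    t_s = [t for _, t in pairs]
    n = len(t_s)
    tot, tot2 = sum(t_s), sum(t * t for t in t_s)
    base = tot2 / n - (tot / n) ** 2
    best = s = s2 = 0.0
    for i in range(1, n):
        s += t_s[i - 1]
        s2 += t_s[i - 1] ** 2
        rn = n - i
        left = s2 / i - (s / i) ** 2
        right = (tot2 - s2) / rn - ((tot - s) / rn) ** 2
        best = max(best, base - (i * left + rn * right) / n)
    return best

# Count which attribute wins the first split in each bootstrap replicate.
wins = [0, 0, 0]
for _ in range(30):
    idx = [random.randrange(T) for _ in range(T)]       # bootstrap sample
    xs_rows = [X[i] for i in idx]
    ts = [times[i] for i in idx]
    gains = [stump_gain([r[j] for r in xs_rows], ts) for j in range(3)]
    wins[max(range(3), key=lambda j: gains[j])] += 1
print(wins)
```

The stationary attribute receives no wins, while the thirty replicates divide their votes between the two redundant copies rather than letting one mask the other entirely.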
Tree learners are fast, scalable, and able to handle
complex interactive effects and dirty data. However,
the greedy algorithm in a single tree generates an un-
stable model. A modest change to the input data can
make a large change to the model. Supervised ensem-
ble methods construct a set of simple models (called
ICINCO 2007 - International Conference on Informatics in Control, Automation and Robotics