Intelligent Computing techniques to Time Series
Analysis.
2.1.1 Time Series and their Processing
The following paragraphs offer a brief
introduction to time series and their processing. We
do not aim to provide an in-depth review, but to
establish the context in which the background and
objectives of this research make sense. For a
more detailed review of this topic, we refer the
interested reader to a dedicated text such as (Box
and Jenkins, 2008) or (Han, Kamber and Pei, 2011).
In general terms, a time series is a sequence of
values of an n-dimensional variable x that depends on
time, that is, {x(t), t ∈ T}. Unidimensional and
multidimensional data series are handled somewhat
differently, mainly because the components of x
may be interdependent. Theoretically, T can be a
continuous interval, but in practice time is always
treated as discrete. Therefore, the series can be
seen as a list of observations at successive time
instants. The distance between those time points is
fixed and is determined by the particular problem.
Regarding the variable x, x(t) could be the
observation at time t or the average value over the
interval [t-1, t]. In any case, sampling the series is
a crucial task that has to be performed at the very
beginning (Han, Kamber and Pei, 2011). In the context
of this research, we assume that the series have
already been correctly sampled.
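The two readings of x(t) mentioned above can be illustrated with a minimal Python sketch. The function names, the raw measurements and the sampling step are hypothetical and serve only as an illustration; they are not part of the method described in this work.

```python
from statistics import mean

def sample_points(raw, step):
    """x(t) is the single observation taken at every `step`-th raw time point."""
    return [raw[i] for i in range(step - 1, len(raw), step)]

def sample_interval_means(raw, step):
    """x(t) is the average of the raw values over the interval ending at t."""
    return [mean(raw[i - step + 1:i + 1]) for i in range(step - 1, len(raw), step)]

# Hypothetical fine-grained measurements, resampled with a fixed step of 2.
raw = [1.0, 3.0, 2.0, 6.0, 4.0, 8.0]
points = sample_points(raw, 2)          # [3.0, 6.0, 8.0]
means = sample_interval_means(raw, 2)   # [2.0, 4.0, 6.0]
```

Both functions yield a series with the same fixed distance between time points, but the values differ, which is why the sampling convention has to be settled before any further processing.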
Time series are a special case of data series, which
are series of observations of a variable (generally
n-dimensional) indexed by the values of another
unidimensional variable. As discussed in (Pyle,
1999), even though we often refer to time series (as
they are the most common), almost everything about
them can be applied directly to data series in
general. In some cases, even when the indexing
variable is time, it acts merely as an index: it
plays no special role in the series and induces no
dependencies.
2.1.2 Objectives of Data Series Processing
Historically, five goals can be identified in the
analysis of data series. The first and the second are
the most widely considered in the literature due to
their great practical interest. However, the
boundaries between them are not well defined, and
solving a particular problem often requires several
of them. These objectives are the following:
1. Prediction of future values of the series.
2. Classification of the series, globally or
   partially, in different categories.
3. Description of the series according to a model.
4. Description of the series according to the
   values of other series.
5. Clustering and pattern discovery from
   time-series data.
From a formal point of view, the prediction task can
be seen as finding a function F that gives an
estimate x’(t+d) of the value of x at time t+d.
That estimate is made from the last k values of
x up to time t and other external factors, d. In other
words, x’(t+d) = F(x(t), x(t-1), ..., x(t-k+1), d).
Usually, d=1, but depending on the specific
application, a different value might be required.
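As a minimal sketch of this formulation, the Python code below builds the input tuples (x(t), x(t-1), ..., x(t-k+1)) together with their targets x(t+d) for a unidimensional series. The helper names are hypothetical, the external factors are omitted, and the moving average is only a stand-in for F used for illustration; it is not the predictor proposed in this research.

```python
def make_windows(x, k, d=1):
    """Pair each window (x(t), x(t-1), ..., x(t-k+1)) with its target x(t+d)."""
    pairs = []
    for t in range(k - 1, len(x) - d):
        window = tuple(x[t - j] for j in range(k))  # most recent value first
        pairs.append((window, x[t + d]))
    return pairs

def moving_average_F(window):
    """Hypothetical stand-in for F: the mean of the last k values."""
    return sum(window) / len(window)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # synthetic series
pairs = make_windows(x, k=3, d=1)
predictions = [moving_average_F(window) for window, _ in pairs]
```

Here the first pair is ((3.0, 2.0, 1.0), 4.0): the three most recent values up to t and the value one step ahead. Any function approximator could take the place of `moving_average_F`.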
Almost all traditional methods for time series
analysis and monitoring require F to be
stationary, that is, F depends on t only through the
observations of x. In other words, F does not depend
directly on the index variable t. From this point of
view, prediction is formally a function approximation
problem, and in that case a suitable
technique for the static problem can be applied
(see (Garbancho, 1994) and (Duda and Hart, 1973)).
As usual, the goodness of fit is measured by
means of an error function of the form
E = Σ_{i=1}^{N} e(x’(t-i), x(t-i)), where e is a
function that measures the “difference” between the
estimated value and the observed one, x’(t-i) and
x(t-i) respectively. It is common for e to be a
distance measure, but it may be of another type for
particular applications (Dorffner, 1999).
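The error function E can be sketched in Python as follows. The function names and the sample values are hypothetical; the squared and absolute differences are shown only as two common choices for e, not as the ones adopted in this research.

```python
def total_error(estimated, observed, e):
    """E = sum of e(x'(t-i), x(t-i)) over the N paired values."""
    if len(estimated) != len(observed):
        raise ValueError("both series must have the same length")
    return sum(e(x_est, x_obs) for x_est, x_obs in zip(estimated, observed))

def squared_error(a, b):
    return (a - b) ** 2

def absolute_error(a, b):
    return abs(a - b)

observed = [4.0, 5.0, 6.0]    # hypothetical observed values x(t-i)
estimated = [2.0, 3.0, 4.0]   # hypothetical estimates x'(t-i)
sse = total_error(estimated, observed, squared_error)    # 12.0
sae = total_error(estimated, observed, absolute_error)   # 6.0
```

Passing e as a parameter keeps the aggregation separate from the notion of "difference", so an application-specific measure can be substituted without changing `total_error`.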
So far, we have discussed determining the
value of an observation from previous
observations. A different problem is calculating the
value of an observation at a time t from other
observations (in other dimensions) at that same time
point. This is the fourth objective listed above
(description of the series according to the values of
other series). To do so effectively, the different
series must not be independent. In fact, a causal
relation should exist between them. Estimating a
value in one series from the values in other series
(causal prediction) can be formally described, in its
simplest version, as: given the series x(t), y(t),
z(t), ..., t=1,2,...,T, find G
such that x(t) = G(t, y(t), z(t), ..., h).
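A deliberately simplified sketch of this task is given below in Python. It assumes a single explanatory series y(t), a linear form for G fitted by ordinary least squares, and no explicit dependence on t or on any additional parameter. These are assumptions made for illustration only, and the function name and the synthetic data are hypothetical.

```python
def fit_linear_G(y, x):
    """Least-squares fit of x(t) ≈ a + b*y(t); returns G as a function of y(t)."""
    n = len(y)
    mean_y = sum(y) / n
    mean_x = sum(x) / n
    s_yy = sum((v - mean_y) ** 2 for v in y)
    s_yx = sum((v - mean_y) * (w - mean_x) for v, w in zip(y, x))
    b = s_yx / s_yy            # slope
    a = mean_x - b * mean_y    # intercept
    return lambda y_t: a + b * y_t

y = [1.0, 2.0, 3.0, 4.0]
x = [3.0, 5.0, 7.0, 9.0]   # synthetic data generated as x(t) = 2*y(t) + 1
G = fit_linear_G(y, x)
estimate = G(5.0)          # value of x estimated from y(t) = 5.0 alone
```

The fit is only meaningful when the two series are actually related; with independent series the slope carries no useful information, which is the point made in the paragraph above.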
Causal description of time series has been
handled as a variant of the task of predicting
multidimensional series. However, we consider that
this description poses specific problems that are
quite interesting, and for that reason description
will play an important role in this research. In
particular, we believe that finding relations between
series associated with the same phenomenon is of
great interest in a context of imperfection. This task
Pattern Characterization in Multivariate Data Series using Fuzzy Logic - Applications to e-Health