electroencephalogram exams, weekly cases of a
particular disease, among others; and meteorology
by measuring the daily temperature, the level of
precipitation, and others.
The objective of this work is to propose a
descriptor that represents each series in a single
form, to facilitate the handling and storage of the
data, and to find a distance measure that, applied to
the summarized data, faithfully represents the
similarity between the series, so that similarity
queries can be executed. To validate the proposed
method, meteorological data were used as a case
study, and the results showed the effectiveness of the
method for finding similar series.
The rest of the paper is organized as follows:
Section 2 briefly presents concepts related to
similarity search in time series and the major related
work. The proposed method is explained in Section
3, and the experimental results are discussed in
Section 4. Finally, Section 5 presents the
conclusions.
2 REVIEW AND ANALYSIS OF RELATED WORK
Representing the series in a way that facilitates
knowledge extraction and makes their computational
handling easier, while preserving the information in
the original data, is a central pillar of research in
time series analysis. In this section, we discuss the
concepts related to the implementation of the
proposed method and the main methods found in the
literature for similarity search in time series.
2.1 Time Series Analysis
A time series can be defined as an ordered sequence
of observations (Wei, 1990). Ordering by
observation time is very important; however, time is
not the only possible index for the measurements,
and any other index, such as space or depth, can be
used to order the sequence.
Formally, a time series is a set of observations
$\{Y(t), t \in T\}$, where $Y$ is the variable of
interest and $T$ is the set of indexes.
We can classify series into three basic types with
respect to the range of observations: i) a discrete
series, if the observations are made at selected times
that are generally regular,
$T = \{t_1, t_2, \ldots, t_n\}$; ii) a continuous
series, when observations are made continuously in
time and $T = \{t : t_1 < t < t_2\}$; and iii) a
multivariate series, when there are several
observations for a common time,
$Y_1(t), \ldots, Y_k(t)$, $t \in T$.
The series may be described using their basic
components, which are trend, cycle and seasonality
(Morettin, 1987; Fukunaga, 1990).
Thus, by analyzing the components and features of
the series, it is possible to analyze their contents,
with the following objectives:
- Describing the series, showing its constitutive
  properties such as trend and seasonality, among
  others;
- Understanding the mechanism that generates the
  series, to find the reasons for its behaviour;
- Predicting future values, using past data and
  behaviours together with forecasting methods;
- Controlling the process that generates the
  observations, thus ensuring that the series has an
  expected behaviour.
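As an illustration of the trend and seasonal components mentioned above, the following minimal Python sketch estimates a trend with a centered moving average and a seasonal profile from the detrended values. It is not the method proposed in this paper; the function names `moving_average` and `seasonal_profile`, the additive model, and the synthetic data are assumptions made only for this example.

```python
from statistics import mean


def moving_average(series, window):
    """Centered moving average, used here as a crude trend estimate.

    Returns a list as long as `series`; positions where the full window
    does not fit are set to None.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = mean(series[i - half:i + half + 1])
    return trend


def seasonal_profile(series, trend, period):
    """Average detrended value at each position of the seasonal cycle."""
    buckets = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(series, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    return [mean(b) if b else 0.0 for b in buckets]


# Synthetic series (an assumption): linear trend plus a pattern of period 3.
pattern = [1.0, -1.0, 0.0]
series = [float(i) + pattern[i % 3] for i in range(12)]

trend = moving_average(series, window=3)
profile = seasonal_profile(series, trend, period=3)
print(trend[1:4])  # trend estimate at the first interior positions
print(profile)     # estimated seasonal pattern
```

Because the window length equals the seasonal period and the pattern sums to zero, the moving average recovers the linear trend at the interior positions, and the detrended averages recover the repeating pattern.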
Moreover, once the relevant characteristics of the
series have been obtained, one can discover and
visualize patterns in the series, detect anomalies,
identify gaps or similar series, and generate clusters
and association rules, among other activities in
which the obtained characteristics can guide pattern
identification.
Another important factor to consider in analyzing
series is the reduction of dimensionality. A time
series may be considered a data sequence of a given
size (or length) n; if it is reduced to a dimension k,
with k << n, the computational complexity of
processing it, for example in a point-by-point
comparison, is reduced from O(n) to O(k).
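The reduction from n values to k values can be sketched with piecewise aggregate approximation (PAA), which replaces each of k consecutive segments by its mean. PAA is used here only as a well-known example of dimensionality reduction; it is not claimed to be the descriptor proposed in this paper, and the function name `paa` is an assumption of this sketch.

```python
def paa(series, k):
    """Reduce a series of n values to k segment means (1 <= k <= n)."""
    n = len(series)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(series)")
    reduced = []
    for j in range(k):
        # Integer segment boundaries; every segment holds at least one value.
        start = j * n // k
        end = (j + 1) * n // k
        segment = series[start:end]
        reduced.append(sum(segment) / len(segment))
    return reduced


daily = [1, 2, 3, 4, 5, 6, 7, 8]
print(paa(daily, 4))  # 8 values summarized by 4 segment means
```

Any later comparison between two reduced series touches k values instead of n, which is the source of the O(n) to O(k) saving.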
2.2 Query by Similarity
Due to the large variability in series data, it is
almost impossible to find exactly equal intervals. In
this context, the concept of similarity has wider
applicability than equality.
To execute similarity queries, it is necessary to
have a means of measuring the degree of similarity
or dissimilarity between two objects belonging to
the domain; for this purpose, the objects are
represented in a metric space.
A metric space M is defined by the pair {S, d},
where S designates the data domain and d is a
distance function. This function provides the
measure that expresses how similar or dissimilar one
object is to another (Bozkaya, 1999).
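For d to define a metric space, it must satisfy the standard metric axioms: non-negativity, identity (d(x, y) = 0 only when x = y), symmetry, and the triangle inequality. The sketch below checks these axioms empirically on a small sample of points. The Euclidean distance is chosen only as a familiar example, and the helper name `looks_like_metric` is an assumption; passing the check on a sample does not prove that a function is a metric.

```python
import math
from itertools import product


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_difference(a, b):
    """Sum of squared differences; it violates the triangle inequality."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def looks_like_metric(d, points, tol=1e-9):
    """Check the metric axioms on every triple drawn from `points`."""
    for x, y, z in product(points, repeat=3):
        if d(x, y) < 0:                       # non-negativity
            return False
        if d(x, x) > tol:                     # d(x, x) = 0
            return False
        if x != y and d(x, y) <= tol:         # distinct points are apart
            return False
        if abs(d(x, y) - d(y, x)) > tol:      # symmetry
            return False
        if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
            return False
    return True


sample = [(0.0,), (1.0,), (2.0,)]
print(looks_like_metric(euclidean, sample))
print(looks_like_metric(squared_difference, sample))
```

The second function fails on this sample because the squared difference between 0 and 2 is 4, which exceeds the sum 1 + 1 of the two intermediate values.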
To apply distance functions to complex data, it is
common to represent the data by characteristics
extracted from them instead of by the original data
themselves. These extracted features form the
feature vector.
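A minimal sketch of this idea follows: each series is summarized by a small feature vector, and a k-nearest-neighbour similarity query is answered by comparing feature vectors instead of the raw series. The chosen features (mean, standard deviation, minimum, maximum), the Euclidean distance, the function names, and the toy temperature data are all assumptions made for illustration; they are not the descriptor or the distance proposed in this paper.

```python
import math
from statistics import mean, pstdev


def feature_vector(series):
    """Summarize a series by four simple statistics (illustrative only)."""
    return [mean(series), pstdev(series), min(series), max(series)]


def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_query(query, database, k):
    """Return the names of the k series whose features are closest."""
    q = feature_vector(query)
    ranked = sorted(
        database.items(),
        key=lambda item: distance(q, feature_vector(item[1])),
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical daily temperatures for three stations.
database = {
    "warm": [20, 22, 24, 22],
    "cold": [0, 2, 4, 2],
    "mild": [10, 12, 14, 12],
}
print(knn_query([19, 21, 23, 21], database, k=2))
```

In practice the feature vectors would be computed once and stored, so that each query only compares short vectors.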
ICEIS 2013 - 15th International Conference on Enterprise Information Systems