length, which is hard to predefine for different
kinds of time series, and focus on simple
statistical features, which capture only the
aggregate characteristics of time series. For
example, PAA and APCA extract mean values,
PLA extracts linear fitting slopes, and DSA
extracts the mean values of the derivative
subsequences. If PA-DTW is computed on these
representations, its precision suffers accordingly.
In this paper, we propose a novel piecewise
factorization model for time series, named piecewise
Chebyshev approximation (PCHA), in which a novel
code-based segmentation method adaptively
segments time series. Rather than focusing
on statistical features, we factorize the
subsequences with Chebyshev polynomials and
employ the Chebyshev coefficients as features to
approximate the raw data. In addition, we propose
a PA-DTW measure based on PCHA, named ChebyDTW,
for 1NN classification. Because Chebyshev
polynomials of different degrees represent
different fluctuation components of a time series,
ChebyDTW can capture the local fluctuation
information of time series. Comprehensive
experimental results show that ChebyDTW
supports accurate and fast 1NN
classification.
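As a minimal sketch of using Chebyshev coefficients as
subsequence features, the Python code below fits a
low-degree Chebyshev expansion to one subsequence by
ordinary least squares. The function names, the linear
mapping of sample indices onto [-1, 1], and the
least-squares fit are assumptions made for illustration;
they are not necessarily the factorization that PCHA
defines.

```python
def chebyshev_basis(x, degree):
    """Return [T_0(x), ..., T_degree(x)] via the three-term recurrence."""
    values = [1.0, x]
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: degree + 1]


def solve_linear_system(a, b):
    """Solve a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (m[i][n] - tail) / m[i][i]
    return c


def chebyshev_features(subsequence, degree):
    """Least-squares Chebyshev coefficients of one subsequence.

    Assumption of this sketch: sample indices are mapped linearly onto
    [-1, 1], so the subsequence must hold more than `degree` points.
    """
    n = len(subsequence)
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    basis = [chebyshev_basis(x, degree) for x in xs]
    size = degree + 1
    # Normal equations: (B^T B) c = B^T y
    gram = [[sum(row[j] * row[k] for row in basis) for k in range(size)]
            for j in range(size)]
    rhs = [sum(row[j] * y for row, y in zip(basis, subsequence))
           for j in range(size)]
    return solve_linear_system(gram, rhs)
```

For a subsequence sampled from 2x^2 - 1 on [-1, 1], the
sketch returns coefficients close to [0, 0, 1]: only the
degree-2 component is present, which illustrates how each
coefficient isolates one fluctuation component.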
2 RELATED WORK
2.1 Data Representation
In many application fields, the high dimensionality
of time series limits the performance of a
myriad of algorithms. To address this problem, a great
number of data approximation methods have been
proposed to reduce the dimensionality of time series
(Esling et al, 2012; Fu, 2011). Among these methods,
piecewise approximation methods are prevalent for
their simplicity and effectiveness. The first attempt
is the PAA representation (Keogh et al, 2001),
which segments time series into equal-length
subsequences and extracts the mean values of the
subsequences as features to approximate the raw
data. However, this single kind of feature
indicates only the height of the subsequences, which
may cause loss of local information. Subsequently,
an adaptive version of PAA, named adaptive piecewise
constant approximation (APCA) (Chakrabarti et al,
2002), was proposed, which segments time series
into subsequences of adaptive lengths and thus
approximates time series with less error. In addition,
a multi-resolution version of PAA, named MPAA
(Lin et al, 2005), was proposed, which can
iteratively segment time series into $2^i$
subsequences.
However, both variations inherit the poor
expressivity of PAA. Another pioneering piecewise
representation is PLA (Keogh et al, 2004; Keogh
et al, 1999), which extracts the linear fitting slopes
of the subsequences as features to approximate the
raw data. However, the fitting slopes reflect only the
movement trends of the subsequences. For time
series that fluctuate sharply at high frequency,
PLA yields only limited dimension
reduction. In addition, two novel piecewise
approximation methods were proposed recently. One
is the DSA representation (Gullo et al, 2009), which
takes the mean values of the derivative subsequences
of time series as features. However, it is sensitive to
small fluctuations caused by noise. The other
is the PWCA representation (Li et al, 2011), which
employs cloud models to fit the data distribution
of the subsequences. However, the extracted features
reflect only the data distribution characteristics and
cannot capture the fluctuation information of time
series.
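To make the difference between the mean-based and
slope-based features concrete, the following Python sketch
computes PAA means and PLA-style fitting slopes over
equal-length segments. The function names and the
equal-length split are assumptions of this sketch; APCA,
MPAA, DSA, and PWCA use their own segmentation and feature
definitions, which are not reproduced here.

```python
def split_equal(series, n_segments):
    """Split a series into n_segments contiguous, near-equal pieces.

    Assumes 1 <= n_segments <= len(series) so that no piece is empty.
    """
    n = len(series)
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [series[bounds[i]:bounds[i + 1]] for i in range(n_segments)]


def paa(series, n_segments):
    """PAA features: the mean value of each segment."""
    return [sum(seg) / len(seg) for seg in split_equal(series, n_segments)]


def pla_slopes(series, n_segments):
    """PLA-style features: the least-squares line slope of each segment."""
    slopes = []
    for seg in split_equal(series, n_segments):
        m = len(seg)
        t_mean = (m - 1) / 2.0
        y_mean = sum(seg) / m
        num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(seg))
        den = sum((t - t_mean) ** 2 for t in range(m))
        # A one-point segment has no defined slope; report 0.0 for it.
        slopes.append(num / den if den else 0.0)
    return slopes


series = [0, 1, 2, 3, 3, 2, 1, 0]
print(paa(series, 2))         # [1.5, 1.5]
print(pla_slopes(series, 2))  # [1.0, -1.0]
```

On this series the two segments have identical means but
opposite slopes: the mean describes only the height of a
segment and the slope only its trend, so neither feature
alone captures the local shape.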
2.2 Similarity Measure
DTW (Esling et al, 2012; Fu, 2011; Serra et al,
2014) is one of the most prevalent similarity
measures for time series, and is computed by
realigning the indices of time series. It is robust to
time warping and phase shift, and has high
measurement precision. However, it is computed by a
dynamic programming algorithm, and thus has the
expensive $O(n^2)$ computational complexity, which
largely limits its application to high-dimensional
time series (Rakthanmanon et al, 2012). To
overcome this shortcoming, the PA-DTW measures
were proposed. The PAA-based
PDTW (Keogh et al, 2000) and the PLA-based
SDTW (Keogh et al, 1999) are
the early pioneers, and the DSA-based
DSADTW (Gullo et al, 2009) is the state-of-the-art
method. Rather than in the raw data space, they
compute DTW in the PAA, PLA, and DSA spaces,
respectively. Since the number of segments is much
smaller than the original time series length, the
PA-DTW methods can greatly decrease the
computational complexity of the original DTW.
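The cost reduction can be illustrated with a short Python
sketch: a textbook dynamic-programming DTW, and the same
DTW applied to segment means. This is a simplified
illustration only. The function names and the squared
difference used as local cost are assumptions, and the
published PDTW, SDTW, and DSADTW measures define their own
local distances, which are not reproduced here.

```python
def dtw(a, b):
    """Textbook DTW: fills a (len(a)+1) x (len(b)+1) cost table."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2  # local cost (assumed)
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m] ** 0.5


def segment_means(series, n_segments):
    """Mean of each of n_segments contiguous, near-equal pieces."""
    n = len(series)
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [sum(series[bounds[i]:bounds[i + 1]]) / (bounds[i + 1] - bounds[i])
            for i in range(n_segments)]


def pa_dtw(a, b, n_segments):
    """DTW computed in the reduced (segment-mean) space."""
    return dtw(segment_means(a, n_segments), segment_means(b, n_segments))


a = [0, 0, 2, 2, 0, 0, 0, 0]
b = [0, 0, 0, 0, 2, 2, 0, 0]
print(dtw(a, b))        # 0.0: the shifted bump is realigned
print(pa_dtw(a, b, 4))  # 0.0, using a 4 x 4 table instead of 8 x 8
```

With n points reduced to N segments, the table shrinks from
n x n to N x N cells, which is the source of the speed-up;
the result is only as faithful as the segment features.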
Nonetheless, the precision of PA-DTWs depends
greatly on the underlying piecewise approximation
method, where both the segmentation method and
the extracted features are crucial factors. As a result,
given the weaknesses of the existing piecewise
approximation methods, the PA-DTWs cannot