
2.2 Attention-Based Models
Attention-based models have shown promise in addressing some of the limitations of CNNs in TSC. Attention mechanisms allow models to dynamically weigh the importance of different time steps in the input sequence, enabling them to focus on the critical moments that contribute most to the classification decision. Models such as the Multivariate Attention LSTM Fully Convolutional Network (MALSTM-FCN) (Karim et al., 2019) and TapNet (Zhang et al., 2020) were specifically designed to leverage attention mechanisms for multivariate time series classification. MALSTM-FCN extends the ALSTM-FCN model (Karim et al., 2017) to multivariate time series data. It integrates attention within the LSTM framework, allowing it to capture long-range dependencies while simultaneously focusing on the most pertinent time steps (Karim et al., 2019), which improves accuracy and robustness when classifying complex time series. Similarly, TapNet (Zhang et al., 2020) employs an attention mechanism to enhance feature extraction from multivariate inputs, adaptively learning which variables and time steps are most influential for the classification outcome. By concentrating on the most relevant features, TapNet's attention mechanism improves the model's ability to distinguish between classes, especially when labeled data is limited.
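To make this concrete, the following is a minimal sketch of a temporal-attention layer over the hidden states of an LSTM encoder, in the spirit of these models; the module, names, and dimensions are illustrative assumptions, not the exact MALSTM-FCN or TapNet architecture.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Minimal attention over time steps: scores each hidden state,
    normalizes the scores with softmax, and returns the weighted summary.
    Illustrative sketch only, not a published architecture."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)  # one scalar score per time step

    def forward(self, h: torch.Tensor):
        # h: (batch, T, hidden_dim) hidden states of a recurrent encoder
        weights = torch.softmax(self.score(h).squeeze(-1), dim=1)  # (batch, T)
        context = torch.bmm(weights.unsqueeze(1), h).squeeze(1)    # (batch, hidden_dim)
        return context, weights  # weights reveal which time steps mattered

# Example: attend over LSTM outputs for a batch of multivariate series.
lstm = nn.LSTM(input_size=6, hidden_size=32, batch_first=True)
attn = TemporalAttention(32)
x = torch.randn(4, 100, 6)     # (batch, T=100, n_features=6)
h, _ = lstm(x)
context, weights = attn(h)     # context would feed a classification head
```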
Attention-based models represent a significant advancement in time series classification, as they not only improve classification performance but also enhance interpretability. As noted in (Hsu et al., 2019), attention weights offer valuable insight into the model's decision-making process, highlighting which time steps and features the model considers significant for a given classification.
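Continuing the sketch above, the returned weights can be inspected directly to see which time steps the model attended to for a given sample:

```python
# Top-5 most influential time steps for the first sample in the batch
# (continues the TemporalAttention example above).
top = torch.topk(weights[0], k=5)
print(top.indices.tolist())  # time-step positions with the highest attention
print(top.values.tolist())   # their normalized attention weights
```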
2.3 Transformers for MTS Classification
Transformers are a more recent development in attention-based models (Zerveas et al., 2021; Devlin et al., 2019; Liu et al., 2021; Zhang et al., 2023). Unlike earlier attention mechanisms, which were often paired with RNNs (as in ALSTM-FCN), Transformers rely entirely on self-attention. Self-attention is a variant of the attention mechanism that allows the model to weigh the significance of different parts of the input relative to a specific position (Vaswani et al., 2017). It enables the model to process all elements of a sequence simultaneously, which is crucial for understanding context and semantics (Zhang et al., 2023), and it allows Transformers to capture long-range dependencies and global context more effectively than CNNs or traditional attention-augmented models.
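The core operation can be sketched in a few lines: single-head scaled dot-product self-attention, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^{\top}/\sqrt{d})\,V$, computed over all positions at once. The dimensions below are illustrative.

```python
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention:
    Attention(Q, K, V) = softmax(QK^T / sqrt(d)) V."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, d_model); all T positions are processed in parallel.
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))  # (batch, T, T)
        return torch.softmax(scores, dim=-1) @ v                  # (batch, T, d_model)

# Each output position is a weighted mix of *all* input positions,
# which is what captures long-range dependencies in one step.
attn = SelfAttention(d_model=32)
y = attn(torch.randn(2, 50, 32))   # (batch=2, T=50, d_model=32)
```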
In the context of MTS classification, the authors of (Liu et al., 2021) proposed a Transformer-based approach named "Gated Transformer Networks" (GTN). This approach combines the strengths of Transformer networks with a gating mechanism that merges two towers of Transformer encoders, capturing both channel-wise and step-wise correlations in multivariate time series data (Liu et al., 2021). Other Transformer-based models have been proposed for MTS classification: (Zerveas et al., 2021) introduced a Transformer-based framework for unsupervised representation learning of multivariate time series, and (Yang et al., 2024) proposed a Transformer-based dynamic architecture with a hierarchical pooling layer that decomposes time series into subsequences representing different frequency components to facilitate classification. Although these Transformer-based architectures are effective, they are complex and require large amounts of training data; in addition, they involve an unsupervised learning phase to achieve optimal performance.
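As an illustration of the two-tower gating idea behind GTN, the sketch below merges the summaries of a step-wise and a channel-wise tower with a learned softmax gate; the module name, dimensions, and gating details are assumptions made for exposition, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class TwoTowerGate(nn.Module):
    """Illustrative gating that merges a step-wise and a channel-wise
    encoder output, loosely following the two-tower idea of GTN.
    Names and dimensions are assumptions, not the published code."""
    def __init__(self, d_step: int, d_chan: int):
        super().__init__()
        # One scalar gate per tower, computed from both tower summaries.
        self.gate = nn.Linear(d_step + d_chan, 2)

    def forward(self, h_step: torch.Tensor, h_chan: torch.Tensor) -> torch.Tensor:
        # h_step: (batch, d_step) summary of the step-wise tower
        # h_chan: (batch, d_chan) summary of the channel-wise tower
        g = torch.softmax(self.gate(torch.cat([h_step, h_chan], dim=-1)), dim=-1)
        # Weight each tower by its gate value before concatenating.
        merged = torch.cat([h_step * g[:, 0:1], h_chan * g[:, 1:2]], dim=-1)
        return merged  # (batch, d_step + d_chan), fed to a classifier head

# Usage with dummy tower outputs:
gate = TwoTowerGate(d_step=64, d_chan=64)
out = gate(torch.randn(8, 64), torch.randn(8, 64))
print(out.shape)  # torch.Size([8, 128])
```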
3 PROBLEM FORMULATION
In time series analysis, understanding the structure
and characteristics of the data is crucial before
tackling the problem of classification. Below, we
first define univariate and multivariate time series,
followed by a detailed explanation of the multivariate
time series classification problem.
A Univariate Time Series is a sequence of observations collected over time from a single variable or feature. Mathematically, it can be represented as a one-dimensional vector $x = (x_1, x_2, \ldots, x_T) \in \mathbb{R}^T$, where $T$ denotes the sequence length (i.e., the number of time steps). Each value $x_t$ corresponds to the observation at time step $t \in \{1, 2, \ldots, T\}$.
A Multivariate Time Series (MTS), on the other hand, consists of multiple variables or features recorded simultaneously over time. Each sample in an MTS dataset is structured as a two-dimensional matrix of shape $(n_f, T)$, where $n_f$ denotes the number of features (variables) and $T$ the sequence length (number of time steps). A data sample can be represented as $X = (x_1, x_2, \ldots, x_{n_f}) \in \mathbb{R}^{n_f \times T}$, where each feature vector $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,T})$ represents the sequence of observations of the $i$-th variable over time.
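These definitions correspond directly to array shapes; a small illustration with hypothetical NumPy arrays:

```python
import numpy as np

T, n_f = 100, 6                     # sequence length, number of features

x_uni = np.random.randn(T)          # univariate series: x in R^T
X_mts = np.random.randn(n_f, T)     # multivariate sample: X in R^{n_f x T}

x_i = X_mts[2]                      # one feature vector x_i = (x_{i,1}, ..., x_{i,T})
print(x_uni.shape, X_mts.shape, x_i.shape)  # (100,) (6, 100) (100,)
```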