arm is the corresponding MI task. The models trained
in this work can be used to recognize EEG signals for
BCI as well as for the diagnosis of neurological dis-
orders by learning patterns in the EEG MI task data.
There has been significant recent work on EEG MI
task classification using deep learning. The methods
used for these classification tasks show a consistent
pattern in both their preprocessing techniques and
their classification methodology. When the dataset
includes a large number of subjects, there appears
to be minimal need for preprocessing. These methods
also consistently learn both spatial and temporal
patterns and combine them in a fusion architecture.
Roots et al. worked with the BCI Competition IV
dataset with 103 subjects (Roots, Muhammad, & Mu-
hammad, 2020). They used bandpass and notch filters
on their time series data and used a fusion architecture
to classify MI Right Fist versus MI Left Fist. Their
model, which fuses spatial and temporal features,
achieved 83% validation accuracy for the binary
model. Wang et al. used the PhysioNet dataset for
their 2-class, 3-class and 4-class classification models
(Wang, et al., 2020). This work used no preprocessing
on the full 109-subject dataset. Their model was based
on the EEGNet structure and used Conv2D layers to
learn spatiotemporal information within a fusion
structure. Their models achieved 75.07% and 82.50%
validation accuracy on the 3-class and 2-class models,
respectively, using the MI Right Fist, MI Left Fist and MI Feet
labels. Dose et al. also used the full PhysioNet dataset
with 109 subjects (Dose, Møller, & Iverson, 2018).
They also used no preprocessing. Their model was
trained on the global dataset and then fine-tuned for
each subject separately. Their global classifier
achieved 68.82% validation accuracy for 3-class
classification and 80.38% accuracy for binary
classification.
In this study, we used the EEG Motor Movement/Imagery
Dataset, a collection of 14 experimental runs
(Schalk, McFarland, Hinterberger, Birbaumer, &
Wolpaw, 2004). Each run was recorded for each of the
109 subjects. The dataset provides more than 1,500
EEG recordings and is considered the largest EEG
motor movement and imagery dataset available
(Goldberger, et al., 2000). The subjects' brain
activity was recorded while they performed each of
the following four tasks:
1. Open and close the right or left fist
2. Imagine opening and closing the right or left fist
3. Open and close both fists or both feet
4. Imagine opening and closing both fists or both
feet
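The relationship between the 14 runs and the four tasks can be sketched as follows. This is a minimal illustration that assumes the run ordering given in the PhysioNet documentation for this dataset (runs 1 and 2 are baseline recordings, and runs 3 to 14 cycle through tasks 1 to 4 three times). The helper name `task_for_run` is hypothetical and not part of any library.

```python
# Run-to-task layout, assuming the ordering documented for the PhysioNet
# EEG Motor Movement/Imagery Dataset: runs 1-2 are baselines and
# runs 3-14 cycle through the four tasks three times.
TASKS = {
    1: "open and close the right or left fist",
    2: "imagine opening and closing the right or left fist",
    3: "open and close both fists or both feet",
    4: "imagine opening and closing both fists or both feet",
}


def task_for_run(run):
    """Return the task number (1-4) for a run, or None for a baseline run."""
    if not 1 <= run <= 14:
        raise ValueError("run must be between 1 and 14")
    if run <= 2:
        return None  # baseline recording, no task
    return (run - 3) % 4 + 1


# Motor imagery runs are those assigned to the two "imagine" tasks (2 and 4).
imagery_runs = [r for r in range(1, 15) if task_for_run(r) in (2, 4)]
```

Under this assumption, only the even-numbered runs from 4 onward contain motor imagery, which is the subset relevant to MI task classification.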
This paper is organized as follows. This section
introduced the problem, described the dataset and
reviewed related work on EEG task classification.
Section 2 covers the transformation of raw EEG signals
into 3-dimensional image sequences representing each
MI task and describes the structure of the multi-view
hierarchical fusion model. Section 3 presents the
results and discussion. Finally, the last section
draws conclusions and describes possible future
directions for this work.
2 METHODS
2.1 Creating 2D Spatiotemporal EEG Image Sequences
The raw EEG signals consist of multiple 1D time
series that record the electrical activity at specific
locations on the scalp. The electrodes are placed
according to the international 10-10 system, as
shown in Figure 1.
This collection of 1D time series is then transformed
into a time series of 2D data. The signal acquired
over a period [t, t+N] from each channel of the
EEG system can be represented by
$$E_i = \left[ c_i^{t},\; c_i^{t+1},\; c_i^{t+2},\; c_i^{t+3},\; \dots,\; c_i^{t+N} \right] \qquad (1)$$
where i is the index of the channel and $c_i^{t}$ is
the EEG data acquired from the ith channel at time t.
EEG data collected from n channels over a period
[t, t+N] can be represented by the matrix S shown in
Figure 2. Each row of S corresponds to the EEG data
collected from a single channel over the period
[t, t+N], and each column of S corresponds to the EEG
data collected through all channels at a single time
point.
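The layout of S can be illustrated with a short sketch. It is only an illustration in plain Python: the helper names `build_signal_matrix` and `column` and the toy values are hypothetical, and the code assumes every channel covers the same period [t, t+N].

```python
def build_signal_matrix(channels):
    """Stack per-channel series E_i into S (rows: channels, columns: time points)."""
    lengths = {len(series) for series in channels}
    if len(lengths) != 1:
        raise ValueError("all channels must cover the same period [t, t+N]")
    return [list(series) for series in channels]


def column(S, k):
    """Return the k-th column of S: one sample from every channel at time t+k."""
    return [row[k] for row in S]


# Toy data (hypothetical values): n = 3 channels, N + 1 = 4 time points.
E = [
    [0.1, 0.2, 0.3, 0.4],  # E_1
    [1.1, 1.2, 1.3, 1.4],  # E_2
    [2.1, 2.2, 2.3, 2.4],  # E_3
]
S = build_signal_matrix(E)
snapshot = column(S, 1)  # samples from all channels at time t+1
```

Each such column is the unit that is later turned into a single 2D image.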
These new spatiotemporal images were created by
transforming each column of the matrix S into a 2D
image, as shown in Figure 2. This was done by mapping
the values $c_i^{t}$, $i = 1, \dots, n$, into a 9x9
matrix based on the actual locations of the electrodes
on the head where the data was acquired, as shown in
Figure 1, which follows the standard 10-10 system of
electrode placement for recording EEG data. For
example, the data acquired from the first channel at
time t is placed in the 3rd row and the 2nd column of
the 9x9 matrix, which corresponds to the location
where the first electrode is placed on the scalp. In
the same figure, the pixel values marked as x are
placeholders: they are empty because no electrodes
correspond to them.
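The mapping from one column of S to a 9x9 image can be sketched as below. This is a minimal sketch under stated assumptions, not the implementation used in this work. The lookup table `ELECTRODE_POSITIONS` is hypothetical: only the first entry (first channel at the 3rd row and 2nd column) comes from the text, and the remaining entries are invented for illustration. Filling empty cells with 0.0 is also an assumption, since the text only states that these cells are placeholders.

```python
GRID_SIZE = 9
EMPTY = 0.0  # assumed fill value for cells with no electrode (the "x" cells)

# HYPOTHETICAL partial layout: channel index -> (row, column), zero-based.
# Only the first entry follows the text (channel 1 at 3rd row, 2nd column);
# the other entries are invented for illustration.
ELECTRODE_POSITIONS = {
    0: (2, 1),
    1: (2, 2),
    2: (2, 3),
}


def column_to_image(samples, positions=ELECTRODE_POSITIONS):
    """Place one sample per channel into a 9x9 grid at its electrode position."""
    image = [[EMPTY] * GRID_SIZE for _ in range(GRID_SIZE)]
    for channel, value in enumerate(samples):
        row, col = positions[channel]
        image[row][col] = value
    return image


def matrix_to_image_sequence(S, positions=ELECTRODE_POSITIONS):
    """Convert each column of S (all channels at one time point) into an image."""
    n_times = len(S[0])
    return [
        column_to_image([row[k] for row in S], positions)
        for k in range(n_times)
    ]


# Toy matrix S (hypothetical values): 3 channels, 2 time points.
S = [[0.1, 0.2], [1.1, 1.2], [2.1, 2.2]]
frames = matrix_to_image_sequence(S)  # one 9x9 image per time point
```

The result is a sequence of 9x9 images, one per time point, so the spatial arrangement of the electrodes and the temporal order of the samples are both preserved.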