Unified CNN-Transformer Model for Mental Workload Classification Using EEG

Fiza Parveen and Arnav Bhavsar

Indian Institute of Technology Mandi, Mandi, India

Keywords:

Mental Workload Classiﬁcation, EEG, Convolutional Neural Network, Attention Mechanism, Transformers,

Ensemble Majority Voting.

Abstract:

The cognitive effort required for tasks involving memory, attention, and decision-making is referred to as mental workload. Preventing cognitive overload and increasing task efficiency rely on a reliable assessment of mental workload. In this study, we present a hybrid CNN-Transformer model that uses EEG data for multi-level mental workload classification. Our model uses a 1D-CNN to extract spatial features from windowed EEG signals, followed by a Transformer to capture temporal correlations. This combination improves our comprehension of mental workload states by capturing both local spatial and long-range temporal aspects. We apply majority voting to the window-based predictions to increase prediction reliability, so that the final accuracy reflects an assessment of mental workload at the signal level. A rigorous 5-fold cross-validation technique is used to evaluate the model on the publicly available STEW dataset.

1 INTRODUCTION

Mental workload refers to the cognitive effort re-

quired for executing tasks that involve attention,

memory, and decision-making processes. Assessing cognitive stress, particularly in real time, is crucial for enhancing task efficiency and avoiding cognitive overload. Electroencephalography (EEG) is

commonly employed for this objective because of

its capacity to record brain activity in real-time with

a high level of temporal precision. Machine learn-

ing and deep learning techniques, including Convo-

lutional Neural Networks (CNNs), have been effec-

tively used to classify EEG data for assessing men-

tal workload. CNNs have shown encouraging perfor-

mance in diverse applications related to EEG data in-

terpretation. They have a major advantage in being

able to automatically extract spatial characteristics

from raw EEG data by acquiring intricate hierarchical

patterns within the signals. CNNs are very proﬁcient

at capturing spatial hierarchies and local dependen-

cies in EEG signals (Schirrmeister et al., 2017).

Nevertheless, CNNs encounter substantial obsta-

cles when used with time-series data, despite their

notable capabilities. The biomarkers of mental work-

load may involve not only the local spatial patterns of

brain activity but also the temporal evolution of these

patterns. EEG signals possess an inherent sequential

nature, characterized by long range temporal depen-

dencies that manifest from the dynamic characteris-

tics of cognitive processes. Consequently, CNNs may

have difﬁculties in comprehending the sequential na-

ture of EEG data, which is crucial for appropriately

classifying levels of mental workload, especially in

activities that need constant monitoring and real-time

evaluation (Wang et al., 2020).

In order to address the limitations of CNNs in cap-

turing long range temporal relationships, researchers

have progressively shifted towards employing trans-

former models. Transformers, ﬁrst designed for nat-

ural language processing, have fundamentally trans-

formed sequence modeling through the introduction

of the self-attention mechanism. This mechanism enables transformers to capture relationships that span all time steps in a sequence, resulting in a thorough comprehension of temporal dynamics (Vaswani et al., 2017). Transformers have been shown to be reliable in classifying cognitive states by considering several time periods in the sequence and modeling how they evolve over time (Roy et al., 2019).
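As an illustration of the self-attention mechanism discussed above, the following is a minimal single-head sketch of scaled dot-product attention in NumPy. The sequence length, feature size, and random projection matrices are hypothetical stand-ins chosen for the example; they are not taken from the proposed model.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (T, d) sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (T, T) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)     # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d = 6, 4                                          # hypothetical sizes
x = rng.normal(size=(T, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
```

Each row of `attn` holds the weights that one time step assigns to every other time step, which is how a single layer can relate distant positions in the sequence.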

Furthermore, the capacity of Transformers to han-

dle sequences concurrently, in contrast to the sequen-

Parveen, F. and Bhavsar, A.

Uniﬁed CNN-Transformer Model for Mental Workload Classiﬁcation Using EEG.

DOI: 10.5220/0013190900003911

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2025) - Volume 1, pages 928-934

ISBN: 978-989-758-731-3; ISSN: 2184-4305

tial operation of RNNs, provides a notable computa-

tional beneﬁt. The capacity of transformers to per-

form parallel processing allows for quicker training

and inference, making them highly suitable for real-

time applications that require rapid and precise evalu-

ations of mental exertion (Vaswani et al., 2017).

Combining transformers with CNNs has shown

signiﬁcant promise in improving the effectiveness of

mental workload classiﬁcation models. This hybrid

approach utilizes CNNs to extract spatial characteristics from EEG data. CNNs are particularly effective in capturing local patterns and spatial hierarchies, which yields filtered embeddings rather than raw signals. The spatial features from the CNN are subsequently fed into a transformer, which analyzes the sequence of spatial features and captures the temporal relationships over the entire series (Huang et al., 2019). This combination enables the model to leverage the advantages of both CNNs and transformers: CNNs offer a good-quality spatial representation, while transformers capture temporal correlations. A hybrid model is more capable of

managing the intricacies of EEG signals, resulting in

enhanced accuracy in classiﬁcation and better gener-

alization across various tasks and people.

2 RELATED WORK

In this section, we examine recent studies that focus on classifying mental workload. The research by (Lim

et al., 2018) introduces the STEW dataset, which is

utilized in our analysis. This dataset is categorized

into three distinct classes: low, moderate, and high

workloads, based on ratings from 45 participants us-

ing a scale from 1 to 9. Power Spectral Density

(PSD) features are derived using Fast Fourier Trans-

form (FFT) across the delta, theta, alpha, and beta

frequency bands, applying a sliding window tech-

nique. The data is divided into training and testing

sets, allocating 80% (36 subjects) for training and

20% (9 subjects) for testing. Neighborhood Compo-

nent Analysis (NCA) is conducted to select features

from the training dataset for regression, utilizing 5-

fold cross-validation. The top 75% of features iden-

tiﬁed across all folds are used in Support Vector Re-

gression (SVR), which predicts ratings on the test data

that are later transformed into class labels.
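To make the band-wise PSD features concrete, here is a minimal sketch that estimates the power of one windowed channel in each band with a plain FFT periodogram. The band edges are the ranges listed in Section 4.1; the periodogram estimator, the function name `band_powers`, and the 10 Hz test tone are assumptions for illustration and are not claimed to match the exact feature extraction of (Lim et al., 2018).

```python
import numpy as np

# Band edges in Hz, as listed in Section 4.1.
BANDS = {"delta": (0.5, 4), "theta": (4, 7), "alpha": (8, 12), "beta": (13, 30)}

def band_powers(signal, fs=128):
    """Return the summed periodogram power of a 1-D signal in each band."""
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / (fs * signal.size)
    return {name: float(psd[(freqs >= lo) & (freqs <= hi)].sum())
            for name, (lo, hi) in BANDS.items()}

# A synthetic 10 Hz tone in one 512-sample window (4 s at 128 Hz).
t = np.arange(512) / 128.0
powers = band_powers(np.sin(2 * np.pi * 10 * t))
dominant = max(powers, key=powers.get)   # band with the largest power
```

For the synthetic tone, the alpha band (8 to 12 Hz) carries the most power, as expected for a 10 Hz oscillation.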

In (Parveen and Bhavsar, 2023), the authors clas-

sify mental workloads into two and three levels. For

three-class classiﬁcation, the data is segmented into

three mental workload categories based on the ratings

provided by subjects. Additionally, the dataset is split

into training and testing sets in a 9:1 ratio. After pre-

processing, the data is divided into windows of 512

samples with an overlap of 128 samples. These windows are input into a 1D-CNN network that includes

multiple horizontal and vertical CNN blocks for ex-

tracting spatial and temporal features from the EEG

data. A majority voting technique is employed to ob-

tain results for the entire signal rather than individual

windows. The model is validated using a 5-fold cross-

validation method.

In a different study by (Sharma and Ahirwal,

2021), the authors classify the STEW dataset into 2

and 3 classes based on ratings. For three-class classi-

ﬁcation, the dataset is divided into low, moderate, and

high workload categories. The low workload class

consists of respondents with ratings between 4 and

5, the moderate workload class includes subjects with

ratings between 6 and 7, and the high workload class

comprises individuals with ratings between 8 and 9.

The data is augmented by employing a window size of

512 with an overlap of 128 samples, generating a to-

tal of 6,615 samples. The classiﬁcation process splits

the data into training (5,622 samples) and testing sets

(993 samples), maintaining a ratio of 85% for train-

ing and 15% for testing. A cascaded 1D-CNN and

Bidirectional Long Short-Term Memory (Bi-LSTM)

model are used for classiﬁcation, utilizing both 5-fold

and 7-fold cross-validation techniques.

In another study (Kingphai and Moshfeghi, 2023),

a sliding window approach with a size of 512 and an

overlap of 128 samples is utilized. Power Spectral

Density (PSD) is calculated for the alpha and theta

frequency bands. These features are extracted from

all 14 channels, resulting in a total of 84 features (14

channels multiplied by 6 features). A Bidirectional

Gated Recurrent Unit (BGRU-GRU) model is applied

to the time series EEG data, using the Rolling and

Expanding Window method for analysis.

Recent work demonstrated that transformers can

greatly improve the ability of EEG-based models to

analyze and understand changes over time, resulting

in more detailed and dependable evaluations of men-

tal workload (Sun et al., 2021). (Roy et al., 2019)

showed that a model built on transformers could sur-

pass typical recurrent neural networks (RNNs) in cap-

turing long-term relationships in EEG data, leading to

more accurate estimations of cognitive load. The pa-

per (Siddhad et al., 2024) explores the application of

transformer networks for classifying raw EEG data,

aiming to enhance accuracy while reducing the need

for extensive pre-processing. The study employed a

transformer model with a multi-head attention mech-

anism, embedding techniques, and positional encod-

ing. The STEW data, minimally pre-processed and

segmented into 0.25-second epochs, was used to train

the transformer network. The embeddings represent the input data, and positional encoding maintains the order of the sequences.

In another work (Wang et al., 2024), the approach focuses on the development of an Attention-Based Recurrent Fuzzy Network (ARFN) to assess mental workload using EEG data. EEG signals are first processed through a fuzzy recursive module that captures temporal dependencies, then passed into a Long Short-Term Memory (LSTM), which produces

a feature vector. This vector is averaged across time

steps and fed into a fully connected layer with a Soft-

max function for classiﬁcation. The model is trained

using a Mean Squared Error (MSE) loss function with

a regularization term to prevent overﬁtting, and the

optimization is handled by the Adaptive Moment Es-

timation (ADAM) algorithm. Performance was eval-

uated on STEW using k-fold cross-validation.

3 DATA DESCRIPTION

This study utilizes the open access STEW (Lim

et al., 2018) dataset, which includes EEG record-

ings of rest and multitasking mental workload activ-

ity. The recordings were obtained utilizing the Vi-

enna Test System (Bratﬁsch and Hagman, 2008) dur-

ing a single-session simultaneous capacity (SIMKAP)

experiment involving 48 male participants. The

SIMKAP activity is speciﬁcally developed to assess

an individual’s competence in effectively handling

numerous tasks in a challenging work environment.

The experiment is divided into two segments. At

the beginning, the participants are provided with spe-

ciﬁc instructions to adopt a calm position and en-

sure that their vision is not blocked for a period of

3 minutes. They are also instructed not to partici-

pate in any activities or tasks during this time. After-

wards, the participants engaged in the SIMKAP task,

in which they were directed to eliminate matching

panels, while also responding to auditory questions

that involved arithmetic, comparison, or data retrieval.

Certain auditory inquiries may require the subject to

delay their response, forcing them to keep an eye on

a clock in the upper right corner.

The task has a duration of three minutes. The se-

quence of questions in the challenge remains consis-

tent across all subjects. EEG was measured through-

out both the period of rest and the period of task per-

formance. To minimize the effect of any intermediate

task activity, the initial and concluding 15 seconds of

data from each recording are excluded. Out of the

48 participants, 45 individuals provided input on both

rest and task difﬁculty using a scale that ranged from

1 to 9. A rating from 1 to 3 indicates a low workload, a rating from 4 to 6 indicates a moderate workload, and a rating from 7 to 9 indicates a high workload.
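The rating-to-class mapping described above can be written as a small helper. The function name `workload_label` is hypothetical; the thresholds are the ones stated in the text.

```python
def workload_label(rating: int) -> str:
    """Map a subjective rating on the 1-9 scale to a workload class."""
    if not 1 <= rating <= 9:
        raise ValueError(f"rating must be in 1..9, got {rating}")
    if rating <= 3:
        return "low"
    if rating <= 6:
        return "moderate"
    return "high"

# Labels for every possible rating, in order 1..9.
labels = [workload_label(r) for r in range(1, 10)]
```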

The EEG data is acquired using the Emotiv EPOC EEG headset, which has a sampling frequency of 128 Hz and 16-bit analog-to-digital resolution. The device is equipped with 14 electrodes positioned according to the international 10-20 system at the following locations: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4.

In our study, which focuses on rating based com-

parisons between low, medium, and high levels, we

have excluded data from subjects S05, S24, and S42

due to the unavailability of ratings for these individu-

als.

4 METHODOLOGY

4.1 Pre-Processing

Pre-processing of EEG signals is an essential step

that improves the quality of the signals and stream-

lines data analysis. The accuracy and dependability

of the model may be jeopardized by noise and ﬂuc-

tuations in the EEG data. EEG consists of discrete

frequencies, spatial patterns, and associations to dif-

ferent states of brain activity.

The commonly employed frequency ranges in-

clude delta (0.5 to 4 Hz), theta (4 to 7 Hz), alpha (8

to 12 Hz), and beta (13 to 30 Hz) (Lim et al., 2018).

These frequency ranges correspond to speciﬁc brain

states, including deep sleep, alertness, and anxiety.

Several ﬁltering techniques are available for removing

artifacts, as shown in references (Chandrashekar and

Sahin, 2014), (Nitschke et al., 1998), (Iriarte et al.,

2003), (Lai et al., 2018). In this study, we utilize a 6th-order Butterworth filter to isolate frequencies ranging from 1 to 60 Hz.

Furthermore, we utilize the ADJUST algorithm

(Mognon et al., 2011) to reduce any artifacts. Over-

all, we implement the following sequential proce-

dures during the pre-processing phase (Parveen and

Bhavsar, 2023).

• EEG signals are typically recorded with respect to

one or more reference electrodes. In this study,

we utilized the technique of average referenc-

ing, which involves calculating the mean value of

each channel and then subtracting it from all data

points corresponding to that channel.

BIOSIGNALS 2025 - 18th International Conference on Bio-inspired Systems and Signal Processing

• EEG signals comprise a wide range of frequency components that correspond to different types of brain activity. We utilized a 6th-order Butterworth band-pass filter, an infinite impulse response (IIR) filter, with a pass band of 1-60 Hz. This filter reduces low-frequency drifts and high-frequency noise while retaining the full range of bands from delta to gamma.

• To mitigate the disruption produced by line noise,

we utilize a 50 Hz notch ﬁlter on our EEG data.

• Eye blinks, muscular movements, and ambient

inﬂuences are examples of EEG artifacts that

come from sources besides brain activity. Arti-

fact removal is a crucial step in the processing of

EEG signals. This study employed the ADJUST

method (Mognon et al., 2011), which utilizes in-

dependent component analysis (ICA) to segregate

the EEG data into independent components. Each component represents a distinct source of activity, some of which may be artifacts.
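The first three pre-processing steps listed above can be sketched with SciPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: zero-phase filtering (`sosfiltfilt`, `filtfilt`), the notch quality factor `Q=30`, and the choice of design order 3 (which SciPy turns into a 6th-order band-pass) are all assumptions. The ICA-based ADJUST step is not reproduced here.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, iirnotch, filtfilt

FS = 128  # sampling rate of the recordings, in Hz

def preprocess(eeg, fs=FS):
    """Filter an EEG array of shape (channels, samples)."""
    # Referencing as described in the text: remove each channel's own mean.
    eeg = eeg - eeg.mean(axis=1, keepdims=True)
    # Butterworth band-pass, 1-60 Hz. For band-pass designs SciPy doubles
    # the design order, so N=3 yields a 6th-order filter (assumed reading).
    sos = butter(3, [1, 60], btype="bandpass", fs=fs, output="sos")
    eeg = sosfiltfilt(sos, eeg, axis=-1)
    # 50 Hz notch against line noise; Q=30 is an assumed quality factor.
    b, a = iirnotch(50, Q=30, fs=fs)
    return filtfilt(b, a, eeg, axis=-1)

def rms(x):
    return float(np.sqrt(np.mean(x ** 2)))

# Synthetic check: a 50 Hz tone should be suppressed, a 10 Hz tone preserved.
t = np.arange(150 * FS) / FS
line = np.tile(np.sin(2 * np.pi * 50 * t), (14, 1))
brain = np.tile(np.sin(2 * np.pi * 10 * t), (14, 1))
line_ratio = rms(preprocess(line)) / rms(line)
brain_ratio = rms(preprocess(brain)) / rms(brain)
```

Zero-phase filtering avoids shifting the signal in time, at the cost of applying each filter twice; a causal filter would be needed for a real-time setting.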

4.2 Windowing

After pre-processing, we apply sliding windows of 512 samples, with an overlap of 128 samples, over the entire duration of the EEG data from each participant. Each participant contributed a total of 147 windows, each of size 14 x 512 (channels x samples). We then divided the data into three workload categories according to the ratings provided by the 45 participants, assigning the label "low workload" to data with ratings 1-3, "moderate workload" to data with ratings 4-6, and "high workload" to data with ratings 7-9. This yields 3,381 windows for low workload, 6,174 windows for moderate workload, and 3,675 windows for high workload, each of size 14 x 512.
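The windowing step can be sketched as below. One observation, offered as our reading rather than something the text states: the reported 147 windows are reproduced when consecutive windows start 128 samples apart on a 150-second recording (3 minutes minus 15 seconds trimmed at each end, at 128 Hz), so the sketch treats the 128-sample figure as the hop between windows. The function name `sliding_windows` is hypothetical.

```python
import numpy as np

def sliding_windows(eeg, width=512, hop=128):
    """Cut a (channels, samples) array into windows of shape (n, channels, width)."""
    starts = range(0, eeg.shape[1] - width + 1, hop)
    return np.stack([eeg[:, s:s + width] for s in starts])

FS = 128
recording = np.zeros((14, 150 * FS))   # placeholder for one 150 s, 14-channel recording
windows = sliding_windows(recording)   # shape (147, 14, 512)
```

Under the same reading, the class totals are whole multiples of 147 (6,174 = 42 x 147, 3,381 = 23 x 147, 3,675 = 25 x 147), which is consistent with 147 windows per recording.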

4.3 Model Description

The proposed framework (Figure 1) for classifying mental workload combines the advantages of a one-dimensional Convolutional Neural Network (1D-CNN) with a Transformer network to capture both local and global relationships in the data. The model follows a systematic pipeline to handle the incoming data, extract pertinent features, and classify the mental workload into one of three levels. Each component and its role in the model are described below.

4.3.1 CNN Based Features

In the initial phase of the model, an attention-based

1D-CNN is used to extract hierarchical features from

both the temporal and spectral dimensions, effectively

capturing the underlying patterns in the data. Max

pooling is utilized subsequent to the last convolu-

tional layer in order to decrease the size of the feature

maps, hence minimizing the computational complex-

ity while preserving the most essential information.

The extracted features have the shapes (11,613 x 3 x 252 x 50) and (1,617 x 3 x 252 x 50) for the train and test datasets, respectively. To ensure compatibility with our proposed transformer network, the feature tensor was reshaped into the format (samples, sequence length, feature dimension), where the sequence length corresponds to the time samples (252) and the feature dimension is the product of the number of channels and the number of CNN filters (3 x 50 = 150), resulting in features of shape (252 x 150) per window. This reshaping ensures that each time sample is represented by a feature vector that encapsulates information across all channels and CNN filters, making it suitable for processing by the transformer network.

The Transformer model was trained using feature shapes of (11,613 x 252 x 150) and (1,617 x 252 x 150) for train and test, respectively.
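A minimal sketch of this reshaping in NumPy is given below. It assumes the CNN output axes are ordered (samples, channels, time, filters), as the reported shapes suggest; the function name `to_sequence` and the small batch of two windows are illustrative only.

```python
import numpy as np

def to_sequence(features):
    """Reshape (N, C, T, F) CNN features to (N, T, C*F) transformer input."""
    n, c, t, f = features.shape
    # Move time ahead of channels, then merge channels and filters per time step.
    return features.transpose(0, 2, 1, 3).reshape(n, t, c * f)

# Two stand-in windows with 3 channels, 252 time samples, and 50 filters.
cnn_out = np.arange(2 * 3 * 252 * 50, dtype=np.float32).reshape(2, 3, 252, 50)
seq = to_sequence(cnn_out)   # shape (2, 252, 150)
```

The transpose matters: reshaping without it would mix values from different time samples into the same feature vector.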

4.3.2 Transformer Block

We utilize only the encoder part of the Transformer in order to capture information within the EEG signals. The Multi-Head Attention block in the transformer network enables the model to simultaneously focus on distinct segments of the sequence. Residual connections are also used, which alleviate the vanishing gradient problem and facilitate the propagation of gradients during backpropagation. Layer Normalization ensures that the inputs to each layer are normalized, enhancing the stability and speed of the training process.
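The interplay of residual connections and layer normalization can be sketched as follows. This is a simplified illustration: it uses the post-norm arrangement LayerNorm(x + sublayer(x)) of the original Transformer, omits the learnable gain and bias, and stands in a small random ReLU layer for the attention or feed-forward sublayer. Whether the proposed model uses pre-norm or post-norm is not stated in the text.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each feature vector (last axis) to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def residual_block(x, sublayer):
    """Post-norm residual wrapper: LayerNorm(x + sublayer(x))."""
    return layer_norm(x + sublayer(x))

rng = np.random.default_rng(0)
x = rng.normal(size=(252, 150))            # one window: 252 time steps, 150 features
w = rng.normal(size=(150, 150)) * 0.05     # hypothetical sublayer weights
y = residual_block(x, lambda h: np.maximum(h @ w, 0.0))
```

Because the input `x` is added back before normalization, gradients have a direct path around the sublayer, which is what eases training of deeper stacks.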

Following the Transformer block, the features un-

dergo ﬂattening and then pass through a sequence of

fully connected layers. These layers have the purpose

of adding another level of abstraction to the informa-

Figure 1: Proposed CNN-Transformer model.

tion and getting it ready for the classiﬁcation task.

The output from the fully connected layer goes to a

softmax function, resulting in a probability distribu-

tion across the three classes.
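The classification head described above (flatten, fully connected layer, softmax) can be sketched as below. A single dense layer with random weights stands in for the sequence of fully connected layers, whose sizes are not given in the text; the batch of four windows is likewise hypothetical.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with max subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
encoded = rng.normal(size=(4, 252, 150))        # stand-in transformer output
flat = encoded.reshape(len(encoded), -1)        # flatten to (4, 37800)
w = rng.normal(size=(flat.shape[1], 3)) * 0.01  # hypothetical dense weights
b = np.zeros(3)
probs = softmax(flat @ w + b)                   # (4, 3) class probabilities
pred = probs.argmax(axis=-1)                    # predicted class index per window
```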

4.3.3 Window Based Majority Voting

EEG is non-stationary in nature. Therefore, rather than computing embeddings on complete signals, we use window-based estimates. The model produces several predictions for each participant by sliding windows over the EEG signals. However, it is more realistic to report the final result for the entire signal. We therefore employ majority voting to determine the final prediction from the collective window-level predictions for each EEG signal.
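Majority voting over window-level predictions can be sketched in a few lines. The function name `majority_vote` and the example counts are hypothetical, and the tie-breaking rule (the label seen first wins) is an assumption, since the text does not specify one.

```python
from collections import Counter

def majority_vote(window_preds):
    """Return the most frequent label among the window-level predictions."""
    counts = Counter(window_preds)
    # most_common keeps first-seen order among equal counts.
    return counts.most_common(1)[0][0]

# 147 hypothetical window predictions for one EEG signal.
window_preds = ["high"] * 80 + ["moderate"] * 50 + ["low"] * 17
signal_label = majority_vote(window_preds)
```

A single signal-level label is then compared with the subject's rating-based class, so a minority of misclassified windows does not change the final decision.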

4.3.4 Summary of the Overall Approach

In summary, the proposed model follows this sequence of steps:

• The model receives EEG signals segmented into separate windows.

• A 1D-CNN is utilized to extract local temporal information from the EEG inputs.

• Features are extracted from the last convolutional layer following max-pooling.

• The extracted features are reshaped into a suitable format for sequence processing.

• The reshaped features are processed in the Transformer block, which incorporates multi-head attention, residual connections, layer normalization, and a feed-forward network.

• The output of the Transformer block is flattened and fed to fully connected layers.

• A softmax layer generates a probability distribution over the classes.

• The final class label for each subject is determined by performing majority voting on the window-level estimates.

5 EXPERIMENTS

For the three-class classification, we divided the data into training and testing datasets in a 9:1 ratio. Windowing was then applied to both the training and testing data, yielding 11,613 training EEG windows and 1,617 testing EEG windows.

The model was validated using a 5-fold cross-validation technique.
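A subject-wise split, in which no participant contributes windows to both sets, can be sketched as follows. The text does not describe how the five folds are drawn, so this sketch shows a single hold-out split only; the function name `subject_wise_split`, the seed, and the toy list of four windows per subject are assumptions.

```python
import random

def subject_wise_split(subject_ids, test_fraction=0.1, seed=0):
    """Split window indices so that each subject falls entirely in one set."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(test_fraction * len(subjects)))
    held_out = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in held_out]
    test_idx = [i for i, s in enumerate(subject_ids) if s in held_out]
    return train_idx, test_idx

# Toy example: 45 subjects with 4 windows each, one subject ID per window.
window_subjects = [s for s in range(45) for _ in range(4)]
train_idx, test_idx = subject_wise_split(window_subjects)
train_subjects = {window_subjects[i] for i in train_idx}
test_subjects = {window_subjects[i] for i in test_idx}
```

Splitting on subject IDs before windowing is what prevents overlapping windows of the same recording from appearing on both sides of the split.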

6 RESULTS

The classification results are presented in Table 1. For three-level mental workload classification, integrating the CNN with the transformer network yields a mean classification accuracy of 85.46% on the 5-fold cross-validation test data. This demonstrates the benefit of feeding CNN features to the proposed transformer network.

Table 1: Classification accuracy on cross-validation dataset with proposed model.

Cross validation set    Accuracy
CV-1                    90.90%
CV-2                    81.81%
CV-3                    81.81%
CV-4                    90.90%
CV-5                    81.81%
Mean                    85.46%
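The fold accuracies in Table 1 take only two values, which is what one would expect from signal-level voting on a small test set. As a hedged reading that the text does not state explicitly: if each signal yields 147 windows, the 1,617 test windows correspond to 11 test signals, and 10 or 9 correct signals out of 11 reproduce the two reported values when truncated to two decimals. The helper name `truncated_pct` is hypothetical.

```python
import math

WINDOWS_PER_SIGNAL = 147                       # from the windowing section
test_signals = 1617 // WINDOWS_PER_SIGNAL      # 1,617 test windows -> 11 signals

def truncated_pct(correct, total):
    """Accuracy in percent, truncated (not rounded) to two decimals."""
    return math.floor(10000 * correct / total) / 100

acc_10 = truncated_pct(10, test_signals)       # 10 of 11 signals correct
acc_9 = truncated_pct(9, test_signals)         # 9 of 11 signals correct
```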

7 DISCUSSION AND COMPARISONS

Table 2 presents a comparison of our results with

those utilizing the STEW dataset (Lim et al., 2018).

The ﬁndings demonstrate a signiﬁcant improvement

Table 2: Performance comparison with related work.

Paper                            Classifier                          Best Accuracy   Average Accuracy   Remarks
(Lim et al., 2018)               SVR                                 69.2%           NA                 Subject-wise train-test split and signal-level classification
(Parveen and Bhavsar, 2023)      Attention based 1D-CNN              81.81%          79.98%             Subject-wise train-test split and signal-level classification
(Kingphai and Moshfeghi, 2023)   BGRU-GRU                            84.56%          NA                 Train-test split at window level
(Sharma and Ahirwal, 2021)       Cascaded 1D-CNN and Bi-LSTM model   95.36%          94.68%             Train-test split at window level
(Siddhad et al., 2024)           Transformers                        89.40%          NA                 Subject-wise train-test split, classification at window level
(Wang et al., 2024)              LSTM                                94.47%          NA                 Test accuracy on single test data
Proposed Model                   Unified CNN-Transformers            90.90%          85.46%             Subject-wise train-test split along with signal-level classification

achieved by our proposed approach, which utilizes a modern deep learning framework, as anticipated. Our comparison focuses exclusively on three-level mental workload classification. The classification accuracy reported by (Lim et al., 2018) is 69.2%.

(Parveen and Bhavsar, 2023) utilized a CNN with attention on both the time and channel axes to extract features and classify small windows of EEG data. To provide accuracy at the signal level, majority voting over windows is applied. An average test accuracy of 79.98% is achieved on three mental workload levels (low vs medium vs high). Compared with our mean accuracy of 85.46%, this indicates the advantage of the CNN-Transformer approach.

The methodology employed in (Siddhad et al., 2024) resembles our approach, specifically in utilizing transformer networks for the classification of the STEW dataset. The authors investigate the application of transformer networks to classify raw EEG data and report an accuracy of 89.40%. Although the approaches can be compared, there are significant distinctions in the evaluation process and results. That study presented findings based on a conventional train-test evaluation, without employing cross-validation, whereas our research used a stricter evaluation based on 5-fold cross-validation, which gives a more reliable assessment of the model's capacity to generalize. Hence, we compare our best accuracy with that approach. Our model attained a best accuracy of 90.90%, surpassing the performance of the referenced approach.

The study (Wang et al., 2024) proposes an Attention-Based Recurrent Fuzzy Network (ARFN) to evaluate mental workload through the analysis of EEG data. The model employs an EEG signal processing technique that integrates a fuzzy recursive module and an LSTM layer. Performance evaluation is carried out using k-fold cross-validation, based on which the best model is selected and further trained on the entire training data (training and validation). The authors report a test accuracy of 94.47% on a single test dataset. In contrast, we report both the best and the average accuracy over 5-fold cross-validation, which gives a more reliable picture of how our approach generalizes.

While other studies, such as (Sharma and Ahirwal,

2021), and (Kingphai and Moshfeghi, 2023), have

also utilized the same dataset, they appear to divide the training and testing data at the window level. With a window-level split, the training and testing sets can include samples from the same participants, which is unrealistic in practice. In contrast, our study, along with the other works discussed above, partitions the data at the subject level. Although our

classiﬁcation is conducted at the window level—with

signal-level labels determined through majority vot-

ing—we maintain a subject-based training and test-

ing split. This method poses greater challenges due to

inter-subject variability, but it reﬂects a more realistic

scenario.

Thus, overall, the proposed approach is among the better performing methods, and also one of the few approaches in this domain that report results based on cross-validation. We therefore believe that our performance is encouraging enough to explore the potential of such mixed models further.

8 CONCLUSION

In this work, we utilized a novel strategy to address

the difﬁculties of EEG data by introducing a hybrid

CNN-transformer model for mental workload classi-

ﬁcation from EEG signals. Our model effectively an-

alyzes EEG signals by utilizing the strengths of CNNs

for extracting spatial features and Transformers for

capturing long-range temporal relationships.

Our methodology uses a 1D-CNN to extract robust features from the EEG data, which are then fed to the Transformer block. We apply majority voting to the predictions obtained from the Transformer model. This ensures that the final accuracy reflects a signal-level view of the mental workload condition, reducing the effect of noise and probable errors in individual window-level classifications. The proposed model was assessed using a thorough 5-fold cross-validation technique.

In summary, this study showcases the efﬁcacy of

integrating spatial and temporal modeling methodolo-

gies, which enables the advancement of EEG-based

mental workload evaluation and creates opportunities

for further research in cognitive state classiﬁcation.

Future studies could delve into additional enhance-

ments to the model and its utilization in various cog-

nitive activities, while also examining the feasibility

of real-time implementation for practical purposes in

monitoring and regulating mental workload.

REFERENCES

Bratfisch, O. and Hagman, E. (2008). Simkap–simultankapazität/multi-tasking. Mödling: Schuhfried GmbH.

Chandrashekar, G. and Sahin, F. (2014). A survey on feature

selection methods. Computers & Electrical Engineer-

ing, 40(1):16–28.

Huang, G., Zhang, Y., and Zhang, L. (2019). Cross-subject

mental workload assessment using transfer learning

with cnn. Frontiers in Neuroscience, 13:827.

Iriarte, J., Urrestarazu, E., Valencia, M., Alegre, M., Ma-

landa, A., Viteri, C., and Artieda, J. (2003). Indepen-

dent component analysis as a tool to eliminate artifacts

in eeg: a quantitative study. Journal of clinical neuro-

physiology, 20(4):249–257.

Kingphai, K. and Moshfeghi, Y. (2023). On time series

cross-validation for deep learning classiﬁcation model

of mental workload levels based on eeg signals. In

Machine Learning, Optimization, and Data Science:

8th International Workshop, LOD 2022, Certosa di

Pontignano, Italy, September 19–22, 2022, Revised

Selected Papers, Part II, pages 402–416. Springer.

Lai, C. Q., Ibrahim, H., Abdullah, M. Z., Abdullah, J. M.,

Suandi, S. A., and Azman, A. (2018). Artifacts and

noise removal for electroencephalogram (eeg): A lit-

erature review. In 2018 IEEE Symposium on Com-

puter Applications & Industrial Electronics (ISCAIE),

pages 326–332. IEEE.

Lim, W., Sourina, O., and Wang, L. P. (2018). Stew: Simul-

taneous task eeg workload data set. IEEE Transac-

tions on Neural Systems and Rehabilitation Engineer-

ing, 26(11):2106–2114.

Mognon, A., Jovicich, J., Bruzzone, L., and Buiatti, M.

(2011). Adjust: An automatic eeg artifact detector

based on the joint use of spatial and temporal features.

Psychophysiology, 48(2):229–240.

Nitschke, J. B., Miller, G. A., and Cook, E. W. (1998). Dig-

ital ﬁltering in eeg/erp analysis: Some technical and

empirical comparisons. Behavior Research Methods,

Instruments, & Computers, 30:54–67.

Parveen, F. and Bhavsar, A. (2023). Attention based 1d-

cnn for mental workload classiﬁcation using eeg. In

Proceedings of the 16th International Conference on

PErvasive Technologies Related to Assistive Environ-

ments, pages 739–745.

Roy, Y., Banville, H., Albuquerque, I., Gramfort, A., Falk,

T. H., and Faubert, J. (2019). Deep learning-based

electroencephalography analysis: a systematic review.

Journal of neural engineering, 16(5):051001.

Schirrmeister, R. T., Springenberg, J. T., Fiederer, L. D. J.,

Glasstetter, M., Eggensperger, K., Tangermann, M.,

Hutter, F., Burgard, W., and Ball, T. (2017). Deep

learning with convolutional neural networks for eeg

decoding and visualization. Human Brain Mapping,

38(11):5391–5420.

Sharma, V. and Ahirwal, M. K. (2021). Quantiﬁca-

tion of mental workload using a cascaded deep

one-dimensional convolution neural network and bi-

directional long short-term memory model.

Siddhad, G., Gupta, A., Dogra, D. P., and Roy, P. P. (2024).

Efﬁcacy of transformer networks for classiﬁcation of

eeg data. Biomedical Signal Processing and Control,

87:105488.

Sun, S., Chen, S., Wang, H., Zhang, S., and Zhang, H.

(2021). A transformer-based deep neural network for

eeg signal classiﬁcation. IEEE Transactions on Neu-

ral Systems and Rehabilitation Engineering, 29:1574–

1584.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,

L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.

(2017). Attention is all you need. In Advances in

Neural Information Processing Systems, volume 30.

Wang, Y., Wang, L., Wang, X., and Wang, W. (2020). Deep

learning-based classiﬁcation of eeg signals for cog-

nitive workload assessment. IEEE Access, 8:74262–

74272.

Wang, Z., Ouyang, Y., and Zeng, H. (2024). Arfn: An

attention-based recurrent fuzzy network for eeg men-

tal workload assessment. IEEE Transactions on In-

strumentation and Measurement.
