Early Detection of Chronic Stress Using Wearable Devices: A Machine Learning Approach with the WESAD Database

Amaia Calvo (1, a), Julen Martin and Cristina Martin (1,2,3, b)

Fundación Vicomtech, Basque Research and Technology Alliance (BRTA), Mikeletegi 57, Donostia-San Sebastián, 20009, Basque Country, Spain

Faculty of Engineering, University of Deusto, Avda. Universidades, 24, Bilbao, 48007, Basque Country, Spain

BioGipuzkoa Health Research Institute (Bioengineering Area), eHealth Group, 20014 Donostia-San Sebastián, Spain

Keywords:

Chronic Stress, Wearable Devices, Machine Learning, Deep Learning, Stress Detection, Physiological

Signals, WESAD Database, Subject-Dependent Models, Health Monitoring.

Abstract:

Stress disorders have increased significantly in recent years, impacting individual health. This study explores the feasibility of detecting stress by applying machine learning algorithms to physiological signals captured by wearable devices. An exhaustive review of relevant public databases was conducted, and the WESAD database was identified as the most suitable one. Two configurations for building AI models were examined: in one, a single model was created using data from all participants; in the other, a personalized model was developed for each individual participant. For both configurations, the study evaluated the effectiveness of different preprocessing methods and AI algorithms and identified the physiological signals most informative about stress. Convolutional Neural Networks (CNN) achieved the highest accuracy in stress detection, with an overall accuracy of 99.8% for the single-model configuration and 99.6% for personalized models. The analysis also highlighted electrocardiogram (ECG) and electrodermal activity (EDA) as the most informative signals for predicting stress.

1 INTRODUCTION

Stress can be deﬁned as a natural physiological and

psychological response activated by situations per-

ceived as threatening or dangerous. It performs an es-

sential role in human alarm and defense mechanisms.

Despite this, when these stressful emotions become

frequent, they can have harmful effects on mental and

physical health, increasing the risk of developing var-

ious illnesses, such as cardiovascular diseases, mood

disorders, or sleep disorders (Slavich, 2020). Alarm-

ingly, it is estimated that approximately 1 in 4 adults

experience stress regularly, and some studies indi-

cate a 30% increase in reported stress levels over the

past decade, particularly among younger individuals

(American Psychological Association, 2017).

The economic burden of stress-related illnesses on modern societies is substantial, costing healthcare systems billions each year in treatment, lost productivity, and decreased quality of life. According to a report by the World Health Organization (WHO) (Depression, 2017), it is estimated that mental disorders, including stress and anxiety, can cost global economies up to $1 trillion annually in lost productivity. Additionally, a study by Gallup (Gallup, 2017) revealed that employees experiencing stress tend to be less productive, which can negatively impact company profits and the economy as a whole. Work-related stress contributes significantly to productivity loss (Giorgi et al., 2020).

a https://orcid.org/0009-0009-2806-5344

b https://orcid.org/0000-0002-3919-2738

Taking this into account, the early detection of stress becomes crucial to avoid its negative effects (Kivimäki and Steptoe, 2018). Research has emphasized the importance of timely stress detection and the development of preventive solutions to address this growing issue (Slavich, 2020). Furthermore, it has been pointed out that it is essential to create accessible solutions for the entire population to ensure that no one is left without support (Patel et al., 2018). Additionally, studies have indicated that primary care is overwhelmed and that mental health issues continue to rise, highlighting the urgency of implementing effective alternatives (Moise et al., 2021). Traditionally, stress is measured using self-reported questionnaires or by visiting a mental health practitioner. However, these techniques lack objectivity and are not compatible with everyday situations, highlighting the need for alternative methods to detect stress. Additionally, psychological consultations can be very expensive, making them inaccessible for many people. Stress can develop into more severe mental health conditions if it is not detected in time. Therefore, it is essential to develop affordable, objective, and practical methods for early stress detection to prevent the progression of mental health disorders (Espeleta et al., 2018). The WHO emphasizes that preventive solutions and early interventions are critical components in managing stress and related disorders (World Health Organization, 2018).

Calvo, A., Martin, J. and Martin, C.

Early Detection of Chronic Stress Using Wearable Devices: A Machine Learning Approach with the WESAD Database.

DOI: 10.5220/0013209700003938

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 11th International Conference on Information and Communication Technologies for Ageing Well and e-Health (ICT4AWE 2025), pages 189-196

ISBN: 978-989-758-743-6; ISSN: 2184-4984

In this context, the possibility of detecting stress

with wearable devices emerges (Lupton, 2020).

Wearables are able to monitor a wide variety of phys-

iological signals such as heart rate, skin temperature

and galvanic skin response in an objective, continu-

ous, and affordable way. Furthermore, the recent rise

in popularity of these devices, combined with their ca-

pacity for continuous monitoring, suggests they could

play a key role in future healthcare (Baig et al., 2019)

by offering a scalable solution to stress detection that

bridges the gap left by traditional methods.

This study aims to investigate the potential for detecting stress using wearable devices, utilizing data from the public WESAD database. A secondary objective of the study is to identify which sensors provide the most critical information for stress detection, so that future experiments can be designed around the specific requirements of stress prediction.

Through this study, we aim not only to advance

the technical performance of stress detection models

but also to contribute to the broader goal of develop-

ing practical, scalable solutions for early stress detec-

tion, with the potential to mitigate the growing burden

of stress-related disorders on global health.

The paper is organized as follows. Section 2 reviews the related work on physiological signals and notable studies in stress detection. Section 3 describes the WESAD dataset utilized and the preprocessing steps taken. Section 4 outlines the methodology, including the subject-dependent models and preprocessing strategies. Section 5 presents the experiments and results, comparing the performance of various machine learning algorithms. Section 6 discusses the explainability of the models, highlighting the importance of feature relevance. Section 7 discusses the results. Finally, Section 8 concludes the study by summarizing the key findings and suggesting future research directions.

2 RELATED WORK

2.1 Physiological Signals for Stress Detection

The detection of stress through physiological sig-

nals has been extensively studied, leveraging various

types of data to assess stress levels accurately. One

of the primary physiological indicators of stress is

heart rate (HR). Stress typically triggers an increase

in HR due to heightened sympathetic nervous system

activity. This response is commonly monitored us-

ing electrocardiograms (ECG) and photoplethysmo-

grams (PPG). HR measurements are useful in iden-

tifying stress, but they need to be complemented by

additional metrics for a more comprehensive analysis

(Dinh et al., 2020).

Another critical parameter is heart rate variabil-

ity (HRV), which measures the variation in time be-

tween successive heartbeats. HRV is an essential indi-

cator of the autonomic nervous system’s responsive-

ness and adaptability. A reduction in HRV is gener-

ally associated with higher stress levels. This measure

is derived from the analysis of RR intervals in ECG

signals, offering valuable insights into an individual’s

stress state (Dinh et al., 2020).
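As a minimal illustration of how HRV can be derived from RR intervals, the sketch below computes two standard time-domain measures, SDNN and RMSSD. The function names and the RR values are hypothetical and chosen only for illustration; they are not taken from the study or from WESAD.

```python
import math

def sdnn(rr_ms):
    """Sample standard deviation of RR intervals, in milliseconds."""
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    return math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (n - 1))

def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences, in milliseconds."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical RR intervals (ms): the "stressed" series is faster and less variable.
relaxed = [800, 850, 790, 860, 810, 870]
stressed = [600, 605, 598, 602, 601, 599]

print("relaxed  SDNN/RMSSD:", sdnn(relaxed), rmssd(relaxed))
print("stressed SDNN/RMSSD:", sdnn(stressed), rmssd(stressed))
```

In this toy example both measures are lower for the second series, which is the pattern the paragraph above associates with higher stress.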

Galvanic skin response (GSR), also known as

electrodermal activity (EDA), is another widely used

physiological signal for stress detection. Stress in-

duces sweating, which changes the skin’s electrical

conductance. Monitoring GSR can provide signiﬁ-

cant information about stress levels, especially when

used in conjunction with ECG data. However, GSR

measurement can be inﬂuenced by various factors,

including ambient temperature and humidity, which

may affect its accuracy (Affanni, 2020) (Eren and

Navruz, 2022).

Blood pressure (BP) is also a relevant physiologi-

cal signal in stress research. Elevated BP can indicate

stress, although it may also be inﬂuenced by physical

exertion and other health conditions. Continuous BP

monitoring presents challenges, often requiring indi-

rect measurement techniques such as infrared photo-

plethysmography (Dinh et al., 2020). While BP data

can be informative, it does not always provide a clear

distinction between stress-induced and other types of

hypertension.

Pupil diameter (PD) has emerged as a promis-

ing measure of stress, as stress can cause rapid

ﬂuctuations in pupil size. Techniques like video-

pupillography are used to measure these changes, but

they are often costly and time-consuming, limiting

their practical application in real-time stress monitor-

ing (Dinh et al., 2020).


Respiration variability (RESP) is another physi-

ological parameter that reﬂects stress levels. Stress

can alter both the rate and depth of breathing, mak-

ing RESP measurements valuable for stress assess-

ment. Sensors that track thoracic expansion are used

to capture this variability, providing additional data

for stress detection (Dinh et al., 2020).

Accelerometers are commonly integrated into

wearable devices to monitor involuntary movements

such as tremors, which can correlate with stress lev-

els. These devices offer a practical approach to de-

tecting stress-related physical responses and are often

used in combination with other physiological mea-

sures to enhance detection accuracy (Dinh et al.,

2020).

2.2 Notable Studies

Several studies have made substantial contributions

to the ﬁeld. For instance, a study published by

IEEE in 2012 achieved an 81% accuracy rate in dis-

tinguishing between stressed and non-stressed states

using a wearable device that measured ECG, GSR,

electromyography (EMG), and respiratory frequency

(Can et al., 2020). This research focused on detecting

acute stress rather than chronic stress, inducing stress

in participants through psychophysiological tasks de-

signed to elicit speciﬁc mental states. The study in-

volved a relatively small sample of 20 participants,

including both men and women, monitored continu-

ously for over 13 hours. The logistic regression model

used for classiﬁcation demonstrated the potential of

wearable devices for continuous stress detection, al-

though the accuracy suggests a need for more sophis-

ticated models to enhance the precision.

Another notable study conducted by Bogazici

University and the University of Milan in 2020

achieved a 94.52% accuracy rate in classifying stress

levels using a hybrid artiﬁcial intelligence approach

(Can et al., 2020). This study also targeted acute

stress but included a wider range of stress levels, dif-

ferentiating between low, moderate, and high stress.

Stress was induced during a structured event compris-

ing baseline, lecture, exam, and recovery sessions,

allowing researchers to analyze how stress manage-

ment techniques, speciﬁcally guided mindfulness, af-

fected stress levels. The sample consisted of 32 par-

ticipants, with demographic details not extensively

discussed. The dataset was collected across various

sessions, enhancing the practical applicability of the

ﬁndings. The use of everyday wearable devices such

as smartwatches allowed for unobtrusive and continu-

ous monitoring, improving accuracy through person-

alized stress clustering and decision-level smoothing

techniques to correct misclassiﬁcations.

Additionally, research from the University of Vigo

investigated wearable devices for stress and sleep

monitoring, achieving a 90% accuracy rate with vari-

ous machine learning models (Dalmeida and Masala,

2021). This study focused on acute stress experi-

enced by 27 young, healthy participants while driv-

ing. The dataset included physiological signals mea-

sured during different driving conditions—rest, high-

way, and city driving—utilizing physiological sig-

nals to develop predictive models. Multiple machine

learning algorithms, including K-Nearest Neighbor

(KNN) and Support Vector Machines (SVM), were

tested, with SVM achieving the highest performance

at 83.33% accuracy. While this study provided in-

sights into real-world stress detection in driving sce-

narios, it faced challenges typical of real-life appli-

cations, such as variations in accuracy compared to

laboratory settings.

Despite these advancements, several challenges

remain in the ﬁeld of stress detection using physio-

logical signals (Dalmeida and Masala, 2021). One

signiﬁcant limitation is the precision and sensitivity

of wearable devices, which can vary widely and be in-

ﬂuenced by factors unrelated to stress. Additionally,

the cost of high-quality wearables can be prohibitive,

limiting their accessibility compared to clinical de-

vices. Finally, the speciﬁcity of different devices and

measurement techniques can lead to inconsistencies

in stress detection results, highlighting the need for

standardized approaches.

This review underscores the signiﬁcant progress

made in physiological signal analysis and machine

learning for stress detection. While advancements

continue to enhance the accuracy and practicality

of stress monitoring, ongoing research is needed to

address current limitations, such as generalizabil-

ity across different populations and the integration

of multi-modal physiological signals. Additionally,

while accuracy is an important metric, it should not

be the sole focus; other performance indicators like

recall and precision are essential for evaluating model

robustness in real-world applications.

3 DESCRIPTION OF THE DATASET

The dataset chosen for this study is the publicly avail-

able WESAD (Wearable Affect and Stress Detection)

dataset, designed for the analysis of acute stress re-

sponses rather than chronic stress. This dataset was

selected for its rich physiological data, making it suit-

able for studying short-term stress detection through


wearable devices.

The dataset includes data from 17 volunteers who

underwent stress induction procedures in a controlled

laboratory. After excluding 2 subjects due to data in-

terference, the ﬁnal dataset contains 15 participants:

12 men and 3 women, with an average age of 27.4

years.

It records physiological signals across three emo-

tional states: baseline, stress, and amusement. The

baseline phase lasted 20 minutes, followed by a 10-

minute Trier Social Stress Test (TSST) to induce

stress, and ﬁnally, 6 minutes of comical videos to

elicit amusement. Although amusement data is avail-

able, it is not used in this study, which focuses solely

on stress detection.

Each participant has approximately 36 minutes of data. Data was collected using the RespiBAN Professional chest band and the Empatica E4 smartband. The RespiBAN captures higher-quality data, sampling respiration, accelerometer, ECG, EDA, EMG, and temperature at 700 Hz. The Empatica E4, with lower sampling rates, recorded blood volume pulse (BVP), EDA, temperature, and accelerometer data. Due to numerous missing values in the Empatica E4 signals, only data from the RespiBAN is included in this study.

4 METHODOLOGY

This study examines whether stressful emotional states can be predicted from physiological signals, using two different configurations of subject-dependent models. The objective is to learn how a person's physiological signals can be used to detect their stress level.

4.1 Subject-Dependent Models

A subject-dependent approach utilizes data from the same individual for the training, validation, and testing phases of model creation. One advantage of this strategy is that it allows the model to become more personalized by learning the unique characteristics of each person. On the other hand, such a model might not generalize well when used to identify stress in a different individual.

In this study, the data was divided into training, validation, and test subsets while maintaining the temporal structure of the signal data. The training subset comprised the first 70% of the data, the validation subset the next 15%, and the test subset the final 15%.
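The temporal 70-15-15 partition can be sketched as below. This is a minimal illustration, assuming the samples are already in chronological order; the function name `temporal_split` and the toy data are hypothetical, not the authors' code.

```python
def temporal_split(samples, train_pct=70, val_pct=15):
    """Split a time-ordered sequence into train/validation/test without shuffling."""
    n = len(samples)
    train_end = n * train_pct // 100               # integer arithmetic avoids float rounding
    val_end = n * (train_pct + val_pct) // 100
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]

# Toy stand-in for 100 time-ordered samples of one recording.
windows = list(range(100))
train, val, test = temporal_split(windows)
print(len(train), len(val), len(test))
```

Because nothing is shuffled, every training sample precedes every validation sample, which in turn precedes every test sample.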

Figure 1: Data partitioning into subject-dependent models

with all participants.

Subject-dependent models were further divided

into two conﬁgurations:

• One Single Model for all Participants: In this

conﬁguration, data from all participants are used

in training, validation, and testing phases. This re-

sults in a model that attempts to generalize across

multiple individuals while maintaining the tempo-

ral structure of the signals (see Figure 1).

• One Personalized Model per Participant: In

this conﬁguration, a separate model is trained for

each individual participant. This allows for a

highly personalized approach, where the model

only learns from the speciﬁc individual’s data.

The same 70-15-15 temporal split is used for

training, validation, and testing, but exclusively

with the data from one subject at a time (see Fig-

ure 2).
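The two configurations can be sketched as follows. This is one plausible reading of the description above, assuming each participant's data is first split temporally and the single model then pools the resulting partitions; the subject identifiers, the helper name, and the toy data are hypothetical.

```python
def temporal_split(samples, train_pct=70, val_pct=15):
    """Chronological 70-15-15 split of one participant's samples."""
    n = len(samples)
    a = n * train_pct // 100
    b = n * (train_pct + val_pct) // 100
    return {"train": samples[:a], "val": samples[a:b], "test": samples[b:]}

# Hypothetical time-ordered samples per participant: (subject_id, time_index).
subjects = {sid: [(sid, t) for t in range(20)] for sid in ("S2", "S3", "S4")}

# Personalized configuration: one split (and later one model) per participant.
personalized = {sid: temporal_split(data) for sid, data in subjects.items()}

# Single-model configuration: pool every participant's partitions.
single = {
    part: [x for sid in subjects for x in personalized[sid][part]]
    for part in ("train", "val", "test")
}

print({part: len(rows) for part, rows in single.items()})
```

In both cases the test data comes from participants already seen during training, which is what makes both configurations subject-dependent.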

The main goal of this comparison is to assess

whether using data from multiple participants en-

hances model performance by providing a wider

range of variability, or if personalized models that fo-

cus on individual patterns yield better predictive ac-

curacy due to their speciﬁcity to one subject.

Figure 2: Data partitioning for personalized models.

4.2 Preprocessing Strategies

In addition, to optimize the performance of the subject-dependent models, this study evaluates the effect of three different preprocessing strategies:

• P1: Applies Min-Max normalization, which scales the data into the range [0, 1]. This process helps mitigate the impact of features with larger values by scaling all variables to a common range.

• P2: Normalizes the data as in P1, then applies SMOTE (Synthetic Minority Oversampling Technique) to address class imbalance by generating synthetic examples for the minority class.

• P3: Normalizes the data as in P1, then applies PCA (Principal Component Analysis) to reduce the dimensionality of the data and capture the most important features while discarding redundant information.
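The normalization and oversampling steps listed above can be sketched as follows. This is an illustrative simplification, not the authors' pipeline: scaling parameters are fitted on the training rows only (an assumption), and the oversampling helper interpolates towards the single nearest minority neighbour, whereas SMOTE proper picks among several nearest neighbours. PCA is not covered here. All names and values are hypothetical.

```python
import random

def fit_min_max(rows):
    """Per-feature minimum and maximum, computed on the training rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_min_max(rows, mins, maxs):
    """Scale each feature to [0, 1]; constant features map to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

def interpolate_minority(minority, n_new, rng):
    """SMOTE-style sketch: new points between a minority sample and its nearest minority neighbour."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbour = min(others, key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical training rows with two features on different scales.
train = [[60.0, 2.0], [80.0, 4.0], [100.0, 6.0]]
mins, maxs = fit_min_max(train)
scaled = apply_min_max(train, mins, maxs)

# Treat the first two scaled rows as the minority class for illustration.
extra = interpolate_minority(scaled[:2], n_new=3, rng=random.Random(0))
print(scaled)
print(extra)
```

Synthetic points always lie on the segment between two existing minority samples, so they stay inside the range already covered by that class.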

4.3 Evaluation Metrics

The evaluation of the models was performed using

two key metrics: accuracy and F1-score.

• Accuracy is defined as the proportion of correctly classified cases out of the total. It is commonly used when all classes are equally important.

• F1-score is the harmonic mean of precision and recall. It is especially useful when classes are imbalanced and when false negatives are costly, as in medical applications. Precision represents the proportion of true positive predictions out of all positive predictions, while recall (or sensitivity) represents the proportion of true positives out of the actual positive cases.
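The definitions above translate directly into code. The sketch below treats stress as the positive class (label 1); the source does not state whether the reported F1-scores were averaged across classes, so this choice is an assumption, and the labels are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 1 = stress, 0 = baseline.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here accuracy is 6/8 while precision, recall, and F1 are each 2/3, showing how accuracy can look better than F1 when the negative class dominates.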

The results for each combination of model and

preprocessing technique will be presented below.

5 EXPERIMENTS AND RESULTS

5.1 Algorithms Used

Machine learning includes a wide range of algorithms for classifying new instances. This study aims to compare the effectiveness of several of them. The machine learning algorithms evaluated are Decision Trees, Random Forests, Support Vector Machines (SVM), Adaboost, Logistic Regression, XGBoost, Linear Discriminant Analysis (LDA), and K-Nearest Neighbours. In addition, the performance of two deep learning algorithms, LSTM Recurrent Neural Networks and Convolutional Neural Networks (CNN), is evaluated (see Table 1).

5.2 Results

The goal is to provide a comparative analysis to identify which algorithms offer the best performance in terms of accuracy and F1-score for this dataset. Table 1 supports the selection of the algorithm that makes the most accurate predictions about the stress state of patients. It is worth noting that the study focuses on binary classification (stress and relaxation).

The AI models were trained under the two model configurations and yielded the following metrics:

• One Single Model for all Participants: The best-performing algorithm was the Convolutional Neural Network (CNN), which achieved an accuracy of 99.8% and an F1-score of 0.998 with preprocessing P1. As Table 1 shows, this is the highest result among both the deep learning and the classical machine learning algorithms.

• One Personalized Model per Participant: Personalized models were built for each of the 15 subjects, and the average performance across these models was calculated. CNN was again the top performer in binary classification: with P1 it achieved an accuracy of 96.4% and an F1-score of 0.962, and with P3 it reached the highest personalized result in Table 1, an accuracy of 99.6% and an F1-score of 0.996.

In terms of overall performance, models trained with data from all subjects (general models) tended to outperform personalized models. This suggests that a more diverse dataset improves the model's ability to generalize and classify new instances more effectively. While CNN performed exceptionally well (an accuracy of 99.8% and an F1-score of 0.998 with P1), the results indicate that tree-based models like Random Forest may not yield the same level of accuracy in this binary context. Moreover, deep learning models such as CNN appeared to benefit more from personalization and appropriate preprocessing, reinforcing the notion that an individualized approach can be advantageous when tailored correctly.

Figure 3: Feature Importance.

6 EXPLAINABILITY

The analysis of explainability is conducted for the

general model designed using Convolutional Neu-

ral Networks (CNN). As the integration of machine

learning applications in society increases, the explain-

ability of predictive models is becoming an essential

aspect. Explainability provides transparency in model

decisions, which is crucial in the ﬁeld of medicine.

In this context, explainability can help identify the

most relevant variables for predicting stress, thereby

enabling the design of more effective and personal-

ized interventions for its management and reduction.

To evaluate the explainability of the obtained model,

the importance of features and SHapley Additive ex-

Planations (SHAP) values are assessed.

Feature importance assigns a score to each fea-

ture, indicating its relevance in model construction.

Features with higher scores are considered more im-

portant. The results of the feature importance analy-

sis are presented below. Figure 3 illustrates that the

most important features are derived from electrocar-

diogram and electrodermal activity sensors. This pro-

vides insight into which variables are most affected by

stressful situations, indicating which sensors are most

useful as biomarkers.

On the other hand, SHAP values offer a method for explaining a predictive model's response based on game theory. They measure how much each variable contributes to the prediction of a given observation, allowing for a more detailed and precise interpretation of how individual features affect predictions. One advantage of SHAP values is that they indicate whether each variable has a positive or negative impact on predictions based on its values. Another benefit is that SHAP values enable local interpretability; that is, one can arbitrarily select an instance to examine which factors were most relevant in predicting that specific case.

Figure 4: SHAP Values.
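To make the game-theoretic idea concrete, the sketch below computes exact Shapley values for a single instance of a toy three-feature scoring function by averaging each feature's marginal contribution over all feature orderings. The scoring function, feature names, and baseline are hypothetical; SHAP libraries approximate these values for real models rather than enumerating orderings, and this is not the procedure applied to the CNN in the study.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)          # start from the reference input
        previous = predict(current)
        for i in order:
            current[i] = instance[i]      # reveal feature i
            updated = predict(current)
            totals[i] += updated - previous
            previous = updated
    return [t / len(orderings) for t in totals]

def toy_score(x):
    """Hypothetical stress score with an interaction; the third feature is ignored."""
    hr_mean, eda_mean, acc_mean = x
    return 2 * hr_mean + 3 * eda_mean + hr_mean * eda_mean

instance = [1.0, 2.0, 5.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, instance, baseline)
print(phi)
```

The contributions sum to the difference between the score of the instance and the score of the baseline, and a feature the model ignores receives a value of zero, which mirrors the reading of values close to 0 as low influence.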

Figure 4 displays the SHAP values for all predic-

tions from the last fold of cross-validation. A high

SHAP value indicates that the variable signiﬁcantly

impacts the model’s prediction, while values close to

0 reveal that the variable has little inﬂuence on the re-

sults. Analyzing this image shows that the most inﬂu-

ential variables are the mean and standard deviation

of heart rate, the mean of accelerometer readings, and

the mean, range, maximum, and standard deviation

of electrodermal activity (EDA). Notably, while both

methods indicate the same ﬁve main variables for pre-

diction, the feature importance analysis does not in-

clude the mean of accelerometer readings (a mean),

which does appear in the SHAP values. This discrep-

ancy may reﬂect differences in how each method eval-

uates feature relevance.

Considering the results from both explainability

techniques, it can be concluded that the most relevant

sensors for stress detection are the ECG, EDA, and

accelerometer.

7 RESULTS DISCUSSION

After analyzing stress detection using the WESAD dataset, several significant conclusions were drawn. The initial step involved exploring the data to understand its distribution and the potential relevance of each sensor in the prediction. Subsequently, the time series data was preprocessed and transformed into a tabular format using the sliding window technique, which facilitated the extraction of features.

Table 1: Comparison of Classification Models.

Subject-dependent models

                      P1                  P2                  P3
Algorithm             Accuracy  F1-score  Accuracy  F1-score  Accuracy  F1-score
Decision trees        0.922     0.922     0.934     0.933     0.935     0.935
Random Forest         0.993     0.993     0.996     0.996     0.996     0.996
SVM                   0.973     0.973     0.973     0.973     0.974     0.974
Adaboost              0.990     0.990     0.987     0.987     0.919     0.919
Logistic Regression   0.893     0.892     0.885     0.885     0.859     0.859
XGBoost               0.949     0.949     0.943     0.943     0.963     0.963
LDA                   0.878     0.876     0.849     0.849     0.865     0.862
KNN                   0.973     0.973     0.973     0.973     0.993     0.993
LSTM                  0.965     0.965     0.953     0.953     0.945     0.945
CNN                   0.998     0.998     0.983     0.983     0.979     0.979

Personalized models

                      P1                  P2                  P3
Algorithm             Accuracy  F1-score  Accuracy  F1-score  Accuracy  F1-score
Decision trees        0.899     0.889     0.760     0.708     0.944     0.933
Random Forest         0.985     0.985     0.978     0.978     0.906     0.958
SVM                   0.979     0.980     0.959     0.957     0.962     0.962
Adaboost              0.932     0.933     0.760     0.708     0.958     0.949
Logistic Regression   0.977     0.975     0.882     0.875     0.945     0.935
XGBoost               0.912     0.898     0.971     0.971     0.916     0.903
LDA                   0.978     0.979     0.959     0.959     0.956     0.954
KNN                   0.965     0.962     0.956     0.956     0.963     0.960
LSTM                  0.916     0.911     0.946     0.941     0.925     0.921
CNN                   0.964     0.962     0.949     0.947     0.996     0.996

The data was then divided into subject-dependent

conﬁgurations, and various machine learning algo-

rithms were applied to determine the most effective

one. Regarding the initial hypothesis of the project, it

can be stated that it is feasible to develop a stress pre-

diction model using information collected from wear-

able devices. After analyzing several machine learn-

ing algorithms, the one offering the best results was

selected for both subject-dependent models. In the

subject-dependent models, the Convolutional Neural

Network (CNN) achieved an accuracy of 99.8% for

binary classiﬁcation.

The analysis of subject-dependent conﬁgurations

revealed that a dataset with more users generally

yields better results than personalized models. This

indicates that, despite belonging to different subjects,

the inclusion of a larger volume of data provides

generalizable information, improving the accuracy of

predictions. Notably, these general models can be

likened to a laboratory setting, where diverse partici-

pants contribute to a richer dataset, similar to a foot-

ball team training together. In contrast, personalized

models—tailored for individual subjects—provide a

more precise approach, akin to customized care plans

in primary care settings. This understanding of indi-

vidual variability enhances the effectiveness of inter-

ventions.

Additionally, it was observed that the treatment

of data imbalance did not signiﬁcantly inﬂuence

the results as anticipated; in many cases, the best-

performing model was the one that did not apply

SMOTE. In the subject-dependent models, the im-

portance of selecting the appropriate window size for

feature extraction was highlighted. The best results

were obtained with 4-minute windows and a step size

of 1 second, which are relatively wide for this type

of case. This allows for a more comprehensive view

of the time series, capturing the global characteristics

of each class and reducing signal noise. However, a

large window size may overlook important details in

different types of signals and can increase the compu-

tational complexity of processing. The optimization

phase also concluded that, despite having very differ-

ent characteristics, a higher number of subjects in the

study improves model performance.
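The sliding window technique and the effect of window and step size can be sketched as below. This is a minimal illustration on a toy signal; the feature set (mean, standard deviation, maximum, range) follows the statistics named in the explainability section, while the function names, the toy sampling rate, and the signal itself are hypothetical.

```python
import statistics

def sliding_window_features(signal, fs_hz, window_s, step_s):
    """Turn a 1-D signal into one row of summary features per window."""
    win = int(window_s * fs_hz)    # samples per window
    step = int(step_s * fs_hz)     # samples between window starts
    rows = []
    for start in range(0, len(signal) - win + 1, step):
        chunk = signal[start:start + win]
        rows.append({
            "mean": statistics.fmean(chunk),
            "std": statistics.pstdev(chunk),
            "max": max(chunk),
            "range": max(chunk) - min(chunk),
        })
    return rows

def n_windows(duration_s, window_s, step_s):
    """Number of full windows that fit in a recording of the given length."""
    return (duration_s - window_s) // step_s + 1

# Toy signal: 10 s sampled at 4 Hz, 4 s windows, 1 s step.
signal = [float(i % 8) for i in range(40)]
rows = sliding_window_features(signal, fs_hz=4, window_s=4, step_s=1)
print(len(rows), rows[0])

# With the reported 4-minute window and 1-second step, a 20-minute phase gives:
print(n_windows(20 * 60, 4 * 60, 1))
```

With a 240-second window advanced by 1 second, consecutive windows share 239 seconds of signal, which illustrates why wide windows smooth out noise while also raising the processing cost noted above.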

Finally, the study of explainability provided in-

sights into the relevance of each variable in the model.

Two methods were employed: feature importance and

SHAP values. The results from both methods indi-


cated that the most impactful variables for prediction

were those derived from the ECG, accelerometer, and

EDA sensors.

8 CONCLUSIONS

This study confirms the feasibility of developing an effective stress prediction model using information collected from wearable devices. The findings underscore the importance of leveraging diverse datasets to enhance predictive accuracy, as demonstrated by the Convolutional Neural Network achieving an accuracy of 99.8% in a binary classification that distinguishes relaxed from stressed situations with the single model trained on all participants. By building AI models on the WESAD dataset, we learned that ECG and EDA signals provide the most valuable information to predict stress. The results obtained in this research work will be used in an observational study that will build a new dataset to predict the stress suffered by Vicomtech professionals.

The potential applications of these models extend

to real-world settings, where early stress detection

can lead to timely interventions and improved mental

health outcomes. Future research could focus on op-

timizing personalized models for individual subjects

and exploring the integration of additional physiolog-

ical data from commercial wearables to advance early

stress detection.

ACKNOWLEDGMENT

We would like to acknowledge QOLIFE - Multimodal real world capture and processing for quality of life assessment project, funded as an internal project of Fundación Vicomtech.

REFERENCES

Affanni, A. (2020). Wireless sensors system for stress de-

tection by means of ecg and eda acquisition. Sensors,

20(7):2026.

American Psychological Association (2017). Stress in

america: The state of our nation. Technical report,

American Psychological Association.

Baig, M. M., Aﬁﬁ, S., GholamHosseini, H., and Mirza, F.

(2019). A systematic review of wearable sensors and

iot-based monitoring applications for older adults–a

focus on ageing population and independent living.

Journal of medical systems, 43:1–11.

Can, Y. S., Chalabianloo, N., Ekiz, D., Fernandez-Alvarez,

J., Riva, G., and Ersoy, C. (2020). Personal stress-

level clustering and decision-level smoothing to en-

hance the performance of ambulatory stress detection

with smartwatches. IEEE Access, 8:38146–38163.

Dalmeida, K. M. and Masala, G. L. (2021). Hrv features as

viable physiological markers for stress detection using

wearable devices. Sensors, 21(8):2873.

Depression, W. (2017). Other common mental disorders:

global health estimates. Geneva: World Health Orga-

nization, 24(1).

Dinh, T., Nguyen, T., Phan, H.-P., Nguyen, N.-T., Dao,

D. V., and Bell, J. (2020). Stretchable respiration

sensors: Advanced designs and multifunctional plat-

forms for wearable physiological monitoring. Biosen-

sors and Bioelectronics, 166:112460.

Eren, E. and Navruz, T. S. (2022). Stress detection with

deep learning using bvp and eda signals. In 2022

International Congress on Human-Computer Interac-

tion, Optimization and Robotic Applications (HORA),

pages 1–7. IEEE.

Espeleta, H. C., Brett, E. I., Ridings, L. E., Leavens, E. L.,

and Mullins, L. L. (2018). Childhood adversity and

adult health-risk behaviors: Examining the roles of

emotion dysregulation and urgency. Child Abuse &

Neglect, 82:92–101.

Gallup (2017). State of the american workplace: Employee

engagement insights for u.s. business leaders. Techni-

cal report, Gallup.

Giorgi, G., Lecca, L. I., Alessio, F., Finstad, G. L., Bon-

danini, G., Lulli, L. G., Arcangeli, G., and Mucci,

N. (2020). Covid-19-related mental health effects

in the workplace: a narrative review. International

journal of environmental research and public health,

17(21):7857.

Kivimäki, M. and Steptoe, A. (2018). Effects of stress on the development and progression of cardiovascular disease. Nature Reviews Cardiology, 15(4):215–229.

Lupton, D. (2020). Wearable devices: Sociotechnical imag-

inaries and agential capacities.

Moise, N., Wainberg, M., and Shah, R. N. (2021). Primary

care and mental health: Where do we go from here?

World Journal of Psychiatry, 11(7):271–276.

Patel, V., Saxena, S., Lund, C., Thornicroft, G., Baingana,

F., Bolton, P., Chisholm, D., Collins, P. Y., Cooper,

J. L., Eaton, J., et al. (2018). The lancet commission

on global mental health and sustainable development.

The lancet, 392(10157):1553–1598.

Slavich, G. M. (2020). Social safety theory: a biologically

based evolutionary perspective on life stress, health,

and behavior. Annual review of clinical psychology,

16(1):265–295.

World Health Organization (2018). Mental health: strength-

ening our response. Technical report, World Health

Organization.
