
5 CONCLUSION
Due to the minimal differences in the data ranges and the rapid variations between them, distinguishing their characteristics with basic statistics, such as the mean and standard deviation estimated by bootstrapping, is challenging: even after applying the FDR correction, this approach remains unstable. The distribution density of the data is somewhat more robust; however, because of the narrow range of the data, it struggles to differentiate depressed from healthy recordings at the IMF level, especially when the standard deviation is used as the statistic.
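As a minimal sketch of the kind of comparison described above (not the paper's actual pipeline), the snippet below applies a resampling test to per-subject IMF statistics and a Benjamini-Hochberg FDR correction across IMFs. The group sizes, the synthetic values, and the use of a permutation-style resampling in place of the study's exact bootstrap procedure are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-subject IMF statistics for each group; in the study
# these would come from the EMD of each voice recording.
depressed = rng.normal(0.50, 0.08, size=40)  # synthetic stand-in
healthy = rng.normal(0.55, 0.08, size=40)    # synthetic stand-in

def bootstrap_pvalue(a, b, stat=np.mean, n_boot=5000):
    """Two-sided resampling p-value for a difference in a statistic."""
    observed = stat(a) - stat(b)
    pooled = np.concatenate([a, b])
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        resample = rng.permutation(pooled)  # null: groups exchangeable
        diffs[i] = stat(resample[:len(a)]) - stat(resample[len(a):])
    return float(np.mean(np.abs(diffs) >= abs(observed)))

def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg FDR: returns a boolean 'reject' mask."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    m = len(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank still passing
        reject[order[:k + 1]] = True
    return reject

# One p-value per IMF (five hypothetical IMFs, illustratively shifted).
pvals = [bootstrap_pvalue(depressed + 0.01 * k, healthy) for k in range(5)]
print(fdr_bh(pvals))
```

The instability noted in the text shows up here as p-values that sit near the decision boundary, so the FDR mask can flip between resampling runs.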
The Gaussian kernel, while neither complex nor computationally expensive, distinguishes these characteristics better because it accounts for the variance and performs local weighting through the filter. It again highlights the first IMFs as the most relevant for differentiating depressed from healthy speech.
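A minimal sketch of this idea, under stated assumptions: the synthetic IMF amplitudes, the `scipy.stats.gaussian_kde` estimator, and the L1 separability score below are illustrative choices, not the paper's implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Synthetic per-frame amplitudes of a first IMF for each group;
# stand-ins for the EMD output analyzed in the study.
imf_depressed = rng.normal(0.48, 0.05, 1000)
imf_healthy = rng.normal(0.55, 0.07, 1000)

# Each sample contributes a Gaussian bump whose width is derived from
# the data's variance, so the estimate performs the local weighting
# that plain histograms of narrow-range data lack.
kde_d = gaussian_kde(imf_depressed)
kde_h = gaussian_kde(imf_healthy)

grid = np.linspace(0.2, 0.9, 500)
step = grid[1] - grid[0]

# L1 distance between the smoothed densities (0 = identical,
# 2 = fully disjoint) as a simple per-IMF separability score.
separation = np.abs(kde_d(grid) - kde_h(grid)).sum() * step
print(f"separation score: {separation:.3f}")
```

Repeating the score for each IMF would reproduce the qualitative finding above, with the first IMFs yielding the largest separation.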
We believe that future research should focus on designing a more specific kernel that accounts for the narrow range and rapid fluctuations of voice data. Additionally, incorporating a larger database with samples from both genders would make it possible to analyze whether gender differences exist and whether they overlap with the differences between depressed and healthy speech. In any case, identifying depression through voice is a promising field in which, as we have seen, a diagnostic differentiation method can likely be established, with potential use in the digital screening of depression.
Although it is not the aim of this study, we believe it would be interesting to compare the Gaussian analysis of IMFs with machine learning methods, or to incorporate its results into such models to test prediction approaches. Furthermore, integrating this type of diagnosis into the clinical setting would be crucial, allowing the real-time assessment of patients with depression in hospital environments. Ethical applicability must also be considered: the approach involves collecting and analyzing patients' voices, which requires their consent for the collection and handling of their voice data.
ACKNOWLEDGEMENTS
X.S.C. carried out this work as part of the PhD
programme in Experimental Sciences and Technol-
ogy at the University of Vic - Central University
of Catalonia. We would like to thank the Univer-
sity of Southern California for providing voice data
and questionnaire information, without which this re-
search would not have been possible. We finally
thank the support of the Spanish Ministry of Sci-
ence and Innovation/ISCIII/FEDER (PI21/01148));
the Secretaria d’Universitats i Recerca del Departa-
ment d’Economia i Coneixement of the Generalitat de
Catalunya (2021 SGR 01431); the CERCA program
of the I3PT; the Instituto de Salud Carlos III; and the
CIBER of Mental Health (CIBERSAM).
CONFLICT OF INTEREST
D.P. has received grants and also served as a consul-
tant or advisor for Rovi, Angelini, Janssen and Lund-
beck, with no financial or other relationship relevant
to the subject of this article. The other authors declare
no conflicts of interest.