
ACKNOWLEDGEMENTS
We would like to thank the AI SoundLab team for their efforts to deploy this platform on the ship. In addition, we thank Martin Hämmerle for his support with the user experience design; he provided clear and graspable information material for the ship's crew. Further, we thank all the participants who donated their speech samples for this study, in particular the two captains of the ship, who greatly supported the data collection.
REFERENCES
Alberdi, A., Aztiria, A., and Basarab, A. (2016). Towards
an automatic early stress recognition system for office
environments based on multimodal measurements: A
review. Journal of Biomedical Informatics, 59:49–75.
Amiriparian, S., Christ, L., Kathan, A., Gerczuk, M., Müller, N., Klug, S., Stappen, L., König, A., Cambria, E., Schuller, B., et al. (2024). The MuSe 2024 multimodal sentiment analysis challenge: Social perception and humor recognition. arXiv preprint arXiv:2406.07753.
Baevski, A., Zhou, Y., Mohamed, A., and Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in Neural Information Processing Systems, 33:12449–12460.
Baird, A., Triantafyllopoulos, A., Zänkert, S., Ottl, S., Christ, L., Stappen, L., Konzok, J., Sturmbauer, S., Meßner, E.-M., Kudielka, B. M., Rohleder, N., Baumeister, H., and Schuller, B. W. (2021). An evaluation of speech-based recognition of emotional and physiological markers of stress. Front. Comput. Sci., 3:750284.
Barré, R., Brunel, G., Barthet, P., and Laurencin-Dalicieux, S. (2017). The visual analogue scale: An easy and reliable way of assessing perceived stress. Quality in Primary Health Care, 1(1).
Brooks, S. K. and Greenberg, N. (2022). Mental health and
psychological wellbeing of maritime personnel: a sys-
tematic review. BMC Psychol, 10(1):139.
Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794.
Christ, L., Amiriparian, S., Baird, A., Kathan, A., Müller, N., Klug, S., Gagne, C., Tzirakis, P., Stappen, L., Meßner, E.-M., König, A., Cowen, A., Cambria, E., and Schuller, B. W. (2023). The MuSe 2023 multimodal sentiment analysis challenge: Mimicked emotions, cross-cultural humour, and personalisation. In Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, Humour and Personalisation, pages 1–10. Association for Computing Machinery.
Cohen, S. (1988). Perceived stress in a probability sample of the United States. In The Social Psychology of Health. Sage.
Cohen, S., Kamarck, T., and Mermelstein, R. (1983). A global measure of perceived stress. Journal of Health and Social Behavior, pages 385–396.
Conneau, A., Baevski, A., Collobert, R., Mohamed, A., and Auli, M. (2020). Unsupervised cross-lingual representation learning for speech recognition. arXiv preprint arXiv:2006.13979.
Défossez, A., Synnaeve, G., and Adi, Y. (2020). Real time speech enhancement in the waveform domain. In Proc. Interspeech 2020, pages 3291–3295.
Eyben, F., Scherer, K. R., Schuller, B. W., Sundberg,
J., Andre, E., Busso, C., Devillers, L. Y., Epps, J.,
Laukka, P., Narayanan, S. S., and Truong, K. P.
(2016). The geneva minimalistic acoustic parameter
set (GeMAPS) for voice research and affective com-
puting. IEEE Trans. Affective Comput., 7(2):190–202.
Eyben, F., Weninger, F., Gross, F., and Schuller, B. (2013). Recent developments in openSMILE, the Munich open-source multimedia feature extractor. In Proceedings of the 21st ACM International Conference on Multimedia, MM '13, pages 835–838, New York, NY, USA. Association for Computing Machinery.
Ferrer, L. and Riera, P. (2023). Confidence intervals for
evaluation in machine learning. Accessed: 2024-02-
27.
Giddens, C. L., Barron, K. W., Byrd-Craven, J., Clark, K. F.,
and Winter, A. S. (2013). Vocal indices of stress: A
review. Journal of Voice, 27(3):390.e21–390.e29.
Hecker, P., Steckhan, N., Eyben, F., Schuller, B. W., and
Arnrich, B. (2022). Voice analysis for neurologi-
cal disorder recognition–a systematic review and per-
spective on emerging trends. Front. Digit. Health,
4:842301.
Higuchi, M., Nakamura, M., Shinohara, S., Omiya, Y.,
Takano, T., Mitsuyoshi, S., and Tokuno, S. (2020). Ef-
fectiveness of a voice-based mental health evaluation
system for mobile devices: Prospective study. JMIR
Form Res, 4(7):e16455.
Hsu, W.-N., Sriram, A., Baevski, A., Likhomanenko, T.,
Xu, Q., Pratap, V., Kahn, J., Lee, A., Collobert, R.,
Synnaeve, G., et al. (2021). Robust wav2vec 2.0: An-
alyzing domain shift in self-supervised pre-training.
arXiv preprint arXiv:2104.01027.
Jiang, L., Gao, B., Gu, J., Chen, Y., Gao, Z., Ma, X.,
Kendrick, K. M., and Woo, W. L. (2019). Wearable
long-term social sensing for mental wellbeing. IEEE
Sensors Journal, 19(19):8532–8542.
Joinson, A. N., Reips, U.-D., Buchanan, T., and Schofield,
C. B. P. (2010). Privacy, trust, and self-disclosure on-
line. Human–Computer Interaction, 25(1):1–24.
Kroenke, K. and Spitzer, R. L. (2002). The PHQ-9: A new depression diagnostic and severity measure.
Laugwitz, B., Held, T., and Schrepp, M. (2008). Construc-
tion and evaluation of a user experience questionnaire.
In Holzinger, A., editor, HCI and Usability for Educa-
tion and Work, volume 5298, pages 63–76. Springer
Berlin Heidelberg.