functionalities include user-generated tags, shared with other users for a more customized filtering experience; automatic retrieval of full-song lyrics from an available API, e.g., Genius (a minimal retrieval sketch is shown below); and Music Emotion Variation Detection (MEVD) prediction support, including visualization with the same color code used in the plot. A standalone desktop application, without the cross-user features, is also planned.
In addition to implementing these upcoming features, we plan to conduct in-depth user experience studies to gain a more comprehensive understanding of the system's efficacy and user satisfaction.
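As a rough illustration of the planned lyrics feature, the sketch below fetches full-song lyrics through the community lyricsgenius wrapper around the Genius API; the access token and song metadata are placeholders, and this is only one possible integration, not the system's actual implementation.

```python
# Minimal sketch: fetching full-song lyrics via the Genius API using the
# community lyricsgenius wrapper (an assumed choice; any lyrics API would do).
import lyricsgenius

# Placeholder token; a real Genius API client access token is required.
genius = lyricsgenius.Genius("YOUR_GENIUS_ACCESS_TOKEN",
                             remove_section_headers=True)

def fetch_lyrics(title: str, artist: str) -> str | None:
    """Return the full lyrics for a track, or None if it cannot be found."""
    song = genius.search_song(title, artist)
    return song.lyrics if song is not None else None
```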
Validation experiments on two recently proposed datasets were presented alongside a thorough system description, together with insights into the obtained results. These results are still below those of the categorical approach presented in Louro et al. (2024b), owing to the semi-automatic AV mapping approach already discussed in Section 3.4. Despite this, the predictions are a good starting point to be further adjusted to the user's perception, as sketched below.
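The following sketch illustrates this adjustment mechanism in its simplest form: the model's arousal-valence (AV) prediction seeds each track, and a user's manual correction overrides it for that user only. All names are hypothetical, chosen for illustration rather than taken from the system's codebase.

```python
# Minimal sketch of per-user personalization over AV predictions.
from dataclasses import dataclass, field

@dataclass
class AVPoint:
    valence: float  # assumed to lie in [-1, 1]
    arousal: float  # assumed to lie in [-1, 1]

@dataclass
class UserLibrary:
    predictions: dict[str, AVPoint]                  # model output per track id
    overrides: dict[str, AVPoint] = field(default_factory=dict)

    def annotate(self, track_id: str, valence: float, arousal: float) -> None:
        """Record the user's own perception of the track's emotion."""
        self.overrides[track_id] = AVPoint(valence, arousal)

    def av(self, track_id: str) -> AVPoint:
        """The user's override wins; otherwise fall back to the prediction."""
        return self.overrides.get(track_id, self.predictions[track_id])
```

Logging such corrections across users is also what would enable the annotation-gathering scenario discussed in the conclusion below.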
Regarding the model itself, neither feature learning component may be ideal for the problem at hand, since both were originally developed for a categorical problem. Developing more suitable architectures should thus be considered future work. Furthermore, the data representations, especially the word embeddings, may also be improved, considering that the pre-trained model used is limited to a context window of 512 tokens.
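One possible direction for handling the context limit is sketched below: split the lyrics into overlapping token windows, embed each window with the pre-trained model, and mean-pool the window embeddings. The model name (bert-base-uncased) and the window and stride values are illustrative assumptions, not the system's actual configuration.

```python
# Sketch: embedding lyrics longer than a 512-token context window by
# mean-pooling over overlapping windows (assumed approach, not the paper's).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_long_text(text: str, window: int = 510, stride: int = 255) -> torch.Tensor:
    """Mean-pooled embedding of arbitrarily long text via overlapping windows."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    vecs = []
    with torch.no_grad():
        # Overlapping windows: stride < window, so every token is covered.
        for s in range(0, max(len(ids), 1), stride):
            chunk = ids[s:s + window]
            # Re-add the special tokens each window expects.
            inp = torch.tensor([[tokenizer.cls_token_id] + chunk
                                + [tokenizer.sep_token_id]])
            out = model(input_ids=inp).last_hidden_state  # (1, seq_len, 768)
            vecs.append(out.mean(dim=1))                   # pool over tokens
    return torch.cat(vecs).mean(dim=0)                     # pool over windows
```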
To conclude, we believe the proposed app can be useful for music listeners. Although there is room for improvement, as the attained classification results show, the personalization mechanism is a useful feature for handling prediction errors and subjectivity. Finally, the personalization feature and the multi-user environment have the potential to gather quality user annotations, leading to a larger and more robust MER dataset in the future.
ACKNOWLEDGEMENTS
This work is funded by FCT - Foundation for Sci-
ence and Technology, I.P., within the scope of
the projects: MERGE - DOI: 10.54499/PTDC/CCI-
COM/3171/2021 financed with national funds (PID-
DAC) via the Portuguese State Budget; and project
CISUC - UID/CEC/00326/2020 with funds from the
European Social Fund, through the Regional Opera-
tional Program Centro 2020. Renato Panda was sup-
ported by Ci2 - FCT UIDP/05567/2020.
We thank all reviewers for their valuable suggestions, which helped improve the article.
REFERENCES
Cardoso, L., Panda, R., and Paiva, R. P. (2011). Moodetector: A prototype software tool for mood-based playlist generation. In Simpósio de Informática - INForum 2011, Coimbra, Portugal.
Delbouys, R., Hennequin, R., Piccoli, F., Royo-Letelier, J.,
and Moussallam, M. (2018). Music Mood Detection
Based On Audio And Lyrics With Deep Neural Net.
In Proceedings of the 19th International Society for
Music Information Retrieval Conference, pages 370–
375, Paris, France.
Hu, X., Downie, J. S., Laurier, C., Bay, M., and Ehmann,
A. F. (2008). The 2007 Mirex Audio Mood Classifi-
cation Task: Lessons Learned. In Proceedings of the
9th International Society for Music Information Re-
trieval Conference, pages 462–467, Drexel University,
Philadelphia, Pennsylvania, USA.
Louro, P. L., Redinho, H., Malheiro, R., Paiva, R. P., and
Panda, R. (2024a). A Comparison Study of Deep
Learning Methodologies for Music Emotion Recog-
nition. Sensors, 24(7):2201.
Louro, P. L., Redinho, H., Santos, R., Malheiro, R., Panda,
R., and Paiva, R. P. (2024b). MERGE – A Bimodal
Dataset for Static Music Emotion Recognition.
McFee, B., Raffel, C., Liang, D., Ellis, D., McVicar, M.,
Battenberg, E., and Nieto, O. (2015). Librosa: Audio
and Music Signal Analysis in Python. In Python in
Science Conference, pages 18–24, Austin, Texas.
O’Malley, T., Bursztein, E., Long, J., Chollet, F., Jin, H.,
Invernizzi, L., et al. (2019). Keras Tuner. https://github.com/keras-team/keras-tuner.
Panda, R., Malheiro, R., and Paiva, R. P. (2020). Novel
Audio Features for Music Emotion Recognition. IEEE
Transactions on Affective Computing, 11(4):614–626.
Pyrovolakis, K., Tzouveli, P., and Stamou, G. (2022).
Multi-Modal Song Mood Detection with Deep Learn-
ing. Sensors, 22(3):1065.
Russell, J. A. (1980). A circumplex model of affect. Journal
of Personality and Social Psychology, 39(6):1161–
1178.
Warriner, A. B., Kuperman, V., and Brysbaert, M.
(2013). Norms of valence, arousal, and dominance for
13,915 English lemmas. Behavior Research Methods,
45(4):1191–1207.