Predicting a Song Title from Audio Embeddings on a Pretrained Image-captioning Network
Avi Bleiweiss
2020
Abstract
Finding the name of a song from a piece played without the lyrics remains a long-standing challenge for music recognition services. In this work, we propose a neural architecture that combines deep learned image features and sequence modeling to automate the task of predicting the song title from an audio time series. To feed our network with a visual representation, we transform the sound signal into a two-dimensional spectrogram. Our novelty lies in training the model on the state-of-the-art Conceptual Captions dataset to generate image descriptions, jointly with inference on the Million Song and Free Music Archive test sets to produce song titles. We present an extensive quantitative analysis of our experiments and show that, using k-beam search, our model achieved an out-of-domain BLEU score of 45.1, compared to an in-domain score of 61.3.
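To make the spectrogram step concrete, the following is a minimal sketch of turning a one-dimensional audio time series into a two-dimensional, image-like array. It is not the paper's pipeline: the abstract does not state the transform or its parameters, so the short-time Fourier transform, the Hann window, the log scaling, the function name `log_spectrogram`, and the values of `frame_len`, `hop`, and `sample_rate` are all assumptions chosen for illustration.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Return a (freq_bins, frames) log-magnitude spectrogram.

    Hypothetical helper: frame_len and hop are illustrative values,
    not parameters taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the real FFT of each frame: one spectrum per time step.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression, then transpose so rows are frequency and columns are time.
    return np.log1p(magnitude).T


# Synthetic one-second 440 Hz tone standing in for a song excerpt (assumed input).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = log_spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 256  # bin index -> frequency in Hz
print(spec.shape, peak_hz)
```

The resulting array can be treated as a single-channel image, which is what allows a network trained on image captioning to consume audio at inference time. The energy of the test tone concentrates in the frequency bin nearest 440 Hz.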
Paper Citation
in Harvard Style
Bleiweiss A. (2020). Predicting a Song Title from Audio Embeddings on a Pretrained Image-captioning Network. In Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-395-7, pages 483-493. DOI: 10.5220/0008939004830493
in Bibtex Style
@conference{icaart20,
author={Avi Bleiweiss},
title={Predicting a Song Title from Audio Embeddings on a Pretrained Image-captioning Network},
booktitle={Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2020},
pages={483-493},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008939004830493},
isbn={978-989-758-395-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Predicting a Song Title from Audio Embeddings on a Pretrained Image-captioning Network
SN - 978-989-758-395-7
AU - Bleiweiss A.
PY - 2020
SP - 483
EP - 493
DO - 10.5220/0008939004830493