Author: Avi Bleiweiss

Affiliation: BShalem Research, Sunnyvale, U.S.A.

Keyword(s): Image Captioning, Audio Spectrogram, Neural Networks, Long Short-term Memory, Beam Search.

Abstract: Finding the name of a song from a piece played without the lyrics remains a long-standing challenge to music recognition services. In this work, we propose the use of a neural architecture that combines deep learned image features and sequence modeling to automate the task of predicting the song title from an audio time series. To feed our network with a visual representation, we transform the sound signal into a two-dimensional spectrogram. Our novelty lies in model training on the state-of-the-art Conceptual Captions dataset to generate image descriptions, jointly with inference on the Million Song and Free Music Archive test sets to produce song titles. We present extensive quantitative analysis of our experiments and show that, using k-beam search, our model achieved an out-domain BLEU score of 45.1 compared to in-domain performance of 61.3.
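
As a rough illustration of the spectrogram step mentioned in the abstract, the sketch below turns a one-dimensional audio signal into a two-dimensional time-frequency magnitude array using a short-time discrete Fourier transform. It is not the author's implementation: the function name `spectrogram`, the frame size, the hop length, and the Hann window are all assumptions chosen for illustration, and the paper may use a different transform or scaling.

```python
import cmath
import math


def spectrogram(signal, frame_size=64, hop=32):
    """Return a list of frames; each frame holds magnitudes for bins 0..frame_size//2.

    Hypothetical helper for illustration only (parameters are assumptions).
    """
    # A Hann window tapers each frame to reduce spectral leakage at its edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    # Slide a fixed-size frame over the signal in steps of `hop` samples.
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Direct DFT for the non-negative frequency bins (slow but dependency-free).
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Usage: a pure tone whose frequency falls exactly on DFT bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
```

Each row of `spec` is one time step and each column one frequency bin, so the result can be treated as a single-channel image. A production pipeline would more likely use an FFT-based library routine and a log or mel scale; the direct DFT here only keeps the example self-contained.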

CC BY-NC-ND 4.0

Paper citation in several formats:
Bleiweiss, A. (2020). Predicting a Song Title from Audio Embeddings on a Pretrained Image-captioning Network. In Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART; ISBN 978-989-758-395-7; ISSN 2184-433X, SciTePress, pages 483-493. DOI: 10.5220/0008939004830493

@conference{icaart20,
author={Avi Bleiweiss},
title={Predicting a Song Title from Audio Embeddings on a Pretrained Image-captioning Network},
booktitle={Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2020},
pages={483-493},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008939004830493},
isbn={978-989-758-395-7},
issn={2184-433X},
}

TY - CONF

JO - Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Predicting a Song Title from Audio Embeddings on a Pretrained Image-captioning Network
SN - 978-989-758-395-7
IS - 2184-433X
AU - Bleiweiss, A.
PY - 2020
SP - 483
EP - 493
DO - 10.5220/0008939004830493
PB - SciTePress