REFERENCES
Berenzweig, A. L., & Ellis, D. P. (2001). Locating singing
voice segments within music signals. In Proceedings of
the 2001 IEEE Workshop on the Applications of Signal
Processing to Audio and Acoustics (WASPAA), pp.
119–122. IEEE.
Gupta, C., Yilmaz, E., & Li, H. (2020). Automatic lyrics
alignment and transcription in polyphonic music: does
background music help? In ICASSP 2020 - 2020 IEEE
International Conference on Acoustics, Speech and
Signal Processing (ICASSP), pp. 496–500. IEEE.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual
learning for image recognition. In Proceedings of the
IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pp. 770–778. IEEE.
Hsieh, T., Cheng, K., Fan, Z., Yang, Y., & Yang, Y. (2020).
Addressing the confounds of accompaniments in singer
identification. In ICASSP 2020 - 2020 IEEE
International Conference on Acoustics, Speech and
Signal Processing (ICASSP), pp. 1–5. IEEE.
Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation
networks. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), pp.
7132–7141. IEEE.
Huang, H. M., Chen, W. K., Liu, C. H., & You, S. D.
(2018). Singing voice detection based on convolutional
neural networks. In 2018 7th International Symposium
on Next Generation Electronics (ISNE), pp. 1–4. IEEE.
Krause, M., Müller, M., & Weiß, C. (2021). Singing voice
detection in opera recordings: a case study on
robustness and generalization. Electronics, 10(10),
1214. MDPI.
Lee, K., Choi, K., & Nam, J. (2018). Revisiting singing
voice detection: a quantitative review and the future
outlook. In Proceedings of the 19th International
Society for Music Information Retrieval Conference
(ISMIR), pp. 506–513.
Leglaive, S., Hennequin, R., & Badeau, R. (2015). Singing
voice detection with deep recurrent neural networks. In
IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), pp. 121–125. IEEE.
Lehner, B., Schlüter, J., & Widmer, G. (2018). Online,
loudness-invariant vocal detection in mixed music
signals. IEEE/ACM Transactions on Audio, Speech,
and Language Processing, 26(8), 1369–1380. IEEE.
Lin, L., Kong, Q., Jiang, J., & Xia, G. (2021). A unified
model for zero-shot music source separation,
transcription and synthesis. In Proceedings of the 22nd
International Society for Music Information Retrieval
Conference (ISMIR).
Ono, N., Miyamoto, K., Le Roux, J., Kameoka, H., &
Sagayama, S. (2008). Separation of a monaural audio
signal into harmonic/percussive components by
complementary diffusion on spectrogram. In 16th
European Signal Processing Conference (EUSIPCO),
pp. 1–4. IEEE.
Rocamora, M., & Herrera, P. (2007). Comparing audio
descriptors for singing voice detection in music audio
files. In Proceedings of the 11th Brazilian Symposium
on Computer Music, pp. 187–196.
Schlüter, J. (2016). Learning to pinpoint singing voice from
weakly labeled examples. In Proceedings of the
International Society for Music Information Retrieval
Conference (ISMIR), pp. 44–50.
Schlüter, J., & Grill, T. (2015). Exploring data
augmentation for improved singing voice detection
with neural networks. In Proceedings of the
International Society for Music Information Retrieval
Conference (ISMIR), pp. 121–126.
Simonyan, K., & Zisserman, A. (2014). Very deep
convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., Erhan, D., Vanhoucke, V., &
Rabinovich, A. (2015). Going deeper with
convolutions. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR),
pp. 1–9. IEEE.
Voigtlaender, P., Krause, M., Osep, A., Luiten, J., Sekar, B.
B. G., Geiger, A., & Leibe, B. (2019). MOTS: Multi-
object tracking and segmentation. In Proceedings of the
IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR), pp. 7942–7951. IEEE.
Zeiler, M. D., & Fergus, R. (2014). Visualizing and
understanding convolutional networks. In European
Conference on Computer Vision (ECCV), pp. 818–833.
Springer.
Zhao, S., Li, Q., He, T., & Wen, J. (2022). A step-by-step
gradient penalty with similarity calculation for text
summary generation. Neural Processing Letters, 1–16.
Zhao, S., Liang, Z., Wen, J., & Chen, J. (2022). Sparsing
and smoothing for the seq2seq models. IEEE
Transactions on Artificial Intelligence, 1–10.
Zhao, S., Zhang, T., Hu, M., Chang, W., & You, F. (2022).
AP-BERT: Enhanced pre-trained model through
average pooling. Applied Intelligence, 52(14), 15929–
15937.