REFERENCES
Abeßer, J. (2020). A review of deep learning based methods
for acoustic scene classification. Applied Sciences,
10(6).
Amiriparian, S., Gerczuk, M., Ottl, S., Cummins, N., Fre-
itag, M., Pugachevskiy, S., Baird, A., and Schuller,
B. W. (2017). Snore sound classification using image-
based deep spectrum features. In INTERSPEECH,
volume 434, pages 3512–3516.
Amiriparian, S., Schmitt, M., Ottl, S., Gerczuk, M., and
Schuller, B. (im Druck / in print). Deep unsupervised
representation learning for audio-based medical appli-
cations. In Nanni, L., Brahnam, S., Ghidoni, S., Brat-
tin, R., and Jain, L., editors, Deep Learners and Deep
Learner Descriptors for Medical Applications.
Clevert, D.-A., Unterthiner, T., and Hochreiter, S. (2016).
Fast and accurate deep network learning by exponen-
tial linear units (elus). arxiv 2015. arXiv preprint
arXiv:1511.07289.
Cummins, N., Amiriparian, S., Hagerer, G., Batliner, A.,
Steidl, S., and Schuller, B. W. (2017). An image-based
deep spectrum feature representation for the recogni-
tion of emotional speech. In Proceedings of the 25th
ACM international conference on Multimedia, pages
478–484.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-
Fei, L. (2009). ImageNet: A Large-Scale Hierarchical
Image Database. In CVPR09.
Duman, T. B., Bayram, B., and
˙
Ince, G. (2019). Acoustic
anomaly detection using convolutional autoencoders
in industrial processes. In International Workshop on
Soft Computing Models in Industrial and Environmen-
tal Applications, pages 432–442. Springer.
Fonseca, E., Plakal, M., Font, F., Ellis, D. P. W., and Serra,
X. (2019). Audio tagging with noisy labels and min-
imal supervision. In Submitted to DCASE2019 Work-
shop, NY, USA.
Grollmisch, S., Abeβer, J., Liebetrau, J., and Lukashe-
vich, H. (2019). Sounding industry: Challenges and
datasets for industrial sound analysis. In 2019 27th
European Signal Processing Conference (EUSIPCO),
pages 1–5. IEEE.
Gwardys, G. and Grzywczak, D. (2014). Deep image
features in music information retrieval. Interna-
tional Journal of Electronics and Telecommunica-
tions, 60(4):321–326.
Hayashi, T., Komatsu, T., Kondo, R., Toda, T., and Takeda,
K. (2018). Anomalous sound event detection based on
wavenet. In 2018 26th European Signal Processing
Conference (EUSIPCO), pages 2494–2498. IEEE.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D.,
Wang, W., Weyand, T., Andreetto, M., and Adam,
H. (2017). Mobilenets: Efficient convolutional neu-
ral networks for mobile vision applications. arXiv
preprint arXiv:1704.04861.
Huang, G., Liu, Z., Weinberger, K., and van der Maaten, L.
(2016). Densely connected convolutional networks.
arxiv 2017. arXiv preprint arXiv:1608.06993.
Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K.,
Dally, W. J., and Keutzer, K. (2016). Squeezenet:
Alexnet-level accuracy with 50x fewer parameters
and¡ 0.5 mb model size, 2016. arXiv preprint
arXiv:1602.07360, 1(10).
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Ac-
celerating deep network training by reducing internal
covariate shift. In International Conference on Ma-
chine Learning, pages 448–456.
Jiang, Y., Li, C., Li, N., Feng, T., and Liu, M. (2018).
Haasd: A dataset of household appliances abnormal
sound detection. In Proceedings of the 2018 2nd In-
ternational Conference on Computer Science and Ar-
tificial Intelligence, CSAI ’18, page 6–10, New York,
NY, USA. Association for Computing Machinery.
Kawaguchi, Y., Tanabe, R., Endo, T., Ichige, K., and
Hamada, K. (2019). Anomaly detection based on an
ensemble of dereverberation and anomalous sound ex-
traction. In ICASSP 2019 - 2019 IEEE International
Conference on Acoustics, Speech and Signal Process-
ing (ICASSP), pages 865–869.
Koizumi, Y., Saito, S., Uematsu, H., and Harada, N. (2017).
Optimizing acoustic feature extractor for anomalous
sound detection based on neyman-pearson lemma. In
2017 25th European Signal Processing Conference
(EUSIPCO), pages 698–702. IEEE.
Koizumi, Y., Saito, S., Uematsu, H., Harada, N., and
Imoto, K. (2019). Toyadmos: A dataset of miniature-
machine operating sounds for anomalous sound de-
tection. In 2019 IEEE Workshop on Applications of
Signal Processing to Audio and Acoustics (WASPAA),
pages 313–317. IEEE.
Koizumi, Y., Saito, S., Yamaguchi, M., Murata, S., and
Harada, N. (2019). Batch uniformization for minimiz-
ing maximum anomaly score of dnn-based anomaly
detection in sounds. In 2019 IEEE Workshop on Ap-
plications of Signal Processing to Audio and Acoustics
(WASPAA), pages 6–10.
Kornblith, S., Shlens, J., and Le, Q. V. (2019). Do better
imagenet models transfer better? In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 2661–2671.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-
agenet classification with deep convolutional neural
networks. In Advances in neural information process-
ing systems, pages 1097–1105.
Liu, F. T., Ting, K. M., and Zhou, Z. (2008). Isolation for-
est. In 2008 Eighth IEEE International Conference on
Data Mining, pages 413–422.
Marchi, E., Vesperini, F., Eyben, F., Squartini, S., and
Schuller, B. (2015). A novel approach for automatic
acoustic novelty detection using a denoising autoen-
coder with bidirectional lstm neural networks. In 2015
IEEE international conference on acoustics, speech
and signal processing (ICASSP), pages 1996–2000.
IEEE.
Meire, M. and Karsmakers, P. (2019). Comparison of deep
autoencoder architectures for real-time acoustic based
Acoustic Anomaly Detection for Machine Sounds based on Image Transfer Learning
55