Hu, R., Hang, B., Ma, Y., and Dong, S. (2010). A bottom-up
audio attention model for surveillance. In 2010 IEEE
International Conference on Multimedia and Expo.
IEEE.
Itti, L., Koch, C., and Niebur, E. (1998). A model of
saliency-based visual attention for rapid scene analy-
sis. IEEE Transactions on Pattern Analysis and Ma-
chine Intelligence, 20(11):1254–1259.
Kalinli, O. and Narayanan, S. (2008). A top-down auditory
attention model for learning task dependent influences
on prominence detection in speech. In 2008 IEEE
International Conference on Acoustics, Speech and
Signal Processing. IEEE.
Kaya, E. M. and Elhilali, M. (2012). A temporal saliency
map for modeling auditory attention. In 2012 46th An-
nual Conference on Information Sciences and Systems
(CISS). IEEE.
Kayser, C., Petkov, C. I., Lippert, M., and Logothetis, N. K.
(2005). Mechanisms for allocating auditory atten-
tion: An auditory saliency map. Current Biology,
15(21):1943–1947.
Knudsen, E. I. (2007). Fundamental components of atten-
tion. Annual Review of Neuroscience, 30(1):57–78.
Li, J., Dai, W., Metze, F., Qu, S., and Das, S. (2017). A
comparison of deep learning methods for environmen-
tal sound detection. In 2017 IEEE International Con-
ference on Acoustics, Speech and Signal Processing
(ICASSP). IEEE.
Li, X., Tao, D., Maybank, S. J., and Yuan, Y. (2008). Visual
music and musical vision. Neurocomputing, 71(10-
12):2023–2028.
Marchi, E., Vesperini, F., Squartini, S., and Schuller, B.
(2017). Deep recurrent neural network-based autoencoders
for acoustic novelty detection. Computational
Intelligence and Neuroscience, 2017:1–14.
Mesaros, A., Heittola, T., Eronen, A., and Virtanen, T.
(2010). Acoustic event detection in real life recor-
dings. In Signal Processing Conference, 2010 18th
European, pages 1267–1271. IEEE.
Mohamed, A.-R., Dahl, G. E., and Hinton, G. (2012).
Acoustic modeling using deep belief networks. IEEE
Transactions on Audio, Speech, and Language Pro-
cessing, 20(1):14–22.
Mosadeghzad, M., Rea, F., Tata, M. S., Brayda, L., and San-
dini, G. (2015). Saliency based sensor fusion of bro-
adband sound localizer for humanoids. In 2015 IEEE
International Conference on Multisensor Fusion and
Integration for Intelligent Systems (MFI). IEEE.
Parascandolo, G., Huttunen, H., and Virtanen, T. (2016).
Recurrent neural networks for polyphonic sound event
detection in real life recordings. In 2016 IEEE International
Conference on Acoustics, Speech and Signal
Processing (ICASSP). IEEE.
Patcha, A. and Park, J.-M. (2007). An overview of ano-
maly detection techniques: Existing solutions and
latest technological trends. Computer Networks,
51(12):3448–3470.
Piczak, K. J. (2015). Environmental sound classification
with convolutional neural networks. In 2015 IEEE
25th International Workshop on Machine Learning for
Signal Processing (MLSP). IEEE.
Principi, E., Squartini, S., Bonfigli, R., Ferroni, G., and
Piazza, F. (2015). An integrated system for voice
command recognition and emergency detection based
on audio signals. Expert Systems with Applications,
42(13):5668–5683.
Sainath, T. N., Weiss, R. J., Senior, A., Wilson, K. W.,
and Vinyals, O. (2015). Learning the speech front-end
with raw waveform CLDNNs. In Sixteenth Annual
Conference of the International Speech Communica-
tion Association.
Salamon, J. and Bello, J. P. (2017). Deep convolutional neu-
ral networks and data augmentation for environmental
sound classification. IEEE Signal Processing Letters,
24(3):279–283.
Schauerte, B. and Stiefelhagen, R. (2013). “Wow!” Bayesian
surprise for salient acoustic event detection. In 2013
IEEE International Conference on Acoustics, Speech
and Signal Processing. IEEE.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556.
Surace, C. and Worden, K. (2010). Novelty detection in
a changing environment: A negative selection ap-
proach. Mechanical Systems and Signal Processing,
24(4):1114–1128.
Takahashi, N., Gygli, M., and Gool, L. V. (2018). AENet:
Learning deep audio features for video analysis. IEEE
Transactions on Multimedia, 20(3):513–524.
Wang, J., Zhang, K., Madani, K., and Sabourin, C. (2015).
Salient environmental sound detection framework for
machine awareness. Neurocomputing, 152:444–454.
Zwicker, E. and Fastl, H. (2013). Psychoacoustics: Facts
and models, volume 22. Springer Science & Business
Media.