ACKNOWLEDGEMENTS
This work is funded by the European Regional Deve-
lopment Fund (ERDF) and the Free State of Saxony
under the grant number 100-241-945.
REFERENCES
Bodla, N., Singh, B., Chellappa, R., and Davis, L. S. (2017).
Soft-nms - improving object detection with one line of
code. In IEEE International Conference on Compu-
ter Vision, ICCV 2017, Venice, Italy, October 22-29,
2017, pages 5562–5570.
Cinaroglu, I. and Bastanlar, Y. (2014). A direct approach
for human detection with catadioptric omnidirectio-
nal cameras. In Signal Processing and Communicati-
ons Applications Conference (SIU), 2014 22nd, pages
2275–2279. IEEE.
Cohen, T. S., Geiger, M., Köhler, J., and Welling, M.
(2018). Spherical CNNs. In International Conference
on Learning Representations.
Dai, J., Li, Y., He, K., and Sun, J. (2016). R-FCN: object de-
tection via region-based fully convolutional networks.
In Advances in Neural Information Processing Sys-
tems 29: Annual Conference on Neural Information
Processing Systems 2016, December 5-10, 2016, Bar-
celona, Spain, pages 379–387.
del Blanco, C. R. and Carballeira, P. (2016). The piropo da-
tabase (people in indoor rooms with perspective and
omnidirectional cameras). https://sites.google.com/
site/piropodatabase/, unpublished dataset.
Demiröz, B. E.,
˙
Ismail Ari, Ero
˘
glu, O., Salah, A. A., and
Akarun, L. (2012). Feature-based tracking on a multi-
omnidirectional camera dataset. In 2012 5th Interna-
tional Symposium on Communications, Control and
Signal Processing, pages 1–5.
Everingham, M., Gool, L. J. V., Williams, C. K. I., Winn,
J. M., and Zisserman, A. (2010). The pascal visual
object classes (VOC) challenge. International Journal
of Computer Vision, 88(2):303–338.
Findeisen, M., Meinel, L., Heß, M., Apitzsch, A., and Hirtz,
G. (2013). A fast approach for omnidirectional sur-
veillance with multiple virtual perspective views. In
Proceedings of Eurocon 2013, International Confe-
rence on Computer as a Tool, Zagreb, Croatia, July
1-4, 2013, pages 1578–1585.
Girshick, R. B., Donahue, J., Darrell, T., and Malik, J.
(2013). Rich feature hierarchies for accurate ob-
ject detection and semantic segmentation. CoRR,
abs/1311.2524.
Hartley, R. and Zisserman, A. (2006). Multiple view geome-
try in computer vision (2. ed.). Cambridge University
Press.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep re-
sidual learning for image recognition. In 2016 IEEE
Conference on Computer Vision and Pattern Recog-
nition, CVPR 2016, Las Vegas, NV, USA, June 27-30,
2016, pages 770–778.
Krams, O. and Kiryati, N. (2017). People detection in top-
view fisheye imaging. In 2017 14th IEEE Internatio-
nal Conference on Advanced Video and Signal Based
Surveillance (AVSS), pages 1–6.
Krasin, I., Duerig, T., Alldrin, N., Ferrari, V., Abu-El-
Haija, S., Kuznetsova, A., Rom, H., Uijlings, J., Po-
pov, S., Veit, A., Belongie, S., Gomes, V., Gupta,
A., Sun, C., Chechik, G., Cai, D., Feng, Z., Naray-
anan, D., and Murphy, K. (2017). Openimages: A
public dataset for large-scale multi-label and multi-
class image classification. Dataset available from
https://github.com/openimages.
Lee, S., Sung, J., Yu, Y., and Kim, G. (2018). A memory
network approach for story-based temporal summari-
zation of 360
◦
videos. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recogni-
tion, pages 1410–1419.
Li, W., Wang, L., Li, W., Agustsson, E., and Gool, L. V.
(2017). Webvision database: Visual learning and un-
derstanding from web data. CoRR, abs/1708.02862.
Lin, T., Maire, M., Belongie, S. J., Hays, J., Perona, P.,
Ramanan, D., Dollár, P., and Zitnick, C. L. (2014).
Microsoft COCO: common objects in context. In
Computer Vision - ECCV 2014 - 13th European Con-
ference, Zurich, Switzerland, September 6-12, 2014,
Proceedings, Part V, pages 740–755.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S. E.,
Fu, C., and Berg, A. C. (2016). SSD: Single Shot Mul-
tiBox Detector, pages 21–37.
Redmon, J., Divvala, S. K., Girshick, R. B., and Farhadi, A.
(2016). You only look once: Unified, real-time object
detection. In 2016 IEEE Conference on Computer Vi-
sion and Pattern Recognition, CVPR 2016, Las Vegas,
NV, USA, June 27-30, 2016, pages 779–788.
Redmon, J. and Farhadi, A. (2017). YOLO9000: better, fas-
ter, stronger. In 2017 IEEE Conference on Computer
Vision and Pattern Recognition, CVPR 2017, Hono-
lulu, HI, USA, July 21-26, 2017, pages 6517–6525.
Ren, S., He, K., Girshick, R. B., and Sun, J. (2015). Fas-
ter R-CNN: towards real-time object detection with
region proposal networks. In Cortes, C., Lawrence,
N. D., Lee, D. D., Sugiyama, M., and Garnett, R.,
editors, Advances in Neural Information Processing
Systems 28: Annual Conference on Neural Informa-
tion Processing Systems 2015, December 7-12, 2015,
Montreal, Quebec, Canada, pages 91–99.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S.,
Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bern-
stein, M. S., Berg, A. C., and Li, F. (2015). Imagenet
large scale visual recognition challenge. International
Journal of Computer Vision, 115(3):211–252.
Szeliski, R. (2010). Computer Vision: Algorithms and
Applications. Springer-Verlag New York, Inc., New
York, NY, USA, 1st edition.
Improved Person Detection on Omnidirectional Images with Non-maxima Supression
481