ensures privacy of the individuals being recorded. We evaluated different downscaled resolutions to determine the optimal trade-off between resolution and detection accuracy.
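The downscaling step underlying this trade-off study can be sketched in pure Python. This is a minimal nearest-neighbour stand-in; an actual pipeline would use a proper image library, and the frame layout and function name below are our own assumptions:

```python
def downscale(frame, out_w, out_h):
    """Nearest-neighbour downscale of a grayscale frame (list of pixel rows).

    A minimal stand-in for the resize step that turns the full-resolution
    camera image into a privacy-preserving low-resolution input.
    """
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# Candidate target resolutions for the resolution/accuracy trade-off study.
candidates = [(192, 192), (128, 128), (96, 96)]
hi_res = [[(x + y) % 256 for x in range(480)] for y in range(480)]
low_res = {wh: downscale(hi_res, *wh) for wh in candidates}
```

At the lowest candidate resolution, individual identities are no longer distinguishable in the input, which is what makes the low-resolution stream privacy-preserving while a detector can still locate person-shaped regions.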
To avoid the need for time-consuming and expensive manual annotations, we proposed an approach that automatically generates new training data for the low-resolution networks, based on the high-resolution input images. The validity of this approach was confirmed by comparison with true manual annotations.
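The core of that annotation step can be sketched as follows, assuming the standard YOLO/darknet label convention (one `class cx cy w h` line per object, coordinates normalised to [0, 1]); the function name and detection tuple layout are our own, not the paper's code:

```python
def detections_to_yolo_labels(detections, img_w, img_h):
    """Turn pixel-space boxes from the high-resolution detector into
    YOLO-format label lines: 'class cx cy w h', normalised to [0, 1].
    """
    lines = []
    for cls, (x, y, w, h) in detections:  # (x, y) = top-left corner, pixels
        cx = (x + w / 2) / img_w
        cy = (y + h / 2) / img_h
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}")
    return lines

# High-resolution detections (class 0 = person) become training labels
# for the low-resolution networks without any manual annotation.
dets = [(0, (320, 120, 64, 160))]
labels = detections_to_yolo_labels(dets, img_w=640, img_h=480)
```

Since the coordinates are normalised, labels derived from high-resolution detections remain valid for any downscaled copy of the same frame, which is what lets the high-resolution detector supervise the low-resolution networks.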
Extensive accuracy experiments were performed. On test sets based on known scenes, the models showed acceptable performance at all resolutions. When tested on a similar scene with unseen data, however, performance clearly declines as the resolution decreases. The proposed automatic annotation pipeline nevertheless makes it easy to add additional training data for each scene. Indeed, a sensor that is newly installed in a certain room can acquire some high-resolution footage during its first hours of operation, with which a room-specific low-resolution detector can quickly be (transfer) learned.
Based upon our results we conclude that, despite the extremely low input resolution of our lowest-resolution model (96×96 px), our YOLOv2-based detection pipeline is still able to efficiently detect persons, even though they are not recognisable by human observers. Our framework is thus able to serve as an efficient occupancy detection system.
Furthermore, the low input resolution allows for a lightweight network that is easily implementable on embedded systems while still maintaining high processing speeds.
Although the current approach is suitable for industrial use as is, we believe that we have not yet reached the extreme lower limit and deem it possible to reduce the input resolution even further.
ACKNOWLEDGEMENT
This work is partially supported by the VLAIO via the
Start to Deep Learn project.
How Low Can You Go? Privacy-preserving People Detection with an Omni-directional Camera