We can only speculate about the underlying reasons, but we are convinced that one of the issues lies in the sampling of the eye-tracker video material, which contains large amounts of motion blur, especially when people move their head quickly for a brief glance down the aisle. Better eye-tracker hardware could be a possible solution, as could restricting the validation data to sharp eye-tracker frames and removing the motion-blurred evaluation frames. We also expect that our model will never be able to detect motion-blurred advertisement boards, since such frames were never part of the training data. Adding them to the training data might therefore improve the generalization capabilities of the trained deep learning model.
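Such motion-blurred training samples could, for example, be synthesised from the existing sharp annotations rather than captured anew. The snippet below is a minimal sketch of this idea, assuming OpenCV and NumPy are available; the kernel size and file name are purely illustrative and not part of our actual pipeline.

```python
# A minimal sketch (not our exact pipeline) of synthesising motion-blurred
# training samples with OpenCV; kernel_size and the file name are hypothetical.
import cv2
import numpy as np

def horizontal_motion_blur(image, kernel_size=15):
    # Convolve with a normalised 1-D horizontal kernel to mimic fast
    # sideways head movement during a glance down the aisle.
    kernel = np.zeros((kernel_size, kernel_size), dtype=np.float32)
    kernel[kernel_size // 2, :] = 1.0 / kernel_size
    return cv2.filter2D(image, -1, kernel)

# Example: blur an existing sharp training image of an advertisement board.
blurred = horizontal_motion_blur(cv2.imread("advertisement_board.jpg"), kernel_size=21)
```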
A remaining challenge for the package detection and classification case lies in expanding the multi-class model. At present, our model still requires an overnight training step. For most applications this is acceptable, but for some it is not feasible at all. We should therefore investigate how existing models can be extended with an extra class at minimal processing cost. This research field, known as incremental learning, already has several application areas, such as object detection in video sequences, as described in (Kuznetsova et al., 2015).
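A full incremental-learning solution is beyond the scope of this work, but the basic idea of adding a class without retraining the whole network from scratch can be illustrated with a freeze-and-extend sketch. The code below assumes a generic tf.keras classifier whose last layer is a softmax over the existing classes; it illustrates the principle only, and is neither our Darknet pipeline nor the incremental detection framework of (Kuznetsova et al., 2015).

```python
# A minimal sketch, assuming a generic tf.keras classifier, of extending a
# trained model from N to N + 1 classes while freezing the shared layers.
import tensorflow as tf

def expand_with_extra_class(old_model, num_old_classes):
    # Freeze everything that was already learned.
    for layer in old_model.layers:
        layer.trainable = False
    # Reuse the features produced just before the old classification layer.
    features = old_model.layers[-2].output
    # New softmax head with one extra class; only this layer is trained.
    new_head = tf.keras.layers.Dense(num_old_classes + 1, activation="softmax",
                                     name="expanded_head")(features)
    return tf.keras.Model(inputs=old_model.input, outputs=new_head)
```

Since only the expanded head receives gradient updates, the overnight retraining step is replaced by a much shorter fine-tuning step, possibly at the cost of some accuracy compared to full retraining.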
Finally, every augmented training sample is randomly flipped around the vertical axis in the Darknet data augmentation pipeline. While in most cases this helps to make the detector more robust, there are cases where it can actually degrade the model. We therefore suggest training models without these random flips and comparing them to the results obtained so far, in order to quantify their influence.
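To make this comparison concrete, the sketch below shows what such a horizontal flip does to an image and its YOLO-style relative bounding boxes; disabling the augmentation simply amounts to skipping this step. The function name and probability are illustrative and not the actual Darknet implementation.

```python
# A minimal sketch of random horizontal flipping for detection training data,
# assuming YOLO-style relative boxes (x_center, y_center, width, height).
# Conceptual only; this is not the actual Darknet code.
import random
import numpy as np

def maybe_flip_horizontal(image, boxes, flip_probability=0.5):
    if random.random() < flip_probability:
        image = np.fliplr(image).copy()
        # Mirroring only changes the horizontal centre coordinate.
        boxes = [(1.0 - xc, yc, w, h) for (xc, yc, w, h) in boxes]
    return image, boxes

# Setting flip_probability=0.0 reproduces the "no random flips" experiment.
```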
ACKNOWLEDGEMENTS
This work is supported by KU Leuven, Campus De Nayer and Flanders Innovation & Entrepreneurship (AIO).
REFERENCES
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., et al. (2016). TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.
Bengio, Y. (2012). Deep learning of representations for unsupervised and transfer learning. In Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pages 17–36.
Bradski, G. et al. (2000). The OpenCV library. Dr. Dobb's Journal, 25(11):120–126.
Dauphin, G. M. Y., Glorot, X., Rifai, S., Bengio, Y., Goodfellow, I., Lavoie, E., Muller, X., Desjardins, G., Warde-Farley, D., Vincent, P., et al. (2012). Unsupervised and transfer learning challenge: a deep learning approach. In Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pages 97–110.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE.
Gan, Z., Henao, R., Carlson, D., and Carin, L. (2015). Learning deep sigmoid belief networks with data augmentation. In Artificial Intelligence and Statistics, pages 268–276.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, pages 675–678. ACM.
Kuznetsova, A., Ju Hwang, S., Rosenhahn, B., and Sigal, L. (2015). Expanding object detector's horizon: incremental learning framework for object detection in videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 28–36.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. (2016). SSD: Single shot multibox detector. In European Conference on Computer Vision, pages 21–37. Springer.
Redmon, J. (2013–2016). Darknet: Open source neural networks in C. http://pjreddie.com/darknet/.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99.
Yosinski, J., Clune, J., Bengio, Y., and Lipson, H. (2014). How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, pages 3320–3328.