Figure 5: Effectiveness of the proposed correction step. The red blobs in image (a) are the inputs to the correction step, and the colored ones in image (b) are its outputs.
Table 4: Running speed of the baseline and optimized detectors on the Jetson TX2 board.
Detector Running speed
YOLO V3 (Baseline) 2.5 FPS
Optimized YOLO V3 9 FPS
Tiny YOLO V2 (Baseline) 17 FPS
Optimized Tiny YOLO V2 23 FPS
the application of our optimization step. The first
column of Table 3 lists the DCNN detector
configurations before and after the optimization step,
and the second column gives the corresponding
running speed in FPS. The results show that the
optimization step, based on calibrating the parameters
of the DCNN models, significantly improves the
running speed of the detectors.
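For context, throughput figures like those reported in Tables 3 and 4 can be obtained by averaging the per-frame processing time over a sequence of frames. The minimal C++ sketch below shows one such timing harness; run_detector() is a hypothetical stand-in for one full detection pass, not the code used in our experiments.

    // Illustrative harness for measuring detector throughput (FPS).
    // run_detector() is a placeholder for one full detection pass.
    #include <chrono>
    #include <cstdio>
    #include <thread>

    static void run_detector() {
        // Placeholder workload; in practice this would be one forward
        // pass of the network on one frame.
        std::this_thread::sleep_for(std::chrono::milliseconds(40));
    }

    int main() {
        const int kFrames = 100;  // number of frames to average over
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < kFrames; ++i)
            run_detector();
        auto t1 = std::chrono::steady_clock::now();
        double seconds = std::chrono::duration<double>(t1 - t0).count();
        std::printf("throughput: %.1f FPS\n", kFrames / seconds);
        return 0;
    }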
For the hardware, we used the NVIDIA Jetson TX2
embedded platform, a recent device developed by
NVIDIA. It delivers the performance of the NVIDIA
Pascal architecture, with 256 CUDA cores providing
more than 1 TFLOP of compute, 64-bit processors,
and a 5-megapixel camera. To run our adapted DCNN
detector on the NVIDIA Jetson TX2 device, we use
the Darknet deep learning framework with the C++
programming language.
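As an illustration, the following minimal C++ sketch shows how a model can be loaded and run on a single frame through Darknet's C API. The .cfg and .weights file names for the adapted Tiny YOLO V2, the input image name, and the pedestrian class index are placeholders, and the sketch is not the exact pipeline used in our experiments.

    // Minimal sketch: single-image detection through Darknet's C API.
    // File names and the pedestrian class index are placeholders.
    extern "C" {
    #include "darknet.h"
    }
    #include <cstdio>

    int main() {
        // Load the adapted model (placeholder file names).
        network *net = load_network((char *)"tiny-yolo-v2-adapted.cfg",
                                    (char *)"tiny-yolo-v2-adapted.weights", 0);
        set_batch_network(net, 1);

        // Load one frame and resize it to the network input size.
        image im = load_image_color((char *)"frame.jpg", 0, 0);
        image sized = letterbox_image(im, net->w, net->h);

        // Forward pass.
        network_predict(net, sized.data);

        // Extract detections and apply non-maximum suppression.
        int nboxes = 0;
        detection *dets = get_network_boxes(net, im.w, im.h, 0.5f, 0.5f,
                                            nullptr, 1, &nboxes);
        do_nms_sort(dets, nboxes, net->layers[net->n - 1].classes, 0.45f);

        for (int i = 0; i < nboxes; ++i)
            if (dets[i].prob[0] > 0.5f)  // class 0 assumed to be pedestrian
                std::printf("pedestrian at x=%.2f y=%.2f w=%.2f h=%.2f\n",
                            dets[i].bbox.x, dets[i].bbox.y,
                            dets[i].bbox.w, dets[i].bbox.h);

        free_detections(dets, nboxes);
        free_image(sized);
        free_image(im);
        return 0;
    }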
In Table 4, we summarize the running speed of
our adapted and optimized detectors on the NVIDIA
Jetson TX2, compared with their baselines. Based
on these results, we chose the optimized adapted
Tiny YOLO V2 detector, which yields an embedded
system that runs in real time.
Our approach of adapting a detector to a specific
scene and to an embedded platform thus provides an
embedded system with good detection performance
that runs in real time for an autonomous vehicle.
5 CONCLUSION
This article introduces an approach for pedestrian
detection based on a new domain adaptation
technique that specializes a generic DCNN detector
to a specific scene, adapting it to an urban traffic
scene without any manual labeling. The method
outperforms the generic DCNN detector on
real-world scenarios in terms of mAP and running
time. Furthermore, to the best of our knowledge, it
is the first domain adaptation approach applied to
deep detectors for mobile cameras. The experimental
results obtained on a Jetson TX2 embedded platform
show that the adapted detector achieves real-time
performance. As the proposed approach is generic
and flexible, future work includes extending it to
other types of detectors.
ACKNOWLEDGEMENTS
This work is part of a Master 2 internship in
Artificial Perception and Robotics at Clermont
Auvergne University (France). It is sponsored
by the French government research program
“Investissements d'avenir” through the IMobS3
Laboratory of Excellence (ANR-10-LABX-16-01),
by the European Union through the program Regional
Competitiveness and Employment 2007-2013 (ERDF,
Auvergne region), and by the Auvergne region.