together. The bounding boxes for these objects were always chosen very similarly, which leads to this closely spaced localization pattern. In contrast, the bounding boxes for the cardboard boxes often had different sizes, which results in the wider spread of the localizations. Some false measurements are also present in the bottom maps of Figure 7. They are mostly caused by the SO algorithm, due to false localizations of other objects in the room. We reach an average detection rate of approximately 7.5 Hz, which is sufficient for a real-time application.
8 CONCLUSION AND FUTURE WORK
In this contribution, we presented an infrastructural stereo camera system for real-time object detection, classification, and localization. By using three different, complementary approaches for object detection, the system is able to detect almost every object in its field of view. With our proposed bounding-box fusion algorithm, we improved the detection and classification results, as shown in the evaluation. The localization approach based on the stereo camera shows satisfactory results. However, further research on localizing the whole object, rather than only a single point, is of high interest. In the future, we plan to improve the detections by training the CNN on images captured by the system. Furthermore, we plan to improve the algorithm for salient object detection. Additionally, we will extend the system with a tracking algorithm.