It was observed that rejecting invalid pixels within stixels occasionally results in small holes in the registered images at locations where no disparity estimate is available. Although such holes can mostly be avoided by using a morphological-closing filter prior to rejecting the pixels, some holes may persist. However, the downside of having small holes in the registered image did not outweigh the benefit of having a cleaner texture projection. In future work, we plan to investigate guided-image filtering to prevent such holes and to further refine the stixel boundaries.
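To illustrate this hole-suppression step, the sketch below closes the validity mask of the disparity map before rejecting invalid pixels, assuming an OpenCV-based pipeline; the function name, kernel size, and invalid-pixel convention are illustrative choices and not the exact implementation used in this work.

import cv2
import numpy as np

def reject_invalid_pixels(texture, disparity, invalid_value=0, kernel_size=5):
    # Sketch: close small gaps in the disparity validity mask before rejecting
    # pixels, so that isolated missing disparities do not punch holes into the
    # registered texture.
    valid = (disparity != invalid_value).astype(np.uint8)

    # Morphological closing fills small invalid regions inside valid areas.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_size, kernel_size))
    valid_closed = cv2.morphologyEx(valid, cv2.MORPH_CLOSE, kernel)

    # Pixels that remain invalid after closing are rejected (set to black),
    # trading a few small holes for a cleaner texture projection.
    cleaned = texture.copy()
    cleaned[valid_closed == 0] = 0
    return cleaned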
The current disparity estimation, which is outside the scope of this work, is noisy and has very limited sub-pixel resolution. We hypothesize that a more expensive disparity estimation algorithm, an increased baseline, or zoom lenses will improve the depth accuracy, which in turn will extend the operational range of the proposed 3D model.
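This expectation follows from the standard stereo relation Z = f·B/d: a disparity error δd yields a depth error of approximately δZ ≈ Z²·δd / (f·B), so increasing the focal length f or the baseline B proportionally reduces the depth error at a given distance. The short calculation below illustrates this; the focal length, baselines, and disparity-noise value are assumed numbers, not the parameters of our recording setup.

def depth_error(z_m, f_px, baseline_m, disparity_error_px):
    # First-order stereo depth error: dZ ~ Z^2 * dd / (f * B).
    return (z_m ** 2) * disparity_error_px / (f_px * baseline_m)

# Assumed values: 1400-px focal length, 0.5-px disparity noise, 10-m distance.
for baseline_m in (0.3, 0.6):
    err = depth_error(z_m=10.0, f_px=1400.0, baseline_m=baseline_m, disparity_error_px=0.5)
    print(f"baseline {baseline_m} m -> depth error at 10 m: {err:.2f} m")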
7 CONCLUSION
We have introduced a diorama-box model for aligning images acquired from a moving vehicle. The proposed model extends the non-linear ground surface model (van de Wouw et al., 2016) with a model of the 3D objects in the scene. For this purpose, the Stixel World algorithm is used to segment the scene into super-pixels, which are projected to 3D to form an obstacle model. The consistency of the stixel-based model is improved by assigning a slanted orientation to each 3D stixel and by interpolating between the stixels to fill gaps in the 3D model. Consequently, registration accuracy is increased by 6%. As a further improvement of the algorithm, background pixels contained in object-related stixels are removed by checking their consistency with the slanted stixel orientation. This improvement prevents ghosting effects due to falsely projected background pixels.
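To make this consistency test concrete, the sketch below flags pixels in a stixel column whose measured disparity deviates from the disparity predicted by the slanted stixel between its top and bottom rows; the variable names and the tolerance of one disparity level are illustrative assumptions rather than the exact thresholds of our implementation.

import numpy as np

def background_pixel_mask(disparity_column, v_top, v_bottom, d_top, d_bottom, tol_px=1.0):
    # Disparity profile predicted by the slanted stixel plane, linear in the
    # image row between the top and bottom of the stixel.
    rows = np.arange(v_top, v_bottom + 1)
    expected = np.interp(rows, [v_top, v_bottom], [d_top, d_bottom])
    # Pixels deviating by more than tol_px are treated as background and are
    # excluded from the texture projection to avoid ghosting.
    measured = disparity_column[v_top:v_bottom + 1]
    return np.abs(measured - expected) > tol_px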
The resulting alignment framework shows good results for typical driving scenarios, in which both live and historic recordings were acquired from the same driving lane. In this case, 96% of all manually annotated points are registered with an alignment error of up to 5 pixels for images with a resolution of 1920 × 1440 pixels, and 79% of the annotations even have an error of one pixel or less. Even when driving in an adjacent lane, the system is able to accurately align 71% of all annotated points.
It was found that the disparity resolution of the depth map, i.e. the lack of sub-pixel accuracy, limits the accuracy of the 3D model, making it less effective for displacements above 4 meters. Nevertheless, the proposed work significantly extends the operational range of the real-time change detection system, which now covers the full 3D scene instead of only the ground plane. Higher accuracy and/or performance of the change detection system can be achieved by improving key components, such as the lenses, a larger baseline, and a more accurate depth estimation algorithm.
REFERENCES
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., and Süsstrunk, S. (2012). SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell., 34(11):2274–2282.
Broggi, A., Cattani, S., Patander, M., Sabbatelli, M., and Zani, P. (2013). A full-3D voxel-based dynamic obstacle detection for urban scenario using stereo vision. In 16th International IEEE Conference on Intelligent Transportation Systems (ITSC), pages 71–76. IEEE.
Chauve, A. L., Labatut, P., and Pons, J. P. (2010). Robust piecewise-planar 3D reconstruction and completion from large-scale unstructured point data. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1261–1268.
Cordts, M., Rehfeld, T., Enzweiler, M., Franke, U., and Roth, S. (2016). Tree-structured models for efficient multi-cue scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Felzenszwalb, P. F. and Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167–181.
Labatut, P., Pons, J.-P., and Keriven, R. (2009). Robust and efficient surface reconstruction from range data. In Computer Graphics Forum, volume 28, pages 2275–2290. Wiley Online Library.
Lou, Z. and Gevers, T. (2014). Image alignment by piecewise planar region matching. IEEE Transactions on Multimedia, 16(7):2052–2061.
Maiti, A. and Chakravarty, D. (2016). Performance analysis of different surface reconstruction algorithms for 3D reconstruction of outdoor objects from their digital images. SpringerPlus, 5(1):932.
Natour, G. E., Ait-Aider, O., Rouveure, R., Berry, F., and Faure, P. (2015). Toward 3D reconstruction of outdoor scenes using an MMW radar and a monocular vision sensor. Sensors, 15(10):25937–25967.
Pfeiffer, D. (2012). The Stixel World. PhD thesis, Humboldt-Universität zu Berlin.
Salman, N. and Yvinec, M. (2010). Surface reconstruction from multi-view stereo of large-scale outdoor scenes. International Journal of Virtual Reality, 9(1):19–26.
Sanberg, W. P., Do, L., and de With, P. H. N. (2013). Flexible multi-modal graph-based segmentation. In Advanced Concepts for Intelligent Vision Systems: 15th International Conference, ACIVS 2013, Poznań, Poland, October 28–31, 2013, Proceedings, pages 492–503. Springer International Publishing.