
tion, such as the base width and the distance to the analyzed scene, as desired. This theoretically enables the entire working area of the robot to be reconstructed with a high and, above all, consistent accuracy. The system comprises two computing units connected via Ethernet, allowing the 3D reconstruction process to be partitioned based on the specific requirements of the intended application. To optimize the system configuration for the target application, a self-evaluation framework was developed to assess reconstruction accuracy, speed, and latency for different system configurations.
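A minimal sketch of such an evaluation loop is shown below, assuming a reconstruct(cfg) callable that returns a point cloud and a known ground-truth point set; the configuration dictionary, the nearest-neighbour error metric, and all names are illustrative placeholders rather than the framework's actual interface.

```python
# Minimal sketch of a self-evaluation loop over system configurations.
# reconstruct(), the configuration dictionary and the ground-truth points
# are illustrative placeholders, not the framework described in the paper.
import time
import numpy as np

def evaluate(configurations, reconstruct, gt_points):
    """Measure an accuracy proxy (RMSE against ground truth) and runtime per configuration."""
    results = {}
    for name, cfg in configurations.items():
        start = time.perf_counter()
        points = reconstruct(cfg)                  # assumed to return an (N, 3) point cloud
        runtime = time.perf_counter() - start
        # Accuracy proxy: distance of every reconstructed point to its
        # nearest ground-truth point.
        dists = np.linalg.norm(points[:, None, :] - gt_points[None, :, :], axis=2)
        rmse = float(np.sqrt(np.mean(np.min(dists, axis=1) ** 2)))
        results[name] = {"rmse_mm": rmse * 1000.0, "runtime_s": runtime}
    return results
```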
For the hardware setup investigated during this work, the best results for sparse 3D reconstruction were achieved by running the feature detection on the embedded system and transmitting the features to the central machine, where the remaining 3D reconstruction is performed.
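To illustrate this partitioning, the sketch below shows how the embedded side could extract keypoints and descriptors and stream them over a length-prefixed TCP connection to the central machine; the ORB detector, the host name, the port, and the pickle-based serialization are assumptions made for the example, not the detector or protocol actually used by the system.

```python
# Illustrative embedded-side sender: detect features and ship them over Ethernet.
# ORB, the host/port and the pickle-based serialization are assumptions made for
# this sketch; the described system may use a different detector and protocol.
import pickle
import socket
import struct
import cv2

def send_features(image_bgr, host="central-machine", port=5005):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    # cv2.KeyPoint objects are not picklable, so keep only the numeric fields.
    payload = pickle.dumps({
        "pts": [(kp.pt[0], kp.pt[1], kp.size, kp.angle) for kp in keypoints],
        "desc": descriptors,
    })
    with socket.create_connection((host, port)) as sock:
        sock.sendall(struct.pack("!I", len(payload)) + payload)  # length-prefixed message
```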
Both sparse and dense reconstructions achieved errors in the millimeter range. Future work aims to use the achieved 3D reconstruction for robotic manipulation tasks and to investigate reducing processing time and latency so that the system can be used in a closed control loop. One possibility would be to combine a simple tracker running at high speed on the embedded system with the robot's state information and a Kalman filter, thereby overcoming the limitation that the tracker on its own only works for small movements.
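As a rough illustration of that idea, a constant-velocity Kalman filter over the tracked 3D position could be predicted at the control rate and corrected whenever the high-speed tracker (or a full reconstruction) delivers a measurement; the state layout, noise parameters, and interface below are assumptions for the sketch, not the planned implementation.

```python
# Toy constant-velocity Kalman filter over a 3D position, as one possible way to
# fuse a fast tracker with the robot's state information. State layout, noise
# parameters and the measurement interface are assumptions for this sketch.
import numpy as np

class ConstantVelocityKF:
    def __init__(self, dt, process_var=1e-3, meas_var=1e-4):
        self.x = np.zeros(6)                       # [px, py, pz, vx, vy, vz]
        self.P = np.eye(6)
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)            # position += velocity * dt
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])  # only position is measured
        self.Q = process_var * np.eye(6)
        self.R = meas_var * np.eye(3)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]

    def update(self, z):
        """z: 3D position from the tracker, e.g. expressed via the robot's kinematics."""
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]
```

In a closed loop, predict() would run every control cycle, while update() would only be called when a new tracker measurement arrives, bridging the latency between full reconstructions.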
ACKNOWLEDGEMENTS
The authors acknowledge the financial support by the Bavarian State Ministry for Economic Affairs, Regional Development and Energy (StMWi) for the Lighthouse Initiative KI.FABRIK (Phase 1: Infrastructure as well as research and development program, grant no. DIK0249).