
from the segmentation results, to support automatic progress tracking and quality control, as shown in real construction-site applications. Our method provides more detailed information than existing state-of-the-art approaches, and we demonstrate generalization to scenes outside the training dataset, showcasing its potential to boost productivity, planning accuracy, and regulatory compliance in the construction industry.
ACKNOWLEDGEMENTS
This work was funded in part by the German Federal Ministry for Digital and Transport (project EConoM, grant number 19OI22009C). We thank HOCHTIEF ViCon GmbH for providing training images of construction sites.