
model not only performed excellently at the raster
and vector level, but also produced quite stable re-
sults with narrow quantile ranges, including for the
quite outlier-sensitive Hausdorff metric. In comparison,
SAM’s performance fluctuated more. The main reason is
that edges are real, detectable observations, whereas the
roof segments SAM relies on can be affected by occlusions
from trees, color changes, roof objects, and so on.
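To illustrate why the Hausdorff metric is described as outlier-sensitive, the following minimal sketch compares it with a mean nearest-neighbor distance on a toy roof outline. It is not the evaluation code used in this work: the point-set formulation, the function names, and the coordinates are assumptions chosen only for illustration.

```python
import numpy as np


def pairwise_distances(a, b):
    # Euclidean distances between every point in a (n, 2) and b (m, 2).
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)


def hausdorff(a, b):
    # Symmetric Hausdorff distance: the worst-case nearest-neighbor
    # distance, taken in both directions.
    d = pairwise_distances(a, b)
    return max(d.min(axis=1).max(), d.min(axis=0).max())


def mean_nearest_distance(a, b):
    # Symmetric average of nearest-neighbor distances (hypothetical
    # comparison metric, used here only as a contrast).
    d = pairwise_distances(a, b)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())


# Toy reference roof outline and a prediction shifted by half a pixel.
reference = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], dtype=float)
predicted = reference + 0.5

# The same prediction with one spurious vertex far from the roof.
with_outlier = np.vstack([predicted, [[30.0, 30.0]]])

for name, pts in [("clean", predicted), ("with outlier", with_outlier)]:
    print(name,
          "hausdorff =", round(hausdorff(reference, pts), 3),
          "mean nearest =", round(mean_nearest_distance(reference, pts), 3))
```

A single spurious vertex determines the Hausdorff value entirely, while the averaged distance grows far less. This is why narrow quantile ranges on this metric indicate stable predictions.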
We also conducted qualitative evaluations on both
the SGA and Melville datasets, showcasing represen-
tative results. Overall, our method can effectively
handle different roof structures and eliminate edge
gaps in YOLOv8. Even on the Melville model, which
was not involved in training and has a relatively lower
resolution, the results were satisfactory. Although
the roofs in Melville present more complex structures,
most roof segments could still be recognized well. This
demonstrates the potential of our method for vectorizing
complex roofs.
In conclusion, our findings highlight the potential
of our method to effectively handle diverse roof struc-
tures, even in challenging scenarios with complex ge-
ometries. Moving forward, we plan to explore addi-
tional datasets and integrate our technique into urban
terrain reconstruction workflows. On the one hand, this
will help to explore further radiometric aspects (detecting
important roof installations such as photovoltaic panels and
solar collectors) and geometric aspects (non-planar roof
elements such as domes and towers). On the other
hand, we aim to conduct a more comprehensive com-
parative analysis with other competing methods, fur-
ther establishing the robustness and versatility of our
approach. Finally, we plan to incorporate 3D data to
search for more precise intersections near the predicted
ones and thereby improve the vectorization.
ACKNOWLEDGEMENTS
The authors thank the China Scholarship Council
(CSC) for supporting this research, Grant/Award
Number: 202308080109. We also thank the review-
ers for their insightful comments.
REFERENCES
Alidoost, F., Arefi, H., and Hahn, M. (2020). Y-shaped
convolutional neural network for 3D roof elements
extraction to reconstruct building models from a sin-
gle aerial image. ISPRS Annals of the Photogramme-
try, Remote Sensing and Spatial Information Sciences,
2:321–328.
Alidoost, F., Arefi, H., and Tombari, F. (2019). 2D image-
to-3D model: Knowledge-based 3D building recon-
struction (3DBR) using single aerial images and con-
volutional neural networks (CNNs). Remote Sensing,
11(19):2219.
Avbelj, J., Müller, R., and Bamler, R. (2014). A metric for
polygon comparison and building extraction evalua-
tion. IEEE Geoscience and Remote Sensing Letters,
12(1):170–174.
Bulatov, D., Häufel, G., Meidow, J., Pohl, M., Solbrig, P.,
and Wernerus, P. (2014). Context-based automatic
reconstruction and texturing of 3D urban terrain for
quick-response tasks. ISPRS Journal of Photogram-
metry and Remote Sensing, 93:157–170.
Bulatov, D., Wenzel, S., Häufel, G., and Meidow, J. (2017).
Chain-wise generalization of road networks using
model selection. ISPRS Annals of the Photogramme-
try, Remote Sensing and Spatial Information Sciences,
4:59–66.
Canny, J. (1986). A computational approach to edge de-
tection. IEEE Transactions on Pattern Analysis and
Machine Intelligence, (6):679–698.
Esmaeily, Z. and Rezaeian, M. (2023). Building roof wire-
frame extraction from aerial images using a three-
stream deep neural network. Journal of Electronic
Imaging, 32(1):013001–013001.
Harris, C., Stephens, M., et al. (1988). A combined corner
and edge detector. In Alvey Vision Conference, vol-
ume 15, pages 147–152. Citeseer.
Henricsson, O. (1998). The role of color attributes and sim-
ilarity grouping in 3-d building reconstruction. Com-
puter Vision and Image Understanding, 72(2):163–
184.
Hensel, S., Goebbels, S., and Kada, M. (2021). Building
roof vectorization with PPGNET. The International
Archives of the Photogrammetry, Remote Sensing and
Spatial Information Sciences, 46:85–90.
House, D., Lech, M., and Stolar, M. (2018). Using deep
learning to identify potential roof spaces for solar panels.
In International Conference on Signal Processing and
Communication Systems (ICSPCS), pages 1–6. IEEE.
Huang, K., Wang, Y., Zhou, Z., Ding, T., Gao, S., and Ma,
Y. (2018). Learning to parse wireframes in images of
man-made environments. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recogni-
tion, pages 626–635.
Ilehag, R., Bulatov, D., Helmholz, P., and Belton, D.
(2018). Classification and representation of com-
monly used roofing material using multisensorial
aerial data. The International Archives of the Pho-
togrammetry, Remote Sensing and Spatial Informa-
tion Sciences, 42:217–224.
Jocher, G., Chaurasia, A., and Qiu, J. (2023). Ultralytics
YOLOv8.
Jung, J., Jwa, Y., and Sohn, G. (2017). Implicit regular-
ization for reconstructing 3D building rooftop models
using airborne lidar data. Sensors, 17(3):621.
Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C.,
Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C.,
VISAPP 2025 - 20th International Conference on Computer Vision Theory and Applications