6 CONCLUSION
In this paper, we have presented a novel benchmark
for evaluating image-based 3D reconstruction
pipelines with aerial images in urban environments.
The results obtained with the considered
state-of-the-art SfM+MVS pipelines are evaluated at
scene level and per urban category. This allows for
further analysis of the reconstructions (i.e., analysis
of the influence of each urban category on the
scene-level scores), and it supports previous
hypotheses (e.g., that parks can degrade the F score
values in a scene-level evaluation) with quantitative
measurements. We also provide the means for
evaluating results in a hidden area to avoid
fine-tuning of algorithms to the given ground truth.
Furthermore, we stimulate and support the evaluation
of new approaches for image-based 3D reconstruction,
as we do not limit the evaluation to a specific stage
of the pipeline (e.g., MVS). Finally, to support the
progress of research in the community, we provide
the dataset and an online evaluation platform at
https://v-sense.scss.tcd.ie/research/6dof/benchmark-3D-reconstruction.
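To illustrate how a per-category breakdown can expose the influence of one urban category on the scene-level score, the following is a minimal sketch. It is not the evaluation code of the benchmark. It assumes the common distance-threshold definition of the F score (the harmonic mean of precision and recall between two point sets), and it assumes that every reconstructed point already carries a category label. The function names, the labels, the toy coordinates and the threshold are all hypothetical.

```python
import math


def f_score(reconstruction, ground_truth, threshold):
    """Return (precision, recall, F score) for two lists of 3D points.

    Assumed definition: precision is the fraction of reconstructed points
    within `threshold` of some ground-truth point; recall is the fraction
    of ground-truth points within `threshold` of some reconstructed point.
    """
    def fraction_within(source, target):
        if not source or not target:
            return 0.0
        # Brute-force nearest neighbour; fine for a toy example only.
        hits = sum(
            1 for p in source
            if min(math.dist(p, q) for q in target) <= threshold
        )
        return hits / len(source)

    precision = fraction_within(reconstruction, ground_truth)
    recall = fraction_within(ground_truth, reconstruction)
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def f_score_per_category(reconstruction, rec_labels,
                         ground_truth, gt_labels, threshold):
    """Evaluate each category separately (labels are an assumption)."""
    scores = {}
    for category in sorted(set(gt_labels)):
        rec = [p for p, c in zip(reconstruction, rec_labels) if c == category]
        gt = [p for p, c in zip(ground_truth, gt_labels) if c == category]
        scores[category] = f_score(rec, gt, threshold)
    return scores


# Hypothetical toy scene: two "building" points reconstructed accurately,
# two "park" points reconstructed with a large error.
labels = ["building", "building", "park", "park"]
gt_points = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (11, 0, 0)]
rec_points = [(0, 0, 0.01), (1, 0, 0.01), (10, 0, 0.5), (11, 0, 0.5)]

scene = f_score(rec_points, gt_points, threshold=0.1)
per_category = f_score_per_category(
    rec_points, labels, gt_points, labels, threshold=0.1
)
print("scene:", scene)
print("per category:", per_category)
```

In this toy scene the poorly reconstructed park points lower the scene-level F score, while the per-category scores show that the buildings themselves are reconstructed well.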
ACKNOWLEDGEMENTS
This publication has emanated from research
conducted with the financial support of Science
Foundation Ireland (SFI) under the Grant Number
15/RP/2776.
REFERENCES
Aanæs, H., Jensen, R. R., Vogiatzis, G., Tola, E., and Dahl,
A. B. (2016). Large-scale data for multiple-view
stereopsis. International Journal of Computer Vision.
Barnes, C., Shechtman, E., Finkelstein, A., and Goldman,
D. B. (2009). Patchmatch: A randomized
correspondence algorithm for structural image editing.
In ACM Transactions on Graphics (ToG). ACM.
Bosch, M., Foster, K., Christie, G., Wang, S., Hager, G. D.,
and Brown, M. (2019). Semantic stereo for incidental
satellite images. In WACV 2019. IEEE.
Furukawa, Y. and Ponce, J. (2010). Accurate, dense, and
robust multiview stereopsis. IEEE Transactions on
Pattern Analysis and Machine Intelligence.
Hartley, R. and Zisserman, A. (2003). Multiple view
geometry in computer vision. Cambridge University Press.
Jancosek, M. and Pajdla, T. (2011). Multi-view
reconstruction preserving weakly-supported surfaces.
In CVPR 2011. IEEE.
Knapitsch, A., Park, J., Zhou, Q.-Y., and Koltun, V. (2017).
Tanks and temples: Benchmarking large-scale scene
reconstruction. ACM Transactions on Graphics.
Laefer, D. F., Abuwarda, S., Vo, A.-V., Truong-Hong, L.,
and Gharibi, H. (2015). 2015 aerial laser and
photogrammetry survey of dublin city collection record.
Menze, M. and Geiger, A. (2015). Object scene flow for
autonomous vehicles. In CVPR 2015.
Moulon, P., Monasse, P., and Marlet, R. (2012). Adaptive
structure from motion with a contrario model
estimation. In ACCV 2012. Springer Berlin Heidelberg.
Moulon, P., Monasse, P., and Marlet, R. (2013). Global
fusion of relative motions for robust, accurate and
scalable structure from motion. In ICCV 2013.
Munoz, D., Bagnell, J. A., Vandapel, N., and Hebert, M.
(2009). Contextual classification with functional
max-margin markov networks. In CVPR 2009.
Özdemir, E., Toschi, I., and Remondino, F. (2019). A
multi-purpose benchmark for photogrammetric urban 3d
reconstruction in a controlled environment.
Pagés, R., Amplianitis, K., Monaghan, D., Ondřej, J., and
Smolić, A. (2018). Affordable content creation for
free-viewpoint video and vr/ar applications. Journal
of Visual Communication and Image Representation.
Rottensteiner, F., Sohn, G., Gerke, M., Wegner, J. D.,
Breitkopf, U., and Jung, J. (2014). Results of the isprs
benchmark on urban object detection and 3d building
reconstruction. ISPRS journal of photogrammetry and
remote sensing.
Ruano, S., Cuevas, C., Gallego, G., and García, N. (2017).
Augmented reality tool for the situational awareness
improvement of uav operators. Sensors.
Ruano, S., Gallego, G., Cuevas, C., and García, N. (2014).
Aerial video georegistration using terrain models from
dense and coherent stereo matching. In Geospatial
InfoFusion and Video Analytics IV; and Motion Imagery
for ISR and Situational Awareness II. International
Society for Optics and Photonics.
Schönberger, J. L. and Frahm, J.-M. (2016).
Structure-from-motion revisited. In CVPR 2016.
Schönberger, J. L., Zheng, E., Pollefeys, M., and Frahm,
J.-M. (2016). Pixelwise view selection for unstructured
multi-view stereo. In ECCV 2016. Springer.
Schöps, T., Schönberger, J. L., Galliani, S., Sattler, T.,
Schindler, K., Pollefeys, M., and Geiger, A. (2017).
A multi-view stereo benchmark with high-resolution
images and multi-camera videos. In CVPR 2017.
Seitz, S. M., Curless, B., Diebel, J., Scharstein, D., and
Szeliski, R. (2006). A comparison and evaluation of
multi-view stereo reconstruction algorithms. In CVPR
2006.
Serna, A., Marcotegui, B., Goulette, F., and Deschaud,
J.-E. (2014). Paris-rue-madame database: a 3d mobile
laser scanner dataset for benchmarking urban
detection, segmentation and classification methods.
Snavely, N., Seitz, S. M., and Szeliski, R. (2008). Modeling
the world from internet photo collections.
International Journal of Computer Vision.
Stathopoulou, E. K., Welponer, M., and Remondino, F.
(2019). Open-source image-based 3d reconstruction
VISAPP 2021 - 16th International Conference on Computer Vision Theory and Applications