
REFERENCES
Asanomi, T., Nishimura, K., and Bise, R. (2023). Multi-frame attention with feature-level warping for drone crowd tracking. In 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 1664–1673.
Caruana, R. (1997). Multitask learning. Machine Learning, 28:41–75.
Chen, Q., Wang, Y., Yang, T., Zhang, X., Cheng, J., and Sun, J. (2021). You only look one-level feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13039–13048.
Chen, R., Han, S., Xu, J., and Su, H. (2020). Visibility-aware point-based multi-view stereo network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10):3695–3708.
Coughlan, J. M. and Yuille, A. L. (2003). Manhattan World: Orientation and Outlier Detection by Bayesian Inference. Neural Computation, 15(5):1063–1088.
Crawshaw, M. (2020). Multi-task learning with deep neural networks: A survey. arXiv, abs/2009.09796.
Dai, A., Chang, A. X., Savva, M., Halber, M., Funkhouser, T., and Nießner, M. (2017). ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE.
Ding, Y., Yuan, W., Zhu, Q., Zhang, H., Liu, X., Wang, Y., and Liu, X. (2022). TransMVSNet: Global context-aware multi-view stereo network with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8585–8594.
Eigen, D., Puhrsch, C., and Fergus, R. (2014). Depth map prediction from a single image using a multi-scale deep network. Advances in Neural Information Processing Systems, 27.
Guo, E., Chen, Z., Zhou, Y., and Wu, D. O. (2021). Unsupervised learning of depth and camera pose with feature map warping. Sensors, 21(3).
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). Mask R-CNN. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2980–2988.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778.
Im, S., Jeon, H.-G., Lin, S., and Kweon, I.-S. (2019). DPSNet: End-to-end deep plane sweep stereo. In 7th International Conference on Learning Representations, ICLR.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017a). Feature pyramid networks for object detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 936–944.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P. (2017b). Focal loss for dense object detection. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2999–3007.
Liu, C., Kim, K., Gu, J., Furukawa, Y., and Kautz, J. (2019). PlaneRCNN: 3D plane detection and reconstruction from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Liu, C., Yang, J., Ceylan, D., Yumer, E., and Furukawa, Y. (2018a). PlaneNet: Piece-wise planar reconstruction from a single RGB image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Liu, J., Ji, P., Bansal, N., Cai, C., Yan, Q., Huang, X., and Xu, Y. (2022). PlaneMVS: 3D plane reconstruction from multi-view stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8665–8675.
Liu, R., Lehman, J., Molino, P., Petroski Such, F., Frank, E., Sergeev, A., and Yosinski, J. (2018b). An intriguing failing of convolutional neural networks and the CoordConv solution. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc.
Qian, Y. and Furukawa, Y. (2020). Learning pairwise inter-plane relations for piecewise planar reconstruction. In European Conference on Computer Vision.
Standley, T., Zamir, A., Chen, D., Guibas, L., Malik, J., and Savarese, S. (2020). Which tasks should be learned together in multi-task learning? In Daumé III, H. and Singh, A., editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 9120–9132. PMLR.
Tan, B., Xue, N., Bai, S., Wu, T., and Xia, G.-S. (2021). PlaneTR: Structure-guided transformers for 3D plane recovery. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4186–4195.
Wang, X., Kong, T., Shen, C., Jiang, Y., and Li, L. (2020a). SOLO: Segmenting objects by locations. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVIII 16, pages 649–665. Springer.
Wang, X., Zhang, R., Kong, T., Li, L., and Shen, C. (2020b). SOLOv2: Dynamic and fast instance segmentation. Advances in Neural Information Processing Systems, 33:17721–17732.
Xi, W. and Chen, X. (2019). Reconstructing piecewise planar scenes with multi-view regularization. Computational Visual Media, 5(4):337–345.
Xie, Y., Gadelha, M., Yang, F., Zhou, X., and Jiang, H. (2022). PlanarRecon: Real-time 3D plane detection and reconstruction from posed monocular videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6219–6228.
Xie, Y., Rambach, J., Shu, F., and Stricker, D. (2021a). PlaneSegNet: Fast and robust plane estimation using a single-stage instance segmentation CNN. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13574–13580. IEEE.
Xie, Y., Shu, F., Rambach, J. R., Pagani, A., and Stricker, D. (2021b). PlaneRecNet: Multi-task learning with cross-
VISAPP 2024 - 19th International Conference on Computer Vision Theory and Applications