
slot detection with a realistic dataset. IEEE Access, 8:171551–171559.
Gieruc, T., Kästingschäfer, M., Bernhard, S., and Salzmann, M. (2024). 6img-to-3d: Few-image large-scale outdoor driving scene reconstruction.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2020). Generative adversarial networks. Commun. ACM, 63(11):139–144.
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. (2017). Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pages 6629–6640, Red Hook, NY, USA. Curran Associates Inc.
Huang, Y., Zheng, W., Zhang, Y., Zhou, J., and Lu, J. (2023). Tri-perspective view for vision-based 3d semantic occupancy prediction.
Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. CVPR.
Jain, V., Wu, Q., Grover, S., Sidana, K., Chaudhary, D.-G., Myint, S., and Hua, Q. (2021). Generating bird's eye view from egocentric rgb videos. Wireless Communications and Mobile Computing, 2021:1–11.
Lu, J., Zhou, Z., Zhu, X., Xu, H., and Zhang, L. (2022). Learning ego 3d representation as ray tracing.
Johnson, J., Alahi, A., and Fei-Fei, L. (2016). Perceptual losses for real-time style transfer and super-resolution. In Leibe, B., Matas, J., Sebe, N., and Welling, M., editors, Computer Vision – ECCV 2016, pages 694–711, Cham. Springer International Publishing.
Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., and Shi, W. (2017). Photo-realistic single image super-resolution using a generative adversarial network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 105–114, Los Alamitos, CA, USA. IEEE Computer Society.
Lee, Y., Kim, M., Ahn, J., and Park, J. (2023). Accurate visual simultaneous localization and mapping (slam) against around view monitor (avm) distortion error using weighted generalized iterative closest point (gicp). Sensors, 23(18).
Loukkal, A., Grandvalet, Y., Drummond, T., and Li, Y. (2021). Driving among flatmobiles: Bird-eye-view occupancy grids from a monocular camera for holistic trajectory planning. In 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 51–60.
Mallot, H. A., Bülthoff, H. H., Little, J. J., and Bohrer, S. (1991). Inverse perspective mapping simplifies optical flow computation and obstacle detection. Biological Cybernetics, 64:177–185.
Mao, X., Li, Q., Xie, H., Lau, R. Y. K., Wang, Z., and Smolley, S. P. (2017). Least squares generative adversarial networks. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2813–2821, Los Alamitos, CA, USA. IEEE Computer Society.
Ma, Y., Wang, T., Bai, X., Yang, H., Hou, Y., Wang, Y., Qiao, Y., Yang, R., Manocha, D., and Zhu, X. (2022). Vision-centric bev perception: A survey.
Musabini, A., Bozbayir, E., Marcasuzaa, H., and Ramírez, O. A. I. (2021). Park4u mate: Context-aware digital assistant for personalized autonomous parking. In 2021 IEEE Intelligent Vehicles Symposium (IV), pages 724–731. IEEE.
Musabini, A., Novikov, I., Soula, S., Leonet, C., Wang, L., Benmokhtar, R., Burger, F., Boulay, T., and Perrotton, X. (2024). Enhanced parking perception by multi-task fisheye cross-view transformers.
Pan, X., Tewari, A., Leimkühler, T., Liu, L., Meka, A., and Theobalt, C. (2023). Drag your gan: Interactive point-based manipulation on the generative image manifold. In ACM SIGGRAPH 2023 Conference Proceedings.
Parallel Domain Platform (n.d.). Parallel domain. Accessed: February 2024.
Pham, T., Maghoumi, M., Jiang, W., Jujjavarapu, B. S. S., Sajjadi, M., Liu, X., Lin, H.-C., Chen, B.-J., Truong, G., Fang, C., et al. (2024). Nvautonet: Fast and accurate 360° 3d visual perception for self driving. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 7376–7385.
Regmi, K. and Borji, A. (2019). Cross-view image synthesis using geometry-guided conditional gans. Computer Vision and Image Understanding, 187:102788.
Reiher, L., Lampe, B., and Eckstein, L. (2020). A sim2real deep learning approach for the transformation of images from multiple vehicle-mounted cameras to a semantically segmented image in bird's eye view. In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), pages 1–7.
Ren, B., Tang, H., and Sebe, N. (2021). Cascaded cross mlp-mixer gans for cross-view image translation. In British Machine Vision Conference.
Ren, B., Tang, H., Wang, Y., Li, X., Wang, W., and Sebe, N. (2023). Pi-trans: Parallel-convmlp and implicit-transformation based gan for cross-view image translation. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5.
Samani, E., Tao, F., Dasari, H., Ding, S., and Banerjee, A. (2023). F2bev: Bird's eye view generation from surround-view fisheye camera images for automated driving. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 9367–9374.
Smith, L. and Topin, N. (2018). Super-convergence: very fast training of neural networks using large learning rates. In Defense + Commercial Sensing.
Tan, M. and Le, Q. V. (2021). Efficientnetv2: Smaller models and faster training.
Tang, H., Xu, D., Sebe, N., Wang, Y., Corso, J. J., and Yan, Y. (2019). Multi-channel attention selection gan with cascaded semantic guidance for cross-view image translation. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).