
REFERENCES
Adewopo, V., Elsayed, N., ElSayed, Z., Ozer, M., Wangia-Anderson, V., and Abdelgawad, A. (2023). AI on the road: A comprehensive analysis of traffic accidents and autonomous accident detection system in smart cities. In 2023 IEEE 35th International Conference on Tools with Artificial Intelligence (ICTAI), pages 501–506.
Andriluka, M., Roth, S., and Schiele, B. (2010). Monocular 3D pose estimation and tracking by detection. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 623–630.
Cui, Z., Liu, Y., and Ren, F. (2019). Homography-based traffic sign localization and pose estimation from image sequence. IET Image Processing, 13.
Deng, L. (2012). The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142.
Hara, K., Vemulapalli, R., and Chellappa, R. (2017). Designing deep convolutional neural networks for continuous object orientation estimation.
Hodson, T. O. (2022). Root-mean-square error (RMSE) or mean absolute error (MAE): when to use them or not. Geoscientific Model Development, 15(14):5481–5487.
Kanezaki, A., Matsushita, Y., and Nishida, Y. (2018). RotationNet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Kendall, A., Grimes, M., and Cipolla, R. (2015). PoseNet: A convolutional network for real-time 6-DOF camera relocalization. CoRR, abs/1505.07427.
Koguciuk, D., Arani, E., and Zonooz, B. (2021). Perceptual loss for robust unsupervised homography estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 4274–4283.
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021). Swin Transformer: Hierarchical vision transformer using shifted windows. CoRR, abs/2103.14030.
Molnar, C. (2022). Interpretable Machine Learning. Leanpub, 2 edition.
Mousavian, A., Anguelov, D., Flynn, J., and Kosecka, J. (2017). 3D bounding box estimation using deep learning and geometry.
Okorn, B., Pan, C., Hebert, M., and Held, D. (2022). Deep projective rotation estimation through relative supervision.
Ozuysal, M., Lepetit, V., and Fua, P. (2009). Pose estimation for category specific multiview object localization. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 778–785.
Prisacariu, V. A., Timofte, R., Zimmermann, K., Reid, I., and Van Gool, L. (2010). Integrating object detection with 3D tracking towards a better driver assistance system. In 2010 20th International Conference on Pattern Recognition, pages 3344–3347.
Qin, X., Zhang, Z., Huang, C., Dehghan, M., Zaiane, O. R., and Jagersand, M. (2020). U2-Net: Going deeper with nested U-structure for salient object detection. Pattern Recognition, 106:107404.
Raza, M., Rehman, S.-U., Wang, P., and Peng, B. (2018). Appearance based pedestrians’ head pose and body orientation estimation using deep learning. Neurocomputing, 272:647–659.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, New York, NY, USA. Association for Computing Machinery.
Rodriguez Salas, R., Dokladal, P., and Dokladalova, E. (2021). A minimal model for classification of rotated objects with prediction of the angle of rotation. Journal of Visual Communication and Image Representation, 75:103054.
Rusiecki, A. (2019). Trimmed categorical cross-entropy for deep learning with label noise. Electronics Letters, 55.
Sarlin, P.-E., DeTone, D., Malisiewicz, T., and Rabinovich, A. (2020). SuperGlue: Learning feature matching with graph neural networks.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 618–626.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2014). Going deeper with convolutions.
Wang, Q., Ma, Y., Zhao, K., and Tian, Y. (2022). A comprehensive survey of loss functions in machine learning. Annals of Data Science, 9(2):187–212.
Xie, S., Girshick, R. B., Dollár, P., Tu, Z., and He, K. (2016). Aggregated residual transformations for deep neural networks. CoRR, abs/1611.05431.
Zagoruyko, S. and Komodakis, N. (2016). Wide residual networks. CoRR, abs/1605.07146.