Authors:
Constantinos Kyriakides 1; Marios Thoma 1,2; Zenonas Theodosiou 1,3; Harris Partaourides 4; Loizos Michael 2,1 and Andreas Lanitis 5,1
Affiliations:
1 CYENS Centre of Excellence, Nicosia, Cyprus
2 Open University of Cyprus, Nicosia, Cyprus
3 Department of Communication and Internet Studies, Cyprus University of Technology, Limassol, Cyprus
4 AI Cyprus Ethical Novelties Ltd, Limassol, Cyprus
5 Department of Multimedia and Graphic Arts, Cyprus University of Technology, Limassol, Cyprus
Keyword(s):
Deep Learning Algorithms, Explainability, Eye Tracking, Heatmaps, Obstacle Recognition.
Abstract:
Contemporary cities are fractured by a growing number of barriers, such as ongoing construction and damaged infrastructure, which endanger pedestrian safety. The automated detection and recognition of such barriers from visual data has attracted considerable attention from the research community in recent years. Deep Learning (DL) algorithms are now the dominant approach in visual data analysis, achieving excellent results in a wide range of applications, including obstacle detection. However, explaining the underlying operations of DL models remains a key challenge in understanding how they arrive at their decisions. Heatmaps that highlight the regions of an input image that contributed most to a model's prediction have emerged as a form of post-hoc explainability for such models. To gain insight into the learning process of DL models, we studied the similarities between heatmaps generated by several architectures trained to detect obstacles on sidewalks in images collected via smartphones, and eye-tracking heatmaps generated by humans as they detected the corresponding obstacles in the same data. Our findings indicate that the focus points of humans align more closely with those of a Vision Transformer architecture than with the other network architectures examined in our experiments.
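The comparison described above amounts to scoring the agreement between a model-generated saliency heatmap and a human eye-tracking heatmap over the same image. A common way to do this is pixel-wise Pearson correlation between the two maps; the sketch below is illustrative only, as the abstract does not specify which similarity metric the authors used.

```python
import numpy as np

def heatmap_similarity(model_map, gaze_map):
    """Pearson correlation between two heatmaps of equal shape.

    A standard saliency-comparison metric (often called CC in the
    saliency literature); assumed here for illustration, not taken
    from the paper itself.
    """
    a = np.asarray(model_map, dtype=float).ravel()
    b = np.asarray(gaze_map, dtype=float).ravel()
    # Standardize each map, guarding against a constant (zero-variance) map.
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))

# A heatmap compared with itself correlates perfectly,
# and with its negation it anti-correlates.
rng = np.random.default_rng(0)
h = rng.random((8, 8))
print(round(heatmap_similarity(h, h), 4))   # → 1.0
print(round(heatmap_similarity(h, -h), 4))  # → -1.0
```

Under this metric, a value near 1 would indicate that the model attends to the same image regions humans fixate on, which is the sense in which the Vision Transformer is reported to align most closely with human gaze.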