Authors:
Simone Bertolotto 1,2; André Panisson 2 and Alan Perotti 2

Affiliations:
1 University of Turin, Italy; 2 Centai Institute, Turin, Italy
Keyword(s):
Computer Vision, Image Classification, Neural Symbolic Integration, Large Language Models.
Abstract:
Neural networks have become the primary approach for tackling computer vision tasks, but their lack of transparency and interpretability remains a challenge. Integrating neural networks with symbolic knowledge bases, which could provide valuable context for visual concepts, is not yet common in the machine learning community. In image classification, class labels are often treated as independent, orthogonal concepts, resulting in equal penalization of misclassifications regardless of the semantic similarity between the true and predicted labels. Previous studies have attempted to address this by using ontologies to establish relationships among classes, but such data structures are generally not available. In this paper, we use a large language model (LLM) to generate textual descriptions for each class label, aiming to capture the visual characteristics of the corresponding concepts. These descriptions are then encoded into embedding vectors, which are used as the ground truth for training the image classification model. By employing a cosine distance-based loss function, our approach considers the semantic similarity between class labels, encouraging the model to learn a more hierarchically structured internal feature representation. We evaluate our method on multiple datasets and compare its performance with existing techniques, focusing on classification accuracy, mistake severity, and the emergence of a hierarchical structure in the learned concept representations. The results suggest that semantic embedding representations extracted from LLMs have the potential to enhance the performance of image classification models and lead to more semantically meaningful misclassifications. A key advantage of our method, compared to those that leverage explicit hierarchical information, is its broad applicability to a wide range of datasets without requiring the presence of pre-defined hierarchical structures.
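A minimal sketch of the cosine distance-based objective described in the abstract, assuming class-description embeddings have already been produced by an LLM-plus-text-encoder pipeline. The embedding vectors and function names here are hypothetical illustrations, not the paper's actual implementation:

```python
import numpy as np

def cosine_distance_loss(pred, target):
    """Mean cosine distance (1 - cosine similarity) between predicted
    and target embedding vectors, both of shape (batch, dim)."""
    pred_n = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    targ_n = target / np.linalg.norm(target, axis=1, keepdims=True)
    cos_sim = np.sum(pred_n * targ_n, axis=1)
    return float(np.mean(1.0 - cos_sim))

# Hypothetical description embeddings: semantically similar classes
# (e.g. "cat" vs. "dog") lie closer together than dissimilar ones
# (e.g. "cat" vs. "car"), so confusing similar classes is penalized less.
cat = np.array([[1.0, 0.9, 0.0]])
dog = np.array([[1.0, 0.8, 0.1]])
car = np.array([[0.0, 0.1, 1.0]])

assert cosine_distance_loss(cat, dog) < cosine_distance_loss(cat, car)
```

Because the targets are embeddings rather than one-hot labels, the penalty for a misclassification scales with how far apart the two class descriptions lie in the embedding space, which is what drives the more semantically meaningful mistakes reported above.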