Authors:
Filippos Gouidis 1,2; Konstantinos Papoutsakis 1; Theodore Patkos 3; Antonis Argyros 2,3 and Dimitris Plexousakis 2,3
Affiliations:
1 Department of Management, Science and Technology, Hellenic Mediterranean University, Agios Nikolaos, Greece
2 Computer Science Department, University of Crete, Heraklion, Greece
3 Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece
Keyword(s):
Visual Object State Classification, Zero-Shot Learning, Knowledge Graphs, Graph Neural Networks.
Abstract:
In this work, we explore the potential of Knowledge Graphs (KGs) for building an effective Zero-Shot Learning (ZSL) approach to Object State Classification (OSC) in images. For this problem, the performance of traditional supervised learning methods is hindered mainly by data scarcity, as they attempt to encode the highly varying visual features of a multitude of combinations of object state and object type classes (e.g. open bottle, folded newspaper). The ZSL paradigm offers a promising alternative, enabling the classification of object state classes by leveraging structured semantic descriptions acquired from external commonsense knowledge sources. We formulate an effective ZS-OSC scheme by employing a Transformer-based Graph Neural Network model and a pre-trained CNN classifier. We also investigate best practices for both the construction and the integration of visually grounded commonsense information based on KGs. An extensive experimental evaluation is reported using 4 related image datasets, 5 different knowledge repositories and 30 KGs that are constructed semi-automatically by querying known object state classes to retrieve contextual information at different node depths. The performance of vision-language models for ZS-OSC is also assessed. Overall, the obtained results suggest a performance improvement for ZS-OSC models on all datasets, and indicate that both the size of a KG and the sources used for its construction are important for task performance.