
ACKNOWLEDGEMENTS
This research project was supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the 3rd Call for H.F.R.I. Research Projects to support Post-Doctoral Researchers (Project Number 7678, InterLinK: Visual Recognition and Anticipation of Human-Object Interactions using Deep Learning, Knowledge Graphs and Reasoning).