the 26th International Conference on Neural Information
Processing Systems - Volume 2, pages 2121–2129.
Garg, A., Sani, D., and Anand, S. (2022). Learning hierar-
chy aware features for reducing mistake severity. In
Avidan, S., Brostow, G., Cissé, M., Farinella, G. M.,
and Hassner, T., editors, Computer Vision – ECCV
2022, pages 252–267, Cham. Springer Nature.
Garnot, V. S. F. and Landrieu, L. (2020). Leveraging class
hierarchies with metric-guided prototype learning. In
British Machine Vision Conference.
Halkidi, M. and Vazirgiannis, M. (2001). Clustering validity
assessment: finding the optimal partitioning of a data
set. In 2001 IEEE International Conference on Data
Mining. IEEE Comput. Soc.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep
residual learning for image recognition. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition.
Incitti, F., Urli, F., and Snidaro, L. (2023). Beyond word
embeddings: A survey. Information Fusion, 89:418–
436.
Khan, S., Rahmani, H., Shah, S. A. A., Bennamoun, M.,
Medioni, G., and Dickinson, S. (2018). A guide to
convolutional neural networks for computer vision.
Springer.
Krizhevsky, A. and Hinton, G. (2009). Learning multiple
layers of features from tiny images. Technical report,
University of Toronto.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012).
ImageNet classification with deep convolutional neural
networks. In Advances in Neural Information Processing
Systems, pages 1097–1105.
Kroshchanka, A., Golovko, V., Mikhno, E., Kovalev, M.,
Zahariev, V., and Zagorskij, A. (2021). A neural-
symbolic approach to computer vision. In Interna-
tional Conference on Open Semantic Technologies for
Intelligent Systems, pages 282–309. Springer.
Kusupati, A., Bhatt, G., Rege, A., Wallingford, M., Sinha,
A., Ramanujan, V., Howard-Snyder, W., Chen, K.,
Kakade, S., Jain, P., and Farhadi, A. (2022). Ma-
tryoshka representation learning. In Advances in Neu-
ral Information Processing Systems, volume 35, pages
30233–30249. Curran Associates, Inc.
Liu, Y., Li, Z., Xiong, H., Gao, X., and Wu, J. (2010). Un-
derstanding of internal clustering validation measures.
In 2010 IEEE 10th International Conference on Data
Mining (ICDM). IEEE.
Mikolov, T., Chen, K., Corrado, G. S., and Dean, J. (2013a).
Efficient estimation of word representations in vector
space. In International Conference on Learning Rep-
resentations.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and
Dean, J. (2013b). Distributed representations of words
and phrases and their compositionality. In Neural In-
formation Processing Systems.
Miller, G. A. (1995). WordNet: a lexical database for
English. Communications of the ACM, pages 39–41.
Pasini, A., Giobergia, F., Pastor, E., and Baralis, E. (2022).
Semantic image collection summarization with fre-
quent subgraph mining. IEEE Access, 10:131747–
131764.
Perotti, A., Bertolotto, S., Pastor, E., and Panisson, A.
(2023). Beyond one-hot-encoding: Injecting seman-
tics to drive image classifiers. In World Conference
on Explainable Artificial Intelligence, pages 525–548.
Springer.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G.,
Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark,
J., et al. (2021). Learning transferable visual models
from natural language supervision. In International
Conference on Machine Learning, pages 8748–8763.
PMLR.
Redmon, J. and Farhadi, A. (2016). YOLO9000: Better,
faster, stronger. In 2017 IEEE Conference on Computer
Vision and Pattern Recognition, pages 6517–6525.
Ren, M., Triantafillou, E., Ravi, S., Snell, J., Swersky, K.,
Tenenbaum, J. B., Larochelle, H., and Zemel, R. S.
(2018). Meta-learning for semi-supervised few-shot
classification. In Proceedings of 6th International
Conference on Learning Representations, ICLR.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S.,
Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein,
M. S., Berg, A. C., and Fei-Fei, L. (2014). ImageNet
large scale visual recognition challenge. International
Journal of Computer Vision, 115:211–252.
Sainburg, T., McInnes, L., and Gentner, T. Q. (2021).
Parametric UMAP embeddings for representation and
semisupervised learning. Neural Computation,
33(11):2881–2907.
Silla, C. N. and Freitas, A. A. (2011). A survey of hierarchi-
cal classification across different application domains.
Data Mining and Knowledge Discovery, 22:31–72.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
CoRR, abs/1409.1556.
Tan, M. and Le, Q. (2019). EfficientNet: Rethinking model
scaling for convolutional neural networks. In Pro-
ceedings of the 36th International Conference on Ma-
chine Learning, volume 97 of Proceedings of Machine
Learning Research, pages 6105–6114. PMLR.
van der Maaten, L. and Hinton, G. (2008). Visualizing data
using t-SNE. Journal of Machine Learning Research,
9(86):2579–2605.
Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun,
C., Shepard, A., Adam, H., Perona, P., and Belongie,
S. (2018). The iNaturalist species classification and
detection dataset. In 2018 IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pages
8769–8778.
Verma, N., Mahajan, D., Sellamanickam, S., and Nair, V.
(2012). Learning hierarchical similarity metrics. In
2012 IEEE Conference on Computer Vision and Pat-
tern Recognition (CVPR). IEEE.
Wu, H., Merler, M., Uceda-Sosa, R., and Smith, J. R.
(2016). Learning to make better mistakes: Semantics-
aware visual food recognition. In Proceedings of the
24th ACM International Conference on Multimedia,
pages 172–176.
LLM-Generated Class Descriptions for Semantically Meaningful Image Classification