Authors:
G. J. Burghouts; K. Schutte; M. Kruithof; W. Huizinga; F. Ruis and H. Kuijf
Affiliation:
TNO, Intelligent Imaging, The Netherlands
Keyword(s):
Classifier Synthesis, Knowledge Representation, Semantics, Attributes, Class Descriptions, Large Language Models, Zero-Shot Learning, Few-Shot Learning.
Abstract:
Many effective methods have been proposed for either zero-shot or few-shot learning, but they are commonly unsuited for both, whereas in practice one often starts without labels and some may become available later. We propose a method that naturally ties zero- and few-shot learning together. We initialize a zero-shot model from prior knowledge about the classes by recombining the weights of a classification head via a linear reconstruction, which is kept sparse to avoid overfitting. This mapping is an explicit transfer of knowledge from known to new classes, so it can be inspected and visualized, which is impossible with recently popular implicit prompt learning strategies. The mapping is used to construct a classifier for the new class by adapting the neural weights of the classifiers for the known classes; effectively, we synthesize a new classifier. Our method is flexible: we show its efficacy for various knowledge representations and various neural networks (whereas prompt learning is limited to language-vision models). The synthesized classifier can operate directly on test samples in a zero-shot fashion. We outperform CLIP, especially for uncommon image classes, sometimes by margins of up to 32%. Because the synthesized classifier consists of a tensor layer, it can be optimized further when a few labeled images become available. For few-shot learning, our synthesized classifier provides a kickstart. With one label per class, it outperforms strong baselines that require annotation of attributes or heavy pretraining (CLIP) by 8%, and increases accuracy by 39% relative to conventional classifier initialization. The code is available.
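The classifier synthesis described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code. It assumes that each class has a description embedding (here random vectors stand in for any knowledge representation), that the new class is reconstructed as a sparse linear combination of known classes, and that the same coefficients are then applied to the known classes' head weights. The L1 solver (ISTA), the function names `sparse_codes` and `synthesize_classifier`, and all sizes are assumptions chosen for illustration.

```python
import numpy as np

def sparse_codes(known_emb, new_emb, l1=0.05, n_iter=500):
    """Sparse linear reconstruction of new_emb from the rows of known_emb.

    Minimizes 0.5 * ||new_emb - known_emb.T @ a||^2 + l1 * ||a||_1 with ISTA
    (an assumed solver; the abstract only states that the mapping is sparse).
    """
    D = known_emb.T                            # (dim, n_known)
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - new_emb))                  # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - step * l1, 0.0)   # soft threshold
    return a

def synthesize_classifier(known_weights, coeffs):
    """Recombine known-class head weights (n_known, feat_dim) into one new weight vector."""
    return coeffs @ known_weights

# Hypothetical toy data: 5 known classes, 16-d description embeddings, 32-d head weights.
rng = np.random.default_rng(0)
known_emb = rng.normal(size=(5, 16))
known_weights = rng.normal(size=(5, 32))
new_emb = 0.7 * known_emb[1] + 0.3 * known_emb[3]   # new class resembles classes 1 and 3

coeffs = sparse_codes(known_emb, new_emb)
new_weight = synthesize_classifier(known_weights, coeffs)  # zero-shot classifier row
```

Because `coeffs` is an explicit vector over known classes, it can be inspected directly to see which known classes contribute to the new one. The resulting `new_weight` could also serve as the initial value of a trainable layer once a few labels arrive, which is the few-shot use the abstract describes.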