One of the domains that can be modeled as a meta-learning problem is text classification, one of the major tasks in natural language processing. Many papers (Wang, 2018; Sun et al., 2020; Yamada and Shindo, 2019; Yang et al., 2019; Zaheer et al., 2021) have tackled this problem with deep architectures that need large datasets to be trained on, datasets that cannot be found in the majority of fields where text classification is needed. Moreover, these models are task-specific: once trained for one task, they cannot be used to generalize to another. Taking these elements into account, the need for a meta-learning model for text classification becomes clear.
In this paper we introduce a new, simple architecture derived from metric-based meta-learning, seeking to address the lack of data in the text classification field. Our proposal is based on prototypical networks (Snell et al., 2017). We adapt this model to the few-shot text classification task by adding an embedding layer that transforms the input sentences into a usable format, and by substituting the softmax layer applied to the distance vector (between the class centers and the query examples, after the query examples pass through the prototypical network) with a non-linear classifier.
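The substitution described above can be illustrated with a minimal sketch. The standard prototypical-network head applies a softmax to the negative distances between a query embedding and the class prototypes; here that head is contrasted with a small one-hidden-layer classifier fed the same distance vector. The hidden width, weights, and activation below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Distances from one query embedding to N = 3 class prototypes
# (illustrative values; in the model these come from the
# prototypical network's embedding space).
distances = np.array([0.9, 0.1, 1.5])

# Standard prototypical-network head: softmax over negative distances,
# so the closest prototype gets the highest probability.
probs_softmax = softmax(-distances)

# Sketch of the substitution: pass the distance vector through a small
# non-linear classifier instead (architecture is an assumption).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

hidden = np.tanh(distances @ W1 + b1)   # non-linear hidden layer
probs_mlp = softmax(hidden @ W2 + b2)   # class probabilities
```

Under the softmax head the prediction is fixed by the distances alone, whereas the non-linear classifier's weights are trained, letting it learn a more flexible mapping from distances to class scores.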
2 BACKGROUND
In this section, we introduce the main notions and def-
initions necessary to understand the framework and
the context of our proposal.
2.1 Few-shot Classification
Few-shot learning (FSL) is the best-known meta-learning problem. It is a powerful paradigm for tasks that suffer from a lack of training examples: it involves training models on a collection of similar tasks and testing their ability to generalize to new, different tasks. Unlike classical deep learning models, which require huge amounts of data to train on a single specific task and in which the knowledge gained during training is not usable to generalize to new tasks, few-shot architectures are trained over a large number of similar tasks (text classification, for example) with few examples for each task, and the effectiveness of these models is measured by their ability to generalize to new but similar tasks. Few-shot classification, which applies FSL to classification tasks, is the most important application of FSL. In recent years few-shot classification has been closely tied to the notion of episodes. Introduced by (Vinyals et al., 2016), an episode represents a classification task composed of a training set (the support set) and a testing set (the query set). The support set contains K examples from each of N classes, sampled for every episode. We thus speak of N-way-K-shot classification.
2.2 Task Definition
In machine learning and deep learning, the datasets used to train models are usually split into three subsets: the train set D_train, used to update the weights of the model during training; the test set D_test, disjoint from the train set and used to evaluate the generalization power of the model at the end of training; and the validation set D_val, used to select the hyper-parameters of the model before training or to approximate its generalization power during training.
However, for few-shot classification, due to the lack of data, another sampling strategy is used, which introduces the notion of metasets. A metaset M is composed of the two main regular subsets (D_train, D_test). We thus speak of new types of sets: the meta-train, meta-test and meta-validation sets (M_train, M_test and M_val respectively). To compose these metasets, a widely adopted strategy proposed by (Vinyals et al., 2016) is applied. It consists in sampling, at every epoch, n_episodes episodes from a set M_j (where j can be train, test or val), such that for every episode we select a subset V of N classes from L_j, the set of classes available in M_j.
V is used to compose the support set S (S = D_train^j) by randomly selecting K elements from each class of V present in M_j. The query set Q (Q = D_test^j) is composed in the same way, only this time we select t elements from every class in V.
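The episode-sampling procedure above can be sketched as follows. The split is modeled as a mapping from class labels to example lists; each call draws N classes, then K support and t query examples per class. The function name and data layout are illustrative, not taken from the paper.

```python
import random

def sample_episode(split, n_way, k_shot, t_query, seed=None):
    """Sample one N-way-K-shot episode from a metaset split M_j.

    split: dict mapping class label -> list of examples (the classes L_j)
    Returns (support, query), each a list of (example, label) pairs.
    """
    rng = random.Random(seed)
    # Select the subset V of N classes from the split's label set.
    classes = rng.sample(sorted(split), n_way)
    support, query = [], []
    for c in classes:
        # Draw K + t distinct examples so support and query are disjoint.
        examples = rng.sample(split[c], k_shot + t_query)
        support += [(x, c) for x in examples[:k_shot]]   # K per class
        query += [(x, c) for x in examples[k_shot:]]     # t per class
    return support, query

# Toy split with 5 classes of 10 examples each.
split = {f"class{i}": [f"c{i}_ex{j}" for j in range(10)] for i in range(5)}
support, query = sample_episode(split, n_way=3, k_shot=2, t_query=4, seed=0)
```

With n_way=3, k_shot=2 and t_query=4, the support set holds 6 labeled examples and the query set 12, drawn from the same three classes but with no examples in common.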
2.3 Prototypical Networks
Prototypical networks for few-shot learning are a powerful algorithm proposed by (Snell et al., 2017). The model belongs to the family of metric-based meta-learning models and, despite its simplicity, achieves respectable performance on k-shot learning tasks (and state-of-the-art results in zero-shot learning).
A prototypical network (Snell et al., 2017) is based on the idea that there is an embedding space in which all the elements belonging to the same class are grouped around a single prototype representative of that class. The prototype of a class c_n is computed using equation 1:
Hybrid Prototypical Networks Augmented by a Non-linear Classifier
791
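The prototype computation can be sketched as the per-class mean of the support embeddings, following Snell et al. (2017). The embedding function f_phi is assumed and replaced here by fixed toy vectors; the function name and data layout are illustrative.

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """Compute one prototype per class as the mean of that class's
    support embeddings (Snell et al., 2017)."""
    protos = {}
    for c in set(labels):
        idx = [i for i, y in enumerate(labels) if y == c]
        protos[c] = np.mean([embeddings[i] for i in idx], axis=0)
    return protos

# Toy support embeddings standing in for f_phi's outputs.
emb = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
labels = ["a", "a", "b", "b"]
protos = class_prototypes(emb, labels)
# protos["a"] -> [2.0, 0.0], protos["b"] -> [0.0, 3.0]
```

A query example is then classified by comparing its embedding to each prototype in this space, the closest prototype indicating the predicted class.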