mapping relationship between the prompt and the verbalizer, the gap between pre-training tasks and downstream tasks can be reduced by predicting the [MASK] token. LAMA (Petroni et al., 2019) proposed using cloze-style queries to probe the knowledge stored in a pre-trained model without any fine-tuning.
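To make the cloze idea concrete, the following minimal sketch (not taken from the cited work; the model name and query are illustrative) probes a masked language model with a cloze template through the Hugging Face fill-mask pipeline, with no fine-tuning involved:

# Cloze-style knowledge probing in the spirit of LAMA; the backbone and the
# query are illustrative choices, not those used in the original paper.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-cased")

# The cloze template recasts a factual query as the masked-language-modeling
# pre-training task, so the frozen model can answer it directly.
for pred in unmasker("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))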
Soft prompts replace fine-tuning by training new continuous word vectors, achieving performance comparable to fine-tuning at a greatly reduced computational cost. (Xiao Liu et al., 2021) proposed enhancing the natural language understanding ability of a pre-trained model by automatically searching for better prompts in a continuous semantic space. (Xiang Lisa Li et al., 2021) proposed replacing fine-tuning with prefix-tuning, in which only 0.1% of the parameters need to be trained to reach performance comparable to fine-tuning. (Xiao Liu et al., 2021) further applied prompt-tuning to complex natural language understanding tasks.
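As an illustration of the soft-prompt idea, the sketch below (a minimal example under assumed settings: the backbone, prompt length, and learning rate are arbitrary choices) prepends a small matrix of trainable virtual-token embeddings to the input embeddings while the pre-trained model stays frozen:

import torch
import torch.nn as nn
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Illustrative backbone; any masked language model with this interface works.
model_name = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
for p in model.parameters():
    p.requires_grad = False          # freeze all pre-trained weights

prompt_len = 20                      # number of virtual prompt tokens (assumed)
hidden = model.config.hidden_size
soft_prompt = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)

def forward_with_prompt(text):
    enc = tokenizer(text, return_tensors="pt")
    tok_emb = model.get_input_embeddings()(enc["input_ids"])        # (1, L, H)
    prompt = soft_prompt.unsqueeze(0)                               # (1, P, H)
    inputs_embeds = torch.cat([prompt, tok_emb], dim=1)             # (1, P+L, H)
    attn = torch.cat([torch.ones(1, prompt_len,
                                 dtype=enc["attention_mask"].dtype),
                      enc["attention_mask"]], dim=1)
    return model(inputs_embeds=inputs_embeds, attention_mask=attn)

# Only the soft prompt is optimized, so the trainable parameter count is a
# tiny fraction of the full model.
optimizer = torch.optim.AdamW([soft_prompt], lr=1e-3)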
On the other hand, some researchers have added adversarial training to fine-tuning to improve model robustness. (Chen Zhu et al., 2019) proposed improving robustness by adding adversarial perturbations to the word embeddings and minimizing the adversarial risk within a small neighborhood around each input sample. (Haoming Jiang et al., 2020) proposed a framework combining smoothness-inducing regularization with Bregman proximal point optimization to prevent aggressive updates during adversarial training.
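The following single-step sketch shows how such embedding-space perturbations can be turned into a regularization term. It is an illustration in the spirit of the cited methods, not their exact algorithms; the function name, the Hugging-Face-style model interface (outputs with a .logits field), and the hyperparameters are assumptions:

import torch
import torch.nn.functional as F

def adversarial_regularizer(model, inputs_embeds, attention_mask,
                            epsilon=1e-3, step_size=1e-3):
    # Penalize changes in the output distribution when the word embeddings
    # are perturbed within a small epsilon-ball around the input.
    with torch.no_grad():
        clean_logits = model(inputs_embeds=inputs_embeds,
                             attention_mask=attention_mask).logits

    # Start from small random noise and take one ascent step on it.
    delta = torch.zeros_like(inputs_embeds).uniform_(-epsilon, epsilon)
    delta.requires_grad_(True)
    adv_logits = model(inputs_embeds=inputs_embeds + delta,
                       attention_mask=attention_mask).logits
    adv_loss = F.kl_div(F.log_softmax(adv_logits, dim=-1),
                        F.softmax(clean_logits, dim=-1),
                        reduction="batchmean")
    grad, = torch.autograd.grad(adv_loss, delta)

    # Update the perturbation only within a small range (project back into
    # the epsilon-ball), then measure the remaining output discrepancy.
    delta = (delta + step_size * grad.sign()).clamp(-epsilon, epsilon).detach()
    adv_logits = model(inputs_embeds=inputs_embeds + delta,
                       attention_mask=attention_mask).logits
    return F.kl_div(F.log_softmax(adv_logits, dim=-1),
                    F.softmax(clean_logits, dim=-1),
                    reduction="batchmean")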
In this paper, we propose a training method that combines prompt-tuning with adversarial regularization:
1. An adversarial regularization algorithm is proposed: perturbations are added to the word embeddings and updated only within a small neighborhood, which increases the robustness of the model and allows it to achieve higher accuracy with less supervised data.
2. This adversarial regularization is organically incorporated into the prompt-tuning process to further improve the robustness of the model (a minimal sketch of such a combination is given below).
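The sketch below shows, at a high level, how an adversarial regularizer could be plugged into a prompt-tuning step. It builds on the two sketches above (model, soft_prompt, optimizer, adversarial_regularizer) and is an illustration only, not the exact procedure proposed in this paper; task_loss_fn and lambda_adv are assumed names:

lambda_adv = 1.0  # weight of the adversarial regularization term (assumed)

def training_step(inputs_embeds, attention_mask, labels):
    # Standard prompt-tuning objective on the clean inputs.
    outputs = model(inputs_embeds=inputs_embeds, attention_mask=attention_mask)
    task_loss = task_loss_fn(outputs, labels)

    # Smoothness term: keep the output stable under small embedding
    # perturbations updated within an epsilon-ball.
    adv_loss = adversarial_regularizer(model, inputs_embeds, attention_mask)

    loss = task_loss + lambda_adv * adv_loss
    loss.backward()              # only the soft prompt receives gradients
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()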
2 RELATED WORK
With the advent of large-scale pre-trained models, deep learning has rapidly converged on the pre-trained-model paradigm, changing the traditional mode of deep learning, and such models have become the new benchmark for a wide range of tasks. In general, the more parameters a model has, the more knowledge it acquires, the better it generalizes, and the better it performs when trained on downstream tasks.
On the other hand, large-scale pre-trained models are extremely complex, and limited supervised data cannot effectively leverage their hundreds of millions of parameters, resulting in poor transferability. To solve this problem, researchers proposed manual prompts, that is, designing a prompt template for the task data so that the downstream task is formulated as closely as possible to the pre-training task. This method greatly reduces the amount of supervised data required by downstream tasks and allows a small amount of data to leverage a large model. (Shengding Hu et al., 2021) proposed using an external knowledge base to expand the label-word space of the verbalizer, which greatly improves the accuracy of short-text classification. However, manual prompting relies heavily on the prompt template and on validation-set data, and its performance is not stable.
LAMA (Petroni et al., 2019) reported the knowledge-probing cases shown in Table 2-1 below; changing only a few words in the template leads to a large difference in the results (an illustrative template and verbalizer are sketched after the table).
Table 2-1: Effect of different prompt templates on accuracy.

Prompt template                                  Accuracy (%)
? [Y].                                           31.40
[X] is located in which country? In [Y]          51.08
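To illustrate what a manual prompt looks like, the toy example below wraps an input in a cloze template and maps the model's [MASK] prediction to a class label through a verbalizer; the template and label words are made up for illustration and are not drawn from any cited work:

# A toy manual prompt for sentiment classification (illustrative only).
template = "{text} It was [MASK]."
verbalizer = {"great": "positive", "terrible": "negative"}

def wrap(text):
    return template.format(text=text)

print(wrap("The movie kept me on the edge of my seat."))
# -> "The movie kept me on the edge of my seat. It was [MASK]."
# A masked language model's prediction at [MASK] is mapped to a label via the
# verbalizer; as Table 2-1 shows, small wording changes in the template can
# shift accuracy substantially.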
After discrete manual prompts, researchers proposed continuous automatic prompts, that is, freezing the parameters of the pre-trained model and fine-tuning only the continuous prompt. (Xiang Lisa Li et al., 2021) proposed that, by adding trainable prefixes, only 0.1% of the parameters need to be trained to obtain performance matching fine-tuning, which showed that GPT can perform equally well on natural language understanding tasks. (Brian Lester et al., 2021) proposed adding trainable continuous embeddings (also known as continuous prompts) to the word embeddings of the original sequence, freezing the pre-trained model parameters during training, and updating only the continuous prompts to complete downstream tasks. With the continued development of prompt-tuning, it has reached performance on par with fine-tuning. (Yuxian Gu et al., 2021) added prompts in the pre-training stage so that they are pre-trained as well, obtaining better prompt initialization for downstream tasks and achieving better performance than fine-tuning on classification tasks. (Brian Lester et al., 2021) pointed out that the effect of prompt tuning is positively correlated with the size of the pre-trained