to the high efficiency of the pre-trained model (Radford et al., 2019) and the successful application of the fine-tuning procedure.
Due to a lack of computational resources we could not fine-tune the 1.5B-parameter GPT-2 model, even though this model has performed better than the smaller ones in a variety of scenarios (Radford et al., 2019). Using this model could potentially increase the formality and meaning preservation of the transfers; the approximate scores would have been 0.85 for formality and 0.90 for meaning preservation.
The histograms depicted in Figures 4 and 5 present a left-skewed distribution, which means that most of the formality and meaning preservation scores fall at the high end of the range.
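As a small illustration of this observation, the skewness of a score distribution can be checked numerically; the minimal Python sketch below uses a hypothetical placeholder array rather than our actual scores.

import numpy as np
from scipy.stats import skew

# Hypothetical placeholder scores, not our measured data.
scores = np.array([0.95, 0.92, 0.88, 0.97, 0.60, 0.85, 0.90, 0.99])
print(skew(scores))  # a negative value indicates a left-skewed distribution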
6 CONCLUSIONS
This research has established that successful informal-to-formal style transfer, with high scores in formality and meaning preservation, can be accomplished by fine-tuning a pre-trained Transformer model such as GPT-2 (Radford et al., 2019), whose original task is to handle sequential input data, on a parallel corpus and connecting it with meaning preservation and formality modules.
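As a rough, non-authoritative sketch of this fine-tuning step, assuming a Hugging Face Transformers setup, the following Python snippet trains GPT-2 with a causal language-modelling objective on an informal/formal pair. The example pair, the [SEP] separator, and the hyperparameters are illustrative assumptions, not the exact configuration used in this work.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# One hypothetical GYAFC-style pair; a real run iterates over ~110k pairs.
pair = "i dunno what ur talking about [SEP] I do not know what you are talking about."
inputs = tokenizer(pair, return_tensors="pt")

model.train()
outputs = model(**inputs, labels=inputs["input_ids"])  # shifted causal LM loss
outputs.loss.backward()  # one gradient step on this single pair
optimizer.step()
optimizer.zero_grad()

In such a setup, the fine-tuned model would then be given an informal sentence followed by the separator and asked to generate the formal continuation.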
The amount of available training data directly impacted the fine-tuning procedure applied to the pre-trained GPT-2 model (Radford et al., 2019): the GYAFC parallel corpus (Rao and Tetreault, 2018), with its 110k informal/formal sentence pairs, was enough, by a small margin, to produce the desired results. If the parallel corpus used for fine-tuning had been smaller, our final results would have been strictly inferior in both meaning preservation and formality scores.
The use of Transformers is recommended for Natural Language Processing, especially in style transfer tasks, due to their attention mechanisms, which weight the influence of different parts of the input data. A different allocation of the fully connected layers could potentially decrease the computational time required for the style transfer, which would consequently reduce the time resources needed. Using the largest one-shot model, such as the 1.5B pre-trained GPT-2 model (Radford et al., 2019), or a few-shot learning model, such as GPT-3 (Brown et al., 2020), would potentially outperform our setup in all steps of the style transfer process and generate better results.
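To make the role of the attention mechanism concrete, the following minimal Python sketch implements scaled dot-product attention, the operation that weights the influence of each input position; shapes and values are illustrative only.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over input positions
    return weights @ V                              # weighted sum of value vectors

# Toy example: 3 tokens with 4-dimensional representations.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)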
Our style transfer approach might also be used for question answering in HRI (Burga-Gutierrez et al., 2020), or extended further by using softness to tune the meaning preservation metrics (Ugarte et al., 2015).
ACKNOWLEDGMENT
We would like to warmly thank Joel Tetreault, co-author of the paper "Dear Sir or Madam, May I Introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer", for providing us with the parallel corpus Grammarly's Yahoo Answers Formality Corpus (GYAFC), which is based on the L6 corpus of Yahoo Answers.
REFERENCES
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.,
Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G.,
Askell, A., Agarwal, S., Herbert-Voss, A., Krueger,
G., Henighan, T., Child, R., Ramesh, A., Ziegler,
D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler,
E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner,
C., McCandlish, S., Radford, A., Sutskever, I., and
Amodei, D. (2020). Language models are few-shot
learners. In NeurIPS.
Burga-Gutierrez, E., Vasquez-Chauca, B., and Ugarte, W.
(2020). Comparative analysis of question answering
models for HRI tasks with NAO in spanish. In SIM-
Big, pages 3–17. Springer.
Cer, D., Yang, Y., Kong, S., Hua, N., Limtiaco, N., John,
R. S., Constant, N., Guajardo-Cespedes, M., Yuan, S.,
Tar, C., Strope, B., and Kurzweil, R. (2018). Universal
sentence encoder for english. In EMNLP (Demonstra-
tion), pages 169–174. Association for Computational
Linguistics.
Chen, X., Zhang, M., and Zhu, K. Q. (2019). Align-
ing sentences between comparable texts of different
styles. In JIST (2), volume 1157 of Communications
in Computer and Information Science, pages 51–64.
Springer.
Gong, H., Bhat, S., Wu, L., Xiong, J., and Hwu, W. W.
(2019). Reinforcement learning based text style trans-
fer without parallel training corpus. In NAACL-HLT
(1), pages 3168–3180. Association for Computational
Linguistics.
Heilman, M., Cahill, A., Madnani, N., Lopez, M., Mul-
holland, M., and Tetreault, J. R. (2014). Predicting
grammaticality on an ordinal scale. In ACL (2), pages
174–180. The Association for Computer Linguistics.
Hoang, L., Wiseman, S., and Rush, A. M. (2018). Entity
tracking improves cloze-style reading comprehension.
In EMNLP, pages 1049–1055. Association for Com-
putational Linguistics.
John, V., Mou, L., Bahuleyan, H., and Vechtomova, O.
(2019). Disentangled representation learning for non-