Authors:
Alexander Smirnov 1; Nikolay Teslya 1; Nikolay Shilov 1; Diethard Frank 2; Elena Minina 2 and Martin Kovacs 2
Affiliations:
1 SPIIRAS, SPC RAS, 14th line 39, St. Petersburg, Russia; 2 Festo SE & Co. KG, Ruiter Str. 82, Esslingen, Germany
Keyword(s):
Machine Translation, DNN Translation, Comparison, Training, Transformers, Fine-tuning.
Abstract:
When processing customer feedback for an industrial company, one important task is the classification of customer inquiries. This task becomes difficult when messages may be written in any of a large number of languages. One solution is to detect the language of the text and translate it into a base language for which the classifier is developed. This paper compares open models for automatic text translation. Three models based on the Transformer architecture were selected for comparison: M2M100, mBART, and OPUS-MT (Helsinki NLP). A test data set containing texts specific to the subject area was assembled. Output from Microsoft Azure Translation was used as the reference translation. Translations produced by each model were compared with the reference using two metrics: BLEU and METEOR. The possibility of fast fine-tuning to improve translation quality for texts in the problem area was also investigated. Among the reviewed models, M2M100 produced the best translation quality, but it was also the most difficult to fine-tune.
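To make the comparison step concrete, the following is a minimal sketch of how a model's translation might be scored against a reference translation with a BLEU-style metric. It is not the authors' evaluation code. The function names `ngrams` and `simple_bleu`, the whitespace tokenization, the uniform n-gram weights, and the lack of smoothing are all assumptions made for illustration; a production evaluation would more likely rely on an established library implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing.

    Both arguments are plain strings. Tokenization is lowercasing plus
    whitespace splitting, which is an assumption made for this sketch.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: each candidate n-gram counts at most as often
        # as it appears in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical usage: the sentences below are invented for illustration
# and are not taken from the paper's test data set.
reference = "the valve is leaking air badly"   # stands in for the reference output
candidate = "the valve is leaking air"         # stands in for a model's output
score = simple_bleu(candidate, reference)
```

Averaging such a score over every text in a test set gives one number per model, which is the kind of figure a model comparison can rank. Because this sketch applies no smoothing, any candidate shorter than four tokens scores zero, so it is only suitable for sentences of reasonable length.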