Authors:
Motonobu Yoshida 1; Kazuyuki Matsumoto 2; Minoru Yoshida 2 and Kenji Kita 2
Affiliations:
1 Tokushima University, 2-1, Minami-jousanjima-cho, Tokushima-shi, Tokushima, Japan;
2 Tokushima University, Graduate School of Technology, Industrial and Social Sciences, 2-1, Minami-jousanjima-cho, Tokushima-shi, Tokushima, Japan
Keyword(s):
Toxic Expression, BERT, Classification, Text Correction, Flame War.
Abstract:
This paper describes a system for converting social media posts containing toxic expressions, such as slander and libel, into less toxic sentences. In recent years, the number of social media users has been increasing, as have cases of online flame wars. To prevent flaming, we first use a prediction model based on Bidirectional Encoder Representations from Transformers (BERT) to determine whether a sentence is likely to be flamed before it is posted. The highest classification accuracy was 82%, achieved with the Japanese Spoken Language Field Adaptive BERT Model (Japanese Spoken BERT model) as the pre-trained model. For sentences judged to be toxic, we then propose a system that uses BERT's masked word prediction to convert toxic expressions into safe expressions, thereby producing sentences with mitigated aggression. In addition, BERTScore is used to quantify how much the meaning of the converted sentence has changed compared to the original sentence and to evaluate whether the modified sentence is safe while preserving the meaning of the original.
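The three-stage pipeline the abstract describes (classify a post, mask and re-predict the toxic expression, then score meaning preservation) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: "your-finetuned-toxicity-bert" is a placeholder for a BERT classifier fine-tuned on flame/non-flame data, cl-tohoku/bert-base-japanese stands in for the Japanese Spoken BERT model (not assumed to be public), the toxic expression is supplied by hand rather than detected, and label index 1 is assumed to mean "toxic".

```python
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    pipeline,
)
from bert_score import score as bertscore

# (1) Toxicity classifier. The checkpoint name is a placeholder for a
# BERT model fine-tuned on flame/non-flame labels as in the paper.
CLF_NAME = "your-finetuned-toxicity-bert"  # hypothetical checkpoint
clf_tok = AutoTokenizer.from_pretrained(CLF_NAME)
clf = AutoModelForSequenceClassification.from_pretrained(CLF_NAME)

def is_toxic(text: str, threshold: float = 0.5) -> bool:
    """Return True if the classifier judges the post likely to be flamed."""
    inputs = clf_tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = clf(**inputs).logits.softmax(dim=-1)
    return probs[0, 1].item() > threshold  # assumes label 1 = toxic

# (2) Masked word prediction with a general Japanese BERT as a stand-in
# for the Japanese Spoken BERT model used in the paper.
fill = pipeline("fill-mask", model="cl-tohoku/bert-base-japanese")

def soften(text: str, toxic_word: str, top_k: int = 5) -> list[str]:
    """Mask one toxic expression and return BERT's top-k rewrites."""
    masked = text.replace(toxic_word, fill.tokenizer.mask_token, 1)
    return [p["sequence"] for p in fill(masked, top_k=top_k)]

# (3) BERTScore between the original post and each candidate rewrite;
# keep the candidate that is judged safe and best preserves the meaning.
def best_rewrite(original: str, candidates: list[str]) -> str:
    safe = [c for c in candidates if not is_toxic(c)] or candidates
    _, _, f1 = bertscore(safe, [original] * len(safe), lang="ja")
    return safe[int(f1.argmax())]
```

In this sketch the BERTScore F1 plays the role described in the abstract: it quantifies semantic drift between the original and converted sentences, so that among the safe candidates the one closest in meaning to the original is kept.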