unknown words, but none of those papers focuses on network popular words. Fortunately, a few Chinese papers can serve as reference material.
Existing research on translating network popular words mainly uses three methods: the rule-based method, the word-vector-based method, and a method that combines the rule-based and word-vector-based methods.
Network popular words have many characteristics and fall into several categories, such as homophonic network popular words (e.g. バイバイ → bye bye), simplified network popular words (e.g. イケメン → いける man), English homophonic network popular words (e.g. トリクルダウン → trickle-down theory) and pictographic network popular words (e.g. %>_<% → crying). The rule-based method can exploit these characteristics to translate some network popular words, but it cannot translate network popular words that lack distinctive characteristics. The accuracy of the rule-based method ranges from 20% to 80% (Shang Fenfen, 2015).
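To make the idea concrete, the following is a minimal sketch of a rule-based lookup, assuming that each category above is recognized by a surface pattern (katakana-only strings, symbol-only strings) and translated through a small hand-built table. The tables, category names and the functions `candidate_categories` and `translate_by_rules` are hypothetical, seeded only with the examples from this section; they are not the rules used in the cited work.

```python
import re
from typing import List, Optional

# Hypothetical rule tables (illustration only), seeded with the examples
# from the text. A real system would need large, curated tables.
RULES = {
    "homophonic": {"バイバイ": "bye bye"},
    "simplified": {"イケメン": "いける man"},
    "english_homophonic": {"トリクルダウン": "trickle-down theory"},
    "pictographic": {"%>_<%": "crying"},
}

# Katakana Unicode block (U+30A0 to U+30FF), which includes the long-vowel mark.
KATAKANA = re.compile(r"[\u30A0-\u30FF]+")


def candidate_categories(word: str) -> List[str]:
    """Choose rule categories from the word's surface characteristics."""
    if KATAKANA.fullmatch(word):
        # Katakana-only words may be homophones, clipped forms or loanwords.
        return ["homophonic", "simplified", "english_homophonic"]
    if word and not any(ch.isalnum() for ch in word):
        # Words made only of symbols are treated as pictographs (emoticons).
        return ["pictographic"]
    # No distinctive characteristic: no rule applies.
    return []


def translate_by_rules(word: str) -> Optional[str]:
    """Return a translation if some rule matches, otherwise None."""
    for category in candidate_categories(word):
        translation = RULES[category].get(word)
        if translation is not None:
            return translation
    return None


for w in ["バイバイ", "%>_<%", "草"]:
    print(w, "->", translate_by_rules(w))
```

The last word, 草 (slang for laughter), has no distinctive surface feature under these hypothetical rules, so the sketch returns None; this mirrors the coverage limitation of the rule-based method described above.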
The word-vector-based method uses the semantic relationships between words in context to find synonyms of network popular words. It compares the word vector of a network popular word with the word vectors of other words and takes the closest word as its synonym; this synonym can then stand in for the network popular word. The accuracy of this method can reach 80% (Zhao Xinyi, 2015). Word2vec, a neural-network-based tool released by Google in 2013, is currently the most popular tool for training word vectors. To learn vector representations of words, it adopts two main model architectures: the continuous bag-of-words (CBOW) model and the continuous skip-gram model. The CBOW architecture predicts the current word from its context, whereas the skip-gram architecture uses the current word to predict the surrounding words (Jansen and Stefan, 2018).
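The difference between the two architectures can be seen in the training examples each one builds from a sentence. The sketch below is a minimal illustration assuming a fixed window size; it omits everything else Word2vec does during training (the network itself, negative sampling or hierarchical softmax, subsampling of frequent words, randomly shrunk windows), and the function names are hypothetical.

```python
from typing import List, Tuple


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: each example maps the surrounding context words to the current word."""
    examples = []
    for i, current in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            examples.append((context, current))
    return examples


def skipgram_examples(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: each example maps the current word to one surrounding word."""
    pairs = []
    for i, current in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((current, tokens[j]))
    return pairs


sentence = ["this", "movie", "is", "lit"]
print(cbow_examples(sentence, window=1))
print(skipgram_examples(sentence, window=1))
```

In practice the vectors would be trained with an existing implementation; gensim’s `Word2Vec` class, for example, selects between CBOW and skip-gram through its `sg` parameter.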
The method combining the rule-based and word-vector-based methods currently achieves the best precision in translating network popular words. Because the two methods are independent of each other, combining them yields higher accuracy, with a precision of about 85% on certain context data.
Since network popular words are mostly used on social networking services (SNS) rather than in formal papers or news, a very precise translation of them is unnecessary; it is enough to convey their rough meaning and feeling on social networks. Therefore, this paper aims to obtain the meaning of network popular words through not only semantic analysis but also sentiment analysis.
However, the two existing methods, the word-vector-based method and the combined rule-based and word-vector-based method, rely mainly on word vectors, so very precise and complete data are needed to train the word vectors. Because the training data and training tools are imperfect, the resulting word vectors are imperfect too, and some of the chosen synonyms are unsuitable. In this paper, to adjust these unsuitable synonyms and obtain higher translation precision, we use sentiment polarity analysis to supplement the word-vector-based method.
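One way such a supplement could work is sketched below under stated assumptions: candidate synonyms are ranked by the cosine similarity of their word vectors, and the highest-ranked candidate whose sentiment polarity agrees with that of the network popular word is chosen. The toy vectors, the polarity lexicon, the value of `k`, and the functions `nearest_synonyms` and `pick_synonym` are all hypothetical, and the polarity of the network popular word is assumed to come from a separate sentiment analysis step; this is an illustration, not the exact procedure of this paper.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]

# Hypothetical toy word vectors (3 dimensions instead of hundreds).
EMBEDDINGS: Dict[str, Vector] = {
    "神": [0.9, 0.4, 0.1],       # network popular word, roughly "awesome"
    "最悪": [0.85, 0.45, 0.05],  # "worst"
    "最高": [0.8, 0.35, 0.2],    # "best"
    "普通": [0.1, 0.9, 0.3],     # "ordinary"
}

# Hypothetical polarity lexicon: +1 positive, -1 negative, 0 neutral.
POLARITY: Dict[str, int] = {"最悪": -1, "最高": 1, "普通": 0}


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_synonyms(word: str, embeddings: Dict[str, Vector], k: int) -> List[str]:
    """Word-vector step: the k words whose vectors are most similar to `word`."""
    target = embeddings[word]
    scored = [(cosine(target, vec), other)
              for other, vec in embeddings.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def pick_synonym(word: str, target_polarity: int, embeddings: Dict[str, Vector],
                 polarity: Dict[str, int], k: int = 3) -> Optional[str]:
    """Sentiment step: keep the best candidate whose polarity agrees with the word's."""
    candidates = nearest_synonyms(word, embeddings, k)
    for candidate in candidates:
        if polarity.get(candidate, 0) == target_polarity:
            return candidate
    # No candidate agrees: fall back to the plain nearest neighbour.
    return candidates[0] if candidates else None


print(nearest_synonyms("神", EMBEDDINGS, k=3))
print(pick_synonym("神", target_polarity=1, embeddings=EMBEDDINGS, polarity=POLARITY))
```

With these toy vectors, the plain nearest neighbour of 神 is 最悪 (“worst”), since words of opposite polarity often appear in similar contexts and therefore get similar vectors; requiring positive polarity selects 最高 (“best”) instead.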
Sentiment analysis can be improved by dividing sentiment into different types. As Plutchik’s Wheel of Emotions shows, sentiment words can be divided not only by positive or negative polarity but also into detailed emotional types (Plutchik and Robert, 1991), such as joy, anger and sadness, which yields more precise sentiment analysis results. In addition, sentiment words of the same polarity usually differ in sentiment intensity (Wang, 2015); e.g. “laugh” has a greater intensity than “smile.”
Figure 1: Plutchik’s Wheel of Emotions.
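A lexicon entry that records both emotion type and intensity might look like the sketch below. The eight basic emotions and their three intensity levels follow Plutchik’s wheel (e.g. serenity, joy, ecstasy); the mapping from emotions to positive or negative polarity, the treatment of surprise as neutral, the `SentimentWord` class, and the intensities assigned to “smile” and “laugh” are assumptions made for illustration.

```python
from dataclasses import dataclass

# Plutchik's eight basic emotions, each listed from mild to intense
# (the three rings of the wheel).
PLUTCHIK = {
    "joy": ("serenity", "joy", "ecstasy"),
    "trust": ("acceptance", "trust", "admiration"),
    "fear": ("apprehension", "fear", "terror"),
    "surprise": ("distraction", "surprise", "amazement"),
    "sadness": ("pensiveness", "sadness", "grief"),
    "disgust": ("boredom", "disgust", "loathing"),
    "anger": ("annoyance", "anger", "rage"),
    "anticipation": ("interest", "anticipation", "vigilance"),
}

# Assumed polarity of each basic emotion (surprise treated as neutral).
EMOTION_POLARITY = {
    "joy": 1, "trust": 1, "anticipation": 1, "surprise": 0,
    "fear": -1, "sadness": -1, "disgust": -1, "anger": -1,
}


@dataclass(frozen=True)
class SentimentWord:
    word: str
    emotion: str    # one of the eight basic emotions
    intensity: int  # 1 = mild, 2 = basic, 3 = intense

    def __post_init__(self) -> None:
        if self.emotion not in PLUTCHIK or not 1 <= self.intensity <= 3:
            raise ValueError(f"invalid entry: {self}")

    @property
    def polarity(self) -> int:
        return EMOTION_POLARITY[self.emotion]

    @property
    def wheel_label(self) -> str:
        """Name of the wheel cell for this emotion and intensity."""
        return PLUTCHIK[self.emotion][self.intensity - 1]


# Hypothetical entries: same emotion and polarity, different intensity.
smile = SentimentWord("smile", "joy", 1)
laugh = SentimentWord("laugh", "joy", 2)
print(smile.wheel_label, laugh.wheel_label)  # serenity joy
print(laugh.intensity > smile.intensity)     # True
```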
There are three main types of methods for identifying the polarity of Chinese sentiment words: the thesaurus-based method, the corpus-based method and the morpheme-based method.
The thesaurus-based method computes the similarity distance between reference words and a given sentiment word in a thesaurus. It acquires sentiment words mainly through synonyms, antonyms and hierarchies in thesauri such as WordNet and HowNet (Kim and Soo-min, 2004). This method