resources including parallel corpora, the web, and
comparable corpora. The combination of transliteration
generation and mining is known as the hybrid or
fusion approach. In this study, transliteration
generation was applied. (Prabhakar & Pal, 2018)
In (Mahata et al., 2019), English segments have
been translated using a character-based
sequence-to-sequence (seq2seq) model with an
attention mechanism, and back-transliteration of
Romanized Bengali segments to Devanagari has been
performed using a character-based seq2seq model.
However, the model has shown a testing accuracy of
48.2%, which is comparatively low. Furthermore,
a language model has been built to correct the
grammatical errors that arise when combining
translated and back-transliterated subsequences of
text. Similarly, a
dictionary lookup-based approach for transliteration
of Gujarati text written in Roman script to the native
script has been proposed. The variations that are not
contained in the dictionary have been handled using
the Google Translation API (Patel & Parikh, 2020). As
Sinhala is a low-resource language, it is difficult to
find or develop corpora large enough to obtain
good accuracy for language identification using
this approach. Hence, this method is not readily
applicable in the Sinhala-English context.
A few studies on Sinhala-English code-mixed text
have been conducted in recent years, but their
number is limited. In (Smith & Thayasivam, 2019) and
(Shanmugalingam & Sumathipala, 2019), two
different approaches for language identification of
Sinhala-English code-mixed text have been
introduced. Sentence-level and word-level annotation
have been performed in (Smith & Thayasivam, 2019).
The highest language identification accuracy, 92.1%,
has been obtained by the XGB model with bigram
features, surpassing the Support Vector Machine
(SVM) and neural network models. Similarly,
Shanmugalingam and Sumathipala have introduced
another approach for word-level language
classification using machine learning models, in which
the Random Forest (RF) classifier has achieved an
accuracy of 90.5% (Shanmugalingam &
Sumathipala, 2019). Nevertheless, almost all the
existing research on language identification of
Singlish texts has targeted detecting Romanized
Sinhala text mixed with English, and none of it has
focused on identifying Singlish text mixed with
both pure English words and Sinhala words written
in Sinhala Unicode.
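The gap described above involves two distinct sub-problems: separating tokens already written in Sinhala Unicode from Latin-script tokens, and then deciding whether each Latin-script token is English or Romanized Sinhala. The following is a minimal sketch of the first step together with a character-bigram feature extractor of the kind a word-level classifier could consume. It is not the method of any cited study; the function names and the boundary markers `^` and `$` are illustrative assumptions. The only external fact relied on is that the Unicode Sinhala block spans U+0D80 to U+0DFF.

```python
from collections import Counter

# Unicode Sinhala block: U+0D80 .. U+0DFF (inclusive).
SINHALA_BLOCK = range(0x0D80, 0x0E00)


def is_sinhala_unicode(token: str) -> bool:
    """Return True if any character of the token lies in the Sinhala block."""
    return any(ord(ch) in SINHALA_BLOCK for ch in token)


def char_bigrams(token: str) -> Counter:
    """Count character bigrams of a lowercased token.

    '^' and '$' are hypothetical start/end markers so that word-initial
    and word-final letters also contribute features.
    """
    padded = f"^{token.lower()}$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def split_by_script(sentence: str):
    """Split whitespace tokens into (Sinhala-Unicode tokens, Latin-script tokens).

    Only the Latin-script tokens would still need an English vs. Romanized
    Sinhala decision from a trained classifier.
    """
    native, latin = [], []
    for token in sentence.split():
        (native if is_sinhala_unicode(token) else latin).append(token)
    return native, latin
```

In such a pipeline, the bigram counts of each Latin-script token would be passed to a trained classifier (for example, one of the model families compared in the studies above), while tokens in Sinhala Unicode can be labeled directly from their code points.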
Different approaches have been proposed for
transliteration of Singlish to Sinhala in recent years.
Liwera and Ranathunga have conducted a study
applying a combination of trigram and rule-based
approaches that achieved an accuracy of 77% (Liwera &
Ranathunga, 2021). In addition, a study on
sentiment analysis of Sinhala-English code-mixed
text has been carried out in which transliteration of
Singlish to Sinhala was performed using a Singlish to
Sinhala dictionary. The Singlish words identified in
the comments have been replaced by the corresponding
Sinhala words in the created dictionary. This approach
has achieved an accuracy of 72% (Sentimental Analysis of
Comments in Social Media in Sinhala - English
Code-Mixed Language Using Supervised Learning
Techniques, 2020). However, in the rule-based
approach, the mapping of English characters to Sinhala
Unicode has to be performed manually, which requires
a good understanding and knowledge of both
languages. As there are multiple possible matches, the
manually selected pair may not be the best possible
candidate. De Silva has utilized an encoder-decoder
LSTM model for Singlish to Sinhala transliteration
with a parallel corpus of 6000 Singlish words, but
the model achieved a comparatively low accuracy of
40% (de Silva, 2019).
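To make the dictionary-based approach and its main limitation concrete, the following is a minimal sketch of word-level dictionary lookup with an out-of-vocabulary list and a simple word-accuracy measure. It does not reproduce any cited system: the three-entry dictionary, the function names, and the choice to leave unknown words unchanged are all assumptions made for illustration.

```python
# Hypothetical toy dictionary; a practical system would need far more
# entries, including the many spelling variants of each Singlish word.
SINGLISH_TO_SINHALA = {
    "mama": "මම",
    "api": "අපි",
    "oya": "ඔයා",
}


def transliterate(sentence: str, dictionary=SINGLISH_TO_SINHALA):
    """Replace each known Singlish token with its Sinhala entry.

    Returns the transliterated sentence and the list of tokens that were
    not found in the dictionary (left unchanged in the output).
    """
    output, out_of_vocabulary = [], []
    for token in sentence.split():
        key = token.lower()
        if key in dictionary:
            output.append(dictionary[key])
        else:
            output.append(token)
            out_of_vocabulary.append(token)
    return " ".join(output), out_of_vocabulary


def word_accuracy(predicted: str, reference: str) -> float:
    """Fraction of aligned tokens that match exactly.

    Assumes both sentences have the same number of whitespace tokens;
    extra tokens on either side are ignored by zip().
    """
    pairs = list(zip(predicted.split(), reference.split()))
    if not pairs:
        return 0.0
    return sum(p == r for p, r in pairs) / len(pairs)
```

Any word or spelling variant missing from the dictionary passes through untransliterated, so accuracy is bounded by dictionary coverage. A fallback such as a rule-based or learned model would be needed for the out-of-vocabulary tokens.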
2.2 Recommendation System
The quality of a social media recommender
system determines how useful it is for users’ activity. It has
been stated that an ensemble model developed by
merging the K-nearest neighbor and Naïve Bayes
classifiers can be used to achieve a majority rating
for a recommender system. In addition, fuzzy c-means
and SVM have been used in this work to evaluate the
overall average accuracy (Fayyaz et al., 2020).
Furthermore, in social media networks,
recommendation models should be able to recognize
users’ dynamic information. An algorithm that uses
time factors, social relationships, response
information, and geolocation has been suggested, and
it is capable of mitigating the drawbacks of the
traditional collaborative filtering (CF) algorithm.
The proposed algorithm recognizes users’ behavioral
patterns in order to recommend content in social
media networks while maintaining a high level of
efficiency (Cheng et al., 2015).
Moreover, in (Amato et al., 2019), it has been
claimed that most of the latest-generation
recommendation systems are built from
pre-filtering modules, and a user-centered
approach for recommendations has been advocated. The
Root Mean Square Error (RMSE) and Mean Absolute Error
(MAE) of the proposed model were 0.94 and 1.6,
respectively.
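For reference, both error measures compare predicted ratings with actual ratings over n items: RMSE is the square root of the mean squared difference, and MAE is the mean absolute difference. The sketch below computes both using the standard definitions; the function names and the sample ratings in the usage note are illustrative assumptions and are not taken from the cited work.

```python
import math


def _check(actual, predicted):
    """Validate that both rating lists are non-empty and equally long."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(actual, predicted) -> float:
    """Root Mean Square Error: sqrt(mean((actual - predicted)^2))."""
    _check(actual, predicted)
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mae(actual, predicted) -> float:
    """Mean Absolute Error: mean(|actual - predicted|)."""
    _check(actual, predicted)
    absolute = sum(abs(a - p) for a, p in zip(actual, predicted))
    return absolute / len(actual)
```

For example, with hypothetical actual ratings `[4, 3, 5, 1]` and predictions `[3, 3, 4, 3]`, the errors are 1, 0, 1, and -2, giving an MAE of 1.0 and an RMSE of sqrt(1.5), roughly 1.22. Because squaring gives large errors more weight, MAE never exceeds RMSE when both are computed on the same set of predictions, which is worth keeping in mind when comparing reported values.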