Authors:
Alaa Mohasseb¹; Benjamin Aziz¹ and Andreas Kanavos²
Affiliations:
¹ School of Computing, University of Portsmouth, Portsmouth, U.K.;
² Computer Engineering and Informatics Department, University of Patras, Patras, Greece
Keyword(s):
Information Retrieval, Spam (SMS) Detection, Risk Assessment, Class Imbalance, Machine Learning.
Abstract:
Short Message Service (SMS) is one of the most widely used communication media. It has become an integral part of people’s lives and, like other communication media, SMS has been used to propagate spam messages. Although a broad range of spam detection techniques have been proposed to reduce the frequency of such incidents, many difficulties remain due to text ambiguity: the same words can appear in seemingly similar texts, which makes spam messages harder to identify. In this paper, we propose an approach for identifying and classifying spam SMS based on the syntactical features and patterns of the message. The proposed approach consists of four main parts, namely SMS Pre-processing, Syntactical Features Extraction and Pattern Formulation, Classification, and Risk Analysis. Experimental results show that the proposed approach achieves a good level of accuracy. In addition, to show the effect of handling class imbalance on classification performance, two additional experiments were conducted using the SMOTE algorithm. The results show that handling class imbalance helps improve identification and classification accuracy. Furthermore, based on these findings, a risk model is proposed that addresses the risk probability and the impact of spam SMS.
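SMOTE (Synthetic Minority Over-sampling Technique) balances a dataset by creating new minority-class examples instead of duplicating existing ones: each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. The abstract does not say how SMOTE was configured, so the following is only a minimal, stdlib-only Python sketch of the basic algorithm. It assumes numeric feature vectors (for instance, hypothetical counts of syntactical features per message), and the function name `smote` and its parameters `n_synthetic`, `k` and `seed` are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Hypothetical helper: generate n_synthetic minority samples with basic SMOTE.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples

    # Precompute the indices of the k nearest minority neighbours of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _euclidean(x, minority[j]))
        neighbours.append(others[:k])

    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))          # pick a minority sample
        nb = minority[rng.choice(neighbours[i])]  # pick one of its neighbours
        gap = rng.random()                        # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi)
                          for xi, ni in zip(minority[i], nb)])
    return synthetic


# Hypothetical syntactical-feature vectors for the minority (spam) class.
spam = [[3.0, 1.0, 4.0], [2.0, 2.0, 5.0], [4.0, 1.0, 6.0]]
ham_count = 8  # assumed size of the majority (ham) class
new_spam = smote(spam, n_synthetic=ham_count - len(spam), k=2)
print(len(spam) + len(new_spam))  # 8: both classes now have 8 samples
```

In practice, a maintained implementation such as the `SMOTE` class of the imbalanced-learn library (used through its `fit_resample` method) is usually preferable to a hand-written version. Oversampling is normally applied to the training split only, so that synthetic messages do not leak into the evaluation data.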