solutions to this issue and to a more general case of
differentiating ‘meanings’ attached to text. We have
tested LSA (Latent Semantic Analysis), mentioned
before as a pre-processing method for other classifiers
but also usable as a classifier on its own, and
LDA (Latent Dirichlet Allocation). LSA is a
bag-of-words model that represents word co-occurrences,
meaning the structure within the documents is not
maintained. LDA, on the other hand, can be seen as
a mixture of topics that emits words with certain
probabilities, so when applied to a set of documents and
a given number of topics, it outputs a topic representation
for each document.
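The two ideas in this paragraph can be illustrated with a minimal sketch. It is not the models used in this work: the topic names, words, and probabilities are hypothetical, and the sampling routine only mimics the generative view of LDA (real LDA also draws each document's topic mixture from a Dirichlet prior and is fitted by inference, which is omitted here). The first part shows that a bag-of-words representation discards word order; the second shows a topic mixture emitting words with given probabilities.

```python
import random
from collections import Counter


def bag_of_words(text):
    """Count word occurrences; the order of the words is discarded."""
    return Counter(text.lower().split())


# Two messages with the same words in a different order give the same bag.
a = bag_of_words("call your bank now")
b = bag_of_words("now call your bank")
same_bag = (a == b)

# Hypothetical topics: each one is a probability distribution over words.
TOPICS = {
    "bank_scam": {"bank": 0.4, "account": 0.3, "verify": 0.2, "call": 0.1},
    "prize": {"win": 0.5, "prize": 0.3, "free": 0.2},
}


def generate_document(topic_mixture, length, rng):
    """Sample words from a mixture of topics: pick a topic according to
    the document's mixture, then pick a word according to that topic."""
    topic_names = list(topic_mixture)
    topic_weights = [topic_mixture[name] for name in topic_names]
    words = []
    for _ in range(length):
        topic = rng.choices(topic_names, weights=topic_weights)[0]
        vocab = list(TOPICS[topic])
        probs = [TOPICS[topic][word] for word in vocab]
        words.append(rng.choices(vocab, weights=probs)[0])
    return words


rng = random.Random(0)  # fixed seed so repeated runs give the same sample
doc = generate_document({"bank_scam": 0.8, "prize": 0.2}, length=8, rng=rng)
print(same_bag, doc)
```

Fitting LDA runs this process in reverse: given only the documents, it estimates the topics and the per-document mixtures.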
Models have successfully been populated with
representatives of important campaigns, including the
bank scam mentioned earlier, to be used in blocking.
If the cosine similarity between an incoming message
and a campaign representative is higher than a certain
threshold, the message is treated as actual spam, as
opposed to a message that merely includes the CTA.
The higher the threshold, the fewer legitimate messages
are matched, so it can be tuned to avoid false positives.
The case of forwarding, however, is a lost cause and
would still be blocked.
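The thresholding step can be sketched as follows, using cosine similarity, where a higher score means a closer match. This is an illustration under stated assumptions: messages are compared as raw word-count vectors instead of LSA or LDA representations, and the campaign text, the threshold value, and the function names are all hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a message as word counts (a stand-in for a topic vector)."""
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty message matches nothing
    return dot / (norm_u * norm_v)


# Hypothetical campaign representative and threshold, for illustration only.
CAMPAIGNS = [vectorize("your bank account is locked call 555 0100 to verify")]
THRESHOLD = 0.8


def is_campaign_spam(message, threshold=THRESHOLD):
    """Flag a message that is close enough to any known campaign."""
    vec = vectorize(message)
    return any(cosine_similarity(vec, rep) >= threshold for rep in CAMPAIGNS)


print(is_campaign_spam("your bank account is locked call 555 0100 to verify now"))
print(is_campaign_spam("please call your bank"))
```

Raising `THRESHOLD` makes the filter match only near-copies of a campaign, which lowers false positives at the cost of missing more heavily reworded variants.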
Other important challenges ahead in mobile messaging
abuse are the bot-driven campaigns mentioned
earlier, whether originating from ordinary phone
numbers belonging to spammers or from infected mobiles.
During the analyzed period, we saw an instance of a
campaign distributing malware, which was a Trojan
SMS Agent/Opfake, representing a variant of a
continually evolving infection typically used to send text
messages from infected mobile devices to premium
rate numbers. This malicious application creates a
mobile botnet by sending malicious links to numbers
in the contact list via SMS. Analysis of
command-and-control (C&C) activities revealed a wide geographic
spread within a short period of time, from the initial infected
devices in Egypt reporting to the C&C server, to tens of
thousands of SMS messages sent in the US, South Korea,
India, and many other countries. This highlights
the importance of defenses at the client side, as well
as preventing malicious messaging, whether internal
to operators or across borders.
4 CONCLUSION
Although we have not seen an increase in volume, we
have come across a relatively high level of sophistica-
tion in the SMS spam world. The mobile ecosystem
is also undergoing major developments, with an increase
in the market share of smartphones and the
wider adoption of IP messaging over text messaging,
but this is not necessarily taking the heat off mobile
network operators. We have seen evidence that spam-
mers are using multiple delivery channels and are by
no means abandoning SMS messages just yet. Unlim-
ited text plans and the trusted nature of text messages
will always attract attackers and make it necessary for
operators to deploy effective defenses. This paper has
described the SMS spam ecosystem and covered some of
the most effective countermeasures against a wide range
of SMS spam, along with the trends and challenges
ahead.
SECRYPT 2014 - International Conference on Security and Cryptography