erating system and should be verified. To process all these data, we decided to build a classification algorithm. This algorithm should classify incoming data according to the clusters defined by the subspace clustering method. A client’s data that deviates from the “normal” clusters describing his previous behavior may be subject to further investigation by the network FDS. Suspicion may also arise when the classification algorithm seems unable to fit newly acquired data into previously defined clusters. Likewise, data classified into clusters previously identified as associated with fraud should be forwarded to the next step of the FDS. Conversely, data classified as belonging to previously defined non-risky clusters may be considered safe. In particular, for a legitimate client whose new data matches his clustering profile, no further measures need to be taken, as the data presents close to zero risk.
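The decision rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: each cluster produced by the subspace clustering step is assumed to be summarized by a centroid, the feature indices (`dims`) spanning its subspace, and a label (`"fraud"`, `"non_risky"` or `"unlabelled"`). A new record is assigned to the nearest cluster using a root-mean-square distance over that cluster's own dimensions, and the threshold `max_dist` that decides whether the record fits any cluster at all is hypothetical. The names `Cluster`, `subspace_distance` and `classify` are likewise illustrative.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Tuple


@dataclass
class Cluster:
    """One cluster from the subspace clustering step (hypothetical representation)."""
    cid: int
    centroid: Sequence[float]  # full-length vector; only the entries in `dims` are used
    dims: Sequence[int]        # feature indices spanning this cluster's subspace
    label: str                 # assumed labels: "fraud", "non_risky" or "unlabelled"


def subspace_distance(x: Sequence[float], c: Cluster) -> float:
    """Root-mean-square distance computed only over the cluster's own dimensions."""
    sq = sum((x[d] - c.centroid[d]) ** 2 for d in c.dims)
    return math.sqrt(sq / len(c.dims))


def classify(x: Sequence[float], clusters: Sequence[Cluster],
             client_profile: Set[int], max_dist: float) -> Tuple[str, Optional[int]]:
    """Return (decision, cluster id or None) for one newly acquired record x.

    client_profile: ids of the clusters that describe this client's past behavior.
    max_dist: hypothetical threshold beyond which x is taken not to fit any cluster.
    """
    nearest = min(clusters, key=lambda c: subspace_distance(x, c))
    if subspace_distance(x, nearest) > max_dist:
        return "investigate", None      # fits no previously defined cluster
    if nearest.label == "fraud":
        return "forward", nearest.cid   # cluster previously associated with fraud
    if nearest.cid in client_profile or nearest.label == "non_risky":
        return "safe", nearest.cid      # own profile or a known non-risky cluster
    return "investigate", nearest.cid   # deviates from the client's usual clusters


# Toy clusters over three features (values are illustrative only).
clusters = [
    Cluster(0, [1.0, 0.0, 5.0], dims=[0, 1], label="non_risky"),
    Cluster(1, [9.0, 9.0, 0.0], dims=[0, 2], label="fraud"),
    Cluster(2, [0.0, 5.0, 5.0], dims=[1, 2], label="unlabelled"),
]
print(classify([1.1, 0.1, 100.0], clusters, {0}, max_dist=1.0))  # ('safe', 0)
```

Averaging the squared differences over each cluster's dimensions keeps distances comparable between clusters whose subspaces have different sizes. The order in which the rules are checked (fraud clusters first, then the client's own profile and non-risky clusters) is also an assumption.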
5 CONCLUSIONS
Fraud is a major concern for telecommunication providers in today’s competitive market. As competition tends to decrease operating margins, telecom companies try to cut costs, such as those caused by fraudsters. Because of the large amount of data generated by customer transactions, fraud detection cannot be handled by humans alone. Many methodologies have been presented in the literature. Unsupervised clustering has been used to automatically group transactions into clusters of similar records. When run at regular intervals, this tool allows a telecom operator to keep up with ever-evolving fraud dynamics.
This paper proposes a bi-level clustering algorithm to address fraud in telecommunications. In the first level, transactional records are grouped into clusters separately for each of the services provided by the telecom. Once this procedure is done, the data is aggregated per SIM card belonging to each client. As fraud, or merely suspicious behavior, may occur in only some of the services, the clustering algorithm applied at the second level should be of the subspace type. Current research is focused on the first-level clustering.
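The two levels can be illustrated with the sketch below, which covers only the first level and the per-SIM aggregation. It is an assumption-laden illustration rather than the method used in this work: plain k-means (MacQueen, 1967), seeded with the first k records of each service for reproducibility, stands in for whatever first-level clustering is actually used; records are hypothetical `(sim_id, service, features)` tuples; and each SIM card is summarized by the share of its records falling into each `(service, cluster)` pair. These per-SIM profiles are what a subspace clustering method (see the review by Parsons et al., 2004) would receive at the second level. That step is not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Record = Tuple[str, str, Point]  # (SIM id, service, numeric features), hypothetical layout


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means; centroids are seeded with the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move every non-empty cluster's centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def first_level(records: Sequence[Record],
                k_per_service: Dict[str, int]) -> List[Tuple[str, int]]:
    """Cluster the records of each service separately; return (service, cluster) per record."""
    by_service: Dict[str, List[int]] = defaultdict(list)
    for i, (_, service, _) in enumerate(records):
        by_service[service].append(i)
    assignment: Dict[int, Tuple[str, int]] = {}
    for service, idx in by_service.items():
        k = min(k_per_service[service], len(idx))
        labels = kmeans([records[i][2] for i in idx], k)
        for i, lab in zip(idx, labels):
            assignment[i] = (service, lab)
    return [assignment[i] for i in range(len(records))]


def aggregate_per_sim(records: Sequence[Record], assignment: Sequence[Tuple[str, int]]
                      ) -> Dict[str, Dict[Tuple[str, int], float]]:
    """Per SIM card, the share of its records that fell into each (service, cluster) pair."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for (sim, _, _), key in zip(records, assignment):
        counts[sim][key] += 1
    return {
        sim: {key: n / sum(c.values()) for key, n in c.items()}
        for sim, c in counts.items()
    }


# Hypothetical toy data: one numeric feature per record (e.g. a usage amount).
records: List[Record] = [
    ("sim-A", "voice", (2.0,)), ("sim-A", "voice", (3.0,)), ("sim-A", "sms", (1.0,)),
    ("sim-B", "voice", (60.0,)), ("sim-B", "voice", (55.0,)), ("sim-B", "sms", (1.0,)),
]
assignment = first_level(records, {"voice": 2, "sms": 1})
profiles = aggregate_per_sim(records, assignment)
print(profiles["sim-A"])
```

Because cluster identifiers are kept separate per service, unusual behavior confined to one service shows up only in that service's dimensions of the profile. This is the situation the text gives for using a subspace method at the second level.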
REFERENCES
Barse, E., Kvarnstrom, H., and Jonsson, E. (2003). Synthesizing test data for fraud detection systems. In Proceedings of the 19th Annual Computer Security Applications Conference (ACSAC 2003).
Berkhin, P. (2006). A survey of clustering data mining techniques. Grouping Multidimensional Data, pages 25-71.
Bishop, C. (1995). Neural networks for pattern recognition. Oxford University Press.
Cortes, C. and Pregibon, D. (2001). Signature-based methods for data streams. Data Mining and Knowledge Discovery, 5(3):167-182.
Cortes, C., Pregibon, D., and Volinsky, C. (2002). Communities of interest. Intelligent Data Analysis, 6(3):211-219.
Cortesão, L., Martins, F., Rosa, A., and Carvalho, P. (2005). Fraud management systems in telecommunications: a practical approach. In Proceedings of ICT.
Estévez, P., Held, C., and Perez, C. (2006). Subscription fraud prevention in telecommunications using fuzzy rules and neural networks. Expert Systems with Applications, 31(2):337-344.
Guha, S., Rastogi, R., and Shim, K. (2000). ROCK: A robust clustering algorithm for categorical attributes. Information Systems, 25(5):345-366.
Hand, D. (1997). Construction and assessment of classification rules, volume 15. Wiley.
Huang, Z. (1998). Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery, 2(3):283-304.
Krenker, A., Volk, M., Sedlar, U., Better, J., and Kos, A. (2009). Bidirectional artificial neural networks for mobile-phone fraud detection. ETRI Journal, 31(1):92-94.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281-297.
Moreau, Y., Preneel, B., Burge, P., Shawe-Taylor, J., Stoermann, C., and Cooke, C. (1996). Novel techniques for fraud detection in mobile telecommunication networks. In Proceedings of ACTS Mobile Telecommunications Summit, Granada, Spain.
Parsons, L., Haque, E., and Liu, H. (2004). Subspace clustering for high dimensional data: a review. ACM SIGKDD Explorations Newsletter, 6(1):90-105.
Phua, C., Alahakoon, D., and Lee, V. (2004). Minority report in fraud detection: classification of skewed data. ACM SIGKDD Explorations Newsletter, 6(1):50-59.
Takagi, H. (1991). Introduction to fuzzy systems, neural networks, and genetic algorithms. Intelligent Hybrid Systems: Fuzzy Logic, Neural Networks, and Genetic Algorithms, pages 405-468.
Viaene, S., Derrig, R., and Dedene, G. (2004). A case study of applying boosting Naive Bayes to claim fraud diagnosis. IEEE Transactions on Knowledge and Data Engineering, 16(5):612-620.
BI-LEVEL CLUSTERING IN TELECOMMUNICATION FRAUD