SAURIDA: CLOUD COMPUTING BASED DATA MINING SYSTEM IN TELECOMMUNICATION INDUSTRY
Qing Ke, Bin Wu, Yuxiao Dong, Lei Qin
School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China
Keywords: Cloud Computing, Data Mining, Data Flow, Telecommunication.
Abstract: Telecommunication data mining has often been used as a background application to motivate many technical problems in data mining research. However, traditional mining algorithms face new challenges: the tremendous amount of data, and the high time and space complexity of the algorithms themselves. Recently, the Map-Reduce parallel computing model has emerged. In this paper, we combine data mining with Map-Reduce based cloud computing to meet these challenges and present our applied system, Saurida. As a full-functionality system, Saurida provides data-flow-oriented preprocessing utilities that achieve almost linear speedup and extensively support user-defined functions, as well as many data mining algorithms. More importantly, we describe several application scenarios drawn from real-world requirements of the telecom industry, using a large volume of data obtained from a telecom operator, and we validate that our system has good scalability, effectiveness and efficiency.
1 INTRODUCTION
Telecommunication data analysis has stimulated great interest in recent years. Typical application scenarios are customer churn prediction and customer relationship management.
However, these analysis methods face new
challenges. Firstly, the telecom industry generates
and stores a tremendous amount of data. Secondly,
many data mining algorithms have high time and
space complexity.
Traditional business solutions for data mining are commercial database or data warehouse systems and commercial data mining tools. However, these systems and tools have low scalability and high cost. In research, Wang et al. (Wang, 2009) developed a working data mining system on real mobile communication data, but the system mainly focused on research algorithms such as sequential pattern mining and community detection.
Recently, the Map-Reduce (Dean, 2004) computational model and its open-source implementation, Hadoop, have been widely applied in both research and industry. The model is based on shared-nothing parallelism, and its storage system is designed for scalability. These properties make it well suited to telecom data mining.
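To make the model concrete, the following is a minimal, single-process Python sketch of the Map-Reduce pattern. It is neither Hadoop's API nor Saurida's code; it assumes a hypothetical call-record format of (caller, callee, duration in seconds) and computes the total call duration per caller. The map function looks at one record at a time and shares no state, a shuffle step groups intermediate values by key, and the reduce function aggregates each group.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical call detail record: (caller_id, callee_id, duration_seconds)
Record = tuple[str, str, int]


def map_phase(record: Record) -> Iterator[tuple[str, int]]:
    """Emit (caller, duration) for a single record; touches no shared state."""
    caller, _callee, duration = record
    yield caller, duration


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Aggregate all values that share one key."""
    return key, sum(values)


def run_job(partitions: list[list[Record]]) -> dict[str, int]:
    # Each partition is mapped independently; on a cluster these would be
    # separate map tasks on different nodes, here they run in sequence.
    intermediate = [
        pair for part in partitions for rec in part for pair in map_phase(rec)
    ]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


partitions = [
    [("alice", "bob", 60), ("carol", "alice", 30)],
    [("alice", "dave", 120)],
]
print(run_job(partitions))  # {'alice': 180, 'carol': 30}
```

Because map_phase never depends on other records, the partitions could be processed on different machines without coordination until the shuffle, which is the property that makes the model attractive for large telecom data sets.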
In this paper, we combine data mining with Map-Reduce based cloud computing to meet these challenges and introduce our applied system, Saurida. The system uses a distributed cluster as its hardware infrastructure and the Hadoop distributed computing platform as its fundamental software. As a full-functionality system, it provides data preprocessing utilities and data mining algorithms.
More importantly, we describe several application scenarios drawn from real-world requirements of the telecom industry, using a large volume of data obtained from a telecom operator, and we validate our system in terms of scalability, effectiveness and efficiency. In summary, Saurida addresses the following challenges, which also constitute the contributions of this work:
• Data-flow-oriented preprocessing with almost linear speedup.
• Extensive support for user-defined functions.
• Nearly linear speedup of data mining algorithms.
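The idea of data-flow-oriented preprocessing with user-defined functions can be illustrated with a small Python sketch. The Pipeline class and its filter, udf, run and run_partitioned methods are hypothetical names chosen for illustration, not Saurida's actual interface: records stream through a chain of steps, and users plug in their own per-record functions.

```python
from typing import Any, Callable, Iterable, Iterator

Record = dict[str, Any]
Step = Callable[[Iterator[Record]], Iterator[Record]]


class Pipeline:
    """Hypothetical data-flow pipeline: records stream through a chain of steps."""

    def __init__(self) -> None:
        self._steps: list[Step] = []

    def filter(self, predicate: Callable[[Record], bool]) -> "Pipeline":
        # Keep only records for which the predicate returns True.
        self._steps.append(lambda recs: (r for r in recs if predicate(r)))
        return self

    def udf(self, fn: Callable[[Record], Record]) -> "Pipeline":
        # Apply a user-defined function to every record.
        self._steps.append(lambda recs: (fn(r) for r in recs))
        return self

    def run(self, records: Iterable[Record]) -> list[Record]:
        # Chain the steps lazily, then materialise the final stream.
        stream: Iterator[Record] = iter(records)
        for step in self._steps:
            stream = step(stream)
        return list(stream)

    def run_partitioned(self, partitions: list[list[Record]]) -> list[Record]:
        # Every step is record-local, so each partition could be handled by a
        # separate worker; here the partitions are simply processed in order.
        out: list[Record] = []
        for part in partitions:
            out.extend(self.run(part))
        return out


def to_minutes(r: Record) -> Record:
    """Example user-defined function: add call duration in minutes."""
    return {**r, "minutes": r["duration"] / 60}


pipe = Pipeline().filter(lambda r: r["duration"] > 0).udf(to_minutes)
print(pipe.run([{"caller": "alice", "duration": 120}, {"caller": "bob", "duration": 0}]))
# [{'caller': 'alice', 'duration': 120, 'minutes': 2.0}]
```

Since no step depends on other records, the work divides cleanly across partitions. Speedup on p nodes is commonly measured as S_p = T_1 / T_p, where T_p is the running time on p nodes, and "linear speedup" means S_p stays close to p as nodes are added.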
Ke Q., Wu B., Dong Y. and Qin L.
SAURIDA: CLOUD COMPUTING BASED - Data Mining System in Telecommunication Industry.
DOI: 10.5220/0003387905160519
In Proceedings of the 1st International Conference on Cloud Computing and Services Science (CLOSER-2011), pages 516-519
ISBN: 978-989-8425-52-2
Copyright © 2011 SCITEPRESS (Science and Technology Publications, Lda.)