that, machine learning techniques have been used, as
in (Yao and Jia, 2019), where a multi-agent
Q-learning algorithm was developed to solve the
formulated anti-jamming Markov game. Similarly,
in (Kosmanos et al., 2018), the authors proposed a
detection framework that combines two supervised
machine learning methods, K-Nearest Neighbors (KNN)
and Random Forests (RF), with a metric based on the
variations of the relative speed (VRS) between the
target and the jammer. Another example is k-means
(Pang et al., 2017), which is used to predict the
number of jamming attackers and to ensure the preset
functions of the VANET. However, the common issue with these
works is that they use the whole data when applying
Big Data techniques. Yet the datasets gathered by
ubiquitous smart IoT sensors are steadily growing in
size. This means that manipulating the whole data
might increase the computational cost and time of
data processing exponentially. Thus, our proposed
solution could address these problems by turning large
data into very small yet representative data. Further,
it could support efficient manipulation of data in
real time as well as the scalability of outcomes. As a
result, the advantages of coreset could play an
essential role in the success of transport systems
that depend on the efficient integration,
representation, and management of data.
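
As a rough illustration of how large data can be turned into a small yet representative weighted sample, the following sketch uses a lightweight-coreset style of importance sampling aimed at k-means-type objectives. It is an assumed stand-in and not necessarily the construction used in this paper; the function name `lightweight_coreset`, the parameters `m` and `seed`, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def lightweight_coreset(points, m, seed=0):
    """Sample m weighted points from an (n, d) array (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]

    # Squared Euclidean distance of every point to the data mean.
    sq_dist = np.sum((points - points.mean(axis=0)) ** 2, axis=1)
    total = sq_dist.sum()

    # Sampling distribution: half uniform, half proportional to the
    # squared distance to the mean (falls back to uniform if all
    # points coincide).
    if total == 0:
        q = np.full(n, 1.0 / n)
    else:
        q = 0.5 / n + 0.5 * sq_dist / total

    # Draw m indices with replacement and weight each sample by the
    # inverse of its expected number of draws.
    idx = rng.choice(n, size=m, replace=True, p=q)
    weights = 1.0 / (m * q[idx])
    return points[idx], weights


# Hypothetical usage on synthetic 2-D data standing in for sensor readings.
data = np.random.default_rng(42).normal(size=(10_000, 2))
subset, weights = lightweight_coreset(data, m=200)
```

Each sampled point carries a weight, so weighted sums over the subset approximate the corresponding sums over the full data. Any downstream clustering step would need to take these weights into account.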
6 CONCLUSIONS
In this paper, we have proposed a sampling technique,
coreset, for Big Data. The coreset can extract the key
features of Big Data while reducing it to a manageable
scale. In addition, we have proposed a few improvement
techniques for coreset. Based on the coreset technique,
we have proposed a possible Big Data application in
the context of Smart City. Since Smart Cities change
and evolve quickly, new applications, especially ones
involving Big Data, are frequently proposed. To test
the feasibility of such proposed applications
efficiently, we envision that the coreset technique
can be used to build prototypes for Big Data
applications in Smart Cities. As future work, we plan
to apply the coreset technique in real-world Smart
City applications and evaluate how much effort and
time can be saved by using it.
ACKNOWLEDGEMENTS
This research is funded by Vietnam National
University Ho Chi Minh City (VNU-HCM) under grant
number C2019-20-13. The work was also supported by the
European Regional Development Fund Project CERIT
Scientific Cloud (No. CZ.02.1.01/0.0/0.0/16_013/0001802).
Access to the CERIT-SC computing and storage
facilities provided by the CERIT-SC Center, under the
“Projects of Large Research, Development, and
Innovations Infrastructures” programme (CERIT
Scientific Cloud LM2015085), is greatly appreciated.
REFERENCES
Agarwal, P. K., Har-Peled, S., and Varadarajan, K. R.
(2005). Geometric approximation via coresets. In
Combinatorial and Computational Geometry, MSRI,
pages 1–30. University Press.
Bangui, H., Ge, M., and Buhnova, B. (2018a). Exploring
big data clustering algorithms for internet of things
applications. In Proceedings of the 3rd International
Conference on Internet of Things, Big Data and
Security, IoTBDS 2018, Funchal, Madeira, Portugal,
March 19-21, 2018, pages 269–276.
Bangui, H., Ge, M., and Buhnova, B. (2018b). A research
roadmap of big data clustering algorithms for future
internet of things. International Journal of Organiza-
tional and Collective Intelligence, 9(2):16–30.
Bangui, H., Ge, M., and Buhnova, B. (2019). A research
roadmap of big data clustering algorithms for future
internet of things. International Journal of Organizational
and Collective Intelligence, 9(2):16–30.
Cordts, M., Omran, M., Ramos, S., Scharwächter, T.,
Enzweiler, M., Benenson, R., and Schiele, B. (2015).
The Cityscapes dataset. In CVPR Workshop on the
Future of Datasets in Vision, volume 2.
Erl, T., Khattak, W., and Buhler, P. (2016). Big Data Fun-
damentals: Concepts, Drivers & Techniques. Prentice
Hall Press, Upper Saddle River, NJ, USA, 1st edition.
Ge, M., Bangui, H., and Buhnova, B. (2018). Big data for
internet of things: A survey. Future Generation Com-
puter Systems, 87:601–614.
Ge, M. and Dohnal, V. (2018). Quality management in big
data. Informatics, 5(2):19.
Har-Peled, S. and Mazumdar, S. (2004). On coresets for
k-means and k-median clustering. In Proceedings of
the Thirty-sixth Annual ACM Symposium on Theory
of Computing, STOC ’04, pages 291–300, New York,
NY, USA. ACM.
Kosmanos, D., Karagiannis, D., Argyriou, A. L. S., and
Maglaras, L. (2018). RF jamming classification using
relative speed estimation in vehicular wireless
networks. arXiv preprint arXiv:1812.11886.
Matheus, R., Janssen, M., and Maheshwari, D. (2018).
Data science empowering the public: Data-driven
DATA 2019 - 8th International Conference on Data Science, Technology and Applications