Scaling Big Data Applications in Smart City with Coresets

Le Hong Trang¹, Hind Bangui²,³, Mouzhi Ge²,³ and Barbora Buhnova²,³

¹ Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, Vietnam National University, Ho Chi Minh City, Vietnam

² Institute of Computer Science, Masaryk University, Brno, Czech Republic

³ Faculty of Informatics, Masaryk University, Brno, Czech Republic

Keywords: Big Data, Classification, Coreset, Clustering, Sampling, Smart City.

Abstract:

With the development of Smart Cities, various Big Data applications are being proposed within the domain. These are, however, hard to test and prototype, since such prototyping requires substantial computing resources. To save effort in building Big Data prototypes for Smart Cities, this paper proposes an enhanced sampling technique that obtains a coreset from Big Data while preserving features of the data such as clustering structure and distribution density. For a given dataset and an ε > 0, the method computes an ε-coreset of the dataset. The ε-coreset is then modified to obtain a sample set while ensuring separation and balance in the set. Furthermore, by considering the representativeness of each sample point, our method can help to remove noise and outliers. We believe that the coreset-based technique can be used to efficiently prototype and evaluate Big Data applications in the Smart City.

1 INTRODUCTION

Big Data has been receiving increasing attention in recent years, as organizations and cities are dealing with tremendous amounts of data with high complexity and velocity (Ge et al., 2018). Given the specific features of Big Data, the data has been characterized according to five fundamental elements: volume (size of data), variety (different types of data from several sources), velocity (data collected in real time), veracity (uncertainty of data) and value (benefits to various industrial and academic fields). Moreover, additional characteristics beyond the 5V model have been discussed, such as validity (correct processing of the data), variability (context of data), viscosity (latency of data transmission between the source and destination), virality (speed of the data sent and received from various sources) and visualization (interpretation of data and identification of the most relevant information for the users). Despite the existence of these additional characteristics, the 5V model lays the foundational description of the Big Data concept (Erl et al., 2016). Recently, Big Data research has been undergoing a substantial transformation from its research harvest towards high-impact applications in different areas, especially in the Smart City (Bangui et al., 2018b).

The Smart City aims to improve the lifestyle of citizens by providing smart applications in various fields such as urban planning, mobility and transportation, smart living and community, smart environment, emergency, e-health and government (Štěpánek et al., 2017). The data generated in these Smart City applications are usually fast moving and changing in value, meaning and format. They can also originate from various sources, such as social networks, unstructured data from different devices or raw feeds from sensors (Ge and Dohnal, 2018). Thus Big Data processing and analytics can offer extensive insights for Smart Cities. However, one of the main factors affecting the cost of Big Data analysis is the size of the dataset to be examined. Many datasets are too large to store and process in computer memory. In this case, analyzing the datasets requires access to the disk of the computer or even to extra devices (Bangui et al., 2018a). Analyzing such datasets is thus a computationally expensive task. Therefore, sampling is a natural way to overcome this difficulty.

In this paper, a sampling method is proposed to obtain a core sample dataset from Big Data while keeping features of the Big Data such as its clustering structure. This sampling method can be used to quickly build prototypes for Big Data applications.

Trang, L., Bangui, H., Ge, M. and Buhnova, B.

Scaling Big Data Applications in Smart City with Coresets.

DOI: 10.5220/0007958803570363

In Proceedings of the 8th International Conference on Data Science, Technology and Applications (DATA 2019), pages 357-363

ISBN: 978-989-758-377-3


Thus, in order to test the feasibility of certain Big Data applications, we can save the effort of organizing the whole big dataset. Instead, we can work on a scalable sample dataset to carry out a pilot study that tests the feasibility and applicability of Big Data applications.

The rest of the paper is organized as follows. Section 2 introduces an application scenario for the sampling technique. Section 3 explains the intuition of sampling in Big Data. Section 4 describes our proposed method for generating the coreset of the Big Data as well as improvement techniques for the coreset. Based on the coreset technique, Section 5 describes the possible application of coresets in the Smart City. Finally, Section 6 concludes the paper and outlines future work.

2 BIG DATA IN THE SMART CITY

Nowadays, cities are becoming spaces equipped with smart digital communication transceivers, with the ambition of connecting, integrating and enhancing communicating objects. Accordingly, we have observed an increasing presence of intelligent applications, such as smart parking, in our daily lives. Meanwhile, many studies have proposed various strategies for finding better governance intelligence for modern cities (Ge et al., 2018). One of these approaches gathers data from multiple domains and then provides specific data to decision-makers (Matheus and Janssen, 2018).

Visual datasets are among the largest datasets typically available in Smart Cities (Ge et al., 2018), since they help to address the most fundamental and challenging goals in urban places. A typical example is the Cityscapes Dataset (Cordts et al., 2015), which illustrates the visual complexity of urban street scenes (including GPS positions) from 50 different cities by providing a large set of stereo video sequences of street views. Likewise, the Mapillary Vistas Dataset (Neuhold et al., 2017), the Daimler Urban Segmentation Dataset (Scharwächter et al., 2013), and the ApolloScape Dataset (Xinyu et al., 2018) consist of video sequences recorded in urban traffic that could be used for developing autonomous vehicles, learning how to detect and count objects precisely, analyzing road construction, and so on. These datasets therefore help the scientific and industrial communities in understanding urban street scenes through visual perception. As a result, the availability of large-scale datasets plays a vital role in understanding the mutual information that can be obtained by jointly considering Big Data analysis algorithms and urban governance challenges. Furthermore, a proper data analysis process is required for providing exact knowledge and achieving the ultimate goal of the Smart City paradigm, which is making better use of public resources by improving the quality of services and reducing operational costs.

3 SAMPLING IN BIG DATA

Whenever a dataset is too big to be analysed in full, sampling can be used to return a representative sample of the dataset that can be examined and whose properties can be extrapolated to the original dataset.

The basic type of sampling is uniform random sampling. It is, however, inefficient when dealing with datasets of non-uniform distribution. If the shape and the density of a dataset vary, a small sample obtained by uniform sampling has poor representativeness. The size of the sample must thus be increased if higher representativeness is required. Two approaches proposed for overcoming the drawbacks of uniform sampling are based on distance and density features in datasets.
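The drawback of uniform sampling on imbalanced data can be illustrated with a minimal Python sketch. The dataset below is hypothetical: the cluster sizes (10,000 and 20 points), their locations, and the names `dense` and `sparse` are assumptions chosen for illustration only and do not come from the experiments in this paper.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical imbalanced dataset: one dense cluster and one sparse cluster.
dense = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10_000)]
sparse = [(rng.gauss(20, 1), rng.gauss(20, 1)) for _ in range(20)]
data = [(p, "dense") for p in dense] + [(p, "sparse") for p in sparse]

# Uniform random sample of 50 points, drawn without replacement.
sample = rng.sample(data, 50)
n_sparse = sum(1 for _, label in sample if label == "sparse")

# Expected number of sparse-cluster points in the sample (about 0.1).
expected = 50 * len(sparse) / len(data)
print(f"sparse points in sample: {n_sparse} (expected about {expected:.2f})")
```

With an expected count of roughly 0.1, the sparse cluster is usually absent from the sample altogether, which is the loss of representativeness that distance-based and density-based sampling try to avoid.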

A point can represent a subset within a set if it is close to the other points of the subset. A basic measure of closeness is distance. Distance-based sampling measures the similarity of points in a dataset. This approach is thus strongly related to clustering techniques (Bangui et al., 2019). The distance used for the measurement differs between datasets, depending on distribution features such as the shape of clusters in the dataset. For a convex-shaped (spherical) cluster, the Euclidean distance is appropriate, while a path-based distance is required for more complex shapes. In the case of imbalanced datasets, a density bias is necessary in order to maintain the representativeness of a sample. Density-based sampling selects a representative point based on a density-specified function of patterns in a dataset. The principle is to preserve the representation of sparsely distributed clusters of the dataset. A recent concise review of these two approaches can be found in (Ros and Guillaume, 2017). Some methods have also been proposed in which distance and density are coupled. These methods aim at ensuring the structural features as well as the representativeness of the resulting sample. This is also the main purpose of the method proposed in this work, while producing samples of small size.

From the point of view of approximate computation, sampling can be seen as approximately determining a subset of a given set. For geometric approximation problems, a concept called ε-coreset was introduced by Agarwal et al. (Agarwal et al., 2005). Given a set P and ε > 0, an ε-coreset, denoted by Q, is a subset of P that approximates P with respect to a monotone measure function. Recently, Ros and Guillaume proposed a sampling method called ProTraS (Ros and Guillaume, 2018), which can be seen as an extension of the farthest-first traversal (fft) algorithm (Rosenkrantz et al., 1977). They also showed that the sample obtained by ProTraS is a coreset of the original dataset. ProTraS iteratively adds a representative to the sample until the sampling cost drops below a given threshold. The representative is selected according to a probability of cost reduction, which is defined by coupling the distance and density concepts.

Our method employs ProTraS to compute a coreset of a given dataset. Unlike in ProTraS, the resulting coreset is not the final sample. For each point of the coreset, we compute the center of the subset of the dataset that the point represents. The sample of the dataset includes the centers associated with the points of the coreset. Furthermore, if the representativeness of a point of the coreset is low, i.e., the subset of the dataset represented by the point is small, the point is removed from the sample. The method is implemented in Matlab and experimentally compared with ProTraS. The applicability of the method is also evaluated with two key problems in data mining, namely clustering and, in particular, classification with imbalanced datasets.

4 CORESET FOR BIG DATA SAMPLING

We first recall the concept of a coreset of a set (Agarwal et al., 2005). Let $\mu$ be a monotone function from subsets of $\mathbb{R}^d$ to $\mathbb{R}^+ \cup \{0\}$, i.e., for $P_1 \subseteq P_2$, $\mu(P_1) \le \mu(P_2)$. Given $\varepsilon > 0$ and $Q \subseteq P$, $Q$ is called an $\varepsilon$-coreset of $P$ with respect to $\mu$ if

$$(1 - \varepsilon)\,\mu(P) \le \mu(Q).$$

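When this concept is applied to the clustering problem, it is extended as below.

Definition 4.1. (Har-Peled and Mazumdar, 2004) A subset $S$ of $P$ is a $(k, \varepsilon)$-coreset for $P$ if

$$(1 - \varepsilon)\,\mathrm{Cost}_P(C) \le \mathrm{Cost}_S(C) \le (1 + \varepsilon)\,\mathrm{Cost}_P(C), \qquad (1)$$

where $C \subset P$ is a set of $k$ centers of $P$.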

Our method, proposed in the next section, uses the sample given by the ProTraS algorithm (Ros and Guillaume, 2018) as its first step. We now briefly describe this algorithm and then discuss some observations on its results. The main idea of ProTraS is to select a representative point based on a probability of cost reduction. Given an ε > 0, in each iteration the algorithm adds to the sample a new representative taken from the group with the highest probability of cost reduction. When the cost drops below a threshold that depends on ε, the algorithm stops. The details of the algorithm are given in Algorithm 1.

Algorithm 1: ProTraS (Ros and Guillaume, 2018).

Require: $P = \{x_i\}$, for $i = 1, 2, \ldots, n$, a tolerance $\varepsilon > 0$.
Ensure: A sample $S = \{y_j\}$ and $P(y_j)$, for $j = 1, 2, \ldots, s$.
1: Initialize a pattern $x_{init} \in P$.
2: $y_1 = x_{init}$, $P(y_1) = \{y_1\}$, and $S = \{y_1\}$.
3: $s = 1$.
4: repeat
5:     for all $x_l \in P \setminus S$ do
6:         $y_k = \arg\min_{y_j \in S} d(x_l, y_j)$
7:         $P(y_k) = P(y_k) \cup \{x_l\}$
8:     end for
9:     $maxWD = cost = 0$.
10:     for all $y_k \in S$ do
11:         $x_{max}(y_k) = \arg\max_{x_l \in P(y_k)} d(x_l, y_k)$
12:         $d_{max}(y_k) = d(x_{max}(y_k), y_k)$
13:         $p_k = |P(y_k)|\, d_{max}(y_k)$
14:         if $p_k > maxWD$ then
15:             $maxWD = p_k$
16:             $y^* = y_k$
17:         end if
18:         $cost = cost + p_k / n$.
19:     end for
20:     $x^* = x_{max}(y^*)$
21:     $S = S \cup \{x^*\}$ and $s = s + 1$.
22:     $P(y^*) = \{x^*\}$
23: until $cost < \varepsilon$
24: return $S$ and $P(y_j)$, for $j = 1, 2, \ldots, s$.

Lines 5-8 of the algorithm assign each point that is not yet in the current sample to the group of its nearest representative. Lines 10-19 then determine the new representative: it is the point that is farthest from the representative within its group, taken from the group with the highest probability of cost reduction. This also means that the representative selected by ProTraS is a farthest-first traversal item.
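The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Matlab implementation used in this paper: it uses the Euclidean distance, recomputes the group assignment from scratch in every iteration, lets each newly added representative start a group of its own, and checks the stopping condition before adding a further representative so that the returned groups match the returned sample. The names `protras`, `init_index`, `reps` and `groups` are hypothetical.

```python
import math

def protras(points, eps, init_index=0):
    """Sketch of ProTraS. Returns (reps, groups): reps holds the indices of the
    sample points, and groups[j] lists the indices represented by reps[j]."""
    n = len(points)
    reps = [init_index]
    while True:
        # Lines 5-8: assign every point to its nearest representative.
        groups = [[] for _ in reps]
        for i, x in enumerate(points):
            k = min(range(len(reps)),
                    key=lambda j: math.dist(x, points[reps[j]]))
            groups[k].append(i)

        # Lines 9-19: weighted farthest distance of each group and total cost.
        max_wd, cost, far_idx = 0.0, 0.0, None
        for k, members in enumerate(groups):
            y = points[reps[k]]
            i_max = max(members, key=lambda i: math.dist(points[i], y))
            p_k = len(members) * math.dist(points[i_max], y)
            if p_k > max_wd:
                max_wd, far_idx = p_k, i_max
            cost += p_k / n

        # Line 23 (checked early, see the assumptions above).
        if cost < eps or far_idx is None:
            return reps, groups

        # Lines 20-21: the farthest point of the selected group joins the sample.
        reps.append(far_idx)

# Usage on a tiny hypothetical dataset with two well-separated groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sample_idx, sample_groups = protras(pts, eps=2.0)
```

Each iteration costs O(n·s) distance evaluations for n points and s representatives, so a practical implementation would likely cache distances instead of recomputing them.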

Given a dataset $P$, let us denote by $C = \{c_1, \ldots, c_k\}$ the set of centers of $P$. ProTraS aims at generating a coreset as the sample of $P$. Indeed, for $x_i \in P$, let $c_i^{*}, c_j^{*\prime} \in C$ be the closest centers to $x_i \in P$ and $y_j \in S$, respectively. We define

• $\mathrm{Cost}_P(C) = \sum_{i=1}^{n} d(x_i, c_i^{*})$ and

• $\mathrm{Cost}_S(C) = \sum_{j=1}^{s} w_j\, d(y_j, c_j^{*\prime})$, where $w_j$ is the number of points of $P(y_j)$ and $P(y_j)$ is also called the set of patterns of $y_j$.
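The two costs defined above can be computed directly, as in the following Python sketch. It assumes the Euclidean distance and checks the two-sided inequality of the standard (k, ε)-coreset definition for one given set of centers only; the function names `cost_full`, `cost_sample` and `satisfies_coreset_bound` are hypothetical.

```python
import math

def cost_full(points, centers):
    """Cost_P(C): sum over all points of the distance to the closest center."""
    return sum(min(math.dist(x, c) for c in centers) for x in points)

def cost_sample(reps, weights, centers):
    """Cost_S(C): like cost_full, but over the sample points, each weighted by
    the number of patterns it represents."""
    return sum(w * min(math.dist(y, c) for c in centers)
               for y, w in zip(reps, weights))

def satisfies_coreset_bound(points, reps, weights, centers, eps):
    """Check (1 - eps) Cost_P(C) <= Cost_S(C) <= (1 + eps) Cost_P(C)
    for this particular set of centers."""
    cp = cost_full(points, centers)
    cs = cost_sample(reps, weights, centers)
    return (1 - eps) * cp <= cs <= (1 + eps) * cp
```

Since the check is performed for a single set of centers, it can refute the coreset property for a given ε but cannot establish it in general.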

Since the set of representatives is selected by farthest-first traversal, it has been shown in (Ros and Guillaume, 2018) that if we choose

$$\varepsilon = \frac{\sum_{j=1}^{s} w_j\, d_j}{\mathrm{Cost}_P(C)},$$

where $d_j = \max_{x_i \in P(y_j)} \{d(x_i, y_j)\}$, for $y_j \in S$, then (1) is satisfied. Hence, the obtained sample is a coreset of $P$. We now discuss some experimental results of the ProTraS algorithm.
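Why a tolerance of this form yields (1) can be sketched with the triangle inequality. The derivation below is a reconstruction under stated assumptions and is not the proof given in (Ros and Guillaume, 2018): it assumes that the groups $P(y_j)$ partition $P$, that $w_j = |P(y_j)|$, that $d_j$ is the largest distance from $y_j$ to a point of its group, that $c(z)$ denotes the center in $C$ closest to a point $z$, and that $\varepsilon$ is taken as $\sum_j w_j d_j / \mathrm{Cost}_P(C)$.

```latex
% For every x in P(y_j), the triangle inequality and the fact that c(.) is
% the closest center give a bound in each direction:
\begin{align*}
d(y_j, c(y_j)) &\le d(y_j, c(x)) \le d(y_j, x) + d(x, c(x)) \le d_j + d(x, c(x)),\\
d(x, c(x)) &\le d(x, c(y_j)) \le d(x, y_j) + d(y_j, c(y_j)) \le d_j + d(y_j, c(y_j)).
\end{align*}
% Summing over x in P(y_j) and then over j = 1, ..., s:
\begin{align*}
\mathrm{Cost}_S(C) &\le \mathrm{Cost}_P(C) + \sum_{j=1}^{s} w_j d_j
  = (1 + \varepsilon)\,\mathrm{Cost}_P(C),\\
\mathrm{Cost}_P(C) &\le \mathrm{Cost}_S(C) + \sum_{j=1}^{s} w_j d_j
  \;\Longrightarrow\;
  \mathrm{Cost}_S(C) \ge (1 - \varepsilon)\,\mathrm{Cost}_P(C).
\end{align*}
```

Under these assumptions, the numerator $\sum_j w_j d_j$ equals $n$ times the quantity accumulated as $cost$ in Algorithm 1, which suggests why that quantity serves as the stopping criterion.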

4.1 Implementation of ProTraS

We implemented the algorithm in Matlab and tested it on some synthetic datasets. Fig. 1 and Fig. 2 show the results for the S1 dataset with several values of ε. The size of the dataset is 3000. For ε = 0.2, the obtained sample consists of 97 data points. We observe that the points are selected at the borders of the clusters of the set. This is due to the principle of farthest-first traversal. When the value of ε is decreased, the number of sample points increases. Fig. 2 shows the sample with ε = 0.1. The sample points are now distributed uniformly over the dataset, meaning that the structural representativeness of the sample is higher. The size of the sample is 261, which is reasonable when compared with the size of the whole dataset.

Figure 1: The sample of S1 dataset obtained by ProTraS with ε = 0.2, the sample size is 97.

However, since the method is based on farthest-first traversal, the points that are farthest within a group are always chosen. These points are not useful in some cases. For example, assume that we are clustering a very large dataset in which the separation between clusters is low. If we apply ProTraS for sampling the dataset, the sample will include some points located midway between clusters (see Fig. 3). This makes it difficult to process the groups that include these points in the clustering task.

The synthetic datasets are available at https://cs.joensuu.fi/sipu/datasets/

Figure 2: Obtained sample with ε = 0.1 and the size is 261.

Figure 3: The sample of dataset S8 consisting of 5000

points, which is obtained with ε = 0.1, the size is 327.

Another issue that can arise is that, due to farthest-first selection, a point in the sample lies at the boundary of the dataset (see the points marked by red circles). The distance between such a point and a cluster can be longer than the distance between clusters in the sample. This leads to wrong clustering of the sample. Consequently, the results of clustering on the whole dataset might be inaccurate.

In order to overcome the difficulties mentioned above, the next section describes our technique, in which the representative of a group is re-selected to be the center of the group. Furthermore, some points in the sample can be removed if they are less useful for the mining purpose.

4.2 An Improved Technique

Given a dataset $P = \{x_i\}$, for $i = 1, 2, \ldots, n$, and a given ε > 0, our method first calls ProTraS to obtain $S = \{y_j\}$ and $P(y_j)$ for $j = 1, 2, \ldots, s$. The method next identifies the sample points that have low representativeness and removes them from the sample. Each remaining point is then replaced by the center of the set of patterns that the point represents. The details of the method are given in Algorithm 2.


Algorithm 2: Coreset-based algorithm for sampling.

Require: $P = \{x_i\}$, for $i = 1, 2, \ldots, n$, a tolerance $\varepsilon > 0$.
Ensure: A sample $S = \{y_j\}$ and $P(y_j)$, for $j = 1, 2, \ldots, s$.
1: Call ProTraS for $P$ and $\varepsilon$ to obtain $S = \{y_j\}$ and $P(y_j)$.
2: $S' = \emptyset$.
3: for all $y_k \in S$ do
4:     if $|P(y_k)|$ is greater than a threshold then
5:         $y_k^* = \arg\min_{y \in P(y_k)} \sum_{x \in P(y_k)} d(y, x)$
6:         $S' = S' \cup \{y_k^*\}$
7:     end if
8: end for
9: $S = S'$.
10: return $S$ and $P(y_j^*)$, for $j = 1, 2, \ldots, s'$, where $s' \le s$.

Line 4 of the algorithm decides whether a sample point will be selected into our sample, i.e., $S'$. This is performed using a threshold. $|P(y_k)|$ denotes the number of patterns in $P$ with $y_k \in S$ being their representative. A small value of $|P(y_k)|$ means that the representativeness of $y_k$ is low. Such a point is thus not necessary and can be removed from the sample. The value of the threshold should be chosen according to the distribution characteristics of the dataset.

For $y_k \in S$ that is not removed, line 5 computes the center of the group represented by $y_k$, which replaces it. The center, denoted by $y_k^*$, is defined to be the point in $P(y_k)$ such that the total distance to all other points in the group is minimized. The set $S'$ consisting of such $y_k^*$ is the output sample of the algorithm.
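The filtering and re-centering steps of Algorithm 2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Matlab implementation used in this paper: the groups are assumed to be given as a mapping from each representative to the list of patterns it represents, the Euclidean distance is used, and the names `refine_sample` and `min_size` (the threshold of line 4) are hypothetical.

```python
import math

def refine_sample(groups, min_size):
    """groups maps each representative (a tuple of coordinates) to the list of
    points it represents. Returns a mapping from each new center to its group."""
    refined = {}
    for rep, members in groups.items():
        # Line 4: keep only groups with more than min_size patterns.
        if len(members) > min_size:
            # Line 5: the center is the member with the smallest total
            # distance to all members of the group.
            center = min(members,
                         key=lambda y: sum(math.dist(y, x) for x in members))
            # Line 6: the center replaces the original representative.
            refined[center] = members
    return refined

# Usage with two hypothetical groups: a border representative and an outlier.
groups = {
    (0, 0): [(0, 0), (1, 0), (2, 0)],
    (50, 50): [(50, 50)],
}
new_sample = refine_sample(groups, min_size=1)
```

In this example the group of the isolated point is dropped, and the border representative (0, 0) is replaced by (1, 0), the member with the smallest total distance to its group. Computing the center this way is quadratic in the group size, which stays affordable as long as the individual groups are small.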

Figure 4: The sample of dataset S1 obtained with ε = 0.1

by our algorithm.

We now discuss the meaning of $S'$. Let us describe the replacement of a representative $y_k \in S$. This task aims at moving the representative of a group to its center. There are two cases that can happen. If $y_k$ is located at the border of a cluster and it represents $P(y_k)$ in that cluster, the center of $P(y_k)$ should be located nearer to that cluster than to any other cluster of the dataset. This helps $S'$ to highlight the cluster tendency of the dataset. If $y_k$ is strictly inside a cluster, it is likely not far from the center of $P(y_k)$. The change of distance from $y_k$ to the center is thus small. In practice, most such $y_k$ are also the centers of $P(y_k)$. Therefore, $S'$ still keeps the main structure of the whole dataset where the distribution density is high. Fig. 4 shows the sample obtained by our algorithm for the S1 dataset. The sample represents the structure of the dataset better than the sample shown in Fig. 1 obtained by ProTraS. We also note that, as mentioned, Line 4 of our algorithm removes the points $y_k$ with a small value of $|P(y_k)|$. This can help us to deal with noisy data and outliers, which usually have low representativeness.

5 EFFICIENT BIG DATA PROTOTYPING WITH CORESETS

As seen in the previous section, the coreset method could achieve reliable results that support the shift from traditional cities to Smart Cities. Indeed, the new modern environment is characterized by the integration of various smart applications that demand autonomous communication between intelligent devices for responding to specific tasks necessary for citizens' lifestyle. The digitization of transport systems is one of the applications that reflect this advancement of modern cities, in which IoT sensors play a crucial role in realizing the vision of future transportation. In fact, the digitization of smart road infrastructure and vehicles (e.g., cars) produces a significant amount of data each day through IoT devices, which could be used to manage and optimize various transport applications, such as route planning, surveillance applications, situation recognition, weather prediction, accident detection, applications for pedestrians, emergency management, traffic control, autonomous driving, traffic prediction, etc. As a result, the shared transportation data can minimize the risks that threaten the safety of citizens as well as contribute to building a sustainable smart transport environment (Priyan and Devi, 2019).

Achieving this vision of future transportation requires highly accurate processing of data. However, in practice, it is hard to obtain reliable and accurate outcomes, since the majority of transport works focus on applying Big Data techniques without paying attention to the rapid changes in the size of data. For example, jamming attacks in wireless vehicular ad-hoc networks (VANET) are one of the hard challenges in the transportation domain, where the aim is to secure vehicle network communication by developing anti-jamming applications. To do that, machine learning techniques are used, such as in (Yao and Jia, 2019), where a multi-agent Q-learning algorithm has been developed for solving the formulated anti-jamming Markov game. Similarly, in (Kosmanos et al., 2018), the authors have proposed a detection framework by combining two supervised machine learning methods, K-Nearest Neighbors (KNN) and Random Forests (RF), with the metric of the variations of the relative speed (VRS) between the target and the jammer. Another example is k-means (Pang et al., 2017), whose advantages are used to predict the number of multiple jamming attackers and ensure the preset functions of VANET. However, the common issue with these works is the use of the whole data during the application of Big Data techniques. Yet, the size of the datasets gathered by ubiquitous smart IoT sensors is increasing. That means the manipulation of the whole data might increase the computational cost and time of data processing exponentially. Thus, our proposed solution could address those problems by turning large data into very small yet representative data. Further, it could support the manipulation of data in real time as well as the scalability of outcomes. As a result, the advantages of coresets could play an essential role in the success of transport systems that depend on the efficient integration, representation, and management of data.

6 CONCLUSIONS

In this paper, we have proposed a coreset-based sampling technique for Big Data. The coreset can extract the key features of the Big Data while reducing the Big Data to a manageable data scale. Besides, we have proposed improvement techniques for the coreset. Based on the coreset technique, we have proposed a possible Big Data application in the context of the Smart City. Since the Smart City is changing and updating quickly, different possible applications, especially with Big Data, are frequently proposed. In order to test the feasibility of a proposed application, we envision that the coreset technique can be used to efficiently build prototypes for Big Data applications in Smart Cities. As future work, we plan to apply the coreset technique in real-world Smart City applications and evaluate how much effort and time can be saved by using the proposed coreset technique.

ACKNOWLEDGEMENTS

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number C2019-20-13. The work was also supported by the European Regional Development Fund Project CERIT Scientific Cloud (No. CZ.02.1.01/0.0/0.0/16_013/0001802). Access to the CERIT-SC computing and storage facilities provided by the CERIT-SC Center, under the "Projects of Large Research, Development, and Innovations Infrastructures" programme (CERIT Scientific Cloud LM2015085), is greatly appreciated.

REFERENCES

Agarwal, P. K., Har-Peled, S., and Varadarajan, K. R. (2005). Geometric approximation via coresets. In Combinatorial and Computational Geometry, MSRI, pages 1–30. University Press.

Bangui, H., Ge, M., and Buhnova, B. (2018a). Exploring

big data clustering algorithms for internet of things

applications. In Proceedings of the 3rd International

Conference on Internet of Things, Big Data and Se-

curity, IoTBDS 2018, Funchal, Madeira, Portugal,

March 19-21, 2018., pages 269–276.

Bangui, H., Ge, M., and Buhnova, B. (2018b). A research

roadmap of big data clustering algorithms for future

internet of things. International Journal of Organiza-

tional and Collective Intelligence, 9(2):16–30.

Bangui, H., Ge, M., and Buhnova, B. (2019). A research

roadmap of big data clustering algorithms for future

internet of things. International Journal of Organiza-

tional & Collective Intelligence, 9(2):16–30.

Cordts, M., Omran, M., Ramos, S., Scharwächter, T., Enzweiler, M., Benenson, R., and Schiele, B. (2015). The cityscapes dataset. In CVPR Workshop on the Future of Datasets in Vision, volume 2.

Erl, T., Khattak, W., and Buhler, P. (2016). Big Data Fun-

damentals: Concepts, Drivers & Techniques. Prentice

Hall Press, Upper Saddle River, NJ, USA, 1st edition.

Ge, M., Bangui, H., and Buhnova, B. (2018). Big data for

internet of things: A survey. Future Generation Com-

puter Systems, 87:601–614.

Ge, M. and Dohnal, V. (2018). Quality management in big

data. Informatics, 5(2):19.

Har-Peled, S. and Mazumdar, S. (2004). On coresets for

k-means and k-median clustering. In Proceedings of

the Thirty-sixth Annual ACM Symposium on Theory

of Computing, STOC ’04, pages 291–300, New York,

NY, USA. ACM.

Kosmanos, D., Karagiannis, D. and Argyriou, A. L. S., and Maglaras, L. (2018). RF jamming classification using relative speed estimation in vehicular wireless networks. arXiv preprint arXiv:1812.11886.

Matheus, R. and Janssen, M. and Maheshwari, D. (2018). Data science empowering the public: Data-driven dashboards for transparent and accountable decision-making in smart cities. Government Information Quarterly.

Neuhold, G., Ollmann, T., Rota Bulo, S., and Kontschieder, P. (2017). The mapillary vistas dataset for semantic understanding of street scenes. In Proceedings of the IEEE International Conference on Computer Vision, pages 4990–4999.

Pang, L. and Guo, P. C., X., Li, J., and Xue, Z. (2017). Estimating the number of multiple jamming attackers in vehicular ad hoc network. In 2017 6th International Conference on Computer Science and Network Technology (ICCSNT), pages 366–370. IEEE.

Priyan, M. K. and Devi, G. U. (2019). A survey on internet of vehicles: applications, technologies, challenges and opportunities. International Journal of Advanced Intelligence Paradigms, 12(1-2):98–119.

Ros, F. and Guillaume, S. (2017). Dides: A fast and ef-

fective sampling for clustering algorithm. Knowledge

and Information Systems, 50(2):543–568.

Ros, F. and Guillaume, S. (2018). ProTraS: A probabilistic traversing sampling algorithm. Expert Systems with Applications, 105:65–76.

Rosenkrantz, D. J., Stearns, R. E., and Lewis, P. M. (1977). An analysis of several heuristics for the traveling salesman problem. SIAM Journal on Computing, 6(3):563–581.

Scharwächter, T., Enzweiler, M., Franke, U., and Roth, S. (2013). Efficient multi-cue scene segmentation. In German Conference on Pattern Recognition, pages 435–445. Springer.

Štěpánek, P., Ge, M., and Walletzký, L. (2017). IT-enabled digital service design principles - lessons learned from digital cities. In Information Systems - 14th European, Mediterranean, and Middle Eastern Conference, EMCIS 2017, Coimbra, Portugal, September 7-8, 2017, Proceedings, pages 186–196.

Xinyu, H., Cheng, X., Geng, Q., Cao, B., Zhou, D., Wang, P., Lin, Y., and Yang, R. (2018). The apolloscape dataset for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 954–960. IEEE.

Yao, F. and Jia, L. (2019). A collaborative multi-agent re-

inforcement learning anti-jamming algorithm in wire-

less networks. IEEE Wireless Communications Let-

ters.
