be clustered, as in so-called non-metric spaces such
as graphs or sequences, for which a mean or median
cannot be meaningfully defined.