Table 1. Time comparison in clustering and mixture models.

Dataset            K    Time (sec.)
                        VP       VKM      SKM      VEM      Incremental EM
Circle             5    5.908    0.001    4.204    0.012    10.136
(25 prototypes)    10   —        0.004    4.216    0.048    19.133
4-Cross            4    7.440    0.001    2.204    0.028     7.732
(23 prototypes)    8    —        0.001    2.532    0.184    15.332
5-Gaussian         5    7.928    0.001    2.544    0.012     9.808
(24 prototypes)    10   —        0.008    3.520    0.104    18.825
dataset, because VP is a completely one-pass algorithm. It should be noted that VKM
and VEM can be applied separately to the same set of volume prototypes. In addition,
with VKM and VEM we can try several values of K very efficiently for model selection.
6 Conclusions
In this paper, we have presented two algorithms for clustering and density estimation
based on volume prototypes, which are obtained by a single-pass algorithm and can be
used in place of a huge dataset. The number of volume prototypes required is much
smaller than the number of given samples; therefore, our algorithms work very
efficiently on huge datasets and data streams.
One of the proposed algorithms is a volume-prototype version (VEM) of the EM algorithm,
intended for density estimation. Since each prototype has a volume, we extended the
original algorithm to take into account both the volume and the number of samples
each prototype contains.
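The idea of running EM on prototypes rather than raw samples can be sketched as follows. This is an illustrative sketch only, not the paper's exact update rules: it assumes each prototype j is summarized by a center c_j, an isotropic radius r_j (its volume), and the count n_j of samples it absorbed, and it fits an isotropic Gaussian mixture. The E-step scores prototype centers; the M-step weights each prototype by its count and adds the prototype's internal spread (approximated here by r_j^2) to each component's variance.

```python
import numpy as np

def vem_step(centers, radii, counts, mu, var, pi):
    """One EM iteration on volume prototypes instead of raw samples.

    centers : (J, d) prototype centers
    radii   : (J,)   prototype radii (volumes)
    counts  : (J,)   number of original samples per prototype
    mu, var, pi : current mixture parameters (isotropic components)

    Illustrative only -- the paper's exact update rules may differ.
    """
    K, d = mu.shape[0], centers.shape[1]
    # E-step: responsibilities of each component for each prototype center
    resp = np.empty((len(centers), K))
    for k in range(K):
        diff = centers - mu[k]
        resp[:, k] = (pi[k] * np.exp(-0.5 * (diff ** 2).sum(1) / var[k])
                      / var[k] ** (d / 2))
    resp /= resp.sum(1, keepdims=True)
    # M-step: each prototype acts like counts[j] coincident samples
    w = resp * counts[:, None]          # effective sample weights
    Nk = w.sum(0)
    pi = Nk / counts.sum()
    mu = (w.T @ centers) / Nk[:, None]
    var = np.empty(K)
    for k in range(K):
        diff = centers - mu[k]
        # the prototype's internal variance enters via r_j^2
        var[k] = (w[:, k] * ((diff ** 2).sum(1) / d + radii ** 2)).sum() / Nk[k]
    return mu, var, pi
```

The key difference from standard EM is the count weighting and the r_j^2 term: without them, the prototypes would be treated as ordinary points and the summarized sample mass and spread would be lost.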
The other algorithm is a k-means algorithm for volume prototypes (VKM). For this
algorithm, we developed a distance measure between a volume prototype and a cluster
center as a natural extension of its point version.
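One plausible form of such an extension can be sketched as follows; the paper's actual distance measure may differ. Here the distance from a cluster center m to a prototype (c, r) is taken as the mean squared distance from m to a uniform ball of radius r, which is ||m - c||^2 + r^2 d/(d+2), and prototype counts weight the centroid updates.

```python
import numpy as np

def vkm(centers, radii, counts, K, iters=20, seed=0):
    """k-means over volume prototypes (illustrative sketch).

    centers : (J, d) prototype centers
    radii   : (J,)   prototype radii
    counts  : (J,)   number of original samples per prototype
    """
    rng = np.random.default_rng(seed)
    d = centers.shape[1]
    mu = centers[rng.choice(len(centers), K, replace=False)]
    for _ in range(iters):
        # squared distance cluster-center -> prototype, volume-corrected:
        # ||m - c||^2 + r^2 * d / (d + 2)
        dist = (((centers[:, None, :] - mu[None]) ** 2).sum(2)
                + (radii ** 2 * d / (d + 2))[:, None])
        label = dist.argmin(1)
        for k in range(K):
            m = label == k
            if m.any():  # count-weighted centroid of assigned prototypes
                mu[k] = np.average(centers[m], axis=0, weights=counts[m])
    return mu, label
```

Because the volume correction is constant per prototype, assignments here coincide with assignments by center distance; a measure that interacts with the cluster geometry would change that, which is why the exact choice matters.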
We confirmed the efficiency of both algorithms in experiments with 2-dimensional
artificial data. The main advantage of these algorithms is that, once the volume
prototypes are given, both can be carried out at low cost. We will further investigate
their applicability to high-dimensional real-world datasets.
References
1. Tabata, K., Kudo, M.: Information compression by volume prototypes. The IEICE Technical
Report, PRMU, 106 (2006) 25–30 (in Japanese)
2. Sato, M., Kudo, M., Toyama, J.: Behavior Analysis of Volume Prototypes in High Dimen-
sionality. In: Structural, Syntactic and Statistical Pattern Recognition, Lecture Notes in Com-
puter Science. Volume 5342., Springer (2008) 884–894
3. Zhang, T., Ramakrishnan, R., Livny, M.: Fast density estimation using CF-kernel for very
large databases. Proceedings of the fifth ACM SIGKDD international conference on Knowl-
edge discovery and data mining (1999) 312–316
4. Arandjelović, O., Cipolla, R.: Incremental learning of temporally-coherent Gaussian mix-
ture models. Proceedings of the IAPR British Machine Vision Conference (2005) 759–768
5. Neal, R.M., Hinton, G.E.: A view of the EM algorithm that justifies incremental, sparse, and
other variants. Learning in Graphical Models (1999) 355–368