# Clustering and Density Estimation for Streaming Data using Volume Prototypes

### Maiko Sato, Mineichi Kudo, Jun Toyama

#### Abstract

The authors have previously proposed volume prototypes as a compact representation of a huge dataset or a data stream, together with a one-pass algorithm to find them. A moderate number of volume prototypes can be used in place of an enormous number of data points in many applications, including classification, clustering, and density estimation. In this paper, two algorithms that use volume prototypes, called VKM and VEM, are introduced for clustering and density estimation. Compared with other algorithms designed for such huge datasets, our algorithms were shown to process data faster while achieving a comparable level of performance, and both applications could be carried out from the same set of volume prototypes.
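The abstract does not give VKM's update rules. The sketch below illustrates only the general idea of clustering compact summaries instead of raw points. It assumes that each prototype stores a count, a mean, and the mean squared distance of its points from that mean. The names `Prototype`, `summarize`, `assign`, and `kmeans_on_prototypes` are hypothetical, and `summarize` simply condenses a batch of points rather than reproducing the authors' one-pass construction. The paper's volume prototypes and VKM may differ from this.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Prototype:
    """Hypothetical summary of a group of points (not the paper's exact definition)."""
    weight: float        # number of points summarized
    mean: List[float]    # centroid of those points
    spread: float        # mean squared distance of the points from `mean`


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def summarize(points: Sequence[Sequence[float]]) -> Prototype:
    """Stand-in for prototype construction: summarize one batch of points."""
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    spread = sum(sq_dist(p, mean) for p in points) / n
    return Prototype(float(n), mean, spread)


def assign(protos: List[Prototype], centers: List[List[float]]) -> List[int]:
    # The spread term is the same for every center, so the nearest center
    # is decided by the prototype mean alone.
    return [min(range(len(centers)), key=lambda c: sq_dist(p.mean, centers[c]))
            for p in protos]


def kmeans_on_prototypes(protos: List[Prototype], k: int, iters: int = 100,
                         seed: int = 0) -> Tuple[List[List[float]], List[int], float]:
    rng = random.Random(seed)
    centers = [list(p.mean) for p in rng.sample(protos, k)]
    for _ in range(iters):
        labels = assign(protos, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(protos, labels) if lab == c]
            if not members:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
                continue
            total = sum(p.weight for p in members)
            dim = len(members[0].mean)
            # Weighted mean of prototype means = mean of all summarized points.
            new_centers.append([sum(p.weight * p.mean[d] for p in members) / total
                                for d in range(dim)])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(protos, centers)
    # w * (||m - c||^2 + spread) is the exact squared error of the w points
    # summarized by a prototype with respect to center c.
    cost = sum(p.weight * (sq_dist(p.mean, centers[lab]) + p.spread)
               for p, lab in zip(protos, labels))
    return centers, labels, cost
```

The design relies on the decomposition of squared error, sum of ||x - c||^2 = n ||m - c||^2 + sum of ||x - m||^2. Under it, the cost computed from the prototypes equals the squared error of the underlying points whenever all points of a prototype share a cluster. Each iteration therefore costs time proportional to the number of prototypes rather than the number of raw data points.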


#### Paper Citation

#### in Harvard Style

Sato M., Kudo M. and Toyama J. (2009). **Clustering and Density Estimation for Streaming Data using Volume Prototypes**. In *Proceedings of the 9th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2009)*, ISBN 978-989-8111-89-0, pages 39-48. DOI: 10.5220/0002173500390048

#### in Bibtex Style

@conference{pris09,

author={Maiko Sato and Mineichi Kudo and Jun Toyama},

title={Clustering and Density Estimation for Streaming Data using Volume Prototypes},

booktitle={Proceedings of the 9th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2009)},

year={2009},

pages={39-48},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0002173500390048},

isbn={978-989-8111-89-0},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 9th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2009)

TI - Clustering and Density Estimation for Streaming Data using Volume Prototypes

SN - 978-989-8111-89-0

AU - Sato M.

AU - Kudo M.

AU - Toyama J.

PY - 2009

SP - 39

EP - 48

DO - 10.5220/0002173500390048