Figure 13: Comparison of memory usage by execution
time.
5 CONCLUSIONS AND FUTURE WORK
In this paper, we have proposed the RA-HCluster
algorithm for ubiquitous data stream clustering. The
algorithm uses a resource-aware technique to adapt
its settings and the level of the hierarchical summary
frame, which allows mobile devices to continue
mining and overcomes the loss of accuracy or the
mining interruption caused by insufficient memory in
traditional data stream clustering algorithms.
Furthermore, we compute the correlation coefficients
between micro-clusters so that more closely related
data points are assigned to the same cluster during
the clustering process, thereby improving the
accuracy of the clustering results. Experimental
results show that RA-HCluster not only achieves
higher accuracy than RA-VFKM but also maintains
low and stable memory usage.
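To illustrate the correlation step summarized above, the following is a minimal Python sketch, not the RA-HCluster implementation itself. It assumes that each micro-cluster is represented by a center vector and that relatedness is measured with the Pearson correlation coefficient. Both are assumptions made for illustration, and the function names `pearson` and `most_correlated` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the summed squared deviations.
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        # A constant vector has undefined correlation; this sketch
        # treats it as uncorrelated (assumption).
        return 0.0
    return cov / (dev_x * dev_y)


def most_correlated(center, candidates):
    """Index of the candidate center most correlated with `center`."""
    return max(range(len(candidates)),
               key=lambda i: pearson(center, candidates[i]))
```

In such a sketch, a micro-cluster would be grouped with the candidate returned by `most_correlated`, so that centers that vary together end up in the same cluster.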
Because this paper deals only with mining a single
data stream on mobile devices, future research may
consider multiple data streams. In addition, factors
such as battery, CPU utilization, and data rate could
be incorporated into the resource-aware technique, so
that algorithms can adapt more effectively to the
current environment of the mobile device and the
characteristics of the data stream. Possible practical
applications include vehicle collision prevention,
intrusion detection, and stock analysis.
ACKNOWLEDGEMENTS
The authors would like to express their appreciation
for the financial support from the National Science
Council of the Republic of China under Project No.
NSC 99-2221-E-031-005.
REFERENCES
Aggarwal, C. C., Han, J., Wang, J., Yu, P. S., 2003. A
Framework for Clustering Evolving Data Streams. In
Proceedings of the 29th International Conference on
Very Large Data Bases, Berlin, Germany, pp. 81-92.
Babcock, B., Babu, S., Motwani, R., Widom, J., 2002.
Models and Issues in Data Stream Systems. In
Proceedings of the 21st ACM SIGMOD Symposium on
Principles of Database Systems, Madison, Wisconsin,
U.S.A., pp. 1-16.
Dai, B. R., Huang, J. W., Yeh, M. Y., Chen, M. S., 2006.
Adaptive Clustering for Multiple Evolving Streams.
IEEE Transactions on Knowledge and Data Engineering,
Vol. 18, No. 9, pp. 1166-1180.
Gaber, M. M., Zaslavsky, A., Krishnaswamy, S., 2004.
Towards an Adaptive Approach for Mining Data
Streams in Resource Constrained Environment. In
Proceedings of the International Conference on Data
Warehousing and Knowledge Discovery, Zaragoza,
Spain, pp. 189-198.
Gaber, M. M., Krishnaswamy, S., Zaslavsky, A., 2004.
Ubiquitous Data Stream Mining. In Proceedings of the
8th Pacific-Asia Conference on Knowledge Discovery
and Data Mining, Sydney, Australia.
Gaber, M. M., Yu, P. S., 2006. A Framework for
Resource-aware Knowledge Discovery in Data Streams:
A Holistic Approach with Its Application to
Clustering. In Proceedings of the 2006 ACM Symposium
on Applied Computing, Dijon, France, pp. 649-656.
Golab, L., Özsu, M. T., 2003. Issues in Data Stream
Management. ACM SIGMOD Record, Vol. 32, Issue 2,
pp. 5-14.
Kargupta, H., Park, B. H., Pittie, S., Liu, L., Kushraj, D.,
Sarkar, K., 2002. MobiMine: Monitoring the Stock
Market from a PDA.
ACM SIGKDD Explorations
Newsletter, Vol. 3, No. 2, pp. 37-46.
Kargupta, H., Bhargava, R., Liu, K., Powers, M., Blair, P.,
Bushra, S., Dull, J., Sarkar, K., Klein, M., Vasa, M.,
Handy, D., 2004. VEDAS: a Mobile and Distributed
Data Stream Mining System for Real-Time Vehicle
Monitoring. In Proceedings of the 4th SIAM
International Conference on Data Mining, Florida,
U.S.A., pp. 300-311.
Shah, R., Krishnaswamy, S., Gaber, M. M., 2005.
Resource-Aware Very Fast K-Means for Ubiquitous Data
Stream Mining. In Proceedings of 2nd International
Workshop on Knowledge Discovery in Data Streams,
Porto, Portugal, pp. 40-50.