was used to determine the cluster dimensions and locations, and a 2100-point testing set (disjoint from the training set) was used for classification. In this experiment, 74.8% of the points were classified correctly. This demonstrates that the algorithm is able to determine the cluster properties from a small set of examples and apply them to previously unseen points. The performance is lower than in the unsupervised experiments owing to the smaller set of points used to determine the clusters.
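For concreteness, the classification step can be illustrated with a minimal sketch, assuming each discovered cluster is summarized by its relevant dimensions and a per-dimension bounding interval; the function names and the cluster representation below are illustrative assumptions, not taken from the paper.

import numpy as np

def assign_point(point, clusters, default=-1):
    """Assign a test point to the first cluster whose projective bounds it satisfies.

    Each cluster here is assumed to be a dict with:
      'dims'     - indices of the cluster's relevant dimensions
      'lo', 'hi' - per-dimension bounds on those dimensions
    Returns the cluster index, or `default` if the point fits no cluster (outlier).
    """
    for idx, c in enumerate(clusters):
        proj = point[c['dims']]
        if np.all(proj >= c['lo']) and np.all(proj <= c['hi']):
            return idx
    return default

def classification_accuracy(test_points, test_labels, clusters):
    """Fraction of test points assigned to the cluster matching their true label."""
    predictions = [assign_point(p, clusters) for p in test_points]
    return np.mean(np.array(predictions) == np.array(test_labels))

In this sketch a test point is accepted by a cluster when its projection onto the cluster's relevant dimensions falls inside the learned bounds; points accepted by no cluster are treated as outliers.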
6 CONCLUSIONS
We have presented a new algorithm called SEPC for locating projective clusters using a Monte Carlo method. The algorithm is straightforward to implement and has low complexity (linear in the number of data points and low-order polynomial in the number of dimensions). In addition, the algorithm does not require the number of clusters or the number of cluster dimensions as input and makes no assumptions about the distribution of cluster points, other than that the clusters have bounded diameter. The algorithm is widely applicable to projective clustering problems and can find both disjoint and non-disjoint clusters. The performance of the SEPC algorithm surpasses previously reported results on both synthetic and real data.
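To make the structure of such a Monte Carlo trial concrete, the following is a minimal sketch assuming a per-dimension width bound and a small sampled discriminating set; the function name, parameters (width, disc_size), and the exact box construction are illustrative assumptions and do not reproduce the paper's algorithm or its quality function.

import numpy as np

def sepc_style_trial(data, width, disc_size, rng):
    """One illustrative Monte Carlo trial for projective clustering.

    Samples a seed point and a small discriminating set, keeps the dimensions
    on which all discriminating points stay within `width` of the seed, and
    gathers the points that fall inside the resulting hyper-box.
    """
    n, d = data.shape
    seed = data[rng.integers(n)]
    disc = data[rng.choice(n, size=disc_size, replace=False)]

    # Relevant dimensions: the discriminating set lies within `width` of the seed.
    dims = np.where(np.all(np.abs(disc - seed) <= width, axis=0))[0]

    # Cluster support: points inside the width-bounded box on those dimensions.
    in_box = np.all(np.abs(data[:, dims] - seed[dims]) <= width, axis=1)
    return dims, np.where(in_box)[0]

Each trial is linear in the number of data points, consistent with the complexity noted above; a full algorithm would repeat such trials and retain the best candidate cluster under a quality criterion balancing support against the number of relevant dimensions.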