6 RELATED WORK
In the area of clustering Web users, taking non-obvious
profiles (Zicari et al., 2006) into account is a new approach.
Clustering non-obvious profiles with CORD indirectly takes
into account the time a user spends on a page and the content
topics of that page. CORD clusters users based on their
presumed interest in these topics. With the fuzzy centroids,
each cluster can be interpreted as a value on a predefined
scale of interest. This is a
notable advantage. Several algorithms have been
proposed in the area of clustering large datasets, such as
BIRCH (Zhang et al., 1996) and CLARANS (Clustering
Large Applications based on RANdom Search) by
Ng and Han (Ng and Han, 1994). There are also specialized
fields, e.g. multi-relational data clustering, for which
Yin et al. proposed CrossClus (Yin et al., 2007),
which clusters data stored in multiple relational tables
based on user guidance and multi-relational features.
Like CORD, this algorithm requires the help of the
user, who is here the person who wants to cluster
the elements. Clustering can be applied to various
domains and issues. For example, in (Aggarwal et al., 2006)
k-anonymity (a technique to preserve privacy in
data) is treated as a special clustering problem, called
r-cellular clustering. Aggarwal et al. handle
categorical attributes by representing them as n
equidistant points in a metric space. Hybrid systems
are used in various research fields. For the Web,
Burke (Burke, 2002) defined hybrid recommender
systems, which combine information filtering
and collaborative filtering techniques. Helmer
(Helmer, 2007) proposed a hybrid approach to measure
the similarity of semistructured documents based
on entropy. Kossmann et al. (Kossmann et al., 2002) use a
hybrid approach to find the skyline, i.e. a set of interesting
points from a potentially large set of data.
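
To make the skyline notion concrete, the following is a minimal sketch of a naive quadratic skyline computation. It is not the online algorithm of (Kossmann et al., 2002); it only illustrates which points count as "interesting", namely those not dominated by any other point. The function names, the assumption that smaller values are better in every dimension, and the hotel data are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates.
    The input order is preserved."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: hotels as (price, distance to the beach).
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 2)]
print(skyline(hotels))
```

In this example, (70, 5) is dominated by (60, 3) and (90, 2) is dominated by (80, 1), so only the remaining three hotels belong to the skyline.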
REFERENCES
Aggarwal, G., Feder, T., and Kenthapadi, K. (2006).
Achieving anonymity via clustering. In Proc. of the
25th ACM SIGMOD-SIGACT-SIGART Symposium on
Principles of Database Systems, pages 153–162, NY,
USA.
Braun-Blanquet, J., Conard, H. S., and Fuller, G. D. (1932).
Plant sociology. McGraw-Hill Book Company.
http://www.biodiversitylibrary.org/bibliography/7161.
Burke, R. (2002). Hybrid recommender systems: Survey
and experiments. User Modeling and User-Adapted
Interaction, 12(4):331–370.
Chen, N. and Marques, N. C. (2005). An extension of self-
organizing maps to categorical data. In EPIA,
Portugal.
Cheu, E. Y., Kwoh, C. K., and Zhou, Z. (2004). On the
two-level hybrid clustering algorithm. Nanyang
Technological University.
Chiu, T., Fang, D., Chen, J., Wang, Y., and Jeris, C. (2001).
A robust and scalable clustering algorithm for mixed
type attributes in large database environment. In
Proceedings of the 7th ACM SIGKDD, pages 263–268,
NY, USA.
Newman, D. J. and Asuncion, A. (2007). UCI machine learning
repository. http://archive.ics.uci.edu/ml/.
Gan, G., Yang, Z., and Wu, J. (2005). A genetic k-modes
algorithm for clustering categorical data. In ADMA,
pages 195–202.
Gugubarra (2009). Data set user profiles.
www.dbis.cs.uni-frankfurt.de/downloads/research/data.zip.
Helmer, S. (2007). Measuring the structural similarity of
semistructured documents using entropy. In Proc. of
the 33rd Int. Conf. on VLDBs, pages 1022–1032.
Huang, Z. (1997). A fast clustering algorithm to cluster
very large categorical data sets in data mining. In
Research Issues on Data Mining and Knowledge
Discovery, pages 1–8.
Kim, D.-W., Lee, K. H., and Lee, D. (2004). Fuzzy
clustering of categorical data using fuzzy centroids. Pattern
Recogn. Lett., 25(11):1263–1271.
Kossmann, D., Ramsak, F., and Rost, S. (2002). Shooting
stars in the sky: An online algorithm for skyline
queries. In Proc. of the 28th Int. Conf. on VLDBs,
pages 275–286.
Ng, R. T. and Han, J. (1994). Efficient and effective
clustering methods for spatial data mining. In Proc. of the
20th Int. Conf. on VLDBs, pages 144–155, San Francisco,
CA, USA. Morgan Kaufmann Pub. Inc.
Parmar, D., Wu, T., and Blackhurst, J. (2007). MMR: An
algorithm for clustering categorical data using rough
set theory.
Podani, J. (2005). Multivariate exploratory analysis of
ordinal data in ecology: Pitfalls, problems and solutions.
Journal of Vegetation Science, 16(5):497–510.
Yin, X., Han, J., and Yu, P. S. (2007). CrossClus:
User-guided multi-relational clustering. Data Min. Knowl.
Discov., 15(3):321–348.
Zhang, T., Ramakrishnan, R., and Livny, M. (1996). BIRCH:
An efficient data clustering method for very large
databases. In Proc. of the ACM SIGMOD, pages 103–114,
Montreal, Canada.
Zicari, R. V., Hoebel, N., Kaufmann, S., and Tolle, K.
(2006). The design of Gugubarra 2.0: A tool for
building and managing profiles of web users. In Proc. of the
IEEE/WIC/ACM Int. Conf. on Web Intelligence, pages
317–320, Washington, DC, USA.
WEBIST 2010 - 6th International Conference on Web Information Systems and Technologies