6 CONCLUSIONS
In this research, the Bisecting K-means clustering
technique was applied to cluster social network
discussion groups using the groups' meta-features.
The main contributions of this paper are assigning a
suitable similarity measure to each meta-feature
and enhancing the clustering quality by assigning a
weight to each feature using genetic algorithms.
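As an illustration of the second contribution, a weighted combination of per-feature similarities could be sketched as follows. This is a minimal sketch, not the implementation used in this work: the function name `combined_similarity`, the feature names, the similarity values, and the weights are all hypothetical, and the weights are set by hand here rather than found by a genetic algorithm.

```python
def combined_similarity(feature_sims, weights):
    """Weighted average of per-feature similarity scores.

    feature_sims and weights are dicts keyed by feature name
    (hypothetical structure, assumed for illustration only).
    """
    if set(feature_sims) != set(weights):
        raise ValueError("feature_sims and weights must have the same keys")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise by the total weight so the result stays in the
    # same range as the individual similarities.
    return sum(weights[f] * feature_sims[f] for f in feature_sims) / total


# Hypothetical similarities between two groups, one value per meta-feature.
sims = {"description": 0.8, "type": 1.0, "network": 0.5, "subtype": 0.0}
# Hypothetical weights; in the paper these are tuned by a genetic algorithm.
weights = {"description": 0.4, "type": 0.3, "network": 0.2, "subtype": 0.1}

score = combined_similarity(sims, weights)
print(round(score, 2))
```

Normalising by the sum of the weights keeps the combined score comparable across different weight vectors, which matters if a search procedure proposes weights that do not sum to one.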
The new idea in developing the similarity measures
was to make use of the data of a group's members,
namely the networks to which they belong and the
types and subtypes of the groups they joined. The
similarity measures for the network, type, and
subtype features were based on statistical
correlation, which captures the relationship between
a pair of feature values among group members.
One of the important results is that giving
weights to the features increased the clustering
quality. When more weight was given to the group
description and type features, a better Silhouette
coefficient was obtained. The results of the
experiments illustrate the effect of the social
features induced from members' data: among the
feature combinations we tried, the best results were
obtained when the network, type, and subtype
features were combined.
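For reference, the Silhouette coefficient used as the quality measure can be computed from a pairwise distance matrix as sketched below. The sketch assumes distances are derived from similarities (for example as 1 - similarity) and scores singleton clusters as 0; the function name `silhouette`, the toy distance matrix, and the labels are hypothetical and are not data from the experiments.

```python
def silhouette(dist, labels):
    """Mean Silhouette coefficient for a clustering.

    dist   -- symmetric matrix (list of lists) of pairwise distances
    labels -- cluster label of each item, in the same order as dist
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the item's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy distances between four groups: items 0-1 are close, items 2-3 are close.
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
good = silhouette(dist, [0, 0, 1, 1])  # matches the structure in dist
bad = silhouette(dist, [0, 1, 0, 1])   # splits the close pairs apart
print(good > bad)
```

Values close to 1 indicate compact, well-separated clusters, so a higher coefficient under one feature weighting than another suggests the first weighting groups similar items more cleanly.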
As future work, more social features, for
example the users' posts, should be investigated and
used in building the clusters in order to see their
effect on the clustering quality. More
experiments on larger datasets are needed to confirm
the preliminary findings reported in this work.
ACKNOWLEDGEMENTS
The authors of this paper are especially grateful to the
Cairo Microsoft Innovation Center (CMIC) for its
support of this research.
ICEIS 2011 - 13th International Conference on Enterprise Information Systems