Ana Isabel Oviedo, Oscar Ortega


A complex multimedia object is an information unit composed of multiple media types such as text, images, audio and video. Applications that handle huge sets of such objects exceed the human capacity to synthesize useful information. The search for similarities and dissimilarities among objects has typically been carried out through clustering analysis, which tries to find groups in unlabeled data sets. Applied to sets of complex multimedia objects, such analysis faces a special restriction: the method must analyze the multiple media types present in the objects. This paper proposes a clustering ensemble that jointly assesses the several media types present in this kind of object. The proposed ensemble was applied to cluster webpages by constructing text and image clustering prototypes. Hubert’s statistic was used to evaluate the ensemble performance, showing that the proposed method creates clustering structures more similar to the real classification than those obtained from a joint-feature vector.
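As a rough illustration of the idea summarized in the abstract, the sketch below combines one partition per media type into a consensus partition and scores it against a reference classification with the normalized Hubert statistic. It is not the authors' ensemble: the combination rule (majority co-association followed by connected components), the function names `co_association`, `consensus` and `hubert_gamma`, and the six-object toy labels are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def co_association(partitions):
    """Fraction of partitions placing each pair (i, j), i < j, in the same cluster."""
    n = len(partitions[0])
    return {
        (i, j): sum(p[i] == p[j] for p in partitions) / len(partitions)
        for i, j in combinations(range(n), 2)
    }


def consensus(partitions, threshold=0.5):
    """Link pairs whose co-association exceeds the threshold (assumed rule),
    then label each connected component as one consensus cluster."""
    n = len(partitions[0])
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), share in co_association(partitions).items():
        if share > threshold:
            parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]


def hubert_gamma(labels_a, labels_b):
    """Normalized Hubert statistic: Pearson correlation between the
    same-cluster indicators of two partitions, taken over all object pairs.
    Undefined (division by zero) if either partition has a single cluster
    or only singletons."""
    pairs = list(combinations(range(len(labels_a)), 2))
    x = [float(labels_a[i] == labels_a[j]) for i, j in pairs]
    y = [float(labels_b[i] == labels_b[j]) for i, j in pairs]
    m = len(pairs)
    mean_x, mean_y = sum(x) / m, sum(y) / m
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / m
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / m)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / m)
    return cov / (std_x * std_y)


# Hypothetical toy data: six objects, one partition per media type.
reference = [0, 0, 0, 1, 1, 1]      # assumed "real" classification
text_labels = [0, 0, 0, 0, 1, 1]    # text clustering misplaces object 3
image_labels = [0, 0, 1, 2, 2, 2]   # image clustering splits off object 2
audio_labels = [0, 1, 0, 2, 2, 2]   # audio clustering splits off object 1

combined = consensus([text_labels, image_labels, audio_labels])
print("consensus:", combined)
print("Hubert statistic vs. reference:", hubert_gamma(combined, reference))
```

In this toy case each single-media partition makes a different mistake, and the majority vote over pairs cancels them out, which is the intuition behind assessing several media types jointly. A real ensemble would need a more careful consensus function, since connected components can chain unrelated clusters together.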



Paper Citation

in Harvard Style

Isabel Oviedo A. and Ortega O. (2012). Clustering Complex Multimedia Objects Using an Ensemble Approach. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8425-98-0, pages 134-143. DOI: 10.5220/0003794501340143
