Clustering Users’ Requirements Schemas

Nouha Arfaoui, Jalel Akaichi

Abstract

Data mining offers different techniques for dealing with data. In our work, we propose the use of clustering, since we want to group the schemas into clusters according to their similarity. Clustering can be applied to various types of variables; we focus on categorical data. Many algorithms have been proposed, but none of them takes the semantic aspect into account. For this reason, and in order to ensure a good clustering of the schemas of the users' requirements, we extend the k-modes algorithm by modifying its dissimilarity measure. The schemas within each cluster are then merged to construct the schemas of the data mart.
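The abstract does not state the modified dissimilarity measure itself. As a rough illustration only, the Python sketch below runs standard k-modes with a pluggable dissimilarity. It compares the classic simple-matching measure with a hypothetical semantic variant in which values mapped to the same concept by a small hand-written dictionary count as a match. The dictionary stands in for an ontology or thesaurus. The encoding of each schema as a fixed-length (fact, measure, dimension) tuple, the names `make_semantic_dissimilarity` and `concepts`, the toy data, and the naive first-k initialisation are all assumptions, not the authors' method.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: each requirements schema is encoded as a fixed-length tuple of
# categorical values, here (fact, measure, dimension).
Schema = Tuple[str, ...]
Dissimilarity = Callable[[Schema, Schema], int]


def simple_matching(a: Schema, b: Schema) -> int:
    """Classic k-modes dissimilarity: count of attributes whose values differ."""
    return sum(x != y for x, y in zip(a, b))


def make_semantic_dissimilarity(concepts: Dict[str, str]) -> Dissimilarity:
    """Hypothetical semantic variant: values mapped to the same concept match.

    `concepts` maps a term to a canonical concept (a stand-in for an ontology
    or thesaurus); terms missing from the map are treated as their own concept.
    """
    def concept(value: str) -> str:
        return concepts.get(value, value)

    def dissim(a: Schema, b: Schema) -> int:
        return sum(concept(x) != concept(y) for x, y in zip(a, b))

    return dissim


def mode_of(members: Sequence[Schema]) -> Schema:
    """Attribute-wise most frequent value; ties go to the first value seen."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*members))


def k_modes(data: Sequence[Schema], k: int, dissim: Dissimilarity,
            max_iter: int = 20) -> List[int]:
    """Return a cluster label for each schema (naive init: first k records)."""
    modes = [data[i] for i in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each schema to its nearest mode; ties go to the lowest index.
        new_labels = [min(range(k), key=lambda j: dissim(x, modes[j])) for x in data]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Recompute each mode from its members; keep the old mode if a cluster is empty.
        for j in range(k):
            members = [x for x, label in zip(data, labels) if label == j]
            if members:
                modes[j] = mode_of(members)
    return labels


# Toy, made-up schemas and concept map, for illustration only.
schemas = [
    ("Stock", "quantity", "Product"),
    ("Sales", "amount", "Customer"),
    ("Sales", "amount", "Client"),
    ("Revenue", "turnover", "Client"),
    ("Inventory", "quantity", "Item"),
]
concepts = {"Revenue": "Sales", "turnover": "amount", "Client": "Customer",
            "Inventory": "Stock", "Item": "Product"}
semantic = make_semantic_dissimilarity(concepts)

print(simple_matching(schemas[1], schemas[3]), semantic(schemas[1], schemas[3]))  # 3 0
print(k_modes(schemas, 2, simple_matching))  # [0, 1, 1, 0, 0]
print(k_modes(schemas, 2, semantic))         # [0, 1, 1, 1, 0]
```

With simple matching, the ("Revenue", "turnover", "Client") schema shares no literal value with either mode (distance 3 to both). The tie-break therefore places it in the stock cluster. The concept map lets the semantic variant group it with the other sales schemas. A fuller version would replace the dictionary with an ontology-based measure and use a better initialisation than the first k records.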



Paper Citation


in Harvard Style

Arfaoui N. and Akaichi J. (2014). Clustering Users’ Requirements Schemas. In Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-035-2, pages 15-21. DOI: 10.5220/0004991100150021


in BibTeX Style

@conference{data14,
author={Nouha Arfaoui and Jalel Akaichi},
title={Clustering Users’ Requirements Schemas},
booktitle={Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2014},
pages={15-21},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004991100150021},
isbn={978-989-758-035-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Clustering Users’ Requirements Schemas
SN - 978-989-758-035-2
AU - Arfaoui N.
AU - Akaichi J.
PY - 2014
SP - 15
EP - 21
DO - 10.5220/0004991100150021