Integration Process for Multidimensional Textual Data Modeling

Rachid Aknouche, Ounas Asfari, Fadila Bentayeb, Omar Boussaid

Abstract

In this paper, we propose an original approach to the text warehousing process. It is based on a decisional architecture that combines classical data warehousing tasks with information retrieval (IR) techniques. We first propose a new ETL process, named ETL-Text, for textual data integration, and then present a new Text Warehouse Model, denoted TWM, which takes into account both the structure and the semantics of the textual data. TWM is associated with new dimension types, including a metadata dimension and a semantic dimension. In addition, we propose a new analysis measure based on language modeling, which is widely used in the IR area. Moreover, our approach relies on Wikipedia as an external knowledge source to extract the semantics of the textual documents. To validate our approach, we develop a prototype composed of several processing modules that illustrate the different steps of ETL-Text. We also use the 20 Newsgroups corpus to perform our experiments.



Paper Citation


in Harvard Style

Aknouche R., Asfari O., Bentayeb F. and Boussaid O. (2013). Integration Process for Multidimensional Textual Data Modeling. In Proceedings of the 1st International Workshop in Software Evolution and Modernization - Volume 1: SEM, (ENASE 2013), ISBN 978-989-8565-66-2, pages 119-126. DOI: 10.5220/0004602501190126


in Bibtex Style

@conference{sem13,
author={Rachid Aknouche and Ounas Asfari and Fadila Bentayeb and Omar Boussaid},
title={Integration Process for Multidimensional Textual Data Modeling},
booktitle={Proceedings of the 1st International Workshop in Software Evolution and Modernization - Volume 1: SEM, (ENASE 2013)},
year={2013},
pages={119-126},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004602501190126},
isbn={978-989-8565-66-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Workshop in Software Evolution and Modernization - Volume 1: SEM, (ENASE 2013)
TI - Integration Process for Multidimensional Textual Data Modeling
SN - 978-989-8565-66-2
AU - Aknouche R.
AU - Asfari O.
AU - Bentayeb F.
AU - Boussaid O.
PY - 2013
SP - 119
EP - 126
DO - 10.5220/0004602501190126