dimension and a semantic dimension. It also provides a new analysis measure, adapted to
text analysis and based on the notion of language modeling. The semantics of the documents
is extracted using Wikipedia as an external knowledge source. To validate our approach,
we developed a prototype composed of several processing modules that illustrate
the different phases of ETL-Text. These modules were tested on the 20 Newsgroups
corpus. As future work, we plan to define new aggregation operators adapted to OLAP
analysis of textual data.
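The measure summarized above is not defined in this section. Purely as a generic illustration, and not as the authors' measure, the sketch below shows one common way a language-modeling idea can serve as a text measure inside an OLAP cell: estimate a unigram model from the documents of a cell, smooth it with the collection model (Jelinek-Mercer smoothing), and rank terms by the smoothed probability. The function names `unigram_lm`, `cell_term_scores` and `top_terms`, the smoothing weight `lam`, and the representation of a cell as a list of tokenized documents are all hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical illustration: a cell (or the whole collection) is a list of
# tokenized documents, e.g. [["olap", "cube"], ["text"]].

def unigram_lm(docs):
    """Maximum-likelihood unigram model: P(t) = count(t) / total token count."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: c / total for t, c in counts.items()}

def cell_term_scores(cell_docs, collection_docs, lam=0.5):
    """Jelinek-Mercer smoothing:
    P(t | cell) = lam * P_ml(t | cell) + (1 - lam) * P(t | collection).
    Assumes the cell's documents are part of the collection, so every cell
    term also appears in the collection model."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    p_cell = unigram_lm(cell_docs)
    p_coll = unigram_lm(collection_docs)
    return {t: lam * p_cell.get(t, 0.0) + (1.0 - lam) * p_coll[t] for t in p_coll}

def top_terms(cell_docs, collection_docs, k=3, lam=0.5):
    """Return the k terms with the highest smoothed probability (ties broken alphabetically)."""
    scores = cell_term_scores(cell_docs, collection_docs, lam)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

collection = [["olap", "cube", "text"], ["olap", "sql"], ["wikipedia", "text", "semantic"]]
cell = collection[:2]  # e.g. the documents that fall into one cell of the cube
print(top_terms(cell, collection))  # ['olap', 'text', 'cube']
```

Setting `lam` closer to 1 makes the score follow the cell's own term frequencies; lower values pull it toward the collection model, which keeps terms absent from a cell from receiving a zero score.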
References
1. Bentayeb, F., Maiz, N., Mahboubi, H., Favre, C., Loudcher, S., Harbi, N., Boussaid, O.,
Darmont, J.: Innovative Approaches for Efficiently Warehousing Complex Data from the Web.
In: Business Intelligence Applications and the Web: Models, Systems and Technologies.
IGI Global (2012) 26–52
2. Lai, K.K., Yu, L., Wang, S.: Multi-agent web text mining on the grid for enterprise decision
support. In: Proceedings of the international conference on Advanced Web and Network
Technologies, and Applications. APWeb’06, Berlin (2006) 540–544
3. Vassiliadis, P., Simitsis, A., Skiadopoulos, S.: Conceptual modeling for ETL processes. In:
Proceedings of the 5th ACM International Workshop on Data Warehousing and OLAP. DOLAP
’02, New York, NY, USA, ACM (2002) 14–21
4. Bleyberg, M., Ganesh, K.: Dynamic multi-dimensional models for text warehouses. In:
Systems, Man, and Cybernetics, 2000 IEEE International Conference on. Volume 3. (2000)
2045–2050
5. Mothe, J., Chrisment, C., Dousset, B., Alaux, J.: DocCube: multi-dimensional visualisation
and exploration of large document sets. Journal of the American Society for Information
Science and Technology, JASIST, Special 54 (2003) 650–659
6. Tseng, F.S.C., Chou, A.Y.H.: The concept of document warehousing for multi-dimensional
modeling of textual-based business intelligence. Decis. Support Syst. 42(2) (November
2006)
7. McCabe, M.C., Lee, J., Chowdhury, A., Grossman, D., Frieder, O.: On the design and
evaluation of a multi-dimensional approach to information retrieval. In: Proceedings of the
23rd Annual International ACM SIGIR Conference, New York, NY, USA (2000) 363–365
8. Ravat, F., Teste, O., Tournier, R., Zurfluh, G.: A conceptual model for multidimensional
analysis of documents. In: Proceedings of the 26th International Conference on Conceptual
Modeling. ER’07, Berlin (2007) 550–565
9. Lin, C.X., Ding, B., Han, J., Zhu, F., Zhao, B.: Text cube: Computing IR measures for
multidimensional text database analysis. In: ICDM. (2008) 905–910
10. Zhang, D., Zhai, C., Han, J., Srivastava, A., Oza, N.: Topic modeling for OLAP on
multidimensional text databases: topic cube and its applications. Stat. Anal. Data Min. 2(5–6)
(December 2009)
11. Pérez, J.M., Berlanga, R., Aramburu, M.J., Pedersen, T.B.: A relevance-extended
multi-dimensional model for a data warehouse contextualized with documents. In: DOLAP ’05,
New York, NY, USA, ACM (2005) 19–28
12. Keith, S., Kaser, O., Lemire, D.: Analyzing large collections of electronic text using OLAP.
CoRR abs/cs/0605127 (2006)
13. Porter, M.F.: An algorithm for suffix stripping. In: Readings in Information Retrieval.
Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1997) 313–316
14. Golfarelli, M., Maio, D., Rizzi, S.: Conceptual design of data warehouses from E/R schemes.
(1998) 334–343