Juan Manuel Pérez, Rafael Berlanga, María José Aramburu


During the last decade, data warehouse and OLAP techniques have helped companies gather, organize and analyze the structured data they produce. At the same time, digital libraries have applied Information Retrieval mechanisms to query their repositories of unstructured documents. In this context, the emergence of XML brings these two approaches together, making it possible to develop warehouses for semi-structured information. Although several extensions of traditional data warehouse technology exist for managing semi-structured information, none of them is based on an underlying document model able to exploit this kind of information. In this paper we present our vision of what a semi-structured information warehouse should be, identifying a set of requirements through an example scenario.



Paper Citation

in Harvard Style

Pérez J. M., Berlanga R. and Aramburu M. J. (2004). SEMI-STRUCTURED INFORMATION WAREHOUSES - Requirements and Definition. In Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 972-8865-00-7, pages 579-582. DOI: 10.5220/0002633705790582
