Franck Ravat, Olivier Teste, Ronan Tournier



For more than a decade, OLAP and multidimensional analysis have generated methodologies, tools and resource management systems for the analysis of numeric data. With the growing availability of semi-structured data, there is a need to incorporate text-rich document data into a data warehouse and to provide multidimensional analysis adapted to it. This paper presents a new aggregation function for keywords that allows textual data to be aggregated in OLAP environments, much as traditional arithmetic functions aggregate numeric data. The AVG_KW function uses an ontology to merge several keywords into a single, more general keyword.
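The abstract does not spell out how AVG_KW consults the ontology. As a rough illustration only, the sketch below assumes the ontology is a simple tree in which every keyword has one parent, and it aggregates a set of keywords into their closest shared ancestor. The toy hierarchy, the `PARENT` mapping and the helper `path_to_root` are hypothetical names invented for this example; they are not taken from the paper.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical toy ontology (assumption): each keyword maps to its parent,
# i.e. the next more general keyword. The root maps to None.
PARENT: Dict[str, Optional[str]] = {
    "Computer Science": None,
    "Databases": "Computer Science",
    "Data Warehouse": "Databases",
    "OLAP": "Databases",
    "Information Retrieval": "Computer Science",
    "Text Mining": "Information Retrieval",
}


def path_to_root(keyword: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the keyword followed by its ancestors, most specific first."""
    path: List[str] = []
    node: Optional[str] = keyword
    while node is not None:
        path.append(node)
        node = parent[node]  # raises KeyError for a keyword missing from the ontology
    return path


def avg_kw(keywords: Iterable[str],
           parent: Dict[str, Optional[str]] = PARENT) -> Optional[str]:
    """Aggregate keywords into their most specific shared ancestor.

    Assumption: a tree-shaped ontology. Returns None for an empty input.
    """
    kws = list(keywords)
    if not kws:
        return None
    # Candidate answers: ancestors of the first keyword, specific -> general.
    candidates = path_to_root(kws[0], parent)
    for kw in kws[1:]:
        ancestors = set(path_to_root(kw, parent))
        # Keep only candidates that are also ancestors of this keyword.
        candidates = [c for c in candidates if c in ancestors]
    return candidates[0] if candidates else None


if __name__ == "__main__":
    print(avg_kw(["OLAP", "Data Warehouse"]))
    print(avg_kw(["OLAP", "Text Mining"]))
```

Under these assumptions, two sibling keywords collapse into their shared parent, while keywords from distant branches collapse into a keyword nearer the root. This naive version walks every path to the root on each call; it is meant to show the idea of keyword aggregation, not an efficient or authoritative implementation.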



Paper Citation

in Harvard Style

Ravat F., Teste O. and Tournier R. (2007). OLAP AGGREGATION FUNCTION FOR TEXTUAL DATA WAREHOUSE. In Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-972-8865-88-7, pages 151-156. DOI: 10.5220/0002364401510156
