Galaxy-Gen - A Tool for Building Galaxy Model from XML Documents

Ines Ben Messaoud, Jamel Feki, Gilles Zurfluh

2014

Abstract

A galaxy model is a multidimensional model dedicated to XML document warehouses. It can be seen as a network of entities (i.e., dimensions) connected via nodes. After giving an overview of our four-step semi-automated method for generating galaxy models, which aims to build data marts from XML documents, this paper focuses on the software tool, called Galaxy-Gen, that implements the proposed method. We illustrate the functionalities of Galaxy-Gen and make a first assessment of it through two experiments. The first experiment is applied to a set of twenty XML documents taken from the academic domain. The second addresses a set of 1691 XML documents from the Clef-2007 collection. The assessment is performed by comparing manually designed galaxy models with those produced by the Galaxy-Gen tool. The results are very promising.


Paper Citation


in Harvard Style

Ben Messaoud I., Feki J. and Zurfluh G. (2014). Galaxy-Gen - A Tool for Building Galaxy Model from XML Documents. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2014) ISBN 978-989-758-049-9, pages 84-95. DOI: 10.5220/0005082300840095


in Bibtex Style

@conference{keod14,
author={Ines Ben Messaoud and Jamel Feki and Gilles Zurfluh},
title={Galaxy-Gen - A Tool for Building Galaxy Model from XML Documents},
booktitle={Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2014)},
year={2014},
pages={84-95},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005082300840095},
isbn={978-989-758-049-9},
}