Data Cube Computational Model with Hadoop MapReduce

Bo Wang, Hao Gui, Mark Roantree, Martin F. O'Connor

2014

Abstract

XML has become a widely used, well-structured data format for digital document handling and message transmission. To extract useful knowledge from XML data, data warehouse and OLAP applications that support decision making need to be developed. Apache Hadoop is an open source cloud computing framework that provides a distributed file system for large-scale data processing. In this paper, we discuss an XML data cube model that offers a complete set of views over XML data, and we present a basic algorithm for building the cube on Hadoop. To improve efficiency, we also propose an optimized algorithm better suited to this kind of XML data. The experimental results presented in the paper demonstrate the effectiveness of our optimization strategies.


Paper Citation


in Harvard Style

Wang B., Gui H., Roantree M. and O'Connor M. (2014). Data Cube Computational Model with Hadoop MapReduce. In Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-023-9, pages 193-199. DOI: 10.5220/0004935001930199


in Bibtex Style

@conference{webist14,
author={Bo Wang and Hao Gui and Mark Roantree and Martin F. O'Connor},
title={Data Cube Computational Model with Hadoop MapReduce},
booktitle={Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2014},
pages={193-199},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004935001930199},
isbn={978-989-758-023-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Data Cube Computational Model with Hadoop MapReduce
SN - 978-989-758-023-9
AU - Wang B.
AU - Gui H.
AU - Roantree M.
AU - O'Connor M.
PY - 2014
SP - 193
EP - 199
DO - 10.5220/0004935001930199