A Hadoop Open Source Backup Solution
Heitor Faria, Rodrigo Hagstrom, Marco Reis, Breno G. S. Costa, Edward Ribeiro, Maristela Holanda, Priscila Solis Barreto, Aletéia P. F. Araújo
2018
Abstract
Backup is a traditional and critical business service that faces growing challenges, such as rapidly increasing data volumes. Distributed data-intensive applications, such as Hadoop, can give the false impression that they do not need backup data replicas, but most researchers agree that backup is still necessary for the majority of their components. A brief survey reveals several disasters that can cause data loss in Hadoop HDFS clusters, and previous studies propose maintaining an entire second Hadoop cluster to host a backup replica. However, this method is much more expensive than using traditional backup software and media, such as a tape library, Network Attached Storage (NAS) or even Cloud Object Storage. To address these problems, this paper introduces a cheaper and faster Hadoop backup and restore solution. It compares the traditional redundant cluster replica technique with an alternative that uses Hadoop client commands to create multiple streams of data from HDFS files to Bacula, the most popular open source backup software, which can receive data from named pipes (FIFOs). The new mechanism was roughly 51% faster and consumed 75% less backup storage than the previous solutions.
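The streaming idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a Unix-like system (for `os.mkfifo`) and a Hadoop client on the PATH for the `hdfs dfs -cat` command. The helper names `hdfs_cat_command`, `stream_to_fifo`, `backup_streams` and `read_all` are hypothetical. In a real deployment Bacula would be the reader of each pipe (its FileSet options include a `readfifo` setting); here a plain Python reader stands in for it.

```python
import os
import subprocess
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def hdfs_cat_command(hdfs_path):
    """Build the Hadoop client command that writes one HDFS file to stdout."""
    return ["hdfs", "dfs", "-cat", hdfs_path]


def stream_to_fifo(command, fifo_path):
    """Run `command` with its stdout connected to the named pipe.

    Opening a FIFO for writing blocks until a reader opens the other end,
    so this function is meant to run in its own thread.
    """
    with open(fifo_path, "wb") as fifo:
        subprocess.run(command, stdout=fifo, check=True)


def read_all(fifo_path):
    """Stand-in for the backup software: read the pipe until end of file."""
    with open(fifo_path, "rb") as fifo:
        return fifo.read()


def backup_streams(commands, consume=read_all):
    """Create one FIFO per command and consume all streams concurrently.

    Returns the consumer results in the same order as `commands`.
    """
    with tempfile.TemporaryDirectory() as workdir:
        fifo_paths = []
        writers = []
        for index, command in enumerate(commands):
            fifo_path = os.path.join(workdir, f"stream{index}.fifo")
            os.mkfifo(fifo_path)  # Unix-only system call
            writer = threading.Thread(
                target=stream_to_fifo, args=(command, fifo_path)
            )
            writer.start()
            fifo_paths.append(fifo_path)
            writers.append(writer)

        # One reader thread per pipe, so no stream waits on another.
        with ThreadPoolExecutor(max_workers=max(1, len(fifo_paths))) as pool:
            results = list(pool.map(consume, fifo_paths))

        for writer in writers:
            writer.join()
    return results
```

With a Hadoop client available, a call such as `backup_streams([hdfs_cat_command(p) for p in paths])` would stream each file in `paths` through its own pipe. The data never lands on local disk between HDFS and the reader, which is the property that lets backup software ingest HDFS files without a staging copy.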
Paper Citation
in Harvard Style
Faria H., Hagstrom R., Reis M., G. S. Costa B., Ribeiro E., Holanda M., Barreto P. and Araújo A. (2018). A Hadoop Open Source Backup Solution. In Proceedings of the 8th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-758-295-0, pages 651-657. DOI: 10.5220/0006809206510657
in Bibtex Style
@conference{closer18,
author={Heitor Faria and Rodrigo Hagstrom and Marco Reis and Breno G. S. Costa and Edward Ribeiro and Maristela Holanda and Priscila Solis Barreto and Aletéia P. F. Araújo},
title={A Hadoop Open Source Backup Solution},
booktitle={Proceedings of the 8th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2018},
pages={651-657},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006809206510657},
isbn={978-989-758-295-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 8th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - A Hadoop Open Source Backup Solution
SN - 978-989-758-295-0
AU - Faria H.
AU - Hagstrom R.
AU - Reis M.
AU - G. S. Costa B.
AU - Ribeiro E.
AU - Holanda M.
AU - Barreto P.
AU - Araújo A.
PY - 2018
SP - 651
EP - 657
DO - 10.5220/0006809206510657