The Parameter Optimization in Multiple Layered Deduplication System

Mikito Ogata, Norihisa Komoda



This paper proposes a multiple layered deduplication system for backup operations in IT environments. The proposed system eliminates duplicate data by applying a series of deduplication algorithms, one per layer, configured with chunk sizes in descending order. Our research defines models and formulas for the cumulative deduplication rate and processing time across the layers of the system. It then shows that the efficiency depends heavily on how chunk sizes are assigned to each layer, and examines how to achieve the optimal assignment. Finally, the efficiency of the proposed system is compared with that of a conventional single layer deduplication system to confirm the improvement.
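The layered scheme summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation or their model. It assumes fixed-size chunking, SHA-1 chunk fingerprints, one in-memory index per layer, and chunk sizes where each layer's size divides the previous one. The helper names (`run_layer`, `multilayer_backup`, `simulate`) are hypothetical, and the count of index lookups is only a rough stand-in for processing time. Data that is unique at one layer is split into smaller chunks and passed down to the next layer. Whatever is still unique after the last layer must be stored.

```python
from __future__ import annotations

import hashlib
import random


def run_layer(segments: list[bytes], chunk_size: int, index: set[bytes]) -> tuple[list[bytes], int]:
    """Deduplicate one layer with fixed-size chunks (hypothetical helper).

    Each incoming segment is cut into chunk_size pieces; a piece whose SHA-1
    digest is already in this layer's index is a duplicate and is dropped.
    New pieces are recorded in the index and returned so the next layer can
    split them further. Also returns the number of index lookups performed.
    """
    residual: list[bytes] = []
    lookups = 0
    for segment in segments:
        for offset in range(0, len(segment), chunk_size):
            chunk = segment[offset:offset + chunk_size]
            digest = hashlib.sha1(chunk).digest()
            lookups += 1
            if digest not in index:
                index.add(digest)
                residual.append(chunk)
    return residual, lookups


def multilayer_backup(data: bytes, chunk_sizes: list[int], indexes: list[set[bytes]]) -> tuple[int, int]:
    """Push one backup stream through the layers (chunk sizes in descending order).

    Returns (bytes still unique after the last layer, total index lookups).
    """
    segments = [data]
    total_lookups = 0
    for size, index in zip(chunk_sizes, indexes):
        segments, lookups = run_layer(segments, size, index)
        total_lookups += lookups
    return sum(len(s) for s in segments), total_lookups


def simulate(backups: list[bytes], chunk_sizes: list[int]) -> tuple[int, int]:
    """Run successive backups against per-layer indexes that persist between runs."""
    indexes: list[set[bytes]] = [set() for _ in chunk_sizes]
    stored = lookups = 0
    for data in backups:
        s, n = multilayer_backup(data, chunk_sizes, indexes)
        stored += s
        lookups += n
    return stored, lookups


if __name__ == "__main__":
    random.seed(0)
    base = bytes(random.getrandbits(8) for _ in range(64 * 1024))
    changed = bytearray(base)
    changed[10_000:10_010] = b"x" * 10  # second backup differs in ten bytes
    backups = [base, bytes(changed)]

    for sizes in ([4096, 1024, 256], [256]):
        stored, lookups = simulate(backups, sizes)
        rate = 1 - stored / sum(len(b) for b in backups)  # cumulative deduplication rate
        print(f"chunk sizes {sizes}: stored={stored} B, lookups={lookups}, dedup rate={rate:.3f}")
```

Consider two 64 KiB backups of random data that differ in ten bytes. Under these assumptions, the three-layer assignment [4096, 1024, 256] and a single 256-byte layer both store 65,792 bytes. The layered run, however, performs 360 index lookups instead of 512, because large chunks found to be duplicates are never split further. A different chunk-size assignment changes how many lookups each layer performs, which is one way to see why the assignment matters.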



Paper Citation

in Harvard Style

Ogata M. and Komoda N. (2013). The Parameter Optimization in Multiple Layered Deduplication System. In Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-8565-60-0, pages 143-150. DOI: 10.5220/0004423601430150

in Bibtex Style

author={Mikito Ogata and Norihisa Komoda},
title={The Parameter Optimization in Multiple Layered Deduplication System},
booktitle={Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 2: ICEIS},

in EndNote Style

JO - Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - The Parameter Optimization in Multiple Layered Deduplication System
SN - 978-989-8565-60-0
AU - Ogata M.
AU - Komoda N.
PY - 2013
SP - 143
EP - 150
DO - 10.5220/0004423601430150