Distributed Data Replication and Access Optimization for LHCb Storage System - A Position Paper

Mikhail Hushchyn, Philippe Charpentier, Andrey Ustyuzhanin

Abstract

This paper presents how machine learning algorithms and statistical methods can be applied to data management in hybrid data storage systems. Such systems typically use two different storage types. Keeping rarely used data on cheap, slow storage of type one and frequently used data on fast, expensive storage of type two helps to achieve an optimal performance/cost ratio for the system. We use classification algorithms to estimate the probability that the data will be used frequently in the future. Then, using risk analysis, we decide where the data should be stored. We show how to estimate the optimal number of replicas of the data using regression algorithms and a Hidden Markov Model. Based on the probability, the risks, and the optimal number of data replicas, our system finds the optimal data distribution in the hybrid data storage system. We present simulation results of our method for the LHCb hybrid data storage.
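
The risk-based placement step described in the abstract can be illustrated with a minimal sketch. It assumes that a classifier has already produced, for each dataset, a probability of frequent future access, and it picks the storage type with the lower expected loss. The loss values, the tier names `fast` and `slow`, and all function names are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical loss values (assumptions, not from the paper).
LOSS_POPULAR_ON_SLOW = 10.0   # penalty for serving frequently used data from slow storage
LOSS_UNPOPULAR_ON_FAST = 1.0  # penalty for occupying fast storage with rarely used data


@dataclass
class Dataset:
    name: str
    p_popular: float  # estimated probability of frequent future access


def expected_loss(p_popular: float, tier: str) -> float:
    """Expected loss of placing a dataset on the given tier."""
    if tier == "fast":
        # Loss occurs only if the data turns out to be rarely used.
        return (1.0 - p_popular) * LOSS_UNPOPULAR_ON_FAST
    if tier == "slow":
        # Loss occurs only if the data turns out to be frequently used.
        return p_popular * LOSS_POPULAR_ON_SLOW
    raise ValueError(f"unknown tier: {tier}")


def choose_tier(p_popular: float) -> str:
    """Return the tier with the smaller expected loss."""
    return min(("fast", "slow"), key=lambda tier: expected_loss(p_popular, tier))


def place(datasets: list[Dataset]) -> dict[str, str]:
    """Map each dataset name to its chosen tier."""
    return {d.name: choose_tier(d.p_popular) for d in datasets}


if __name__ == "__main__":
    sample = [Dataset("A", 0.50), Dataset("B", 0.05), Dataset("C", 0.00)]
    print(place(sample))
```

With these assumed losses, a dataset goes to fast storage once its estimated probability of frequent use exceeds roughly 1/11, because a miss on slow storage is weighted ten times more heavily than wasted fast storage. Different loss values shift that threshold.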



Paper Citation


in Harvard Style

Hushchyn M., Charpentier P. and Ustyuzhanin A. (2015). Distributed Data Replication and Access Optimization for LHCb Storage System - A Position Paper. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015) ISBN 978-989-758-158-8, pages 537-540. DOI: 10.5220/0005647105370540


in Bibtex Style

@conference{kdir15,
author={Mikhail Hushchyn and Philippe Charpentier and Andrey Ustyuzhanin},
title={Distributed Data Replication and Access Optimization for LHCb Storage System - A Position Paper},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={537-540},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005647105370540},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI - Distributed Data Replication and Access Optimization for LHCb Storage System - A Position Paper
SN - 978-989-758-158-8
AU - Hushchyn M.
AU - Charpentier P.
AU - Ustyuzhanin A.
PY - 2015
SP - 537
EP - 540
DO - 10.5220/0005647105370540