Offline Scheduling of Map and Reduce Tasks on Hadoop Systems

Aymen Jlassi, Patrick Martineau, Vincent Tkindt

Abstract

MapReduce is a model for managing massive quantities of data. It is based on the distributed, parallel execution of tasks over a cluster of machines. Hadoop is an implementation of the MapReduce model and is used to offer Big Data services on the cloud. In this paper, we present the scheduling problem on Hadoop systems. We focus on offline scheduling, formulate the problem as a mathematical model, and use a time-indexed formulation. We aim to take into account as many constraints of the MapReduce environment as possible. Solutions of the presented model can serve as a reference for online schedules on small and medium-size instances. Our work is useful in terms of problem definition: the constraints are based on observations and take into account resource consumption, data locality, heterogeneous machines and workflow management. This paper thus defines reference bounds for evaluating the online model.
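To make the time-indexed idea concrete, the following Python sketch enumerates, for a toy instance, every choice of machine m and start time t for each task (the binary variable x[j, m, t] = 1 of a time-indexed model) and keeps the feasible assignment with the smallest makespan. It models three of the constraint families named above: heterogeneous machines (machine-dependent processing times), data locality (a map task may only run on machines holding its input), and workflow (a reduce task starts only after every map task of its job has finished). The makespan objective, the per-machine slot capacity, and the names `Task`, `candidates`, `feasible` and `solve` are assumptions for illustration, not the paper's model. Exhaustive enumeration only works for tiny instances; a time-indexed formulation is normally handed to a mathematical programming solver.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Task:
    name: str
    job: str
    kind: str      # "map" or "reduce"
    proc: dict     # machine -> processing time (heterogeneous machines)
    allowed: set   # machines that may run the task (e.g. data locality)


def candidates(task, horizon):
    """All (machine, start) pairs for which x[task, machine, start] may be 1."""
    return [(m, t) for m in sorted(task.allowed)
            for t in range(horizon - task.proc[m] + 1)]


def feasible(tasks, assign, capacity, horizon):
    # Capacity (assumed): machine m runs at most capacity[m] tasks per time slot.
    for m, cap in capacity.items():
        for t in range(horizon):
            running = sum(1 for task, (mm, s) in zip(tasks, assign)
                          if mm == m and s <= t < s + task.proc[mm])
            if running > cap:
                return False
    # Workflow: a reduce task starts after all map tasks of its job complete.
    for task, (_, s) in zip(tasks, assign):
        if task.kind != "reduce":
            continue
        for other, (om, o_start) in zip(tasks, assign):
            if (other.kind == "map" and other.job == task.job
                    and o_start + other.proc[om] > s):
                return False
    return True


def solve(tasks, capacity, horizon):
    """Brute force over the time-indexed variables; keeps the best makespan."""
    best, best_cmax = None, None
    for assign in product(*(candidates(t, horizon) for t in tasks)):
        if not feasible(tasks, assign, capacity, horizon):
            continue
        cmax = max(s + t.proc[m] for t, (m, s) in zip(tasks, assign))
        if best_cmax is None or cmax < best_cmax:
            best, best_cmax = assign, cmax
    return best, best_cmax


# Hypothetical toy instance: one job, two map tasks, one reduce task.
tasks = [
    Task("map1", "J", "map", {"m1": 2, "m2": 3}, {"m1", "m2"}),
    Task("map2", "J", "map", {"m1": 2, "m2": 2}, {"m2"}),  # input only on m2
    Task("reduce1", "J", "reduce", {"m1": 1, "m2": 1}, {"m1", "m2"}),
]
assign, cmax = solve(tasks, {"m1": 1, "m2": 1}, horizon=6)
print(cmax)  # 3
```

In this instance map2 is pinned to m2 by data locality and the reduce task cannot start before time 2, so no schedule can finish before time 3.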



Paper Citation


in Harvard Style

Jlassi A., Martineau P. and Tkindt V. (2015). Offline Scheduling of Map and Reduce Tasks on Hadoop Systems. In Proceedings of the 5th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-758-104-5, pages 178-185. DOI: 10.5220/0005483601780185


in BibTeX Style

@conference{closer15,
author={Aymen Jlassi and Patrick Martineau and Vincent Tkindt},
title={Offline Scheduling of Map and Reduce Tasks on Hadoop Systems},
booktitle={Proceedings of the 5th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2015},
pages={178-185},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005483601780185},
isbn={978-989-758-104-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - Offline Scheduling of Map and Reduce Tasks on Hadoop Systems
SN - 978-989-758-104-5
AU - Jlassi A.
AU - Martineau P.
AU - Tkindt V.
PY - 2015
SP - 178
EP - 185
DO - 10.5220/0005483601780185