Authors:
Aymen Jlassi (1); Patrick Martineau (2) and Vincent Tkindt (2)
Affiliations:
(1) Cyres Group and University François-Rabelais of Tours, France; (2) University François-Rabelais of Tours, France
Keyword(s):
Big Data, MapReduce Model, Hadoop Scheduling Problem, Time Indexed Formulation.
Related Ontology Subjects/Areas/Topics:
Cloud Computing; Cloud Computing Architecture; Cloud Computing Enabling Technology; Fundamentals; Monitoring of Services, Quality of Service, Service Level Agreements; Performance Development and Management
Abstract:
MapReduce is a model for managing massive quantities of data. It is based on the distributed and parallel
execution of tasks over a cluster of machines. Hadoop is an implementation of the MapReduce model; it is
used to offer Big Data services on the cloud. In this paper, we present the scheduling problem on Hadoop
systems. We focus on offline scheduling, formulate the problem as a mathematical model and use a time-indexed
formulation. We aim to take into account as many constraints of the MapReduce environment as possible.
Solutions of the presented model can serve as a reference for online schedules on small and
medium-sized instances. Our work is useful in terms of problem definition: the constraints are based on
observations and take into account resource consumption, data locality, heterogeneous machines and
workflow management. This paper thus defines reference bounds against which the online model can be evaluated.
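A time-indexed formulation discretizes time into periods and uses a binary variable x[j][t] that equals 1 when task j starts at period t. The sketch below is a generic textbook illustration of that idea, not the authors' model: it assumes identical machines, integer processing times and a fixed horizon, and it ignores data locality, resource consumption, machine heterogeneity and workflow precedence. It solves a tiny instance by brute-force enumeration instead of a MIP solver. The function name `solve_time_indexed` and the example instance are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple


def solve_time_indexed(p: List[int], m: int, horizon: int) -> Optional[Tuple[int, List[int]]]:
    """Brute-force a tiny time-indexed scheduling model (illustration only).

    Decision variables (implicit): x[j][t] = 1 iff task j starts at period t.
    Constraints enforced:
      * each task starts exactly once and finishes within the horizon;
      * at every period t, at most m tasks run (m identical machines).
    Objective: minimize the makespan. Returns (makespan, start_times),
    or None when no feasible schedule fits in the horizon.
    """
    if horizon <= 0 or any(pj > horizon for pj in p):
        return None
    best = None
    # Choosing exactly one start period per task plays the role of
    # the assignment constraint sum_t x[j][t] = 1 for every task j.
    start_ranges = [range(horizon - pj + 1) for pj in p]
    for starts in product(*start_ranges):
        # Capacity per period: sum_j sum_{s = t - p_j + 1}^{t} x[j][s] <= m.
        load = [0] * horizon
        for j, s in enumerate(starts):
            for t in range(s, s + p[j]):
                load[t] += 1
        if max(load) > m:
            continue
        makespan = max((s + p[j] for j, s in enumerate(starts)), default=0)
        # Keep the first schedule reaching a strictly smaller makespan.
        if best is None or makespan < best[0]:
            best = (makespan, list(starts))
    return best


if __name__ == "__main__":
    # Hypothetical instance: three tasks of length 3, 2 and 2 on two machines.
    print(solve_time_indexed([3, 2, 2], m=2, horizon=5))  # prints (4, [0, 0, 2])
```

Enumeration grows exponentially with the number of tasks, so this only makes the structure of the variables and constraints concrete; realistic instances would be handed to an integer programming solver, and the horizon length directly drives the number of x[j][t] variables.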