its pseudo-polynomial number of variables. It has
already been shown to be an efficient formulation
compared with other integer programming
formulations. We use the commercial solver CPLEX
to find optimal solutions for small and medium-sized
instances. This gives the community a reference bound
against which scheduling algorithms can be evaluated
on instances of this size. It turns out that
the offline problem is interesting in itself and can be
used to design good online strategies. Solutions of
this model serve as a reference for online
schedules on small instances, to validate first
results. Future work will deal with the online aspect
of the scheduling problem; we plan to
propose a heuristic solution and use this work in its
evaluation.
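To illustrate how an offline optimum can serve as a reference for an online schedule, the following minimal sketch compares the two on a toy instance. It is not the model of this paper: it assumes a single machine, non-preemptive jobs with release dates, and total completion time as the objective, and it replaces the CPLEX solve with exhaustive enumeration, which is only viable for a handful of jobs. The function names and the FIFO baseline are hypothetical choices made for illustration.

```python
from itertools import permutations


def total_completion_time(order, release, proc):
    """Sum of completion times when jobs run non-preemptively in `order`
    on a single machine (assumption: one machine, no precedence)."""
    t = 0
    total = 0
    for j in order:
        # A job cannot start before its release date.
        t = max(t, release[j]) + proc[j]
        total += t
    return total


def offline_optimum(release, proc):
    """Exhaustive search over all job orders. Stands in for the exact
    solver here; only usable on very small instances."""
    jobs = range(len(proc))
    return min(total_completion_time(p, release, proc)
               for p in permutations(jobs))


def online_fifo(release, proc):
    """Hypothetical online baseline: serve jobs in order of release date."""
    order = sorted(range(len(proc)), key=lambda j: release[j])
    return total_completion_time(order, release, proc)


def gap(release, proc):
    """Ratio of the online objective to the offline optimum (>= 1)."""
    return online_fifo(release, proc) / offline_optimum(release, proc)


if __name__ == "__main__":
    release = [0, 1, 2]
    proc = [5, 1, 1]
    print("offline optimum:", offline_optimum(release, proc))
    print("online FIFO:    ", online_fifo(release, proc))
    print("ratio:          ", gap(release, proc))
```

On this instance the offline optimum keeps the machine idle until the short jobs arrive, which an online policy cannot anticipate; the ratio between the two objectives is the kind of measurement the reference solutions make possible.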
The online solution will first consider total completion
time; in a second step, we will take resource
consumption (energy) into account in a multi-criteria
scheduling setting.
The final solution will be implemented on a Hadoop
simulation system and evaluated at large scale
against the default Hadoop scheduler.
ACKNOWLEDGMENTS
This work was sponsored in part by the CYRES
GROUP in France and the French National Research
Agency under the grant CIFRE n°2012/1403.