Authors:
Matthew Forshaw (1); A. Stephen McGough (2) and Nigel Thomas (1)
Affiliations:
(1) Newcastle University, United Kingdom; (2) Durham University, United Kingdom
Keyword(s):
Energy Efficiency, Checkpointing, Migration, Fault Tolerance, Desktop Grids.
Related Ontology Subjects/Areas/Topics:
Algorithms for Reduced Power, Energy and Heat; Energy and Economy; Energy-Aware Systems and Technologies; Green Communications Architectures and Frameworks; Green Computing Models, Methodologies and Paradigms; Green Data Centers; Sustainable Computing and Communications; Virtualization for Reducing Power Consumption
Abstract:
Checkpointing is a fault-tolerance mechanism commonly used in High Throughput Computing (HTC) environments
to allow the execution of long-running computational tasks on compute resources subject to hardware
and software failures and interruptions from resource owners. With increasing scrutiny of the energy consumption
of IT infrastructures, it is important to understand the impact of checkpointing on the energy consumption
of HTC environments. In this paper we demonstrate through trace-driven simulation on real-world datasets
that existing checkpointing strategies are inadequate at maintaining an acceptable level of energy consumption
whilst reducing the makespan of tasks. Furthermore, we identify factors important in deciding whether to
employ checkpointing within an HTC environment, and propose novel strategies to curtail the energy consumption
of checkpointing approaches.
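The energy/makespan tension described above can be illustrated with a deliberately simplified model. The sketch below is not the authors' trace-driven simulator. The function `simulate` and its parameters (`task_len`, `interval`, `ckpt_cost`, `power`, `evictions`) are hypothetical, and the model assumes constant power while a machine is busy, zero idle power, no queueing delay, and free migration of checkpoints between machines. Each entry in `evictions` is how long an attempt runs before the resource owner interrupts it. Work done since the last completed checkpoint is lost on interruption.

```python
import math
from typing import Sequence


def simulate(task_len: float, interval: float, ckpt_cost: float,
             power: float, evictions: Sequence[float],
             checkpointing: bool = True) -> tuple[float, float]:
    """Return (busy_seconds, energy_joules) for one task (toy model).

    Hypothetical assumptions: constant power while busy, zero idle power,
    no queueing, and checkpoints migrate between machines at no cost.
    After the listed evictions, the final attempt runs uninterrupted.
    """
    saved = 0.0  # work preserved by the last completed checkpoint
    busy = 0.0   # machine-seconds consumed across all attempts
    for d in [*evictions, math.inf]:
        t, progress = 0.0, saved  # each attempt restarts from the last checkpoint
        while True:
            remaining = task_len - progress
            seg = min(interval, remaining) if checkpointing else remaining
            if t + seg > d:           # evicted mid-computation: segment lost
                busy += d
                break
            t += seg
            progress += seg
            if progress >= task_len:  # task completed on this attempt
                busy += t
                return busy, power * busy
            if t + ckpt_cost > d:     # evicted while writing the checkpoint
                busy += d
                break
            t += ckpt_cost            # checkpoint written successfully
            saved = progress
    raise AssertionError("unreachable: the final attempt is never evicted")


# One-hour task, checkpoint every 10 min costing 60 s, 100 W machine,
# evicted once after 2000 s of wall-clock time.
print(simulate(3600, 600, 60, 100, [2000]))                       # (3920.0, 392000.0)
print(simulate(3600, 600, 60, 100, [2000], checkpointing=False))  # (5600.0, 560000.0)
print(simulate(3600, 600, 60, 100, []))                           # (3900.0, 390000.0)
```

Under these assumptions, checkpointing saves energy when an interruption occurs (3920 s versus 5600 s of busy time). When no interruption occurs, it adds 300 s of pure overhead. This is the kind of workload- and interruption-dependent trade-off that determines whether checkpointing is worthwhile.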