cluster systems, which run batch jobs. That working
pattern matches many scientific computing clusters
that use queue managers such as Portable Batch
System (PBS), Load Sharing Facility (LSF), or
others, using ideas similar to those shown in (Da-Costa,
209).
These techniques are also applicable to
clusters shared through Grid middleware (Globus
Toolkit (Globus Alliance, 2009), gLite (EGEE,
2009)), since job submission is performed using
the same queue managers.
The next section describes the particular features of
the working modes of clusters in the cases above.
Then some approaches to reduce power
consumption are proposed. Finally, a case study
analyzes the impact of the proposed measures on a
real cluster.
2 CLUSTER BEHAVIOUR
Besides the passive elements (screws, cables, etc.),
a cluster is composed of the following
elements: working nodes, administration nodes,
a front-end node, storage systems, internal network
switches, external switches, firewalls, etc. (Lucke,
2004).
The power needed to feed this infrastructure is
considerable, so it is advisable to have a power
saving plan that minimizes consumption. This plan
must take into account the way the cluster is used,
and should try to limit the impact on the throughput
of the applications and of the system as a whole.
A common way to use a cluster consists of having
a front-end that coordinates the
execution of users’ jobs through a Local Resource
Manager System (LRMS), such as a queue system
(Torque, OpenPBS, LSF, Sun Grid Engine, etc.). In
this case, users submit jobs to a queue, and each
job is executed once enough nodes are available to
satisfy its requirements.
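The queue behaviour described above can be illustrated with a minimal sketch. It assumes a strict first-in, first-out policy with no priorities or backfilling, which is simpler than what Torque, LSF or similar LRMSs actually implement; the names `Job`, `nodes_required` and `dispatch` are hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str            # hypothetical job identifier
    nodes_required: int  # number of nodes the job asks for


def dispatch(queue: deque, free_nodes: int) -> list:
    """Start queued jobs in FIFO order while the head job fits.

    Returns the names of the jobs that were started. A job that does
    not fit blocks the jobs behind it (no backfilling is assumed).
    """
    started = []
    while queue and queue[0].nodes_required <= free_nodes:
        job = queue.popleft()
        free_nodes -= job.nodes_required
        started.append(job.name)
    return started
```

With five free nodes and queued jobs requiring 2, 4 and 1 nodes, only the first job starts under this policy, because the second job must wait for more nodes and holds back the third.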
In recent years, a common way of using a
cluster has been to share it through Grid Computing
middleware such as Globus Toolkit. Running jobs in
the cluster through Grid techniques is
partly integrated with the LRMS, so using the
cluster still consists of submitting jobs to the local
queue. The main difference in this case is that an
extra information reporter is needed, which must
provide the number of free online nodes (among
other characteristics of the cluster).
A more recent way of managing a cluster consists
of using it as a virtualization platform. Some
examples of Virtual Appliance management
platforms are VMware, Eucalyptus and OpenNebula.
With this middleware, virtual images are
started as running virtual machines on
the internal nodes of the cluster.
A power saving plan must take into account not only
the way the cluster is used, but also hardware
issues. The most common way to reduce power
consumption is to power off the machines, but a
problem can arise when trying to power the nodes on
again. There are mainly two
approaches for this: using Power Distribution Units
(PDU) to manage the power supply to the nodes, or
using Wake-on-LAN (WOL).
Each approach requires the nodes to be
configured in a specific way: using a PDU requires
that the node powers on when power is restored, and
WOL requires that the network card is correctly
configured by the operating system.
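As an illustration of the WOL alternative, the following minimal sketch builds and broadcasts a Wake-on-LAN "magic packet": six 0xFF bytes followed by sixteen repetitions of the target MAC address. The function names, the broadcast address and UDP port 9 are assumptions made for the example (ports 7 and 9 are both commonly used); the node's firmware and network card must already have WOL enabled for the packet to have any effect.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte magic packet for a MAC such as '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"invalid MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def wake_node(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

A front-end could call `wake_node("00:11:22:33:44:55")` for each node it needs to bring back online; the MAC address shown is a placeholder.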
With either alternative, there is always
a residual consumption associated with the PDU
control system or with the network card’s WOL
monitoring, so it is impossible to achieve zero
consumption.
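The effect of this residual consumption on the achievable saving can be sketched with simple arithmetic. The function below is only an illustrative model: it assumes constant power draw in each state and ignores the energy spent while shutting down and booting. The figures in the usage note are hypothetical, not measurements.

```python
def energy_saved_wh(idle_watts: float, residual_watts: float, hours_off: float) -> float:
    """Energy saved (in Wh) by powering off a node instead of leaving it idle.

    The saving is the difference between the idle draw and the residual
    draw (PDU controller or WOL-listening network card) over the time off.
    """
    return (idle_watts - residual_watts) * hours_off
```

For example, a node assumed to draw 100 W when idle and 5 W when powered off saves `energy_saved_wh(100, 5, 10)`, that is 950 Wh, over ten hours, rather than the 1000 Wh a zero-consumption model would predict.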
3 TECHNIQUES
Different techniques can be used to save energy in
a computer cluster environment. These techniques
are based on powering off idle nodes or,
alternatively, hibernating them or putting them in
standby mode.
The last two approaches may be problematic in
Linux environments since (1) many devices do not
support standby mode and (2) in many cases
(depending mainly on the amount of memory used in
the machine) hibernating the node and resuming it
takes far longer than
switching the computer off and on.
On the other hand, the basic criterion for
applying any of these techniques is that the node
must be idle. Furthermore, switching on
a working node in a cluster is quite fast, since
only a minimal set of applications needs to be
loaded during the boot process. Powering
off an idle node and powering it on again when it is
needed is therefore an appropriate solution.
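The power-off/power-on approach can be sketched as a simple planning step that a front-end might run periodically. This is a minimal illustration under stated assumptions, not the policy evaluated in this paper: the `Node` fields, the `plan_power_actions` function and the 600-second idle threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    powered_on: bool
    busy: bool
    idle_seconds: float = 0.0  # time since the node last ran a job


def plan_power_actions(nodes, nodes_needed, idle_threshold=600.0):
    """Return (to_power_on, to_power_off) as lists of node names.

    nodes_needed is the number of free nodes that queued jobs require.
    Powered-off nodes are switched on to cover any shortfall; idle nodes
    beyond the current demand are switched off once they have been idle
    for at least idle_threshold seconds.
    """
    free_on = [n for n in nodes if n.powered_on and not n.busy]
    powered_off = [n for n in nodes if not n.powered_on]

    shortfall = max(0, nodes_needed - len(free_on))
    to_power_on = [n.name for n in powered_off[:shortfall]]

    surplus = free_on[nodes_needed:]  # empty when demand exceeds free nodes
    to_power_off = [n.name for n in surplus if n.idle_seconds >= idle_threshold]
    return to_power_on, to_power_off
```

The threshold prevents a node from being switched off and on repeatedly when jobs arrive in quick succession; its value would need tuning against the boot time of the actual nodes.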
In any case, the aim of these techniques is
to achieve the maximum power saving while
maintaining response times for the users and
preferring the measures that are easiest to apply.
In order to achieve those objectives, the
following aspects must be considered:
To select the power-on/off block size: If a
group of nodes is idle, they can be switched
INNOV 2010 - International Multi-Conference on Innovative Developments in ICT