Multi-layer Method for Data Center Cooling Control and System
Integration
Winston Garcia-Gabin and Xiaojing Zhang
ABB, Corporate Research, SE72178, Västerås, Sweden
Keywords:
Data Center, Energy Efficiency, Cooling Control, System Integration.
Abstract:
This paper describes an integrated hierarchical management structure for data center server room ventilation and
IT equipment cooling based on a multi-layer approach. The approach contains two hierarchical layers. The lower
layer is responsible for delivering the right cooling power to each server-rack, avoiding overcooling and wasted
energy. The upper layer is responsible for the optimization, supervision and coordination of
all subsystems in the lower layer to achieve holistic control. The aim of the approach is to make data centers more energy
efficient; here we focus on proposing a general approach to improve the efficiency of their thermal cooling
systems.
1 INTRODUCTION
Due to the rapid growth of large data centers worldwide,
data centers have become energy-intensive facilities,
accounting for over 1% of the world's electricity usage
(Koomey, 2011). Energy efficiency is therefore becoming even
more important for these data centers. IT equipment
and cooling infrastructure are the two major
power consumers. Investigations have shown that cooling
the IT equipment accounts for
between 30% and 55% of a data center's total operating
energy consumption (Song et al., 2015). Several
approaches to data center cooling and cooling control have been
investigated. The data center Power Usage Effectiveness
(PUE) metric is well defined (Patterson, 2012), and PUE is
one of the most widely used data center metrics. The distribution
of airflow and the resulting cooling in a data center
was studied using computational fluid dynamics
(CFD) to identify the factors affecting the airflow
(Patankar, 2010). Cooling control for data center operation
with higher ambient temperatures was studied
by Ahuja et al. (Ahuja et al., 2011). A platform-assisted
thermal management approach was applied,
using new sensors that provide server airflow and server
outlet temperature to improve control of the data center's
cooling solution. An online control algorithm of
data center power supply under uncertain demand and
renewable energy was developed by applying two-
stage Lyapunov optimization techniques to make on-
line decisions on fully utilizing renewable energy and
two-timescale power purchasing and uninterruptible
power source (UPS) charging/discharging in a com-
plementary manner to minimize operation cost (Deng
et al., 2013). A comprehensive model of
data center cooling power consumption was developed by
Zhang et al. (Zhang et al., 2016). In this model, the
power consumption of servers, racks, CRAH units, chiller,
cooling tower, UPS and PDU was modeled to cover
data center design and operation scenarios. The model
was intended to be used for data center cooling
simulation and control. Optimal fan speed control
for data center servers was investigated by Wang et
al. (Wang et al., 2009). A multi-input multi-output
fan controller that utilizes thermal models was developed
from first principles to manipulate the operation
of fans. The controller tunes the speeds of individual
fans proactively based on predictions of the server
temperatures. Controlling the temperature rise during
a power failure is a crucial issue for data center
availability. In order to analyze the temperature rise
characteristics, a real-time transient thermal model to
demonstrate the heating of a data center following the
loss of utility power was developed by Lin et al. (Lin
et al., 2014). Strategies of placing critical cooling
equipment on backup power, maintaining adequate
reserve cooling capacity, and employing thermal storage
are presented, and the model is used to check whether
they can handle power outages in a predictable manner.
Control strategies were recommended
to achieve the desired temperature control during the
power outages. A three-level data center control
architecture with server level, zone level and data center
level was proposed by Parolini (Parolini, 2012).
Garcia-Gabin, W. and Zhang, X.
Multi-layer Method for Data Center Cooling Control and System Integration.
DOI: 10.5220/0006424405620567
In Proceedings of the 14th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2017) - Volume 2, pages 562-567
Copyright © 2017 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
At the data center level, a single controller directs
the lower-level controllers and defines the set points for the
computer room air conditioning (CRAC) or computer
room air handler (CRAH) units, that is,
the values of the reference temperature vector at every
time instant. At the zone level, multiple controllers operate
and manage the controllers and systems related to their
zones. A framework integrating online learning and
optimal decision making was proposed for real-time control
of the airflow and temperature distribution (Xiong
and Zhang, 2016). Through online learning, the model is
updated at run time according to newly arrived samples.
At the same time, control actions are decided
based on the current process model, which is used to find the
best values of the control variables to optimize the
control performance. Data center thermal management
with flexible humidity control was studied by Berezovskaya
et al. (Berezovskaya et al., 2016). A combined
cooling and humidity control strategy showed a
significant reduction in energy consumption. The cyber-physical
design of data center cooling systems was
investigated by Mousavi et al. (Mousavi et al., 2015).
A distributed adaptive automation architecture was designed
to improve the energy efficiency, flexibility,
decision-making ability and control of the cooling
systems. In a conventional data center with a hot
aisle/cold aisle layout, cold air generated by the cooling system
is supplied through a plenum under the floor and
perforated air flow panels. The cold air flows up and then
horizontally, entering each server rack from the front side and
leaving from the back side. The classical way to control it is
to consider all server racks as a single cooling load,
which the cooling system controls as a whole.
This paper proposes an integrated approach, where
individual control for each server is performed in
coordination with a global control of the overall cooling
equipment of the data center. The paper has the
following structure: This section has presented the
introduction and relevant previous works. Section 2
describes the multi-layer structure for decisions and
control. Section 3 presents functions of the proposed
multi-layer methodology for data center cooling con-
trol. Finally, conclusions are summarized in section
4.
2 MULTI-LAYER STRUCTURE
FOR DECISIONS AND
CONTROL
To obtain holistic control of the data center
power, IT and cooling systems, a hierarchical
structure for decisions and control is proposed, in which the
information about the process has a different accuracy
and complexity in each layer, and the amount of exchanged
information and its update frequency change according
to the layer. At the upper layer, the control strategy
manages a large amount of complex information.
The subsystems in the upper layer have different update
frequencies. For example, the optimization and CFD
subsystems may require hours, and equipment failures
may be detected only monthly or bimonthly, whereas other
subsystems are able to exchange information faster.
At the lowest level, on the other hand, the control
strategy is based on the principle of increasing accuracy
and decreasing intelligence and complexity, while
its update frequency increases to the range required by the
equipment, for example, seconds for cooling units and
milliseconds for power devices. The control is
implemented by controlling
each server-rack with local control devices in the
lower control layer and coordinating the data center in the
upper control layer. This model establishes the different
functional layers used for controlling a data center.
Figure 1 shows the control scheme.
Figure 1: Multi-layer method for server-rack cooling control.
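The difference in update frequency between the two layers can be illustrated with a minimal Python sketch. It is not the implementation of the proposed method: the class names, the proportional airflow rule, the one-second lower-layer period and the one-hour upper-layer period are all assumptions chosen only to show a fast local loop whose parameters are rewritten by a slow supervisory loop.

```python
from dataclasses import dataclass


@dataclass
class RackController:
    """Lower layer: fast local loop for one server rack (hypothetical)."""
    rack_id: str
    temp_setpoint_c: float = 25.0  # parameter handed down by the upper layer
    updates: int = 0

    def step(self, inlet_temp_c: float) -> float:
        """Return a simple proportional airflow demand clipped to 0..1."""
        self.updates += 1
        error = inlet_temp_c - self.temp_setpoint_c
        return min(1.0, max(0.0, 0.5 + 0.1 * error))


@dataclass
class UpperLayer:
    """Upper layer: slow supervision that only rewrites parameters."""
    period_s: int = 3600  # assumed: hourly optimization or CFD refresh
    updates: int = 0

    def step(self, racks: list[RackController], new_setpoint_c: float) -> None:
        self.updates += 1
        for rack in racks:
            rack.temp_setpoint_c = new_setpoint_c  # descending path


def simulate(duration_s: int, lower_period_s: int = 1) -> tuple[int, int]:
    """Run both layers on a shared clock and count their updates."""
    racks = [RackController("rack-1"), RackController("rack-2")]
    upper = UpperLayer()
    for t in range(duration_s):
        if t % upper.period_s == 0:
            upper.step(racks, new_setpoint_c=24.0)
        if t % lower_period_s == 0:
            for rack in racks:
                rack.step(inlet_temp_c=26.0)
    return upper.updates, racks[0].updates
```

In this sketch, two simulated hours produce two upper-layer updates and 7200 lower-layer updates per rack, which mirrors the separation of time scales described above.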
The method proposes system integration, since
this ensures the control of the processes in their natural
form. Coordination in the upper layer is then
achieved through the reception of information about
the states of the processes (ascending path to the upper
layer) and the transmission of commands to the
processes (descending path to the lower layer). The
exchange of information takes place between the different
actors in the upper layer and the local controllers
in the lower layer. The existence of communication
devices and computer networks that ensure communication
among the control equipment is a fundamental
factor in achieving integration. The lower layer is
responsible for the local continuous control of each
server rack. Each rack is controlled by a local control
scheme, which acts as the direct controller of
the controlled variables. In the upper layer, the tasks of
optimization, supervision and coordination of the server
racks are carried out to achieve coherent operation across the data
center. The operation of a server rack
and its direct controller is described in terms of operation
regions, in which the parameters and rates stay
constant for an undefined time until
the operation state changes. The upper layer
determines the operation region for the lower layer and
generates the parameters for the direct control level.
Signals coming into this layer allow determination of
the state of the process units as well as the production
objectives that have been fixed in the superior layer.
At this level, local supervisors for each subsystem are
defined. They determine the operation mode of each
unit and the changes to the parameters or local rules
in the lower layer. Higher in the hierarchy, a global
coordinator is responsible for setting the directives for each
of the local cooling controllers as a function of
the joint operation mode of the entire data center.
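The notion of operation regions can be sketched as follows. This is an illustrative Python example under stated assumptions: the region names, the IT load thresholds and the parameter values are hypothetical and do not come from the paper. The sketch only shows a local supervisor that keeps the parameters constant while the region is unchanged and swaps the parameter set when the region changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionParams:
    """Parameters held constant while a rack stays in one operation region."""
    supply_temp_c: float
    fan_speed_pct: float


# Hypothetical regions; thresholds and values are illustrative only.
REGIONS = {
    "idle": RegionParams(supply_temp_c=27.0, fan_speed_pct=30.0),
    "normal": RegionParams(supply_temp_c=24.0, fan_speed_pct=60.0),
    "peak": RegionParams(supply_temp_c=20.0, fan_speed_pct=95.0),
}


def select_region(it_load_fraction: float) -> str:
    """Map the IT load (0..1) to an operation region name."""
    if it_load_fraction < 0.2:
        return "idle"
    if it_load_fraction < 0.8:
        return "normal"
    return "peak"


class LocalSupervisor:
    """Decides the operation region and the parameters for the direct controller."""

    def __init__(self) -> None:
        self.region = "idle"
        self.params = REGIONS[self.region]

    def update(self, it_load_fraction: float) -> bool:
        """Return True only when the operation region changed."""
        new_region = select_region(it_load_fraction)
        if new_region == self.region:
            return False
        self.region = new_region
        self.params = REGIONS[new_region]
        return True
```

A direct controller would read `supervisor.params` on every cycle, while the supervisor itself needs to act only when `update` reports a region change.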
2.1 Layer Functionalities
This approach provides a method for data center cooling
control. The method includes at least two cooling
control layers, a lower layer and an upper layer.
In the lower layer (see Figure 2), the electrical power
demand of each server or IT load is used as a feedforward
signal for estimating the cooling power required by
each rack. A basic temperature sensor network and air
flow sensors are required for measurements at each server rack.
Humidity sensors do not need to be installed at each
server rack; one or two in each server row are sufficient.
Information about the temperatures and air flow of each
server rack is used as the feedback signal to deliver the
right dose of cooling power to the servers. In the lower
layer, a network of humidity, temperature and airflow
sensors is used for controlling the humidity, temperature
and air flow distribution in the room. The server rack
temperature, the outlet or inlet temperature, and the
air flow can be used as feedback signals.
Figure 2: Lower layer.
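The combination of a power-based feedforward signal and a temperature feedback signal can be sketched in Python as below. This is a minimal sketch under assumptions, not the controller of the paper: the feedforward term uses the standard sensible-heat balance P = rho * cp * V * dT for air, the feedback term is a plain PI correction on the rack outlet temperature without anti-windup, and the gains, the assumed 10 K air temperature rise and all names are hypothetical.

```python
AIR_DENSITY_KG_M3 = 1.2      # approximate density of air near room conditions
AIR_CP_J_PER_KG_K = 1005.0   # approximate specific heat capacity of air


def feedforward_airflow(it_power_w: float, delta_t_k: float) -> float:
    """Airflow in m^3/s that removes it_power_w at an air temperature rise delta_t_k."""
    return it_power_w / (AIR_DENSITY_KG_M3 * AIR_CP_J_PER_KG_K * delta_t_k)


class RackCoolingController:
    """Feedforward from IT power plus PI feedback on outlet temperature."""

    def __init__(self, outlet_setpoint_c: float, delta_t_k: float = 10.0,
                 kp: float = 0.02, ki: float = 0.001) -> None:
        self.outlet_setpoint_c = outlet_setpoint_c
        self.delta_t_k = delta_t_k  # assumed design temperature rise across the rack
        self.kp = kp                # illustrative gains, not tuned values
        self.ki = ki
        self.integral = 0.0

    def step(self, it_power_w: float, outlet_temp_c: float, dt_s: float = 1.0) -> float:
        """Return the airflow demand in m^3/s for this control cycle."""
        error = outlet_temp_c - self.outlet_setpoint_c  # positive means too hot
        self.integral += error * dt_s
        trim = self.kp * error + self.ki * self.integral
        airflow = feedforward_airflow(it_power_w, self.delta_t_k) + trim
        return max(0.0, airflow)
```

The feedforward term reacts immediately to a change in IT power, while the feedback term corrects the residual error that remains when the measured temperature deviates from its reference.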
The upper cooling control layer is responsible for
supervising and coordinating the local controllers in the
lower layer and for optimizing the total cooling power
and the distribution of the cooling power required by
each server rack. Figure 3 shows the diagram of a
coordinator. In this case, it receives information about
the cooling power of each server rack, the sensor network
and the cooling equipment status. Using this information,
the coordinator manages all data and sends
parameters and configurations to the lower layer for each
individual cooling unit and server-rack controller.
A similar scheme can be used to coordinate the power
devices, e.g. uninterruptible power source (UPS),
power distribution unit (PDU) and batteries. Coordinators
manage the communication with the lower layer
at the frequency required by the lower layer units, but
they can also get updated information from other subsystems
of the upper layer at a low frequency, in the range
of hours, e.g. from the CFD subsystem or the optimization
subsystem.
Figure 3: Coordinator on the upper layer.
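One possible coordinator task is sketched below in Python. The paper does not specify an allocation rule, so the proportional scaling used here is an assumption made only for illustration, as are the function name and the dictionary-based data layout. The sketch combines the per-rack cooling demand with the cooling equipment status and returns the cooling power assigned to each rack.

```python
def allocate_cooling(demand_w: dict[str, float],
                     unit_capacity_w: dict[str, float],
                     unit_online: dict[str, bool]) -> dict[str, float]:
    """Assign cooling power per rack from rack demand and cooling unit status.

    If the online units can cover the total demand, each rack receives its
    demand. Otherwise every rack is scaled by the same factor (assumed rule).
    """
    available = sum(cap for unit, cap in unit_capacity_w.items()
                    if unit_online.get(unit, False))
    total = sum(demand_w.values())
    if total == 0 or total <= available:
        return dict(demand_w)
    scale = available / total
    return {rack: demand * scale for rack, demand in demand_w.items()}
```

For example, with two racks demanding 4 kW and 6 kW and two 8 kW cooling units, the demand is met in full; if one unit goes offline, both racks are scaled by 0.8 in this sketch.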
The upper layer uses tools that manage
a large amount of complex data with a low update
frequency to achieve the optimal distribution of cooling
power, that is, of air flow, temperature and humidity
distributions and patterns, and to generate parameters
for the lower layer cooling control. The upper
layer can use or combine many different tools,
including, but not limited to, CFD simulations, machine
learning and artificial intelligence, which
are all less time critical than the operations
in the lower layer. These tools optimize the global
cooling power demand of the data center, taking
into account complex data, e.g. actual operating conditions
and future expectations of IT load, electricity
price, power availability, and data center operating
conditions related to electrical and cooling power
consumption. The upper layer provides the lower
layer with operational information for the cooling units,
for example, CRAC/CRAH fan speed and air temperature
and humidity references for the equipment. A general
diagram of the upper layer is illustrated in Figure 4. It
shows how subsystems exchange information in the
upper layer through a common communication infrastructure.
For example, if the analytic subsystem predicts
an imminent failure of a cooling unit, this
information can be sent to the coordinator of the cooling
units. This coordinator turns off the corresponding
cooling unit and sends a new set of parameters
to the other server-rack cooling controllers and cooling
equipment. Coordinators exchange information
with the lower layer at the frequency required by the
lower layer, but they rely on information obtained
from other subsystems, which may exchange information
at a low frequency: hours in the case of CFD simulations
and electricity price changes, or monthly when
a failure is detected.
Figure 4: General diagram of the upper layer.
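The failure example above can be sketched with a small publish/subscribe model in Python. Everything in it is hypothetical: the `MessageBus` class stands in for the common communication infrastructure, the topic name `failure_predicted` is invented for the example, and spreading the lost fan speed evenly over the healthy units is an assumed rule, not one given in the paper.

```python
from collections import defaultdict
from typing import Callable


class MessageBus:
    """Stand-in for the common communication infrastructure (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(message)


class CoolingCoordinator:
    """Turns off a unit with a predicted failure and re-parameterizes the rest."""

    def __init__(self, bus: MessageBus, units: list[str]) -> None:
        self.fan_speed_pct = {unit: 50.0 for unit in units}
        bus.subscribe("failure_predicted", self.on_failure_predicted)

    def on_failure_predicted(self, message: dict) -> None:
        failing = message["unit"]
        lost = self.fan_speed_pct.get(failing, 0.0)
        if lost == 0.0:
            return  # unknown unit or already switched off
        self.fan_speed_pct[failing] = 0.0
        healthy = [u for u, speed in self.fan_speed_pct.items() if speed > 0.0]
        for unit in healthy:
            # Assumed rule: share the lost fan speed evenly, capped at 100 %.
            self.fan_speed_pct[unit] = min(
                100.0, self.fan_speed_pct[unit] + lost / len(healthy))
```

An analytic subsystem would call `bus.publish("failure_predicted", {"unit": "crah-2"})`, after which the coordinator holds the new fan speed references to send down to the lower layer.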
The upper layer also provides recommendations
on the operation of the IT load, for example,
which servers should be turned off according to their
probability of being used in the near future. Placing
and moving IT loads into proper locations can make
the cooling infrastructure operate more efficiently and
hence can result in a substantial reduction in cooling
power. This methodology aims at a holistic management
of the data center to avoid wasting energy,
for example through overcooling.
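A recommendation of this kind could look like the following Python sketch. The probability threshold, the minimum number of servers kept on and the function name are assumptions for illustration; the paper does not state how the probability of use is estimated or which decision rule is applied.

```python
def recommend_shutdown(use_probability: dict[str, float],
                       threshold: float = 0.1,
                       min_active: int = 1) -> list[str]:
    """Return servers recommended for shutdown, least likely to be used first.

    A server is a candidate when its estimated probability of being used in
    the near future is below the threshold. At least min_active servers are
    always left running.
    """
    candidates = sorted(
        (server for server, p in use_probability.items() if p < threshold),
        key=lambda server: use_probability[server])
    max_off = max(0, len(use_probability) - min_active)
    return candidates[:max_off]
```

The result is only a recommendation list; in the proposed structure the decision would still have to respect the customer service level agreements handled in the upper layer.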
3 FUNCTION DEFINITION AND
METHOD DISCUSSION
A number of functions have been defined in the upper
layer of the developed method, including the following:
CFD-generated data for cooling control. A parameterized
CFD simulation tool is required that is
able to simulate the data center air flow as well as the
temperature and humidity distributions and patterns
in order to generate parameters for cooling control.
This CFD tool is also able to simulate the air flow
pattern and temperature distribution in the air plenum
and to identify correlations between each cooling unit
and each individual server rack with a degree of
influence, see Figure 5. Since CFD takes a
long time to generate results, the CFD simulations
should be pre-run so that their results can be used for
real-time control.
Use of historical operation data. Based on pre-stored
historical operation data and the CFD-simulated
data described above, offline machine
learning, deep learning, fuzzy logic, and neural
network modeling will provide sufficient data and
information for advanced control and can even
be used for failure detection. (Machine learning
itself is limited by the operation parameter ranges;
for example, the rack supply air temperature is normally
between 18 and 27 °C. CFD-simulated
data are not limited to such a range, so
combining CFD and machine learning will
provide complete information for cooling control.)
Ventilation and air flow control method. To achieve
data center rack-level cooling control, the cooling air
needs to be adjusted or controlled for each individual
server rack. In this approach, the distributed air
flow control method is extended to include adjustable
under-floor (air plenum) air separation and
adjustable ventilation shutters in the server racks. The
design of the adjustable air separation is based on
CFD simulation of the specific data center layout.
The air separation can be installed even in cases where
many cables are laid in the air plenum. One adjustable
shutter will be installed for each blade server.
Power consumption data of power devices. The online
power consumption of each power device, e.g., UPS,
PDU, and battery, is monitored.
IT load and load shifting strategies for cooling
control. The upper layer can simulate load shifting
strategies while all customer service level
agreements are fulfilled. Furthermore, the system
can simulate the arrival of new IT loads corresponding
to IT load predictions that are learnt
from historical data. Both the arrival of new IT
jobs and the analysis of possible IT load migrations
can support the cooling control of the whole
data center.
Climate data and electricity price. These data will
be used by the upper layer to decide which cooling
strategy to apply, free air cooling or mechanical
cooling. The upper layer will take care of energy
source optimization based on the electricity price and
other available energy sources.
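The first function in the list above requires CFD results to be pre-run so that they can be used in real time. One way to do this, sketched below in Python, is to store the pre-computed results in a table and interpolate between the stored points at run time. The table layout, the choice of fan speed as the lookup variable and the numbers in the usage note are placeholders for illustration, not CFD results from the paper.

```python
from bisect import bisect_right


def interpolate(table: list[tuple[float, float]], x: float) -> float:
    """Piecewise-linear lookup in a table of (x, y) pairs sorted by x.

    Values outside the tabulated range are clamped to the nearest end point,
    since the pre-run simulations say nothing about conditions beyond them.
    """
    xs = [point[0] for point in table]
    if x <= xs[0]:
        return table[0][1]
    if x >= xs[-1]:
        return table[-1][1]
    i = bisect_right(xs, x)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```

As a usage example with placeholder values, a table `[(40.0, 27.0), (60.0, 24.0), (80.0, 22.0)]` mapping CRAH fan speed in percent to rack inlet temperature in °C gives 25.5 °C at a fan speed of 50 %. A lookup of this kind runs in microseconds, whereas the CFD simulations behind the table may take hours.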
The main cooling parameters provided will be the
following:
Supply cooling air temperature at rack level,
Air flow rate at rack level,
Humidity at row or rack level,
Flow control by adjusting the air separation angle,
closing/opening, or through ventilation ducts,
CRAC/CRAH fan speed and air temperature,
Cooling water flow and temperature in the mechanical
cooling system.
Figure 5: CFD simulation of cooling air flow patterns from
CRAC/CRAH.
Modern data centers are equipped with Data Center
Infrastructure Management (DCIM) systems, for example,
ABB Decathlon for data centers. DCIM is a software
platform for managing the data center IT facilities, cooling
system and power supply system, with a number
of functions including process monitoring and control.
The methodology developed in this work can
be implemented in DCIM to solve the problems of hot
spots and overcooling, which are common
in data centers. This approach can be used for data
center zone-level, rack-level and server-level cooling
control. The principal advantage of the proposed approach
is its capacity for holistic control, where decisions
are taken based on integrated information from
different subsystems, for example power devices, IT
load, deep learning subsystems, computational fluid
dynamics simulations, optimization routines and big data
algorithms. All this abstract information in the upper
layer is translated for the server-rack cooling controller
in the lower layer by a coordinator of the cooling
units, which is responsible for providing the parameters
for the server-rack cooling controller. Thus,
the right dosage of cooling power suits the particular
needs of each server-rack. This is another benefit of
this methodology over the classical control
approach, which considers all server racks as a single
cooling load that the control scheme manages
as a whole.
4 CONCLUSIONS
This paper proposed a cross-layer methodology for
the global cooling control of a data center. The
multi-layer approach includes two layers. The lower
layer is responsible for the cooling control at the
rack and server level. It uses the electrical power
consumption of the servers and information provided
by the sensor network, e.g. inlet and outlet temperature
and air flow. The upper layer incorporates
computational fluid dynamics simulations, machine
learning, and artificial intelligence (all run ahead
of the real operation of the data center) to provide
control parameters and configurations to the lower
layer. The upper layer provides integrated management of all
the variables that influence the cooling performance
of the data center (air flow, temperature, and
humidity) and provides recommendations for turning
off servers and for placing and moving the IT load according
to the thermal distribution of the data center and customer
service level agreements. In the future, we will implement
the developed method and structure in the ABB Decathlon
software platform to extend Decathlon's functionality
and capacity.
REFERENCES
Ahuja, N., Rego, C., Ahuja, S., Warner, M., and Docca,
A. (2011). Data center efficiency with higher ambi-
ent temperatures and optimized cooling control. In
Semiconductor Thermal Measurement and Manage-
ment Symposium (SEMI-THERM), 2011 27th Annual
IEEE, pages 105–109.
Berezovskaya, Y., Mousavi, A., Vyatkin, V., Zhang, X., and
Minde, T. B. (2016). Improvement of energy effi-
ciency in data centers via flexible humidity control.
In Industrial Electronics Society, IECON 2016-42nd
Annual Conference of the IEEE, pages 5585–5590.
Deng, W., Liu, F., Jin, H., and Liao, X. (2013). Online
control of datacenter power supply under uncertain
demand and renewable energy. In Communications
(ICC), 2013 IEEE International Conference on, pages
4228–4232.
Koomey, J. (2011). Growth in data center electricity use
2005 to 2010. A report by Analytical Press, completed
at the request of The New York Times, 9.
Lin, M., Shao, S., Zhang, X. S., VanGilder, J. W., Avelar, V.,
and Hu, X. (2014). Strategies for data center temper-
ature control during a cooling system outage. Energy
and Buildings, 73:146–152.
Mousavi, A., Vyatkin, V., Berezovskaya, Y., and Zhang, X.
(2015). Cyber-physical design of data centers cooling
systems automation. In Trustcom/BigDataSE/ISPA,
2015 IEEE, volume 3, pages 254–260.
Parolini, L. (2012). Models and control strategies for data
center energy efficiency. PhD thesis, Carnegie Mellon
University Pittsburgh, PA.
Patankar, S. V. (2010). Airflow and cooling in a data center.
Journal of Heat Transfer, 132(7):1–17.
Patterson, M. K. (2012). Energy efficiency metrics. In
Joshi, Y. and Kumar, P., editors, Energy Efficient Thermal
Management of Data Centers, pages 237–271.
Springer.
Song, Z., Zhang, X., and Eriksson, C. (2015). Data center
energy and cost saving evaluation. Energy Procedia,
75:1255–1260.
Wang, Z., Bash, C., Tolia, N., Marwah, M., Zhu, X., and
Ranganathan, P. (2009). Optimal fan speed control
for thermal management of servers. In ASME 2009 In-
terPACK Conference collocated with the ASME 2009
Summer Heat Transfer Conference and the ASME
2009 3rd International Conference on Energy Sustain-
ability, pages 709–719.
Xiong, N. and Zhang, X. (2016). Towards a framework for
online modeling and optimization of airflow and tem-
perature distribution in server rooms of data centers.
In Industrial Electronics Society, IECON 2016-42nd
Annual Conference of the IEEE, pages 5597–5602.
Zhang, X., Lindberg, T., Svensson, K., Vyatkin, V., and
Mousavi, A. (2016). Power consumption modeling of
data center IT room with distributed air flow. International
Journal of Modeling and Optimization, 6(1):33.