is used to encrypt the communication, and mutual au-
thentication is based on X.509 certificates. This pro-
cedure guarantees the integrity and the security of the
information from its source to its destination.
After being securely transferred to a central server,
operating system logs coming from multiple comput-
ers can be sorted and classified based on their content
and various parameters, such as source host, appli-
cation and priority. Directories, files, and database
tables can be created dynamically using macros. In
addition, complex filtering using regular expressions
and boolean operators offers almost unlimited flexi-
bility to forward only the important log messages to
the selected destinations.
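
The classification step described above can be illustrated with a minimal Python sketch. It is not the configuration used by NEMO: the template path, the macro names (`$HOST`, `$PROGRAM`, `$PRIORITY`), the priority set and the regular expression are all assumptions chosen for illustration. It shows a destination path built dynamically from macros, and a filter combining a regular expression with boolean operators.

```python
import re
from dataclasses import dataclass
from string import Template


@dataclass
class LogMessage:
    host: str      # source host
    program: str   # application that emitted the message
    priority: str  # syslog-style priority name
    text: str      # message body


# Hypothetical destination template; the macros are expanded per message.
DEST_TEMPLATE = Template("/var/log/central/$HOST/$PROGRAM/$PRIORITY.log")

# Hypothetical filter ingredients.
IMPORTANT_PRIORITIES = {"emerg", "alert", "crit", "err"}
FAILED_LOGIN = re.compile(r"authentication failure|failed password", re.IGNORECASE)


def destination(msg):
    """Expand the macros to get the per-host, per-program, per-priority file."""
    return DEST_TEMPLATE.substitute(
        HOST=msg.host, PROGRAM=msg.program, PRIORITY=msg.priority
    )


def is_important(msg):
    """Boolean filter: high priority OR (program is sshd AND regex matches)."""
    return msg.priority in IMPORTANT_PRIORITIES or (
        msg.program == "sshd" and FAILED_LOGIN.search(msg.text) is not None
    )


def route(messages):
    """Forward only important messages, grouped by expanded destination."""
    routed = {}
    for msg in messages:
        if is_important(msg):
            routed.setdefault(destination(msg), []).append(msg.text)
    return routed
```

In this sketch, a low-priority message from `cron` is dropped, while a failed SSH login or a critical kernel message is stored under a directory derived from its source host and application.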
These tools allow operators to transform the data
contained in the system logs into easily readable
information. Once this information is properly classified,
trivial messages are discarded and important
messages stand out quickly. Operators can assess the
current situation and anticipate future problems,
especially those related to the overall performance of
the cluster and the security and integrity of the stored
scientific data.
3.2 Reports, Graphs and Trends
Operating system logs provide a valuable source of
information about programs, applications and dae-
mons. Nevertheless, they provide very little informa-
tion about the system hardware, not to mention other
devices such as routers and switches. In a complex
computing cluster, another tool has to be used to mon-
itor local hardware and network devices and thereby
guarantee a proper use of the computing resources.
In addition to the information provided by
operating system logs, the following information is
recorded:
• Hardware health, e.g. CPUs, mass storage devices
and memory.
• Specific software, e.g. communication services
and scientific applications.
• Network devices, e.g. routers, switches, band-
width and latency.
In order to track these parameters, NEMO uses
Cacti (Berry, 2007), a graphing program for network
statistics. Cacti provides a fast poller, advanced graph
templating, and multiple data acquisition methods.
Cacti stores and displays time-series data
such as network bandwidth, machine-room temperature,
and server load average. This complements the
centralized and classified system logs, and together
they provide a complete picture of the system.
Cacti provides both another way to retrieve informa-
tion from the computing nodes and a way to present
this information.
3.2.1 Retrieving the Information
Data is retrieved via the Simple Network Management
Protocol (SNMP) (Harrington, 2004) or external shell
scripts. Each monitored system (also called a Slave)
locally runs a software component called an agent,
which reports information via SNMP to the
monitoring systems (also called Masters). SNMP agents
expose management data on the monitored systems
as variables (such as “free memory”, “system name”,
“number of running processes”, “number of users”).
SNMP also permits active management tasks, such as
modifying and applying a new configuration.
The monitoring system can either retrieve the
information through several protocol operations, or the
agent (installed on the monitored system) can send data
without being asked. Monitoring systems can also
send configuration updates or control requests to
actively manage a monitored system. The variables
accessible via SNMP are organized in hierarchies to
simplify management.
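
The hierarchical organization of SNMP variables can be pictured with a small Python sketch. It does not use a real SNMP library or talk to a real agent: the in-memory table, the helper names (`parse_oid`, `get`, `walk`), and the object identifiers and values are illustrative assumptions. Each variable is addressed by a dotted numeric path, so a single request retrieves one leaf, while a walk retrieves every variable under a branch.

```python
# Toy in-memory table standing in for an agent's variables.
# Keys are object identifiers (paths in the hierarchy); values are made up.
MIB = {
    (1, 3, 6, 1, 2, 1, 1, 5, 0): "node01",       # system name
    (1, 3, 6, 1, 2, 1, 25, 1, 5, 0): 3,           # number of users
    (1, 3, 6, 1, 2, 1, 25, 1, 6, 0): 142,         # number of running processes
    (1, 3, 6, 1, 4, 1, 2021, 4, 6, 0): 812344,    # free memory (kB)
}


def parse_oid(text):
    """Turn '1.3.6.1' (with or without a leading dot) into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


def get(oid):
    """Retrieve a single variable, or None if the agent does not expose it."""
    return MIB.get(parse_oid(oid))


def walk(prefix):
    """Retrieve every variable below a branch, in hierarchical order."""
    root = parse_oid(prefix)
    return [
        (".".join(str(n) for n in oid), MIB[oid])
        for oid in sorted(MIB)          # tuple order matches tree order
        if oid[: len(root)] == root
    ]
```

Here `get("1.3.6.1.2.1.1.5.0")` fetches one variable, whereas `walk("1.3.6.1.2.1.25")` returns both variables grouped under that branch, which is what makes the hierarchy convenient for management.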
3.2.2 Presenting the Information
In a complex network, made of hundreds or even
thousands of different devices, all the collected in-
formation is useful only if it is presented properly.
Therefore, the information provided by the SNMP
agent is displayed by Cacti using graphs. These
graphs allow the operator to quickly check the
overall status of the cluster and its short-term and
long-term evolution, so that any malfunction can be
easily spotted. Threshold alerts can be set up to
automatically identify any anomaly, e.g. free memory
below a previously defined limit, and notify the
operator immediately.
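
The threshold mechanism can be sketched in a few lines of Python. This is not Cacti's implementation: the function names, the message format and the sample values are assumptions for illustration only. The sketch checks a series of (timestamp, value) samples against a previously defined lower limit and builds one notification per breach.

```python
def check_threshold(samples, limit):
    """Return the (timestamp, value) samples that fall below the limit."""
    return [(t, v) for t, v in samples if v < limit]


def build_alerts(host, variable, samples, limit):
    """Build one human-readable notification per sample below the limit."""
    return [
        f"ALERT {host}: {variable}={value} below limit {limit} at t={timestamp}"
        for timestamp, value in check_threshold(samples, limit)
    ]
```

For example, with a hypothetical limit of 500 MB of free memory, only the samples below 500 would produce a notification; all other samples are ignored.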
Presenting the information is the last but not the
least step of monitoring, after retrieving and storing
it. Just as system logs are subjected to intensive
classification in the centralised monitoring node, all
other information retrieved from the managed nodes
is processed to be shown as graphs. These graphs make
it easier to identify current performance and compare
it with historical data. NEMO makes use of Cacti's
trending capabilities to estimate the possible future
evolution of performance.
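
As an illustration of what trending means in practice, the following Python sketch fits a straight line to historical samples by ordinary least squares and extrapolates it. The linear model is an assumption made here for simplicity; it is not a description of how Cacti computes its trends, and the function names are hypothetical.

```python
def linear_trend(samples):
    """Least-squares fit of value = slope * t + intercept over (t, value) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    return slope, intercept


def forecast(samples, t):
    """Extrapolate the fitted line to a future time t."""
    slope, intercept = linear_trend(samples)
    return slope * t + intercept


def time_to_limit(samples, limit):
    """Time at which the fitted line reaches the limit, or None if it is flat."""
    slope, intercept = linear_trend(samples)
    if slope == 0:
        return None
    return (limit - intercept) / slope
```

Applied to, say, daily disk-usage samples, `time_to_limit` gives a rough estimate of when a volume would fill up if the current trend continued.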
3.3 Intrusion Detection
As shown in Section 2.1, previous developments
were mainly focused on host and network secu-
DCNET 2010 - International Conference on Data Communication Networking