is crucial for the cluster operation because it
implements a single access point from where the
users submit applications to be executed in nodes.
Often, nodes are personal computers interconnected
through the cluster network. This network enables
communication among processes running on
different nodes.
Unlike in the network management area, there is no
consolidated and widely accepted notion of what
cluster management exactly means. Given this, we
organize the current cluster management tools into
three broad classes: cluster monitoring, user task
management, and administrative tools. Since the
functionalities found in these tools may be very
similar, it is not rare for a tool to implement
functionalities that fit in more than one class.
Cluster monitoring tools are used to check the
internal status and utilization of the cluster
resources. Ganglia (Massie, 2005), the most widely
used monitoring tool, is a distributed system able to
monitor inter-cluster and intra-cluster information.
Another tool is SIMONE (SNMP-based Monitoring
System for Network Computing) (Subramanyan et
al., 2000), which offers node descriptions, process
information (e.g. CPU time, memory usage), node
information (e.g. percentage of CPU usage, memory
usage, network interfaces) and information about
traffic on each node. Basically, SIMONE
provides most of the management information
that Ganglia does.
User task management tools are designed for job
scheduling. Such tools allocate a subset of the
cluster nodes for the execution of a user's tasks.
The most widely used tool here is OpenPBS
(Portable Batch System) (PBS, 2005). The
basic PBS operation is implemented as a FIFO
queue. It offers mechanisms to view and interact
with the execution queue, and uses policies to
prevent a single user from monopolizing cluster
resources to the detriment of other users.
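The queueing behaviour described above can be
illustrated with a minimal sketch. It is not PBS code
and does not use any PBS interface: the class
FifoScheduler and the parameter max_running_per_user
are hypothetical names, and the per-user cap is only
one assumed example of a policy that limits a single
user.

```python
from collections import Counter, deque


class FifoScheduler:
    """Toy FIFO job queue with a per-user cap (hypothetical, not PBS)."""

    def __init__(self, free_nodes, max_running_per_user):
        self.free_nodes = free_nodes
        self.max_running_per_user = max_running_per_user
        self.queue = deque()      # pending (user, job) pairs in arrival order
        self.running = Counter()  # user -> number of running jobs

    def submit(self, user, job):
        """Append a job to the end of the execution queue."""
        self.queue.append((user, job))

    def dispatch(self):
        """Start queued jobs in arrival order, skipping users at their cap."""
        started = []
        skipped = deque()
        while self.queue and self.free_nodes > 0:
            user, job = self.queue.popleft()
            if self.running[user] < self.max_running_per_user:
                self.running[user] += 1
                self.free_nodes -= 1
                started.append((user, job))
            else:
                skipped.append((user, job))
        # Skipped jobs go back to the front, ahead of jobs not yet examined,
        # so the original arrival order is preserved.
        self.queue = skipped + self.queue
        return started


sched = FifoScheduler(free_nodes=2, max_running_per_user=1)
sched.submit("alice", "a1")
sched.submit("alice", "a2")
sched.submit("bob", "b1")
first_round = sched.dispatch()
```

In this run the second job of alice stays queued
while the job of bob starts, so one user cannot
occupy both free nodes.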
Administrative tools automate cluster operation
tasks such as node image replication and parallel
command execution. The main objective of these
tools is to offer scalable operations for clusters.
SystemImager (System-Imager, 2005) is one tool in
this group. It offers functionality for maintaining
the same operating system on all cluster nodes.
From a network management perspective, these
cluster tools may not always qualify as
management tools in the strict sense of the term, but
in cluster terms such a designation is not rare.
A more critical problem is that these tools
cannot be integrated with network management
without complex adaptation work. SIMONE is the
only tool cited that offers facilities for SNMP
integration; however, it focuses on the management
of general distributed systems and does not deal
with cluster-specific issues such as node allocation
and internal/external network interaction.
3 CLUSTER MANAGEMENT MIBS AND SNMP AGENT
The developed management system handles SNMP
agents inside clusters. The cluster management
information supported by each SNMP agent is
defined in MIBs for clusters. These MIBs
rely on an analysis of the information required to
manage a cluster and on an analysis of other
existing MIBs. Two already standardized MIBs were
important in this work: MIB-II (McCloghrie and
Rose, 1991) and HostResources (Grillo and
Waldbusser, 1993). They allow us to reuse several
of their objects instead of redefining them in
the cluster MIBs. From MIB-II we borrowed the
system group, which provides general machine
information, and the interfaces group, which
provides network information. From the HostResources
MIB we borrowed information from the following
three groups: (a) hrSystem: provides information related to
the system status; (b) hrStorage: provides
information about disk and memory usage; (c)
hrDevice: provides data about disk partitions.
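As an illustration of the reuse described above, the
following minimal sketch maps the borrowed groups to
their standard object identifiers and checks whether
a given OID belongs to one of them. The OID prefixes
are those registered for MIB-II and the Host
Resources MIB; the names BORROWED_GROUPS, parse_oid
and group_of are hypothetical helpers introduced only
for this example and are not part of the developed
system.

```python
# Standard OID prefixes of the groups reused from MIB-II and HostResources.
BORROWED_GROUPS = {
    "system": "1.3.6.1.2.1.1",        # MIB-II: general machine information
    "interfaces": "1.3.6.1.2.1.2",    # MIB-II: network interfaces
    "hrSystem": "1.3.6.1.2.1.25.1",   # HostResources: system status
    "hrStorage": "1.3.6.1.2.1.25.2",  # HostResources: disk and memory usage
    "hrDevice": "1.3.6.1.2.1.25.3",   # HostResources: devices and partitions
}


def parse_oid(text):
    """Convert a dotted OID string into a tuple of integers."""
    return tuple(int(part) for part in text.strip(".").split("."))


def group_of(oid):
    """Return the borrowed group that contains oid, or None (hypothetical helper)."""
    target = parse_oid(oid)
    for name, prefix in BORROWED_GROUPS.items():
        prefix_ids = parse_oid(prefix)
        # Compare sub-identifiers, not characters, so that 1.3.6.1.2.1.11
        # is not mistaken for a child of 1.3.6.1.2.1.1.
        if target[:len(prefix_ids)] == prefix_ids:
            return name
    return None
```

For example, group_of("1.3.6.1.2.1.1.5.0"), the OID
of sysName.0, falls in the system group, whereas an
OID from the MIB-II ip group is reported as not
borrowed.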
In addition, the cluster-specific information
not covered by MIB-II and
HostResources was defined in two new MIBs: one
for node management and the other
for front-end management. The
functionalities supported in the current version of
these new MIBs include some features of two
of the management classes described in the previous
section: monitoring tools and user task
management tools.
3.1 Cluster MIBs Description
Figure 1 presents the MIBs designed to manage
cluster nodes and front-ends. As shown in Figure 1,
the ClusterNode MIB is composed of four groups:
clAllocation, clCpu, clTemperature and clProcesses.
Node allocation and deallocation are performed
through the clAllocation group. The read-write
object clNodeAllocUser stores the name of the user
who has allocated a node, or the special value “ALL”
if access to the node is free. After allocation, no
user except the one who allocated the node can
log in to the node. Node deallocation is
performed by the system administrator, who must
assign the value “ALL” to the clNodeAllocUser
object. The clNodeAllocDuration should be adjusted