the node which served the NFS service, the cluster
reconfigures itself as follows:
1. The server which is still running writes in the
Quorum volume that it is taking over the functions
of the NFS server, then
2. Mounts the NFS volume, then
3. Takes over the IP address of the other server, and
4. Starts the NFS service.
In this mode the Fabrication Cluster is not
aware of the problems in the NFS cluster,
because the NFS file system remains available.
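The four takeover steps listed above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the class `NfsNode`, the function `take_over`, the dictionary used as a stand-in for the Quorum volume and the address `10.0.0.10` are all hypothetical names, and no real mount or network commands are issued.

```python
from dataclasses import dataclass, field


@dataclass
class NfsNode:
    """Hypothetical model of one server of the NFS cluster."""
    name: str
    mounted: bool = False
    addresses: set = field(default_factory=set)
    nfs_running: bool = False


def take_over(survivor, failed, quorum, service_ip):
    """Apply the four takeover steps in order and return them as a log."""
    log = []
    # Step 1: record in the quorum volume that the survivor owns the NFS role.
    quorum["nfs_owner"] = survivor.name
    log.append("quorum updated")
    # Step 2: mount the shared NFS volume on the survivor.
    survivor.mounted = True
    log.append("volume mounted")
    # Step 3: move the service IP from the failed server to the survivor.
    failed.addresses.discard(service_ip)
    survivor.addresses.add(service_ip)
    log.append("IP taken over")
    # Step 4: start the NFS service on the survivor.
    survivor.nfs_running = True
    log.append("NFS started")
    return log
```

Because the service IP moves together with the volume, clients keep using the same address after the takeover, which is consistent with the Fabrication Cluster not noticing the failure.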
The Fabrication Cluster is composed of at
least two robot controllers (nodes): a group leader
and a common node. The nodes have resources such as
robot manipulators (with attributes like collision
detection and current robot position), serial lines,
Ethernet adapters, variables, programs and the NFS file
system. The NFS file system is used to store
programs, log files and status files. The programs
are stored on NFS to make them available to all
controllers, the log files are used to discover the
causes of failure, and the status files record
the last known state of a controller.
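The cluster composition described above can be written down as a small data model. The sketch below is illustrative only: `ControllerNode` and `validate_cluster` are hypothetical names, and the rule that exactly one node is group leader at a time is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ControllerNode:
    """Hypothetical description of one robot controller in the cluster."""
    name: str
    is_group_leader: bool = False
    # Resource names such as "manipulator", "serial line", "NFS file system".
    resources: list = field(default_factory=list)


def validate_cluster(nodes):
    """Check the minimum composition: two or more nodes, one group leader.

    Requiring exactly one leader is an assumption of this sketch.
    """
    if len(nodes) < 2:
        return False
    return sum(node.is_group_leader for node in nodes) == 1
```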
In the event of a node failure, the production
flow is interrupted. In this case, if there is a
connection between the affected node and the group
leader (GL), the GL is informed and takes
the necessary actions to remove the node from the
cluster. The GL also reconfigures the cluster so that the
fabrication process can continue. For example, if one
node fails in a three-node cluster, the
operations this node was performing are reassigned to
one of the remaining nodes.
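The reassignment performed by the GL can be illustrated with a minimal sketch. The function `reassign_on_failure` and the dictionary layout are hypothetical, and choosing the least-loaded survivor is an assumption: the text only states that the operations go to one of the remaining nodes.

```python
def reassign_on_failure(assignments, failed_node):
    """Remove failed_node and hand its operations to one surviving node.

    assignments maps a node name to its list of operation names.
    The target is the least-loaded survivor (an assumption of this sketch);
    ties are broken by alphabetical order of the node names.
    """
    orphaned = assignments.pop(failed_node, [])
    if not assignments:
        raise RuntimeError("no remaining node can take over the operations")
    target = min(sorted(assignments), key=lambda name: len(assignments[name]))
    assignments[target].extend(orphaned)
    return target
```

For a three-node cluster in which the third node fails, the call removes that node from the mapping and appends its operations to the list of the selected survivor.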
The communication paths in the multiple-robot
system are the Ethernet network and the serial
network. The serial network is the last resort for
communication because of its low speed and because
messages must be relayed through a set of Adept
controllers to reach the destination. As a consequence,
the ring network goes down if more than one node fails.
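The sensitivity of the serial ring to multiple failures can be checked with a short sketch. It assumes a bidirectional ring in which node i is wired to its two neighbours; the function name `ring_connected` and this wiring model are assumptions, not details given in the text. Under this model a single failure never isolates the survivors, while two failures can split them into groups that no longer reach each other.

```python
def ring_connected(n_nodes, failed):
    """Return True if all surviving nodes of a ring can still reach each other.

    Nodes are numbered 0..n_nodes-1 and node i is linked to (i - 1) % n_nodes
    and (i + 1) % n_nodes. Messages can only be relayed by live nodes.
    """
    failed = set(failed)
    alive = [i for i in range(n_nodes) if i not in failed]
    if len(alive) <= 1:
        return True  # nothing left to partition
    # A run of consecutive live nodes starts wherever the predecessor failed.
    # The survivors are mutually reachable only if they form a single run.
    runs = sum(1 for i in alive if (i - 1) % n_nodes in failed)
    return runs <= 1
```

In a four-node ring, losing node 1 leaves nodes 2, 3 and 0 chained together, whereas losing nodes 0 and 2 leaves nodes 1 and 3 with no live relay between them.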
5 CONCLUSIONS
The high availability solution presented in this paper
is worth considering in environments where the
production structure can be reconfigured
and where manufacturing must
ensure a continuous production flow at batch level
(job shop flow).
There are also some drawbacks, such as the need for
an additional NFS cluster. The spatial layout and
configuration of the robots must be such that one
robot is able to take over the functions of another
robot in case of failure. If this involves common
workspaces, programming must be done with great
care, using robot synchronization and continuously
monitoring the current position of the manipulator.
The advantage of the proposed solution is that
it provides a highly available robotized
work structure with insignificant downtime.
The solution has been tested on a four-robot assembly
cell located in the Robotics and IA Laboratory of the
University Politehnica of Bucharest. The cell also
includes a CNC milling machine and an Automatic
Storage and Retrieval System for raw material
feeding and finished product storage.
During the tests the robot network detected a
number of errors (end-effector collisions with parts,
communication errors, power failures, etc.). The GL
evaluated each situation, the network
was reconfigured and the abandoned applications
were restarted within 0.2 to 3 seconds.
The most unfavourable situation is when a robot
manipulator is down; in this case the downtime is
greater because the application which was running
on that controller must be transferred, reconfigured
and restarted on another controller. Also, if the
controller itself still runs properly, it becomes group
leader to relieve the previous GL.
In some situations the solution can be
considered a fault-tolerant system, since
production continued under normal conditions
even when a robot controller failed.
ICINCO 2007 - International Conference on Informatics in Control, Automation and Robotics