that bypasses hardware limitations. Furthermore, an
implementation will demonstrate proof-of-concept
that the proposed solution does indeed support fault-
tolerant real-time decentralized control.
Although fault detection is a rich area of interest
in its own right, it is briefly discussed here as it
pertains to the SPACE testbed. The shared memory,
message passing, and interprocessor interrupt
resources can be used to construct various fault-
tolerance mechanisms. Ideas already considered
include watchdog, neighbor, and ad hoc detection
methods to indicate the state of processors.
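To make the watchdog idea concrete, the following is a minimal sketch under stated assumptions, not the testbed's actual mechanism. The class `WatchdogTable`, its `timeout` parameter, and the use of a dictionary in place of a shared-memory region are all hypothetical. Each processor records a heartbeat timestamp, and any processor whose latest heartbeat is older than the timeout is reported as suspect.

```python
from dataclasses import dataclass, field


@dataclass
class WatchdogTable:
    """Heartbeat table standing in for a shared-memory region (hypothetical)."""
    timeout: float  # seconds of silence before a processor is considered suspect
    last_beat: dict = field(default_factory=dict)  # processor id -> last timestamp

    def beat(self, proc_id: int, now: float) -> None:
        # Each working processor periodically records its own timestamp.
        self.last_beat[proc_id] = now

    def failed(self, now: float) -> set:
        # Flag every processor whose latest heartbeat is older than the timeout.
        return {p for p, t in self.last_beat.items() if now - t > self.timeout}


table = WatchdogTable(timeout=0.5)
for p in range(4):
    table.beat(p, now=0.0)
# Processors 0, 1, and 3 keep beating; processor 2 goes silent.
for p in (0, 1, 3):
    table.beat(p, now=1.0)
suspects = table.failed(now=1.2)
print(sorted(suspects))
```

Timestamps are passed in explicitly so the example is deterministic. A real implementation would read a hardware timer and would need to consider how the table itself is protected against faults.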
Reconfiguration for faults and recovery must be
efficient in real-time systems. Pipelined task
mapping performs this reconfiguration at the control
task level by dynamically assigning tasks based on
the working state of processors. At the data flow
level, working processors can assume the data
handling duties of failed processors in a state
machine-like fashion. That is, the sensor and
actuator channels that each processor accesses are
determined by the number and identities of the
processors that have failed. The mechanical nature
of such an approach should foster efficiency.
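The state machine-like reassignment described above can be sketched as a pure function of the failed set. This is an illustration under assumptions: the function `assign_channels`, the channel names, the nominal ownership rule, and the round-robin redistribution policy are hypothetical and are not taken from the testbed. Because the result depends only on its inputs, every working processor can compute the same table independently once it knows which processors have failed.

```python
def assign_channels(num_procs, channels, failed):
    """Map every I/O channel to a working processor, given the failed set.

    The result depends only on (num_procs, channels, failed), so each
    processor can derive the same table without coordination.
    """
    working = [p for p in range(num_procs) if p not in failed]
    if not working:
        raise RuntimeError("no working processors remain")
    table = {p: [] for p in working}
    orphaned = []
    for i, ch in enumerate(channels):
        owner = i % num_procs  # nominal owner when nothing has failed
        if owner in failed:
            orphaned.append(ch)
        else:
            table[owner].append(ch)
    # Hand orphaned channels to the working processors in round-robin order.
    for k, ch in enumerate(orphaned):
        table[working[k % len(working)]].append(ch)
    return table


channels = [f"ch{i}" for i in range(6)]
nominal = assign_channels(3, channels, frozenset())
degraded = assign_channels(3, channels, frozenset({1}))
print(nominal)
print(degraded)
```

With three processors and six channels, the failure of processor 1 moves its two channels to processors 0 and 2, while the channels those processors already held stay in place.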
7 SUMMARY & FUTURE WORK
Costly and mission-critical systems must exhibit
fault-tolerance in order to minimize loss due to
failure. One facet of fault-tolerance provides for a
grace period between fully functional and
nonfunctional states during which steps can be taken
to prepare for ultimate failure. More central to this
project, however, is uptime. It is desirable for a
space telescope to continue operating smoothly
despite the failure of processors on a multiprocessor
platform; such failures are single-event upsets in
nature and are likely given the operating
environment. With tolerance for processor failure
enabled, a telescope can perform its scientific and
logistical duties with minimal downtime.
The distributed data flow architecture proposed
here has been conceived for fault-tolerant, real-time
decentralized control of a segmented reflector
telescope testbed. In contrast with a master-slave
configuration that has already been implemented,
this approach does not rely on a single processor
because the data input-output can be handled by any
processor. This arrangement facilitates continuous
system operation despite any processor failure.
Future work includes completion of the distributed
data flow architecture and of its fault detection and
reconfiguration mechanisms. Various fault detection
and reconfiguration schemes will be tested and
analyzed, along with issues of sensor, actuator, and
signal converter failure.
ACKNOWLEDGEMENTS
This work was supported by NASA under Grant
URC NCC 4158. Special thanks go to all the faculty
and students associated with the SPACE Laboratory.
A FAULT-TOLERANT DISTRIBUTED DATA FLOW ARCHITECTURE FOR REAL-TIME DECENTRALIZED
CONTROL