4.2.3 Workflow Engine
When a new request or task is inserted into the
database, the Workflow Engine examines and allo-
cates the task. If the task is a primitive task, it is sub-
mitted directly to the Portable Batch Scheduler (PBS)
for processing. If the task is a compound task, it is
broken down into child tasks, which themselves could
be compound or primitive. As the primitive tasks
complete, their results are listed in the database. If a
primitive task has a parent compound task, the results
are rolled up to the parent. A compound task is com-
pleted only after all the primitive tasks it depends on
complete. For example, there is a primitive method
DailyImage which produces an image from a single
day of AIRS or AMSU data. One of its parameters
is the data day to process. Another method,
DailyImageRange, is a compound method that takes the
parameters startdate and enddate. When the Workflow Engine
considers a new DailyImageRange request, it creates
a DailyImage task for each day of the range.
All the primitive tasks are submitted at once to PBS.
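The decomposition described above can be sketched as follows. This is a minimal illustration, not the SOAR implementation: the Task record, its fields, the parameter name day, and the choice of an inclusive date range are all assumptions made for the example; only the method names DailyImage and DailyImageRange and the parameters startdate and enddate come from the text.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List


@dataclass
class Task:
    """Hypothetical task record; the real database schema is not shown here."""
    method: str
    params: Dict[str, str]
    compound: bool = False
    children: List["Task"] = field(default_factory=list)
    done: bool = False  # set when a primitive task's result is recorded

    def is_complete(self) -> bool:
        # A compound task completes only after every task beneath it completes.
        if self.compound:
            return all(child.is_complete() for child in self.children)
        return self.done

    def primitives(self) -> List["Task"]:
        # Flatten the tree into the primitive tasks that would be sent to PBS.
        if not self.compound:
            return [self]
        return [leaf for child in self.children for leaf in child.primitives()]


def expand_daily_image_range(startdate: date, enddate: date) -> Task:
    """Create one DailyImage child per day (inclusive range is an assumption)."""
    parent = Task(
        "DailyImageRange",
        {"startdate": startdate.isoformat(), "enddate": enddate.isoformat()},
        compound=True,
    )
    day = startdate
    while day <= enddate:
        parent.children.append(Task("DailyImage", {"day": day.isoformat()}))
        day += timedelta(days=1)
    return parent
```

Because primitives() recurses through nested compound tasks, the same sketch would cover a compound task whose children are themselves compound.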
4.2.4 Portable Batch Scheduler
When a primitive task is submitted to SOAR, the PBS qsub
command is used to submit it to PBS. PBS
monitors the computation resources available on the
Bluegrit Computation Cluster and schedules the tasks
to run as resources are available. PBS will execute
as many jobs in parallel as it can within the comput-
ing resources available, and as jobs complete, the next
queued job will be executed.
4.2.5 Bluegrit Computation Cluster
Bluegrit is an IBM cluster composed of 32 blades,
each with dual 2.2 GHz IBM PowerPC CPUs. It is
connected to the outside world through a management
node. The SOAR workflow software submits jobs
through PBS over an SSH connection from the database
on Matisse to the Bluegrit management node. PBS
allocates one of the blades to execute each job.
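As a rough sketch of the submission path just described, the following builds the command line that would run qsub on the management node over SSH. The host name, script path, and job name are placeholders invented for the example, and the use of the qsub -N option to name the job is an illustrative choice, not something the text specifies.

```python
import shlex
from typing import List


def build_submit_command(host: str, script_path: str, job_name: str) -> List[str]:
    """Return an argv list that runs qsub on a remote node through ssh.

    host, script_path, and job_name are hypothetical values supplied by
    the caller; nothing here reflects the actual SOAR configuration.
    """
    # Quote each argument so the remote shell sees it as a single word.
    remote = "qsub -N {} {}".format(shlex.quote(job_name), shlex.quote(script_path))
    return ["ssh", host, remote]
```

The returned list could be passed to subprocess.run; qsub normally prints the identifier of the newly queued job on standard output, which a caller could record alongside the task.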
4.2.6 Science Data Archive
The Bluegrit Computation Cluster includes a 2.2 TB
shared filesystem which is available on all the individ-
ual nodes of the cluster from a large Intel based NFS
server. The input science data sets needed for pro-
cessing reside on the disk, and any files created during
processing are output to the filesystem. Currently, for
the test system, it holds 15 months of AIRS/AMSU
gridded data. Eventually other tasks will be inte-
grated in the system that will extend the current static
Science Data Archive to interact with external data
archives at NASA and NOAA to retrieve needed
input data on demand. Additionally, whenever a science
algorithm task is submitted to the cluster, prior to
actual execution, the programs check the Science Data
Archive to see if a result meeting the requested criteria
already exists from a previous request. If so, the result
is not recreated; it is simply returned as the result of
the new request.
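One way such a check for an existing result could work is to derive a deterministic file name from the method and its parameters and test whether that file is already on the shared filesystem. The sketch below assumes this scheme; the directory layout, the hash-based name, and the .png suffix are illustrative assumptions and are not taken from SOAR.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Optional


def result_path(archive_root: Path, method: str, params: Dict[str, str]) -> Path:
    """Map a request to a stable file location (hypothetical layout)."""
    # Sorting the keys makes the name independent of parameter order.
    key = json.dumps({"method": method, "params": params}, sort_keys=True)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return archive_root / "results" / method / (digest + ".png")


def find_existing_result(
    archive_root: Path, method: str, params: Dict[str, str]
) -> Optional[Path]:
    """Return the earlier result if one exists, otherwise None."""
    path = result_path(archive_root, method, params)
    return path if path.is_file() else None
```

A caller would run the science algorithm only when find_existing_result returns None, and otherwise return the stored file as the result of the new request.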
4.2.7 HTTP File Server
When data files, images, or animations are created on
the cluster and stored on the Science Data Archive,
they are made available to the end user through a
read-only HTTP file server running on the Bluegrit
management node. The URLs for the result files
on the Bluegrit HTTP server are stored in the task
database and returned to the client through the SOAP
server on request.
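The mapping from a file in the archive to the URL stored in the task database might look like the sketch below. The base URL and archive root are placeholders; the text does not state how the HTTP server exposes the filesystem, so a direct one-to-one path mapping is assumed here.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def result_url(base_url: str, archive_root: str, file_path: str) -> str:
    """Translate an archive file path into a URL under base_url.

    Raises ValueError if file_path is not inside archive_root, so files
    outside the archive are never exposed by accident.
    """
    relative = PurePosixPath(file_path).relative_to(archive_root)
    # Percent-encode characters that are unsafe in a URL path; "/" is kept.
    return base_url.rstrip("/") + "/" + quote(relative.as_posix())
```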
4.3 Directory Server
The traditional mechanism for discovery of web services
is a UDDI server, and a WSDL definition
for SOAR could be registered with a UDDI server
with appropriate keywords for an independent user to
discover and access the server. The WSDL includes a
complete description of the SOAP web services avail-
able from the Process Server, and sufficient informa-
tion for a SOAP client to access the system.
4.4 Web Services
As previously mentioned, in addition to the science
algorithms, the system includes a number of ancillary
web services for interacting with the system. These
include login, UserTaskStatus, CompletedTasks,
GetResultsById, and RemoveTaskById. The focus of the
current development has been to construct a framework
for web services which can later be extended
by adding algorithms. As discussed above,
primitive tasks are processed individually, while compound
tasks are composed of multiple primitive tasks. These
algorithms utilize the GrADS system (Goldberg et al.,
2003), a tool that is widely used by the atmospheric
science community for performing science data
visualizations.
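To make the role of the ancillary services concrete, the sketch below models four of them against an in-memory table. Only the service names come from the text; the method signatures, the status strings, the add_task and complete helpers, and the return types are assumptions made for illustration, and the real services are exposed through SOAP and backed by the task database.

```python
from typing import Dict, List


class TaskStore:
    """Hypothetical in-memory stand-in for the task database."""

    def __init__(self) -> None:
        self._tasks: Dict[int, Dict] = {}
        self._next_id = 1

    def add_task(self, user: str, method: str) -> int:
        # Helper for the example: register a new request for a user.
        task_id = self._next_id
        self._next_id += 1
        self._tasks[task_id] = {
            "user": user, "method": method, "status": "queued", "results": [],
        }
        return task_id

    def complete(self, task_id: int, result_urls: List[str]) -> None:
        # Helper for the example: record result URLs for a finished task.
        task = self._tasks[task_id]
        task["status"] = "completed"
        task["results"] = list(result_urls)

    def user_task_status(self, user: str) -> Dict[int, str]:
        # UserTaskStatus: status of every task owned by the user.
        return {tid: t["status"] for tid, t in self._tasks.items() if t["user"] == user}

    def completed_tasks(self, user: str) -> List[int]:
        # CompletedTasks: ids of the user's finished tasks.
        return [
            tid for tid, t in self._tasks.items()
            if t["user"] == user and t["status"] == "completed"
        ]

    def get_results_by_id(self, task_id: int) -> List[str]:
        # GetResultsById: result URLs recorded for one task.
        return list(self._tasks[task_id]["results"])

    def remove_task_by_id(self, task_id: int) -> bool:
        # RemoveTaskById: True if a task was removed, False if it was unknown.
        return self._tasks.pop(task_id, None) is not None
```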
SERVICE ORIENTED ATMOSPHERIC RADIANCES (SOAR) - A Web Service Research Tool for the Gridding and
Synthesis of Multi-Sensor Satellite Radiance Data for Weather and Climate Studies