2 MONITORING TOOLKIT REQUIREMENTS
The RTS application suite relies heavily on
distributed processes, which calls for instruments to
schedule tasks and to assess network availability and
the overall Catalogue update performance.
Every medium- to large-scale system needs a
monitoring tool to understand system performance
and react to system behaviour. RTS, however, poses a
greater challenge: it is a distributed records system,
supported by a distributed network connecting
several different sources spread over a large
geographical area (Tanenbaum, 2002).
RTS works to unify and centralize clinical
information collected from heterogeneous data
sources, which differ not only in operating system
but also in the healthcare system that encapsulates
the clinical information. In Portugal, several
application systems are distributed across hospitals,
regional health care centers and doctor support
systems. All of them produce clinical data, stored in
different database structures with their own
associated semantics, and all of these data are meant
to be integrated into the RTS central database. If the
semantics are lost during data conversion and
transmission, the data become useless and the
purpose of the integration is lost.
The distributed infrastructure, the need to
preserve data semantics, and physical obstacles such
as network failures and transmission delays (which
are inevitable) are the challenges the monitoring
toolkit must address.
2.1 Monitoring Scenarios in the RTS
The Monitoring Toolkit requirements are organized
into three functional packages: Process Management,
Catalogue Management and Probe Management
(Figure 1).
Process Management includes the use cases
for managing the integration scheduler. Through a
graphical interface, the user configures the RTS
scheduler, which defines when the configured
processes must execute.
Probe Management is used to monitor the
physical network layer and the availability of the
information sources. In the RTS monitoring system,
a probe is a message sent to the network for the
purpose of monitoring and collecting data about
network activity. Probe Management also compares
a set of pre-selected probes with the actual data
coming from the sources to assess data correctness
and individual source performance.
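As an illustration of the probe idea described above, the following minimal Python sketch sends one probe to a source, times it, and compares the answer with a reference record. It is not the RTS implementation: the names ProbeResult, run_probe and fetch, and the choice to treat any OSError as "source unavailable", are assumptions made for this example.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ProbeResult:
    source: str
    available: bool   # did the source answer at all?
    correct: bool     # did the answer match the reference record?
    elapsed_s: float  # time spent waiting for the answer, in seconds


def run_probe(
    source: str,
    fetch: Callable[[str], Any],  # hypothetical function that queries a source
    expected: Any,                # reference data for the pre-selected probe
) -> ProbeResult:
    """Send one probe to a source and assess availability and correctness."""
    start = time.perf_counter()
    try:
        actual = fetch(source)
    except OSError:
        # Assumption: network-level failures (ConnectionError, TimeoutError,
        # ...) are OSError subclasses and mean the source is unavailable.
        return ProbeResult(source, False, False, time.perf_counter() - start)
    elapsed = time.perf_counter() - start
    return ProbeResult(source, True, actual == expected, elapsed)
```

Collecting such results for the same probe set over time would give the comparable availability and performance figures the text refers to.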
Figure 1: Monitoring Toolkit module use cases.
Catalogue Management is used to monitor
the data integration process as a whole. Here, the
user can follow the integration process in detail,
from the sub-process that builds the sources list to
the sub-process that saves the data in the database.
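One way to picture this sub-process-level view is a small Python sketch that runs the integration sub-processes in order and records a status and a duration for each. The function name run_monitored, the report layout, and the decision to stop at the first failing sub-process are assumptions of this example and are not taken from the RTS design.

```python
import time
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], None]]
StageReport = Tuple[str, str, float]  # (name, status, seconds)


def run_monitored(stages: List[Stage]) -> List[StageReport]:
    """Run sub-processes in order and report status and duration of each."""
    report: List[StageReport] = []
    for name, stage in stages:
        start = time.perf_counter()
        try:
            stage()
            status = "ok"
        except Exception as exc:
            status = f"failed: {exc}"
        report.append((name, status, time.perf_counter() - start))
        if status != "ok":
            # Assumption: later sub-processes depend on earlier ones,
            # so the run stops at the first failure.
            break
    return report
```

A caller would pass the sub-processes as (name, function) pairs, for example one that builds the sources list followed by one that saves the data in the database, and display the resulting report to the user.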
The Monitoring Toolkit builds charts of the
observed variables and provides a dashboard that
gives the end-user a global view of the system state
and supports its analysis.
While scheduler management is fairly trivial,
delegating the underlying work to the Quartz library
(by Apache Jakarta Project, 2008), the two other
packages constitute the core of the monitoring tool:
the Sources availability probe and the Catalogue
integration probe.
2.2 Sources Availability Probe
This task checks the functional status of the
network by selecting probe patients and
reintegrating them into the Catalogue. Pre-selecting
a small number of patients and reusing them as a
probe set yields comparable results at different
levels, namely network availability, network
performance over time and integration sources
availability. The task starts by building a list of all
available health care institutions in the network.
While there is a source in that list without a probe
patient, the process keeps selecting patients already
integrated in the Catalogue and assigns them as
probe patients to sources that lack one. A patient is
assigned to a source only when that patient has
episodes in the source being processed. Once all
sources have probe patients, the integration process
runs for each “source-probe patient” combination
(Listing 1).
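The selection loop described above can be sketched in Python as follows. This is a hedged reading of the prose, not a reproduction of Listing 1: the function name assign_probe_patients and the has_episodes predicate are hypothetical, a single patient is allowed to serve as probe for more than one source, and the loop stops when the Catalogue patients are exhausted so that a source with no matching patient cannot make it run forever.

```python
from typing import Callable, Dict, Iterable, List, Optional


def assign_probe_patients(
    sources: List[str],
    catalogue_patients: Iterable[str],
    has_episodes: Callable[[str, str], bool],  # hypothetical (patient, source) check
) -> Dict[str, Optional[str]]:
    """Assign one probe patient to each source.

    A patient qualifies for a source when has_episodes(patient, source)
    is true. Sources for which no Catalogue patient qualifies stay
    mapped to None.
    """
    probes: Dict[str, Optional[str]] = {source: None for source in sources}
    for patient in catalogue_patients:
        pending = [s for s, p in probes.items() if p is None]
        if not pending:
            break  # every source already has a probe patient
        for source in pending:
            if has_episodes(patient, source):
                probes[source] = patient
    return probes
```

The returned mapping gives the "source-probe patient" combinations on which the integration process would then run; sources still mapped to None would need separate handling.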
A MONITORING TOOLKIT FOR A DISTRIBUTED CLINICAL DATA INTEGRATION ENGINE