NEURALTB WEB SYSTEM
Support to the Smear Negative Pulmonary Tuberculosis Diagnosis
Carmen Maidantchik, José Manoel de Seixas, Afrânio Kritski, Fernanda C. de Q Mello
Rony T. V. Braga, Pedro H. S. Antunes
Federal University of Rio de Janeiro, Cidade Universitária, C.P. 68504, 21945-970, Rio de Janeiro, Brazil
João Baptista de Oliveira e Souza Filho
Federal University of Rio de Janeiro, Cidade Universitária, C.P. 68504, 21945-970, Rio de Janeiro, Brazil
Graduated Dept., Celso Suckow Technological Education Center, Av. Maracanã 229, 20271-110, Rio de Janeiro, Brazil
Keywords: Decision support systems, neural networks, web technology, SNPT diagnosis.
Abstract: The World Health Organization estimates that one third of the world population is infected by
Mycobacterium tuberculosis. Tuberculosis (TB) mainly affects places with poor health services in developing countries.
Therefore, it has become mandatory to develop more efficient, faster, and less expensive analysis methods. This
paper presents a decision support system that uses neural networks to support TB diagnosis. The output is
the probability that a patient has the illness or not, together with an assigned risk group. The NeuralTB system
encapsulates the knowledge needed for an efficient anamnesis interview, integrated with demographic and risk
factors typically known for tuberculosis diagnosis. It was developed with Web technology, and data were
described with a markup language to enable efficient communication and information exchange among
experts. Data collected during the whole process can be used to identify possible new factors or symptoms,
since the infection transmission may evolve. This information can also help governmental tuberculosis control
entities define effective actions to protect the health and safety of the population.
1 INTRODUCTION
Although effective antimicrobial therapies and
suitable diagnostic tests are already available, the
number of tuberculosis cases increases each year. In
particular, smear negative pulmonary tuberculosis
(SNPT) cases are hard to diagnose. The World Health
Organization (WHO) estimates that one third of the
world population is infected by Mycobacterium
tuberculosis (WHO 2002), that is,
approximately 2 billion infected persons.
Every year, 8 to 9 million new cases appear and 1.7
million individuals die. Therefore, tuberculosis (TB)
is still a serious public health problem worldwide.
The rapid growth of the disease is related to
several factors, particularly the HIV epidemic, the
increase of social inequality in many countries, and
the deterioration of health services, mainly among
poor populations. These problems occur more
frequently in urban areas due to the migrations that have
been happening over the last decades. Additionally,
the TB vaccine is not very effective.
Diagnostic tests either fail to identify at least half
of cases or are accurate but expensive, and it is often
difficult for patients to complete the necessary six-
month course of treatment, which contributes to new
drug-resistant strains of the disease. Currently,
pulmonary tuberculosis diagnosis is still based on
bacilloscopy performed directly on the sputum smear, which
lacks sensitivity (around 50%). Besides that, this
method has no utility in the diagnosis of extra-
pulmonary TB. On the other hand, the
Mycobacterium tuberculosis culture, which has a
higher sensitivity (80%), requires 4 to 6 weeks for
the result. Such a long period for the confirmation of
the infection delays the beginning of the treatment
and allows contagion among other people. The
transmission rate of Mycobacterium tuberculosis
from smear-negative cases is 17% among exposed
individuals (Sarmiento et al., 2003). Moreover, in
deprived countries, only some control programmes
allow cultures to be performed in their primary-care
diagnostic routine (Santos et al., 2006). Consequently, fast
and accurate diagnosis of SNPT could provide lower
Maidantchik C., Manoel de Seixas J., Kritski A., C. de Q Mello F., T. V. Braga R., H. S. Antunes P. and Baptista de Oliveira e Souza Filho J. (2007).
NEURALTB WEB SYSTEM - Support to the Smear Negative Pulmonary Tuberculosis Diagnosis.
In Proceedings of the Ninth International Conference on Enterprise Information Systems - AIDSS, pages 198-203
DOI: 10.5220/0002366401980203
Copyright © SciTePress
morbidity and mortality, as well as case detection at a less
contagious stage.
The culture result for Mycobacterium
tuberculosis can be obtained by automated
diagnostic methods that are commercially available in the health
care area. Besides being expensive, those methods
have not been validated in different epidemic
situations. Their use in routine conditions is
restricted to reference or research laboratories
(Perkins and Kritski, 2002). New diagnostic tests, as
well as the use of statistical models to support
SNPT analysis, constitute a real challenge. To predict
the probability that a patient has TB, researchers
have employed neural networks (El-Solh et al., 1999) or
multivariate logistic regression and classification
trees (Mello, 2001). Santos (2003) uses neural
networks and classification trees to identify patients
with clinical-radiological suspicion of SNPT. When
formulated in a systematic way and implemented
with high-quality data, statistical models can be
representative of the clinical problem under
evaluation and could be useful for physicians in their
clinical routine, as well as for public health policy
administration (Castelo et al., 2004).
This paper presents the NeuralTB Web system,
which aims at supporting SNPT diagnosis in health
care units of limited-resource areas. The system
uses artificial neural networks as a model for
diagnosing the infection. The software was
developed using Web technology and offers
user-friendly and intuitive interfaces for symptom
registration, patient monitoring, and result retrieval.
The data stored in each health care unit are easily
merged into a central database.
This paper is organized as follows. Section 2
describes the artificial neural network diagnosis
model and the data set under study. Section 3 presents
the NeuralTB Web system and Section 4 explains
implementation details. Conclusions and future work
are described in Section 5.
2 THE DIAGNOSIS MODEL
In order to avoid TB becoming a pandemic that
would cause serious illness in people and spread
quickly throughout populations, the fight against the
disease includes discovering new tools for
prevention, diagnosis support, and treatment.
Software engineering and the Internet have an
important role in the battle against diseases that
spread quickly among different countries.
Computer programs register diseases, symptoms,
and locations where infected people live. Internet
sites communicate new drugs, treatments, and risk
factors related to diseases, allowing an exchange of
expertise. Searchable indexes provide access to
medical directories and research programs.
We propose a decision support system that
health and medical professionals may use to support
the diagnosis of SNPT under routine conditions in
hospitals and health care units. It should be
clearly stated that the purpose of the system is not to
replace physicians. The proposed model suggests
that mathematical modeling for classifying SNPT
cases could be a useful tool for optimizing the
utilization of expensive tests and for avoiding the costs of
unnecessary anti-TB treatment. Diagnosis
is an ongoing process that requires
accurate investigation and, therefore, the system
output should be analyzed together with
interviewing, inspection, auscultation, and
examination of the laboratory results.
One concern of the project was to develop a tool
that would be suitable for areas of limited resources.
Therefore, the requirements of low cost, easy access,
and user-friendliness were considered. Since the
target disease is geographically spread across
different places, our group decided to implement the
system using Web technology.
SNPT experts defined a set of symptoms that
would indicate whether or not a patient has
the disease. Based on this set, an artificial neural
network model was developed. The network output
corresponds to the probability that a patient has
SNPT or not and the risk group (low, medium, or high
risk) to which the patient belongs.
The developed Web system registers the input
information, executes the neural network code,
stores the result, monitors the patients' data, and
manages data files. TB experts supported the project
development process, validating each step to
guarantee that the resulting system would achieve
the project goals.
2.1 Data Set for Modelling
In order to determine the set of symptoms and
characteristics that would indicate the infection, 136
patients agreed to participate. They were referred to
the University Hospital of the Federal University of Rio
de Janeiro, from March 2001 to September 2002,
with clinical-radiological suspicion of SNPT.
The input data set corresponds to information
from the anamnesis interview integrated with demographic
and risk factors typically known for tuberculosis
diagnosis. Forty-three per cent of the patients
actually had active TB. Initially, the following clinical
variables were considered: age, cough, sputum, sweating,
fever, weight loss, chest pain, chills, dyspnea,
diabetes, alcoholism, and others.
2.2 The Artificial Neural Network
The artificial neural network model is fed with data
collected from the questionnaires filled in
at the health care units. The dichotomous variables
were coded as -1 and 1, representing the absence
or the presence of a symptom, respectively. Three
categories were allowed for qualitative variables: -1
(lack of an indication), 1 (presence of the symptom),
and 0 (ignored). During model development, the relevance of
variables was also addressed, which allowed more
compact network designs. Starting from 26
variables, the relevance analysis (Seixas et al., 1996)
showed that models could be developed considering
12 or just 8 variables. This variable suppression was
also validated by TB experts.
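The coding scheme can be illustrated with a short Python sketch. The function name, the answer labels ("yes", "no", "unknown"), and the variable names are hypothetical; only the numeric codes -1, 1, and 0 come from the text.

```python
# Illustrative encoding of questionnaire answers.
# The answer labels are assumptions; the numeric codes follow the text.
CODES = {"yes": 1, "no": -1, "unknown": 0}

def encode_answers(answers, variables):
    """Map each variable's answer to its numeric code, in a fixed order.

    A variable with no recorded answer is treated as "unknown" (code 0),
    which is an assumption made here for illustration.
    """
    return [CODES[answers.get(name, "unknown")] for name in variables]

variables = ["cough", "fever", "weight_loss"]  # hypothetical subset
vector = encode_answers({"cough": "yes", "fever": "no"}, variables)
```

Keeping the variable order fixed matters because each position of the resulting vector feeds one specific input node of the network.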
With respect to network topology, a fully
connected multilayer feedforward architecture
trained with the backpropagation algorithm was
designed. The number of input nodes varied from 26 to 8,
according to the data compaction scheme. The network has
a single output neuron, and training targets were
defined as 1 (active TB) and -1 (otherwise). The
number of neurons in the single hidden layer also
varied according to model complexity, from 3 to 4
neurons. The hyperbolic tangent is the activation
function for all neurons.
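A minimal sketch of the forward pass of such a network is given below, in plain Python. The weights are made-up placeholders, not the trained values, and the use of bias terms is an assumption, since the text does not mention them.

```python
import math

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: inputs -> tanh hidden layer -> single tanh output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return math.tanh(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)

# Hypothetical parameters for 8 inputs and 3 hidden neurons.
# Bias terms are an assumption of this sketch.
n_inputs, n_hidden = 8, 3
hidden_weights = [[0.1 * (i + 1)] * n_inputs for i in range(n_hidden)]
hidden_biases = [0.0] * n_hidden
output_weights = [0.5, -0.25, 0.75]
output_bias = 0.0

patient = [1, -1, 1, 1, 0, -1, 1, 1]  # coded questionnaire answers
score = forward(patient, hidden_weights, hidden_biases, output_weights, output_bias)
```

Because the output neuron also uses the hyperbolic tangent, the score always lies between -1 and 1, which matches the training targets of 1 (active TB) and -1 (otherwise).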
The risk group assignment was obtained by
means of a modified ART clustering procedure
(Vassali et al., 2002). Risk group assignment was
validated by TB experts, as the symptoms identified in
each risk group are also considered by the TB
experts in a detailed exam.
Due to the limited statistics of the database, cross
validation (Kohavi, 1995) was used for defining both the
training (network design) and testing (performance
evaluation) sets. For each cross validation test, the
training set comprised 80% of the patients and the
remaining 20% formed the test set. Performance
was evaluated in terms of sensitivity and specificity
for the testing set. Considering twelve input
variables, it was possible to obtain both high
sensitivity (100%) and specificity (80%).
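The split and the two performance figures can be sketched as follows. The function names, the random seed, and the toy labels are hypothetical; sensitivity is the fraction of active TB cases classified as TB, and specificity is the fraction of non-TB cases classified as non-TB.

```python
import random

def split_80_20(records, seed):
    """Shuffle the records and return (training, testing) with an 80/20 split."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def sensitivity_specificity(targets, predictions):
    """Targets and predictions use 1 for active TB and -1 otherwise."""
    pairs = list(zip(targets, predictions))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# 136 patient indices, as in the data set described above.
train, test = split_80_20(list(range(136)), seed=0)

# Toy labels, only to exercise the function.
sens, spec = sensitivity_specificity([1, 1, -1, -1, -1], [1, 1, -1, 1, -1])
```

Repeating the split with different seeds gives one training and testing pair per cross validation test.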
3 THE NEURALTB WEB SYSTEM
Within this project, our group aims at providing an
open and secure platform that supports an efficient
and fast distribution of collaborative applications.
A loosely coupled architecture allows integration with
other systems, dynamic processes, and
heterogeneous data repositories. The computing
solution must also provide good connectivity to
data placed anywhere. These requirements
guarantee the accessibility of the system in both
health care units and hospitals, independently of their
location. The software group used interoperable
technologies for the system development.
Initially, a system version that could work over
the Internet was developed. The health care units
would only need a browser and an Internet
connection. All data would be stored, and all
processing performed, on a central
server. For the units that do not have Internet access,
a local version is used. In this version, the neural network program
is executed locally and the result is stored
together with the patient data. The information that
is stored in the computers of the health care units is
periodically transferred to a central server that
collects the data into a main repository.
The local version does not require
Internet access for its execution. However, it is more
laborious to update and maintain because of
its geographical distribution. On the other hand, the
version that works over the Internet avoids
compatibility problems since a single version runs
on the server. It also facilitates data transfer from
the health care units to the central repository.
Within the system, there are three user
categories: administrator, attendant, and physician.
Administrators can insert new users, modify user
attributes, and perform actions related to data files
and system installation. Attendants may enter and
edit patients' personal data and symptoms.
Physicians perform actions on patient data and are
the only ones to have access to the network output.
3.1 Input Data Form
In order to fill in the questionnaire, attendants are
taught to analyse the individual’s physical condition, to
give further information about each question, and to
explain the importance of providing the correct
answer. To facilitate data
input into the system, a hypertext form was
designed. The form is composed of text fields for
the patient name and date of birth. Other
items allow the selection of only one option among
three available alternatives (lack of an indication,
presence of the symptom, and the patient does not
know the answer). When items are related,
the choice of an option automatically forces
the selection of the corresponding option in another item.
For example, if hemoptysis (coughing up blood) is
chosen as an existing symptom, then the existence of
cough also has to be selected.
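The dependency between items can be sketched as a small rule table. The table layout and function name are hypothetical; only the hemoptysis and cough pair comes from the text, and the codes follow the -1, 1, 0 scheme of Section 2.2.

```python
# Hypothetical rule table: when the key symptom is present (code 1),
# the listed symptoms are forced to present as well.
IMPLIES = {"hemoptysis": ["cough"]}

def apply_dependencies(answers):
    """Return a copy of the answers with implied symptoms set to 1 (present)."""
    result = dict(answers)
    for symptom, implied in IMPLIES.items():
        if result.get(symptom) == 1:
            for other in implied:
                result[other] = 1
    return result

form = apply_dependencies({"hemoptysis": 1, "cough": -1, "fever": 0})
```

In the Web form this kind of rule would run in the browser as soon as an option is selected; the sketch only shows the logic.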
Two kinds of support needed during
questionnaire filling were implemented. The first
kind deals with typing errors, data mismatches,
and correlations between items. As an example,
empty fields are not accepted. Concerning the date of
ICEIS 2007 - International Conference on Enterprise Information Systems
birth, the system automatically validates the days
and months, leap years, etc. Then, the system
calculates the patient age, which can be confirmed right
away. In case of errors, an alert window pops up
reporting the mistake.
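A date check and age calculation of this kind could look like the sketch below, which relies on the standard `datetime.date` constructor to reject impossible dates such as 29 February in a non-leap year. The function names are hypothetical.

```python
from datetime import date

def parse_birth_date(year, month, day):
    """Return a date, or None when the day/month/year combination is invalid."""
    try:
        return date(year, month, day)
    except ValueError:
        return None

def age_on(birth, today):
    """Completed years between the birth date and the given day."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)

birth = parse_birth_date(1960, 6, 15)
age = age_on(birth, date(2007, 6, 14))
```

A `None` result corresponds to the case in which the form would raise its alert window.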
The second kind of support appears as an alert
window with further information about the item,
explaining how to ask the question and how to
interpret the answer. The specification of this type of
support required the extraction of the knowledge
used during an anamnesis interview. Hendriks and
Vriens (1999) and Probst et al. (1999) suggest a basic
set of fundamental activities to systematically
manage knowledge: identify important knowledge
that can be used; capture and store useful knowledge
in a repository; and maintain knowledge in the storage
area through update or removal of outdated
information.
After the inclusion of a new patient, the
system automatically executes the neural network
program and stores the output together with the
information that was entered through the form. Later
on, physicians may analyze the result together with
other clinical and laboratory information.
3.2 Patient Monitoring
In order to retrieve data from the system repository,
the user may define one or more attributes, such as
cough, sputum, fever, etc., and associate each with a
specific value. The attributes operate as filters that
select the patients whose data fit the query. The
“+” option allows the definition of other conditions
in the query, i.e., the query can combine several
attributes using logical operators (and, or). A
condition within a query can be removed by
selecting the “-” option.
If the option “Search for” is selected
without specifying a value for any attribute, all patients
and their respective data are presented, ordered by name,
in a table format. The attribute names are placed in
the heading of the table. The system also provides a
facility through which a physician can mark a patient
as already analyzed. Therefore, it is also possible to
search for patients whose data have not been investigated
yet, which supports information management.
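The query mechanism can be sketched as a list of conditions evaluated against each patient record. The record layout and function names are hypothetical, and conditions are combined strictly from left to right without operator precedence, which is an assumption of this sketch rather than a documented behaviour of the system.

```python
def matches(patient, conditions):
    """Evaluate conditions left to right.

    Each condition is (operator, attribute, value); the operator of the
    first condition is ignored. An empty condition list matches everyone.
    """
    result = None
    for operator, attribute, value in conditions:
        hit = patient.get(attribute) == value
        if result is None:
            result = hit
        elif operator == "and":
            result = result and hit
        else:  # "or"
            result = result or hit
    return True if result is None else result

def search(patients, conditions):
    """Return the matching patients ordered by name."""
    found = [p for p in patients if matches(p, conditions)]
    return sorted(found, key=lambda p: p["name"])

patients = [
    {"name": "B", "cough": 1, "fever": -1, "analyzed": False},
    {"name": "A", "cough": 1, "fever": 1, "analyzed": True},
    {"name": "C", "cough": -1, "fever": 1, "analyzed": False},
]
both = search(patients, [(None, "cough", 1), ("and", "fever", 1)])
pending = search(patients, [(None, "analyzed", False)])
```

The `analyzed` attribute stands for the flag that a physician sets on a patient, so the same filter mechanism also finds the records that still need attention.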
3.3 Probabilities and Risk Groups
When the neural network is fed with the data of a new patient,
it provides as output the classification
probability that the patient has TB or not. In
case of TB identification, the system also provides
the risk group to which the patient belongs.
If the neural network classifies the patient
as having TB, the output is presented as the
sentence “the patient has P% of having the TB”,
where “P%” represents the probability of active TB
according to the artificial neural model. On the other
hand, if the neural network classifies the
patient as not having TB, the output is
presented as the sentence “the patient has P% of not
having the TB”, where “P%” represents the
probability of no active TB, according to the model.
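How the network output becomes the value P is not detailed here. The sketch below assumes, purely for illustration, that the tanh output in [-1, 1] is rescaled linearly to [0, 1] and read as the probability of active TB; the function name is hypothetical and the sentences are the ones quoted above.

```python
def report(output):
    """Turn a network output in [-1, 1] into the sentence shown to the physician.

    Assumption: a linear rescaling to [0, 1] is read as the probability of
    active TB. The actual mapping used by the system is not stated.
    """
    p_tb = (output + 1.0) / 2.0
    if p_tb >= 0.5:
        return "the patient has {:.0f}% of having the TB".format(100 * p_tb)
    return "the patient has {:.0f}% of not having the TB".format(100 * (1 - p_tb))

positive = report(0.8)
negative = report(-0.5)
```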
The risk groups are presented graphically,
like a traffic light, using a universal color code. A
patient fits in only one of the three risk groups, which
are drawn as circles painted green, yellow, and
red to symbolize, respectively, low,
medium, and high risk, as presented in Figure 1. The
patient is represented in the figure by the letter “x”,
and the closer he/she is to the center, the higher the
probability that the patient belongs to the group.
Figure 1: Risk group representation (in grey scale).
3.4 File Management
The NeuralTB Web system was designed to operate
in geographically distributed environments.
Therefore, the information obtained during the
anamnesis interview, together with the neural
network output, is stored in a file. The files
from different units are transferred to a central
repository where all patient information is inserted
into a database. In case of changes in the patient
data, the system manages the records to be copied or
sent again in order to update the main database. For
security reasons, NeuralTB also provides a backup
functionality.
The central repository stores the names of the
health care units, associating the patients' data with
the units where their information was collected.
Further information about each unit's region, such as
the geographical relation of health conditions to socio-
economic status and poverty rates, may be specified,
structured, and integrated into the database.
Medical and health professionals of TB research
groups can access the central repository to extract
information that can help in the development of new
disease analysis methods and the identification of
new demographic and risk factors that can be used
later for tuberculosis diagnosis.
4 THE IMPLEMENTATION
The data representation format is an important
aspect that was considered in order to manage all the
information efficiently. Markup languages, such as XML, can
be used to describe knowledge structures and to
support institutional memory development
(Rabarijaona et al., 2000; Cook, 2000). XML may
provide a standard structure to communicate and
interchange data and knowledge among diverse
systems. The language allows the creation of
multiple views of the same item and also provides
an easy mechanism to capture, store, present, and
retrieve information. Considering these benefits, we
developed an XML-based approach to describe the
different types of information manipulated during
the whole process of SNPT diagnosis.
The group identified three stages where data had
to be properly represented: during the anamnesis
interview, when describing the patient data, and when
extracting statistical information within research
activities. TB specialists warned that risk factors, the
questions asked of the patients, and relationships
among the stored records may vary according to
location or other factors, such as multidrug
resistance (MDR), which is one of the main causes of
ineffective treatment of new TB cases. Therefore,
the use of XML facilitates the maintenance of the
knowledge represented in the three stages. The tags
identify the current data, and new tags can be easily
defined. The language also allows the definition of
associations among diverse types of information.
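As an illustration of how a patient record might be described with tags, the sketch below builds a small XML element with Python's standard `xml.etree.ElementTree` module. The tag and attribute names are hypothetical; the actual NeuralTB schema is not given in this paper.

```python
import xml.etree.ElementTree as ET

def patient_to_xml(name, birth_date, symptoms):
    """Build a <patient> element; tag and attribute names are hypothetical."""
    patient = ET.Element("patient")
    ET.SubElement(patient, "name").text = name
    ET.SubElement(patient, "birthDate").text = birth_date
    items = ET.SubElement(patient, "symptoms")
    for symptom, code in symptoms.items():
        # Codes follow the -1 / 1 / 0 scheme of Section 2.2.
        ET.SubElement(items, "symptom", name=symptom, value=str(code))
    return patient

record = patient_to_xml("Jane Doe", "1960-06-15", {"cough": 1, "fever": -1})
xml_text = ET.tostring(record, encoding="unicode")
```

Adding a new symptom or risk factor then only requires a new `symptom` entry, without changing the structure of the record.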
In order to assure compatibility between the
data structure and the system functionalities, the
NeuralTB interface and operations were conceived
and designed to guarantee their correct
execution independently of both the way
information is organized and the kind of records that
are manipulated. This requirement is achieved by
creating the interface to the system operations at
the moment the application is executed. The
interface reads the XML and presents all commands
associated with the tags. So, if a record type
is excluded, the system will not perform any
operation related to this information. On the other
hand, if a new record type is included, it is
mandatory to define both the tag that identifies the
data and the corresponding operation.
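The idea of building the interface at run time from the tags can be sketched with a dispatch table that maps each tag to an operation. The tag names, the operations, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical operations, keyed by the tag of the record type they handle.
OPERATIONS = {
    "symptom": lambda element: "edit symptom " + element.get("name"),
    "riskFactor": lambda element: "edit risk factor " + element.get("name"),
}

def build_interface(xml_text):
    """List the commands to offer, one per element whose tag has an operation.

    Elements with an unregistered tag are skipped, so a new record type
    needs both its tag and an entry in OPERATIONS before it shows up.
    """
    root = ET.fromstring(xml_text)
    return [OPERATIONS[child.tag](child) for child in root if child.tag in OPERATIONS]

description = (
    "<form>"
    '<symptom name="cough"/>'
    '<riskFactor name="alcoholism"/>'
    '<note name="free text"/>'
    "</form>"
)
commands = build_interface(description)
```

Removing the `riskFactor` element from the description removes its command from the interface without any change to the code.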
Another advantage of using XML is that it
facilitates the integration of data that come
from different health care units and hospitals.
Markup languages make it easy to combine
heterogeneous records. The use of XML also supports
uniform interoperability between systems and offers efficient
mechanisms for information retrieval.
The system was designed in modules to facilitate
its integration with other applications. The interface
between the system and the neural network
program is also defined through an XML file. This
file describes the name of the application, the
neural network weight vector to be used, and the
output. This approach makes it easy for users to
execute a different neural network program or
update the weight vector.
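A file of this kind might be read as in the sketch below. The element names, the application name, the output file name, and the weight values are all hypothetical placeholders; only the three pieces of information (application, weight vector, output) come from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical interface file; names and values are placeholders.
CONFIG = (
    "<network>"
    "<application>neuraltb_net</application>"
    "<weights>0.5 -0.25 0.75</weights>"
    "<output>result.xml</output>"
    "</network>"
)

def load_network_config(xml_text):
    """Read the application name, weight vector, and output location."""
    root = ET.fromstring(xml_text)
    return {
        "application": root.findtext("application"),
        "weights": [float(w) for w in root.findtext("weights").split()],
        "output": root.findtext("output"),
    }

config = load_network_config(CONFIG)
```

Swapping the neural network program or updating the weights then amounts to editing this file, with no change to the Web system itself.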
4.1 Computing Requirements
The NeuralTB Web System runs on the Apache
HTTP Server under both UNIX and Windows XP
operating systems. The system provides a shell
executable with setup programs that automatically
install a directory structure and the respective files on the
computer of the health care unit or hospital. The
hardware requirements are: a PC with a USB
drive for file transfer (in the case of the local version) or an
Internet connection, and 128 MB or,
preferably, 256 MB of RAM.
The system operations were implemented as CGI
(Common Gateway Interface) programs, using the C
language. The JavaScript language is used to write
functions that are embedded in HTML pages and interact
with the Document Object Model (DOM) of the
page to perform tasks not possible in HTML alone.
The Cascading Style Sheets (CSS) language is used
to style the web pages written in HTML and to format
the XML documents.
In order to draw the risk group representation,
the GD graphics library was used. GD is an open
source library for the dynamic creation of
images, allowing programmers to easily generate
PNG, JPEG, and GIF images (among other formats)
from many different programming languages (such as C,
Perl, and PHP).
The central repository was implemented using
MySQL, an open source relational database
management system (RDBMS) that uses Structured
Query Language (SQL).
5 CONCLUSIONS
Decision support systems can be considered
useful elements for helping physicians with
tuberculosis diagnosis. The application can be used
as a learning tool since it gathers information,
defined by experts, that is needed for the
tuberculosis diagnosis.
The NeuralTB system can be easily installed in
hospitals or health care units and can also be
executed in portable computers that are carried to
different regions. The approach used to incorporate the
knowledge into the system, which allows easy
maintenance of the information, supports the
longevity of the proposal.
Currently the NeuralTB system is being installed
in health care units in Rio de Janeiro, the
city with the most TB cases in Brazil. This effort will
facilitate the deployment of a network to integrate
diverse professionals and specialists in tuberculosis.
During the system operation we will be able to
evaluate the impact of this initiative.
As next steps, we intend to integrate the
NeuralTB input data form with other questionnaire
items used during an anamnesis interview. More precisely,
the proposal is to integrate the input form with the
system that is used at the hospital reception. As a
result, attendants will use a single environment
to register all data related to patients. Another
enhancement is to develop queries on the central
database to extract the information that comes from
the various health care units. The knowledge of
which information should be extracted can also be
modelled and incorporated into the repository. Data
quality metrics (Chapman, 2005) will also be
applied to ensure the quality of network information. This
is quite important, as network performance relies on
the accuracy of questionnaire answers. The
continuous update of the neural model with
incoming new data is also being developed. This
involves stability studies and the monitoring of the main TB
features, in order to track the evolution of the disease over
time and geographically.
We expect that the accomplishments of this
project will bring social benefits, allow a better
integration of information technology in the
diagnosis domain, and provide an infrastructure to
enable efficient communication and information
exchange among tuberculosis experts.
ACKNOWLEDGEMENTS
The authors thank the Tuberculosis Research Unit,
Faculty of Medicine, Federal University of Rio de
Janeiro, for making available the data used in this
work and CAPES, CNPq, and FAPERJ for
financially supporting this project.
REFERENCES
Castelo, A., Kritski, A.L., Werneck, A., Lemos, A.C.,
Ruffino Netto, A., et al., 2004. Brazilian Directives for
Tuberculosis. J Bras Pneumol, 30 (supl 1), 1-86. In
Portuguese.
Chapman, A., 2005. Principles of Data Quality, Report,
Global Biodiversity Information Facility.
Cook, J., 2000. XML Sets Stage for Efficient Knowledge
Management, IT professional, v.2, n.3, 55-57.
El-Solh, A.A., Hsiao, C.-B., Goodnough, S., Serghani, J.,
Grant, B.J.B., 1999. Predicting Active Pulmonary
Tuberculosis using an Artificial Neural Network.
Chest, 116, 968–973.
Hendriks, P., Vriens, D. 1999. Knowledge-Based Systems
and Knowledge Management: Friends or Foes?.
Information & Management, v.35, n.2 (Feb), 113-125.
Kohavi, R., 1995. A study of cross-validation and
bootstrap for accuracy estimation and model selection.
In International Joint Conference on Artificial
Intelligence.
Mello, F.C.Q., 2001. Smear Negative Pulmonary
Tuberculosis Predicting Models, Ph.D. Thesis,
Medicine Faculty, Federal University of Rio de
Janeiro, Brazil. In Portuguese.
Perkins, M.D., Kritski, A.L., 2002. Perspectives.
Diagnostic Testing in the Control of Tuberculosis. In:
Bull WHO, 80 (6), 512-513.
Probst, G., Raub, S., Romhardt, K. 1999. Managing
Knowledge: Building Blocks for Success, 368 pp,
ISBN: 0-471-99768-4.
Rabarijaona, A., Dieng, R., Olivier, C., Quaddari, R. 2000.
Building and Searching an XML-Based Corporate
Memory, IEEE Intelligent Systems, v.15, n.3 (May),
56-63.
Sarmiento, O., Weigle, K., Alexander, J., Weber, D.J.,
Miller, W., 2003. Assessment by Meta-Analysis of
PCR for Diagnosis of Smear-negative Pulmonary
Tuberculosis, Journal of Clinical Microbiology, 41,
3233-3240.
Santos, A.M. 2003. Neural Networks and Classification
Trees Applied to Smear Negative Pulmonary
Tuberculosis Diagnosis, Ph.D. Thesis, COPPE/ UFRJ,
Rio de Janeiro, Brazil. In Portuguese.
Santos, A.M., Pereira, B.B., Seixas, J.M., Mello, F.C.Q.,
Kritski, A.L., 2006. Neural Networks: an Application
for Predicting Smear Negative Pulmonary
Tuberculosis. In: Balakrishnan, N.; Auget, J.L.;
Mesbah, M.; Molenberghs, G. (org.). Advances in
Statistical Methods for The Health Sciences. 279-292.
Seixas, J.M., Calôba, L.P., Delpino, I., 1996. Relevance
Criteria for Variable Selection in Classifier Design. In:
International Conference on Engineering Applications
of Neural Networks, 451-454.
Vassali, M.R., Seixas, J.M., Calôba, L.P., 2002. A Neural
Particle Discriminator Based on a Modified Art
Architecture. In: IEEE International Symposium on
Circuits and Systems, v. II., 121-124.
World Health Organization (WHO), 2002. Stop TB annual
report 2001.