[MAJR] stands for “MeSH Major Topic” and asks the
system to retrieve only documents whose main subject is
a disease. The query provided us with 483,726 documents
(as of February 14th, 2007), which we downloaded to a
single file in MEDLINE format. We automatically processed
that file to obtain the titles and abstracts together with
their corresponding MeSH topics. This led us to 4,024
classes with at least one training sample each.
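For concreteness, the following is a minimal sketch of that
processing step, assuming the standard MEDLINE field tags TI
(title), AB (abstract) and MH (MeSH heading), with major topics
marked by an asterisk; function names are illustrative and the
actual script may differ.

def parse_medline(path):
    # Records are separated by blank lines; fields start with a tag
    # padded to four characters followed by "- "; wrapped lines are
    # indented continuations of the previous field.
    records, current, tag = [], {}, None
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():                  # blank line ends a record
                if current:
                    records.append(current)
                current, tag = {}, None
                continue
            if line[4:6] == "- " and line[:4].strip():
                tag, value = line[:4].strip(), line[6:].strip()
                current.setdefault(tag, []).append(value)
            elif tag:                             # continuation line
                current[tag][-1] += " " + line.strip()
        if current:
            records.append(current)
    return records

def training_samples(records):
    for rec in records:
        title = " ".join(rec.get("TI", []))
        abstract = " ".join(rec.get("AB", []))
        # major topics are the MeSH headings marked with an asterisk
        majors = [mh for mh in rec.get("MH", []) if "*" in mh]
        yield title + " " + abstract, majors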
With respect to the test set, we have used 400 medical
histories from the Department of Pathology of the
University of Pittsburgh School of Medicine
(http://path.upmc.edu/cases). Although the web page
contained 500 histories at that date, not all of them are
suitable for our purposes: some do not provide a concrete
diagnosis but only a discussion of the case, and others
have no direct match in the MeSH tree. We downloaded the
HTML cases and then converted them to text format, keeping
from each document the title and the whole clinical history,
including radiological findings, gross and microscopic
descriptions, etc. To obtain the expected output, we
extracted the top-level MeSH disease categories
corresponding to the diagnoses given in the titles of the
“final diagnosis” files (dx.html).
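As an illustration of how the expected output can be derived,
the sketch below takes the diagnosis from the <title> of a
dx.html page and maps it to top-level disease categories through
MeSH tree numbers (the C branch). The lookup table mesh_tree is
assumed to have been built beforehand from the MeSH distribution
files, and all names are illustrative.

import re

def expected_categories(dx_html, mesh_tree):
    # mesh_tree: descriptor name -> list of MeSH tree numbers, e.g.
    # "Pituitary Neoplasms" -> ["C04.588.322.609", "C10.228.140.617"]
    match = re.search(r"<title>(.*?)</title>", dx_html, re.I | re.S)
    diagnosis = re.sub(r"\s+", " ", match.group(1)).strip() if match else ""
    categories = set()
    for descriptor, tree_numbers in mesh_tree.items():
        if descriptor.lower() in diagnosis.lower():
            for number in tree_numbers:
                if number.startswith("C"):           # diseases branch of MeSH
                    categories.add(number.split(".")[0])   # e.g. "C10"
    return diagnosis, sorted(categories)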
To select a proper ranking algorithm, we have surveyed
several decades of literature on text classification and
category ranking in search of the most suitable one. We have
chosen the Sum of Weights (SOW) approach (Ruiz-Rico et al.,
2006), which is more suitable than the alternatives because
of its simplicity, efficiency, accuracy and incremental
training capability. Since medical databases are frequently
updated and grow continuously, we have preferred a fast and
unattended approach that lets us perform updates easily with
no substantial performance degradation as the number of
categories or training samples increases. The prohibitive
complexity of other classifiers such as SVM could make the
problem intractable, as stated in (Ruch, 2005).
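For illustration only, the sketch below implements a generic
sum-of-weights ranker of this kind: each (term, category) pair
accumulates a weight during training, and a document is scored
per category by summing the weights of its terms. The weighting
is deliberately simplified and does not reproduce the exact
formulation of (Ruiz-Rico et al., 2006); the point is the
incremental-update property, since adding a document only touches
the counters of its own terms and categories.

from collections import defaultdict

class SumOfWeightsRanker:
    def __init__(self):
        # term -> {category -> accumulated weight}
        self.weights = defaultdict(lambda: defaultdict(float))

    def train(self, text, categories):
        # incremental: only the (term, category) pairs of this
        # document are updated
        terms = text.lower().split()
        for term in terms:
            for category in categories:
                self.weights[term][category] += 1.0 / len(terms)

    def rank(self, text):
        # score each category by the sum of its term weights
        scores = defaultdict(float)
        for term in text.lower().split():
            for category, weight in self.weights.get(term, {}).items():
                scores[category] += weight
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)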
To evaluate the worth of our proposal, we have measured
accuracy through three common ranking performance measures
(Ruiz-Rico et al., 2006): precision at recall = 0 (P_{r=0}),
mean average precision (AvgP) and the precision/recall
break-even point (BEP). Sometimes only one diagnosis is valid
for a particular patient. In these cases, P_{r=0} lets us
quantify the mistaken answers, since it indicates the
proportion of correct topics given at the top-ranked position.
To assess the quality of the full ranking list, we use AvgP,
since it goes down the ranked list averaging precision until
all possible answers are covered.
BEP is the value where precision equals recall, that is, when
we take the number of relevant topics as the cut-off threshold.
To follow the same procedure as (Joachims, 1998), the
performance evaluation has been computed over the top-level
disease categories.

Figure 1: Example of the first level of a hierarchical diagnosis.

Figure 2: Output example after manual expansion of high-ranked
topics (top) and after selecting the flat diagnosis mode (bottom).
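As a concrete illustration of the three measures, the sketch
below computes them for a single test case from the ranked list
of predicted categories and the set of correct ones; corpus-level
figures are then means over all test cases. Function names are
illustrative.

def p_at_r0(ranking, relevant):
    # precision at recall = 0: is the top-ranked category a correct one?
    return 1.0 if ranking and ranking[0] in relevant else 0.0

def average_precision(ranking, relevant):
    # walk down the ranked list, averaging precision at each correct item
    hits, accumulated = 0, 0.0
    for position, category in enumerate(ranking, start=1):
        if category in relevant:
            hits += 1
            accumulated += hits / position
    return accumulated / len(relevant) if relevant else 0.0

def break_even_point(ranking, relevant):
    # cut the ranking at |relevant|, the point where precision equals recall
    k = len(relevant)
    return sum(1 for c in ranking[:k] if c in relevant) / k if k else 0.0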
2.1 Availability and Requirements
No special hardware or software is necessary to interact with
the assistant; an Internet connection and a standard browser
are enough to access it on-line at the following site:
www.dlsi.ua.es/omda/index.php.
By using a web interface and presenting results in text
format, we allow users to access the system from many types of
portable devices (laptops, PDAs, etc.). Moreover, they will
always have the latest version available, with no need to
install specific applications or software updates.
3 AN EXAMPLE
One of the 400 histories included in the test set looks
as follows:
Case 177 – Headaches, Lethargy and a Sellar/Suprasellar Mass