IMPROVING THE RESULTS OF THE CONTENT-BASED IMAGE
QUERY ON MEDICAL IMAGERY
Liana Stanescu, Dan Dumitru Burdescu, Anca Ion, Marius Brezovan
Faculty of Automation, Computers and Electronics, University of Craiova, Bvd. Decebal, Craiova, Romania
Keywords: Image feature extraction, image processing, content-based visual query, color, color texture, histogram,
co-occurrence matrices.
Abstract: The article presents a solution for raising the quality of content-based image query, namely the
number of relevant images retrieved from the database for a query image, in the case of color medical
images. The solution combines content-based image query on the color feature with query on the color
texture feature. Studies of content-based image query were carried out on color images of the digestive
tract acquired with an endoscope. The color information is represented by color histograms computed
in the HSV color space quantized to 166 colors. The color texture is represented by co-occurrence
matrices. To compute the dissimilarity between images, histogram intersection is used for color and
the Euclidean distance for color texture. The union of the results obtained with the two query methods,
run in parallel, yields a greater number of relevant retrieved images. The reason is that the considered
diseases generally change both the color and the texture of the affected tissue.
1 INTRODUCTION
The development of the multimedia field and the
creation of large new image and video archives
have led many researchers, over the past decade,
to turn their attention towards creating new tools
for retrieving visual data based on their content
(Del Bimbo, 2001).
Retrieving visual information is important in
many applications, ranging from the artistic domain
(art galleries, museums) to the security and medical
fields, which are in fact the most important.
Visual information retrieval represents a new
research direction in information technology. Its
purpose is to retrieve from a database the relevant
images for a query.
It is in fact an extension of the traditional
process of information retrieval to visual media.
For a computer, an image is only a sequence of
binary numbers or a two-dimensional array.
Recognizing images and objects by computer in
this kind of application is a difficult matter. This
is because the information in multimedia data is not
structured, and therefore it is not possible to rely
on attributes describing its content.
As a result, the extraction of data that describe as
accurately as possible the visual content is essential.
Visual elements such as color, texture, shape that
directly describe the visual content, and also high-
level concepts (for example the significance of the
objects) are used for retrieving images with a similar
content from the database (Del Bimbo, 2001).
One of the domains in which content-based visual
retrieval is needed is the medical one. This is
mainly because, in the process of patient diagnosis,
medical devices that provide images to the doctor
are used on a large scale (computed tomography
scanner, endoscope, X-ray, ultrasound scanner,
etc.). There are hospitals in which more than 10000
images are gathered daily (Muller et al, 2004). This
has led to very large medical image databases.
Besides traditional information retrieval in these
databases (by patient name, doctor name or
diagnosis), a content-based visual query is
necessary for the following reasons:
- In conversations with doctors, the following
  situation comes up frequently: the doctor views a
  medical image and is not sure of the diagnosis,
  but is aware of having seen something similar and
  has no means to search for it in the database.
  The problem can be solved by using that image as
  the query image; the content-based image query
  will return the similar images from the database.
  It is very likely that the retrieved images
  include the one sought, together with its
  diagnosis, observations and treatment, so the
  content-based image query can be used directly in
  the diagnosis process.
- In other cases it may be necessary to specify a
  region of an image as the query and to retrieve
  all images containing a similar region. In this
  second case, an automated algorithm for the
  correct extraction of color regions is very
  important.
- Education and research can be improved by using
  visual access methods.
- The visual characteristics allow retrieving not
  only the patients having the same disease, but
  also the cases where visual similarity exists but
  the diagnosis differs.

Stanescu L., Dumitru Burdescu D., Ion A. and Brezovan M. (2006).
IMPROVING THE RESULTS OF THE CONTENT-BASED IMAGE QUERY ON MEDICAL IMAGERY.
In Proceedings of the Third International Conference on Informatics in Control, Automation and Robotics, pages 432-437
DOI: 10.5220/0001212604320437
Copyright © SciTePress

There are still few systems that are really
integrated into the medical diagnosis process, and
work on applying the most suitable image processing
and feature extraction algorithms continues
(Muller et al, 2004).
Research has shown that the methods used for
content-based image query on common images
(from nature) do not give equally good results on
medical images (Stanescu and Burdescu, 2003).
Therefore, the methods need to be adapted to the
specific diagnosis.
On gray-level images, content-based image query
based on texture or shape can be applied. A large
part of the images produced by medical devices are
color images, in which case the color, color texture
and shape characteristics must be considered.
In this article, the research was carried out on
color images of the digestive tract acquired with an
endoscope and stored in a database, on which
content-based image query on color and color
texture features is applied.
The paper emphasizes that using the color and
color texture features in content-based image query
leads to better results for some diseases. Some
diseases are characterized by a change in the color
and the texture of the affected tissue, for example:
polyps, colitis, esophagitis, ulcer and ulcerous
tumor.
2 CONTENT-BASED IMAGE
QUERY ON COLOR FEATURE
Color is the visual feature most immediately
perceived in an image. In content-based visual
query on the color feature, the color space used and
the level of quantization, meaning the maximum
number of colors, are important. This study
represents images in the HSV color space, which has
the properties of being complete, compact, natural
and uniform, quantized to 166 colors (Smith, 1997).
Color histograms are the traditional method of
describing the color properties of images. They are
easy to compute and, up to a certain point, are
insensitive to camera rotation, zooming and changes
in image resolution (Del Bimbo, 2001).
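The transformation from the RGB color space to the
HSV color space is given by the following equations
(Smith, 1997). Let $v_c = (r, g, b)$ be a color point
in RGB color space and $w_c = (h, s, v)$ the color
point transformed to HSV color space, where
$w_c = T_c(v_c)$. For $r, g, b \in [0 \ldots 1]$,
$T_c$ gives $h, s, v \in [0 \ldots 1]$:

\[
\begin{aligned}
v &= \max(r, g, b), \qquad s = \frac{v - \min(r, g, b)}{v}, \\
r' &= \frac{v - r}{v - \min(r, g, b)}, \qquad
g' = \frac{v - g}{v - \min(r, g, b)}, \qquad
b' = \frac{v - b}{v - \min(r, g, b)}, \\
6h &= \begin{cases}
5 + b' & \text{if } r = \max(r, g, b) \text{ and } g = \min(r, g, b), \\
1 - g' & \text{if } r = \max(r, g, b) \text{ and } g \neq \min(r, g, b), \\
1 + r' & \text{if } g = \max(r, g, b) \text{ and } b = \min(r, g, b), \\
3 - b' & \text{if } g = \max(r, g, b) \text{ and } b \neq \min(r, g, b), \\
3 + g' & \text{if } b = \max(r, g, b) \text{ and } r = \min(r, g, b), \\
5 - r' & \text{otherwise.}
\end{cases}
\end{aligned}
\tag{1}
\]

The procedure of quantization of the HSV color
space to 166 colors is: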
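Proc quantize
    ' h, s, v are in [0, 1]; color is the resulting index in 0..165
    color = 0
    h_scale = 1 / 18
    s_scale = 1 / 3
    v_scale = 1 / 3
    If s = 0 Then
        ' achromatic pixel: one of the 4 gray levels 162..165
        color = 162 + Int(v / (1 / 4))
        If color = 166 Then
            color = color - 1
        End If
    Else
        ' value: 3 levels, 54 colors each (v = 1 falls in the last level)
        If Int(v / v_scale) >= 1 Then
            color = color + 54 * Int(v / v_scale)
            If color Mod (3 * 18 * 3) = 0 Then
                color = color - 3 * 18
            End If
        End If
        ' saturation: 3 levels, 18 colors each
        If Int(s / s_scale) >= 1 Then
            color = color + 18 * Int(s / s_scale)
            If color Mod (3 * 18) = 0 Then
                color = color - 18
            End If
        End If
        ' hue: 18 levels
        If Int(h / h_scale) >= 1 Then
            color = color + Int(h / h_scale)
            If color Mod 18 = 0 Then
                color = color - 1
            End If
        End If
    End If
End Proc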
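The transformation of equation (1) and the quantization procedure above can be sketched in Python as follows. This is only an illustrative sketch: the function names `rgb_to_hsv` and `quantize_hsv` are hypothetical, the hue of gray pixels (undefined in equation (1)) is assumed to be 0, and the clamping with `min` is assumed to be equivalent to the `Mod` corrections of the pseudocode.

```python
def rgb_to_hsv(r, g, b):
    """Equation (1): map r, g, b in [0, 1] to h, s, v in [0, 1]."""
    v = max(r, g, b)
    lo = min(r, g, b)
    if v == lo:
        # Black or gray: s = 0 and the hue is undefined; 0 is an assumed default.
        return 0.0, 0.0, v
    s = (v - lo) / v
    rp = (v - r) / (v - lo)
    gp = (v - g) / (v - lo)
    bp = (v - b) / (v - lo)
    if r == v and g == lo:
        six_h = 5 + bp
    elif r == v:
        six_h = 1 - gp
    elif g == v and b == lo:
        six_h = 1 + rp
    elif g == v:
        six_h = 3 - bp
    elif b == v and r == lo:
        six_h = 3 + gp
    else:
        six_h = 5 - rp
    return six_h / 6.0, s, v


def quantize_hsv(h, s, v):
    """Map an HSV point to one of 166 colors: 18 x 3 x 3 chromatic + 4 grays."""
    if s == 0:
        # Gray levels occupy the indices 162..165.
        return 162 + min(int(v * 4), 3)
    h_idx = min(int(h * 18), 17)  # 18 hue levels
    s_idx = min(int(s * 3), 2)    # 3 saturation levels
    v_idx = min(int(v * 3), 2)    # 3 value levels
    return 54 * v_idx + 18 * s_idx + h_idx
```

A pixel would be processed as `quantize_hsv(*rgb_to_hsv(r, g, b))`, with the RGB components first scaled to [0, 1].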
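To compute the distance between the color
histograms of the query image and the target image,
the intersection of the histograms is used (Smith,
1997):

\[
d_{q,t} = 1 - \frac{\sum_{m=0}^{M-1} \min\left(h_q[m], h_t[m]\right)}{\min\left(|h_q|, |h_t|\right)}
\tag{2}
\]

where $h_q$ and $h_t$ are the histograms of the query
and target images, $M$ is the number of colors and
$|h|$ is the sum of the bins of histogram $h$.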
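As an illustration of equation (2), the histogram and the distance could be computed along the following lines. The function names are hypothetical, the histograms are plain lists of bin counts, and returning the maximum distance for an empty histogram is an assumption, not something stated in the paper.

```python
def color_histogram(color_indices, bins=166):
    """Count how many pixels fall into each quantized color."""
    hist = [0] * bins
    for c in color_indices:
        hist[c] += 1
    return hist


def histogram_intersection_distance(h_q, h_t):
    """Equation (2): 0 for a complete overlap, 1 for disjoint histograms."""
    if len(h_q) != len(h_t):
        raise ValueError("histograms must have the same number of bins")
    overlap = sum(min(a, b) for a, b in zip(h_q, h_t))
    smaller = min(sum(h_q), sum(h_t))
    if smaller == 0:
        return 1.0  # assumed convention for an empty histogram
    return 1.0 - overlap / smaller
```

Dividing by the smaller of the two histogram sums means that images of different sizes can still be compared.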
3 CONTENT-BASED IMAGE
QUERY ON COLOR TEXTURE
FEATURE
Together with color, texture is a powerful
characteristic of an image, present in nature and
medical images, where a disease can be indicated by
changes in the color and texture of a tissue.
It is difficult to describe image texture in
words. Still, there are representations of texture
based on statistical and structural properties of
brightness patterns. A series of methods for
extracting texture features have been studied
(Del Bimbo, 2001). One of the most representative
methods of texture detection uses co-occurrence
matrices.
There are many techniques for texture extraction,
but no single method can be considered the most
appropriate; the choice depends on the application
and the type of images considered.
Although most images coming from nature and
other fields are color images, the majority of
research has been done on grayscale textures, for
several reasons: the high cost of color cameras, the
high computational cost of color image processing,
and the great complexity of even grayscale textures.
However, over the past few years, research has been
done on color texture recognition, showing that
taking the color information into account improves
color texture classification (Palm et al., 2000,
Zhang et al., 2000).
For an image $f(x, y)$, the co-occurrence matrix
$h_{d\varphi}(i, j)$ is defined so that each entry
$(i, j)$ is equal to the number of times that
$f(x_1, y_1) = i$ and $f(x_2, y_2) = j$, where
$(x_2, y_2) = (x_1, y_1) + (d\cos\varphi, d\sin\varphi)$
(Del Bimbo, 2001).
In the case of color images, one matrix was
computed for each of the three channels (R, G, B).
This leads to three square matrices of dimension
equal to the number of color levels present in an
image (256 in our case) for each distance d and
orientation φ.
The classification of texture is based on the
characteristics extracted from the co-occurrence
matrix: energy, entropy, maximum probability,
contrast, inverse difference moment and correlation
(Del Bimbo, 2001).
1. Energy
\[ \sum_{a,b} P_{d,\Phi}^{2}(a, b) \tag{3} \]
2. Entropy
\[ -\sum_{a,b} P_{d,\Phi}(a, b) \log_2 P_{d,\Phi}(a, b) \tag{4} \]
3. Maximum probability
\[ \max_{a,b} P_{d,\Phi}(a, b) \tag{5} \]
4. Contrast
\[ \sum_{a,b} |a - b|^{k} \, P_{d,\Phi}^{\lambda}(a, b) \tag{6} \]
5. Inverse difference moment
\[ \sum_{a,b;\, a \neq b} \frac{P_{d,\Phi}^{\lambda}(a, b)}{|a - b|^{k}} \tag{7} \]
ICINCO 2006 - ROBOTICS AND AUTOMATION
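6. Correlation
\[
\frac{\sum_{a,b} \left[ (a b) \, P_{d,\Phi}(a, b) \right] - \mu_x \mu_y}{\sigma_x \sigma_y}
\tag{8}
\]
where the means and standard deviations are defined
as:
\[
\begin{aligned}
\mu_x &= \sum_a a \sum_b P_{d,\Phi}(a, b), &
\mu_y &= \sum_b b \sum_a P_{d,\Phi}(a, b), \\
\sigma_x &= \sqrt{\sum_a (a - \mu_x)^2 \sum_b P_{d,\Phi}(a, b)}, &
\sigma_y &= \sqrt{\sum_b (b - \mu_y)^2 \sum_a P_{d,\Phi}(a, b)}.
\end{aligned}
\tag{9}
\]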
The three vectors of texture characteristics
extracted from the three co-occurrence matrices are
built from the 6 characteristics computed for d=1
and φ=0.
The texture similarity between the query image
Q and the target image T is computed with the
Euclidean metric.
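A minimal sketch of this comparison is given below, assuming that the three per-channel vectors of 6 characteristics are simply concatenated into one vector of 18 values before the distance is taken. The function names are hypothetical.

```python
import math


def texture_vector(features_r, features_g, features_b):
    """Concatenate the per-channel characteristics (6 + 6 + 6 = 18 values)."""
    return list(features_r) + list(features_g) + list(features_b)


def euclidean_distance(q, t):
    """Euclidean metric between two feature vectors of equal length."""
    if len(q) != len(t):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, t)))
```

Since the six characteristics have very different ranges, a real system might rescale them before computing the distance; the paper does not say whether this was done.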
The pseudocode of the algorithm that generates the
co-occurrence matrix is:
function computecoMatrix(double map[][], int xshift, int yshift,
                         int height, int width)
begin
  int total = 0, gray1, gray2;
  Matrix coMatrix(256, 256);
  for i = 0; i < height do
    for j = 0; j < width do
      // skip pixels whose displaced neighbor falls outside the image
      if (not((j + xshift >= width) || (j + xshift < 0) ||
              (i + yshift >= height) || (i + yshift < 0))) then
        gray1 = map[i][j];
        gray2 = map[i + yshift][j + xshift];
        coMatrix.set(gray1, gray2,
                     coMatrix.elementAt(gray1, gray2) + 1);
        total++;   // number of pixel pairs counted so far
      end;
    end;
  end;
  return coMatrix;
end;
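The same counting can be sketched in Python. The sketch assumes that one color channel is given as a list of rows of integer levels, and that d=1, φ=0 corresponds to `xshift=1`, `yshift=0`. The division by `total` in `normalize` is an assumption about how the counts become the probabilities P used in equations (3) to (9); the pseudocode computes `total` but does not show this step.

```python
def cooccurrence_matrix(channel, xshift=1, yshift=0, levels=256):
    """Count pairs (channel[i][j], channel[i + yshift][j + xshift])."""
    height = len(channel)
    width = len(channel[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for i in range(height):
        for j in range(width):
            i2, j2 = i + yshift, j + xshift
            # Skip pixels whose displaced neighbor lies outside the image.
            if 0 <= i2 < height and 0 <= j2 < width:
                counts[channel[i][j]][channel[i2][j2]] += 1
                total += 1
    return counts, total


def normalize(counts, total):
    """Turn pair counts into probabilities (assumed normalization step)."""
    if total == 0:
        return [[0.0] * len(row) for row in counts]
    return [[c / total for c in row] for row in counts]
```

For a color image the function would be called three times, once for each of the R, G and B channels.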
The algorithm that generates the 6 characteristics
(energy, entropy, maximum probability, contrast,
inverse difference moment and correlation) is:
function analysecoMatrix()
begin
  // w and h are the dimensions of coMatrix
  double sum = 0, sum_ab = 0, p;
  double miu_x = 0, miu_y = 0, tau_x = 0, tau_y = 0;
  double maxProb = 0, inverseDiff = 0, entropy = 0,
         energy = 0, contrast = 0, correlation = 0;
  String vectorsString;
  // first pass: characteristics (3)-(7) and the means of (9)
  for i = 0; i < w do
    for j = 0; j < h do
      p = coMatrix.elementAt(i, j);
      if (p > maxProb) then
        maxProb = p;
      end;
      inverseDiff += p / (1 + Math.abs(i - j));
      energy += p * p;
      contrast += (i - j) * (i - j) * p;
      if (p != 0) then
        sum += p * log(p);
      end;
      miu_x += i * p;
      miu_y += j * p;
      sum_ab += i * j * p;
    end;
  end;
  entropy = -sum;
  // second pass: the deviations need the final means
  for i = 0; i < w do
    for j = 0; j < h do
      p = coMatrix.elementAt(i, j);
      tau_x += (i - miu_x) * (i - miu_x) * p;
      tau_y += (j - miu_y) * (j - miu_y) * p;
    end;
  end;
  tau_x = Math.sqrt(tau_x);
  tau_y = Math.sqrt(tau_y);
  // equation (8)
  correlation = (sum_ab - miu_x * miu_y) / (tau_x * tau_y);
  vectorsString = maxProb + ";" +
    inverseDiff + ";" + entropy + ";" +
    energy + ";" + contrast + ";" +
    correlation + ";";
  output vectorsString;
end;
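For illustration, the six characteristics can also be computed in Python from a matrix of probabilities. This sketch makes several assumptions: the input `P` is already normalized so that its entries sum to 1, the contrast uses a squared difference and the inverse difference moment uses the denominator 1 + |a - b| as in the pseudocode above, the entropy uses the base-2 logarithm of equation (4), and the correlation is set to 0 when a deviation is 0. The function name is hypothetical.

```python
import math


def texture_features(P):
    """Return the six characteristics of a normalized co-occurrence matrix."""
    n = len(P)
    energy = entropy = max_prob = contrast = inv_diff = 0.0
    mu_x = mu_y = sum_ab = 0.0
    for a in range(n):
        for b in range(n):
            p = P[a][b]
            energy += p * p
            if p > 0:
                entropy -= p * math.log2(p)
            max_prob = max(max_prob, p)
            contrast += (a - b) ** 2 * p
            inv_diff += p / (1 + abs(a - b))
            mu_x += a * p
            mu_y += b * p
            sum_ab += a * b * p
    # The deviations need the final means, hence the second pass.
    var_x = var_y = 0.0
    for a in range(n):
        for b in range(n):
            var_x += (a - mu_x) ** 2 * P[a][b]
            var_y += (b - mu_y) ** 2 * P[a][b]
    sigma_x, sigma_y = math.sqrt(var_x), math.sqrt(var_y)
    if sigma_x > 0 and sigma_y > 0:
        correlation = (sum_ab - mu_x * mu_y) / (sigma_x * sigma_y)
    else:
        correlation = 0.0  # assumed value for a constant channel
    return [energy, entropy, max_prob, contrast, inv_diff, correlation]
```

Calling the function once per channel and concatenating the three results gives the 18 values of the texture vector.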
4 EXPERIMENTS
The experiments were performed under the following
conditions.
A database of 960 color images of the digestive
tract was created.
A software tool that processes each image was
developed. The tool executes the following steps:
1. the image is transformed from the RGB color
space to the HSV color space and quantized to
166 colors, and its color histogram is
computed;
2. the co-occurrence matrices are computed
for each component R, G, B and three
vectors containing the 6 characteristics
(energy, entropy, maximum probability,
contrast, inverse difference moment,
correlation) are generated; the matrices are
computed for d=1 and φ=0; in this case the
characteristics vector has 18 values;
3. the characteristics vectors generated at
points 1 and 2 are stored in the database.
To run a query, the procedure is:
- a query image is chosen;
- the dissimilarity between the query image
  and every target image in the database is
  computed for each of the two specified criteria
  (color histograms with 166 colors and the
  vector generated from the co-occurrence
  matrices);
- the images are displayed in 2 columns,
  corresponding to the 2 methods, in
  ascending order of the computed distance.
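The query procedure above amounts to sorting the database by distance to the query. A minimal sketch follows, assuming the stored characteristics are held in a dictionary from image identifier to feature vector and that the distance is passed in as a function; both the names and the data layout are hypothetical.

```python
def rank_by_distance(query_features, database, distance, top_k=5):
    """Return the identifiers of the top_k closest images, nearest first."""
    scored = [
        (distance(query_features, features), image_id)
        for image_id, features in database.items()
    ]
    scored.sort()  # ascending order of the computed distance
    return [image_id for _, image_id in scored[:top_k]]


def l1_distance(q, t):
    """Placeholder distance, used here only to exercise the ranking."""
    return sum(abs(a - b) for a, b in zip(q, t))
```

The function would be called twice per query, once with the color histograms and the histogram intersection and once with the texture vectors and the Euclidean distance, to fill the two result columns.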
For each query, the relevant images were
established. Each of the relevant images in turn
became a query image, and the final result for a
query is the average of these individual results.
The experimental results are summarized in table
1. Met 1 represents the query on the color feature
and Met 2 represents the query on the color texture
feature using co-occurrence matrices.
The values in the table represent the number of
relevant images among the first 5 images retrieved
for each query and each of the two methods.
Table 1: The experimental results.
Query            Met 1   Met 2
Polyps           3.3     2.8
Colitis          3.5     1.7
Ulcer            2.8     2.2
Ulcerous Tumor   2.6     1.5
Esophagitis      3.4     2.5
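An entry of Table 1 is an average of per-query counts, which could be computed as in the sketch below. The function names are hypothetical, and the sketch assumes that the query image itself counts as relevant when it appears among the first 5 results, as the counts reported for figure 1 suggest.

```python
def relevant_in_top(retrieved, relevant, k=5):
    """Number of relevant images among the first k retrieved images."""
    return sum(1 for image_id in retrieved[:k] if image_id in relevant)


def average_relevant(results_per_query, relevant, k=5):
    """Average the count over the result lists of several query images."""
    counts = [relevant_in_top(r, relevant, k) for r in results_per_query]
    return sum(counts) / len(counts)
```

In the experiments each result list would come from using one of the relevant images as the query, all evaluated with the same method.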
In figure 1 there is an example of content-based
image query considering the two specified methods.
The first column contains the 5 images retrieved on
color feature and the second contains the retrieved
images on color texture using the co-occurrence
matrices. In the first case there were 4 relevant
images and in the second case 3 relevant images.
Color feature        Color texture
307 (query)          307 (query)
303 (relevant)       317 (irrelevant)
304 (relevant)       303 (relevant)
328 (relevant)       342 (relevant)
391 (irrelevant)     425 (irrelevant)
Figure 1: The retrieved images using the two specified
methods.
5 CONCLUSION
As the values in the table and other experiments
have shown, the best results for color medical
images of the digestive tract have consistently
been obtained with the color feature.
The color texture features obtained from the
co-occurrence matrices give poorer results. This is
unfortunate because, in the case of colitis and
esophagitis, doctors have noticed changes in the
tissue texture, such as scratches. These
abnormalities could not be detected well with the
implemented method.
One observation nevertheless leads to an
improvement in the quality of content-based query
on this type of images.
For each query, in at least half of the cases, the
color texture method based on co-occurrence
matrices returned at least one relevant image that
was not found using the color feature.
Consequently, it is proposed that the retrieval
system use two methods: one based on the color
feature and the other based on color texture
detected with co-occurrence matrices. It is also
proposed that the results be displayed in parallel,
so that the number of relevant images can be
increased from an average of 3 to an average of 4 in
the first 5 retrieved images. For the example in
figure 1, the union of the images retrieved using
the first and the second method gives the following
distinct relevant images: 307, 303, 304, 328 and
342. Both feature detection methods have the same
complexity, O(width*height), where width and height
are the image dimensions (Burdescu, 1998). The two
distances, the histogram intersection and the
Euclidean distance, are equally complex, O(m*n),
where m is the number of values in the
characteristics vector and n is the number of images
in the database (Burdescu, 1998). In addition, the
two distances can be computed in parallel to shorten
the execution time of a query.
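The proposal can be sketched as follows: the two rankings are run concurrently and their results merged without duplicates. The function names are hypothetical and the two ranking functions are passed in as parameters. Threads are used here only to show the structure; for CPU-bound distance computations in Python, separate processes may be needed to obtain a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def query_both(query, rank_by_color, rank_by_texture):
    """Run the color and texture rankings concurrently for one query."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        color_future = pool.submit(rank_by_color, query)
        texture_future = pool.submit(rank_by_texture, query)
        return color_future.result(), texture_future.result()


def merge_results(color_hits, texture_hits):
    """Union of the two result lists, keeping the first occurrence order."""
    merged = []
    for image_id in list(color_hits) + list(texture_hits):
        if image_id not in merged:
            merged.append(image_id)
    return merged
```

With the relevant images of figure 1, `merge_results([307, 303, 304, 328], [307, 303, 342])` gives the five distinct images listed above.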
In the future, this study will be extended to
color images from other medical fields, for example
pathology, where both color and texture are
important. Other methods for detecting texture will
also be studied.
REFERENCES
Burdescu, D.D., 1998. Analiza complexitatii algoritmilor,
Ed. Albastra. Cluj-Napoca.
Del Bimbo, A., 2001. Visual Information Retrieval,
Morgan Kaufmann Publishers. San Francisco, USA.
Muller, H., Michoux, N., Bandon, D., Geissbuhler, A.,
2004. A Review of Content-based Image Retrieval
Systems in Medical Applications - Clinical Benefits
and Future Directions. Int J Med Inform. 73(1).
Palm, C., Keysers, D., Lehmann, T., Spitzer, K., 2000.
Gabor Filtering of Complex Hue/Saturation Images
For Color Texture Classification. In: JCIS2000, 5th
Joint Conference on Information Science. Atlantic
City, USA.
Smith, J.R., 1997. Integrated Spatial and Feature Image
Systems: Retrieval, Compression and Analysis, Ph.D.
thesis, Graduate School of Arts and Sciences.
Columbia University.
Stanescu, L., Burdescu, D., 2003. IMTEST-Software
System For The Content-based Visual Retrieval Study.
In: CSCS14, 14th International Conference On
Control Systems And Computer Science. Bucuresti.
Zhang, D., Wong, A., Indrawan, M., Lu, G., 2000.
Content-Based Image Retrieval Using Gabor Texture
Features. In: The First IEEE Pacific-Rim Conference
on Multimedia. Sydney.