Joint Brightness and Tone Stabilization of Capsule Endoscopy Videos
Sibren van Vliet¹, André Sobiecki¹,² and Alexandru C. Telea¹
¹Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, The Netherlands
²ZiuZ Visual Intelligence, Gorredijk, The Netherlands
Keywords: Capsule Endoscopy, Video Processing, Color Stabilization.
Abstract:
Pill endoscopy cameras generate hours-long videos that need to be manually inspected by medical specialists.
Technical limitations of pill cameras often create large and uninformative color variations between neighboring
frames, which make exploration more difficult. To increase exploration efficiency, we propose an automatic
method for joint intensity and hue (tone) stabilization that reduces such artifacts. Our method works in real
time, has no free parameters, and is simple to implement. We thoroughly tested our method on several real-world
videos and quantitatively and qualitatively assessed its results and optimal parameter values by both image
quality metrics and user studies. Both types of comparisons strongly support the effectiveness, ease-of-use,
and added value claims for our new method.
1 INTRODUCTION
Endoscopy of the gastrointestinal tract has long been
used to screen, diagnose, locate, or treat conditions
such as gastrointestinal bleeding, inflammatory bowel
disease, celiac disease, polyps, and certain cancer
types (Classen and Phillip, 1984). This is traditionally
done by using a small camera at the end of a thin
flexible tube inserted into the mouth and guided through
the tract. However, this method cannot reach past the
many tight bends of the intestines.
Figure 1: Sample frames from endoscopy pill camera
footage illustrating intensity (top row, frames 3515, 4765,
6900, 8096, 32713) and hue (bottom row, frames 2654,
2659, 2689, 2694, 2696) problems.
A recent disruptive technology is the pill camera, a
small capsule holding a camera and lights (Hale et al.,
2014). After being swallowed, the camera records 8
to 12 hours of video. While cheaper, less intrusive,
and better covering the full gastrointestinal tract, pill
cameras have several issues. Figure 1 shows sample
frames from a video recorded by the MiroCam pill
camera (Hale et al., 2014) at 3 frames per second at
320×320 pixel resolution. Each frame contains a circular
picture surrounded by black borders, with the frame
number in white. In the top row frames, areas close
to the camera are very bright, and far away areas are
completely dark, due to the distance from the capsule's
lights. Consider frame 4765. All tissue here has
in reality the same color, but it is not imaged as such.
As the capsule moves onwards from frame 4765, the
moderately lit area in the center of frame 4765 becomes
too bright, as the light approaches it. Also, the
too dark area top-left in frame 4765 becomes moderately
lit due to the camera motion. All in all, the
same tissue area is shown in differing intensities over
time. Figure 1 (bottom row) shows a second type of
problem: All images are of the same tissue type, so
they should have the same color tone (hue). Yet, as the
camera automatically adjusts its color balance, tone
fluctuates over time. For instance, frame 2654 has a
pink tone; frame 2659 has a more orange tone; frame
2689 appears pink again; frame 2694 appears orange;
and frame 2696 shifts to pink again.
Medical practitioners viewing endoscopy videos
are distracted by sudden tone and/or intensity
fluctuations, which do not contain any information.
Color correction (also called stabilization) methods
are an effective way to alleviate such problems.
However, such methods should not introduce any
artifacts which could mislead the physician. From
discussions with gastroenterologists, we found two key
requirements for a stabilization method: (i) the relative
intensity of pixels in the corrected and original
image should be the same (if a pixel a is brighter than
another pixel b in the input image I, then a should also
be brighter than b in the corrected image I′); and (ii)
hue changes should be small enough so that a tissue
type can be reliably recognized in stabilized images.
While many generic color correction algorithms exist
(Anbarjafari, 2014; Vig et al., 2016; Purushothaman
et al., 2016; Gautam and Tiwari, 2015; González
et al., 2016; Moradi et al., 2015), few have been developed
with the specific constraints of endoscopy videos:
low resolution, poor lighting of large image
areas, relatively low framerate, rapid variation of the
light direction, real-time operation, and the avoidance
of misleading artifacts in the corrected video. Moreover,
such algorithms have various parameters which
influence their results. We are not aware of any studies
showing how to find optimal parameter values
that smooth out intensity and tone changes but do not
create significant artifacts.
In this paper we attack the problem of joint
intensity-and-tone stabilization in endoscopy videos.
We analyze a large set of existing intensity-and-tone
stabilization techniques vs the video endoscopy
constraints (Sec. 2). We select the best candidate, which
we next enhance to optimally meet all these constraints
(Secs. 3, 4). We evaluate our enhanced algorithm
quantitatively (by image similarity metrics)
and qualitatively (by an extensive user study), on a
set of endoscopy videos showing a wide variation of
imaged tissues and lighting conditions (Sec. 5). The
evaluation shows that our improved algorithm surpasses
the best-so-far algorithm we could find, by performing
joint intensity and tone stabilization, being
parameter free, guaranteeing good image quality, and
working at the same speed as the pill camera. Finally,
we conclude with directions for future work (Sec. 6).
2 RELATED WORK
Color correction has a long history in image and video
processing (Gijsenij et al., 2011). Early methods include
greyscale histogram equalization (GHE) (Kim
and Yang, 2006) and dynamic histogram equalization
(DHE) (Sun et al., 2005). Few methods were designed
for, or tested on, endoscopy videos. Hence, besides
considering endoscopy-specific methods, it is
useful to study if more generic methods can be used,
with suitable modifications, for our problem. We discuss
below ten methods which (partially) target our
intensity and hue stabilization goal, and are either
well-known in image processing or else are designed
to handle endoscopy videos. We assess these methods
by rating them on a Likert scale (5=very good,
4=good, 3=average, 2=poor, 1=very poor) against the
following requirements:
Validation measures how well the claims of a
method are defended by results shown in the respective
paper. Methods showing stronger validation
are more interesting candidates to adapt to
our endoscopy use-case.

Reproducibility measures how easy it is to
(re)implement a method and obtain the results
described in that paper. This is essential: without
reproducibility, we cannot validate and/or extend
a given method.

Complexity measures the computational complexity
of a method for a video of n frames of
w × h pixels. Ideally, we want a method of (near)
linear complexity in video size so that we can
achieve interactive exploration.

Usability tells how easily a non-technical user
can run the method. It is measured by the number
and intuitiveness of the exposed parameters. A method
with many parameters which are not intuitive
or easy to set is not very usable. This is a critical
requirement for an application that aims to decrease
the workload for a medical specialist.
(Anbarjafari, 2014) proposed an iterative n-th root
and n-th power color equalization for single generic
images. The intensity channel of an image in HSI
space is passed through a non-linear transfer function
$f(x) = x^{\ln(0.5)/\ln(\bar{x})}$, where $\bar{x}$ is the
image's mean intensity. The operation is repeated until
the final image achieves a mean 'goal' intensity equal
to γ, set typically to γ = 0.5. The method is good at
lighting very dark image areas and darkening too bright
areas. However, it does not address our problem of tone
stabilization in videos.
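For concreteness, the following is a minimal Python sketch of this iteration, under our own reading of the method: intensities are assumed normalized to [0,1], the exponent is generalized from ln(0.5) to ln(γ) so that the loop can target an arbitrary goal mean γ, and the function name is hypothetical.

```python
import numpy as np

def anbarjafari_equalize(I, gamma=0.5, tol=1e-3, max_iter=10):
    """Sketch of the iterative n-th root / n-th power equalization.

    I: intensity channel as a float array with values in [0, 1].
    gamma: goal mean intensity (0.5 in the original formulation).
    """
    I = np.clip(I.astype(np.float64), 1e-6, 1.0 - 1e-6)  # keep log defined
    for _ in range(max_iter):
        mean = I.mean()
        if abs(mean - gamma) < tol:
            break
        # f(x) = x^(ln(gamma)/ln(mean)) maps the current mean itself onto
        # gamma; the mean of f(I) only approaches gamma, hence the loop.
        I = I ** (np.log(gamma) / np.log(mean))
    return I
```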
(Vig et al., 2016) equalize colors in single images
by increasing the intensity of dark areas, but keep
bright areas unchanged, akin to overexposing. Not
darkening very bright areas is a limitation in our context.
Also, this technique does contrast enhancement;
this can create artifacts in endoscopy images, which
typically contain only low-contrast tissue.
(Purushothaman et al., 2016) propose a differential
histogram equalization method for color images
which increases the contrast of color images so as to
make the color information more visible to the human
eye. However, as a result, brightly lit areas may
become even brighter, losing potentially valuable
information in endoscopy imagery.
(Gautam and Tiwari, 2015) propose yet another
histogram equalization based method for single images
which increases contrast in dimly lit areas while
not brightening properly lit areas. However, too bright
areas are not darkened, which conflicts with our
intensity equalization goal.
(González et al., 2016) propose an improvement
of the earlier luminance Multi-Scale Retinex method
(Funt et al., 1997) that targets hues. The method
is very powerful at brightening dark areas and thus
revealing rich color information. However, already
well-lit areas may become too bright.
(Moradi et al., 2015) propose a method specifically
targeted at endoscopy images which increases
contrast and removes noise. However, intensity
normalization is not specifically addressed. Also, the
method does not specifically handle tone stabilization.
(Vazquez-Corral and Bertalmio, 2014) propose a
so-called video tone stabilization method which equalizes
a set of images taken from several cameras or
from a single camera where white balance and/or
exposure change over time. The method works by
making all input images more similar with respect to a
so-called reference image. It works in both hue and
intensity channels, both of which are important for our
context. However, an open challenge is how to
automatically select a single reference frame.
(Wang et al., 2014) propose yet another video
tone stabilization method, based on smoothing differences
between neighbor frames, much like a running average
through time, applied on the trajectory of the color
state in color space. A parameter allows turning the
smoothing off to keep large temporal tone differences
which can encode important information.
(Farbman and Lischinski, 2011) also propose a
tone stabilization method for videos, based on
the same reference frame idea as (Vazquez-Corral and
Bertalmio, 2014). While the results of this method are
impressive, a major drawback is that it appears to be
closed-source and patented, which makes its replication
and application hard at best.
(Bassiou and Kotropoulos, 2007) present a single-image
method based on histogram equalization. The
method uses multi-level smoothing to correct images
in HSI space, using the probability density functions
of the saturation and intensity components while
keeping hue unchanged. The method can equalize
intensity very well. However, it does not directly
address the problem of tone stabilization.
Table 1 summarizes our survey. The method
of (Anbarjafari, 2014) (referred to next as 'Anbarjafari')
gets the best overall rating, with the methods
of (Vazquez-Corral and Bertalmio, 2014) and
(Bassiou and Kotropoulos, 2007) coming next. As
such, we considered extending these three methods
for our goal. However, replicating the algorithms in
(Vazquez-Corral and Bertalmio, 2014) and (Bassiou
and Kotropoulos, 2007) did not succeed in producing
the same results as in the respective papers, as several
crucial details were omitted in the papers. As such,
we settled on extending the method of (Anbarjafari,
2014) to suit our goals, as described next.
3 PROPOSED METHOD
As explained in Sec. 2, the Anbarjafari method brightens
dark areas and darkens bright areas in single
images. However, we want to equalize intensity and
smooth out hue fluctuations over time. For this, we
extend the Anbarjafari method as follows.

We smooth out fluctuations in an image channel
over time by detecting large variances between the
channel's histograms (computed over all input image
pixels) of all frames within a time window, and next
changing the pixel values so that the histogram is
suitably compressed. By compressing the histogram,
differences between pixel values are made smaller.
When applied to all frames within a time window,
the compression rate should progress gradually, in
order to smooth out sudden differences. This technique
can be applied to any image channel in any color
space, e.g., RGB or HSI. As discussed next in Sec. 5,
we will apply our technique on both the intensity and
saturation channels of an HSI-space image, and combine
it with the original Anbarjafari method, which
we will also apply on both above channels. The hue
channel is left untouched, as changing it easily yields
undesired artifacts.
Figure 2: Five successive video frames in which tone fluctuation
occurs (from orange to pink). Below each frame,
a histogram of saturation values is shown. Summing these
histograms results in a cumulative histogram.
The histogram compression works as follows.
Consider the current frame t in the video and a time
window of 2k+1 frames centered at t. Figure 2 shows
this for a window of 5 frames.
Table 1: Brightness and/or tone stabilization methods reviewed in this work.

| Method | Validation | Reproducibility | Complexity | Usability |
|---|---|---|---|---|
| (Anbarjafari, 2014) | (4) Very good results for two test-sets | (4) MATLAB code provided | (4) O(whnx) with x ≈ 10 | (4) A single intuitive parameter to set (goal mean) |
| (Vig et al., 2016) | (2) Good results for two test-sets, but only for brightening dark areas | (2) No code provided, reproducing is difficult | (4) O(whn) | (2) Four not very intuitive parameters |
| (Purushothaman et al., 2016) | (3) Good results on two test-sets, but mainly for brightening dark areas | (3) No code provided, but implementation clear and easy to reproduce | (2) O((wh)²nx) with x ≈ 128 | (3) A single parameter which is easy to understand |
| (Gautam and Tiwari, 2015) | (3) Good results on five test-sets, but dark areas can become undesirably darker | (2) No code provided, reproducing is moderately difficult | (4) O(whn) | (5) No parameters to be set |
| (González et al., 2016) | (3) Good results on six test-sets, but all only show brightening dark areas | (3) No code provided, reproducing is moderately difficult | (4) O(whNn), where N is the constant size of a small neighborhood around each pixel | (3) Three parameters, of which two are not directly intuitive |
| (Moradi et al., 2015) | (4) Good results on four test-sets | (2) No code provided, reproducing is difficult due to vague description | (4) O(whn) | (2) Two parameters which do not have an intuitive meaning |
| (Vazquez-Corral and Bertalmio, 2014) | (5) Very good results on 24 test-sets | (3) No code provided, algorithm explanation leaves out some important details | (4) O(whn) (authors mention that real-time operation is feasible) | (3) Two parameters which do not have an intuitive meaning |
| (Wang et al., 2014) | (5) Good results on seven test-sets | (2) No code provided, reproducing seems difficult | (4) O(whn) | (1) Five parameters which do not have an intuitive meaning |
| (Farbman and Lischinski, 2011) | (4) Good results on five test-sets | (1) No code provided, algorithm patented by authors | (4) O(whn) | (4) A single parameter with clear usage instructions |
| (Bassiou and Kotropoulos, 2007) | (4) Good results on five test-sets | (3) Third party code used in the paper produces undesired results | (4) O(whn) | (4) Parameter(s) of probability smoothing step not explained |
Below each frame t, the histogram H_t of its saturation
channel is shown (in the following, we use saturation
as an example, though our technique also works on the
intensity channel, as already stated). In the frames, we
observe an undesired tone shift from orange to pink. We
also observe a distinct shape change of the saturation
histograms. Hence, the shape change can be used as an
indicator of the amount of color variation. For this, we
need a way to measure the amount of shape change. To
do this, we first compute a cumulative histogram H^C
whose bins are given by

$$H^C_x = \sum_{i=-k}^{k} H^{t+i}_x$$

where H^{t+i}_x is the bin for saturation value x of the
histogram for frame t + i. As our pill camera images are
8 bit per channel, we use histograms of 256 bins. We
next compute the mean µ and variance σ² of H^C and
use the latter as a measure of the shape change of all
histograms within the time window. A small variance
indicates a small tone fluctuation, meaning that very
little histogram compression is needed. A large variance
indicates a large tone fluctuation, meaning that
more compression is needed to smooth out the fluctuation.
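As an illustration, a minimal sketch of this computation could look as follows; the helper name, and the interpretation of µ and σ² as the statistics of H^C viewed as a distribution over saturation values, are our own assumptions.

```python
import numpy as np

def window_histogram_stats(sat_frames, t, k=20, bins=256):
    """Cumulative histogram H^C over the 2k+1 frames centered at t,
    and its mean and variance.

    sat_frames: list of 8-bit saturation channels (NumPy arrays).
    """
    lo, hi = max(0, t - k), min(len(sat_frames), t + k + 1)
    H_C = np.zeros(bins)
    for s in sat_frames[lo:hi]:
        h, _ = np.histogram(s, bins=bins, range=(0, bins))
        H_C += h                        # H^C_x = sum_i H^{t+i}_x
    x = np.arange(bins)
    p = H_C / H_C.sum()                 # normalize to a distribution
    mu = (x * p).sum()                  # mean saturation value
    var = ((x - mu) ** 2 * p).sum()     # variance sigma^2
    return mu, var
```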
We can now proceed with the actual histogram
compression (see also Fig. 3). We start with the
computed mean µ and variance σ² of the cumulative
histogram H^C (Fig. 3a). Secondly, we eliminate
the mean by subtracting µ from the saturation s of all
pixels (Fig. 3b). Thirdly, we compress the histogram
by dividing the saturations by aσ² (Fig. 3c). Here,
a ∈ [1/σ², 1] controls the compression amount:
Figure 3: Histogram compression. a) The histogram's mean
µ and variance σ² are computed. b) The histogram is shifted
µ bins to the left so that its mean is zero. c) The histogram
is compressed by dividing all saturation values by aσ². d)
The histogram is shifted right by c bins.
For a = 1, all saturations are divided by σ², so that the
histogram is compressed by an amount proportional
to the variance. For a = 1/σ², no compression occurs.
After this step, a part of the histogram will correspond
to negative saturation values, which of course make
no sense. To fix this, it seems natural to shift the
histogram back by the same value µ we used in step one.
However, we verified that doing so produces unnatural-looking
tones: pixel saturations appear higher or
lower than desired. To solve this issue, we use a shift
value c ∈ [0,1] (Fig. 3d), as follows. If c = 0, the
histogram is shifted so that its leftmost bin corresponds
to saturation value 0; if c = 1, the histogram is shifted
so that its rightmost bin corresponds to saturation
value 255. Intermediate values for c produce linearly
interpolated shifts between these two extremes.
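A minimal sketch of the three steps applied to one frame's saturation channel is shown below; clamping the divisor to at least 1 reflects the constraint a ∈ [1/σ², 1], so the histogram is compressed but never expanded (the function name is hypothetical).

```python
def compress_channel(s, mu, var, a=0.04, c=0.4):
    """Sketch of the histogram compression for one frame.

    s: saturation channel as a float NumPy array with values in [0, 255];
    mu, var: mean and variance of the cumulative window histogram.
    """
    s = (s - mu) / max(a * var, 1.0)   # steps 1-2: center on zero, compress
    # Step 3: shift back. c = 0 puts the lowest value at 0; c = 1 puts the
    # highest value at 255; intermediate c interpolates linearly.
    shift = (1.0 - c) * (-s.min()) + c * (255.0 - s.max())
    return s + shift
```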
Several comments are due. The proposed histogram
compression extends the relative pixel intensity
constraint mentioned in Sec. 1 to pixel saturations.
Indeed, the applied transformations are linear, and the
shape of the histogram is preserved. Separately, while
the histogram compression is computed on the cumulative
time-window histogram, the individual pixel
intensity or saturation manipulations are done separately
on each frame. This ensures that these manipulations
will vary smoothly in time, as the cumulative
histogram has the effect of a smoothing sliding-window
time filter.
4 IMPLEMENTATION
We implemented our method in single-threaded C++
under Linux and Windows. Our tool covers both the
original Anbarjafari method and our new method, and
allows one to apply them separately, or in sequence,
on the saturation and/or intensity channels. The tool
loads a pill-camera video in MPEG format, allows
changing the parameters k, a, and c of our algorithm
and the mean goal γ of Anbarjafari, plays the original
and stabilized videos side-by-side, and saves the
stabilized video as an MPEG file (Fig. 4).
Figure 4: Software tool for color stabilization and video
exploration.
For a time window of 41 frames (k = 20), computing
histograms takes about 3 seconds on a 2.3 GHz
laptop with 4 GB RAM. The video stabilization runs
smoothly at 3 frames/second, which is the recording
speed of the pill-camera video (Sec. 1). The computational
complexity is O(whn) for processing a video
of n frames each of w × h pixels, i.e. linear in input
size. After computing the histograms, changing all
parameters is, however, instantaneous. This allows a
physician to focus on an image of interest and explore
it to e.g. brighten or darken its various areas in real
time.
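In outline, this two-phase behavior can be sketched as follows, reusing the hypothetical helpers from the sketches in Sec. 3 (sat_frames and compress_channel are assumed names, not the tool's actual API); only the per-frame histogram pass is expensive, while every later parameter change reduces to cheap histogram sums and a per-pixel linear map.

```python
import numpy as np

# One-off pass (the ~3 s step): per-frame saturation histograms.
hists = [np.histogram(f, bins=256, range=(0, 256))[0] for f in sat_frames]

def stabilize_frame(t, k=20, a=0.04, c=0.4):
    """Re-stabilize frame t for the current parameter values."""
    H_C = sum(hists[max(0, t - k): t + k + 1])  # cumulative histogram
    x = np.arange(256)
    p = H_C / H_C.sum()
    mu = (x * p).sum()
    var = ((x - mu) ** 2 * p).sum()
    return compress_channel(sat_frames[t], mu, var, a, c)
```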
5 EVALUATION
As already outlined, only very few evaluations of color
stabilization for endoscopy videos are present in
the literature. Moreover, these take the form of presenting
the stabilized images, but come with limited
or even no actual evaluation of the quality thereof. We
improve upon this by presenting next both a qualitative
user-study based evaluation (Sec. 5.1) and a quantitative
metrics-based evaluation (Sec. 5.2).
5.1 Qualitative Evaluation
The nature of color stabilization is quite application-specific
and possibly even user-specific. It is not easy
to formally measure how much 'better' a given stabilized
image is than another one. Also, note that we
have no ground truth, in the sense of an 'optimally'
stabilized image. As such, it is definitely important
to compare different stabilization methods or parameter
settings by means of user studies. To this end,
we performed a survey in which users were asked to
rank images produced by different stabilization methods
and parameter values, as described next.
5.1.1 Evaluation Materials
We acquired several endoscopy videos, each 8
hours long, recorded using the MiroCam pill camera
(Medivators, 2017), from medical specialists at
a major regional hospital in the Netherlands. The videos
were pre-screened by the specialists for suitability,
that is, containing no major artifacts due to
camera malfunction, and containing a wide range of
image intensities and tones that would pose difficulties
in manual analysis and for which stabilization
would be of added value. Since organizing a study
where multiple users examine thousands of images
such as present in our videos was infeasible, we first
manually grouped the available video frames into five
representative classes, depending on the color and
intensity distribution, as follows:

- Dark area directly bordering a very bright area
(Fig. 1, frame 3515);
- Dark area separated from a very bright area by
moderate illumination (Fig. 1, frame 4765);
- Dark area directly surrounded by bright areas on
all sides (Fig. 1, frame 6900);
- Dark area surrounded by bright areas on all sides,
with a moderate illumination transition zone
(Fig. 1, frame 8096);
- Dark area bordering a bright area of varied color
and structure (Fig. 1, frame 32713).
Next, we randomly selected a few images in each
class for the qualitative study. For each image, we ran
several combinations of the Anbarjafari method (A)
and our proposed method (P) described in Sec. 3, applied
on the intensity (I) and saturation (S) channels, as
described below. Note that only the first combination
(A used solely on I) is covered by existing literature,
all other combinations being novel.

1. A → I: A applied to I only;
2. A → (I,S): A applied to both I and S channels;
3. P → I: P applied to I only;
4. P → (I,S): P applied to both I and S channels;
5. (A,P) → I: A applied to I, followed by applying P
to the resulting I;
6. (A,P) → (I,S): A applied to I, followed by applying
P to the resulting I; and A applied to S,
followed by applying P to the resulting S.
For each combination, we ran the involved methods
for several parameter values. Specifically, we
set the mean goal γ in Anbarjafari to values in
{0.6, 0.7, 0.8, 0.9, 1}; and the compression a of
our method to values in {0.02, 0.04, 0.08, 0.16, 0.32}.
The latter set of values is chosen as such since a is
used as a denominator (Sec. 3), so it affects a function
of hyperbolic type 1/x. For the time window size and
correction, we used the fixed values of k = 20 frames
and c = 0.4 respectively, which we determined
empirically by testing stabilization on several
videos.
Figure 5 shows the stabilization results obtained
for frame 8096 (Fig. 1) for several method and
parameter combinations. Due to space limitations,
we cannot show all the tested results, which entail
several hundreds of images. The rows in Fig. 5
indicate method combinations; columns indicate
parameter-value combinations. Below we discuss the
findings we observed ourselves, that is, before using
these results in the actual survey, which is described
next in Sec. 5.1.2.
A → I: We see that, as the parameter γ increases,
dark areas are brightened, and colors and details
get more easily visible to the human eye. For all five
frames in the top row in Fig. 5, we found that γ = 0.7
yields the greatest intensity increase with acceptable
loss of details. When γ > 0.7, images become too
noisy. Moreover, in endoscopy images, detail such as
edges is mainly defined by intensity and not hue, so
too much brightening erases such detail.
A → (I,S): Similar to brightening dark areas, increasing
γ now makes the color of low-saturation
(gray-like) areas more vivid. Since low-saturation
areas match very well the dark areas in the gastrointestinal
tract, this method additionally boosts dark areas
by making them not only brighter, but also more
colorful. As for the A → I method, we found an
optimal value around γ = 0.7. Larger γ values affect
color tones too much, which can create undesirable
artifacts, like rendering normal tissue too red, thus
suggesting an internal bleeding.
P → I: Similar to A → I, this method makes dark
areas become brighter as a increases. However,
Figure 5: Frame 8096 (shown in Fig. 1) processed with various combinations of algorithms and parameters. Columns
correspond to the parameter settings γ=0.6, a=0.02 through γ=1, a=0.32; rows to the method combinations A → I,
A → (I,S), P → I, P → (I,S), (A,P) → I, and (A,P) → (I,S).
details in dark areas are lost earlier than in the A → I
case. We also note that this method yields overall
brighter images than A → I (compare rows 1 and 3
in Fig. 5). However, detail shading is slightly less
well visible. This is expected, since the goal of our
method (P) is not to enhance single images, but to
smooth sudden changes in video sequences. Since
P → I essentially compresses the intensity channel
histogram, edges captured by intensity differences
may become less visible.
P → (I,S): In addition to the previously discussed
effect on the intensity levels, this method makes
colors more saturated as a increases. Interestingly,
saturation is not increased as aggressively as in
A → (I,S). Again, this is because our algorithm does not
try to increase saturation to a certain predefined level
γ, but aims to smooth out sudden differences in the
saturation histograms of neighboring frames. This
is why, as we will discuss later, our method is better
for stabilizing saturation in videos rather than single
images.
(A,P) → I: We observe that the results of this method
are nearly identical to those of P → I. We explain
this by the fact that P compresses the histogram after
A enhanced the intensity. This largely undoes the
enhancements that the A method made. As a result,
the output images suffer from the same problems we
observed when using P → I, namely loss of details
due to the histogram compression.

(A,P) → (I,S): We observe that the results of this
method are very similar to those of A → (I,S). However,
the saturation is less dramatically increased.
We explain this by the fact that, after the A method
has made the saturation very high, the P method
compresses the saturation histogram, thus making the
color vibrance less extreme.
From all the above, we draw the following preliminary
qualitative conclusions. The Anbarjafari method
(A) with a mean goal value around γ = 0.7 shows
itself to be best for intensity stabilization of single
images. However, it is not effective in stabilizing
tone fluctuations when applied to saturation (A → S);
it may actually enhance tone fluctuations. In
contrast, our method (P) is effective in smoothing
tone fluctuations, but less effective in stabilizing
intensity.
5.1.2 User Survey
We refined the qualitative observations presented
above, which are drawn from our own study of the
computed results, by conducting an online survey that
involved a wide group of people, thereby realizing a
more representative qualitative evaluation. The survey
material consisted of five pages, one page for an
image in each image class defined in Sec. 5.1.1. Each
page contained all stabilized images for the respective
input image, laid out identically to Fig. 5. We also
included an additional column representing the actual
input image. However, the column was not marked
as such, so the participants could not know which is
the input and which the outputs of the stabilization.
For each image row, the participant was asked to pick
the image that they thought was the best in terms of
enhancing the information in the brighter and darker
areas of the image without introducing too much
noise or losing information. This answers the question
'which parameter values are best for a given method
combination?'. Next, at the end of each page,
participants were asked to review the six images they
picked as best for the six rows and pick the best one
among these. This answers the question 'which method
combination delivers the best results, given that
all methods are run with their optimal parameter
values?'.
The survey was conducted using Google Forms.
Participants were encouraged to look at each row of
images for roughly 10 seconds, so that the survey
could be finished in about 5 minutes. However, the
participants could spend more time if desired, and
were also allowed to go back to previous pages to
review or change their answers. Note that the participants
did not see any annotations on the survey pages,
such as the method names and parameter values
in Fig. 5. Eighteen people participated in the survey.
All are specialists in image processing and computer
vision, and are well familiar with endoscopy videos
and their issues. The participants were aged between
20 and 50, the majority being male.
Table 2 presents the aggregated results of the survey.
Rows indicate method combinations, and columns
indicate parameter values, just like in Fig. 5.
Each cell contains two numbers, separated by a slash.
The first number indicates how many times an image
generated by the respective method and parameter-values
combination was chosen best in a row of images,
i.e., best for all tested parameter values. The
second number (in bold) indicates how many times an
image was chosen as best for an entire survey page,
i.e., best for all method and parameter value combinations
tested.
We get several insights from these figures. First,
we see that the parameter values γ = 0.6, a = 0.02 and
γ = 0.7, a = 0.04 get most votes, the former being
seen as best when the combined method (A,P) is used,
and the latter when the individual Anbarjafari (A)
method is used, respectively. These are thus good
values for a wide set of images and a wide set of
users. Note that the setting γ = 0.7 matches what we
found ourselves in our preliminary qualitative evaluation
(Sec. 5.1.1). Hence, we use these values as presets
in our tool (Sec. 4). Secondly, we see that very
high parameter values are never preferred. This matches
our own findings that such values yield too much
disappearance of relevant details (Sec. 5.1.1). Thirdly,
we see that the Anbarjafari method applied to both
intensity and saturation (A → (I,S)) with γ = 0.7,
a = 0.04 has the highest number of overall best results.
This matches our earlier observations that this method
is indeed very good for stabilizing single images.
Moreover, this is an interesting novel result, as the
Anbarjafari method was originally proposed to work
on intensity only. Separately, as explained earlier,
this method is not aimed at stabilizing tone fluctuations
in video sequences, something that our survey could
not capture, as participants were shown only individual
frames. Finally, we see that the combination
(A,P) → (I,S) with γ = 0.6, a = 0.02 scores the best
image-in-a-row. As such, this method combination is
arguably good for video color stabilization, albeit it
scores lower for single-frame stabilization.
Figure 6: Selected frames (2654, 2659, 2689, 2694, 2696,
2709) from a video fragment demonstrating how the combination
(A,P) → (I,S) successfully stabilizes both intensity
and tone in image sequences. Columns: original frame,
P → S, (A,P) → (I,S).
5.1.3 Video Intensity and Tone Stabilization
Among the studied methods, we found the original
Anbarjafari method to be the best for intensity stabilization
in single images. Yet, this method does not
handle tone stabilization in video sequences. Consider
Fig. 6, left column, which shows a selection of
frames from a video of a bleeding gastrointestinal
tissue. The first five frames are identical to those in
Fig. 1, bottom row. As also outlined in Sec. 1, a certain
amount of tone fluctuation is visible even in this
short sequence.

We next show how the combination of Anbarjafari
and our method solves this problem. First, as a
baseline, we apply only our method to the saturation
channel (P → S), see Fig. 6, middle column, with a
time window k = 40, compression a = 0.04, and correction
c = 0.4, in line with the optimal values found
for our method (P) in the survey. We see how the sudden
tone changes have now been smoothed out: all
frames in Fig. 6, middle column, have a pinkish tone.
The tone stabilization is even more evident when
watching the actual video. However, the intensity is not
stabilized. To solve this, we apply the combination of
Anbarjafari and our method to both the intensity and
saturation channels ((A,P) → (I,S)), see Fig. 6, right
column. In addition to the previous parameters, we
use a mean goal γ = 0.7, shown to be optimal in our
survey (Sec. 5.1.2). As visible, especially for frames
2659 and 2709, the intensity is more uniform now; in
addition, the tone fluctuations are low, thanks to our
method. All in all, we conclude that the combination
(A,P) → (I,S) is indeed a good way to stabilize both
intensity and tone fluctuations.
5.2 Quantitative Evaluation
The qualitative evaluation of the various combinations
of methods and parameters in Sec. 5.1 has empirically
found good parameter values that yield images
perceived by users as stabilized. However, as explained
already in Sec. 1, stabilization should not create
artifacts which could lead to misinterpretation of the
imaged tissue structures. Formally put, stabilization
can be thought of as a function Φ(γ, a, I_input) = I_stabilized
from images to images which aims to maximize the
temporal stability of intensity and tones and at the
same time minimize the perceptual difference between
the original and stabilized images. The behavior
of this function is driven by our method's free
parameters, of which the most important are the goal
mean γ (for Anbarjafari) and the compression a (for
our histogram-based compression). To study how Φ
affects image similarity, we need a way to compare
I_input and I_stabilized. For this, similarly to (Moradi
et al., 2015), we use the peak signal-to-noise ratio
(PSNR) and structural similarity index (SSIM) metrics,
well known in image processing.
Table 2: Image-quality survey results accumulated for all five tested endoscopy image classes.

| Method | original | γ = 0.6, a = 0.02 | γ = 0.7, a = 0.04 | γ = 0.8, a = 0.08 | γ = 0.9, a = 0.16 | γ = 1, a = 0.32 |
|---|---|---|---|---|---|---|
| A → I | 6 / 6 | 18 / 7 | 40 / 12 | 24 / 3 | 2 / 0 | 0 / 0 |
| A → (I,S) | 5 / 5 | 14 / 5 | 55 / 17 | 16 / 3 | 0 / 0 | 0 / 0 |
| P → I | 11 / 7 | 41 / 4 | 28 / 6 | 7 / 0 | 3 / 2 | 0 / 0 |
| P → (I,S) | 9 / 7 | 41 / 7 | 30 / 2 | 8 / 1 | 2 / 0 | 0 / 0 |
| (A,P) → I | 8 / 7 | 50 / 6 | 21 / 3 | 11 / 1 | 0 / 0 | 0 / 0 |
| (A,P) → (I,S) | 8 / 7 | 57 / 7 | 23 / 6 | 2 / 0 | 0 / 0 | 0 / 0 |
For 8-bit-per-channel images like ours, typical PSNR
values for good similarity are between 30 and 50 dB,
where higher is better (Huynh-Thu and Ghanbari, 2008).
SSIM ranges between -1 and 1, where higher is better
(1 denotes identical images) (Wang et al., 2004).
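As a sketch of how the two metrics can be computed per frame pair, assuming scikit-image (version ≥ 0.19, for the channel_axis argument) as the metrics implementation:

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def similarity(I_input, I_stabilized):
    """PSNR (dB) and SSIM between an original 8-bit RGB frame and its
    stabilized counterpart; both arrays must have the same shape."""
    psnr = peak_signal_noise_ratio(I_input, I_stabilized, data_range=255)
    ssim = structural_similarity(I_input, I_stabilized,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim
```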
Figure 7 shows the plots of the PSNR and SSIM
similarity metrics between the original endoscopy
images I_input and the stabilized ones I_stabilized as a
function of the key parameters γ (for Anbarjafari) and
a (for our method), for the set of images used in our
qualitative analysis (see Sec. 5.1.1), and for fixed values
of k = 20 and c = 0.4. As methods, we considered
Anbarjafari applied on intensity (A → I) and
separately on saturation (A → S), and our method applied
on intensity (P → I) and separately on saturation
(P → S). From these plots we make the following
observations.
Quality: The A → I method peaks for both PSNR
and SSIM at γ very close to 0.5, i.e., the mean intensity
of I_input. This is expected: If the goal mean
equals the original mean, no correction needs to be
done, as I_stabilized is identical to I_input. In contrast,
A → S peaks at values around γ = 0.7. This matches
very well the optimal γ values found in our qualitative
study (Sec. 5.1). Hence, the γ value found best by
users to explore the images is also the one where the
fewest changes are made by stabilization. Moreover, the
maximal PSNR values (over 50 dB) and SSIM values
(close to 1) indicate that our stabilization loses very
little from the original image features. Separately, we
see that both SSIM and PSNR have very good values
for a close to 0.04, which was found earlier in our
qualitative studies to yield a very good tone stabilization
(Secs. 5.1.2 and 5.1.3). This confirms that our
preset a = 0.04 is indeed a good one.
Intensity vs Saturation Stabilization: The plots for
A → I and A → S are very similar in shape and magnitude.
This matches our earlier qualitative finding
that the Anbarjafari method can be used to stabilize
both intensity and saturation (Sec. 5.1). In contrast,
the plot for P → S is always larger than that for P → I.
This means that our proposed method P is better at
stabilizing saturations (tones) than intensities, which
again correlates with our qualitative findings.
Parameter Sensitivity: The plots for A → I and
A → S have overall quite high derivatives close to the
maximum, while the plots for P → I and P → S show
a much more stable, and actually monotonic, variation.
This tells us that setting the compression a for the
P method is less sensitive than setting the mean goal
γ for the A method. However, this does not mean that
tuning γ is sensitive: As explained above, we obtain a
very good image quality for values around γ = 0.5 for
the method A → I, and respectively for values around
γ = 0.7 for the method A → S. All in all, we conclude
that parameter setting is not a sensitive process.
Consistency and Smoothness: Across the five frames,
plots for the same method are similar in shape,
position, and peak location. This is desirable, as it
tells us that optimal parameter values are consistent for
quite different inputs. Considering the earlier parameter
sensitivity analysis, the parameter presets proposed
in Sec. 5.1 can indeed be used as default values
for entire videos. This makes our method basically
parameter-free. Secondly, the plots are smooth,
with no jitters, which tells us that small parameter-value
changes do not massively affect the image similarity.
Hence, our method is robust to parameter changes,
should users really need to change the preset values.
6 CONCLUSIONS
We propose a new method for jointly stabilizing intensity
and tone (hue) in endoscopy videos. For this,
we adapt the intensity channel by brightening dark
areas and darkening too bright areas, and also minimize
tone fluctuations between temporally close frames.
Our method is simple to implement, works at the
frame-rate of the pill camera, has no free parameters
that users must set, delivers consistent results for a
wide variety of endoscopy videos, and alters only
minimally the input images, thereby reducing the risk of
creating misleading artifacts. Summarizing, our main
contributions are:

Survey: To our knowledge, our work is the first in
which a large set (10) of imaging methods was studied
for suitability for the specific case of endoscopy
video stabilization, from a practical perspective including
validation, reproducibility, computational complexity,
and ease of use.
Figure 7: PSNR and SSIM image-similarity plots for frames 3515, 4765, 6900, and 8096 from Fig. 1, processed with
Anbarjafari and our method. The horizontal axis denotes either the goal mean γ (for A → I and A → S) or the
compression factor a (for P → I and P → S), depending on the graph type.
Joint Stabilization: While several methods perform
intensity stabilization, we show how both intensity
and tone can be jointly stabilized. For the former, we
use an existing method (Anbarjafari, 2014). For the
latter, we propose a simple but efficient method based
on histogram compression.

Validation: Compared to existing work, we perform
a significantly more thorough validation, including
testing several method types applied on intensity
and/or saturation; a detailed user study for finding
good method combinations and parameter values; and
a quantitative evaluation that shows how to find parameter
presets which match the values suggested by
our qualitative study and also minimally affect image
quality. This makes our method fully parameter-free
and guarantees its output quality. Our method can be
easily and efficiently implemented.
Limitations: Our search of the algorithm-and-parameter
space is, of course, not exhaustive. More
methods and parameter values exist which could be
assessed. It is also fair to say that our current evaluation
already surpasses what one typically encounters
in endoscopy video stabilization papers. Separately,
one can argue that the differences between the original
and stabilized images are quite small, so the entire
stabilization process is not worthwhile. Yet, when
watching the actual stabilized videos, these differences
are well visible, and show that the stabilized material
is easier to follow.
Several future work directions exist. More extensive
evaluations can be made to compare with additional
color stabilization methods, use more videos, or
more users. Machine learning techniques could be
used to perform a more fine-grained stabilization based
on images or image regions labeled by users as
requiring brightening.
ACKNOWLEDGEMENTS
We thank Medisch Centrum Leeuwarden for providing
us the capsule endoscopy videos.
REFERENCES
Anbarjafari, G. (2014). HSI based colour image equalization
using iterative n-th root and n-th power.
arXiv:1501.00108 [cs.CV].
Bassiou, N. and Kotropoulos, C. (2007). Color image histogram
equalization by absolute discounting back-off.
CVIU, 107(1):108–122.
Classen, M. and Phillip, J. (1984). Electronic endoscopy of
the gastrointestinal tract. Endoscopy, 16(1):16–19.
Farbman, Z. and Lischinski, D. (2011). Tonal stabilization
of video. ACM Trans Graph, 30(4):89–101.
Funt, B., Barnard, K., Brockington, M., and Cardei, V.
(1997). Luminance-based multi-scale retinex. In
Proc. 8th Congress of the International Colour Association.
Gautam, C. and Tiwari, N. (2015). Efficient color
image contrast enhancement using range limited
bi-histogram equalization with adaptive gamma
correction. In Proc. IEEE ICIC.
Gijsenij, A., Gevers, T., and van der Weijer, J. (2011).
Computational color constancy: Survey and experiments.
IEEE Trans Imag Process, 20(9):2475–2489.
González, D. M., Ponomaryov, V., and Kravchenko, V.
(2016). Chromaticity improvement in images with
poor lighting using the multiscale-retinex MSR
algorithm. In Proc. IEEE MSSW.
Hale, M., Sidhu, R., and McAlindon, M. (2014). Capsule
endoscopy: Current practice and future directions.
World J Gastroenterol, 20(24):7752–7759.
Huynh-Thu, Q. and Ghanbari, M. (2008). Scope of validity
of PSNR in image/video quality assessment. Electron
Lett, 44(13).
Kim, T. and Yang, H. (2006). A multidimensional histogram
equalization by fitting an isotropic Gaussian
mixture to a uniform distribution. In Proc. IEEE ICIP,
pages 2865–2868.
Medivators (2017). MiroCam capsule endoscope.
www.medivators.com/products/gi-physician-products/mirocam-capsule-endoscope.
Moradi, M., Falahati, A., Shahbahrami, A., and
Zare-Hassanpour, R. (2015). Improving visual quality
in wireless capsule endoscopy images with
contrast-limited adaptive histogram equalization. In Proc.
IPRIA. IEEE.
Purushothaman, J., Kamiyama, M., and Taguchi, A. (2016).
Color image enhancement based on hue differential
histogram equalization. In Proc. ISPACS, pages 322–331.
IEEE.
Sun, C. C., Ruan, S. J., Shie, M. C., and Pai, T. W. (2005).
Dynamic contrast enhancement based on histogram
specification. IEEE Trans Consum Electr, 51(4):1300–1305.
Vazquez-Corral, J. and Bertalmio, M. (2014). Color stabilization
along time and across shots of the same scene,
for one or several cameras of unknown specifications.
IEEE Trans Imag Process, 23(10).
Vig, N., Budhiraja, S., and Singh, J. (2016). Hue preserving
color image enhancement using guided filter based
sub image histogram equalization. In Proc. 9th
Intl. Conf. on Contemporary Computing (IC3).
Wang, Y., Tao, D., Li, X., Song, M., Bu, J., and Tan,
P. (2014). Video tonal stabilization via color states
smoothing. IEEE Trans Image Process, 23(11).
Wang, Z., Bovik, A., Sheikh, H., and Simoncelli, E.
(2004). Image quality assessment: from error visibility
to structural similarity. IEEE Trans Imag Process,
13(4):600–612.