points or surfaces correspondences between the two
images, and use this information to extend the corre-
spondences to all other pixels in the floating image
by interpolation. Identifying correspondences
in images acquired with different modalities (such as
PET) is not always straightforward. Another approach
is to consider force fields (Pekar et al., 2006) or
some physical model that expresses the registration
problem in terms of partial differential equations
(e.g., in the viscous fluid approach, the image is
modeled as a fluid and the deformation is obtained by
solving the Navier-Stokes equations (Fookes and
Maeder, 2003)).
In works such as (Ardizzone et al., 2007; Viola
and Wells, 1997), no assumptions are made about the
geometric shapes and spatial positions of the structures
in the two images; instead, some relation between
their intensity distribution functions is assumed.
This is especially convenient when dealing with
medical images. In mono-modal registration,
images are acquired with the same type of exam (for
example, MR-PD). In this case, anatomical structures
are represented with similar intensity distributions,
and the images to align can be compared
by computing the mean squared error of the intensities
at the aligned pixels. In multi-modal registration,
by contrast, images are acquired with different
modalities (for example, MR-PD versus MR-T1). In this
case, since the same anatomical structure is
characterized by different intensity distributions in
the two images, criteria based on statistical properties
of the gray-level distribution function can be adopted.
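The difference between the two settings can be illustrated with a minimal sketch. The flattened toy "images" and the intensity remapping `255 - v` below are hypothetical and only stand in for a change of modality; they are not taken from the cited works.

```python
def mean_squared_error(reference, registered):
    """Mean squared intensity difference over corresponding pixels."""
    if len(reference) != len(registered):
        raise ValueError("images must have the same number of pixels")
    total = sum((r - f) ** 2 for r, f in zip(reference, registered))
    return total / len(reference)


# Hypothetical flattened images with gray levels in [0, 255].
reference = [10, 10, 200, 200, 90, 90]
same_modality = list(reference)                # aligned, same intensities
other_modality = [255 - v for v in reference]  # aligned, remapped intensities

mse_mono = mean_squared_error(reference, same_modality)
mse_multi = mean_squared_error(reference, other_modality)
```

Both pairs are perfectly aligned, yet only the mono-modal pair yields a zero error. This is why a criterion based on the statistics of the gray levels is preferred in the multi-modal case.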
The seminal work in (Viola and Wells, 1997) pre-
sents an image alignment technique derived from in-
formation theory. The technique adopts a framework
for estimating the empirical entropy from a set of data
samples. As we will detail in Section 3, entropy is not
easy to differentiate. However, by adopting a Parzen
window density estimation technique to approximate
the gray level distribution of an image, empirical en-
tropy can be made differentiable. Mutual information,
which is a function of entropies, becomes differentiable
as well. Nonetheless, optimizing the empirical
mutual information is computationally intensive,
especially when large sample sets are used to obtain
a reliable estimate of the mutual information. Viola
and Wells therefore proposed using stochastic gradient
descent to optimize the empirical mutual information. At
each step of their algorithm, a small sample set is used
to align the images. The gradient used to optimize
the objective function is therefore only a noisy
estimate of the true gradient, but its expected value
tends to the true gradient. Stochastic gradient descent
is now known to be very effective for the numerical
optimization of non-linear functions of a large number
of parameters.
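A Parzen-window entropy estimate of this kind can be sketched as follows. The sketch assumes a Gaussian kernel with a hand-picked width `sigma` and two small samples, one (`sample_a`) to build the density and one (`sample_b`) to average the log-density over. The function names, the kernel width and the test data are assumptions made for illustration, not details taken from the original paper.

```python
import math
import random


def gaussian(u, sigma):
    """One-dimensional Gaussian kernel with standard deviation sigma."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def parzen_density(z, sample_a, sigma):
    """Parzen-window estimate of the density at z from sample_a."""
    return sum(gaussian(z - a, sigma) for a in sample_a) / len(sample_a)


def empirical_entropy(sample_a, sample_b, sigma):
    """Average negative log-density of sample_b under the Parzen estimate."""
    log_densities = (math.log(parzen_density(b, sample_a, sigma)) for b in sample_b)
    return -sum(log_densities) / len(sample_b)


# Hypothetical data: a concentrated and a spread-out gray-level distribution.
rng = random.Random(0)
narrow_a = [rng.gauss(0.0, 1.0) for _ in range(50)]
narrow_b = [rng.gauss(0.0, 1.0) for _ in range(50)]
wide_a = [rng.gauss(0.0, 5.0) for _ in range(50)]
wide_b = [rng.gauss(0.0, 5.0) for _ in range(50)]

h_narrow = empirical_entropy(narrow_a, narrow_b, sigma=1.0)
h_wide = empirical_entropy(wide_a, wide_b, sigma=1.0)
```

Because the estimate is a smooth function of the samples, it can be differentiated with respect to the transformation parameters that generate them. Redrawing small samples at every step gives the noisy gradient used by stochastic gradient descent.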
Our work is inspired by both (Ardizzone
et al., 2007) and (Viola and Wells, 1997). We have
adopted the geometrical transformation proposed in
(Ardizzone et al., 2007), which is a piecewise affine
transformation. Effects of the estimated local trans-
formations are accumulated in a deformation field
that stores, for each pixel of the registered image,
the position of the corresponding pixel in the floating
image. In (Ardizzone et al., 2007), local transforma-
tions are estimated by maximizing the mutual infor-
mation computed in image blocks. However, mutual
information is highly non-linear and is not differen-
tiable without a parametric model of the gray level
distribution. Hence, in (Ardizzone et al., 2007), a
gradient-free optimization algorithm is used to optimize
the mutual information. This kind of approach
is heavily affected by the initial solution to the
optimization problem. A more reliable initial solution
is found by applying a coarse-to-fine registration
strategy, which is implemented in (Ardizzone et al., 2007)
by using a pyramid of images.
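A pyramid of images for coarse-to-fine registration can be sketched as follows. The source does not state how the pyramid is built, so halving the resolution by averaging 2x2 blocks, as well as the function names, are assumptions of this sketch.

```python
def downsample(image):
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(image, levels):
    """Return the pyramid levels ordered from coarsest to finest."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid[::-1]


# Hypothetical 8x8 image whose gray level encodes the pixel index.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = build_pyramid(image, levels=3)
sizes = [(len(level), len(level[0])) for level in pyramid]
```

Registration would start at the coarsest level, and the transformation found there would serve as the initial solution at the next finer level.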
In (Yang et al., 2015), optimization of the normalized
mutual information is carried out by combining
the limited-memory Broyden-Fletcher-Goldfarb-Shanno
algorithm with bound constraints (L-BFGS-B) with cat
swarm optimization (CSO).
In our work, rather than optimizing the mutual in-
formation, we follow the work in (Viola and Wells,
1997) and optimize the empirical mutual information.
In contrast to the work in (Viola and Wells, 1997), we
estimate a sequence of local transformations that are
used to compute the deformation field of the registe-
red image. In contrast to (Ardizzone et al., 2007), we
do not adopt a uniform grid to select the image blocks
to register. Instead, we use an approach that is similar
in spirit to boosting, and sample the image block to
register during the alignment process.
3 EMPIRICAL MUTUAL INFORMATION
Entropy H(X) is a measure derived from information
theory that quantifies the randomness of a
random variable X, and is defined as follows:
$$H(X) = -\sum_{x} p(X = x) \cdot \log p(X = x) \qquad (1)$$
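Equation (1) can be evaluated directly on the gray levels of an image by using their relative frequencies as probabilities. The sketch below assumes the natural logarithm (the base is not stated in the text), and the function name `entropy` is chosen only for illustration.

```python
import math
from collections import Counter


def entropy(gray_levels):
    """Entropy of Eq. (1), with p(X = x) estimated by relative frequency."""
    counts = Counter(gray_levels)
    n = len(gray_levels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Hypothetical gray-level samples.
h_constant = entropy([7, 7, 7, 7])  # a single gray level: no randomness
h_uniform = entropy([0, 1, 2, 3])   # four equally likely gray levels
```

A constant image has zero entropy, while four equally likely gray levels give log 4, the maximum for four outcomes.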
In image registration, the joint entropy H(X, Y) is
used to measure the extent to which two random
variables – Y, the gray levels in the registered image,
and X, the gray levels in the reference image – are
dependent. Low values of the entropy indicate a strong
ICPRAM 2018 - 7th International Conference on Pattern Recognition Applications and Methods