2 RELATED WORK
Multi-scale representations of images based on
low-pass filtering and sub-sampling were first
proposed in (Burt and Adelson, 1983) and have been
exploited for about three decades. One of the great
advantages of such a multi-scale approach is that it
allows objects to be recognized independently of the
scale at which they appear in the scene.
Much research work has been done in multi-scale ob-
ject recognition since then, from which we can par-
ticularly cite SIFT, first introduced in (Lowe, 1999)
and later extended and improved in (Lowe, 2004), and
also SURF (Bay et al., 2008).
Both SIFT and SURF generate multi-scale image
pyramids and then extract robust local descriptors
from a model object image. These descriptors can
subsequently be matched to the descriptors of a scene
to find the object position and affine orientation
regardless of changes in scale and, to some degree,
changes in illumination.
In SURF, a multi-scale image pyramid is obtained
by first computing an integral image (Viola and Jones,
2001) and then processing it with coarse approxima-
tions of Gaussian derivative filters in different scales
and orientations. As the processing effort to filter in-
tegral images is independent of the scale of the filter,
overall computational cost is reduced when compared
to traditional filter-and-subsample pyramid building
schemes. The determinant of the Hessian matrix is
then evaluated locally across the pyramid in order to
detect stable interest points, usually corresponding
to blobs and corners, around which image information
is encoded as a descriptor that is invariant to rotation.
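The scale independence of integral-image filtering can be illustrated with a minimal sketch. It is not the SURF implementation: the function names `integral_image` and `box_sum` are hypothetical, and the image is a small list of lists chosen only for illustration. The point it shows is that the sum over any rectangle takes four table lookups, whatever the rectangle size.

```python
def integral_image(img):
    """Return the summed-area table of a 2-D list, padded with a zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above plus the running sum of the current row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle using four lookups, whatever its size."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Illustrative 4x4 image (assumed data, not taken from the paper).
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
ii = integral_image(image)
small = box_sum(ii, 1, 1, 2, 2)  # 2x2 box: 6 + 7 + 10 + 11
large = box_sum(ii, 0, 0, 4, 4)  # 4x4 box: the whole image
```

Both `small` and `large` cost the same four lookups, which is why box-filter approximations of Gaussian derivatives can be evaluated at any scale without growing the per-pixel cost.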
The SIFT framework also builds a scale space, but
using the traditional Gaussian filter-and-subsample
approach at multiple scales. After filtering and sub-
sampling, differences between adjacent levels of the
Gaussian pyramid are computed in order to obtain
a difference-of-Gaussian pyramid, which is a finer
representation of derivatives when compared to the
coarser method used in SURF. The sub-sampling pro-
cess is used to avoid the use of unnecessarily large
Gaussian filters, whose sizes grow with scale. The
set of pyramid levels in which the scale of filtering is
doubled is conventionally named an octave (Crowley
et al., 2002).
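The filter-and-subsample scheme described above can be sketched on a one-dimensional signal. This is a simplified illustration under stated assumptions, not the SIFT implementation: the names `gaussian_kernel`, `smooth` and `dog_octaves` are hypothetical, two levels per octave and a base scale of 1 are arbitrary choices, the kernel is truncated at three standard deviations, and borders are handled by sample replication.

```python
import math


def gaussian_kernel(sigma):
    """Normalized, truncated Gaussian kernel (radius of about 3 sigma, an assumption)."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve a 1-D signal with a Gaussian, replicating the border samples."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def dog_octaves(signal, num_octaves=2, levels_per_octave=2, sigma0=1.0):
    """Return, per octave, the differences between adjacent Gaussian levels."""
    ratio = 2.0 ** (1.0 / levels_per_octave)
    base = smooth(signal, sigma0)  # first level, assumed to have scale sigma0
    octaves = []
    for _ in range(num_octaves):
        gaussians = [base]
        for i in range(1, levels_per_octave + 1):
            # Extra blur needed to move from sigma0 to sigma0 * ratio**i.
            extra = sigma0 * math.sqrt(ratio ** (2 * i) - 1.0)
            gaussians.append(smooth(base, extra))
        dogs = [[hi - lo for lo, hi in zip(g0, g1)]
                for g0, g1 in zip(gaussians, gaussians[1:])]
        octaves.append(dogs)
        # The last level has twice the base scale; dropping every other
        # sample brings it back to sigma0 on the coarser grid.
        base = gaussians[-1][::2]
    return octaves


# Illustrative input: a step edge of 32 samples (assumed data).
edge = [0.0] * 16 + [1.0] * 16
octaves = dog_octaves(edge)
```

Sub-sampling after each octave keeps the filters small: every octave reuses the same short kernels instead of kernels whose size doubles with the scale.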
The difference-of-Gaussian is an approximation of
the Laplacian of Gaussian (Burt and Adelson, 1983),
and the main reason for using it is its reduced
computational cost when compared to the direct
computation of the Laplacian of Gaussian (Crowley and
Stern, 1984). Both the difference-of-Gaussian and the
Laplacian of Gaussian approaches result in high-pass
filtering of the original image, which enhances edges.
A more computationally efficient way to build
multi-scale image pyramids than the one used in the
original SIFT approach uses binomial kernels instead
of Gaussian kernels (Crowley et al., 2002). Although
it is not possible to generate all possible scales
using binomial filters, their resolution is enough to
build image pyramids with scales sufficiently spaced
by $\sigma = \sqrt{2}$ (Crowley and Riff, 2003).
More recently, the method presented in (Crowley
et al., 2002) was used in (Entschev and Vieira Neto,
2013) to build SIFT image pyramids more efficiently.
In this approach, the time spent to compute the whole
multi-scale pyramid was reduced to as little as one
fourth of the time required to build the same pyramid
using the original SIFT approach.
The work presented in (Entschev and Vieira Neto,
2013) proposes the use of binomial filters to build
image pyramids for SIFT-like object recognition,
aiming at near real-time performance on a
BeagleBoard-xM embedded development kit. That work
explains how the scale space for the SIFT framework
can be built more efficiently, focusing on execution
time, but does not consider the repeatability of
interest points. The present work intends to fill
this gap by assessing the repeatability of interest
points detected using binomial image pyramids.
2.1 Binomial Pyramid
As described in (Entschev and Vieira Neto, 2013),
there are two kernels of special interest for the
construction of a binomial pyramid:
$\frac{1}{4} \times [1\ 2\ 1]$ and its auto-convolution
$\frac{1}{16} \times [1\ 4\ 6\ 4\ 1]$, which are
approximations to Gaussian kernels of variance
$\sigma^2 = \frac{1}{2}$ and $\sigma^2 = 1$,
respectively (Crowley et al., 2002). These kernels
are important because with just these two it is
possible to build full scale spaces separated by
$\sigma = \sqrt{2}$.
The relationship for cascaded convolutions of
binomial filters is given by:

$$\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}. \qquad (1)$$
From Equation 1, it is possible to deduce that with
two consecutive convolutions with the
$\frac{1}{16} \times [1\ 4\ 6\ 4\ 1]$ kernel, an image
with scale $\sigma = \sqrt{2}$ is obtained. The same
scale is also obtainable by applying four consecutive
convolutions with the $\frac{1}{4} \times [1\ 2\ 1]$
kernel. This important relationship determines that
multiple convolutions of binomial kernels result in a
scale space separated by $\sigma = \sqrt{2}$, the
so-called half-octave binomial pyramid.
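The cascade property can be checked numerically with a short sketch. The helper names `convolve`, `cascade` and `variance` are hypothetical and introduced only for this illustration; the scale of each cascaded kernel is measured directly as the square root of the variance of its weights.

```python
import math


def convolve(a, b):
    """Full discrete convolution of two 1-D kernels."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def cascade(kernel, times):
    """Convolve a kernel with itself so that it is applied `times` times in total."""
    result = list(kernel)
    for _ in range(times - 1):
        result = convolve(result, kernel)
    return result


def variance(kernel):
    """Variance of a normalized, symmetric kernel about its centre tap."""
    centre = (len(kernel) - 1) / 2
    return sum(w * (i - centre) ** 2 for i, w in enumerate(kernel))


b2 = [1 / 4, 2 / 4, 1 / 4]                        # 1/4 x [1 2 1]
b4 = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]     # 1/16 x [1 4 6 4 1]

sigma_two_b4 = math.sqrt(variance(cascade(b4, 2)))   # two passes of the 5-tap kernel
sigma_four_b2 = math.sqrt(variance(cascade(b2, 4)))  # four passes of the 3-tap kernel
```

Under these definitions the 3-tap kernel has variance 1/2 and the 5-tap kernel has variance 1, so both cascades accumulate a variance of 2, and `sigma_two_b4` and `sigma_four_b2` both equal the square root of 2, in agreement with Equation 1.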
PECCS 2014 - International Conference on Pervasive and Embedded Computing and Communication Systems