Robust Interest Point Detection by Local Zernike Moments

Gökhan Özbulak and Muhittin Gökmen

Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey

Keywords: Interest Point Detection, Feature Extraction, Object Detection, Local Zernike Moments, Scale-Space.

Abstract: In this paper, a novel interest point detector based on Local Zernike Moments is presented. The proposed detector, named Robust Local Zernike Moment based Features (R-LZMF), is invariant to scale, rotation and translation changes in images. This makes it robust when detecting interest points across images of the same scene taken under varying viewing conditions such as zoom in/out or rotation. As our experiments on the Inria Dataset indicate, R-LZMF outperforms widely used detectors such as SIFT and SURF in terms of repeatability, the main criterion for evaluating detector performance.

1 INTRODUCTION

In computer vision, the general object detection framework consists of i) extracting interest points in images, ii) describing the regions around these points as feature vectors and iii) matching the feature vectors in order to find corresponding points between the images. For instance, given two images of a scene containing a black car, we can detect interest points that characterize the car in both images and search for similar feature vectors extracted around these points using a distance metric such as the Euclidean or Mahalanobis distance; matching vectors then indicate that the black car in the first image also appears in the second image. This point correspondence is also important for stereo vision, motion estimation, image registration and stitching applications, all of which need to match corresponding regions across images.
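Step iii) can be illustrated with brute-force nearest-neighbour matching of descriptor vectors under the Euclidean or Mahalanobis distance. This is a generic sketch in Python with NumPy, not part of R-LZMF; the function name `match_descriptors` and the optional covariance argument `cov` are our own assumptions.

```python
import numpy as np


def match_descriptors(desc1, desc2, cov=None):
    """Brute-force nearest-neighbour matching of feature vectors.

    Uses the Euclidean distance by default, or the Mahalanobis distance when a
    covariance matrix `cov` of the descriptor space is supplied. Returns, for
    each row of desc1, the index of its nearest row in desc2 and that distance.
    """
    desc1 = np.asarray(desc1, dtype=float)
    desc2 = np.asarray(desc2, dtype=float)
    diff = desc1[:, None, :] - desc2[None, :, :]  # shape (n1, n2, d)
    if cov is None:
        sq_dist = np.einsum("ijk,ijk->ij", diff, diff)  # squared Euclidean
    else:
        inv_cov = np.linalg.inv(np.asarray(cov, dtype=float))
        sq_dist = np.einsum("ijk,kl,ijl->ij", diff, inv_cov, diff)  # squared Mahalanobis
    nearest = np.argmin(sq_dist, axis=1)
    return nearest, np.sqrt(sq_dist[np.arange(len(desc1)), nearest])


# Toy 2-D descriptors: two from image 1 matched against three from image 2.
idx, dist = match_descriptors([[0.0, 0.0], [5.0, 5.0]],
                              [[5.1, 4.9], [0.2, -0.1], [10.0, 10.0]])
```

Real pipelines usually add a rejection test (for example a distance threshold) on top of the raw nearest neighbour; that refinement is outside the scope of this illustration.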

Searching for corresponding points between images is a hard problem when the images are scaled, rotated and/or translated versions of each other. Under these geometric transformations, interest points still need to be detected and matched with a high repeatability score, i.e., a high rate of correspondence between the interest points detected in the images.

A good interest point detector is expected to be invariant to geometric and photometric transformations, and also robust to background clutter and occlusion. Changes in scale, rotation and translation between images are examples of geometric transformations, whereas an illumination change is an example of a photometric transformation. The scale-invariance problem is handled by building samples of the image at different scales with Gaussian blurring and then applying the interest point detector to these samples. This stack of images is called a scale-space, and it is widely used by well-known methods such as Scale Invariant Feature Transform (SIFT) (Lowe, 2004) and Speeded-Up Robust Features (SURF) (Bay et al., 2008). The local nature of interest points makes them robust to background clutter and occlusion (Mikolajczyk and Schmid, 2004). Locality also provides translational invariance, because a local region moves as a whole when the image is translated, so the information it contains is preserved.

In this paper, by extending our previous rotation-invariant detector, Local Zernike Moment based Features (LZMF) (Özbulak and Gökmen, 2014), we propose a robust interest point detector that is invariant to scale, rotation and translation changes. The proposed method applies rotation-invariant Zernike moments locally to the image in order to detect interest points, and therefore exhibits rotation- and translation-invariant characteristics. For scale invariance, a scale-space is constructed from the given image and the interest point detector is applied in both the spatial domain and the scale-space in order to detect interest points (keypoints). We name our interest point detector Robust Local Zernike Moment based Features, or R-LZMF for short.


Özbulak G. and Gökmen M.
Robust Interest Point Detection by Local Zernike Moments.
DOI: 10.5220/0005343506440651
In Proceedings of the 10th International Conference on Computer Vision Theory and Applications (VISAPP-2015), pages 644-651
ISBN: 978-989-758-089-5
© 2015 SCITEPRESS (Science and Technology Publications, Lda.)

2 RELATED WORK

In the literature, most interest point detection schemes are based on corner or blob detectors, because corners and blobs are good candidates for interest points. One of the earliest interest point detectors was developed by Harris and Stephens and is known as the Harris corner detector (Harris and Stephens, 1988). It searches for large intensity changes in the spatial domain by sliding a window and marks such locations as corners. The Harris detector is rotation-invariant but not scale-invariant.
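The sliding-window idea is usually formulated through the second-moment (structure) matrix of image gradients. The sketch below is a minimal illustration of that standard formulation, assuming Sobel gradients, a Gaussian window `window_sigma` and the customary constant `k = 0.04`; these choices are ours and are not taken from this paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel


def harris_response(image, window_sigma=1.5, k=0.04):
    """Harris cornerness R = det(M) - k * trace(M)^2 at every pixel.

    M is the second-moment matrix of the image gradients, accumulated over a
    Gaussian window that plays the role of the sliding window.
    """
    img = np.asarray(image, dtype=float)
    iy = sobel(img, axis=0)  # derivative along rows
    ix = sobel(img, axis=1)  # derivative along columns
    sxx = gaussian_filter(ix * ix, window_sigma)
    syy = gaussian_filter(iy * iy, window_sigma)
    sxy = gaussian_filter(ix * iy, window_sigma)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace ** 2


# A bright square on a dark background: R peaks near the square's corners,
# is negative along its edges and zero in flat regions.
square = np.zeros((64, 64))
square[20:44, 20:44] = 1.0
R = harris_response(square)
peak = np.unravel_index(np.argmax(R), R.shape)
```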

Andrew Witkin introduced scale-space theory in his seminal work (Witkin, 1983) and showed that repeatedly convolving an image with Gaussian filters of increasing sigma exposes structures at different scales, which gives scale invariance to a detector built on this representation.

Lindeberg extended Witkin's work and proposed an automatic scale selection mechanism based on the scale-normalized Laplacian-of-Gaussian (LoG) operator to detect blob-like structures in an image (Lindeberg, 1998). In (Lowe, 2004), Lowe showed that the LoG can be approximated by the Difference-of-Gaussian (DoG), the difference of two images convolved with Gaussian filters of consecutive sigma values. Lowe built a DoG space for interest point detection and presented a complete scheme (detector and descriptor) named Scale Invariant Feature Transform (SIFT).
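Lowe's approximation can be checked numerically. The sketch below, assuming SciPy's `gaussian_filter` and `gaussian_laplace` and an arbitrary random test image, compares the DoG response with its first-order approximation $(k-1)\sigma^2\nabla^2 G * I$; the two are strongly but not perfectly correlated, because the approximation becomes exact only as $k \to 1$.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace


def difference_of_gaussian(image, sigma, k):
    """D(x, y, sigma) = (G(k * sigma) - G(sigma)) * I."""
    return gaussian_filter(image, k * sigma) - gaussian_filter(image, sigma)


def scale_normalized_log(image, sigma):
    """sigma^2 * (Laplacian of Gaussian) * I, the scale-normalized LoG."""
    return sigma ** 2 * gaussian_laplace(image, sigma)


rng = np.random.default_rng(0)
image = rng.standard_normal((128, 128))
sigma, k = 2.0, 2.0 ** (1.0 / 3.0)
dog = difference_of_gaussian(image, sigma, k)
# First-order approximation: DoG ~ (k - 1) * sigma^2 * LoG.
approx = (k - 1.0) * scale_normalized_log(image, sigma)
corr = np.corrcoef(dog.ravel(), approx.ravel())[0, 1]
```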

Mikolajczyk and Schmid proposed the Harris-Laplace detector, which combines the Harris detector with the LoG operator (Mikolajczyk and Schmid, 2001). They used the Harris detector to localize interest points in 2D (the spatial domain) and the LoG operator to find local maxima in 3D (scale-space). The Laplacian is used instead of the Harris function in 3D because the Harris function rarely attains a maximum over scales, which would leave only a few keypoints. The Harris-Laplace detector was then extended in (Mikolajczyk and Schmid, 2002, 2004) by estimating the shape of an elliptical region with the second moment matrix; this scheme was named the Harris-Affine detector. The Hessian-Affine detector, which detects interest points based on the Hessian matrix in the spatial domain, was also proposed in (Mikolajczyk and Schmid, 2002, 2004). Both the Harris-Affine and Hessian-Affine detectors are considerably more invariant to affine transformations than the Harris-Laplace detector.

Bay et al., in their scheme named Speeded-Up Robust Features (SURF) (Bay et al., 2008), used a Hessian-based detector instead of a Harris-based one because the Hessian is more stable and repeatable. As opposed to SIFT and Harris-Laplace, they build the scale-space by up-scaling box-filter approximations of second-order Gaussian derivatives rather than by down-scaling the image itself. Up-scaling the filters instead of down-scaling the image avoids the aliasing problems that occur when sub-sampling the image, and this approach is also faster than SIFT because the up-scaled filters are evaluated efficiently with integral images.
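The constant-time box sums behind this speed-up come from the integral image (summed-area table): any rectangular sum costs four look-ups, whatever the filter size. A minimal NumPy sketch, with helper names of our own choosing:

```python
import numpy as np


def integral_image(image):
    """Summed-area table padded with a leading row and column of zeros,
    so that ii[r, c] holds the sum of image[:r, :c]."""
    img = np.asarray(image, dtype=float)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of image[top:bottom, left:right] from four look-ups,
    independent of the box size."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]


img = np.arange(30, dtype=float).reshape(5, 6)
ii = integral_image(img)
total = box_sum(ii, 1, 2, 4, 5)  # same value as img[1:4, 2:5].sum()
```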

Rosten and Drummond developed a fast interest point detector named Features from Accelerated Segment Test (FAST) (Rosten and Drummond, 2006). FAST tests each image pixel for cornerness by examining the 16 pixels on a circle around it; if a sufficient number of contiguous pixels on this circle are all brighter or all darker than the pixel under test, the pixel is detected as a corner. The method also learns a decision tree from image pixels to make this test faster. Interest points detected by FAST are not multi-scale features; in other words, FAST is not scale-invariant. Oriented FAST and Rotated BRIEF (ORB), proposed in (Rublee et al., 2011), combines the FAST keypoint detector with the BRIEF descriptor (Calonder et al., 2010). ORB modifies FAST to work on an image pyramid for scale invariance, and it also modifies the BRIEF descriptor to make it rotation-invariant. Center Surround Extrema (CenSurE) (Agrawal et al., 2008) is another scale- and rotation-invariant interest point detector. In CenSurE, a center-surround filter is applied to the image at all locations and scales, and the Harris function is used to eliminate weak corner points. Leutenegger et al. proposed a rotation- and scale-invariant keypoint detector named Binary Robust Invariant Scalable Keypoints (BRISK) (Leutenegger et al., 2011). BRISK uses a novel FAST-based scale-space detector for scale-invariant interest point detection and evaluates a saliency criterion by quadratic function fitting in the continuous domain.
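The basic segment test (without the learned decision tree) can be sketched as follows. This assumes the common 16-pixel Bresenham circle of radius 3, an intensity threshold `t` and a contiguity requirement `n = 9` as in the FAST-9 variant; all names and default values are illustrative assumptions.

```python
import numpy as np

# Offsets (dx, dy) of the 16 pixels on a Bresenham circle of radius 3, in circular order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def _longest_circular_run(flags):
    """Length of the longest run of True values when the list wraps around."""
    best = run = 0
    for f in flags + flags:  # doubling the list handles the wrap-around
        run = run + 1 if f else 0
        best = max(best, run)
    return min(best, len(flags))


def is_fast_corner(image, x, y, t=20, n=9):
    """Segment test: (x, y) is a corner if at least n contiguous circle pixels
    are all brighter than I_p + t or all darker than I_p - t."""
    p = float(image[y, x])
    ring = [float(image[y + dy, x + dx]) for dx, dy in CIRCLE]
    brighter = [v > p + t for v in ring]
    darker = [v < p - t for v in ring]
    return max(_longest_circular_run(brighter), _longest_circular_run(darker)) >= n


# The bright top-left quadrant ends at (4, 4): 11 contiguous circle pixels are darker.
corner = np.zeros((9, 9))
corner[:5, :5] = 100.0
```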

In (Özbulak and Gökmen, 2014), we proposed a rotation-invariant interest point detector that applies a Zernike moment locally to the image to measure cornerness, and suppresses nearby edges by dividing the magnitude of one Zernike moment by that of another. We named it Local Zernike Moment based Features (LZMF). Evaluating LZMF on the “Rotation” sequence of the Inria Dataset showed that our method outperforms well-known interest point detectors such as SIFT, SURF, CenSurE and BRISK. In this study, we extend this rotation-invariant detector to be scale-invariant by building a scale-space with tuned parameter settings.

RobustInterestPointDetectionbyLocalZernikeMoments

645

3 ROBUST INTEREST POINT DETECTION

In this section, we introduce R-LZMF, our robust interest point detector based on local Zernike moments (LZM). Zernike moments are described in Section 3.1, Section 3.2 gives a general overview of the proposed detector, and Section 3.3 shows how the scale-space is built to obtain the best scale-invariance performance with our detector.

3.1 Zernike Moments

In (Zernike, 1934), Frits Zernike introduced a complete set of complex polynomials, named Zernike polynomials, that are orthogonal on the unit disk $x^2 + y^2 \le 1$. Zernike polynomials are defined as:

$$V_{nm}(x,y) = V_{nm}(\rho,\theta) = R_{nm}(\rho)\, e^{jm\theta} \qquad (1)$$

where $R_{nm}(\rho)$ is the radial polynomial, $n$ is the order of the polynomial, $m$ is the repetition, $\rho$ is the length of the vector from the origin to $(x,y)$ and $\theta$ is the angle between this vector and the x-axis, measured counter-clockwise. The parameters must satisfy $n \ge 0$, $n - |m|$ even and $|m| \le n$. $R_{nm}(\rho)$ is defined as:

$$R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} \frac{(-1)^s\,(n-s)!}{s!\,\left(\frac{n+|m|}{2}-s\right)!\,\left(\frac{n-|m|}{2}-s\right)!}\,\rho^{n-2s} \qquad (2)$$
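Eq. (2) translates almost line by line into code. The Python/NumPy sketch below (the function name is ours) also checks the constraints on $n$ and $m$; for instance, it reproduces $R_{42}(\rho) = 4\rho^4 - 3\rho^2$ and $R_{40}(\rho) = 6\rho^4 - 6\rho^2 + 1$.

```python
from math import factorial

import numpy as np


def radial_polynomial(n, m, rho):
    """Zernike radial polynomial R_nm(rho) from Eq. (2)."""
    m = abs(m)
    if n < 0 or m > n or (n - m) % 2 != 0:
        raise ValueError("require n >= 0, |m| <= n and n - |m| even")
    rho = np.asarray(rho, dtype=float)
    result = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * factorial(n - s)
                 / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s)))
        result = result + coeff * rho ** (n - 2 * s)
    return result


rho = np.linspace(0.0, 1.0, 11)
r42 = radial_polynomial(4, 2, rho)
```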

Teague introduced Zernike polynomials as orthogonal image moments for two-dimensional pattern recognition (Teague, 1980). Given an image function $f(x,y)$, the Zernike moment of order $n$ and repetition $m$ is defined as:

$$Z_{nm} = \frac{n+1}{\pi} \iint_{x^2+y^2 \le 1} f(x,y)\, V^{*}_{nm}(x,y)\, dx\, dy \qquad (3)$$

where $^{*}$ in $V^{*}_{nm}(x,y)$ denotes the complex conjugate. To work with digital images of size $M \times N$, (3) is discretized as:

$$Z_{nm} = \frac{n+1}{\pi} \sum_{i=1}^{M} \sum_{j=1}^{N} f(i,j)\, V^{*}_{nm}(x_i, y_j)\, \Delta x_i\, \Delta y_j \qquad (4)$$

where $x_i, y_j \in [-1,1]$, $\rho_{ij} = \sqrt{x_i^2 + y_j^2}$, $\theta_{ij} = \tan^{-1}(y_j / x_i)$ and $\Delta x_i = \Delta y_j = \frac{2}{N\sqrt{2}}$.
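A direct transcription of Eq. (4) for a square image is sketched below. The mapping of pixel indices to $x_i, y_j$ is our assumption: the text fixes only the range and the step $\Delta x_i = \Delta y_j = 2/(N\sqrt{2})$, and the centred grid $x_i = (2i-N-1)/(N\sqrt{2})$ is one choice consistent with that step that keeps every pixel centre inside the unit disk. With it, a constant image of ones gives $Z_{00} = 2/\pi$.

```python
from math import factorial

import numpy as np


def radial_polynomial(n, m, rho):
    """Zernike radial polynomial R_nm(rho), Eq. (2)."""
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s)))
        out += c * rho ** (n - 2 * s)
    return out


def zernike_moment(image, n, m):
    """Discrete Zernike moment Z_nm of a square N x N image, following Eq. (4).

    Assumption: pixel centres map to x_i = (2i - N - 1) / (N * sqrt(2)), i = 1..N,
    so the grid step equals Delta x_i = Delta y_j = 2 / (N * sqrt(2)).
    """
    f = np.asarray(image, dtype=float)
    N = f.shape[0]
    if f.shape != (N, N):
        raise ValueError("this sketch assumes a square image")
    coords = (2.0 * np.arange(1, N + 1) - N - 1) / (N * np.sqrt(2.0))
    x, y = np.meshgrid(coords, coords)  # x varies along columns, y along rows
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)  # quadrant-aware form of tan^-1(y / x)
    v_conj = radial_polynomial(n, m, rho) * np.exp(-1j * m * theta)  # V*_nm
    delta = 2.0 / (N * np.sqrt(2.0))
    return (n + 1) / np.pi * np.sum(f * v_conj) * delta * delta


z00 = zernike_moment(np.ones((8, 8)), 0, 0)
```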

As (4) shows, a Zernike moment $Z_{nm}$ describes the intensity profile of the whole image. It is also possible to project local intensity profiles onto Zernike polynomials by fitting the unit circle on the neighbourhood of each pixel. Image moments computed in this way are called local Zernike moments, or LZM for short. LZM provides a powerful description of local image regions: it has been successfully applied to face recognition (Sariyanidi et al., 2012), and Zernike moments have been used to detect low-level features such as step edges and gray-level corners (Ghosal and Mehrotra, 1997). In this paper, we use the LZM representation to detect gray-level corners by convolving the image with the LZM-based operator $\mathbf{V}^{k}_{nm}$, a $k \times k$ convolutional filter for the Zernike moment of order $n$ and repetition $m$, defined as:

$$\mathbf{V}^{k}_{nm}(i,j) = V_{nm}(x_i, y_j) \qquad (5)$$

An image is convolved with $\mathbf{V}^{k}_{nm}$ as follows:

$$Z^{k}_{nm}(i,j) = \sum_{p,q} f(i-p,\, j-q)\, \mathbf{V}^{k}_{nm}(p,q) \qquad (6)$$

Because Zernike moments are complex, there is one real filter, $\mathrm{Re}[\mathbf{V}^{k}_{nm}]$, and one imaginary filter, $\mathrm{Im}[\mathbf{V}^{k}_{nm}]$, for a Zernike moment of order $n$ and repetition $m$. When there is no repetition ($m = 0$), however, the imaginary filter is discarded and the image is convolved with the real filter only.
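Eqs. (5) and (6) amount to sampling $V_{nm}$ on a $k \times k$ grid and running ordinary 2D convolutions with its real and imaginary parts. The sketch assumes SciPy's `scipy.ndimage.convolve` with reflective borders, a placeholder window size `k = 5` (the filter size is not given in this section), and the centred pixel mapping $x_i = (2i-k-1)/(k\sqrt{2})$, which is our assumption for fitting the unit disk on the window.

```python
from math import factorial

import numpy as np
from scipy.ndimage import convolve


def radial_polynomial(n, m, rho):
    """Zernike radial polynomial R_nm(rho), Eq. (2)."""
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s)))
        out += c * rho ** (n - 2 * s)
    return out


def zernike_filter(n, m, k):
    """k x k complex filter V^k_nm(i, j) = V_nm(x_i, y_j), Eq. (5)."""
    coords = (2.0 * np.arange(1, k + 1) - k - 1) / (k * np.sqrt(2.0))  # assumed mapping
    x, y = np.meshgrid(coords, coords)
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    return radial_polynomial(n, m, rho) * np.exp(1j * m * theta)


def local_zernike_moments(image, n, m, k=5):
    """Real and imaginary LZM maps from convolving the image with Re[V^k_nm] and
    Im[V^k_nm], Eq. (6); the imaginary map is skipped (None) when m == 0."""
    img = np.asarray(image, dtype=float)
    v = zernike_filter(n, m, k)
    real_map = convolve(img, v.real, mode="reflect")
    imag_map = None if m == 0 else convolve(img, v.imag, mode="reflect")
    return real_map, imag_map


real00, imag00 = local_zernike_moments(np.ones((16, 16)), 0, 0, k=5)
```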

The magnitude of a Zernike moment is unchanged when the image is rotated by an angle $\alpha$ about the centre of the unit disk: the rotation only multiplies $Z_{nm}$ by the phase factor $e^{-jm\alpha}$. This property makes Zernike moment magnitudes rotation-invariant (Khotanzad and Hong, 1990). The magnitude of a local Zernike moment is defined as:

$$|Z^{k}_{nm}| = \sqrt{\left(\mathrm{Re}[Z^{k}_{nm}]\right)^2 + \left(\mathrm{Im}[Z^{k}_{nm}]\right)^2} \qquad (7)$$

where $\mathrm{Re}[Z^{k}_{nm}]$ and $\mathrm{Im}[Z^{k}_{nm}]$ are the local Zernike moment representations obtained by convolving an image with the real and imaginary Zernike filters, $\mathrm{Re}[\mathbf{V}^{k}_{nm}]$ and $\mathrm{Im}[\mathbf{V}^{k}_{nm}]$, respectively. Note that $\mathrm{Im}[Z^{k}_{nm}]$ is zero and discarded when $m = 0$.

3.2 Interest Point Detection by LZM

Our interest point detection method operates in both the spatial domain (2D) and scale-space (3D). In the spatial domain, the input image is first converted to gray-scale, and a global normalization is then applied to the gray-scale image to make the proposed detector more robust to noise, as we showed in (Özbulak and Gökmen, 2014).

Before the image is convolved with the Zernike filters, a second normalization is applied locally to the region on which the unit circle is fitted: the local intensity profile under this circle is standardized to zero mean ($\mu = 0$) and unit standard deviation ($\sigma = 1$) to make the proposed detector more robust to local illumination changes. This is similar to the normalization in Normalized Cross-Correlation (NCC) and to feature vector normalization in SIFT (Lowe, 2004).
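A minimal sketch of this local standardization for one window, assuming the whole $k \times k$ window under the fitted circle is standardized and adding a small `eps` (our choice) to guard against flat patches. It shows why the step helps: an affine illumination change $a f + b$ with $a > 0$ leaves the standardized patch unchanged.

```python
import numpy as np


def standardize_patch(patch, eps=1e-12):
    """Rescale the intensities of one window to zero mean and unit standard
    deviation before the Zernike filter is applied (eps guards flat patches)."""
    p = np.asarray(patch, dtype=float)
    return (p - p.mean()) / (p.std() + eps)


rng = np.random.default_rng(2)
patch = rng.random((5, 5))
brighter = 3.0 * patch + 7.0  # affine illumination change: gain 3, offset 7
z = standardize_patch(patch)
```

Because the filtering is linear, the same result could be computed for all pixels at once as $(\mathrm{conv}(f, V) - \mu\, s_V)/\sigma$ from local mean and standard deviation maps, where $s_V$ is the sum of the filter coefficients; this is an implementation option, not a detail stated in the paper.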

The input image is then convolved with the Zernike filters $\mathrm{Re}[\mathbf{V}^{k}_{nm}]$ and $\mathrm{Im}[\mathbf{V}^{k}_{nm}]$. In (Ghosal and Mehrotra, 1997), Zernike filters of order 2 were used for corner detection. In our experiments, however, we found that higher orders such as 4 yield better repeatability. In our method, the magnitude of a local Zernike moment is used to measure cornerness. A pre-defined corner threshold is applied to this magnitude response, and pixels whose response exceeds the threshold are considered candidate interest points. The drawback of this Zernike moment is that it may also respond to edges close to corners. We suppress these nearby edges by dividing the magnitude of one Zernike moment by that of another and thresholding the resulting ratio with a pre-defined nearby-edge threshold. Candidate interest points that pass this test are retained for further processing, and the rest are discarded. In this way, only points that are corners, and not nearby edges, are kept as interest points (keypoints). For the proposed detector, a corner threshold of 0.51 and a nearby-edge threshold of 5, which were determined in our previous work using the “Rotation” sequence of the Inria Dataset, are used throughout the experiments.
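The two thresholding tests can be sketched as boolean masks over response maps. Only the threshold values 0.51 and 5 come from the text; which moment is divided by which, and whether a point passes when the ratio is above or below the nearby-edge threshold, are not specified here, so the convention below (cornerness magnitude divided by a second magnitude map, pass when above the threshold) is a hypothetical choice for illustration.

```python
import numpy as np


def select_candidates(corner_mag, edge_mag, corner_thr=0.51, edge_ratio_thr=5.0, eps=1e-12):
    """Boolean mask of candidate interest points.

    corner_mag: magnitude map used as cornerness.
    edge_mag: magnitude map of the second Zernike moment (hypothetical input).
    Hypothetical convention: ratio = corner_mag / edge_mag, and a point passes
    the nearby-edge test when the ratio exceeds edge_ratio_thr.
    """
    corner_mag = np.asarray(corner_mag, dtype=float)
    ratio = corner_mag / (np.asarray(edge_mag, dtype=float) + eps)
    return (corner_mag > corner_thr) & (ratio > edge_ratio_thr)


corner = np.array([[0.9, 0.9], [0.3, 0.9]])
edge = np.array([[0.1, 0.5], [0.01, 0.2]])
mask = select_candidates(corner, edge)  # ratios: 9, 1.8, 30, 4.5
```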

A further refinement by Non-Maximum Suppression is then applied to the detected interest points in the spatial domain as follows: i) a 5×5 window is centred on each interest point, ii) the cornerness response of the interest point is compared with those of the detected interest points in its 5×5 neighbourhood, and iii) the interest point is retained if its response is the maximum, and discarded otherwise. In this way, redundant interest points are removed and more consistent interest points are retained.
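The 5×5 spatial suppression can be sketched with a maximum filter over the candidate responses, assuming SciPy's `scipy.ndimage.maximum_filter`. Giving non-candidates a response of $-\infty$ ensures that only detected interest points take part in the comparison, as the text requires. Tie handling is not specified in the paper; this sketch keeps all tied maxima.

```python
import numpy as np
from scipy.ndimage import maximum_filter


def spatial_nms(response, candidates, size=5):
    """Keep a candidate only if its cornerness response is the maximum among
    the candidates in its size x size neighbourhood (ties are all kept)."""
    response = np.asarray(response, dtype=float)
    candidates = np.asarray(candidates, dtype=bool)
    # Non-candidates must not suppress anything, so they get -inf.
    masked = np.where(candidates, response, -np.inf)
    local_max = maximum_filter(masked, size=size, mode="nearest")
    return candidates & (masked >= local_max)


response = np.zeros((40, 40))
candidates = np.zeros((40, 40), dtype=bool)
for r, c, v in [(10, 10, 0.9), (11, 12, 0.8), (30, 30, 0.6)]:
    response[r, c] = v
    candidates[r, c] = True
response[10, 11] = 5.0  # strong response, but not a candidate
keep = spatial_nms(response, candidates)
```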

Candidate interest points detected in the spatial domain are then examined in scale-space to eliminate weak ones, which do not reach a local maximum in scale-space, and to determine the characteristic scales of strong ones (see Section 3.3 for details). This analysis is carried out in each octave of the scale-space as follows. For the outermost scale levels of an octave, candidate interest points found by the spatial analysis are retained directly, without any scale analysis. For the inner scale levels of an octave, a candidate interest point is compared, based on its cornerness response, with the interest points detected in the lower and upper scale levels, again within the 5×5 neighbourhood of the point at the adjacent levels. If the response of the candidate is the maximum among all interest points detected there, the candidate is accepted as a real interest point. This approach is also known as 3D Non-Maximum Suppression. R-LZMF thus performs at most 5 × 5 × 3 − 1 = 74 comparisons in spatial and scale-space, and this check is cheap because it only compares the point with already detected interest points in 2D and 3D space and stops as soon as an interest point in the neighbourhood has a higher response.
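A sketch of this 3D check for one octave follows. The data layout (lists of equally sized per-level arrays) and the use of `scipy.ndimage.maximum_filter` with a 3 × 5 × 5 window are our assumptions, and the early exit described above is replaced by a vectorized filter. With only two levels per octave, as in the final settings of Section 3.3, every level is outermost and nothing is suppressed.

```python
import numpy as np
from scipy.ndimage import maximum_filter


def scale_nms(responses, candidates, size=5):
    """3D non-maximum suppression within one octave.

    responses, candidates: lists of equally sized 2D arrays, one per scale level.
    Outermost levels keep their candidates unchanged; a candidate on an inner
    level survives only if no candidate in the size x size x 3 neighbourhood
    (same, lower and upper level) has a higher response.
    """
    resp = np.stack([np.asarray(r, dtype=float) for r in responses])
    cand = np.stack([np.asarray(c, dtype=bool) for c in candidates])
    masked = np.where(cand, resp, -np.inf)
    local_max = maximum_filter(masked, size=(3, size, size), mode="nearest")
    keep = cand.copy()
    inner = slice(1, len(responses) - 1)  # empty when there are fewer than 3 levels
    keep[inner] = cand[inner] & (masked[inner] >= local_max[inner])
    return list(keep)


responses = [np.zeros((40, 40)) for _ in range(3)]
candidates = [np.zeros((40, 40), dtype=bool) for _ in range(3)]
for level, r, c, v in [(0, 5, 5, 0.4), (1, 10, 10, 0.7), (1, 30, 30, 0.8), (2, 11, 11, 0.9)]:
    responses[level][r, c] = v
    candidates[level][r, c] = True
kept = scale_nms(responses, candidates)
```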



3.3 Scale-Space

Andrew Witkin introduced the scale-space concept in his seminal work (Witkin, 1983) to represent signals at different scale levels and to show how signal behaviour changes from fine to coarse scales. He also showed that smoothing an image with Gaussian filters of increasing sigma suppresses fine details and exposes coarse structures. In (Koenderink, 1984), Koenderink showed that the Gaussian is the unique filter for building a scale-space. Lindeberg confirmed this uniqueness and proposed an automatic scale selection mechanism to find the characteristic scale of an interest point in an image (Lindeberg, 1998). The characteristic scale is the scale level at which an interest point detection function reaches a local extremum in scale-space, i.e., where the image point exhibits its most interesting characteristic (cornerness, blobness, etc.). A description extracted around an interest point over a region proportional to its characteristic scale is therefore independent of the image scale, so the same interest point detected in differently scaled images yields comparable descriptions.
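Characteristic scale selection can be illustrated with Lindeberg's scale-normalized Laplacian. Note that this is not the R-LZMF cornerness measure, which uses the Zernike moment magnitude in its scale comparison. The sketch assumes SciPy's `gaussian_laplace`; for a Gaussian blob of standard deviation $t$, the normalized response at the blob centre is proportional to $\sigma^2/(\sigma^2+t^2)^2$, which peaks at $\sigma = t$.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace


def characteristic_scale(image, row, col, sigmas):
    """Scale at which |sigma^2 * LoG| at (row, col) is maximal
    (automatic scale selection with the scale-normalized Laplacian)."""
    img = np.asarray(image, dtype=float)
    responses = [abs(s ** 2 * gaussian_laplace(img, s)[row, col]) for s in sigmas]
    return sigmas[int(np.argmax(responses))]


# A Gaussian blob with standard deviation 4 pixels: theory predicts sigma = 4.
yy, xx = np.mgrid[0:81, 0:81]
blob = np.exp(-((xx - 40) ** 2 + (yy - 40) ** 2) / (2 * 4.0 ** 2))
sigmas = np.arange(1.0, 8.01, 0.5)
best = characteristic_scale(blob, 40, 40, sigmas)
```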

In this study, we build a scale-space for the rotation-invariant interest point detector proposed in our previous work to make it scale-invariant as well. The input image is repeatedly convolved with Gaussian filters of increasing sigma, and each blurred image constitutes a scale level $L(x,y,\sigma)$ of the scale-space, defined as:

$$L(x,y,\sigma) = G(x,y,\sigma) * I(x,y) \qquad (8)$$

where $I(x,y)$ is the input image, $*$ denotes convolution and $G(x,y,\sigma)$ is the 2D Gaussian function defined as:

$$G(x,y,\sigma) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2+y^2)/2\sigma^2} \qquad (9)$$
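Eqs. (8) and (9) can be realized by sampling the Gaussian on a finite grid. The sketch below builds the kernel explicitly, assuming SciPy's `scipy.ndimage.convolve` with reflective borders and truncation at four standard deviations (our choices; the paper does not state them). Under these assumptions the separable `gaussian_filter` gives the same result faster.

```python
import numpy as np
from scipy.ndimage import convolve, gaussian_filter


def gaussian_kernel(sigma, truncate=4.0):
    """Sampled 2D Gaussian G(x, y, sigma) of Eq. (9), truncated at
    `truncate` standard deviations and renormalized to sum to one."""
    radius = int(truncate * sigma + 0.5)
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    g = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    return g / g.sum()


def scale_level(image, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y), Eq. (8)."""
    return convolve(np.asarray(image, dtype=float), gaussian_kernel(sigma), mode="reflect")


image = np.random.default_rng(3).random((32, 32))
level = scale_level(image, 1.8)
reference = gaussian_filter(image, 1.8)  # separable equivalent
```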

We divide the scale-space into octaves for efficient computation. An octave is a stack of scale levels (layers) $L(x,y,\sigma)$ with the same resolution, where the sigma of each scale level is a constant multiple of the previous level's sigma. An image in one octave is a sub-sampled version of an image in the previous octave: when sigma within an octave has doubled, the image convolved with the Gaussian filter of this sigma is halved in size and used as the first scale layer of the next octave. Sub-sampling is the key factor for computational gain.
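A minimal sketch of this octave construction, with the final settings of this section (4 octaves, 2 scale levels per octave, initial sigma 1.8) as defaults. Three details are our assumptions rather than statements from the paper: the input is treated as unblurred, consecutive levels differ by the factor $k = 2^{1/S}$ for $S$ levels per octave (so sigma doubles over one octave, as in SIFT), and each level is derived from the octave's first level via the Gaussian semigroup property, blurring from $\sigma_a$ to $\sigma_b$ with an extra $\sqrt{\sigma_b^2 - \sigma_a^2}$.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def build_scale_space(image, num_octaves=4, levels_per_octave=2, sigma0=1.8):
    """Octave-based Gaussian scale-space.

    Returns (octaves, sigmas): octaves[o][s] is a scale level and sigmas[o][s]
    its absolute sigma in input-image pixels.
    """
    k = 2.0 ** (1.0 / levels_per_octave)
    base = gaussian_filter(np.asarray(image, dtype=float), sigma0)  # input assumed unblurred
    octaves, sigmas = [], []
    for o in range(num_octaves):
        levels = [base]
        for s in range(1, levels_per_octave):
            # Blur from sigma0 to sigma0 * k**s: extra sigma sqrt((sigma0 k^s)^2 - sigma0^2).
            levels.append(gaussian_filter(base, sigma0 * np.sqrt(k ** (2 * s) - 1.0)))
        octaves.append(levels)
        sigmas.append([sigma0 * 2 ** o * k ** s for s in range(levels_per_octave)])
        # Blur the base from sigma0 to 2 * sigma0 (extra sigma sqrt(3) * sigma0),
        # then halve the resolution to obtain the next octave's first level.
        base = gaussian_filter(base, sigma0 * np.sqrt(3.0))[::2, ::2]
    return octaves, sigmas


octaves, sigmas = build_scale_space(np.random.default_rng(4).random((64, 64)))
```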

Figure 1: Parameter evaluation for scale-space based on average repeatability: (a) number of scale levels, with 3 octaves and an initial sigma of 1.7; (b) number of octaves, with 2 scale levels per octave and an initial sigma of 1.7; (c) initial sigma, with 4 octaves and 2 scale levels per octave.

Some parameters must be fine-tuned to cover the scale-space fully: the number of scale levels in one octave, the number of octaves in the scale-space and the initial sigma of the first scale level of the first octave. We used the Belledonne images from the “Zoom” sequence of the Inria Dataset to determine the optimum parameters for our detector. We first determined the number of scale levels, assuming 3 octaves and an initial sigma of 1.7, and obtained the best average repeatability with two scale levels per octave (Figure 1-a). In this case, however, 3D Non-Maximum Suppression is not applied, because only the outermost scale levels exist. We then searched for the optimum number of octaves with two scale levels per octave and an initial sigma of 1.7; as Figure 1-b shows, 4 octaves yield the best average repeatability, and using more than 4 octaves does not affect the performance. For the initial sigma, with 4 octaves and 2 scale levels per octave, the best repeatability in Figure 1-c is obtained with 1.6; however, 1.8 gave better performance in our experiments, so we use 1.8 as the initial sigma. The final parameter settings are therefore 4 octaves, 2 scale levels per octave and an initial sigma of 1.8.

Figure 2 shows the interest points detected by R-LZMF, drawn as red circles, on some Laptop images of the “Zoom&Rotation” sequence. Most of the interest points detected in one image can also be observed in the other images, which indicates the good repeatability of our detector.

Figure 2: Interest points detected by R-LZMF for some Laptop images from the “Zoom&Rotation” sequence.

VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications

648

4 EXPERIMENTAL RESULTS

The Inria Dataset is used to evaluate the performance of the proposed detector. We used the Asterix, Crolles and VanGogh image sets from the “Zoom” sequences, and the East_park, Laptop and Resid image sets from the “Zoom&Rotation” sequences. The “Zoom” sequences contain only scaled images, whereas the “Zoom&Rotation” image sets contain images that are both scaled and rotated. The image sets in both sequences come with their own transformation matrices for repeatability evaluation, but the scale and rotation values of the images are not provided. Therefore, the x-axes in Figure 3 and Figure 4 show the image index instead of the scale value or rotation angle; a larger image index can be read as a larger scale change and rotation angle.

As proposed in (Schmid et al., 1998), the repeatability score is the main criterion for evaluating the performance of interest point detectors. Repeatability measures the point correspondence between two images that are transformed (scaled, rotated or translated) versions of each other. A robust interest point detector is expected to detect most of the same structures in both images even when they are scaled, rotated or translated versions of each other. The repeatability score is computed as:

$$r_{1,2} = \frac{C(I_1, I_2)}{\min(n_1, n_2)} \qquad (10)$$

where $C(I_1, I_2)$ is the number of corresponding points detected in both images, and $n_1$ and $n_2$ are the numbers of keypoints detected in the first and second images, respectively.
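Eq. (10) can be sketched as below. As a simplification, the sketch ignores the restriction to the scene region visible in both images and treats two keypoints as corresponding when the ground-truth transformation `H` maps one to within `eps` pixels of the other, matching them one-to-one; the function name, `eps` and the greedy matching are our assumptions.

```python
import numpy as np


def repeatability(points1, points2, H, eps=1.5):
    """Repeatability r = C / min(n1, n2), Eq. (10).

    points1, points2: (n, 2) arrays of (x, y) keypoints in image 1 and image 2.
    H: 3 x 3 matrix mapping image-1 coordinates to image-2 coordinates.
    """
    p1 = np.asarray(points1, dtype=float)
    p2 = np.asarray(points2, dtype=float)
    if len(p1) == 0 or len(p2) == 0:
        return 0.0
    homog = np.hstack([p1, np.ones((len(p1), 1))]) @ np.asarray(H, dtype=float).T
    projected = homog[:, :2] / homog[:, 2:3]
    used = np.zeros(len(p2), dtype=bool)
    correspondences = 0
    for q in projected:
        dist = np.linalg.norm(p2 - q, axis=1)
        dist[used] = np.inf  # each image-2 keypoint is matched at most once
        j = int(np.argmin(dist))
        if dist[j] <= eps:
            used[j] = True
            correspondences += 1
    return correspondences / min(len(p1), len(p2))


H = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # shift x by 5 pixels
r = repeatability([(10, 10), (20, 20), (30, 30)],
                  [(15, 10), (25, 20), (100, 100), (200, 5)], H)
```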

Figure 3: Repeatability scores for the “Zoom” sequence: (a) Asterix. (b) Crolles. (c) VanGogh.

The repeatability of R-LZMF on the evaluated image sets is plotted in Figure 3 for the “Zoom” sequence and in Figure 4 for the “Zoom&Rotation” sequence. As the plots show, R-LZMF outperforms well-known detectors such as SIFT, SURF, CenSurE (STAR), BRISK and ORB on all image sets. We used OpenCV v2.4.8 to run these detectors and applied them to the image sets with their default parameter settings. Likewise, throughout the experiments we used our detector with a single set of parameter settings: corner threshold 0.51, nearby-edge threshold 5, 4 octaves, 2 scale levels per octave and an initial sigma of 1.8. The bar charts in Figure 5 show that R-LZMF also has the best average repeatability among all compared detectors.

RobustInterestPointDetectionbyLocalZernikeMoments

649

Figure 4: Repeatability scores for the “Zoom&Rotation” sequence: (a) East_park. (b) Laptop. (c) Resid.

Figure 5: Average repeatability scores for: (a) Asterix. (b) Crolles. (c) VanGogh. (d) East_park. (e) Laptop. (f) Resid.

VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications

650

5 CONCLUSIONS

In this paper, we proposed a novel interest point detector named Robust Local Zernike Moment based Features, or R-LZMF. This detector is based on local Zernike moments and is invariant to geometric transformations such as scale, rotation and translation. We validated its robustness to these transformations on the Inria Dataset and showed that R-LZMF outperforms SIFT, SURF, CenSurE (STAR), BRISK and ORB on all image sets in the experiments. As future work, we plan to analyse the performance of R-LZMF under affine transformations as well. Furthermore, we will extend R-LZMF with a descriptor, again based on LZM to exploit its descriptive power, so that it becomes a complete scheme (detector and descriptor) like SIFT and SURF.

REFERENCES

Agrawal, M., Konolige, K., Blas, M. R., 2008. CenSurE: Center surround extremas for realtime feature detection and matching. In European Conference on Computer Vision, pp. 102-115.

Bay, H., Ess, A., Tuytelaars, T., Van Gool, L., 2008. SURF: Speeded Up Robust Features. Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346-359.

Calonder, M., Lepetit, V., Strecha, C., Fua, P., 2010. BRIEF: Binary Robust Independent Elementary Features. In European Conference on Computer Vision, pp. 778-792.

Ghosal, S., Mehrotra, R., 1997. A moment based unified approach to image feature detection. IEEE Transactions on Image Processing, vol. 6, no. 6, pp. 781-793.

Harris, C., Stephens, M., 1988. A combined corner and edge detector. In Alvey Vision Conference, pp. 147-151.

Khotanzad, A., Hong, Y. H., 1990. Invariant image recognition by Zernike moments. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 489-497.

Koenderink, J. J., 1984. The structure of images. Biological Cybernetics, vol. 50, pp. 363-396.

Leutenegger, S., Chli, M., Siegwart, R., 2011. BRISK: Binary Robust Invariant Scalable Keypoints. In International Conference on Computer Vision, pp. 2548-2555.

Lindeberg, T., 1998. Feature detection with automatic scale selection. International Journal of Computer Vision, vol. 30, no. 2, pp. 79-116.

Lowe, D. G., 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110.

Mikolajczyk, K., Schmid, C., 2001. Indexing based on scale invariant interest points. In International Conference on Computer Vision, pp. 525-531.

Mikolajczyk, K., Schmid, C., 2002. An affine invariant interest point detector. In European Conference on Computer Vision, pp. 128-142.

Mikolajczyk, K., Schmid, C., 2004. Scale and affine invariant interest point detectors. International Journal of Computer Vision, vol. 60, no. 1, pp. 63-86.

Özbulak, G., Gökmen, M., 2014. A rotation invariant local Zernike moment based interest point detector. In Proc. SPIE, International Conference on Machine Vision.

Rosten, E., Drummond, T., 2006. Machine learning for high-speed corner detection. In European Conference on Computer Vision, pp. 430-443.

Rublee, E., Rabaud, V., Konolige, K., Bradski, G., 2011. ORB: an efficient alternative to SIFT or SURF. In International Conference on Computer Vision, pp. 2564-2571.

Sariyanidi, E., Dagli, V., Tek, S. C., Tunc, B., Gokmen, M., 2012. Local Zernike Moments: A new representation for face recognition. In International Conference on Image Processing, pp. 585-588.

Schmid, C., Mohr, R., Bauckhage, C., 1998. Comparing and evaluating interest points. In IEEE International Conference on Computer Vision, pp. 230-235.

Teague, M. R., 1980. Image analysis via the general theory of moments. Journal of the Optical Society of America, vol. 70, pp. 920-930.

The Inria Dataset, http://lear.inrialpes.fr/people/mikolajczyk/Database.

Witkin, A. P., 1983. Scale-space filtering. In International Joint Conference on Artificial Intelligence, Karlsruhe, Germany, pp. 1019-1022.

Zernike, F., 1934. Physica, vol. 1.

RobustInterestPointDetectionbyLocalZernikeMoments

651