Region-constrained Feature Matching with Hierarchical

Agglomerative Clustering

Jung-Whan Jang, Mostafiz Mehebuba Hossain and Hyuk-Jae Lee

Inter-university Semiconductor Research Center, Department of Electrical Engineering,

Seoul National University, Seoul, Korea

Keywords: SIFT, Feature Detector, Feature Correspondence, Clustering, Segmentation.

Abstract: Local feature matching is one of the most fundamental issues in computer vision. Hierarchical

agglomerative clustering (HAC) has been effectively used to distinguish inliers from outliers. The drawback

of HAC is its large computational complexity which increases rapidly as the number of feature

correspondences increases. To overcome this drawback, this paper proposes a region-constrained feature

matching in which an image is segmented into small regions and feature correspondences are clustered

inside each region. Adjacent segmented regions are merged to form larger regions if the correspondences

inside regions are similar. The merge may increase the accuracy of clustering, and consequently, it improves

the accuracy of matching operations as well. The proposed region-constrained clustering reduces the execution time by a factor of up to 500 compared to the previous clustering while achieving a similar matching accuracy.

1 INTRODUCTION

Owing to the widespread use of high-resolution image sensors, high-definition (HD) images are widely available from CCTV cameras, broadcasting cameras, and mobile devices such as mobile phones. The large amount of data in an HD image demands considerable computing power for image search, classification and recognition. Local

feature matching has been one of the widely used

techniques for object recognition. Hierarchical

agglomerative clustering (HAC) has been effectively

used to distinguish inliers from outliers but it suffers

from its large computational complexity which

increases rapidly as the number of feature

correspondences increases. To reduce the

computational complexity of HAC, this paper

proposes a region-constrained feature matching in

which an image is segmented into small regions and

feature correspondences are clustered inside each

region.

Local invariant features have been widely used for image recognition because they are robust to noise, illumination variation, and viewpoint change (Lowe, 2004; Bay et al., 2008; Rosten et al., 2010). Image

recognition based on local feature matching is

performed by finding the correspondences between

local features in different images. The local feature

matching has been used in a number of applications,

such as image stitching, 3D reconstruction, and

object identification. To find similarities between features, the Euclidean distance between feature vectors is calculated, and either the nearest-neighbour descriptor is selected or the ratio of the distances to the nearest and second-nearest neighbour descriptors is used. A correspondence based only on feature-vector similarity is not always correct, because comparing local patches may match descriptors that are only partially similar. Therefore, correct correspondences (inliers) must be differentiated from incorrect correspondences (outliers), and effective methods to distinguish the inliers from the outliers have been extensively investigated.
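The distance-ratio test described above can be sketched as follows. This is a minimal pure-Python illustration rather than the implementation used in the experiments; the function name `ratio_test_matches` and the threshold 0.8 are assumptions chosen for the example.

```python
import math

def ratio_test_matches(desc_p, desc_q, ratio=0.8):
    """Return (i, j) pairs where descriptor i of the first image matches
    descriptor j of the second image under the distance-ratio test.
    `ratio` is an assumed threshold, not a value taken from the paper."""
    matches = []
    for i, dp in enumerate(desc_p):
        # Euclidean distance from dp to every descriptor of the second image
        dists = sorted((math.dist(dp, dq), j) for j, dq in enumerate(desc_q))
        if len(dists) < 2:
            continue  # the ratio needs a second-nearest neighbour
        (d1, j1), (d2, _) = dists[0], dists[1]
        # accept only if the nearest neighbour is clearly closer than the second
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches
```

A descriptor whose two nearest neighbours are equally distant is rejected, which is the ambiguous case the ratio test is meant to filter out.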

In (Lowe, 2004), an image is assumed to be a

rigid scene, and RANSAC (RANdom SAmple

Consensus) is used to fit a model to experimental

data and to reject inconsistent matches. However, this method is not effective for non-rigid image deformation or for a complicated scene that cannot be represented by an affine transform. A number of feature matching studies for non-rigid image deformation have been conducted. One

promising approach for non-rigid image matching is

Whan Jang J., Mehebuba Hossain M. and Lee H..

Region-constrained Feature Matching with Hierachical Agglomerative Clustering.

DOI: 10.5220/0004744800150022

In Proceedings of the 9th International Conference on Computer Vision Theory and Applications (VISAPP-2014), pages 15-22

ISBN: 978-989-758-003-1

© 2014 SCITEPRESS (Science and Technology Publications, Lda.)

the use of geometric information among local features. These approaches build a graph over the features and find corresponding pairs or triplets of points, exploiting the distances and angles between feature points, which generally remain unchanged across images. The matching is thus performed by considering the spatial layout of the keypoints. If there are many correspondences in an image, however, a combination of mismatched points can also reproduce the invariant distances and angles, so incorrect matching results can be generated. In addition, the computational complexity of graph matching increases exponentially with the number of feature points (Gomila and Meyer, 2003; Duchenne et al., 2011; Torresani et al., 2008).

Another approach for improved non-rigid image matching uses clustering to obtain a reliable feature set (Cho et al., 2009). This approach repeatedly performs clustering on all the feature correspondences in an image until all inter-cluster

similarity is larger than the intra-cluster similarity.

Although this method successfully improves the

accuracy of matching results, it also suffers from a

rapid increase of the computational complexity with

the increase of the number of correspondences. As a

result, a fast algorithm to reduce the complexity is

necessary for a practical use.

This paper proposes a novel feature matching

algorithm that reduces the computational complexity

of the clustering-based outlier exclusion. The

proposed algorithm segments an image into small

regions and performs clustering only inside the

candidate region. Adjacent segmented regions are

merged to form larger regions if the correspondences

inside regions are similar. The merge may increase

the accuracy of clustering, and consequently, it

improves the accuracy of matching operations as

well. As the proposed algorithm uses only candidate

regions for clustering while the previous clustering

algorithm uses the whole image, the increase of the

complexity of the proposed algorithm is much less

than that of the previous algorithm.

This paper is organized as follows. Related works are introduced in Section 2 and the proposed feature matching algorithm is presented in Section 3. Section 4 presents experimental results and Section 5 concludes this paper.

2 RELATED WORKS

This section describes the hierarchical agglomerative clustering (HAC) algorithm, which is used as the base clustering algorithm in this paper

(Friedman et al., 2009). Figure 1 depicts the flow of

a general clustering algorithm. The first step

(Correspondence extraction) generates the features

that characterize the object to recognize. In the

second step (Cluster similarity), similarities between

features are measured. These similarities are used

for clustering in the next step, and then these two

steps are performed repeatedly until all inter-cluster

similarity is larger than intra-cluster similarity.

Figure 1: A general flow of a clustering algorithm.

HAC is one of the clustering algorithms that

adopts the same flow as Figure 1. A brief description

of HAC is given as follows (Xu et al., 2005).

HAC Algorithm
Step 1: Determine all inter-correspondence similarities.
Step 2: Select the two closest correspondences or clusters and form a cluster.
Step 3: Redefine the similarities between the new cluster generated in Step 2 and the other correspondences or clusters.
Step 4: Return to Step 2 until inter-cluster similarity is larger than intra-cluster similarity.

For a formal definition of geometric similarity, the distance between two matches $m_i$ and $m_j$ is defined next. Let $p_i$ and $q_i$ be two keypoints in different images matched by homography $H_i$. Let $\mathbf{x}_i$ and $\mathbf{x}'_i$ denote the respective positions of keypoints $p_i$ and $q_i$. Let the match between $p_i$ and $q_i$ be denoted $m_i = (\mathbf{x}_i, \mathbf{x}'_i, H_i)$. Between two feature correspondences $m_i = (\mathbf{x}_i, \mathbf{x}'_i, H_i)$ and $m_j = (\mathbf{x}_j, \mathbf{x}'_j, H_j)$, the distance is defined as follows (Cho et al., 2009):





,









|







|









|









































|





































1

where |∙| denotes Euclidean distance.
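The distance between two matches can be illustrated with a short sketch. It assumes the symmetric transfer-error form of Cho et al. (2009), in which each match is tested against the homography of the other in both directions. The tuple layout `(x, x_prime, H)` and all function names are hypothetical choices for this example.

```python
import math

def apply_h(H, pt):
    """Apply a 3x3 homography (nested lists) to a 2-D point."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def inv3(H):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = H
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]

def transfer_error(m_i, m_j):
    """d(m_j | m_i): how well the homography of m_i explains match m_j."""
    _, _, H_i = m_i
    x_j, xp_j, _ = m_j
    forward = math.dist(xp_j, apply_h(H_i, x_j))
    backward = math.dist(x_j, apply_h(inv3(H_i), xp_j))
    return 0.5 * (forward + backward)

def match_distance(m_i, m_j):
    """Symmetric distance between two matches (assumed form)."""
    return 0.5 * (transfer_error(m_i, m_j) + transfer_error(m_j, m_i))
```

Two matches that share the same homography have distance zero, while matches whose homographies disagree are separated by the average reprojection error.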

Let  and  represent two clusters of matches.

Then, the dissimilarity between the two clusters is

defined as the distance between closest matches of

VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications

the two clusters.





,





∀



∈,∀



∈





,





(2)
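The HAC steps, combined with the closest-match cluster dissimilarity, can be sketched as below. The fixed `threshold` is an assumed stand-in for the inter-/intra-cluster stopping rule of Step 4, and the naive search over all cluster pairs is chosen for readability, not speed.

```python
def hac_single_linkage(items, distance, threshold):
    """Agglomerate `items` until the two closest clusters are farther apart
    than `threshold` (an assumed stopping rule for this sketch)."""
    clusters = [[it] for it in items]

    def linkage(ca, cb):
        # cluster dissimilarity: distance between the closest members
        return min(distance(a, b) for a in ca for b in cb)

    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters
        d, a, b = min((linkage(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        if d > threshold:
            break  # Step 4: the remaining clusters are too dissimilar
        # Steps 2-3: merge; linkages are recomputed on the next pass
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

For example, clustering the numbers 0, 1, 2, 10, 11 and 50 with the absolute difference as distance and a threshold of 2 groups the first three values, then 10 and 11, and leaves 50 on its own.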

3 THE PROPOSED ALGORITHM

This section proposes a HAC-based clustering

algorithm that attempts to reduce the computational

complexity without a significant decrease of

matching accuracy. The algorithm is developed

under the assumption that SIFT is used as the local

feature. To reduce the computational complexity of

HAC, an image is segmented into small regions and

feature correspondences are clustered inside each

region. Adjacent segmented regions are merged to

form larger regions if the correspondences in the

adjacent regions are similar. The merge may

increase the accuracy of clustering, and

consequently, it may improve the accuracy of

matching operations as well.

3.1 Region-Constrained Clustering

This paper proposes a region-constrained clustering that sets candidate regions with similar attributes and performs clustering with the feature points inside the candidate regions.

(a) (b)

Figure 2: (a) HAC. (b) Region-constrained clustering.

Figure 2 shows the difference between the

original HAC and region-constrained clustering. The

dashed ellipses in Figure 2(b) indicate the candidate regions. Both methods use the same feature points; however, region-constrained clustering performs clustering only for feature points within the candidate regions.

3.2 Geometric Relationship in

Correspondences

Let P and Q denote the sets of local features obtained from two images, respectively. Feature vectors $p_i$ and $q_j$ from the two feature sets P and Q are respectively represented by

$$ p_i = (x_i, y_i, \sigma_i, \theta_i, d_i) \qquad (3) $$

$$ q_j = (x'_j, y'_j, \sigma'_j, \theta'_j, d'_j) \qquad (4) $$

where $x_i$, $y_i$ indicate the coordinates of the corresponding keypoint position, $\sigma_i$ and $\theta_i$ are the scale and orientation information, and $d_i$ is a feature vector, called the descriptor.

Initially, correspondences are evaluated by comparing the distance of the closest neighbour to that of the second-closest neighbour (Lowe, 2004). From these initial correspondences, the geometric similarity homography of each correspondence is used to perform HAC inside each region. The similarity is estimated from the pair of SIFT descriptors of a given correspondence. Each SIFT descriptor carries information about the scale and orientation. Thus, the homography matrix for a correspondence can be expressed as the product of matrices with the scale, rotation and translation information.

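$$ h = T(\Delta x, \Delta y)\, R(\Delta\theta)\, S(\Delta\sigma) = \begin{bmatrix} 1 & 0 & \Delta x \\ 0 & 1 & \Delta y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\Delta\theta & -\sin\Delta\theta & 0 \\ \sin\Delta\theta & \cos\Delta\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \Delta\sigma & 0 & 0 \\ 0 & \Delta\sigma & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (5) $$

where $\Delta\sigma = \sigma'/\sigma$, $\Delta\theta = \theta' - \theta$, and $\Delta x$, $\Delta y$ are given as follows. The scale and rotation are applied before the translation, so that $h$ maps the keypoint position $(x, y)$ onto $(x', y')$.

$$ \Delta x = x' - \Delta\sigma\cos\Delta\theta \, x + \Delta\sigma\sin\Delta\theta \, y \qquad (6) $$

$$ \Delta y = y' - \Delta\sigma\sin\Delta\theta \, x - \Delta\sigma\cos\Delta\theta \, y \qquad (7) $$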
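As an illustration, the similarity matrix of a single correspondence can be assembled from the two keypoints as sketched below. The sketch assumes that scale and rotation are applied before the translation, so that the matrix maps the first keypoint position exactly onto the second. The tuple layout `(x, y, sigma, theta)` and the function name are hypothetical.

```python
import math

def similarity_homography(kp, kp_prime):
    """Build the 3x3 matrix that maps keypoint kp onto kp_prime.
    Each keypoint is (x, y, sigma, theta) with theta in radians."""
    x, y, s, t = kp
    xp, yp, sp, tp = kp_prime
    ds = sp / s   # scale ratio
    dt = tp - t   # orientation difference
    c, sn = ds * math.cos(dt), ds * math.sin(dt)
    dx = xp - c * x + sn * y   # translation, x component
    dy = yp - sn * x - c * y   # translation, y component
    # translation * rotation * scale, multiplied out
    return [[c, -sn, dx],
            [sn, c, dy],
            [0.0, 0.0, 1.0]]
```

Multiplying the returned matrix by the homogeneous position of the first keypoint gives the position of the second keypoint, which is a quick sanity check for the sign conventions.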

3.3 Constrained Region

The area of the region may affect the accuracy and

complexity of the proposed algorithm. It is often the

case that object segmentation results in an image

with over-segmented regions. In this case, the

clustering constrained by the over-segmented

regions may not reflect the nature of the

correspondence. Therefore, this subsection proposes an algorithm that merges the over-segmented regions into larger regions if they turn out to be similar. The merge of regions may improve the accuracy of feature matching operations.

3.3.1 Region Homography Matrix

The segmented regions generated by a segmentation algorithm such as the watershed transform need to be merged before they are used as candidate regions for clustering. This paper defines the homography matrix of each region and, using the similarity of the homography matrices, merges the regions into a set of regions that constrain the boundary of the clustering operation.

Region-constrainedFeatureMatchingwithHierachicalAgglomerativeClustering

From the homography of a feature correspondence given by (5), the homography of a region correspondence is defined as the regional average of the homographies of the feature correspondences. The formal definition is given as (8). If there exist $n$ feature correspondences in the area $A_i$, the area homography $H_{A_i}$ of $A_i$ is formulated as the product of the matrices built from the average scale ratio, the average rotation difference, and the average translation,

$$ H_{A_i} = T(\overline{\Delta x}, \overline{\Delta y})\, R(\overline{\Delta\theta})\, S(\overline{\Delta\sigma}) \qquad (8) $$

where the overbar denotes the average over the $n$ correspondences in $A_i$.
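A minimal sketch of the regional average is given below. It takes the per-correspondence parameters as `(d_sigma, d_theta, dx, dy)` tuples, a hypothetical layout, and averages the angles arithmetically, which is an assumption that only holds when the angles do not wrap around.

```python
import math

def region_homography(params):
    """Average (d_sigma, d_theta, dx, dy) tuples of the correspondences in one
    region and assemble them into a single 3x3 similarity matrix."""
    n = len(params)
    if n == 0:
        raise ValueError("region contains no correspondences")
    ds = sum(p[0] for p in params) / n
    # Plain arithmetic mean of the angles; an assumption that is only
    # reasonable when the angles do not wrap around +/- pi.
    dt = sum(p[1] for p in params) / n
    dx = sum(p[2] for p in params) / n
    dy = sum(p[3] for p in params) / n
    c, s = ds * math.cos(dt), ds * math.sin(dt)
    return [[c, -s, dx], [s, c, dy], [0.0, 0.0, 1.0]]
```

Averaging the parameters rather than the matrix entries keeps the result a valid similarity transform.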

3.3.2 Region Similarity

Figure 3: Region homography projection.

This subsection discusses a method to measure the

similarity of the region homography matrices.

Suppose that the two adjacent regions $A_i$ and $A_j$ in Figure 3 have region homographies $H_{A_i}$ and $H_{A_j}$, respectively. If there is a similarity between $H_{A_i}$ and $H_{A_j}$, the regions projected by $H_{A_i}$ and $H_{A_j}$ must be

similar, that is, the two projected regions must be

largely overlapped. Based on the above observation,

the similarity measure between two homographies is

derived from the overlapped region.

The overlapped region may be affected by the area and shape of the region. To avoid this effect, this paper defines an overlapping criterion that is insensitive to the area and shape of a region. To this end, a circle of fixed size (radius 30) is used for the similarity measure. This circle is called the unit circle ($U$) hereafter in this

paper. Note that a similar idea has been used in

(Mikolajczyk et al., 2005) to measure the matching

score in affine region detectors.

Region $A_i$ projected by $H_{A_i}$ is expressed by the product of two matrices. Matrix $M_{A_i}$ is defined to represent all the pixels in region $A_i$, with one column per pixel:

$$ M_{A_i} = [\, x_k \;\; y_k \;\; 1 \,]^T, \quad \text{where for all } (x_k, y_k) \in A_i \qquad (9) $$

The region projected by $H_{A_i}$ is defined as follows:

$$ A'_i = H_{A_i} M_{A_i} \qquad (10) $$

For two adjacent regions $A_i$, $A_j$ having homographies $H_{A_i}$, $H_{A_j}$, respectively, the region similarity between the regions is formulated as follows:

$$ \text{Region similarity}(A_i, A_j) = \begin{cases} 1, & H_{A_i} M_U \cap H_{A_j} M_U \neq \emptyset \\ 0, & H_{A_i} M_U \cap H_{A_j} M_U = \emptyset \end{cases} \qquad (11) $$

where $M_U$ is the matrix representing all points in $U$, that is, $M_U = [\, x_k \;\; y_k \;\; 1 \,]^T$, where for all $(x_k, y_k) \in U$.

In the above definition, the similarity is “1” when the two projected regions overlap, and “0” otherwise. When the region similarity is “1”, the corresponding two regions are combined to make a new region $A_{ij} = A_i \cup A_j$ for the independent clustering operation.
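The overlap test can be sketched geometrically. Under a similarity transform a circle stays a circle, so the two projections overlap exactly when the distance between their centres does not exceed the sum of their radii. The sketch assumes the circle is centred at the image origin before projection, which is a choice made for this example only, and treats touching circles as overlapping.

```python
import math

def projected_circle(H, radius=30.0):
    """Centre and radius of an origin-centred circle after projection by a
    3x3 similarity matrix H (centring at the origin is an assumption)."""
    scale = math.hypot(H[0][0], H[1][0])  # uniform scale of the similarity
    return (H[0][2], H[1][2]), radius * scale

def region_similarity(H_a, H_b, radius=30.0):
    """1 if the two projected circles share at least one point, else 0."""
    (ca, ra) = projected_circle(H_a, radius)
    (cb, rb) = projected_circle(H_b, radius)
    return 1 if math.dist(ca, cb) <= ra + rb else 0
```

With the default radius of 30, two pure translations are judged similar when they differ by at most 60 pixels, and the tolerance grows with the scale of either homography.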

The clustering accuracy is high when the number of inliers is larger than that of the outliers in each region. The proposed algorithm increases the accuracy by merging regions when the number of correspondences in an over-segmented region is too small. Furthermore, the results of the clustering operation inside a single region become reliable when the region is composed of a set of similar homographies, because the clustering operations use the homography similarity between correspondences.

For a reduction of computational complexity,

region merge is performed only among adjacent

regions. To this end, the segmented regions are expressed by a graph, a data structure commonly used for representing partitions (Kim et al., 2010). Using this graph, adjacent regions are easy to detect.
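Merging over the adjacency graph can be sketched with a union-find structure. The sketch is a simplification: it evaluates the similarity once per original edge and does not recompute the homography of a merged region. The function name, the edge-list representation and the `similar` callback are hypothetical.

```python
def merge_regions(num_regions, adjacency, similar):
    """Group region indices 0..num_regions-1. `adjacency` is an iterable of
    (i, j) edges between neighbouring regions and `similar(i, j)` returns 1
    when the two region homographies are judged similar."""
    parent = list(range(num_regions))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in adjacency:
        # only adjacent regions are ever compared
        if similar(i, j) == 1:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(num_regions):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Because only the edges of the graph are visited, the number of similarity evaluations is bounded by the number of adjacent region pairs rather than by all pairs of regions.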

3.3.3 Complexity Analysis

Figure 4 shows the flow of the proposed algorithm

which segments an input image into small regions to

constrain the clustering operation. Using the result

of the initial correspondence, regions are merged to

form large regions. Then, HAC is performed for

correspondences in each region. The final clustering

result is obtained by collecting all the HAC results in

every region.

With $N$ correspondences, conventional HAC requires the construction of at most $N-1$ clusters, and so $N-1$ iterations (Steps 2, 3, and 4) are required. In addition, $O(N^2)$ operations are required in order to compute the similarity between clusters. Therefore, the complexity is $O(N^3)$. On the other

VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications

Figure 4: The flow of the proposed clustering algorithm.

hand, the complexity of the proposed algorithm depends on the number of regions. Suppose that an image is segmented into $k$ regions $A_1, \cdots, A_k$. Let $n_1, n_2, \cdots, n_k$ denote the number of correspondences in the $k$ regions. Then, equations (12) and (13) hold. Note that the computational complexity of the $i$-th region is $O(n_i^3)$. Equation (13) shows that the proposed algorithm requires much less computation than the conventional HAC does.

$$ N = \sum_{i=1}^{k} n_i, \qquad A_i \cap A_j = \emptyset \ \ (i \neq j) \qquad (12) $$

$$ n_1^3 + n_2^3 + \cdots + n_k^3 \ll N^3 \qquad (13) $$
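The size of the gap can be illustrated with a small calculation. The sketch uses a cubic operation count as a proxy for HAC and ignores the cost of segmentation and region merging; the region sizes in the usage note are hypothetical numbers, not measurements.

```python
def hac_cost(n):
    """Cubic operation count used as a proxy for HAC on n correspondences."""
    return n ** 3

def speedup(region_counts):
    """Ratio of whole-image HAC cost to the summed per-region cost."""
    total = sum(region_counts)
    per_region = sum(hac_cost(n) for n in region_counts)
    return hac_cost(total) / per_region
```

For instance, 1000 correspondences spread evenly over 20 regions give `speedup([50] * 20)`, that is 1000^3 / (20 * 50^3) = 400, whereas a single region containing all correspondences gives no gain.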

4 EXPERIMENTAL RESULTS

Experiments are conducted with two datasets with shared contents. One is the eccv dataset used in (Ferrari et al., 2006) that consists of 9 model objects and 23 test images with a relatively small number of correspondences. The other is the Oxford dataset, which has been used for the performance evaluation of affine region detectors and local descriptors. Its images are distorted by various degradations (viewpoint change, image blur, JPEG compression artifacts and illumination change). The Oxford dataset includes the homography mapping between the reference and distorted target images, which gives the ground-truth correspondences. These images are large and complex, and yield a large number of correspondences.

The initial correspondences are generated using the nearest-neighbour distance ratio (NNDR) method in (Lowe, 2004). The initial correspondences and segmentation results are used to derive candidate areas. Figure 5 shows the candidate areas obtained by the proposed algorithm for the Graffiti image of the Oxford dataset. In Figure 5(b), different candidate areas are represented by different colors. The white areas include one or no correspondence and are therefore excluded from the clustering operations, because clustering needs at least two correspondences to calculate the similarity between correspondences.

(a) (b)

Figure 5: Candidate areas for clustering over Graffiti

image. (a) Original image. (b) Candidate areas with colors

representing segmentation results.

Figure 6: The ratio of each candidate area correspondences

against the total correspondences over Graffiti image.

Figure 6 shows the ratio of the number of

correspondences in the candidate area and the

number of correspondences in the whole image. This

figure shows only the 20 candidate areas with the

most correspondences. The result shows that the

ratio is less than 10% for all candidate areas. Recall

that the complexity is reduced when the number of

correspondences in each area is small (see (13)).

Therefore, the calculation time can be significantly

reduced.

Figure 7: Comparison of execution time between area

clustering and HAC.

Figure 7 shows the execution time as a function of the number of correspondences for the Oxford Graffiti dataset. The horizontal axis represents the

Region-constrainedFeatureMatchingwithHierachicalAgglomerativeClustering

number of correspondences, which are generated by NNDR. The vertical axis represents the execution time on a logarithmic scale. The execution time is measured using a single-core Intel i5-750 processor running at 2.67 GHz. As the number of correspondences increases, the gap between area clustering and HAC widens dramatically. Note that the execution time of the area clustering includes the segmentation algorithm that generates the candidate regions.

Table 1: Recognition results over Oxford dataset.

                          Bikes    Graffiti  Boat     Leuven
Proposed cluster size     733      1055      2131     1096
Proposed matching score   72.7%    67.7%     78%      78.8%
HAC cluster size          886      1056      2133     1189
HAC matching score        65.4%    67.8%     76.3%    74.9%

Table 1 shows the accuracy of the proposed clustering and the HAC. To evaluate the accuracy, the matching score is used, which is a common metric for feature matching algorithms (Mikolajczyk et al., 2005). For this experiment, the Oxford dataset is used with several hundred initial feature correspondences in each image pair. The affine transform values are provided for all images in the dataset. The proposed algorithm gives a higher matching score than the original HAC for Bikes, Boat and Leuven, and a nearly identical score for Graffiti (67.7% versus 67.8%). The cluster size of the proposed algorithm is the sum of the clustering results of all candidate regions. Although the cluster size is reduced when compared to HAC, the proposed algorithm achieves a matching score that is higher or on par. This indicates that the proposed algorithm effectively removes outliers.

Figure 8: Models used in Figure 9.

Figure 8 shows the models that appear in the test images of Figure 9, which shows the correspondences obtained from the experiments with the dataset used in (Ferrari et al., 2004). Figure 9 (a), (c), (e), and (g) show the experimental results of HAC whereas Figure 9 (b), (d), (f) and (h) show the results obtained by the proposed algorithm. The blue circles show the correspondences which have been determined to be inliers by clustering. The clusters of the proposed algorithm are smaller, but their inliers lie only on the object. The candidate area clustering is not affected by the correspondences in the other areas, and therefore, the possibility of forming a cluster from outliers is reduced.

For the evaluation of the recognition accuracy,

the recall and precision rates are evaluated. Recall

and precision are based on the number of correct and

false matches between two images. Among positive

and negative matches, there are four possibilities, TP

(True Positive), FP (False Positive), TN (True

Negative), and FN (False Negative). Recall and

Precision are defined as follows:







(14)

1





(15)
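For concreteness, the two rates can be computed from the match counts as sketched below. The function name and the counts in the usage note are hypothetical and are not taken from the experiments.

```python
def recall_precision(tp, fp, fn):
    """Recall and precision from counts of true positives, false positives
    and false negatives. Empty denominators yield 0.0 by convention here."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision
```

With 8 correct matches kept, 2 false matches kept and 2 correct matches missed, `recall_precision(8, 2, 2)` gives a recall and a precision of 0.8 each.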

Table 2 shows the precision and recall for Figure 9. The precision of the proposed algorithm is higher than that of the original HAC in all four cases (0.95 to 0.99 versus 0.72 to 0.95). However, the recall is lower than that of the original HAC because the proposed algorithm performs clustering with only the feature points in the candidate region.

Table 2: Recall and precision of the pairwise object matching on eccv dataset in Figure 9.

                        (a)(b)   (c)(d)   (e)(f)   (g)(h)
Proposed cluster size   51       79       340      100
Proposed recall         0.71     0.71     0.85     0.67
Proposed precision      0.96     0.95     0.99     0.95
HAC cluster size        78       124      408      120
HAC recall              0.81     0.92     0.97     0.70
HAC precision           0.72     0.78     0.95     0.83

5 CONCLUSIONS

This paper proposes a region-constrained clustering

algorithm for outlier identification. An image is

segmented into small regions with similar geometric

VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications

properties and then HAC is performed with correspondences only inside each region. This reduces the possibility of incorrect clustering caused by correspondences outside the region. The proposed algorithm is faster than the conventional HAC, whose complexity grows rapidly with the input data size. Therefore, the proposed algorithm is effective for images with dense correspondences.

The proposed algorithm uses region similarity to merge regions, which enlarges the region of the clustering operation and increases the accuracy of the clustering result. Future research may investigate a more effective merge algorithm.

ACKNOWLEDGEMENTS

This work was supported by the Technology

Innovation Program (10039188, Development of

multimedia convergence programmable platform for

smart vehicles) funded by the Ministry of Trade,

Industry and Energy (MOTIE, Korea).

REFERENCES

Lowe, D. G. (2004). Distinctive image features from

scale-invariant keypoints. International journal of

computer vision, 60(2), 91-110.

Bay, H., Ess, A., Tuytelaars, T., & Van Gool, L. (2008).

Speeded-up robust features (SURF). Computer vision

and image understanding, 110(3), 346-359.

Rosten, E., Porter, R., & Drummond, T. (2010). Faster and

better: A machine learning approach to corner

detection. Pattern Analysis and Machine Intelligence,

IEEE Transactions on, 32(1), 105-119.

Gomila, C., & Meyer, F. (2003). Graph-based object

tracking. In Image Processing, 2003. ICIP 2003.

Proceedings. 2003 International Conference on (Vol.

2, pp. II-41). IEEE.

Duchenne, O., Bach, F., Kweon, I. S., & Ponce, J. (2011).

A tensor-based algorithm for high-order graph

matching. Pattern Analysis and Machine Intelligence,

IEEE Transactions on, 33(12), 2383-2395.

Cho, M., Lee, J., & Lee, K. M. (2009). Feature correspondence and deformable object matching via agglomerative correspondence clustering. In Computer Vision, 2009 IEEE 12th International Conference on (pp. 1280-1287). IEEE.

Friedman, J., Hastie, T., & Tibshirani, R. (2009). The

Elements of Statistical Learning: Data Mining,

Inference, and Prediction. Springer Series in Statistics.

Xu, R., & Wunsch, D. (2005). Survey of clustering algorithms. Neural Networks, IEEE Transactions on, 16(3), 645-678.

Mikolajczyk, K., & Schmid, C. (2005). A performance

evaluation of local descriptors. Pattern Analysis and

Machine Intelligence, IEEE Transactions on, 27(10),

1615-1630.

Ferrari, V., Tuytelaars, T., & Van Gool, L. (2006).

Simultaneous object recognition and segmentation

from single or multiple model views. International

Journal of Computer Vision, 67(2), 159-188.

Ferrari, V., Tuytelaars, T., & Van Gool, L. (2004).

Simultaneous object recognition and segmentation by

image exploration. In Computer Vision-ECCV

2004 (pp. 40-54). Springer Berlin Heidelberg.

Kim, T. H., Lee, K. M., & Lee, S. U. (2010). A unified

probabilistic approach to feature matching and object

segmentation. In Pattern Recognition (ICPR), 2010

20th International Conference on (pp. 464-467). IEEE.

Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., & Van Gool, L. (2005). A comparison of affine region detectors. International Journal of Computer Vision, 65(1-2), 43-72.

Torresani, L., Kolmogorov, V., & Rother, C. (2008).

Feature correspondence via graph matching: Models

and global optimization. In Computer Vision–ECCV

2008 (pp. 596-609).

Region-constrainedFeatureMatchingwithHierachicalAgglomerativeClustering

APPENDIX

Figure 9: HAC versus area clustering. (a), (c), (e), (g) show the results by HAC; (b), (d), (f), (h) show the results by the proposed area clustering.

VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications