strategy narrows the target area, while the relocation
strategy is used to provide more accurate locations by
reducing the loss of information in the neural network.
As the example in the last row of Figure 7 shows, the
proposed method excludes interference from objects of
the same category with similar appearance, whereas the
other methods are more susceptible to shallow features
and therefore more likely to lock onto a false target.
Likewise, as the second and second-to-last examples
show, the other methods perform worse than ours when
illumination is stronger or weaker than usual. Finally,
the first row of Figure 7 shows that even when there
is only one salient target in the image (simple
background and low interference), our method still
achieves higher location accuracy than the others.
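The narrowing and relocation steps discussed above can be illustrated with a minimal sketch of the coordinate bookkeeping involved. It is not the paper's implementation: the function names `expand_roi` and `to_image_coords`, the expansion factor of 2.0, and the example boxes are all assumptions made for illustration, and the second-pass detector itself is left out.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def expand_roi(box: Box, scale: float, img_w: int, img_h: int) -> Box:
    """Grow a box about its centre by `scale` and clip it to the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(img_w), cx + half_w),
        min(float(img_h), cy + half_h),
    )


def to_image_coords(local_box: Box, roi: Box) -> Box:
    """Map a box found inside the cropped ROI back to full-image coordinates."""
    ox, oy = roi[0], roi[1]  # top-left corner of the crop
    x1, y1, x2, y2 = local_box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)


# Hypothetical coarse location from the first pass.
coarse = (100.0, 80.0, 140.0, 120.0)
roi = expand_roi(coarse, scale=2.0, img_w=640, img_h=480)

# Assumed output of a second pass run on the crop only (crop coordinates).
refined_local = (22.0, 18.0, 60.0, 61.0)
refined = to_image_coords(refined_local, roi)
```

Because the second pass sees only the smaller crop, a fixed amount of downsampling in the network corresponds to fewer image pixels, which is one way to read the claim that relocation reduces position error.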
Table 4: Speed test on the MOTB datasets. The NCC and
SSD methods can only run on the CPU, while QATM and the
method in this paper can use the GPU to accelerate
localization.
Methods       SSD    NCC    QATM    Ours
Backend       CPU    CPU    GPU     GPU
Average (ms)  296    321    1780    90
Finally, matching speed is also an important
criterion for measuring the performance of an algorithm
in practical applications. Table 4 compares the
average time consumed by different matching and
locating methods on the MOTB datasets. The method
proposed in this paper averages 90 ms, which is clearly
faster than the traditional sliding-window methods
(296 ms for SSD and 321 ms for NCC) and than QATM with
GPU acceleration (1780 ms).
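To make the comparison concrete, the short calculation below turns the average times reported in Table 4 into relative factors. Only the four timings come from the table; the variable names are illustrative.

```python
# Average matching time in milliseconds, copied from Table 4.
avg_ms = {"SSD": 296, "NCC": 321, "QATM": 1780, "Ours": 90}

# How many times longer each baseline takes than the proposed method.
slowdown = {
    name: ms / avg_ms["Ours"] for name, ms in avg_ms.items() if name != "Ours"
}

for name, ratio in slowdown.items():
    print(f"{name}: {ratio:.1f}x the time of the proposed method")
```

On these figures the baselines take roughly 3.3 (SSD), 3.6 (NCC), and 19.8 (QATM) times as long as the proposed method. Note that SSD and NCC run on the CPU, so the first two ratios also reflect the difference in backend.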
4 CONCLUSIONS
We introduced a novel target matching framework
that consists mainly of a Coverage-IoU based feature
extractor, a verification process, and relocation after
expanding the regions of interest. The Coverage-IoU
loss is motivated by the observation that the existing
IoU loss cannot meet the coverage requirement in some
scenes. Its coverage, shape restriction, and corner
distance terms better describe the regression of the
bounding box and yield more accurate position
regression. Moreover, the verification strategy
presented here reduces false-positive results without
requiring an instance-level template, and thereby
guides the regions of interest to the target area.
Finally, the relocation strategy is motivated by the
location errors that arise from information lost in
pooling and other operations in the neural network;
narrowing the input and relocating within that area
reduces these position errors and improves location
accuracy. In addition, the relocation strategy and the
Coverage-IoU loss proposed in this paper can be easily
ported to other common tasks such as target detection
and instance segmentation.
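The remark that plain IoU cannot express a coverage requirement can be illustrated with a small example. The sketch below is not the Coverage-IoU loss itself, whose definition is given earlier in the paper; it assumes a simple notion of coverage (intersection area divided by target area) purely to show two predictions with identical IoU but very different coverage.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(b: Box) -> float:
    """Area of a box; degenerate boxes count as zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


def iou(a: Box, b: Box) -> float:
    """Standard intersection over union."""
    inter = intersection(a, b)
    return inter / (area(a) + area(b) - inter)


def coverage(pred: Box, target: Box) -> float:
    """Assumed definition: fraction of the target covered by the prediction."""
    return intersection(pred, target) / area(target)


target = (0.0, 0.0, 10.0, 10.0)
too_small = (0.0, 0.0, 10.0, 5.0)   # misses half of the target
too_large = (0.0, 0.0, 10.0, 20.0)  # contains the whole target

for name, pred in (("too_small", too_small), ("too_large", too_large)):
    print(name, "IoU =", iou(pred, target), "coverage =", coverage(pred, target))
```

Both predictions score an IoU of 0.5, yet only the larger one contains the whole target, which is the kind of distinction an IoU-only loss cannot reward.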
REFERENCES
James, Alex Pappachen, and Belur V Dasarathy. (2014).
Medical Image Fusion: A Survey of the State of the
Art. CoRR abs/1401.0.
Hashemi, Nazanin Sadat, Roya Babaie Aghdam, Atieh
Sadat Bayat Ghiasi, and Parastoo Fatemi. (2016).
Template Matching Advances and Applications in
Image Analysis. arXiv preprint arXiv:1610.07231.
Nan, Junyu, and David Held. (2019). Combining Deep
Learning and Verification for Precise Object Instance
Detection, no. CoRL: 1–20.
Perveen, Nazil, Darshan Kumar, and Ishan Bhardwaj.
(2013). An Overview on Template Matching
Methodologies and Its Applications 2 (10): 988–995.
Dekel, Tali, Shaul Oron, Michael Rubinstein, Shai Avidan,
and William T. Freeman. (2015). Best-Buddies
Similarity for Robust Template Matching. Proceedings of the
IEEE Computer Society Conference on Computer Vision
and Pattern Recognition 07-12-June. IEEE: 2021–2029.
Kat, Rotal, and Shai Avidan. (2018). Matching Pixels Using
Co-Occurrence Statistics. Proceedings of the IEEE
Computer Society Conference on Computer Vision and
Pattern Recognition, 1751–1759.
Cheng, Jiaxin, Yue Wu, and Premkumar Natarajan.
(2019). QATM: Quality-Aware Template Matching for
Deep Learning. Proceedings of the IEEE Computer
Society Conference on Computer Vision and Pattern
Recognition 2019-June: 11545–11554.
Ren, Shaoqing and Kaiming He. (2017). Faster R-CNN:
Towards Real-Time Object Detection with Region
Proposal Networks. IEEE Transactions on Pattern
Analysis and Machine Intelligence 39 (6): 1137–1149.
Ammirato, Phil, Cheng-Yang Fu, Mykhailo Shvets, Jana
Kosecka, and Alexander C. Berg. (2018). Target Driven
Instance Detection. arXiv preprint arXiv:1803.04610.
Girshick, Ross. (2015). Fast R-CNN. Proceedings of the
IEEE International Conference on Computer Vision
2015: 1440–1448.
Rezatofighi, Hamid, Nathan Tsoi, JunYoung Gwak, Amir
Sadeghian, Ian Reid, and Silvio Savarese. (2019).
Generalized Intersection over Union: A Metric and A
Loss for Bounding Box Regression. Proceedings of the
IEEE Conference on Computer Vision and Pattern
Recognition 2019: 658–666.
Redmon, Joseph. (2013–2016). Darknet: Open Source
Neural Networks in C. http://pjreddie.com/darknet/,
[Accessed 23-August-2020].