SiamFC fails. Nevertheless, SiamFC usually outputs
a more precise BB of the object, as shown in Figure 4(c).
This explains the higher precision and success rates
of SiamFC at small thresholds.
Figure 5 compares our tracker with several state-of-the-art
trackers. MDNET, CCOT, ECO and SINT achieve higher AUC
than our tracker. However, thanks to its simple approach,
our method is lightweight and achieves fairly comparable
results. We remark that the lack of object occlusion detection
greatly impairs our precision and success rates.
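The precision and success rates discussed above can be made concrete with a short sketch. It assumes the usual OTB-style protocol, which this section does not restate: success is the fraction of frames whose BB overlap (IoU) exceeds a threshold, precision is the fraction of frames whose center location error is within a pixel threshold, and AUC is approximated by the mean of the success curve. The function names and the (x, y, w, h) box layout are illustrative choices, not taken from our implementation.

```python
import numpy as np

def iou(pred, gt):
    """IoU per frame for boxes given as (x, y, w, h), arrays of shape (N, 4)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-12)

def success_curve(pred, gt, thresholds=None):
    """Fraction of frames whose overlap exceeds each overlap threshold."""
    if thresholds is None:
        thresholds = np.linspace(0, 1, 21)  # assumed threshold grid
    overlaps = iou(pred, gt)
    return np.array([(overlaps > t).mean() for t in thresholds])

def precision_curve(pred, gt, thresholds=None):
    """Fraction of frames whose center error is within each pixel threshold."""
    if thresholds is None:
        thresholds = np.arange(0, 51)  # assumed threshold grid, in pixels
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    pred_center = pred[:, :2] + pred[:, 2:] / 2
    gt_center = gt[:, :2] + gt[:, 2:] / 2
    err = np.linalg.norm(pred_center - gt_center, axis=1)
    return np.array([(err <= t).mean() for t in thresholds])

def auc(curve):
    """Area under a curve sampled on a uniform grid, approximated by its mean."""
    return float(np.mean(curve))
```

Under these definitions, a tracker that loses the object less often gains on the loose end of both curves, while a tracker with tighter BBs gains on the strict end, which is the trade-off between our method and SiamFC described above.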
5 CONCLUSION
We proposed the use of two object descriptors for im-
proving tracking using an SNN. A long-term mem-
ory, based on the first object appearances, is combined
with a short-term memory, based on recent appearances,
to form an updated object descriptor. Its dynamic
nature adapts to the object and is more suitable
for long-term tracking. We applied the method to
the SiamFC descriptors, but it can be adapted to other
SNNs.
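The combination of the two memories can be illustrated with a minimal sketch. It rests on assumptions that this section does not specify: the long-term memory is taken as the mean of the first `n_long` descriptors, the short-term memory as a plain mean of the last `n_short` descriptors (a stand-in for the learned filter bank), and the two are blended with a hypothetical weight `alpha`. The class and parameter names are illustrative only.

```python
from collections import deque
import numpy as np

class DualMemoryDescriptor:
    """Blend a long-term and a short-term memory of object descriptors."""

    def __init__(self, n_long=3, n_short=5, alpha=0.5):
        self.n_long = n_long                  # assumed number of first appearances kept
        self.alpha = alpha                    # assumed blending weight
        self.first = []                       # long-term memory: first appearances
        self.recent = deque(maxlen=n_short)   # short-term memory: recent appearances

    def update(self, descriptor):
        """Store the descriptor extracted at the current frame."""
        d = np.asarray(descriptor, dtype=float)
        if len(self.first) < self.n_long:
            self.first.append(d)
        self.recent.append(d)

    def descriptor(self):
        """Return the updated object descriptor used for matching."""
        if not self.first:
            raise ValueError("no descriptor has been stored yet")
        long_term = np.stack(self.first).mean(axis=0)
        # Plain average as a placeholder for the temporal filtering step.
        short_term = np.stack(list(self.recent)).mean(axis=0)
        return self.alpha * long_term + (1.0 - self.alpha) * short_term
```

In such a design the long-term term anchors the descriptor to the initial object, while the short-term term follows recent appearance changes.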
The proposed short-term memory is obtained by
low-cost convolutions with a filter bank. The filter
bank is learned by a GA strategy from videos with
high F-measure. More specifically, only seven videos
from the VOT2015 dataset were used for training:
"bag", "racing", "ball1", "octopus", "bolt2", "pedestrian",
and "road". Despite the high computational cost
of the GA step, the learned filter bank proved
general enough for tracking over the 50 videos of the
OTB50 dataset. Our filter learning proposal is thus capable
of generalizing object descriptor variations during
tracking.
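The two steps above, temporal filtering with a filter bank and GA-based learning of that bank, can be sketched as follows. This is a toy illustration under stated assumptions, not our implementation: the bank is assumed to hold one 1D temporal filter per feature channel, the kernel spans the whole buffer of recent descriptors so that a single descriptor is produced, and the GA uses a simple scheme (keep the best half, uniform crossover, Gaussian mutation). The `fitness` callable is hypothetical; in practice it would score tracking quality on the training videos.

```python
import numpy as np

def short_term_memory(recent, bank):
    """Filter the last T descriptors along the time axis.

    recent: array of shape (T, C, H, W), oldest descriptor first.
    bank:   array of shape (C, T), one temporal filter per channel (assumption).
    Returns an array of shape (C, H, W).
    """
    recent = np.asarray(recent, dtype=float)
    bank = np.asarray(bank, dtype=float)
    # Weighted sum over time: a "valid" 1D correlation whose kernel covers
    # the whole buffer, hence one output descriptor.
    return np.einsum("ct,tchw->chw", bank, recent)

def evolve_bank(fitness, shape, pop_size=20, generations=30, sigma=0.1, seed=0):
    """Toy GA that searches for a filter bank maximizing `fitness`."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(0.0, 1.0, size=(pop_size, *shape))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)[::-1]
        elite = pop[order[: pop_size // 2]]            # selection: keep the best half
        n_children = pop_size - len(elite)
        pa = elite[rng.integers(len(elite), size=n_children)]
        pb = elite[rng.integers(len(elite), size=n_children)]
        mask = rng.random(pa.shape) < 0.5              # uniform crossover
        children = np.where(mask, pa, pb)
        children = children + rng.normal(0.0, sigma, size=children.shape)  # mutation
        pop = np.concatenate([elite, children])
    scores = np.array([fitness(ind) for ind in pop])
    return pop[int(np.argmax(scores))]
```

Because the filtering is a single weighted sum per channel, its cost is small compared to the SNN forward pass; the expensive part is the offline GA, whose fitness must be evaluated for every candidate bank.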
Our novel approach presented promising results,
showing a consistent gain for object localization. The
BBs obtained on well-known datasets adhered
to the tracked object over time. Compared to
state-of-the-art trackers, our method has fair precision
and success rates at a very low computational
cost. Our method is appropriate when good object localization
throughout the whole video is paramount,
but low BB precision is not an issue. The frame rate
depends only on the underlying SNN performance,
since the proposed 1D convolutions add negligible extra
cost.
Future work includes investigating more powerful
kernel learning approaches for time series,
including LSTM and deep neural networks. There is
the possibility of developing specific neural network
models for learning the convolution filters. Furthermore,
adaptive filter learning seems to be a promising
approach, albeit a very challenging task. The
use of more memories seems promising in contexts
with complex interactions between objects of interest.
Also, occlusion detection strategies are very important
to improve the performance on the OTB50
dataset.
ACKNOWLEDGEMENTS
The authors thank CAPES, FAPEMIG (grant CEX-APQ-01744-15),
FAPESP (grants #2017/09160-1 and
#2017/12646-3), and CNPq (grant #305169/2015-7) for
the financial support, and NVIDIA Corporation for
the donation of two Titan Xp GPUs (GPU Grant Program).
REFERENCES
Ahmad, S. and Antoniou, A. (2006). Cascade-form multiplierless
FIR filter design using orthogonal genetic algorithm.
In IEEE International Symposium on Signal
Processing and Information Technology, pages 932–937.
Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A.,
and Torr, P. H. (2016). Fully-convolutional siamese
networks for object tracking. In European Conference
on Computer Vision, pages 850–865. Springer.
Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., and Shah,
R. (1994). Signature verification using a "siamese"
time delay neural network. In Advances in Neural Information
Processing Systems, pages 737–744.
Cemes, R. and Ait-Boudaoud, D. (1993). Genetic approach
to design of multiplierless FIR filters. Electronics Letters,
29(24):2090–2091.
Danelljan, M., Bhat, G., Khan, F. S., Felsberg, M., et al.
(2017). ECO: Efficient convolution operators for tracking.
In Computer Vision and Pattern Recognition,
volume 1, page 3.
Danelljan, M., Robinson, A., Khan, F. S., and Felsberg, M.
(2016). Beyond correlation filters: Learning continu-
ous convolution operators for visual tracking. In Euro-
pean Conference on Computer Vision, pages 472–488.
Dey, A. K., Saha, S., Saha, A., and Ghosh, S. (2010). A
method of genetic algorithm for FIR filter construction:
design and development with newer approaches in
neural network platform. International Journal of Advanced
Computer Science and Applications, 1(6):87–90.
Guo, Q., Feng, W., Zhou, C., Huang, R., Wan, L., and
Wang, S. (2017). Learning dynamic siamese network
for visual object tracking. In IEEE International Con-
ference on Computer Vision.
Huang, Z. (2017). An investigation of deep tracking meth-
ods. In Conference on Technologies and Applications
of Artificial Intelligence, pages 58–61. IEEE.