images. A set of features is then extracted using a
deep convolutional neural network with bilinear pooling.
These extracted features are aggregated and fed into
a single layer, which is connected to a fully connected
layer for pose regression. For training, we used
the Adam optimizer instead of conventional stochastic
gradient descent, together with ELU activation
functions. Furthermore, our method has a fast inference
time and needs only the event image to relocalize
the camera pose. The results on publicly available
datasets show that our approach generalizes well and
outperforms recent works, including LSTM-based
architectures.
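For illustration only, the bilinear pooling and ELU steps named above can be sketched in plain Python. This is a minimal sketch and not the implementation used in the experiments: the function names bilinear_pool and elu, the list-of-vectors input layout, and the signed square-root and L2 normalization steps are assumptions that follow the common formulation of bilinear pooling (Lin et al., 2015) and of ELU (Clevert et al., 2015).

```python
import math


def elu(x, alpha=1.0):
    """Exponential linear unit: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def bilinear_pool(feats_a, feats_b):
    """Bilinear pooling of two feature maps given as lists of per-location vectors.

    feats_a: list of L vectors of length C_a (one vector per spatial location)
    feats_b: list of L vectors of length C_b (same locations as feats_a)
    Returns a flat descriptor of length C_a * C_b.
    """
    if not feats_a or len(feats_a) != len(feats_b):
        raise ValueError("both feature maps need the same, non-zero number of locations")

    c_a, c_b = len(feats_a[0]), len(feats_b[0])
    pooled = [0.0] * (c_a * c_b)

    # Sum the outer product of the two feature vectors over all locations.
    for fa, fb in zip(feats_a, feats_b):
        for i, a in enumerate(fa):
            for j, b in enumerate(fb):
                pooled[i * c_b + j] += a * b

    # Signed square root, then L2 normalization (assumed post-processing).
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled


# Hypothetical toy input: 2 spatial locations, 2 channels, pooled with itself.
toy_features = [[1.0, 0.0], [0.0, 2.0]]
descriptor = bilinear_pool(toy_features, toy_features)
```

In such a sketch the resulting descriptor would be passed to the fully connected layer that regresses the pose; the descriptor length grows as the product of the two channel counts, which is why the pooled features are aggregated before regression.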
REFERENCES
Badrinarayanan, V., Kendall, A., and Cipolla, R. (2017).
Segnet: A deep convolutional encoder-decoder
architecture for image segmentation. IEEE transactions
on pattern analysis and machine intelligence,
39(12):2481–2495.
Clevert, D.-A., Unterthiner, T., and Hochreiter, S.
(2015). Fast and accurate deep network learning
by exponential linear units (elus). arXiv preprint
arXiv:1511.07289.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and
Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical
image database. In 2009 IEEE conference on computer
vision and pattern recognition, pages 248–255.
IEEE.
Eitel, A., Springenberg, J. T., Spinello, L., Riedmiller, M.,
and Burgard, W. (2015). Multimodal deep learning for
robust rgb-d object recognition. In 2015 IEEE/RSJ
International Conference on Intelligent Robots and
Systems (IROS), pages 681–687. IEEE.
Gallego, G. and Scaramuzza, D. (2017). Accurate angular
velocity estimation with an event camera. IEEE
Robotics and Automation Letters, 2(2):632–639.
Kendall, A. and Cipolla, R. (2016). Modelling uncertainty
in deep learning for camera relocalization. In 2016
IEEE international conference on Robotics and
Automation (ICRA), pages 4762–4769. IEEE.
Kendall, A., Grimes, M., and Cipolla, R. (2015). Posenet: A
convolutional network for real-time 6-dof camera
relocalization. In Proceedings of the IEEE international
conference on computer vision, pages 2938–2946.
Kingma, D. P. and Ba, J. (2014). Adam: A
method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
Lepetit, V., Moreno-Noguer, F., and Fua, P. (2009). Epnp:
An accurate O(n) solution to the pnp problem.
International journal of computer vision, 81(2):155–166.
Li, M., Chen, R., Liao, X., Guo, B., Zhang, W., and Guo, G.
(2020). A precise indoor visual positioning approach
using a built image feature database and single user
image from smartphone cameras. Remote Sensing,
12(5):869.
Lin, T.-Y., RoyChowdhury, A., and Maji, S. (2015). Bilinear
cnn models for fine-grained visual recognition. In
Proceedings of the IEEE international conference on
computer vision, pages 1449–1457.
Mahajan, D., Girshick, R., Ramanathan, V., He, K., Paluri,
M., Li, Y., Bharambe, A., and Van Der Maaten, L.
(2018). Exploring the limits of weakly supervised
pretraining. In Proceedings of the European conference
on computer vision (ECCV), pages 181–196.
Mueggler, E., Rebecq, H., Gallego, G., Delbruck, T., and
Scaramuzza, D. (2017). The event-camera dataset and
simulator: Event-based data for pose estimation,
visual odometry, and slam. The International Journal of
Robotics Research, 36(2):142–149.
Mur-Artal, R. and Tardós, J. D. (2017). Orb-slam2:
An open-source slam system for monocular, stereo,
and rgb-d cameras. IEEE transactions on robotics,
33(5):1255–1262.
Nguyen, A., Do, T.-T., Caldwell, D. G., and Tsagarakis,
N. G. (2019). Real-time 6dof pose relocalization for
event cameras with stacked spatial lstm networks. In
Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition Workshops,
pages 0–0.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,
Chanan, G., Killeen, T., Lin, Z., Gimelshein, N.,
Antiga, L., et al. (2019). Pytorch: An imperative style,
high-performance deep learning library. Advances in
neural information processing systems, 32.
Qu, C., Shivakumar, S. S., Miller, I. D., and Taylor, C. J.
(2022). Dsol: A fast direct sparse odometry scheme.
arXiv preprint arXiv:2203.08182.
Rebecq, H., Horstschaefer, T., and Scaramuzza, D. (2017).
Real-time visual-inertial odometry for event cameras
using keyframe-based nonlinear optimization.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and
Chen, L.-C. (2018). Mobilenetv2: Inverted residuals
and linear bottlenecks. In Proceedings of the IEEE
conference on computer vision and pattern recognition,
pages 4510–4520.
Simonyan, K. and Zisserman, A. (2014). Very deep
convolutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556.
Fully Convolutional Neural Network for Event Camera Pose Estimation