Figure 15: Visualisation of the brake predictions and the
ground truth velocity from a different dataset.
5 CONCLUSIONS
The implementation of a CNN model for vehicle
speed prediction from sequential image input
demonstrates the potential of leveraging the temporal
information captured across consecutive frames. The
model architecture effectively extracts relevant spatial
features and captures high-level representations of the
input data. We evaluate the model with several
metrics, providing insight into its accuracy and
reliability. We further extend the model to accept
sensor data as input and compare both variants with
published work that relies on such additional sensor
input. We found that, when the additional sensor
input is corrupt or erroneous, our approaches are
more robust than methods that combine this data with
recurrent neural networks. We have also tested our
image-based model on predicting brake pedal
pressure from the same sequences of input images,
and the results are promising even on unseen data
from different datasets. Further experiments can
deepen the understanding of the model's capabilities
and potentially improve vehicle speed prediction for
real-world applications such as video forensics.
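The robustness comparison described above can be illustrated with a minimal sketch: corrupting an auxiliary sensor channel (e.g. with random dropout and additive Gaussian noise) and measuring how much the prediction error degrades. The function names and corruption parameters below are illustrative assumptions, not the exact evaluation protocol used in this work.

```python
import numpy as np

def corrupt_sensor(signal, dropout_prob=0.2, noise_std=0.5, rng=None):
    """Simulate corrupted sensor input: additive Gaussian noise plus
    random dropout (dropped readings are zeroed). Parameters are
    illustrative, not taken from the paper."""
    rng = rng or np.random.default_rng(0)
    corrupted = signal + rng.normal(0.0, noise_std, size=signal.shape)
    mask = rng.random(signal.shape) < dropout_prob
    corrupted[mask] = 0.0  # a dropped reading arrives as zero
    return corrupted

def mae(pred, truth):
    """Mean absolute error, a common metric for speed prediction."""
    return float(np.mean(np.abs(pred - truth)))

# Toy example: a ground-truth speed profile and a stand-in "model"
# that simply echoes a near-perfect sensor reading.
rng = np.random.default_rng(42)
truth = np.linspace(10.0, 30.0, 100)            # ground-truth speed (m/s)
clean_pred = truth + rng.normal(0, 0.1, 100)    # predictions from clean input
corrupt_pred = corrupt_sensor(clean_pred, rng=rng)

print(f"MAE on clean input:     {mae(clean_pred, truth):.3f}")
print(f"MAE on corrupted input: {mae(corrupt_pred, truth):.3f}")
```

A model that depends heavily on the sensor channel shows a large gap between the two errors, while a vision-only model is unaffected by this corruption, which is the sense in which the image-based approach is more robust.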
ACKNOWLEDGEMENTS
The research was supported by grants from the
Ministry of Research and Innovation, CNCS-
UEFISCDI, project numbers PN-III-P1-1.1-PD-2021-
0247 and PN-III-P4-ID-PCE2020-1700.