An In-Depth Survey. arXiv preprint
arXiv:2204.07756. https://arxiv.org/pdf/2204.07756.pdf
Hermann, K. M., Kocisky, T., Grefenstette, E., Espeholt,
L., Kay, W., Suleyman, M., & Blunsom, P. (2015).
Teaching machines to read and comprehend. Advances
in neural information processing systems, 28.
https://proceedings.neurips.cc/paper/2015/file/afdec7005cc9f14302cd0474fd0f3c96-Paper.pdf
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term
memory. Neural computation, 9(8), 1735-1780. DOI:
10.1162/neco.1997.9.8.1735
Islam, A., Long, C., & Radke, R. (2021, May). A hybrid
attention mechanism for weakly-supervised temporal
action localization. In Proceedings of the AAAI
Conference on Artificial Intelligence (Vol. 35, No. 2,
pp. 1637-1645). https://ojs.aaai.org/index.php/AAAI/article/download/16256/16063
Jegham, I., Ben Khalifa, A., Alouani, I., & Mahjoub, M. A. (2019, September). MDAD: A multimodal and multiview in-vehicle driver action dataset. In International Conference on Computer Analysis of Images and Patterns (pp. 518-529). Springer, Cham. DOI: 10.1007/978-3-030-29888-3_42
Jegham, I., Khalifa, A. B., Alouani, I., & Mahjoub, M. A.
(2018, December). Safe driving: Driver action
recognition using SURF keypoints. In 2018 30th
International Conference on Microelectronics (ICM)
(pp. 60-63). IEEE. DOI: 10.1109/ICM.2018.8704009
Jegham, I., Khalifa, A. B., Alouani, I., & Mahjoub, M. A.
(2020). Soft spatial attention-based multimodal driver
action recognition using deep learning. IEEE Sensors
Journal, 21(2), 1918-1925. DOI: 10.1109/JSEN.2020.3019258
Jegham, I., Khalifa, A. B., Alouani, I., & Mahjoub, M. A.
(2020). A novel public dataset for multimodal
multiview and multispectral driver distraction analysis:
3MDAD. Signal Processing: Image Communication,
88, 115960. https://doi.org/10.1016/j.image.2020.115960
Khemchandani, R., & Sharma, S. (2016). Robust least
squares twin support vector machine for human activity
recognition. Applied Soft Computing, 47, 33-46.
https://doi.org/10.1016/j.asoc.2016.05.025
Peltarion Knowledge Center, “Categorical crossentropy,” 2022, last accessed 1/08/2022. [Online]. Available: https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/loss-functions/categorical-crossentropy
Li, D., Yao, T., Duan, L. Y., Mei, T., & Rui, Y. (2018).
Unified spatio-temporal attention networks for action
recognition in videos. IEEE Transactions on Multimedia,
21(2), 416-428. DOI: 10.1109/TMM.2018.
Li, J., Liu, X., Zhang, W., Zhang, M., Song, J., & Sebe, N.
(2020). Spatio-temporal attention networks for action
recognition and detection. IEEE Transactions on
Multimedia, 22(11), 2990-3001. DOI: 10.1109/TMM.2020.2965434
Li, P., Lu, M., Zhang, Z., Shan, D., & Yang, Y. (2019,
October). A novel spatial-temporal graph for skeleton-
based driver action recognition. In 2019 IEEE
Intelligent Transportation Systems Conference (ITSC)
(pp. 3243-3248). IEEE. DOI: 10.1109/ITSC.2019.8916929
Liu, Q., Che, X., & Bie, M. (2019). R-STAN: Residual
spatial-temporal attention network for action
recognition. IEEE Access, 7, 82246-82255. DOI:
10.1109/ACCESS.2019.2923651
Machine Learning Mastery, “Loss and Loss Functions for Training Deep Learning Neural Networks,” 2019, last accessed 8/08/2022. [Online]. Available: https://machinelearningmastery.com/loss-and-loss-functions-for-training-deep-learning-neural-networks/
Martin, M., Roitberg, A., Haurilet, M., Horne, M., Reiß, S.,
Voit, M., & Stiefelhagen, R. (2019). Drive&act: A
multi-modal dataset for fine-grained driver behavior
recognition in autonomous vehicles. In Proceedings of
the IEEE/CVF International Conference on Computer
Vision (pp. 2801-2810).
Mattivi, R., & Shao, L. (2009, September). Human action
recognition using LBP-TOP as sparse spatio-temporal
feature descriptor. In International Conference on
Computer Analysis of Images and Patterns (pp. 740-
747). Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03767-2_90
Meng, L., Zhao, B., Chang, B., Huang, G., Sun, W., Tung,
F., & Sigal, L. (2019). Interpretable spatio-temporal
attention for video action recognition. In Proceedings of
the IEEE/CVF International Conference on Computer
Vision Workshops (pp. 0-0). https://openaccess.thecvf.com/content_ICCVW_2019/papers/HVU/Meng_Interpretable_Spatio-Temporal_Attention_for_Video_Action_Recognition_ICCVW_2019_paper.pdf
Muhammad, K., Ullah, A., Imran, A. S., Sajjad, M., Kiran,
M. S., Sannino, G., & de Albuquerque, V. H. C. (2021).
Human action recognition using attention-based LSTM
network with dilated CNN features. Future Generation
Computer Systems, 125, 820-830. https://doi.org/10.1016/j.future.2021.06.045
Nanni, L., Lumini, A., & Brahnam, S. (2012). Survey on
LBP based texture descriptors for image classification.
Expert Systems with Applications, 39(3), 3634-3641.
https://doi.org/10.1016/j.eswa.2011.09.054
NHTSA, “Traffic Tech Technology Transfer Series,” 2017, last accessed 07/10/2022. [Online]. Available: https://www.nhtsa.gov/sites/nhtsa.gov/files/documents/812396_ttnighttimeseatbeltwa_0.pdf
Niu, Z., Zhong, G., & Yu, H. (2021). A review on the
attention mechanism of deep learning.
Neurocomputing, 452, 48-62. https://doi.org/10.1016/j.neucom.2021.03.091
Sharma, S., Kiros, R., & Salakhutdinov, R. (2015). Action
recognition using visual attention. arXiv preprint
arXiv:1511.04119. [Online]. Available: http://arxiv.org/abs/1511.04119
Siami-Namini, S., Tavakoli, N., & Namin, A. S. (2019,
December). The performance of LSTM and BiLSTM in
forecasting time series. In 2019 IEEE International
Conference on Big Data (Big Data) (pp. 3285-3292).
IEEE. DOI: 10.1109/BigData47090.2019.9005997