Howard, A. G., Zhu, M., Chen, B., Kalenichenko,
D., Wang, W., Weyand, T., Andreetto, M., and
Adam, H. (2017). Mobilenets: Efficient convolu-
tional neural networks for mobile vision applications.
arXiv:1704.04861 [cs].
Howard, J. (2020). Imagenette and Imagewoof. Online:
https://github.com/fastai/imagenette/.
Hu, J., Shen, L., Albanie, S., Sun, G., and Wu, E. (2019).
Squeeze-and-excitation networks. arXiv:1709.01507.
Huang, G., Liu, Z., van der Maaten, L., and Weinberger,
K. Q. (2018). Densely connected convolutional net-
works. arXiv:1608.06993 [cs].
Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K.,
Dally, W. J., and Keutzer, K. (2016). Squeezenet:
Alexnet-level accuracy with 50x fewer parameters and
<0.5MB model size. arXiv:1602.07360 [cs].
Jastrzebski, S., Arpit, D., Ballas, N., Verma, V., Che, T., and
Bengio, Y. (2018). Residual connections encourage
iterative inference.
Jiang, P., Ergu, D., Liu, F., Cai, Y., and Ma, B. (2022). A re-
view of yolo algorithm developments. Procedia Com-
puter Science, 199:1066–1073.
Jocher, G., Chaurasia, A., and Qiu, J. (2023). Ultralytics
yolov8. https://github.com/ultralytics/ultralytics.
Kingma, D. P. and Ba, J. (2017). Adam: A method for
stochastic optimization. arXiv:1412.6980 [cs].
Krizhevsky, A. (2009). Learning multiple layers of features
from tiny images. Technical report, University of Toronto.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-
agenet classification with deep convolutional neural
networks. In Advances in Neural Information Pro-
cessing Systems, volume 25. Curran Associates, Inc.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard,
R. E., Hubbard, W., and Jackel, L. D. (1989). Back-
propagation applied to handwritten zip code recogni-
tion. Neural Computation, 1(4):541–551.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998).
Gradient-based learning applied to document recogni-
tion. Proceedings of the IEEE, 86(11):2278–2324.
McMahan, B., Moore, E., Ramage, D., Hampson, S., and
y Arcas, B. A. (2017). Communication-efficient learn-
ing of deep networks from decentralized data. In Ar-
tificial intelligence and statistics, pages 1273–1282.
PMLR.
Mohimont, L. (2023). Deep learning for post-harvest grape
diseases detection. In Workshop Proceedings of the
19th International Conference on Intelligent Environ-
ments (IE2023), volume 32, page 157. IOS Press.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and
Chen, L.-C. (2019). Mobilenetv2: Inverted residuals
and linear bottlenecks. arXiv:1801.04381 [cs].
Shao, J. and Cheng, Q. (2021). E-fcnn for tiny fa-
cial expression recognition. Applied Intelligence,
51(1):549–559.
Simonyan, K. and Zisserman, A. (2015). Very deep con-
volutional networks for large-scale image recognition.
arXiv:1409.1556 [cs].
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., Erhan, D., Vanhoucke, V., and Rabi-
novich, A. (2014). Going deeper with convolutions.
arXiv:1409.4842 [cs].
Tavanaei, A. (2020). Embedded encoder-decoder
in convolutional networks towards explainable ai.
arXiv:2007.06712 [cs].
Terven, J. and Cordova-Esparza, D. (2023). A com-
prehensive review of yolo architectures in com-
puter vision: From yolov1 to yolov8 and yolo-
nas. Machine Learning and Knowledge Extraction,
5(4):1680–1716. arXiv:2304.00501 [cs].
Vasu, P. K. A., Gabriel, J., Zhu, J., Tuzel, O., and Ranjan,
A. (2023). Mobileone: An improved one millisecond
mobile backbone. arXiv:2206.04040 [cs].
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin,
I. (2017). Attention is all you need. In Guyon,
I., Luxburg, U. V., Bengio, S., Wallach, H., Fer-
gus, R., Vishwanathan, S., and Garnett, R., editors,
Advances in Neural Information Processing Systems,
volume 30. Curran Associates, Inc.
Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y.,
Liu, D., Mu, Y., Tan, M., Wang, X., Liu, W., and Xiao,
B. (2020). Deep high-resolution representation learn-
ing for visual recognition. arXiv:1908.07919 [cs].
Wu, T., Tang, S., Zhang, R., and Zhang, Y. (2019). Cgnet:
A light-weight context guided network for semantic
segmentation. arXiv:1811.08201 [cs].
Xiao, H., Rasul, K., and Vollgraf, R. (2017). Fashion-
mnist: a novel image dataset for benchmarking ma-
chine learning algorithms. arXiv:1708.07747.
Zeiler, M. D. and Fergus, R. (2014). Visualizing and un-
derstanding convolutional networks. In Fleet, D., Pa-
jdla, T., Schiele, B., and Tuytelaars, T., editors, Com-
puter Vision – ECCV 2014, Lecture Notes in Com-
puter Science, page 818–833, Cham. Springer Inter-
national Publishing.
Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017).
Pyramid scene parsing network. arXiv:1612.01105 [cs].