AP scores across the dataset corpus.
We encountered a significant challenge in tuning the training time to be optimal while building a universal template for a wide range of possible real-world use cases; this also affected the learning rate scheduler. To solve this, we introduced an additional iteration-patience parameter for early stopping and ReduceOnPlateau, alongside the epoch-patience parameter. This enabled us to balance the requirements of small and large datasets while keeping training time close to optimal regardless of dataset size.
Our efforts resulted in three dataset-agnostic templates for object detection training, one for each of three performance-accuracy regimes. They provide a strong baseline on a wide variety of datasets and can be deployed on CPU using the OpenVINO™ toolkit.
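As a reference point, the sketch below shows CPU inference with an exported model through the OpenVINO Runtime Python API; the model path and input shape are placeholders that depend on the chosen template and its preprocessing.

# Minimal sketch of CPU inference with an exported detection model via
# the OpenVINO Runtime Python API. "model.xml" and the input shape are
# placeholders; real preprocessing and postprocessing are omitted.
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")                 # exported IR of the trained detector
compiled = core.compile_model(model, device_name="CPU")

dummy_input = np.zeros((1, 3, 512, 512), dtype=np.float32)  # shape depends on the template
detections = compiled([dummy_input])[compiled.output(0)]    # raw detections (boxes, scores, labels)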