tions is not limited to a particular network, but is common to the considered detectors. These state-of-the-art models, trained on high-quality image datasets, make unreliable predictions when they encounter compression artifacts in their inputs, owing to an inability to generalize from their sharp training sets. Creating object detectors that are more robust to these degradations may require new designs. One obvious remedy is to fine-tune or train the detectors on images with artifacts, which may boost their performance on video frames but could, in turn, degrade their performance on high-quality images. An investigation of the benefits of fine-tuning with video frames is left for future work.
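As a minimal sketch of the fine-tuning idea above, one could inject compression artifacts into the training pipeline as a data augmentation. The snippet below uses Pillow's JPEG encoder at a low quality factor as a cheap stand-in for H.264 frame degradation; the function name, parameters, and the JPEG-for-video approximation are our illustrative assumptions, not a method from this paper.

```python
import io
import random
from PIL import Image  # Pillow is assumed available

def jpeg_artifact_augment(img, quality_range=(10, 50), p=0.5, rng=None):
    """With probability p, re-encode a PIL image as low-quality JPEG,
    simulating blocking/ringing artifacts of lossy compressed frames.

    quality_range: inclusive range of JPEG quality factors to sample from
    (lower quality -> stronger artifacts). Returns a new RGB image.
    """
    rng = rng or random.Random()
    if rng.random() >= p:
        return img  # leave this sample sharp
    quality = rng.randint(*quality_range)
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```

Mixing augmented and untouched samples (via `p`) is one way to hedge against the performance drop on high-quality images noted above, though whether that trade-off holds would need the future-work experiments.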
Our analysis provides guidance for developing machine vision systems in practical, non-idealized applications where quality distortions may be present. We expect our findings to be relevant when making decisions about video compression in the design of automated video surveillance systems.
ACKNOWLEDGEMENTS
This work was performed in part through the financial assistance award, Multi-tiered Video Analytics for Abnormality Detection and Alerting to Improve Response Time for First Responder Communications and Operations (Grant No. 60NANB17D178), from the U.S. Department of Commerce, National Institute of Standards and Technology. All statements of fact, opinion or conclusions contained herein are those of the authors and should not be construed as representing the official views or policies of the sponsors.
REFERENCES
Dai, J., Li, Y., He, K., and Sun, J. (2016). R-FCN: Object detection via region-based fully convolutional networks. In Neural Information Processing Systems (NIPS).
Dodge, S. and Karam, L. (2016). Understanding how image quality affects deep neural networks. arXiv preprint arXiv:1604.04004.
Donahue, J., Hendricks, L. A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., and Darrell, T. (2015). Long-term recurrent convolutional networks for visual recognition and description. In CVPR.
Everingham, M., Eslami, S. M. A., Gool, L. V., Williams, C. K. I., Winn, J., and Zisserman, A. (2015). The PASCAL visual object classes challenge: A retrospective. International Journal of Computer Vision, 111:98–136.
Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., and Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88:303–338.
Girshick, R. (2015). Fast R-CNN. In International Conference on Computer Vision (ICCV).
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2015). Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR).
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). Mask R-CNN. In International Conference on Computer Vision (ICCV).
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385.
Hoiem, D., Chodpathumwan, Y., and Dai, Q. (2012). Diagnosing error in object detectors. In European Conference on Computer Vision (ECCV).
Juurlink, B., Alvarez-Mesa, M., Chi, C. C., Azevedo, A., Meenderinck, C., and Ramirez, A. (2012). Understanding the application: An overview of the H.264 standard. Scalable Parallel Programming Applied to H.264/AVC Decoding, pages 5–15.
Karahan, S., Yıldırım, M. K., Kırtaç, K., Rende, F. S., Bütün, G., and Ekenel, H. K. (2016). How image degradations affect deep CNN-based face recognition? arXiv preprint arXiv:1608.05246.
Karam, L. J. and Zhu, T. (2015). Quality labeled faces in the wild (QLFW): a database for studying face recognition in real-world environments. In International Society for Optics and Photonics, volume 9394.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017a). Feature pyramid networks for object detection. In CVPR.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P. (2017b). Focal loss for dense object detection. In International Conference on Computer Vision (ICCV).
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV), pages 740–755.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. (2016). SSD: Single shot multibox detector. In European Conference on Computer Vision (ECCV).
Redmon, J. and Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Neural Information Processing Systems (NIPS).
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. (2014). Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR).
Understanding How Video Quality Affects Object Detection Algorithms