5 CONCLUSIONS
In this paper, we propose several uncertainty-based
active learning metrics for object detection. They
only require a distribution of classification scores per
detection. Depending on the specific task, it can also be
important that the object detector reports objects of
unknown classes. Additionally, we propose a sample
weighting scheme to balance selections among classes.
We evaluate the proposed metrics on the PASCAL
VOC 2012 dataset (Everingham et al., 2010) and offer
quantitative and qualitative results and analysis. We
show that the proposed metrics guide the annotation
process efficiently, which leads to superior performance
in comparison to a random selection baseline. In our
experimental evaluation, the Sum metric achieves the
best results overall, which can be attributed to the fact
that it tends to select batches containing many
individual objects. Although such images require more
bounding boxes to be drawn, the targeted scenario is an
application with huge amounts of unlabeled data, where
we consider the number of images to be evaluated more
critical than the time needed to draw single bounding
boxes. Examples would be camera streams or camera trap
data. To expedite annotation, our approach could be
combined with a weakly supervised learning approach as
presented in (Papadopoulos et al., 2016). We also showed
that our weighting scheme increases accuracy further.
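The tendency of the Sum metric to favor images with many objects can be illustrated with a minimal sketch. It assumes a 1-vs-2 margin as the per-detection uncertainty, which is chosen here only for illustration, and the names `margin_uncertainty` and `aggregate` as well as the example scores are hypothetical.

```python
import numpy as np

def margin_uncertainty(probs):
    """Per-detection uncertainty from class-score distributions.

    Assumption: a 1-vs-2 margin, i.e. one minus the gap between the two
    highest class scores. High values mean the detector is undecided.
    """
    probs = np.asarray(probs, dtype=float)
    top2 = np.sort(probs, axis=-1)[..., -2:]  # two largest scores per detection
    return 1.0 - (top2[..., 1] - top2[..., 0])

def aggregate(detections, how="sum"):
    """Combine per-detection uncertainties into one score per image."""
    if len(detections) == 0:
        return 0.0  # no detections: nothing to aggregate
    u = margin_uncertainty(detections)
    if how == "sum":
        return float(u.sum())
    if how == "avg":
        return float(u.mean())
    if how == "max":
        return float(u.max())
    raise ValueError(f"unknown aggregation: {how}")

# Hypothetical images: one very ambiguous detection versus
# four moderately uncertain detections.
single = [[0.5, 0.45, 0.05]]
crowded = [[0.7, 0.2, 0.1]] * 4
```

With these inputs, summation ranks the crowded image higher (about 2.0 versus 0.95), whereas averaging would prefer the image with the single ambiguous detection.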
All presented metrics could be applied to other
deep object detectors, such as the variants of SSD
(Liu et al., 2016), the improved R-CNNs, e.g., (Ren
et al., 2015), or the newer versions of YOLO (Redmon
and Farhadi, 2017). Moreover, our proposed metrics
are not restricted to deep object detection and
could be applied to arbitrary object detection
methods if they fulfill the requirement: a complete
distribution of classification scores per detection.
Also, the underlying uncertainty measure could be
replaced with arbitrary active learning metrics,
whose values are aggregated afterwards. Depending on
the specific task, it can also be important that the
object detector reports objects of unknown classes.
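One way to picture this interchangeability is a scoring function that accepts any per-detection uncertainty measure and any aggregation. The sketch below uses Shannon entropy as a stand-in measure; the names `entropy_uncertainty` and `image_score` are hypothetical, and the input check reflects the requirement of a complete class distribution per detection.

```python
import numpy as np

def entropy_uncertainty(probs):
    """Shannon entropy of one detection's class-score distribution."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)  # avoid log(0)
    return float(-(p * np.log(p)).sum())

def image_score(detections, uncertainty=entropy_uncertainty, aggregate=sum):
    """Score an image by aggregating a pluggable per-detection uncertainty.

    Each detection must carry a complete distribution over classes
    (non-negative scores that sum to one).
    """
    for d in detections:
        d = np.asarray(d, dtype=float)
        if np.any(d < 0) or not np.isclose(d.sum(), 1.0):
            raise ValueError("each detection needs a full class distribution")
    return aggregate(uncertainty(d) for d in detections)
```

Passing `aggregate=max` or a different `uncertainty` callable changes the selection behavior without touching the detector itself.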
The proposed aggregation strategies also generalize
to the selection of images based on segmentation
results or any other type of image partition. The
resulting scores could also be applied in a novelty
detection scenario.
REFERENCES
Abramson, Y. and Freund, Y. (2006). Active learning for
visual object detection. Technical report, University
of California, San Diego.
Beluch, W. H., Genewein, T., Nürnberger, A., and Köhler,
J. M. (2018). The power of ensembles for active
learning in image classification. In Computer Vision and
Pattern Recognition (CVPR).
Bietti, A. (2012). Active learning for object detection on
satellite images. Technical report, California Institute
of Technology, Pasadena.
Ertekin, S., Huang, J., Bottou, L., and Giles, L. (2007).
Learning on the border: active learning in imbalanced
data classification. In Conference on Information and
Knowledge Management.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn,
J., and Zisserman, A. (2010). The PASCAL visual
object classes (VOC) challenge. International Journal of
Computer Vision (IJCV).
Feng, C., Liu, M.-Y., Kao, C.-C., and Lee, T.-Y. (2017).
Deep active learning for civil infrastructure defect
detection and classification. In International Workshop
on Computing in Civil Engineering (IWCCE).
Fu, C.-J. and Yang, Y.-P. (2015). A batch-mode active
learning SVM method based on semi-supervised
clustering. Intelligent Data Analysis.
Fu, C.-Y., Liu, W., Ranga, A., Tyagi, A., and Berg, A. C.
(2017). DSSD: Deconvolutional single shot detector.
arXiv preprint arXiv:1701.06659.
Gal, Y., Islam, R., and Ghahramani, Z. (2017). Deep
Bayesian active learning with image data. arXiv preprint
arXiv:1703.02910.
Girshick, R. (2015). Fast R-CNN. In International
Conference on Computer Vision (ICCV).
Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014).
Rich feature hierarchies for accurate object detection
and semantic segmentation. In Computer Vision and
Pattern Recognition (CVPR).
Hoi, S. C., Jin, R., and Lyu, M. R. (2006). Large-scale text
categorization by batch mode active learning. In
International Conference on World Wide Web (WWW).
Huang, J., Child, R., Rao, V., Liu, H., Satheesh, S., and
Coates, A. (2016). Active learning for speech
recognition: the power of gradients. arXiv preprint
arXiv:1612.03226.
Jain, P. and Kapoor, A. (2009). Active learning for large
multi-class problems. In Computer Vision and Pattern
Recognition (CVPR).
Käding, C., Freytag, A., Rodner, E., Perino, A., and
Denzler, J. (2016a). Large-scale active learning with
approximated expected model output changes. In
German Conference on Pattern Recognition (GCPR).
Käding, C., Rodner, E., Freytag, A., and Denzler, J.
(2016b). Fine-tuning deep neural networks in
continuous learning scenarios. In ACCV Workshop on
Interpretation and Visualization of Deep Neural Nets
(ACCV-WS).
Käding, C., Rodner, E., Freytag, A., and Denzler, J.
(2016c). Watch, ask, learn, and improve: A lifelong
learning cycle for visual recognition. In European
Symposium on Artificial Neural Networks (ESANN).