Table 3: Models used for the visual output, with their precision, recall, training time and inference time (for a 10,000 × 10,000 pixel image).

Model   Precision   Recall    Train   Infer
V&J     90.64%      81.12%    2h      10m
ACF     90.55%      86.43%    30m     5m
DN19    97.31%      88.58%    24h     2m30s
Each output image also marks the false negatives (regions classified as background by the model but actually containing a coconut tree).
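For clarity, the precision and recall values reported in Table 3 follow the usual definitions in terms of true positives (TP), false positives (FP) and false negatives (FN); a minimal sketch:

    def precision_recall(tp, fp, fn):
        # precision: fraction of reported detections that are real coconut trees
        # recall: fraction of real coconut trees that are actually detected
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return precision, recall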
Comparing the different output images, we clearly see the expected behaviour. The V&J model suffers from a higher false positive rate than the ACF model. This can be explained by the fact that V&J does not take colour information into account and therefore triggers several detections on coconut tree shadows, whereas ACF is more robust to these. Comparing the ACF model to the Darknet19 model, we see that the Darknet19 model produces almost no false positive detections, hence the high precision at a high recall rate. However, the approach still suffers from false negative detections. We are convinced that this is partly due to the step size of 50 pixels used for this evaluation. Decreasing the step size to 25 or even 10 pixels should further reduce the number of false negative detections.
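To make the role of the step size concrete, the sketch below shows a plain sliding-window evaluation loop. The classify_patch callable, the 50-pixel patch size and the stride values are illustrative assumptions, not the exact pipeline used in our experiments.

    def sliding_window(image, patch_size=50, step=50):
        # image: an H x W (x 3) array; yields (x, y, patch) tuples.
        # Halving the step from 50 to 25 pixels roughly quadruples the
        # number of evaluated patches, improving the odds that every
        # tree is covered by at least one well-centred patch.
        h, w = image.shape[:2]
        for y in range(0, h - patch_size + 1, step):
            for x in range(0, w - patch_size + 1, step):
                yield x, y, image[y:y + patch_size, x:x + patch_size]

    def detect(image, classify_patch, step=50):
        # classify_patch is a hypothetical stand-in for any of the
        # three evaluated models (V&J, ACF or Darknet19).
        return [(x, y) for x, y, patch in sliding_window(image, step=step)
                if classify_patch(patch)]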
7 CONCLUSIONS
With this research we have demonstrated the capabilities of both boosted cascade and deep learned detection models for coconut tree localisation in aerial images. Our best boosted cascade performs at an average precision of 94.56%, while our best deep learning model achieves a top-1 accuracy of 97.4%. Although our deep learning pipeline evaluates twice as fast, we reckon that boosted cascades are still in the race, especially given their lower computational complexity demands, but the high classification accuracy and speed of deep learning simply cannot be ignored.
As future work we suggest looking at region proposal networks, to combine with our deep learning classification networks. This would drastically reduce the number of image patches and make the complete pipeline even faster. On top of that, we also notice that more recent research focusses on combining the best of both worlds, as described in (Ouyang et al., 2017; Zhang et al., 2017). While keeping the principle of a boosted cascade, to benefit from early rejection, the weak classifiers are built using convolutional neural network architectures, which guarantees a higher average precision in the long run.
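As a rough illustration of that early-rejection idea, the sketch below chains increasingly expensive stages; the stage scoring functions and thresholds are hypothetical placeholders, not the architectures of (Ouyang et al., 2017; Zhang et al., 2017).

    def cascade_predict(patch, stages, thresholds):
        # stages: list of scoring callables (e.g. small CNNs), cheapest first.
        # thresholds: per-stage rejection thresholds.
        for stage, threshold in zip(stages, thresholds):
            if stage(patch) < threshold:
                return False  # early rejection: costlier stages are skipped
        return True  # survived every stage: report a detection

Because the overwhelming majority of patches in an aerial image are background, most of them are discarded by the cheap early stages, which is exactly where the speed advantage of the cascade comes from.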
ACKNOWLEDGEMENTS
This work is supported by KU Leuven, Campus De Nayer and Flanders Innovation & Entrepreneurship (AIO).
REFERENCES
Abadi, M., Agarwal, A., et al. (2015). TensorFlow: Large-
scale machine learning on heterogeneous systems.
Ahonen, T., Hadid, A., and Pietikäinen, M. (2004). Face recognition with local binary patterns. In Proceedings of the European Conference on Computer Vision, pages 469–481.
Bradski, G. and Kaehler, A. (2000). The OpenCV library. Dr. Dobb's Journal, 25(11):120–126.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255.
Dollár, P. (2005). Piotr's Computer Vision Matlab Toolbox (PMT). https://github.com/pdollar/toolbox.
Dollár, P., Belongie, S., and Perona, P. (2010). The fastest pedestrian detector in the west. In Proceedings of the British Machine Vision Conference, volume 2.
Dollár, P., Tu, Z., Perona, P., and Belongie, S. (2009). Integral channel features. In Proceedings of the British Machine Vision Conference, volume 2, pages 5–12.
Everingham, M., Van Gool, L., Williams, C. K., Winn, J., and Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338.
Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Darrell, T., and Keutzer, K. (2014). DenseNet: Implementing efficient ConvNet descriptor pyramids. arXiv preprint.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the International Conference on Multimedia, pages 675–678. ACM.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, pages 740–755. Springer.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. (2016). SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, pages 21–37. Springer.
Margineantu, D. D. and Dietterich, T. G. (1997). Pruning adaptive boosting. In Proceedings of the International Conference on Machine Learning.