assumed to be more effective in improving the localization accuracy of unknown class objects or using a robust image similarity index for the misalignment of the bounding box.
We also show some visual results of greedy-NMS and the proposed method for comparison. As shown in Fig. 1, “toothpaste”, an unknown class object to be detected, was removed when greedy-NMS was used (Fig. 1a), whereas the proposed method detected it at the position indicated by the green box (Fig. 1b).
7 CONCLUSION
In this paper, we presented an NMS method that uses, in addition to the IoU of the bounding boxes, an image similarity index computed between the image regions inside the two bounding boxes. To evaluate the method’s ability to detect unknown class objects, we constructed a new dataset that includes unknown class objects. Our experiment shows that the proposed method can reduce the number of unknown class objects mistakenly removed by NMS. In the future, we plan to develop an effective feature extraction method for unknown class objects and to use it with NMS.
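The idea summarized above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the suppression rule (remove a box only when both its IoU and its image similarity with an already kept box exceed a threshold), the function names, the thresholds `iou_thr` and `sim_thr`, and the patch size are all assumptions made for illustration. In particular, `similarity` is a placeholder (Pearson correlation of nearest-neighbour resampled grayscale patches) standing in for whatever image similarity index is actually used.

```python
from math import sqrt


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sample_patch(image, box, size=8):
    """Nearest-neighbour resample of the box region to size*size gray values."""
    x1, y1, x2, y2 = box
    values = []
    for r in range(size):
        y = min(y2 - 1, y1 + (r * (y2 - y1)) // size)
        for c in range(size):
            x = min(x2 - 1, x1 + (c * (x2 - x1)) // size)
            values.append(float(image[y][x]))
    return values


def similarity(image, a, b):
    """Placeholder similarity index (assumption): Pearson correlation in [-1, 1]."""
    pa, pb = sample_patch(image, a), sample_patch(image, b)
    ma, mb = sum(pa) / len(pa), sum(pb) / len(pb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(pa, pb))
    va = sum((p - ma) ** 2 for p in pa)
    vb = sum((q - mb) ** 2 for q in pb)
    if va == 0 or vb == 0:  # flat patch: fall back to exact equality
        return 1.0 if pa == pb else 0.0
    return cov / sqrt(va * vb)


def similarity_nms(image, boxes, scores, iou_thr=0.5, sim_thr=0.5):
    """Greedy NMS that suppresses a box only if it overlaps AND looks similar."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        suppressed = any(
            iou(boxes[i], boxes[k]) > iou_thr
            and similarity(image, boxes[i], boxes[k]) > sim_thr
            for k in keep
        )
        if not suppressed:
            keep.append(i)
    return keep


# Toy example: vertical stripes, so a one-pixel shift changes the content.
image = [[x % 2 for x in range(12)] for _ in range(8)]
boxes = [(0, 0, 8, 8), (1, 0, 9, 8), (2, 0, 10, 8)]
scores = [0.9, 0.8, 0.7]
print(similarity_nms(image, boxes, scores))                # [0, 1]
print(similarity_nms(image, boxes, scores, sim_thr=-2.0))  # [0]
```

In the toy example, the second box overlaps the first strongly but contains different content, so it is kept, whereas the third box overlaps the first and looks identical to it, so it is suppressed. Setting `sim_thr` below -1 makes the similarity test always pass, which reduces the sketch to greedy-NMS and removes both lower-scoring boxes.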
REFERENCES
Bodla, N., Singh, B., Chellappa, R., and Davis, L. S. (2017). Soft-nms–improving object detection with one line of code. In ICCV, pages 5561–5569.
Cai, Z. and Vasconcelos, N. (2018). Cascade r-cnn: Delving into high quality object detection. In CVPR, pages 6154–6162.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In CVPR, pages 248–255. IEEE.
Girshick, R. (2015). Fast r-cnn. In ICCV, pages 1440–1448. IEEE.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In CVPR, pages 770–778. IEEE.
Hosang, J., Benenson, R., and Schiele, B. (2017). Learning non-maximum suppression. In CVPR, pages 4507–4515.
Huang, X., Ge, Z., Jie, Z., and Yoshie, O. (2020). Nms by representative region: Towards crowded pedestrian detection by proposal pairing. In CVPR, pages 10750–10759.
Lai, K., Bo, L., Ren, X., and Fox, D. (2011). A large-scale hierarchical multi-view rgb-d object dataset. In ICRA, pages 1817–1824. IEEE.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017a). Feature pyramid networks for object detection. In CVPR, volume 1, page 4. IEEE.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P. (2017b). Focal loss for dense object detection. In ICCV, pages 2980–2988.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft coco: Common objects in context. In ECCV, pages 740–755. Springer.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. (2016). Ssd: Single shot multibox detector. In ECCV, pages 21–37. Springer.
Liu, Y., Liu, L., Rezatofighi, H., Do, T.-T., Shi, Q., and Reid, I. (2019). Learning pairwise relationship for multi-object detection in crowded scenes. arXiv preprint arXiv:1901.03796.
Redmon, J., Divvala, S. K., Girshick, R. B., and Farhadi, A. (2015). You only look once: Unified, real-time object detection. CoRR, abs/1506.02640.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. In NIPS, pages 91–99.
Uijlings, J. R., Van De Sande, K. E., Gevers, T., and Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2):154–171.
Zhou, X., Wang, D., and Krähenbühl, P. (2019). Objects as points. arXiv preprint arXiv:1904.07850.
Non-Maximum Suppression for Unknown Class Objects using Image Similarity