cations as complex as garment detection in catwalk
videos. We showed how some of the assumptions
made in existing datasets, such as Modanet,
are not suitable for real-world applications. We ar-
gued that to use the models trained on these datasets
in real-world applications, we might need to introduce
new assumptions to the training procedure that might
not improve the results on the original dataset but in-
crease the accuracy of the application. Finally, we
have presented a relative-benchmarking framework to
compare the accuracy of different methods for our ap-
plication without the need for extensive annotations.
As discussed, we were not able to solve all the
challenges of this problem, such as robust garment
classification, using off-the-shelf methods, and we
believe that addressing these problems requires
building custom and more sophisticated models.
What we discussed in this paper can be applied
to almost any computer vision application with
similar properties. In any such application, one
should investigate whether the assumptions made in
research datasets and code bases are relevant to the
application. At the same time, the
relative-benchmarking framework proposed in this
paper can be a valuable tool for examining these
assumptions and finding a suitable solution to the
problem.
REFERENCES
Cychnerski, J., Brzeski, A., Boguszewski, A., Mar-
molowski, M., and Trojanowicz, M. (2017). Clothes
detection and classification using convolutional neural
networks. In 2017 22nd IEEE International Confer-
ence on Emerging Technologies and Factory Automa-
tion (ETFA), pages 1–8.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-
Fei, L. (2009). Imagenet: A large-scale hierarchical
image database. In 2009 IEEE conference on com-
puter vision and pattern recognition, pages 248–255.
IEEE.
Ge, Y., Zhang, R., Wu, L., Wang, X., Tang, X., and Luo, P.
(2019). Deepfashion2: A versatile benchmark for
detection, pose estimation, segmentation and
re-identification of clothing
images. CVPR.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017).
Mask R-CNN. In Proceedings of the IEEE international
conference on computer vision, pages 2961–2969.
Kucer, M. and Murray, N. (2019). A detect-then-retrieve
model for multi-domain fashion item retrieval. In Pro-
ceedings of the IEEE Conference on Computer Vision
and Pattern Recognition Workshops.
Lin, T., Maire, M., Belongie, S. J., Bourdev, L. D., Girshick,
R. B., Hays, J., Perona, P., Ramanan, D., Dollár, P.,
and Zitnick, C. L. (2014). Microsoft COCO: common
objects in context. CoRR, abs/1405.0312.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.,
Fu, C.-Y., and Berg, A. C. (2016a). Ssd: Single shot
multibox detector. In European conference on com-
puter vision, pages 21–37. Springer.
Liu, Z., Luo, P., Qiu, S., Wang, X., and Tang, X. (2016b).
Deepfashion: Powering robust clothes recognition and
retrieval with rich annotations. In Proceedings of
IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,
Chanan, G., Killeen, T., Lin, Z., Gimelshein, N.,
Antiga, L., Desmaison, A., Kopf, A., Yang, E., De-
Vito, Z., Raison, M., Tejani, A., Chilamkurthy, S.,
Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019).
Pytorch: An imperative style, high-performance deep
learning library. In Wallach, H., Larochelle, H.,
Beygelzimer, A., d'Alché-Buc, F., Fox, E., and
Garnett, R., editors, Advances in Neural Information
Processing Systems 32, pages 8024–8035. Curran
Associates, Inc.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A.
(2016). You only look once: Unified, real-time object
detection. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 779–
788.
Redmon, J. and Farhadi, A. (2018). Yolov3: An incremental
improvement. CoRR, abs/1804.02767.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-
CNN: Towards real-time object detection with region
proposal networks. In Advances in Neural Informa-
tion Processing Systems (NIPS).
Sidnev, A., Trushkov, A., Kazakov, M., Korolev, I., and
Sorokin, V. (2019). Deepmark: One-shot clothing
detection. In The IEEE International Conference on
Computer Vision (ICCV) Workshops.
Souček, T., Moravec, J., and Lokoč, J. (2019). Transnet: A
deep network for fast detection of common shot tran-
sitions. arXiv preprint arXiv:1906.03363.
Zheng, S., Yang, F., Kiapour, M. H., and Piramuthu, R.
(2018). Modanet: A large-scale street fashion dataset
with polygon annotations. In ACM Multimedia.
Zhou, X., Wang, D., and Krähenbühl, P. (2019). Objects as
points. CoRR, abs/1904.07850.
Garment Detection in Catwalk Videos