ALiSNet: Accurate and Lightweight Human Segmentation Network for Fashion E-Commerce