Towards Deep People Detection using CNNs Trained on Synthetic Images
Roberto Martín-López, David Fuentes-Jiménez, Sara Luengo-Sánchez, Cristina Losada-Gutiérrez, Marta Marrón-Romera, Carlos Luna
2020
Abstract
In this work, we propose a people detection system that uses only depth information, provided by an RGB-D camera in a frontal position. The proposed solution is based on a Convolutional Neural Network (CNN) with an encoder-decoder architecture built from ResNet residual layers, which have been widely used in detection and classification tasks. The system takes as input a depth map generated by a time-of-flight or structured-light sensor. Its output is a probability map (the same size as the input) in which each detection is represented as a Gaussian function whose mean is the position of the person's head. Once this probability map is generated, refinement techniques are applied to improve the detection precision. Only synthetic images, generated with the software Blender, have been used to train the system, thus avoiding the need to acquire and label large image datasets. The described system has been evaluated using both synthetic images and real images acquired with a Microsoft Kinect II camera. In addition, we have compared the obtained results with those of other state-of-the-art works, showing that the results are similar despite no real data having been used during training.
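The output representation described in the abstract can be illustrated with a minimal sketch. It builds a probability map with one Gaussian per head position and then recovers the positions as local maxima. The function names, the value of `sigma`, the threshold, and the peak-extraction step are assumptions made for illustration only; the abstract does not specify the Gaussian width or which refinement techniques the authors apply.

```python
import numpy as np


def gaussian_target_map(shape, heads, sigma=5.0):
    """Build a probability map with one Gaussian per head position.

    shape: (height, width) of the depth map.
    heads: iterable of (row, col) head positions in pixels.
    sigma: assumed standard deviation in pixels (not given in the abstract).
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    target = np.zeros(shape, dtype=np.float32)
    for r, c in heads:
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        # Overlapping Gaussians are merged with a maximum so each peak stays at 1.
        target = np.maximum(target, g.astype(np.float32))
    return target


def detect_peaks(prob_map, threshold=0.5):
    """Return (row, col) of strict local maxima above threshold.

    A hypothetical stand-in for the refinement step: a pixel is kept if it
    exceeds the threshold and all eight of its neighbours.
    """
    h, w = prob_map.shape
    padded = np.pad(prob_map, 1, mode="constant", constant_values=-np.inf)
    is_peak = prob_map >= threshold
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            is_peak &= prob_map > neighbour
    return [tuple(int(v) for v in p) for p in np.argwhere(is_peak)]


# Two hypothetical head positions in a 100 x 120 map.
heads = [(20, 30), (60, 90)]
target = gaussian_target_map((100, 120), heads)
peaks = detect_peaks(target)
```

In this sketch `target` plays the role of a training label or an idealised network output; on a real prediction the peaks would be noisier, which is presumably where the refinement mentioned in the abstract matters.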
Paper Citation
in Harvard Style
Martín-López R., Fuentes-Jiménez D., Luengo-Sánchez S., Losada-Gutiérrez C., Marrón-Romera M. and Luna C. (2020). Towards Deep People Detection using CNNs Trained on Synthetic Images. In Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2020) - Volume 5: VISAPP; ISBN 978-989-758-402-2, SciTePress, pages 225-232. DOI: 10.5220/0008879102250232
in Bibtex Style
@conference{visapp20,
author={Roberto Martín-López and David Fuentes-Jiménez and Sara Luengo-Sánchez and Cristina Losada-Gutiérrez and Marta Marrón-Romera and Carlos Luna},
title={Towards Deep People Detection using CNNs Trained on Synthetic Images},
booktitle={Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2020) - Volume 5: VISAPP},
year={2020},
pages={225-232},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008879102250232},
isbn={978-989-758-402-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2020) - Volume 5: VISAPP
TI - Towards Deep People Detection using CNNs Trained on Synthetic Images
SN - 978-989-758-402-2
AU - Martín-López R.
AU - Fuentes-Jiménez D.
AU - Luengo-Sánchez S.
AU - Losada-Gutiérrez C.
AU - Marrón-Romera M.
AU - Luna C.
PY - 2020
SP - 225
EP - 232
DO - 10.5220/0008879102250232
PB - SciTePress