Authors:
Roberto Martín-López; David Fuentes-Jiménez; Sara Luengo-Sánchez; Cristina Losada-Gutiérrez; Marta Marrón-Romera and Carlos Luna
Affiliation:
Department of Electronics, University of Alcalá, Polytechnic School, Campus Universitario S/N, Alcalá de Henares, Spain
Keyword(s):
People Detection, Synthetic Images, Convolutional Neural Networks, Depth Images.
Abstract:
In this work, we propose a people detection system that uses only depth information, provided by an RGB-D camera in a frontal position. The proposed solution is based on a Convolutional Neural Network (CNN) with an encoder-decoder architecture built from ResNet residual layers, which have been widely used in detection and classification tasks. The system takes as input a depth map generated by a time-of-flight or structured-light sensor. Its output is a probability map, the same size as the input, in which each detection is represented as a Gaussian function whose mean is the position of the person's head. Once this probability map is generated, refinement techniques are applied to improve detection precision. Only synthetic images, generated with the Blender software, were used during training, which avoids the need to acquire and label large image datasets. The system has been evaluated on both synthetic and real images, the latter acquired with a Microsoft Kinect II camera. In addition, we compare our results with those of other state-of-the-art works, showing that they are similar even though no real data was used during training.
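The output representation described above (one Gaussian per person, centred on the head) can be illustrated with a small sketch. It assumes head positions are given as integer (row, col) pixel coordinates and uses a hypothetical spread parameter `sigma`, since the abstract does not state the Gaussian width or map resolution. The decoding step uses a common local-maximum search with greedy non-maximum suppression (hypothetical `threshold` and `radius` parameters). This is a stand-in for illustration only, not the authors' refinement techniques, which the abstract does not detail.

```python
import math


def gaussian_label_map(height, width, heads, sigma):
    """Build a target probability map with one 2D Gaussian per head.

    `heads` holds (row, col) head positions in pixels; `sigma` (pixels) is a
    hypothetical spread parameter. Overlapping Gaussians are merged with max()
    so every value stays in [0, 1] and each head pixel equals exactly 1.
    """
    prob = [[0.0] * width for _ in range(height)]
    for hr, hc in heads:
        for r in range(height):
            for c in range(width):
                d2 = (r - hr) ** 2 + (c - hc) ** 2
                prob[r][c] = max(prob[r][c], math.exp(-d2 / (2.0 * sigma ** 2)))
    return prob


def detect_heads(prob, threshold=0.5, radius=2):
    """Decode (row, col) head positions from a probability map.

    Assumed post-processing (not the paper's method): keep pixels strictly
    above `threshold` that are maximal within a (2*radius+1)^2 window, then
    apply greedy non-maximum suppression so a plateau yields one detection.
    """
    height = len(prob)
    width = len(prob[0]) if height else 0
    candidates = []
    for r in range(height):
        for c in range(width):
            v = prob[r][c]
            if v <= threshold:
                continue
            # Square neighbourhood, clipped at the image borders.
            window = [
                prob[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            if v >= max(window):
                candidates.append((v, r, c))
    # Strongest peaks first (stable sort keeps raster order on ties).
    candidates.sort(key=lambda t: -t[0])
    detections = []
    for _, r, c in candidates:
        # Reject candidates within `radius` (Chebyshev distance) of an accepted peak.
        if all(max(abs(r - ar), abs(c - ac)) > radius for ar, ac in detections):
            detections.append((r, c))
    return detections


if __name__ == "__main__":
    label = gaussian_label_map(20, 30, heads=[(5, 6), (14, 22)], sigma=2.0)
    print(detect_heads(label))  # [(5, 6), (14, 22)]
```

Applied to a clean target map, the decoder recovers the head positions exactly. On a real network output the peaks are noisier, so `threshold` and `radius` would need tuning against the expected head size in the depth image.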