sidering just the pure GPU processing pipeline (comparable to subsection 4.1), is about 40 fps with one reconstructed object in addition to the background. Including five windows and grabbing, the framerate during the reconstruction of one object is around 25 fps (compared to the 12.6 fps of the original KinFu from PCL); here the framerate is limited by the rendering of the windows. The complete algorithm with one object and deactivated visualization runs at more than 30 fps.
If three objects are initialized, the reduction of the framerate is negligible. This is because the bottleneck is the communication and synchronization between CPU and GPU, so the computer hardware is insufficiently utilized. Since we use GPU streams, additional objects simply lead to a better utilization.
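The following CUDA sketch illustrates this scheduling idea: one integration kernel per object is issued on its own stream, with a single synchronization point per frame. All identifiers (VoxelGrid, integrate_kernel, NUM_OBJECTS) are illustrative assumptions, not taken from our implementation.

// Minimal sketch: per-object TSDF integration on separate CUDA streams.
// All names are illustrative; the kernel body is omitted.
#include <cuda_runtime.h>

#define NUM_OBJECTS 3
#define GRID_DIM 256

struct VoxelGrid { float* tsdf; int dim; };

// Hypothetical per-object integration kernel (one thread per voxel).
__global__ void integrate_kernel(VoxelGrid grid, const float* depth,
                                 int width, int height) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= grid.dim * grid.dim * grid.dim) return;
    // ... fuse the current depth map into voxel v (omitted) ...
}

int main() {
    cudaStream_t streams[NUM_OBJECTS];
    VoxelGrid grids[NUM_OBJECTS];
    float* depth = nullptr;  // device depth map; upload omitted
    for (int i = 0; i < NUM_OBJECTS; ++i) {
        cudaStreamCreate(&streams[i]);
        grids[i].dim = GRID_DIM;
        cudaMalloc(&grids[i].tsdf,
                   sizeof(float) * GRID_DIM * GRID_DIM * GRID_DIM);
    }
    int voxels = GRID_DIM * GRID_DIM * GRID_DIM;
    int threads = 256, blocks = (voxels + threads - 1) / threads;
    // One launch per object; kernels on different streams may overlap,
    // so additional objects mostly fill idle GPU capacity.
    for (int i = 0; i < NUM_OBJECTS; ++i)
        integrate_kernel<<<blocks, threads, 0, streams[i]>>>(
            grids[i], depth, 640, 480);
    // A single synchronization point per frame keeps the CPU-GPU
    // round trips (the actual bottleneck) to a minimum.
    cudaDeviceSynchronize();
    for (int i = 0; i < NUM_OBJECTS; ++i) {
        cudaFree(grids[i].tsdf);
        cudaStreamDestroy(streams[i]);
    }
    return 0;
}

Because the per-object kernels are independent, issuing them on separate streams lets the GPU scheduler overlap them, which is consistent with the negligible framerate reduction we observe for additional objects.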
We also did experiments with tracking humans as an example of non-rigid objects. We get good results if we allow just one moving object; this observation conforms to (Izadi et al., 2011). Without a limit on the number of objects, several parts of the body get a separate voxel grid each. The separation is reasonable (the extremities move individually), but some parts become relatively small and may get occluded during the movement. Since our algorithm does not consider relationships between the tracked objects, this is not allowed at the moment. However, as long as occlusion is avoided and the body parts do not get too small, the results are stable.
5 CONCLUSIONS
We extended KinectFusion by the ability to track and reconstruct several moving objects simultaneously. We proposed an alternative matching strategy and some further modifications to the GPU processing pipeline. The capabilities of our system were demonstrated with three examples. We showed that sliding reduction enhances the stability of object tracking. Furthermore, the robustness of the camera pose determination in scenes with moving objects is improved. Finally, we gave an example in which an object that is small at the beginning can be tracked during its movement parallel to the image plane. Despite the new functionalities, our system is still real-time capable.
For future work, we plan to focus on the initialization. First, we would like to move data from the background voxel grid to the new object voxel grid during the initialization. Second, the detection of moving object pixels during the initialization should rely on a complete pixel-based segmentation of the correspondence map. This would enhance the completeness of the object initialization and allow us to relax the constraints for triggering a new initialization. Beyond this, we intend to consider the relations between the objects with techniques from the field of articulated objects. This can further improve the registration results – especially of small, independently moving object parts – and enable us to implement a voxel grid merging mechanism for objects which were wrongly initialized twice.
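As a rough illustration of this planned pixel-based segmentation, the following hedged CUDA sketch classifies each pixel by a residual derived from the correspondence map; the kernel name classify_moving_pixels, the residual input, and the threshold criterion are assumptions for illustration, not our implemented method.

#include <cuda_runtime.h>

// Sketch: mark pixels whose point-to-correspondence residual does not
// fit the estimated background motion as moving-object candidates.
// Connected regions of the mask could then trigger an initialization.
__global__ void classify_moving_pixels(const float* residual, // per-pixel residual
                                       unsigned char* mask,   // 1 = candidate
                                       int n_pixels, float threshold) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pixels)
        mask[i] = (residual[i] > threshold) ? 1 : 0;
}

// Example launch for a 640x480 correspondence map with a 2 cm threshold:
//   classify_moving_pixels<<<(640 * 480 + 255) / 256, 256>>>(
//       residual, mask, 640 * 480, 0.02f);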
REFERENCES
Curless, B. and Levoy, M. (1996). A volumetric method for
building complex models from range images. In Pro-
ceedings of the 23rd Annual Conference on Computer
Graphics and Interactive Techniques, SIGGRAPH
’96, pages 303–312, New York, NY, USA. ACM.
Endres, F., Hess, J., Engelhard, N., Sturm, J., Cremers, D., and Burgard, W. (2012). An evaluation of the RGB-D SLAM system. In Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA), St. Paul, MN, USA.
Gelfand, N., Ikemoto, L., Rusinkiewicz, S., and Levoy, M.
(2003). Geometrically stable sampling for the ICP al-
gorithm. In Fourth International Conference on 3D
Digital Imaging and Modeling (3DIM).
Heredia, F. and Favier, R. (2012). Kinfu Large Scale in PCL. http://www.pointclouds.org/blog/srcs/fheredia/.
Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., Davison, A., and Fitzgibbon, A. (2011). KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera. In ACM Symposium on User Interface Software and Technology (UIST).
Klein, G. and Murray, D. (2007). Parallel tracking and
mapping for small AR workspaces. In Proc. Sixth
IEEE and ACM International Symposium on Mixed
and Augmented Reality (ISMAR’07), Nara, Japan.
Newcombe, R. A., Izadi, S., Hilliges, O., Molyneaux, D.,
Kim, D., Davison, A. J., Kohli, P., Shotton, J., Hodges,
S., and Fitzgibbon, A. (2011). KinectFusion: Real-
Time Dense Surface Mapping and Tracking. In IEEE
ISMAR. IEEE.
Rusu, R. B. and Cousins, S. (2011). 3D is here: Point
Cloud Library (PCL). In International Conference on
Robotics and Automation, Shanghai, China.
Sturm, J., Engelhard, N., Endres, F., Burgard, W., and Cremers, D. (2012). A benchmark for the evaluation of RGB-D SLAM systems. In Proc. of the International Conference on Intelligent Robot Systems (IROS).
Whelan, T., Kaess, M., Fallon, M., Johannsson, H.,
Leonard, J., and McDonald, J. (2012). Kintinuous:
Spatially extended KinectFusion. In RSS Workshop on
RGB-D: Advanced Reasoning with Depth Cameras,
Sydney, Australia.