cameras at once, so no correspondences can be
established. For the six-camera scenario specifically, we plan to solve this by using the fused output of four spatially distributed Kinect cameras as ground truth.
The comparatively long processing time for the Argos 3D P100 camera is consistent with the timings measured for the pmd[vision] S3 cameras: the Argos 3D P100 nominally provides about six times as many points per frame for which correspondences have to be determined, which explains the increase in processing time from 46 ms to about 230 ms. However, this calculation is currently performed on the CPU in a single thread, so we expect to achieve a large speedup by parallelizing on the CPU and/or GPU, as sketched below. Further improvements in frame rate and image quality are expected from a different high-speed ToF camera, the upcoming Argos 3D P320, which features 12 LEDs for illumination instead of 2 and thereby increases the effective sensing range.
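Since correspondences are determined independently for each point, the search lends itself to data-parallel execution. The following minimal sketch illustrates the idea, assuming a PCL kd-tree for the nearest-neighbor queries; the function name and structure are illustrative and not taken from our implementation:

#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/kdtree/kdtree_flann.h>
#include <vector>

// For every point of the current frame, find the nearest neighbor in the
// reference cloud. The queries are independent, so the loop parallelizes
// directly across CPU cores with OpenMP.
std::vector<int> findCorrespondences(
    const pcl::PointCloud<pcl::PointXYZ>::ConstPtr& current,
    const pcl::PointCloud<pcl::PointXYZ>::ConstPtr& reference)
{
  pcl::KdTreeFLANN<pcl::PointXYZ> tree;
  tree.setInputCloud(reference);
  std::vector<int> correspondence(current->size(), -1);

  #pragma omp parallel for
  for (int i = 0; i < static_cast<int>(current->size()); ++i) {
    std::vector<int> index(1);
    std::vector<float> sqDist(1);
    if (tree.nearestKSearch(current->points[i], 1, index, sqDist) > 0)
      correspondence[i] = index[0];
  }
  return correspondence;
}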
6 CONCLUSIONS
We have proposed a new approach for pre-calculating the body point cloud of a human based on time-delayed ground truth. It features two distinct processing pipelines: one pipeline processes the ground truth, which corresponds to a past measurement frame, and propagates it forward to the current frame; the other pipeline handles the incoming data from the faster 3D camera system and calculates a tracking estimate based on 2D optical flow in combination with a customized background model and various refinement steps.
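The interplay of the two pipelines can be illustrated with a deliberately simplified one-dimensional toy example (an assumption for illustration only, not our implementation): per-frame motion estimates stand in for the optical flow of the fast pipeline, while exact positions arriving with a fixed delay stand in for the fused Kinect ground truth and are propagated to the present by replaying the accumulated motion estimates.

#include <deque>
#include <iostream>
#include <numeric>

int main() {
  const int delay = 5;                // ground truth lags by 5 frames
  double estimate = 0.0;              // current tracking estimate
  std::deque<double> motionHistory;   // per-frame motion (stand-in for flow)

  for (int frame = 0; frame < 20; ++frame) {
    // Fast pipeline: advance the estimate by the current motion estimate.
    double motion = 1.0 + 0.1 * (frame % 3);
    estimate += motion;
    motionHistory.push_back(motion);

    if (frame >= delay) {
      // Slow pipeline: the exact position of frame (frame - delay) arrives
      // now (synthetic stand-in for fused Kinect data) and is propagated
      // forward by replaying the last `delay` motion estimates.
      double pastTruth = 1.05 * (frame - delay);
      double propagated = pastTruth +
          std::accumulate(motionHistory.end() - delay,
                          motionHistory.end(), 0.0);
      // Blend propagated ground truth and fast estimate (a simple stand-in
      // for the refinement steps).
      estimate = 0.5 * (estimate + propagated);
    }
    std::cout << "frame " << frame << ": estimate " << estimate << '\n';
  }
  return 0;
}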
The algorithm has been implemented and evaluated on two different scenarios. Results for the latency minimization scenario show that the presented approach consistently performs well on the evaluated data sets. Using two different data sets for each evaluation shows that, apart from the initial delay until tracking is established, the magnitude of the latency does not affect the high tracking quality of the algorithm. While still good, the accuracy in the second scenario is lower than in the first, and the current processing time prohibits the intended usage. For this reason, we will address the computational cost of the algorithm and the optimization of our test bed for the second scenario as detailed above.
In addition, we plan to integrate the algorithm
into the full OP:Sense supervision system by pre-
calculating human tracking simultaneously on all six
ToF cameras, based on fused ground truth from four
different Kinect cameras. We envision that the
fusion of the results will further improve the
accuracy and thereby provide a reliable modality to
be used for human-robot interaction. We also aim to apply the algorithm to other kinds of tracking scenarios using different input modalities.
ACKNOWLEDGEMENTS
This work was funded by the European Commission's Seventh Framework Programme within the project 'Active Constraints Technologies for Ill-defined or Volatile Environments (ACTIVE)' under grant no. 270460.