show that the proposed method can deliver high-quality foreground segmentation masks compared to those of (Kammerl et al., 2012) and cloud-to-cloud subtraction. By improving the approach of (Kammerl et al., 2012) in Algorithm 1, we were able to eliminate the noise and preserve only the moving person in the scene. The results in the previous section also showed that the accuracy of the foreground strongly affects the accuracy of the ellipsoid. Noise in the surroundings can provide misleading information that does not help the monitoring process and will eventually result in a false interpretation of the behavior.
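The change detection underlying this comparison is the double-buffered octree of (Kammerl et al., 2012): the background frame fills one octree buffer, the current frame fills the other, and points falling into newly occupied voxels are reported as foreground. A minimal sketch of this baseline using PCL's OctreePointCloudChangeDetector is given below; the 4 cm voxel resolution is illustrative, and the additional noise handling of Algorithm 1 is not shown.

    #include <pcl/point_types.h>
    #include <pcl/octree/octree_pointcloud_changedetector.h>

    // Return indices of points in 'current' that occupy octree voxels
    // not present in 'background' (illustrative 4 cm resolution).
    std::vector<int>
    detectForeground(const pcl::PointCloud<pcl::PointXYZ>::ConstPtr& background,
                     const pcl::PointCloud<pcl::PointXYZ>::ConstPtr& current,
                     float resolution = 0.04f)
    {
        pcl::octree::OctreePointCloudChangeDetector<pcl::PointXYZ> octree(resolution);
        octree.setInputCloud(background);  // fill buffer A with the background
        octree.addPointsFromInputCloud();
        octree.switchBuffers();            // keep A, start filling buffer B
        octree.setInputCloud(current);
        octree.addPointsFromInputCloud();

        std::vector<int> foregroundIdx;
        octree.getPointIndicesFromNewVoxels(foregroundIdx);
        return foregroundIdx;
    }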
We were able to achieve a deviation of less than 15% from the ground truth, whereas the other approaches mostly retained a deviation larger than 40%. We also tried to filter out these noisy blobs from the processed clouds using different 3D filters, but in all cases the resulting foreground remained strongly affected by the noise in the scene.
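One representative 3D filter of this kind is statistical outlier removal; the sketch below shows how it is typically applied with PCL, with an assumed neighborhood size of 50 and a threshold of one standard deviation rather than the exact values used in our experiments.

    #include <pcl/point_types.h>
    #include <pcl/filters/statistical_outlier_removal.h>

    // Discard points whose mean distance to their 50 nearest neighbors
    // exceeds the global mean distance by more than one standard deviation.
    pcl::PointCloud<pcl::PointXYZ>::Ptr
    removeNoisyBlobs(const pcl::PointCloud<pcl::PointXYZ>::ConstPtr& cloud)
    {
        pcl::PointCloud<pcl::PointXYZ>::Ptr filtered(new pcl::PointCloud<pcl::PointXYZ>);
        pcl::StatisticalOutlierRemoval<pcl::PointXYZ> sor;
        sor.setInputCloud(cloud);
        sor.setMeanK(50);
        sor.setStddevMulThresh(1.0);
        sor.filter(*filtered);
        return filtered;
    }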
In the preprocessing steps, calibration was mandatory for maximizing the reliability of the produced results. The internal parameters were mainly used for generating the point clouds and for projecting the 3D points onto a binary image, as discussed in Section 2.3. Bundle adjustment was performed with the internal parameters kept fixed during convergence, optimizing only the external parameters of the cameras.
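For reference, the projection of Section 2.3 follows the standard pinhole model: a foreground point (X, Y, Z) in camera coordinates maps to pixel (u, v) = (fx X/Z + cx, fy Y/Z + cy), and setting the corresponding pixel of an empty image to 255 yields the binary mask. A minimal sketch, omitting the lens distortion that is handled by the Brown (1971) model during calibration:

    // Pinhole projection of a 3D point given in camera coordinates;
    // fx, fy, cx, cy are the calibrated internal parameters.
    struct Pixel { int u; int v; };

    Pixel project(double X, double Y, double Z,
                  double fx, double fy, double cx, double cy)
    {
        return Pixel{ static_cast<int>(fx * X / Z + cx),
                      static_cast<int>(fy * Y / Z + cy) };
    }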
Future research involves enhancing the quality of the existing foreground so that it remains invariant to noise in the point cloud. This is an essential step, because the accuracy of the ellipsoid is highly dependent on the accuracy of the foreground. Taking advantage of the multi-camera configuration, the data extracted by each sensor could be fused in order to increase the confidence and quality of the foreground. Moreover, a multi-camera approach could also handle multiple human instances in the scene and tackle the problem of occlusions. In terms of computational performance, this would require one computer per sensor, due to the amount of processing power required to manage all sensors simultaneously.
We have acquired many data sets from the train experiment, containing several scenarios of everyday situations in a wagon. This dataset will be made publicly available in the future, comprising RGB-D data from the different scenarios, calibration parameters for every sensor, and benchmark information.
REFERENCES
Baum, M. and Hanebeck, U. D. (2013). Extended object
tracking with random hypersurface models. CoRR,
abs/1304.5084.
Brown, D. C. (1971). Close-range camera calibration. Pho-
togrammetric Engineering, 37(8):855–866.
Buys, K., Cagniart, C., Baksheev, A., De Laet, T., De Schutter, J., and Pantofaru, C. (2014). An adaptable system for RGB-D based human body detection and pose estimation. J. Vis. Commun. Image Represent., 25(1):39–52.
Faion, F., Baum, M., and Hanebeck, U. D. (2012). Tracking
3D Shapes in Noisy Point Clouds with Random Hy-
persurface Models. In Proceedings of the 15th Inter-
national Conference on Information Fusion (Fusion
2012), Singapore.
Hegger, F., Hochgeschwender, N., Kraetzschmar, G., and Ploeger, P. (2013). People Detection in 3D Point Clouds Using Local Surface Normals, volume 7500 of Lecture Notes in Computer Science, chapter 15, pages 154–165. Springer Berlin Heidelberg.
Kammerl, J., Blodow, N., Rusu, R. B., Gedikli, S., Beetz,
M., and Steinbach, E. (2012). Real-time compression
of point cloud streams. In IEEE International Confer-
ence on Robotics and Automation (ICRA), Minnesota,
USA.
Lepetit, V., Moreno-Noguer, F., and Fua, P. (2009). EPnP: An accurate O(n) solution to the PnP problem. International Journal of Computer Vision, 81(2).
Moshtagh, N. (2005). Minimum volume enclosing ellip-
soid.
Munaro, M., Basso, F., and Menegatti, E. (2012). Tracking people within groups with RGB-D data. In IROS, pages 2101–2107. IEEE.
Shotton, J., Girshick, R., Fitzgibbon, A., Sharp, T., Cook,
M., Finocchio, M., Moore, R., Kohli, P., Criminisi,
A., Kipman, A., and Blake, A. (2013). Efficient hu-
man pose estimation from single depth images. IEEE
Transactions on Pattern Analysis and Machine Intel-
ligence, 35(12):2821–2840.
Sigalas, M., Pateraki, M., Oikonomidis, I., and Trahanias, P. (2013). Robust model-based 3D torso pose estimation in RGB-D sequences. In The IEEE International Conference on Computer Vision (ICCV) Workshops.
Todd, M. J. and Yildirim, E. A. (2007). On Khachiyan's algorithm for the computation of minimum-volume enclosing ellipsoids. Discrete Appl. Math., 155(13):1731–1744.
Ziegler, J., Nickel, K., and Stiefelhagen, R. (2006). Track-
ing of the articulated upper body on multi-view stereo
image sequences. In CVPR (1), pages 774–781. IEEE
Computer Society.