Table 2: Comparison to state-of-the-art methods.

Method          POM(a)   3DMPP(a)   Our method(b)
Precision (%)   87.20    97.5       91.28
Recall (%)      95.56    95.5       95.01

(a) evaluated on the EPFL and PETS sequences (Utasi and Benedek, 2011), 395 frames with 1554 objects
(b) evaluated on the EPFL sequence, 179 frames with 661 objects
Table 3: Average processing time per step (4 views, single-threaded implementation, 2.4 GHz Core 2 Quad CPU).

Step                   EPFL      SZTAKI
foreground detection   51.2 ms   32.5 ms
forming cones          3.43 ms   4.87 ms
matching/detection     907 µs    618 µs
However, foreground detection and cone forming can be performed independently for each view, on multicore platforms or even on smart cameras. Matching and detection requires cone information from all views, but is extremely fast, so real-time processing would still be possible with more views.
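A minimal sketch of this pipeline structure, assuming hypothetical stand-in functions for the per-view stages (the real foreground detector and cone construction are not shown):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the per-view stages; in the real system these
# would run the foreground detector and build the foot-position cones.
def detect_foreground(view):
    return {"view": view, "mask": f"mask-{view}"}

def form_cones(fg):
    return {"view": fg["view"],
            "cones": [f"cone-{fg['view']}-{i}" for i in range(2)]}

def process_view(view):
    # Foreground detection and cone forming depend only on this view,
    # so each view can run on its own core (or on a smart camera).
    return form_cones(detect_foreground(view))

def match_and_detect(per_view_cones):
    # The only stage that needs data from all views; it is cheap
    # (sub-millisecond in Table 3), so it stays sequential.
    all_cones = [c for pv in per_view_cones for c in pv["cones"]]
    return sorted(all_cones)

views = ["cam0", "cam1", "cam2", "cam3"]
with ThreadPoolExecutor(max_workers=len(views)) as pool:
    per_view = list(pool.map(process_view, views))

detections = match_and_detect(per_view)
print(len(detections))  # 8 cones collected from 4 views
```

Only the final matching step synchronizes across views, which is why adding views scales well in this design.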
Many methods, including POM and 3DMPP, project parts of or whole foreground masks onto planes, which is computationally expensive; distributing this computation is not possible due to data dependencies.
7 CONCLUSIONS
We proposed a multi-view detection algorithm that recovers the 3D positions of people from multiple calibrated and synchronized views. Unlike other algorithms, ours can cope with non-planar ground. This is achieved by modeling the possible positions of feet with 3D primitives, cones in scene space, and searching for intersections of these cones.
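The core geometric test can be illustrated as follows. This is a sketch, not the paper's implementation: the cone parameters and the simple point-membership test are assumptions made only to show how an intersection of view cones pins down a foot position.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def point_in_cone(p, apex, axis, half_angle):
    # True if p lies inside the infinite cone with the given apex,
    # unit axis direction, and half-opening angle.
    d = tuple(pi - ai for pi, ai in zip(p, apex))
    dist = math.sqrt(sum(x * x for x in d))
    if dist == 0.0:
        return True
    cos_angle = sum(di * ax for di, ax in zip(d, axis)) / dist
    return cos_angle >= math.cos(half_angle)

# Two cameras looking toward the origin from opposite sides; the
# numbers are made up purely to illustrate the intersection test.
cone_a = ((-5.0, 0.0, 2.0), normalize((5.0, 0.0, -2.0)), math.radians(5))
cone_b = ((5.0, 0.0, 2.0), normalize((-5.0, 0.0, -2.0)), math.radians(5))

# A candidate foot position is kept only if it lies inside cones
# from multiple views.
candidate = (0.0, 0.0, 0.0)
in_both = all(point_in_cone(candidate, *c) for c in (cone_a, cone_b))
print(in_both)  # True: both cones contain the origin
```

In the actual method the cones are cast from image-space foot evidence through the camera calibration; here they are given directly.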
For good precision, the height map of the ground should be known. Our method can compute the height map on the fly, reaching high precision after a startup period.
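One way such on-the-fly estimation can be sketched is a per-cell running average of observed foot heights; the cell size and update rule below are assumptions for illustration, not the paper's exact procedure:

```python
CELL = 0.5  # grid resolution in metres (assumed value)

class HeightMap:
    """Accumulates ground-height observations into a sparse 2D grid."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def update(self, x, y, z):
        # Each confirmed foot detection (x, y, z) updates the running
        # average height of its grid cell.
        cell = (int(x // CELL), int(y // CELL))
        self.sums[cell] = self.sums.get(cell, 0.0) + z
        self.counts[cell] = self.counts.get(cell, 0) + 1

    def height(self, x, y):
        cell = (int(x // CELL), int(y // CELL))
        if cell not in self.counts:
            return None  # no observation yet in this cell
        return self.sums[cell] / self.counts[cell]

hm = HeightMap()
# Feet observed on a slope: height grows with x (made-up measurements).
for x, y, z in [(0.1, 0.1, 0.00), (0.2, 0.3, 0.02), (1.1, 0.2, 0.55)]:
    hm.update(x, y, z)

print(hm.height(0.15, 0.2))  # 0.01, mean of the two samples in cell (0, 0)
print(hm.height(1.0, 0.0))   # 0.55
```

The estimate sharpens as more detections arrive, which matches the startup behavior described above.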
After height map estimation we measured precision and recall values comparable to state-of-the-art methods on a commonly used data set. Our algorithm also performed well on test videos we recorded to demonstrate its capability of handling non-planar ground.
In the future we plan to examine tracking people by their leaning leg positions (Havasi et al., 2007).
REFERENCES
Benedek, C. and Szirányi, T. (2008). Bayesian foreground and shadow detection in uncertain frame rate surveillance videos. IEEE Transactions on Image Processing, 17(4):608–621.

Berclaz, J., Fleuret, F., and Fua, P. (2006). Robust people tracking with global trajectory optimization. In IEEE CVPR, pages 744–750.

EPFL (2011). Multi-camera pedestrian videos. http://cvlab.epfl.ch/data/pom/.

Eshel, R. and Moses, Y. (2010). Tracking in a dense crowd using multiple cameras. International Journal of Computer Vision, 88:129–143.

Fleuret, F., Berclaz, J., Lengagne, R., and Fua, P. (2008). Multicamera people tracking with a probabilistic occupancy map. IEEE Trans. Pattern Anal. Mach. Intell., 30(2):267–282.

Havasi, L. and Szlávik, Z. (2011). A method for object localization in a multiview multimodal camera system. In CVPRW, pages 96–103.

Havasi, L., Szlávik, Z., and Szirányi, T. (2007). Detection of gait characteristics for scene registration in video surveillance system. IEEE Transactions on Image Processing, 16(2):503–510.

Iwase, S. and Saito, H. (2004). Parallel tracking of all soccer players by integrating detected positions in multiple view images. In IEEE ICPR, pages 751–754.

Jeong, K. and Jaynes, C. (2008). Object matching in disjoint cameras using a color transfer approach. Machine Vision and Applications, 19:443–455.

Khan, S. and Shah, M. (2006). A multiview approach to tracking people in crowded scenes using a planar homography constraint. In ECCV 2006, Lecture Notes in Computer Science, pages 133–146.

Khan, S. and Shah, M. (2009). Tracking multiple occluding people by localizing on multiple scene planes. IEEE Trans. Pattern Anal. Mach. Intell., 31(3):505–519.

Kim, K. and Davis, L. S. (2006). Multi-camera tracking and segmentation of occluded people on ground plane using search-guided particle filtering. In ECCV, pages 98–109.

Mittal, A. and Davis, L. (2001). Unified multi-camera detection and tracking using region-matching. In IEEE Workshop on Multi-Object Tracking, pages 3–10.

Mittal, A. and Davis, L. (2002). M2Tracker: A multi-view approach to segmenting and tracking people in a cluttered scene using region-based stereo. In ECCV 2002, pages 18–33.

Utasi, Á. and Benedek, C. (2011). A 3-D marked point process model for multi-view people detection. In IEEE CVPR, pages 3385–3392.
VISAPP 2013 - International Conference on Computer Vision Theory and Applications