Figure 9: Examples of detections (cyan), the detection matching the GPS position of the car (yellow), and the corresponding foreground/background segmentation (right). The green border indicates the area in which the search takes place.
[Figure 10 plot: x, distance error [m] (horizontal axis, 0 to 2) versus f(x), estimated pdf [%] (vertical axis, 0 to 6)]
Figure 10: Parzen estimate (h = 0.02) of the density of the Euclidean distances between the centers of the detected and GPS rectangles.
rectangles (detected and GPS), in the cases where the car was considered detected, was around 0.5 m. A Parzen window estimate (Parzen, 1962) of the density of this distance is shown in Fig. 10.
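The error measure behind Fig. 10 has two steps: the Euclidean distance between the centers of the detected and GPS rectangles, and a Parzen window estimate of the density of those distances with width h = 0.02. The sketch below illustrates both under stated assumptions: rectangles are given by their four ground-plane corners in metres, and the kernel is Gaussian (the text does not name the kernel). The error values are hypothetical and only show the call; they are not the paper's data.

```python
import math

def center(corners):
    """Center of a rectangle given its four (x, y) ground-plane corners [m]."""
    xs = [c[0] for c in corners]
    ys = [c[1] for c in corners]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def center_distance(rect_a, rect_b):
    """Euclidean distance between the centers of two rectangles."""
    (ax, ay), (bx, by) = center(rect_a), center(rect_b)
    return math.hypot(ax - bx, ay - by)

def parzen_pdf(samples, x, h=0.02):
    """Parzen window density estimate at x; Gaussian kernel of width h (assumed)."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

# Hypothetical distance errors [m], for illustration only.
errors = [0.42, 0.47, 0.51, 0.55, 0.60]
mean_error = sum(errors) / len(errors)
grid = [i * 0.01 for i in range(201)]           # 0 to 2 m, the x-range of Fig. 10
density = [parzen_pdf(errors, x) for x in grid]
```

With a small h such as 0.02 the estimate stays close to the individual samples; a larger h would give a smoother curve at the cost of detail.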
4 CONCLUSIONS
A system searching for a specific 3D shape, in this case a car, has been presented. The proposed methodology utilizes camera calibration, a defined search space in the ground plane, and foreground/background segmentation. Given these, the 3D object, together with additional context, is used to compute a detection score. Non-maximum suppression on rotated rectangles in the ground plane is then applied to yield the final detections. The system has been applied to real data with mixed traffic, in which ground truth for one car could be extracted using GPS. Experiments on this data indicate that the car was detected 91.4% of the time it was visible and inside the search area. Furthermore, detections matching the ground truth have an average error of 0.5 m.
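Non-maximum suppression on rotated rectangles needs an overlap measure for arbitrarily oriented footprints. The following is a minimal sketch, not the authors' implementation: it assumes each candidate is a (score, rectangle) pair with the rectangle parametrized as (cx, cy, length, width, theta), measures overlap as intersection over union computed by Sutherland-Hodgman polygon clipping, and uses greedy suppression with a hypothetical threshold of 0.3.

```python
import math

def corners(cx, cy, length, width, theta):
    """Counter-clockwise corners of a ground-plane rectangle rotated by theta [rad]."""
    c, s = math.cos(theta), math.sin(theta)
    hl, hw = length / 2.0, width / 2.0
    local = [(-hl, -hw), (hl, -hw), (hl, hw), (-hl, hw)]
    return [(cx + c * x - s * y, cy + s * x + c * y) for x, y in local]

def polygon_area(poly):
    """Shoelace formula; vertices must be ordered."""
    n = len(poly)
    twice = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                for i in range(n))
    return 0.5 * abs(twice)

def clip_convex(subject, clipper):
    """Sutherland-Hodgman clipping of `subject` by the convex CCW polygon `clipper`."""
    def inside(p, a, b):
        # p lies on or to the left of the directed edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersection(p, q, a, b):
        # Point where segment p -> q crosses the line through a and b
        (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        if abs(den) < 1e-12:  # numerically parallel: q already lies on the line
            return q
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        vertices, output = output, []
        for j in range(len(vertices)):
            prev, cur = vertices[j - 1], vertices[j]
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersection(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersection(prev, cur, a, b))
        if not output:
            break
    return output

def rotated_iou(r1, r2):
    """Intersection over union of two rectangles given as (cx, cy, length, width, theta)."""
    p1, p2 = corners(*r1), corners(*r2)
    inter = clip_convex(p1, p2)
    inter_area = polygon_area(inter) if len(inter) >= 3 else 0.0
    union = polygon_area(p1) + polygon_area(p2) - inter_area
    return inter_area / union if union > 0 else 0.0

def rotated_nms(detections, iou_threshold=0.3):
    """Greedy NMS over (score, rectangle) pairs; kept pairs come out by descending score."""
    kept = []
    for score, rect in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(rotated_iou(rect, other) <= iou_threshold for _, other in kept):
            kept.append((score, rect))
    return kept
```

Greedy suppression keeps the best-scoring hypothesis and discards any lower-scoring rectangle that overlaps it too much; the threshold trades missed nearby cars against duplicate detections of the same car.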
5 DISCUSSION AND FUTURE WORK
While the results are promising, the proposed framework can be improved to handle more complex scenes and to increase accuracy; some directions are discussed here. First, only one model, a sedan car, has been used so far; this should be extended with more relevant 3D shapes (vans, trucks, pedestrians, bicyclists, etc.). A straightforward way to do this is to run the system up to the Non-Maximum Suppression (NMS) step for each of several 3D shapes and then perform NMS jointly over all objects.
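Running the pipeline per shape and then suppressing jointly amounts to pooling the pre-NMS candidates of all models before a single NMS pass. The sketch below is hypothetical throughout: the `Candidate` record, the shape names, and the 0.3 threshold are illustrative, and to keep it short the overlap is approximated by the IoU of axis-aligned footprints rather than the rotated rectangles the system actually uses.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    shape: str      # which 3D model produced the candidate (hypothetical labels)
    score: float    # pre-NMS detection score from that model's pipeline
    x: float        # footprint center in the ground plane [m]
    y: float
    length: float   # footprint extent along x [m] (axis-aligned simplification)
    width: float    # footprint extent along y [m]

def footprint_iou(a, b):
    """IoU of two axis-aligned ground-plane footprints."""
    ix = max(0.0, min(a.x + a.length / 2, b.x + b.length / 2)
             - max(a.x - a.length / 2, b.x - b.length / 2))
    iy = max(0.0, min(a.y + a.width / 2, b.y + b.width / 2)
             - max(a.y - a.width / 2, b.y - b.width / 2))
    inter = ix * iy
    union = a.length * a.width + b.length * b.width - inter
    return inter / union if union > 0 else 0.0

def joint_nms(per_shape, iou_threshold=0.3):
    """Pool candidates from all shape models and suppress overlaps across shapes."""
    pooled = [c for cands in per_shape.values() for c in cands]
    pooled.sort(key=lambda c: c.score, reverse=True)
    kept = []
    for c in pooled:
        if all(footprint_iou(c, k) <= iou_threshold for k in kept):
            kept.append(c)
    return kept

candidates = {
    "sedan": [Candidate("sedan", 0.8, 0.0, 0.0, 4.5, 1.8)],
    "van": [Candidate("van", 0.6, 0.1, 0.0, 5.0, 2.0)],
    "pedestrian": [Candidate("pedestrian", 0.4, 5.0, 5.0, 0.5, 0.5)],
}
final = joint_nms(candidates)
```

Joint suppression only makes sense if scores from different models are comparable; if they are not, some per-model calibration would be needed before pooling.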
Another extension is to add more cameras to better handle occlusions. Different approaches could be adopted here. One is to run the whole system up to NMS for every view. Score fusion could then be applied before NMS, possibly with some weighting, so that the resulting scores take the scores from all views into account.
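One simple form of weighted score fusion before NMS is a weighted average, per ground-plane hypothesis, over the views in which that hypothesis is visible. This is a sketch under assumptions: every view scores the same ordered list of hypotheses, `None` marks a hypothesis outside a view, and the per-view weights (which might, for example, reflect viewing distance or expected occlusion) are hypothetical.

```python
def fuse_scores(view_scores, weights):
    """Weighted average of per-view scores for each ground-plane hypothesis.

    view_scores: dict view -> list of scores (None where the hypothesis is not visible)
    weights:     dict view -> non-negative weight for that view
    Returns one fused score per hypothesis, or None if no view sees it.
    """
    n = len(next(iter(view_scores.values())))
    fused = []
    for i in range(n):
        num = den = 0.0
        for view, scores in view_scores.items():
            s = scores[i]
            if s is not None:
                num += weights[view] * s
                den += weights[view]
        fused.append(num / den if den > 0 else None)
    return fused

# Hypothetical two-camera example with three ground-plane hypotheses.
fused = fuse_scores({"cam1": [0.9, 0.2, None], "cam2": [0.5, None, None]},
                    {"cam1": 1.0, "cam2": 3.0})
```

The fused list can then be passed to the same NMS step that a single view would use.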
The system proposed here does not perform any temporal processing. One possibility is to extend it with a subsequent tracking stage, enabling temporal assignment and smoothing. Given the track of an object, yet another extension could be to refine a detected 3D model further by optimizing its position, angle, and 3D shape, for example by allowing the 3D points that define the shape to move subject to some constraints.
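The text leaves the tracker open. As one possible choice for temporal smoothing of an already associated track, the sketch below uses an alpha-beta filter, a fixed-gain simplification of a constant-velocity Kalman filter, on the ground-plane detection centers. The gains `alpha` and `beta`, the frame interval `dt`, and the assumption that data association has already been done are all illustrative.

```python
def alpha_beta_track(measurements, dt=1.0, alpha=0.5, beta=0.1):
    """Smooth a sequence of (x, y) ground-plane detections with an alpha-beta filter."""
    x, y = measurements[0]
    vx = vy = 0.0                      # start with zero velocity
    smoothed = [(x, y)]
    for mx, my in measurements[1:]:
        # Predict with a constant-velocity model
        px, py = x + vx * dt, y + vy * dt
        # Residual between detection and prediction
        rx, ry = mx - px, my - py
        # Correct position and velocity with fixed gains
        x, y = px + alpha * rx, py + alpha * ry
        vx, vy = vx + (beta / dt) * rx, vy + (beta / dt) * ry
        smoothed.append((x, y))
    return smoothed

# Hypothetical track: a car moving 1 m per frame along x.
track = alpha_beta_track([(float(t), 0.0) for t in range(50)])
```

A smaller `alpha` smooths more but reacts more slowly to maneuvers; a full Kalman filter would instead adapt these gains from assumed process and measurement noise.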
REFERENCES
Ardö, H. and Åström, K. (2009). Bayesian formulation of image patch matching using cross-correlation. In Third ACM/IEEE International Conference on Distributed Smart Cameras, pages 1–8.
Ardö, H. and Svärd, L. (2014). Bayesian formulation of gradient orientation matching. Submitted to CVPR 2014.
Carr, P., Sheikh, Y., and Matthews, I. (2012). Monocular object detection using 3d geometric primitives. In Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., and Schmid, C., editors, Computer Vision – ECCV 2012, volume 7572 of Lecture Notes in Computer Science, pages 864–878. Springer Berlin Heidelberg.
Dollár, P., Appel, R., and Kienzle, W. (2012). Crosstalk cascades for frame-rate pedestrian detection. In Proceedings of the 12th European conference on Computer Vision - Volume Part II, ECCV'12, pages 645–659, Berlin, Heidelberg. Springer-Verlag.
Felzenszwalb, P., Girshick, R., McAllester, D., and Ramanan, D. (2010). Object detection with discriminatively trained part-based models. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(9):1627–1645.
Ferryman, J., Worrall, A., Sullivan, G., and Baker, K. (1997). Visual surveillance using deformable models of vehicles. Robotics and Autonomous Systems, 19(3–4):315–335.
Khan, S. M. and Shah, M. (2006). A multiview approach
to tracking people in crowded scenes using a planar
In Search of a Car - Utilizing a 3D Model with Context for Object Detection
423