image element descriptors and V = 400). A proposed
real-time system using BoW with GP-DC image elements
and V = 16 achieved a performance of 92.7%.
For this system, several compromises were made to
minimise time and memory consumption. GP-DC was chosen
as descriptor for reasons of speed, but since the
problem at hand is of limited complexity, GP-DC
nevertheless provided excellent performance,
comparable to that of the most complex methods
evaluated. The small vocabulary size was chosen to
comply with memory demands, but investigations
showed that the performance converged towards
the maximum for quite small vocabulary sizes
(Figure 4), due to information saturation in the
vocabularies. Thus, a very small vocabulary size did
not cause serious performance degradation. The quality of
the vocabulary, in terms of its ability to separate the
two classes, increased notably when floating search
was used to select visual words instead of the
commonly used k-means clustering. A study of the
misclassified images showed that many of them (about
30%) were caused by temporally limited effects such as
passing cars, turns close to buildings, and trees
planted in the city. Thus, temporal filtering of the
classification results would increase the general
performance substantially. This is, however, left as an
issue for further research. Based on this investigation,
we conclude that a road scene classification system
that operates at night and at real-time speed can be
constructed to give satisfactory classification
performance.
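
The temporal filtering mentioned above is left for further research and was not part of the evaluated system. Purely as an illustration of what it could look like, the following minimal sketch applies a causal sliding-window majority vote to a sequence of per-frame class labels. The function name `majority_filter`, the window length, and the label values 0 and 1 (placeholders for the two classes) are assumptions made for this example.

```python
from collections import Counter, deque
from typing import Iterable, List


def majority_filter(labels: Iterable[int], window: int = 5) -> List[int]:
    """Causal sliding-window majority vote over per-frame class labels.

    Each output label is the most frequent label among the current frame
    and up to (window - 1) preceding frames. If several labels are tied,
    the one seen most recently in the window is chosen.
    """
    if window < 1:
        raise ValueError("window must be at least 1")

    recent = deque(maxlen=window)  # holds the last `window` raw labels
    filtered = []
    for label in labels:
        recent.append(label)
        counts = Counter(recent)
        best = max(counts.values())
        # Among labels tied for the highest count, prefer the most recent.
        winner = next(l for l in reversed(recent) if counts[l] == best)
        filtered.append(winner)
    return filtered


if __name__ == "__main__":
    # Hypothetical per-frame decisions containing short-lived flips.
    raw = [0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
    print(majority_filter(raw, window=3))
```

Under this sketch, an isolated deviating frame (for instance one caused by a passing car) is outvoted by its neighbours, at the cost of a delay of roughly half the window length when the scene class really changes. A suitable window length would have to be determined experimentally.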
VISAPP 2010 - International Conference on Computer Vision Theory and Applications