object to the object notion in the brain. The other is
the linkage from the object notion to its potential
usage.
With stronger biological support, Lam (Lam, 1992)
explains visual perception as an active
information-seeking process directed and interpreted
by the brain. This process can be described as four
transitions, T1, T2, T3 and T4, linking five
concepts: Object, Light Pattern, Electric Signals,
Object Notion, and Cue (Figure 1).
Figure 1: Visual perception process.
Through this process, some objects in the built
environment are selectively perceived by human
beings as cues for a specific motivation. In the
following description of the human visual
perception simulation, the object, transition T1,
the light pattern, transition T2 and the electric
signals are collapsed into a single image input on a
pixel grid. The simulation therefore focuses on
transition T3, namely the step from pixel-grid images
to object notions.
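To make the chain of concepts explicit, the sketch below encodes the five concepts and four transitions as a small data structure and shows which transition remains once the first three concepts are collapsed into the image input. It assumes, based on the order in which the text lists them, that T1 through T4 link consecutive concepts; the names `Concept`, `TRANSITIONS` and `transition_from_image_input` are hypothetical and only illustrate the description above.

```python
from enum import Enum


class Concept(Enum):
    """The five concepts of the visual perception process (Lam, 1992)."""
    OBJECT = "Object"
    LIGHT_PATTERN = "Light Pattern"
    ELECTRIC_SIGNALS = "Electric Signals"
    OBJECT_NOTION = "Object Notion"
    CUE = "Cue"


# Assumption: each transition links two consecutive concepts, in listed order.
TRANSITIONS = {
    "T1": (Concept.OBJECT, Concept.LIGHT_PATTERN),
    "T2": (Concept.LIGHT_PATTERN, Concept.ELECTRIC_SIGNALS),
    "T3": (Concept.ELECTRIC_SIGNALS, Concept.OBJECT_NOTION),
    "T4": (Concept.OBJECT_NOTION, Concept.CUE),
}

# In the simulation, these concepts (and T1, T2 between them) are
# compacted into one pixel-grid image input.
COMPACTED_INTO_IMAGE = {
    Concept.OBJECT,
    Concept.LIGHT_PATTERN,
    Concept.ELECTRIC_SIGNALS,
}


def transition_from_image_input():
    """Return the transition that leaves the compacted image input."""
    for name, (src, dst) in TRANSITIONS.items():
        if src in COMPACTED_INTO_IMAGE and dst not in COMPACTED_INTO_IMAGE:
            return name
    return None
```

Calling `transition_from_image_input()` yields `"T3"`, the image-to-object-notion step on which the simulation concentrates.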
3 STANDARD FEATURE MODEL
In recent years, some researchers have revisited
the object recognition problem from the perspective
of biology and obtained encouraging results. Among
them, Serre developed a hierarchical system for
recognizing complex visual scenes, called SFM
(Standard Feature Model of visual cortex) (Serre et
al., 2004, 2007). The system is motivated by a
number of models of the visual cortex. Earlier
object recognition models aimed at improving the
efficiency of the algorithms or optimizing the
representation of the object or the object category.
Little attention was paid to biologically plausible
features of higher complexity, let alone to applying
neurobiological models of object recognition to
real-world images.
The SFM model follows a theory of the feedforward
path of object recognition in the cortex, which
accounts for the first 100-200 milliseconds of
processing in the ventral stream of the primate visual
cortex (Riesenhuber et al., 1999; Serre et al., 2005).
The SFM model summarizes the points on which most
visual neuroscientists agree. First, the initial phase
of visual information processing in the primate
cortex is feedforward. Second, visual processing as
a whole is hierarchical. Third, along this hierarchy
both the receptive field size of the neurons and the
complexity of their optimal stimuli increase. Finally,
modification and learning of object categories can
occur at all stages.
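The third point, that receptive fields grow along the hierarchy, follows directly from stacking local operations. The sketch below uses the standard receptive-field recurrence for stacked layers, RF_l = RF_(l-1) + (k_l - 1) * J_(l-1) and J_l = J_(l-1) * s_l, where k is the kernel size and s the stride. The layer sizes are hypothetical illustration values, not the SFM model's actual parameters, and the top layer is treated as local pooling here although SFM's C2 layer pools over the whole image.

```python
def receptive_fields(layers):
    """Return (name, receptive field in input pixels) after each layer.

    layers: list of (name, kernel_size, stride) tuples.
    """
    rf = 1    # a single input pixel "sees" only itself
    jump = 1  # distance, in input pixels, between adjacent units
    sizes = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump  # kernel spans (kernel - 1) extra steps
        jump *= stride             # subsampling spreads units further apart
        sizes.append((name, rf))
    return sizes


# Hypothetical kernel sizes and strides, for illustration only.
layers = [("S1", 7, 1), ("C1", 8, 4), ("S2", 3, 1), ("C2", 4, 2)]
print(receptive_fields(layers))
```

With these values the receptive field grows from 7 to 14, 22 and finally 34 input pixels, so each layer integrates information over a larger region than the one below it.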
The SFM model consists of four layers, each
containing one kind of computational unit. There
are two kinds of computational units, namely S
(simple) units and C (complex) units. The S units
combine their input stimuli with Gaussian-like tuning
to increase object selectivity, while the C units
introduce invariance to scale and translation. We
refer to the four layers as S1, C1, S2 and C2. A
brief description of the function, input and output
of each of the four layers is given in Table 1.
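The division of labour between the two unit types can be illustrated on a one-dimensional signal. In this minimal sketch, assuming the MAX pooling used in the HMAX family of models for the C unit, an S unit responds with a Gaussian function of the distance between its input patch and a stored prototype, and a C unit takes the maximum over S units at all positions. The functions `s_unit`, `c_unit` and `scan` are hypothetical; SFM itself uses Gabor filters in S1 and operates on 2-D images at multiple scales.

```python
import math


def s_unit(inputs, prototype, sigma=1.0):
    """Simple (S) unit: Gaussian-like tuning around a preferred pattern.

    Returns 1.0 for an exact match and decays with the squared Euclidean
    distance, so the unit is selective for its prototype.
    """
    dist2 = sum((x - p) ** 2 for x, p in zip(inputs, prototype))
    return math.exp(-dist2 / (2 * sigma ** 2))


def c_unit(s_responses):
    """Complex (C) unit: MAX over its afferent S units."""
    return max(s_responses)


def scan(signal, prototype, sigma=1.0):
    """Apply the same S unit at every position of a 1-D signal."""
    k = len(prototype)
    return [s_unit(signal[i:i + k], prototype, sigma)
            for i in range(len(signal) - k + 1)]


prototype = [0, 1, 0]        # hypothetical preferred pattern
left = [0, 1, 0, 0, 0, 0]    # pattern near the start
right = [0, 0, 0, 0, 1, 0]   # same pattern, shifted to the end

# The S responses peak at different positions, but the C unit output is
# identical: the pooled response is invariant to the translation.
print(c_unit(scan(left, prototype)), c_unit(scan(right, prototype)))
```

Both calls print `1.0`, while the positions of the peak S responses differ. The S stage keeps selectivity and the C stage discards position, which is the trade-off the two unit types are meant to balance.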
This biologically motivated object recognition
system has been shown to learn from few examples
and to perform well. Moreover, this generic approach
can be used for scene understanding. Finally, the
features generated by the model can be combined
with standard computer vision techniques and can
also serve as a supporting tool to improve the
performance of those techniques.
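The following sketch shows how C2-style features can feed a standard classifier. Under the assumption that each C2 value is the best Gaussian match of one stored prototype anywhere in the input, a fixed-length feature vector results regardless of where the pattern occurs. A simple nearest-centroid classifier stands in for whatever standard classifier is used in practice; the prototypes, labels and helper names are all hypothetical, and the inputs are 1-D signals rather than images.

```python
import math


def c2_features(signal, prototypes, sigma=1.0):
    """One C2-style value per prototype: its best Gaussian match anywhere."""
    feats = []
    for proto in prototypes:
        k = len(proto)
        best = 0.0
        for i in range(len(signal) - k + 1):
            d2 = sum((x - p) ** 2 for x, p in zip(signal[i:i + k], proto))
            best = max(best, math.exp(-d2 / (2 * sigma ** 2)))
        feats.append(best)
    return feats


def train_centroids(examples):
    """examples: label -> list of feature vectors; returns label -> mean."""
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in examples.items()}


def classify(features, centroids):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids,
               key=lambda lbl: sum((f - c) ** 2
                                   for f, c in zip(features, centroids[lbl])))


prototypes = [[0, 1, 0], [1, 1, 1]]  # hypothetical stored patterns
training = {
    "peak": [c2_features([0, 1, 0, 0, 0, 0], prototypes)],
    "plateau": [c2_features([0, 1, 1, 1, 0, 0], prototypes)],
}
centroids = train_centroids(training)

# One training example per class; test signals are shifted versions.
print(classify(c2_features([0, 0, 0, 0, 1, 0], prototypes), centroids))
print(classify(c2_features([0, 0, 1, 1, 1, 0], prototypes), centroids))
```

The two test signals are classified as `peak` and `plateau` respectively, even though each class was trained on a single example at a different position. Any classifier that accepts fixed-length vectors could replace the nearest-centroid rule.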
4 EXPERIMENTS AND FINDINGS
Cues in the built environment can be divided into
three groups: non-fixed cues, semi-fixed cues and
fixed cues. Non-fixed cues are information perceived
from dynamic objects. Objects such as maps, signage
and various decorations are semi-fixed cues, while
architectural cues are fixed cues.
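The three-way grouping of cues can be expressed as a small lookup, which makes the classification rule explicit. This is a sketch under stated assumptions: an object is first checked for being dynamic, then architectural, and only the semi-fixed examples named in the text (maps, signage, decorations) are recognized by type. The names `CueGroup` and `cue_group`, and the sample objects "column" and "pedestrian", are hypothetical.

```python
from enum import Enum


class CueGroup(Enum):
    NON_FIXED = "non-fixed"    # information perceived from dynamic objects
    SEMI_FIXED = "semi-fixed"  # e.g. maps, signage, decorations
    FIXED = "fixed"            # architectural cues


# Semi-fixed examples named in the text; labels are illustrative.
SEMI_FIXED_OBJECTS = {"map", "signage", "decoration"}


def cue_group(obj_type, is_dynamic=False, is_architectural=False):
    """Assign an object to one of the three cue groups."""
    if is_dynamic:
        return CueGroup.NON_FIXED
    if is_architectural:
        return CueGroup.FIXED
    if obj_type in SEMI_FIXED_OBJECTS:
        return CueGroup.SEMI_FIXED
    raise ValueError(f"no cue group known for {obj_type!r}")


print(cue_group("signage"))                        # CueGroup.SEMI_FIXED
print(cue_group("column", is_architectural=True))  # CueGroup.FIXED
```

Raising an error for unknown objects, rather than defaulting to one group, keeps the sketch from silently misclassifying objects the text does not cover.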
We conducted two experiments applying the
SFM model to training sets for architectural cue
recognition. Sets of scene images containing
architectural cues were used as the training examples.
VISAPP 2009 - International Conference on Computer Vision Theory and Applications