how neural properties (from photoreceptors to visual
cortical neurons) are adapted to the statistics of the
visual environment. Additionally, artificial models of
biological image processing have been developed and
used to verify the influences of ecological niches on
the characteristics of neural receptive fields (Balboa
and Grzywacz, 2003; Field, 1987; Doi et al., 2012).
While the majority of the research in this scientific
field has focused on evaluating the spatial
frequency content of images, a full consideration of image
statistics must certainly include time (Hateren, 1993;
Dong and Atick, 1995). Images falling on the retina
have important temporal structures arising from self-
motion of the observer, as well as from the motion of
objects in the world. Despite the complexity of daily
image sequences captured by the biological systems,
natural vision systems appear to work well in complex
3D scenes. Many fast-moving animals, whether simple
ones such as flies and bees or more complex biological
systems such as birds, seem to have little trouble navigating
through their environments. In fact, vision is concerned with the
perception of objects in a dynamical world, one that
appears to be constantly changing when viewed over
extended periods of time.
Looking at these findings from an engineering
point of view, this paper illustrates how simple
statistics of simulated and real images vary as a function
of the interaction between the world and the observer.
We propose a methodology that highlights the changes
in these statistical properties according to distance or
scene complexity, and that could be easily implemented
on a simple robotic platform. Results show how simple
image statistics can be used to predict the presence or
absence of objects in the scene before exploring the
image/environment.
The paper is organized as follows. Section 2 gives a
brief literature review that points out important
previous work in this scientific field. Section 3
describes in detail the mathematical formulation of the
proposed methodology, as well as the image sequences
developed and used to test it. Section 4 presents
important experimental results. Finally, Section 5
presents the conclusions of the work.
2 RELATED WORK
The statistical properties of static images have been
studied for many years (Burton and Moorhead, 1987),
seeking to describe the spatial regularities and
correlations of such images. During those years, however,
the regularities in time-varying images were studied
only in a very limited way, mainly due to the high cost,
at the time, of the technology needed to capture and
analyse motion pictures on computers.
Later, in 1992, van Hateren (Hateren, 1992)
performed the first study aiming to characterize,
indirectly, the spatio-temporal structure of visual
stimuli. This structure was determined from the spatial
power spectrum of natural images, combined with the
distribution of velocities perceived by the visual
system when moving in the environment. From this,
van Hateren was able to infer the joint spatio-temporal
spectrum for the situations tested and, subsequently,
the optimal neural filter for maximizing the
information rate in the photoreceptive channels of the
eye. This analysis enabled van Hateren to verify the
high correlation between the temporal response
properties of biological visual neurons and the optimal
neural filter derived from this study.
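As an illustrative sketch only, and not van Hateren's own formulation, the way a spatial power spectrum and a velocity distribution combine can be written down for a single spatial dimension. The symbols S, P, R, f_s, f_t and v are notation assumed here for illustration, and the scene is assumed to translate rigidly with a velocity drawn from P.

```latex
% Assumed notation (not taken from the cited work):
%   S(f_s)      spatial power spectrum of the static scene
%   P(v)        probability density of the image velocity v
%   R(f_s,f_t)  joint spatio-temporal power spectrum
% A pattern translating rigidly at velocity v has all of its
% power on the line f_t = -v f_s, which gives the delta function.
\begin{equation}
R(f_s, f_t) = S(f_s) \int P(v)\, \delta(f_t + v f_s)\, dv
            = \frac{S(f_s)}{|f_s|}\, P\!\left(-\frac{f_t}{f_s}\right),
\qquad f_s \neq 0 .
\end{equation}
```

Under these assumptions, the temporal spectrum seen at each spatial frequency is a rescaled copy of the velocity distribution, so faster image motion pushes power towards higher temporal frequencies.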
In 1995, Dong and Atick (Dong and Atick, 1995)
measured the spatio-temporal correlations for a group
of motion picture segments by computing the
three-dimensional Fourier transform of these movie
segments and then averaging together their power
spectra. Dong and Atick (Dong and Atick, 1995) showed
that the slope of the spatial power spectrum becomes
flatter at higher temporal frequencies. Likewise, the
slope of the temporal power spectrum becomes flatter at
higher spatial frequencies. These results showed that
the dependence between spatial and temporal frequencies
is, in general, non-separable. A theoretical derivation of this
scaling behaviour was proposed, demonstrating
that it emerges from objects with a static power
spectrum appearing at a variety of depths and moving
at different velocities relative to the observer.
Additionally, and similarly to the methodology
implemented by van Hateren (Hateren, 1992), Dong and
Atick computed the optimal temporal filter to remove
time correlations. The proposed filter was shown
to closely match the frequency response function of
lateral geniculate neurons.
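The measurement described above (a three-dimensional Fourier transform of each movie segment, followed by averaging of the power spectra) can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `mean_spatiotemporal_power`, the mean subtraction, the normalisation by the number of samples, and the random test segments are all assumptions made here.

```python
import numpy as np


def mean_spatiotemporal_power(segments):
    """Average the 3D power spectra of equally sized movie segments.

    Each segment is an array of shape (frames, height, width).
    The name and the normalisation are assumptions for illustration.
    """
    total = None
    for seg in segments:
        seg = np.asarray(seg, dtype=float)
        seg = seg - seg.mean()            # remove the mean (DC) level
        spectrum = np.fft.fftn(seg)       # 3D FFT over (t, y, x)
        power = np.abs(spectrum) ** 2 / seg.size
        total = power if total is None else total + power
    # Shift so that zero frequency sits at the centre of every axis.
    return np.fft.fftshift(total / len(segments))


# Hypothetical data: four white-noise "movie segments" of 16 frames each.
rng = np.random.default_rng(0)
segments = [rng.standard_normal((16, 32, 32)) for _ in range(4)]
mean_power = mean_spatiotemporal_power(segments)
```

With real footage in place of the noise segments, slices of `mean_power` at fixed temporal frequency can be compared to see whether the spatial slope changes with temporal frequency, which is the non-separability reported in the text.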
More recently, Rivait and Langer (Rivait and
Langer, 2007) examined the spatiotemporal power
spectra of image sequences depicting dense motion
parallax, namely the parallax seen by an observer
moving laterally in a cluttered environment. A
parameterized set of computer-generated image
sequences was used and the structure of their
spatio-temporal spectra was analysed in detail. This work
specifically addressed lateral translation. However,
the proposed analysis could be generalized to
more complex types of motion, including components