Figure 4: Quantitative results obtained by the proposed S-TAPPMOG method: f.neg., f.pos., t.e., and T.Err denote false negatives, false positives (per-pixel FG detections), total errors on the specific sequence, and total errors summed over all the analyzed sequences, respectively. Our method outperforms the most effective general-purpose BG subtraction schemes (Wallflower, Bayesian decision, TAPPMOG), and is comparable with methods that are more time demanding and strongly constrained by data-driven initial hypotheses (SACON and Tracey Lab LP).
paper. There, the RGB-normalized signal covariance matrix was modeled as a diagonal matrix, which is not correct, as noted in (Mittal and Paragios, 2004).
5.2 “Traffic” Dataset
This dataset consists of outdoor traffic sequences. We focus on two of them, the “Snow” and the “Fog” sequences, which are characterized by very harsh weather conditions (see Fig. 5, first row).
As a comparison against our method, we apply the TAPPMOG algorithm with the following parameter set: α = 0.005, w_init = 0.01, σ_init = 7.5. With the same parameter setting, we apply the S-TAPPMOG algorithm with γ_max = 20 and ρ_max = 7. To speed up the processing, we down-sample both sequences to 160 × 120 pixel frames, obtaining 8 frames per second with the TAPPMOG method and 6 frames per second with the S-TAPPMOG algorithm, using non-optimized MATLAB code.
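For concreteness, the following is a minimal per-pixel mixture-of-Gaussians update sketch in the spirit of TAPPMOG, written in Python and using the parameter values reported above; the number of components K, the 2.5σ matching threshold, and the replacement rule are common defaults assumed here, not values taken from this paper.

    import numpy as np

    # Minimal per-pixel mixture-of-Gaussians update in the spirit of TAPPMOG,
    # using the parameters reported above (alpha, w_init, sigma_init). K, the
    # 2.5-sigma match threshold and the replacement rule are assumed defaults.
    ALPHA  = 0.005   # learning rate alpha
    W_INIT = 0.01    # initial weight of a newly created component
    S_INIT = 7.5     # initial standard deviation sigma_init
    K      = 3       # Gaussians per pixel (assumption)
    MATCH  = 2.5     # match threshold, in units of sigma (assumption)

    def update_pixel(x, mu, sigma, w):
        """Update the K Gaussians (mu, sigma, w) of one grayscale pixel with
        the new value x. Returns the updated parameters and a crude label:
        True if x matched an existing component (a simplification of the full
        background test, which would rank components by weight and variance)."""
        matched = np.where(np.abs(x - mu) < MATCH * sigma)[0]
        if matched.size > 0:
            k = matched[np.argmax(w[matched])]       # best matching component
            rho = ALPHA                              # simplified component rate
            mu[k] += rho * (x - mu[k])
            sigma[k] = np.sqrt((1 - rho) * sigma[k] ** 2 + rho * (x - mu[k]) ** 2)
            w += ALPHA * ((np.arange(K) == k) - w)   # reinforce the matched weight
            is_bg = True
        else:
            k = np.argmin(w)                         # replace the weakest component
            mu[k], sigma[k], w[k] = x, S_INIT, W_INIT
            is_bg = False
        return mu, sigma, w / w.sum(), is_bg

Down-sampling the frames to 160 × 120, as done above, simply reduces the number of such per-pixel updates performed per frame.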
Some qualitative results are shown in Fig. 5. In general, the TAPPMOG method produces a large number of false FG detections. The following considerations explain this phenomenon. In the “Snow” sequence (Fig. 5, first three columns), the scene can be modeled by a bi-modal BG: one mode models the outdoor environment, and the other models the snow. The snow generates a high-variance color intensity pattern, which can be interpreted as a spatial texture (i.e., a pattern that covers the scene globally). Modeling this texture by taking into account signals coming from nearby positions makes it possible to better capture the intrinsically high variance of the snow's appearance. As an example, see the red false FG detections in the related figures, which are globally fewer than with the TAPPMOG approach. In particular, in Fig. 5a, the snow causes more false FG detections in the center of the scene with the TAPPMOG model.
At the same time, the other component, which models the clean environment (not corrupted by the snow), can be learnt more precisely (with a smaller standard deviation), since the per-pixel signal estimation is refined with the signals of similar neighboring pixels. Looking at Fig. 5b, one can note that the car at the bottom is not detected by the TAPPMOG approach, whereas it is partially detected by S-TAPPMOG. A similar observation can be made for the car in Fig. 5c, which is better modeled by S-TAPPMOG.
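The spatial mechanism discussed above can be illustrated with a small sketch: besides its own value, each pixel's mixture is also updated with a few samples drawn from a local neighborhood, so the component covering the snow inflates its variance while the component covering the uniform street is tightened. The 3 × 3 window and the number of neighbor samples below are illustrative assumptions and do not reproduce the exact γ_max / ρ_max sampling scheme of S-TAPPMOG.

    import numpy as np

    def neighbourhood_samples(frame, r, c, radius=1, n_samples=4, rng=None):
        """Return the value of pixel (r, c) plus n_samples values picked at
        random from the (2*radius+1) x (2*radius+1) window centred on it."""
        rng = rng or np.random.default_rng()
        h, w = frame.shape
        rows = np.clip(rng.integers(r - radius, r + radius + 1, n_samples), 0, h - 1)
        cols = np.clip(rng.integers(c - radius, c + radius + 1, n_samples), 0, w - 1)
        return np.concatenate(([frame[r, c]], frame[rows, cols]))

    # Each returned sample would then be fed to the per-pixel update (e.g.,
    # update_pixel in the previous sketch): where the neighbourhood fluctuates
    # (snow) the learnt variance grows, where it is uniform (the street) the
    # extra samples tighten the per-pixel estimate.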
In any case, the per-region analysis of S-TAPPMOG brings a side effect: when a white object passes through the scene, it can be absorbed by the large-variance BG Gaussian component that characterizes the white snow, causing a FG miss. This is visible in Fig. 5a, where the first car from the top, partially covered by the lamp in the upper left part of the image, is missed, as are some gray parts of the tram in Fig. 5b and Fig. 5c.
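The absorption effect can be made concrete with a small numerical example (all values assumed for illustration): a bright FG pixel that falls within the matching band of the high-variance snow component is labelled as background.

    # Illustration (assumed values) of the absorption effect: with a snow
    # component of mean 200 and a large sigma, a bright FG pixel (e.g. a white
    # car at intensity 235) stays within the 2.5-sigma matching band and is
    # therefore classified as background.
    mu_snow, sigma_snow = 200.0, 25.0      # high-variance "snow" component (assumed)
    x_fg = 235.0                           # white foreground pixel
    is_absorbed = abs(x_fg - mu_snow) < 2.5 * sigma_snow
    print(is_absorbed)                     # True -> foreground miss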
As a visual explanation of how differently the two methods model the scene, please refer to Fig. 6. From the images depicting the σ values, it is visible that our method better extracts FG objects where the scene is more uniform, e.g., the street, whereas in the zones where the scene can be confused with the snow, the standard deviation values are higher. As a comparison, in the corresponding images of the TAPPMOG method, no spatial distinction is made in the FG discrimination and, in general, the value of the standard deviation is higher. From the µ images, we can see that in S-TAPPMOG the FG objects stand out better with respect to the rest of the BG scene. This means that the mean values that characterize FG and BG objects are better differentiated by S-TAPPMOG than by the TAPPMOG method. Similar considerations hold for the “Fog” sequence (refer to Fig. 5, third column). Here, the scene can be characterized by a bimodal BG, where one component models the scene heavily occluded by the fog, and the other explains the scene when the fog drastically diminishes, due to the characteristic dynamics of the fog banks. In this case, the low-variance, per-pixel