Uncertainty Fusion based Object Recognition and Tracking in Maritime
Scenes using Spatiotemporal Active Contours
Ikhlef Bechar¹, Frederic Bouchara¹, Thibault Lelore¹, Vincente Guis¹ and Michel Grimaldi²
¹LSIS Laboratory, Toulon University, Toulon, France
²PROTEE Laboratory, Toulon University, Toulon, France
Keywords:
Airborne Video System, Maritime Surveillance, Vessel Recognition, Dynamic Background, Chromatic
Uncertainty, Dynamic Texture Uncertainty, MAP Estimation, Energy Minimization, Spatiotemporal Active
Contours.
Abstract:
This article addresses the problem of near real time video analysis of a maritime scene using a (moving) airborne RGB video camera, with the goal of detecting and eventually recognizing a target maritime vessel. This is a very challenging problem, mainly due to the high level of uncertainty of a maritime scene, including a dynamic and noisy background, camera and target motion, and the broad variability of background versus target appearance. We propose an approach which combines several types of spatiotemporal uncertainty in a single probabilistic framework. This yields a likelihood ratio for any possible spatiotemporal configuration of the 2D+T video volume. Using the MAP estimation criterion, the problem can then be recast as an energy minimization problem that we solve efficiently using a spatiotemporal active contour approach. We demonstrate the feasibility of the proposed approach using real maritime videos.
1 INTRODUCTION
Maritime surveillance is an important customs application aiming at efficient monitoring of maritime traffic, and at securing sea coasts and harbors against fraudulent activities such as smuggling, theft, piracy, intrusion, and human trafficking (Bloisi and Iocchi, 2009; Pires et al., 2010). Traditionally, it consists of a workflow of laborious tasks performed by human operators (e.g., coast guards). Recently, semi-automated and automated airborne video-surveillance systems have gained increasing popularity in maritime surveillance. The latter generate huge volumes of video data that thus need to be analyzed automatically and in near real time, with the goal of recognizing maritime targets and ranking their activity (e.g., usual versus fraudulent activity) (see Fig. 1).
In this paper, we describe the video processing system that we have developed for automatic maritime object (e.g., vessel) recognition using an airborne visible light (i.e., RGB) video camera. The hardware architecture chosen for the project allows continuous acquisition of video streams of a maritime scene in the visible light spectrum (400-700 nm), involving a single target at once. Each movie of a target is stored on a local (airborne) computer and analyzed in quasi real time on board for recognition purposes (see (Bechar et al., 2013) for more details).
1.1 Motivations and Related Work
Most existing video based maritime surveillance systems rely on static (i.e., grounded) RGB cameras. From an algorithmic point of view, they have borrowed video-surveillance techniques originally developed for rather "gentle" environments, such as background subtraction (Stauffer and Grimson, 1999), optical flow (Lucas and Kanade, 1981), and statistical learning (Bloisi et al., 2012), and thus they might not be well suited to highly dynamic scenes such as maritime environments. Furthermore, there exist other surveillance systems based on different (though more expensive) data acquisition modalities, such as infrared imagery (Smith and Teal, 1999), multi-sensor information fusion such as radar/AIS (Rhodes et al., 2007), and RGB/thermal infrared imagery (Bechar et al., 2013), in order either to cover larger areas of the sea or to compensate for an RGB system's unreliability.
Figure 1: Four maritime RGB video shots showing: (a) a cargo; (b) a medium-sized vessel; (c) a yacht; (d) a sailboat.

In this work, we consider a non-static RGB video system, and our goal is to devise an automatic technique which combines several types of RGB video information in order to yield, for any spatiotemporal trajectory in a video, a likelihood ratio that it corresponds to a maritime target. Eventually, we would like to exploit such a result in order to detect, as accurately as possible (ideally, delineate), the target in a video with an overwhelming probability.
Let us emphasize that, in contrast to static video systems in which objects are only filmed when they enter the field of view of the camera (in which case a prior model of the background without objects, learned offline and combined with a proper background subtraction technique, may be envisaged for object detection), our airborne (and thus non-static) video system acquires a specific maritime scene of a target directly, based on a radar notification. Therefore, video processing must determine on its own the spatiotemporal location of a target in the whole video.
To this end, we advocate an approach which fuses several types of uncertainty regarding main chromaticity (main color) and dynamic texture (i.e., stochastic deviations around the principal color) in a common probabilistic framework. This yields a probability ratio for every possible spatiotemporal configuration (i.e., both the spatial location and the temporal trajectory) of a target in a video. Then, using the maximum a posteriori (MAP) estimation criterion (in log-likelihood form), we recast the original problem as an energy minimization problem which is solved using an active contour approach (Mumford and Shah, 1989; Vese and Chan, 2002).
2 MATHEMATICAL MODEL
Given video data $X_{1,T} := \{X_t,\ t = 1,\dots,T\}$, where for all $t = 1,\dots,T$, $X_t$ stands for the $t$-th video frame, the goal is to recognize and to track a maritime object throughout the whole video. Let us then denote by $\Omega$ the image domain, which is identical for all video frames. We adopt a probabilistic approach, and our goal is to assign to any spatiotemporal configuration $O_{1,T} \subseteq \Omega \times [1,T]$ a probability that it corresponds to the spatiotemporal trajectory of the target object, given the observed video data. Eventually, we extract the spatiotemporal trajectory $O_{1,T} \subseteq \Omega \times [1,T]$ that is most likely to correspond to the actual target's one.
Such a problem can indeed be formulated using the maximum a posteriori (MAP) principle, as described in the sequel. It should be mentioned, however, that our MAP approach differs from another well-known energy based approach built on naive Bayes classification, in the sense that the latter requires an explicit simultaneous modeling of the foreground probability $p(x_t \mid \mathrm{target})$ and the background probability $p(x_t \mid \mathrm{background})$, whereas our approach focuses solely on the foreground model $p(x_t \mid \mathrm{target})$. This has the advantage of alleviating the burden of estimating a background model (which is not a trivial task when dynamic backgrounds are considered) via a MAP estimation framework (which can also be seen as hypothesis testing against the hypothesis $H_0$ that the data follow the foreground model).
Having said this, by using the classical Bayes rule, one can write
$$P(O_{1,T} \mid X_{1,T}) = \frac{P(O_{1,T})\,P(X_{1,T} \mid O_{1,T})}{P(X_{1,T})} \qquad (1)$$
where $P(O_{1,T} \mid X_{1,T})$ stands for the a posteriori likelihood of the spatiotemporal region $O_{1,T} \subseteq \Omega \times [1,T]$, $P(X_{1,T} \mid O_{1,T})$ stands for the likelihood of the video data given $O_{1,T}$, $P(O_{1,T})$ stands for the a priori probability, and finally $P(X_{1,T})$ stands for the likelihood of the video data. The goal is to find the spatiotemporal configuration which maximizes formula (1). Note that the latter formula is often rewritten in log-likelihood (or energy) form (which turns out to be more intuitive and easier to optimize in practice) as follows
$$-\log\left[P(O_{1,T} \mid X_{1,T})\right] = -\log\left[P(X_{1,T} \mid O_{1,T})\right] - \log\left[P(O_{1,T})\right] + \log\left[P(X_{1,T})\right] \qquad (2)$$
UncertaintyFusionbasedObjectRecognitionandTrackinginMaritimeScenesusingSpatiotemporalActiveContours
683
where now:
- $E(O_{1,T}) := -\log\left[P(O_{1,T} \mid X_{1,T})\right]$ stands for the total energy of the configuration $O_{1,T}$, i.e., the cost of assuming that the object is located at $O_{1,T}$;
- $E_{fid}(O_{1,T}) := -\log\left[P(X_{1,T} \mid O_{1,T})\right]$ stands for the data fidelity term of the total energy of $O_{1,T}$;
- $E_{reg}(O_{1,T}) := -\log\left[P(O_{1,T})\right]$ stands for the spatiotemporal regularization term.
Our main task in the remainder thus consists in modeling the two factors of the a posteriori likelihood $P(O_{1,T} \mid X_{1,T})$, namely the a priori probability $P(O_{1,T})$ and the likelihood $P(X_{1,T} \mid O_{1,T})$, since $P(X_{1,T})$ is a constant that we may ignore in the remainder.
2.1 Modeling $P(O_{1,T})$
One first notes that $P(O_{1,T})$ models purely spatiotemporal geometric information (or the spatiotemporal trajectory) of a target in a video (of course, up to camera motion). For instance, if one knows in advance that the target is a rigid object moving according to a similarity motion (rotation, translation and scale), then it could be very useful to incorporate this information in $P(O_{1,T})$ in a way which penalizes spatiotemporal trajectories $O_{1,T} \subseteq \Omega \times [1,T]$ that do not fit the aforementioned prior geometric knowledge. Nevertheless, this requires a proper parametrization of the total energy (using, for instance, a similarity matrix), which is known to be computationally intractable. Therefore, it is often replaced with a computationally efficient approximate model (e.g., a Markov random field model) which may nonetheless yield quite satisfactory practical performance.
Therefore, in this paper we propose to model $E_{reg}(O_{1,T}) := -\log\left[P(O_{1,T})\right]$ directly as the sum of a multiplicative factor of the total 2D surface of the spatiotemporal volume $O_{1,T}$ and a multiplicative factor of its total 3D volume. This gives rise to a regularization term composed of a traditional total variation (TV) term (Cremers et al., 2011) and a linear term in a continuous (relaxed) form of the total energy (2) (see section 3 for more details). While the former has a nice interpretation as a discontinuity preserving smoothing term of the spatiotemporal trajectory of a target, the latter penalizes the total volume of that trajectory. Let us note that, because of some (mainly computational) considerations that will be motivated in subsection 2.3, we assume that all video frames are aligned with each other prior to the model's optimization. Therefore, such a regularization term operates directly on the aligned video.
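To make this concrete, the following is a minimal numerical sketch (our own illustration, not the paper's implementation) of such a regularization term on a discretized binary indicator of $O_{1,T}$: the surface term is approximated by the total variation of the indicator and the volume term by a voxel count. The function and variable names are ours.

```python
import numpy as np

def regularization_energy(mask, lam, beta):
    """E_reg sketch for a binary spatiotemporal mask O of shape (T, H, W):
    lam * (total 2D surface, approximated by the TV of the indicator)
    + beta * (total 3D volume, i.e. the voxel count)."""
    grads = np.gradient(mask.astype(float))          # finite differences along t, y, x
    surface = np.sqrt(sum(g ** 2 for g in grads)).sum()
    volume = mask.sum()
    return lam * surface + beta * volume

# Example: a small blob translating across a 10-frame, 32x32 video volume.
O = np.zeros((10, 32, 32), dtype=bool)
for t in range(10):
    O[t, 10 + t:16 + t, 12:18] = True
print(regularization_energy(O, lam=1000.0, beta=0.5))
```

A smoother, more compact trajectory lowers the surface term, while the volume term discourages trivially large configurations, which is exactly the trade-off the two multiplicative factors encode.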
2.2 Modeling $P(X_{1,T} \mid O_{1,T})$
In this subsection, we describe our approach for modeling $P(X_{1,T} \mid O_{1,T})$ in equation (1). As mentioned earlier in this section, the latter models the likelihood of the video data $X_{1,T}$ if a target is located at the spatiotemporal location $O_{1,T}$ of the video volume $\Omega \times [1,T]$. It is clear that, in the absence of a closed-form expression of a theoretical data observation model, one may only resort to an approximate formula. This is a difficult task because of the large variability of the appearance (i.e., intensity) models of both the dynamic maritime background (i.e., the sea) and the target (i.e., a vessel).
Indeed, depending on the weather conditions and on a target's speed and size, the background's dynamics may exhibit drastic differences from one video to another (varying from a quasi-static blue sea to a rough, wavy and dark sea background). On the other hand, a vessel's appearance (in terms of main colors and their spatiotemporal variations) cannot be anticipated, because of color variability and the important illumination changes inherent to a maritime scene. Therefore, it makes sense to seek an expert system which can trade off (merge) different types of RGB video uncertainty in order to generally achieve a good detection of a vessel. Having said this, the starting idea for achieving such a goal is that both chromaticity (i.e., the principal pixel color) and spatiotemporal (dynamic) texture generally turn out to be good features of both sea background and target. Before going into more detail about this idea, with the goal of achieving a good approximate model of $P(X_{1,T} \mid O_{1,T})$, let us first consider the following (additive) video data model:
$$X_{1,T} = C_{1,T} + K_{1,T} + n_{1,T} \qquad (3)$$
where $C_{1,T}$ stands for the principal color (or chromaticity) of the video pixels, $K_{1,T}$ stands for a (statistical) spatiotemporal (dynamic) texture (Derpanis and Wildes, 2012) which is superimposed onto the principal color of the video pixels, and $n_{1,T}$ models system plus environment noise. Furthermore, we view $(C_{1,T}, K_{1,T})$ as a couple of random vector variables having some probability distribution $P(C_{1,T}, K_{1,T} \mid \mathrm{target})$ with respect to a target object, and some other probability distribution $P(C_{1,T}, K_{1,T} \mid \mathrm{background})$ with respect to the background. An elaborate method would attempt to estimate simultaneously, in a common (parametric) framework, $(C_{1,T}, K_{1,T})$ and the spatiotemporal location of a target. However, this might be too costly computationally for near real time video processing. Therefore, if we assume that one may determine deterministically, using video preprocessing (e.g., color clustering and texture filtering), the couple $(C_{1,T}, K_{1,T})$ from $X_{1,T}$ according to decomposition (3), then one may be able to exploit both principal color and spatiotemporal texture information in order to characterize the target's likelihood in a video. This can be seen (after ignoring the noise component $n_{1,T}$ in formula (3)) by replacing $P(X_{1,T} \mid O_{1,T})$ with
$$P(C_{1,T}, K_{1,T} \mid O_{1,T}) := f(C_{1,T}, K_{1,T}) \qquad (4)$$
where $f(\cdot)$ is some function which models an expert's knowledge regarding both chromaticity and (dynamic) texture at the target's spatiotemporal locations.
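As an illustration of how the couple $(C_{1,T}, K_{1,T})$ might be obtained deterministically by preprocessing, the sketch below clusters the RGB values of each frame and takes the residual as the texture component. This is only one possible realization, assuming a per-frame k-means color clustering; the paper requires only some form of color clustering and texture filtering, and the names used here are ours.

```python
import numpy as np
from sklearn.cluster import KMeans

def decompose_frame(frame, n_colors=4):
    """Preprocessing sketch for decomposition (3): estimate the
    principal-color image C by clustering RGB values, and take the
    residual as the (dynamic) texture K (the noise n is not separated
    from K here)."""
    h, w, _ = frame.shape
    pixels = frame.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=n_colors, n_init=4, random_state=0).fit(pixels)
    C = km.cluster_centers_[km.labels_].reshape(h, w, 3)  # principal colors
    K = frame.astype(float) - C                           # texture residual
    return C, K
```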
In this work, we model such an expert function $f(\cdot)$ based on the following remarks:
- The brighter (whiter) a pixel is, the less likely it is to belong to the foreground, as brightness is generally (but not always) characteristic of the foam;
- The more blue a pixel is, the more likely it is to belong to the background (i.e., to the sea);
- The more scattered in the image plane a pixel's color is, the less likely it is to belong to a target, as the sea generally occupies a spatially scattered region of a video frame;
- The target's dynamic texture is normally zero, up to imaging artifacts (such as illumination changes).
Therefore, we propose to take $f(c,k) := g(W(c), B(c), S(c), k)$, where $g$ stands for some positive real valued function trading off $W(c)$, $B(c)$, $S(c)$ and $k$, which stand respectively for measures of how bright (white) a color $c$ is ($W(c)$), how blue it is ($B(c)$), how scattered it is in the image plane ($S(c)$), and the (dynamic) texture ($k$). As aforementioned, we have assumed that one may extract each of the components $c$ and $k$ at any spatiotemporal video location using video preprocessing. Obviously, there is no single method for choosing the functions $W(c)$, $B(c)$ and $S(c)$; we therefore propose to model them based on RGB video information as follows. For computational convenience, we first assume statistical independence between pixels, and we propose to take
$$W(c) \propto \exp\left(-\frac{\|c - c_w\|^2}{2\sigma_w^2}\right)\mathbf{1}(W)$$
where $c_w$ stands for the mean intensity of the brightest color class in a video frame (if any, hence the use of the indicator $\mathbf{1}(W)$), and $\sigma_w$ stands for its standard deviation.
$$B(c) \propto \exp\left(-\frac{\|c - c_b\|^2}{2\sigma_b^2}\right)\mathbf{1}(B)$$
where $c_b$ stands for the mean intensity of the blue color class in a video frame (if any) and $\sigma_b^2$ for its variance. Such brightest and blue color classes are identified in each video frame using color recognition techniques based respectively on the sum of the three RGB components, $c_r + c_g + c_b$, and (roughly) on the ratio $\frac{c_b + c_g}{2c_r}$, together with a clustering technique (such as the Otsu algorithm). $S(c)$ is modeled as some decreasing function of the standard deviation $s(c)$ of the position in a video frame of color $c$. We take
$$S(c) \propto \exp\left(-\frac{s^2(c)}{2 s^2(c_S)}\right)\mathbf{1}(S)$$
where $c_S$ stands for the principal color with the biggest positional standard deviation. Finally, we estimate the dynamic texture $k$ at each video pixel by analyzing the video signals in its neighborhood of size $N$, using the white noise assumption of the dynamic texture at the target. We take $k$ as some increasing function of the minimum absolute value of the correlation $\rho$ between any pair of neighboring signals (assuming prior alignment of the video frames). Indeed, under the white noise assumption, $\rho$ behaves like the correlation of normalized random variables with mean 0, and therefore one expects $\rho$ to be generally much smaller for target pixels than for dynamic sea pixels (whose dynamic texture does not satisfy the white noise assumption). Thus we take
$$k \propto \exp\left(-\frac{\rho^2}{2}\right)$$
Finally, we take
$$f(z) \propto 1 - \alpha\, W(c)\, B(c)\, S(c)\, k \qquad (5)$$
where $\alpha$ stands for some positive constant in $]0,1]$. It should be mentioned, however, that we do not claim that such a choice of $f(z)$ is the most pertinent one; nevertheless, this simplification may be seen as the price to pay for speeding up computations in order to achieve quasi real time system performance.
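Putting the pieces together, the following is a hedged per-pixel sketch of the fused score (5), assuming the class statistics $(c_w, \sigma_w)$ and $(c_b, \sigma_b)$, the scatter statistics and the neighborhood correlation $\rho$ have already been extracted by the preprocessing described above; the indicator functions $\mathbf{1}(W)$ and $\mathbf{1}(B)$ are folded into boolean flags, and all names are ours.

```python
import numpy as np

def fused_score(c, c_w, sig_w, has_white, c_b, sig_b, has_blue,
                s, s_max, rho, alpha=0.9):
    """Sketch of the expert fusion (5): f = 1 - alpha * W * B * S * k.
    c: RGB color of the pixel; s: positional std of its color class;
    rho: min abs correlation with the neighboring temporal signals."""
    W = np.exp(-np.sum((c - c_w) ** 2) / (2 * sig_w ** 2)) if has_white else 0.0
    B = np.exp(-np.sum((c - c_b) ** 2) / (2 * sig_b ** 2)) if has_blue else 0.0
    S = np.exp(-s ** 2 / (2 * s_max ** 2))
    k = np.exp(-rho ** 2 / 2)
    return 1.0 - alpha * W * B * S * k

# Example: a dark pixel far from both the white and the blue classes,
# with weakly correlated neighborhood signals (target-like).
f = fused_score(np.array([40., 35., 30.]),
                np.array([240., 240., 240.]), 20.0, True,
                np.array([30., 80., 160.]), 25.0, True,
                s=10.0, s_max=50.0, rho=0.05)
print(f)  # close to 1, i.e. low energy F = -log f inside the target
```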
2.3 Video Frame Alignment
As mentioned above, in order to be able to estimate the dynamic texture of a spatiotemporal configuration and to reduce the problem's combinatorial complexity, prior to video processing we align all video frames in such a way that the imaged points of a maritime scene correspond to the same pixel location across the video. This is achieved using pixel tracking
UncertaintyFusionbasedObjectRecognitionandTrackinginMaritimeScenesusingSpatiotemporalActiveContours
685
via optical flow (Lucas and Kanade, 1981). However, it is important to account for illumination changes between video frames in the optical flow model. Therefore, we first transform each RGB video frame $X_t$ linearly into a positive gray image $I_t$ in a way which maximizes the frame's contrast. Then, assuming a linear illumination model, we compute the optical flow $(u,v)$ on the current log-transformed gray video frame $J_t := \log[I_t]$ using the following optical flow equation:
$$\frac{\partial J_t}{\partial x}u + \frac{\partial J_t}{\partial y}v + \frac{\partial J_t}{\partial t} + \nu = n \qquad (6)$$
where $n$ stands for white noise and $\nu$ models the illumination variation between consecutive frames. Such a model is estimated as in (Lucas and Kanade, 1981), using a least squares criterion in a pixel neighborhood of size $N$. Note that bicubic interpolation and image subsampling to a lower resolution video frame are used prior to optical flow estimation. This reduces the overall video noise and flattens highly textured sea regions (such as foam), in order to yield a good estimate of the spatiotemporal video gradient, and thereby a reliable optical flow. The latter is then recomputed for the original video and later used to align the video frames with each other.
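For concreteness, here is a minimal dense sketch of the illumination-augmented Lucas-Kanade step: per pixel, it solves the least squares system implied by equation (6) over an N x N window, with the extra unknown ν absorbing the frame-to-frame illumination change. This is our own naive (unoptimized) rendition; the coarse-to-fine resampling and the final alignment step are omitted.

```python
import numpy as np

def lk_flow_illum(Jx, Jy, Jt, N=7):
    """Illumination-robust Lucas-Kanade sketch: at each pixel, solve
    min || Jx*u + Jy*v + nu + Jt ||^2 over an N x N window for
    (u, v, nu), where nu models the illumination variation in (6)."""
    H, W = Jt.shape
    r = N // 2
    u = np.zeros((H, W))
    v = np.zeros((H, W))
    for y in range(r, H - r):
        for x in range(r, W - r):
            win = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
            # Columns: spatial gradients and a constant column for nu.
            A = np.stack([Jx[win].ravel(), Jy[win].ravel(),
                          np.ones(N * N)], axis=1)
            b = -Jt[win].ravel()
            sol, *_ = np.linalg.lstsq(A, b, rcond=None)
            u[y, x], v[y, x] = sol[0], sol[1]
    return u, v
```

Without the constant column, a global brightness change between frames would be wrongly explained as apparent motion; the ν unknown soaks it up instead.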
3 ACTIVE CONTOUR
IMPLEMENTATION
The goal is to minimize the following energy model with respect to all possible spatiotemporal configurations $O$:
$$\min_O \left\{ \int_O F(z) + \lambda\,\mathrm{Perim}(O) + \beta A(O) \right\} \qquad (7)$$
where $F(z) := -\log[f(z)]$ with $f(z)$ given by formula (5), $\lambda$ and $\beta$ stand for some positive constants, and $\mathrm{Perim}(O)$ and $A(O)$ stand respectively for the total surface and the total volume of the spatiotemporal configuration $O$. However, while the model constant $\lambda$ (which enforces the target's spatiotemporal smoothness) is generally easy to tune, as a wide range of values is suitable for the task, the choice of $\beta$ is not easy, though crucial for obtaining a reasonable segmentation of a target. Furthermore, we would ideally like the system to choose the best value of $\beta$ adaptively, provided that we can inform it accordingly about what good values of $\beta$ correspond to. Nevertheless, since we are dealing with gray videos $F(z)$ (minus log of pixel-wise probabilities), we may detect a target as the most contrasted spatiotemporal configuration in the video, in a sense which we specify hereafter. We show indeed in this paper that model (7) is equivalent to a traditional Mumford-Shah model, namely the Chan-Vese model (11) (Vese and Chan, 2002). The proof of our claim, along with the transformation of model (7) into an equivalent Chan-Vese model, is detailed in the appendix. Therefore, the energy that we minimize is written as:
mize is written as:
(
Z
O
( f (z) c
1
)
2
+
Z
O
( f (z) c
2
)
2
+λPerim
O
)
(8)
where c
1
and c
2
correspond to some constants that are
estimated adaptively using video data. Such a model 8
is firstly relaxed (M. Nikolova and Chan, 2006; Pock
et al., 2008; Cremers et al., 2011; Chambolle et al.,
2010) using variables u(z;t) [0,1] as
(
Z
( f (z) c
1
)
2
( f (z) c
2
)
2
)u(z)+λTV (u)
)
(9)
where λTV (u) stands for the total variation of u(z;t).
For known c
1
and c
2
, such a model 9 is convex and
thus can be solved for exactly using the following it-
erative scheme, starting from an initial solution u
0
:
u
j+1
= u
j
λdiv
u
j
|u
j
|
where stands for the spa-
tiotemporal gradient sign and div stands for the spa-
tiotemporal divergence operator. The constants c
1
and
c
2
are updated simultaneously with u
j
as the mean in-
tensities inside and outside current target respectively.
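A compact numerical sketch of this relaxed minimization is given below. It is our own rendition: besides the curvature step stated above, it also applies the data term $r(z) = (f(z)-c_1)^2 - (f(z)-c_2)^2$ of (9), an explicit step size $\tau$, and the clipping of $u$ to $[0,1]$, all of which any complete descent scheme for (9) needs.

```python
import numpy as np

def relaxed_chan_vese(f, lam=0.1, tau=0.2, n_iter=200):
    """Projected gradient descent sketch for (9) on a spatiotemporal
    volume f of shape (T, H, W); c1, c2 are re-estimated at each step
    as the means inside/outside the current target."""
    u = (f - f.min()) / (np.ptp(f) + 1e-8)   # data-driven initialization
    eps = 1e-8
    for _ in range(n_iter):
        inside = u > 0.5
        c1 = f[inside].mean() if inside.any() else f.mean()
        c2 = f[~inside].mean() if (~inside).any() else f.mean()
        r = (f - c1) ** 2 - (f - c2) ** 2            # data term of (9)
        g = np.gradient(u)
        norm = np.sqrt(sum(gi ** 2 for gi in g)) + eps
        # div(grad u / |grad u|): curvature of the level sets of u
        div = sum(np.gradient(gi / norm)[i] for i, gi in enumerate(g))
        u = np.clip(u + tau * (lam * div - r), 0.0, 1.0)
    return u > 0.5

# Example: bright moving square on a dark noisy background.
rng = np.random.default_rng(1)
f = rng.normal(0.2, 0.05, (8, 40, 40))
for t in range(8):
    f[t, 12 + t:20 + t, 15:23] += 0.6
seg = relaxed_chan_vese(f)
```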
4 RESULTS
The current version of the method is implemented in C++ and runs in quasi real time on a standard 1.7 GHz PC, for videos at 30 frames per second with a frame size of about 1 megabyte. Fig. 2 and Fig. 3 show results of the proposed method on two real maritime videos, with $\lambda := 1000$.

We have validated the proposed method using a dozen realistic video sequences under different weather conditions and target appearances. The method has been shown to generally perform very well at detecting a vessel target, except in some situations where the target is mainly characterized by a whitish color and the airborne camera lies so far from the target that the sea foam does not exhibit enough texture to distinguish it from the target.
VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications
686
Figure 2: Uncertainty based recognition and tracking of a maritime target (a cargo) using a spatiotemporal active contour approach. Upper row: a posteriori log-likelihood of the target (normalized between 0 and 255) at video frames 1, 100, 200, 300, and 400 (resp.). Middle row: the converged spatiotemporal active contour. Lower row: target recognition.
Figure 3: Uncertainty based recognition and tracking of a maritime target (a yacht) using a spatiotemporal active contour approach. Upper row: a posteriori log-likelihood of the target (normalized between 0 and 255) at video frames 1, 100, 200, 300, and 400 (resp.). Middle row: the converged spatiotemporal active contour. Lower row: target recognition.
5 CONCLUSIONS
We have described a novel method for detecting vessels against a dynamic maritime background using an uncertainty fusion based approach, and we have solved the problem efficiently using a spatiotemporal active contour approach. Our tests on a dozen real video sequences of several minutes' duration have shown that our method outperforms some state of the art object tracking techniques such as meanshift/camshift. In the current version of our system, all of the method's parameters have been hard-coded based on experience. Nevertheless, it makes sense to devise an automatic parameter selection technique in a future version of the system. Also, as future work, we plan to consider other types of video information, such as more sophisticated texture models and geometric prior knowledge (such as the object's rigidity, height and shape), in order to yield as reliable a vessel detection algorithm as possible.
ACKNOWLEDGEMENTS
Thanks to the French customs for funding.
REFERENCES
Rhodes, B. J., et al. (2007). SeeCoast: persistent surveillance and automated scene understanding for ports and coastal areas. In Proc. SPIE, vol. 6578, p. 65781M.
Bechar, I., Lelore, T., Bouchara, F., Guis, V., and Grimaldi, M. (2013). Toward an airborne system for near real-time maritime video-surveillance based on synchronous visible light and thermal infrared video information fusion: an active contour approach. In Proc. OCOSS'2013, Nice, France.
Bloisi, D. and Iocchi, L. (2009). ARGOS - a video surveillance system for boat traffic monitoring in Venice. IJPRAI, 23(7):1477-1502.
Bloisi, D., Iocchi, L., Fiorini, M., and Graziano, G. (2012). Camera based recognition for marine awareness, Great Lakes and St. Lawrence Seaway border regions. In Int. Conf. on Information Fusion, pp. 1982-1987.
Chambolle, A., Caselles, V., Novaga, M., Cremers, D., and Pock, T. (2010). An introduction to total variation for image analysis. In Theoretical Foundations and Numerical Methods for Sparse Recovery. De Gruyter.
Cremers, D., Pock, T., Kolev, K., and Chambolle, A. (2011). Convex relaxation techniques for segmentation, stereo and multiview reconstruction. In Markov Random Fields for Vision and Image Processing. MIT Press.
Derpanis, K. and Wildes, R. (2012). Spacetime texture representation and recognition based on a spatiotemporal orientation analysis. PAMI, 34(6):1193-1205.
Lucas, B. and Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 674-679.
Nikolova, M., Esedoglu, S., and Chan, T. (2006). Algorithms for finding global minimizers of image segmentation and denoising models. SIAM Journal of Applied Mathematics, 66:1632-1648.
Mumford, D. and Shah, J. (1989). Optimal approximations by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math., 42:577-685.
Pires, N., Guinet, J., and Dusch, E. (2010). ASV: an innovative automatic system for maritime surveillance. Navigation, 58(232):1-20.
Pock, T., Schoenemann, T., Graber, G., Bischof, H., and Cremers, D. (2008). A convex formulation of continuous multi-label problems. In ECCV'08.
Smith, A. and Teal, M. (1999). Identification and tracking of marine objects in near-infrared image sequences for collision avoidance. In 7th Int. Conf. on Image Processing and its Applications, pp. 250-254.
Stauffer, C. and Grimson, W. E. L. (1999). Adaptive background mixture models for real-time tracking. In CVPR'99, pp. 2246-2252.
Vese, L. and Chan, T. (2002). A new multiphase level set framework for image segmentation via the Mumford and Shah model. IJCV, 50:271-293.
APPENDIX
Let us prove the claim made in section 3. For simplicity's sake, and without loss of generality, we consider the following MAP based image segmentation problem:
$$\min_O \left\{ \lambda\,\mathrm{Perim}(O) + \beta A(O) + \int_O g(z) \right\} \qquad (10)$$
where $g(z)$ is a positive function, as it corresponds to minus the log of a probability. Now, let us consider the Chan and Vese image segmentation model
$$\lambda\,\mathrm{Perim}(O) + \int_O \frac{(f(z) - c_1)^2}{\sigma_1^2} + \int_{\Omega\setminus O} \frac{(f(z) - c_2)^2}{\sigma_2^2} \qquad (11)$$
One may rewrite model (11) equivalently (after throwing away the constant term $\int_\Omega \frac{(f(z)-c_2)^2}{\sigma_2^2}$) as follows:
$$\lambda\,\mathrm{Perim}(O) + \int_O \left[\frac{(f(z) - c_1)^2}{\sigma_1^2} - \frac{(f(z) - c_2)^2}{\sigma_2^2}\right]$$
$$= \lambda\,\mathrm{Perim}(O) + \int_O \left[f^2(z)\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2}\right) - 2f(z)\left(\frac{c_1}{\sigma_1^2} - \frac{c_2}{\sigma_2^2}\right) + K\right] + \left(\frac{c_1^2}{\sigma_1^2} - \frac{c_2^2}{\sigma_2^2} - K\right) A(O)$$
where $K$ is the smallest positive constant (perhaps 0) which makes the integrand
$$f^2(z)\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2}\right) - 2f(z)\left(\frac{c_1}{\sigma_1^2} - \frac{c_2}{\sigma_2^2}\right) + K$$
always positive, whatever $z$.

Now, putting
$$f^2(z)\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2}\right) - 2f(z)\left(\frac{c_1}{\sigma_1^2} - \frac{c_2}{\sigma_2^2}\right) + K = g(z)$$
$$\frac{c_1^2}{\sigma_1^2} - \frac{c_2^2}{\sigma_2^2} - K = \beta$$
which is always possible via an appropriate tuning of $c_1$ and $c_2$ such that $\frac{c_1^2}{\sigma_1^2} - \frac{c_2^2}{\sigma_2^2} - K$ is positive and equal to $\beta$, and then solving the second degree equation with respect to $f(z)$ in order to express $f(z)$ as a function of $g(z)$, completes the proof.
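As a quick sanity check of this algebra (our own verification script, not part of the paper), one can confirm numerically that the Chan-Vese integrand equals $g(z) + \beta$ pointwise for any choice of the constants:

```python
import numpy as np

# Verify pointwise: (f-c1)^2/s1^2 - (f-c2)^2/s2^2 == g(f) + beta, with
# g(f) = f^2 (1/s1^2 - 1/s2^2) - 2 f (c1/s1^2 - c2/s2^2) + K and
# beta = c1^2/s1^2 - c2^2/s2^2 - K (constants chosen arbitrarily here).
rng = np.random.default_rng(0)
f = rng.uniform(0.0, 1.0, 1000)
c1, c2, s1, s2, K = 0.8, 0.2, 0.5, 0.7, 1.0
cv = (f - c1) ** 2 / s1 ** 2 - (f - c2) ** 2 / s2 ** 2
g = f ** 2 * (1 / s1 ** 2 - 1 / s2 ** 2) - 2 * f * (c1 / s1 ** 2 - c2 / s2 ** 2) + K
beta = c1 ** 2 / s1 ** 2 - c2 ** 2 / s2 ** 2 - K
assert np.allclose(cv, g + beta)  # integrand identity behind (10)-(11)
print("identity holds")
```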
UncertaintyFusionbasedObjectRecognitionandTrackinginMaritimeScenesusingSpatiotemporalActiveContours
689