ing (e.g. (Cannavò et al., 2006)), object detection, tracking and recognition (e.g. (Spampinato, 2009) and (Spampinato et al., 2010)).
One aspect to deal with when analyzing marine ecosystems is fish tracking, whose importance goes beyond simple population counting. In fact, behavior understanding and the analysis of fish interactions, which are interesting perspectives for marine biologists to study, rely strictly on trajectories extracted using tracking approaches. However, tracking presents a few major difficulties, which become greater in underwater environments, where objects have multiple degrees of freedom or where the scene conditions cannot be controlled.
Many different approaches to the visual tracking problem have been studied in the literature, such as Kalman filter-based tracking (Doucet et al., 2001), particle filter tracking (Gordon et al., 1979), point feature tracking, and mean-shift tracking (Comaniciu and Meer, 2002). However, to the best of our knowledge, only a variation of mean-shift, CAMSHIFT (Bradski, 1998), has been applied to underwater environments (Spampinato et al., 2008), achieving an average tracking performance (estimated as correct counting rate) of about 85%. CAMSHIFT, however, shows a major drawback when dealing with fish-fish and fish-background occlusions, mainly because it exploits only color information. In this paper we propose a tracking algorithm where fish are modeled as covariance matrices (Tuzel et al., 2006) of features built from each pixel belonging to the fish's region. This representation embodies both the spatial and the statistical properties of non-rigid objects, unlike histogram representations (which disregard the structural arrangement of pixels) and appearance models (which ignore statistical properties). As shown in the experimental results section, the performance of the proposed approach is very encouraging and better than that achieved with CAMSHIFT, indicating that our covariance-based approach also performs very well under extreme conditions.
The remainder of the paper is organized as follows: Section 2 describes the details of the proposed covariance-based fish tracking algorithm; Section 3 presents the tracking results achieved against hand-labeled ground truth data. Finally, Section 4 points out the concluding remarks.
2 COVARIANCE BASED TRACKING ALGORITHM
In the following description, we use “tracked object” to indicate an entity that represents a unique fish and contains information about the fish's appearance history and its current covariance model, and “detected object” to indicate a moving object that has not yet been associated with any tracked object. For each detected object, the corresponding covariance matrix is computed by building a feature vector for each pixel, made up of the pixel coordinates, the RGB and hue values, and the mean and standard deviation of the histogram of a 5×5 window with the target pixel as centre. The covariance matrix, which models the object, is then computed from these feature vectors and associated with the detected object.
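As an illustration, a minimal sketch of this descriptor in Python, assuming NumPy and OpenCV (all function and variable names are ours, not prescribed by the paper):

```python
import numpy as np
import cv2

def covariance_model(bgr_patch):
    """Sketch of the covariance descriptor described above: for every pixel
    of a detected object's region, build the feature vector
    [x, y, R, G, B, hue, mean5x5, std5x5] and return the 8x8 covariance
    matrix of these vectors."""
    h, w = bgr_patch.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    hue = cv2.cvtColor(bgr_patch, cv2.COLOR_BGR2HSV)[:, :, 0].astype(np.float64)
    gray = cv2.cvtColor(bgr_patch, cv2.COLOR_BGR2GRAY).astype(np.float64)

    # Mean and standard deviation over a 5x5 window centred on each pixel
    # (the mean/std of the window's intensity distribution, i.e. of its
    # histogram, as described in the text).
    mean5 = cv2.blur(gray, (5, 5))
    sq_mean5 = cv2.blur(gray * gray, (5, 5))
    std5 = np.sqrt(np.maximum(sq_mean5 - mean5 ** 2, 0.0))

    b, g, r = [bgr_patch[:, :, c].astype(np.float64) for c in range(3)]
    features = np.stack([xs, ys, r, g, b, hue, mean5, std5], axis=-1).reshape(-1, 8)
    # Covariance of the per-pixel feature vectors (rows are observations).
    return np.cov(features, rowvar=False)
```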
Afterwards, this matrix is used to compare the object with the currently tracked objects, in order to decide which one it resembles the most. The main issue in comparing covariance matrices is that they do not lie in a Euclidean space; for example, the covariance space is not closed under multiplication by negative scalars. For this reason, as suggested in (Porikli et al., 2005), we use Förstner's distance (Förstner and Moonen, 1999), which is based on generalized eigenvalues, to compute the similarity between two covariance matrices:
\[
\rho(C_i, C_j) = \sqrt{\sum_{k=1}^{d} \ln^2 \lambda_k(C_i, C_j)} \tag{1}
\]
where $d$ is the order of the matrices and $\lambda_k(C_i, C_j)$ are the generalized eigenvalues of the covariance matrices $C_i$ and $C_j$, computed from
\[
\lambda_k C_i x_k - C_j x_k = 0, \qquad k = 1, \ldots, d \tag{2}
\]
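A minimal sketch of Eq. (1) in Python, assuming SciPy. Note that `scipy.linalg.eigvalsh(Ci, Cj)` solves $C_i x = \lambda C_j x$, whose eigenvalues are the reciprocals of those defined by Eq. (2); this leaves the distance unchanged, since $\ln^2(1/\lambda) = \ln^2 \lambda$, and also makes it symmetric in $C_i$ and $C_j$:

```python
import numpy as np
from scipy.linalg import eigvalsh

def forstner_distance(Ci, Cj):
    """Forstner's generalized-eigenvalue distance between two covariance
    matrices, Eq. (1). Assumes both matrices are symmetric positive
    definite (in practice a small ridge can be added to guarantee this)."""
    lams = eigvalsh(Ci, Cj)  # generalized eigenvalues of the (Ci, Cj) pencil
    return np.sqrt(np.sum(np.log(lams) ** 2))
```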
The model of each tracked object is then computed as a mean, based on Lie algebra (Porikli et al., 2005), of the covariance matrices corresponding to the most recent detections of that object.
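As a sketch, one common way to realize such a mean is the log-Euclidean average, i.e., averaging the matrix logarithms and mapping back; the paper's exact update rule, following (Porikli et al., 2005), may differ:

```python
import numpy as np
from scipy.linalg import expm, logm

def covariance_mean(matrices):
    """Log-Euclidean average of SPD matrices: map each matrix to the
    tangent space via the matrix logarithm, average there, and map back
    with the matrix exponential. A common stand-in for the Lie-algebra
    mean; not necessarily the paper's exact update."""
    logs = [logm(C) for C in matrices]
    mean_log = sum(logs) / len(logs)
    # logm may introduce tiny imaginary parts numerically; drop them.
    return np.real(expm(mean_log))
```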
To deal with occlusions, the algorithm handles the temporary loss of tracked objects by keeping, for each of them, a counter (TTL) of how many frames it has been missing; when this counter reaches a user-defined value (for 5-fps videos, the best value, obtained empirically, was 6), the object is considered lost and discarded.
To decide whether a detected object is a feasible candidate as the new appearance of a currently tracked object, we check whether the detected object's region overlaps, at least partially, with the tracked object's search area, which by default is equal to the bounding box of that object's latest appearance.
To manage the temporary loss of an object, and the fact that while the object has not been detected it might have moved away from its previous location, we modify the search area, expanding it proportionally to the number of frames for which the object has been missing.
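A minimal sketch of this gating logic; the per-missed-frame expansion factor `GROWTH` is a hypothetical placeholder, since the paper does not state the proportionality constant:

```python
from dataclasses import dataclass

MAX_TTL = 6    # empirically best for 5-fps videos (see text)
GROWTH = 0.2   # hypothetical per-missed-frame expansion factor

@dataclass
class Track:
    bbox: tuple   # (x, y, w, h) of the latest appearance
    ttl: int = 0  # frames since the last detection

def search_area(track):
    """Bounding box of the latest appearance, expanded proportionally to
    the number of frames the object has been missing."""
    x, y, w, h = track.bbox
    pad_w, pad_h = GROWTH * track.ttl * w, GROWTH * track.ttl * h
    return (x - pad_w, y - pad_h, w + 2 * pad_w, h + 2 * pad_h)

def overlaps(a, b):
    """True if boxes a and b intersect at least partially."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

# A detection is a feasible candidate for a track when its region overlaps
# the track's search area; a track whose ttl exceeds MAX_TTL is discarded.
```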