Vehicle Tracking and Origin-destination Counting System for Urban Environment
Jean Carlo Mendes, Andrea Gomes Campos Bianchi and Álvaro R. Pereira Júnior
Departamento de Computação, Universidade Federal de Ouro Preto, Ouro Preto, Minas Gerais, Brazil
Keywords: Tracking, Optical Flow, Vehicle Counting.
Abstract: Automatic counting of vehicles and estimation of origin-destination tables have become potential applications for traffic surveillance in urban areas. In this work we propose an alternative to Optical Flow tracking to segment and track vehicles whose scale/size varies during movement, known as the adaptive size tracking problem. The performance of our proposed framework has been evaluated on both public and private data sets. We show that our approach produces better origin-destination tables for urban traffic than the Optical Flow method used as a baseline.
1 INTRODUCTION
The increasing availability of traffic surveillance footage and high-performance video processing hardware opens up exciting possibilities for traffic analysis with computer vision techniques (Buch et al., 2011). This availability has been one of the main issues in video capturing, because the sheer amount of data makes manual analysis unworkable.
Some authors report studies using tag-based vehicle information, such as the Global Positioning System (GPS) (Fleischer et al., 2012) or RFID tags (Fawzi M. Al-Naima, 2012). However, tagged tracking presents privacy-related problems, since personal identification is possible, and it requires the installation of sensors on vehicles. Computer vision methodologies, on the other hand, provide anonymous vehicle tracking, avoiding problems concerning privacy and sensor installation.
Computer vision systems are very attractive for this purpose, as their cost is low compared to other methodologies. Although several manuscripts have been devoted to presenting systems for tracking vehicles in a scene, there are still many challenges involved in the whole process.
Urban traffic is more challenging than road traffic. Besides data overload and illumination conditions, urban traffic presents a higher density of vehicles, frequent total or partial occlusions, and environmental variation in image capturing. In Figure 1 we show sample frames of our urban environment.
Many authors have suggested solutions for such traffic situations, including the automatic detection and tracking of vehicles in urban traffic, as well as vehicle turning movement counts and the generation of origin-destination trip tables.
In this paper, we present an automatic vehicle tracking and turning movement counting system for urban environments. In our approach, multiple moving objects are initially segmented from the background using a background-subtraction technique (Stauffer and Grimson, 1999). Subsequently, each segmented region receives a label. Tracking then begins using a modified version of the Optical Flow approach. The tracking methodology was modified to account for vehicle size variation during movement, since Optical Flow does not deal with variation in size. Lastly, moving objects are tracked and an origin-destination table is generated and compared with the ground truth. The tracking approaches were tested on real video scenes of urban traffic under different light conditions, poor image quality, intense traffic, presence of static foreground objects, vehicles of different categories, and occlusion.
The paper is organized as follows. We begin with a discussion of related work in Section 2. In Section 3, we present the proposed methodology for tracking vehicles in urban environments, from ROI initialization, background extraction, and blob analysis to the strategy for computing a modified Optical Flow tracking for vehicles. Section 4 provides experimental details of vehicle turning movement counts for two different approaches, with a description of the video database in Section 4.1. Section 5 concludes with a summary of our research and suggests possible future work.
Figure 1: Video frames of the urban environment: (a) high traffic; (b) occlusion. Also note static foreground objects, size variation due to movement, and poor image quality.
2 RELATED WORK
There is a small number of methods designed to automatically determine the origin and destination of vehicles in urban environments. In (Lee and Baik, 2006) the authors present a method based on turning movements, where the vehicle's trajectory is obtained by using fish-eye lenses to cover a wide area; the vehicle is then tracked using a Bayesian tracker combined with a particle filtering approach.
An urban traffic tracker that uses a combination of blob analysis and feature tracking was presented in (Jodoin et al., 2014), where the authors extract a blob and track it along the scene.
A simple technique that performs a frame difference between the current frame and a background image (the scene without vehicles) has also been employed (Han and Zhang, 2008). Although this technique presents good performance, it is not suitable for environments with frequently changing lighting conditions.
A modified version of the KLT (Kanade–Lucas–Tomasi) feature tracker is shown in (Xue et al., 2012), where the authors constructed a framework to track multiple objects in a single scene.
In (Roberts et al., 2008) the authors present a solution for urban environments using the Harris corner detector combined with a prediction algorithm to track points along video frames.
A simple blob analysis was used in (Chen et al., 2007) to extract vehicle features such as area, perimeter, aspect ratio, and dispersiveness. These features were used to segment and count vehicles.
Some methods are based on freeway environments, where problems like occlusion and stationary vehicles are minimized. In (Peñate Sánchez et al., 2012) the authors presented a method using an adaptive background and a probability function to minimize segmentation errors in freeway environments.
A review of state-of-the-art computer vision techniques for traffic video is presented in (Buch et al., 2011).
3 URBAN ENVIRONMENT
MOTION DETECTION
In this section we present a method for vehicle tracking and origin-destination counting in intense traffic. Figure 2 presents the components of the methodology.
Algorithm 1 presents the method in pseudo-code in order to highlight the key functions required for an effective characterization of intense traffic. The ComputeForeground procedure initializes our foreground detector with the first 1800 frames, which correspond to one minute of video. After this initial step, the system analyzes each frame, updating the positions of the detected blobs with the optical flow technique (performed by the updateTracksOpticalFlow function) and updating a tracking list that contains the blobs' tracked positions. If a blob is not visible for a limited number of frames, or if its optical flow points are no longer visible, it is removed from the list. The limited number of frames is defined by a threshold. After these steps, the system detects possible new blobs entering the scene, obtains their features by detecting minimum-eigenvalue features, and verifies that each object is actually new by comparing it with previously detected tracks.
VehicleTrackingandOrigin-destinationCountingSystemforUrbanEnvironment
601
Algorithm 1: Method algorithm.
1: procedure TRACKING(videoFile)
2:   procedure COMPUTEFOREGROUND
3:     for integer i = 0; i < numFramesDetector; i++ do
4:       frame = readNextFrame()
5:       frameF = updateForeground(frame)
6:     end for
7:     return frameForeground
8:   end procedure
9:   while videoHasFrame do
10:    frame = readNextFrame()
11:    updateTracksOpticalFlow()
12:    deleteLostTracks()
13:    objects = detectObjects(frame)
14:    for each object obj in objects do
15:      points = detectMinEigenFeatures(obj)
16:      result = checkIfObjectIsNew(points)
17:      updateObjectPosition()
18:      checkObjectOverROI()
19:      computeResult()
20:    end for
21:  end while
22: end procedure
3.1 Initialize Region of Interest
When the system is initialized, it presents the first video frame on screen and asks the user to mark all regions of interest (ROIs). Each ROI represents a vehicle entrance or exit region. This information will be used by the tracker to determine when a vehicle is counted. For each ROI, the user needs to draw a polygon, create an ID, and inform the type of the ROI ('in' or 'out'). When a vehicle passes over the delimited regions, the system uses the vehicle ID and the origin/target region IDs to count it and determine its route. This interface also provides a mechanism that allows the user to create and save a work-region mask. All pixels of the current frame loaded into the system that fall outside the work-region mask are discarded. This step increases the system performance and decreases the number of detected blobs outside the regions of interest.
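As an illustration only, the ROI handling described above could be sketched with OpenCV as follows; the record layout, function names, and polygon coordinates are our own assumptions, not the original implementation.

import cv2
import numpy as np

# Hypothetical ROI records: a user-drawn polygon, an ID, and a type ('in'/'out').
rois = [
    {"id": "A_in",  "type": "in",  "poly": np.array([[100, 200], [220, 200], [220, 320], [100, 320]], np.int32)},
    {"id": "A_out", "type": "out", "poly": np.array([[500, 100], [640, 100], [640, 240], [500, 240]], np.int32)},
]

def build_work_region_mask(frame_shape, rois):
    # Pixels outside the union of the user-defined polygons are discarded later.
    mask = np.zeros(frame_shape[:2], dtype=np.uint8)
    for roi in rois:
        cv2.fillPoly(mask, [roi["poly"]], 255)
    return mask

def roi_hit(centroid, rois):
    # Return the ID and type of the first ROI containing the blob centroid.
    for roi in rois:
        if cv2.pointPolygonTest(roi["poly"], (float(centroid[0]), float(centroid[1])), False) >= 0:
            return roi["id"], roi["type"]
    return None

Masking each frame with the work-region mask (for example via cv2.bitwise_and(frame, frame, mask=mask)) reproduces the pixel-discarding step described above.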
3.2 Foreground Detection
Foreground detection is a key step in segmentation and tracking systems, as it determines which objects are moving and which are stationary. This task becomes hard when the camera is unstable or illumination conditions change.
Figure 2: System architecture represented in block diagrams.

One basic approach to background detection is the difference between the current frame and an initial frame without any objects. This method is very fast, but it requires initialization and is not robust to light changes. A better solution is a temporal difference method that considers a number of time-sequential frames to calculate the difference. In (Jinglei and Zhengguang, 2007) the authors used the difference between three sequential frames to calculate the ratio of changing pixels over the whole difference image and compared it to a threshold to determine whether a region belongs to the background or not.
The videos used in our work were obtained by a camera mounted over a traffic light pole that swings with the wind. To minimize the effects of camera shake and light changes on the foreground detector, an implementation of Gaussian Mixture Models (Stauffer and Grimson, 1999; Friedman and Russell, 1997) was applied. In this approach, the background model is adapted at each subsequent frame by modeling each pixel with a mixture of Gaussians. By analyzing the variances of the Gaussians in the mixture for a pixel, it is possible to classify that pixel as background or foreground.
The first one thousand eight hundred frames (the first minute of video) are used to train the foreground detector. This large number is due to the heavy traffic and camera instability; by using a large number of frames we reduce the number of invalid artifacts detected by the foreground extractor. The foreground model is updated at each frame computation during the training phase and is then used as input for the blob analysis step. In Figure 3 we show an output sample of the system's foreground detector.
Figure 3: Foreground detector. Sample frame (a) and foreground mask (b).
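A minimal sketch of this training phase, assuming OpenCV's MOG2 subtractor as a stand-in for the Stauffer-Grimson mixture model; the file name and parameter values are illustrative, not the authors' settings.

import cv2

cap = cv2.VideoCapture("traffic.avi")  # hypothetical input video
subtractor = cv2.createBackgroundSubtractorMOG2(history=1800, detectShadows=True)

# Training phase: feed the first 1800 frames (one minute at 30 fps) so the
# mixture model absorbs camera sway and lighting before blobs are trusted.
for _ in range(1800):
    ok, frame = cap.read()
    if not ok:
        break
    subtractor.apply(frame)

# After training, each new frame yields a foreground mask for blob analysis.
ok, frame = cap.read()
fg_mask = subtractor.apply(frame)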
3.3 Blob Analysis
For each object detected by the foreground computation, the system proceeds with a blob analysis that consists in applying morphological operations (median filter, closing, and hole filling) over the foreground mask. All objects smaller than a threshold (calculated based on the minimum blob area of vehicles found in the frames) are discarded. For blobs regarded as potential vehicles, the system computes the area, centroid, and bounding box coordinates, as well as a blob label. The blob bounding boxes are used in the next steps to calculate the regional optical flow and the object movement.
The object aspect ratio (width-to-height relation) is computed in order to verify whether its value lies within an acceptable range for the object to be considered a potential vehicle. If it is a potential vehicle, the system creates a new ID for the object and inserts it in a tracker list, used to keep track of the vehicle along the video frames.
The foreground detector and blob analysis fail when two vehicles move close to each other, creating only one blob for two objects. This issue is solved by applying a K-means segmentation (with k equal to 2) to each such blob, which creates two pixel groups. By comparing the distance between the centroids of these two groups, the system can decide whether the blob represents a single vehicle or contains two vehicles moving together. On the other hand, in some cases the foreground detector/blob analysis fails by separating one object into two blobs. To solve this problem, when more than one blob is found by the foreground detector in the same frame, the system calculates and compares the distances between all pairs of blob centroids. If two centroids are close enough (determined by a threshold), the system considers them to represent the same object (and joins them into a single blob). In this case, the blob with the older ID keeps its properties, such as the blob ID and centroid path.
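The blob pipeline above might look as follows; the morphology kernel, area threshold, and the use of connectedComponentsWithStats and cv2.kmeans are our assumptions for illustration, not the original code.

import cv2
import numpy as np

MIN_AREA = 400  # illustrative minimum blob area, not the authors' threshold

def extract_blobs(fg_mask):
    # Morphological clean-up: median filter plus closing to fill small holes.
    clean = cv2.medianBlur(fg_mask, 5)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    clean = cv2.morphologyEx(clean, cv2.MORPH_CLOSE, kernel)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(clean)
    blobs = []
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= MIN_AREA:
            x, y, w, h = stats[i, :4]
            blobs.append({"bbox": (x, y, w, h), "centroid": tuple(centroids[i])})
    return blobs

def cluster_separation(fg_mask, bbox):
    # K-means with k=2 over the blob's foreground pixels; a large distance
    # between the two cluster centers suggests two vehicles in one blob.
    x, y, w, h = bbox
    pts = np.column_stack(np.nonzero(fg_mask[y:y+h, x:x+w])).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, _, centers = cv2.kmeans(pts, 2, None, criteria, 3, cv2.KMEANS_PP_CENTERS)
    return float(np.linalg.norm(centers[0] - centers[1]))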
3.4 Modified Optical Flow Computation
and Tracking
An optical flow computation is performed to determine the movement of the blobs detected in the previous step. The bounding boxes of the detected blobs are used as region delimiters to find feature points using the minimum eigenvalue criterion presented in (Shi and Tomasi, 1994). These feature points are compared with the values calculated in previous frames to determine whether the object has already been detected and whether it is moving. An example of the feature points of a moving object is shown as red crosses in Figure 4.
Figure 4: Optical flow - Corner features.
Our camera was mounted at one corner of the intersection. This means that vehicle sizes vary strongly along the scene (vehicles far from the camera position are clearly smaller than vehicles near the camera).
Sparse optical flow algorithms, such as Lucas-Kanade, calculate pixel displacements between frames assuming local smoothness. If an object changes size along the frames, optical flow fails, since some points disappear (when the object moves away from the camera) and some appear (when the object moves toward the camera). Reacquisition of the object's points is therefore necessary along the video frames. This is performed by comparing the optical flow points from the previous frame with the object detection (foreground detector result) in the current frame.
If the result of the optical flow computation exceeds a percentage of the detected feature points (determined by a threshold equal to 55%), the object tracker is updated by computing the new point positions and the new centroid position. This task is necessary because, as an object's position changes along the video frames, its size and aspect ratio also change, causing points to be lost.
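A hedged sketch of this per-track update with OpenCV's pyramidal Lucas-Kanade; the 55% survival ratio comes from the text, while the track structure, window size, and function names are our own assumptions.

import cv2
import numpy as np

SURVIVAL_THRESHOLD = 0.55  # at least 55% of the feature points must be re-found

def update_track(prev_gray, gray, track):
    # Points were seeded with cv2.goodFeaturesToTrack (minimum-eigenvalue
    # features) inside the blob's bounding box; propagate them with LK flow.
    new_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, track["points"], None, winSize=(15, 15), maxLevel=3)
    kept = new_pts[status.ravel() == 1]
    if len(kept) >= SURVIVAL_THRESHOLD * len(track["points"]):
        track["points"] = kept.reshape(-1, 1, 2)
        track["centroid"] = tuple(kept.reshape(-1, 2).mean(axis=0))
        track["invisible"] = 0
        track["age"] += 1  # number of frames this vehicle has been tracked
    else:
        # Too few survivors: mark the track invisible; the caller discards it
        # after eight consecutive invisible frames, as described below.
        track["invisible"] += 1
    return track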
A vector containing the centroid path is saved to show the vehicle's route to the user, as seen in Figure 5. At this moment, the vehicle's age is incremented to record the number of frames in which this vehicle has been tracked, as shown in Figure 6.
VehicleTrackingandOrigin-destinationCountingSystemforUrbanEnvironment
603
If the result of the optical flow computation is lower than the threshold, the object is considered not visible in the current frame and its property called Consecutive Invisible Frames is incremented. When an object is not visible for more than eight consecutive frames, it is discarded. Considering the average vehicle size and the average object speed, eight frames correspond to a distance equal to one vehicle length. By discarding objects invisible for eight frames, the system avoids recognizing a new vehicle at the same position as the old, invisible one.
Figure 5: Centroid Path/Route computed by tracking
methodology.
Figure 6: Vehicle ID and age computed by tracking method-
ology.
3.5 Creating New Objects
Because vehicles are entering and leaving the video area all the time, the system needs to detect possible new objects and decide whether each is a new detection or an existing moving object detected in previous frames. To perform this task, after detecting the foreground, the system takes all blobs in the current frame and compares their positions to the positions of blobs detected in previous frames (the number of previous frames used in this comparison is determined by a threshold). If the centroid of a new detection is close enough (determined by a threshold) to some object detected in previous frames, they are considered the same object and its properties are updated. Otherwise, a new object is created and receives a new ID. This new object is included in the object detection vector and will be tracked and analyzed again in the next frame.
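One plausible reading of this association step, with hypothetical thresholds and field names:

import numpy as np

MATCH_DISTANCE = 30.0  # pixels; illustrative centroid-proximity threshold
HISTORY_FRAMES = 5     # how far back to search; also an assumed value

def associate_or_create(new_blob, tracks, next_id):
    # Compare the new blob's centroid with recently seen track centroids;
    # reuse the closest track if near enough, otherwise create a new track.
    best, best_dist = None, MATCH_DISTANCE
    for track in tracks:
        if track["invisible"] > HISTORY_FRAMES:
            continue
        d = float(np.linalg.norm(np.subtract(new_blob["centroid"], track["centroid"])))
        if d < best_dist:
            best, best_dist = track, d
    if best is not None:
        best["centroid"] = new_blob["centroid"]  # same object: update in place
        return best, next_id
    track = {"id": next_id, "centroid": new_blob["centroid"], "invisible": 0, "age": 0}
    tracks.append(track)
    return track, next_id + 1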
3.6 Region of Interest Analysis
When an object's centroid passes over a region of interest, the ROI ID and the region type (in/out) are saved into the object's properties. This ID will be used along the vehicle's lifetime over the next frames to determine its route. At this moment, the object ID is stored in a result set to prevent duplicate counting. This is necessary because a ROI is a polygon (not a single line) and an object will pass over the region in multiple sequential frames. Therefore, when an object passes over a ROI, its ID is compared with all IDs that have already passed over that polygon region, and it is counted only on its first detection over the region. Figure 7 shows a frame with a user-defined ROI.
Figure 7: Frame ROI.
3.7 Compute Results
When an object passes over a ROI of type 'out', its route is computed based on the origin and destination region identifications (IDs) stored in its properties. If the origin-destination pair is already present in the result set, its value is incremented; otherwise, a new result entry is created. This step is executed over all video frames, and when the end of the file is reached, the system shows a table containing each origin-destination pair with the total vehicle count.
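The result bookkeeping reduces to a few lines; the dictionary layout below is our illustration of the rule that each vehicle ID is counted only once.

from collections import defaultdict

od_table = defaultdict(int)  # (origin ROI ID, destination ROI ID) -> count
counted_ids = set()          # vehicle IDs already tallied, to avoid duplicates

def compute_result(track):
    # Called when a track's centroid crosses a ROI of type 'out'.
    if track["id"] in counted_ids or track.get("origin") is None:
        return
    od_table[(track["origin"], track["destination"])] += 1
    counted_ids.add(track["id"])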
4 EXPERIMENTS
In this section we present the dataset used and the ex-
perimental results.
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
604
4.1 Video Database
Three video sequences were taken at an urban intersection at three different times of a given day. All sequences were captured by the same camera at 30 frames per second with a frame size of 720x480. The first and second sequences were taken in daylight conditions and the third sequence at the end of the afternoon. The video length and total vehicle count of each sequence can be seen in Table 1, where Vehic. (A) and Vehic. (B) are the ground-truth quantities of vehicles that pass over each route.
Table 1: Video database details.

Seq.  Len.    Frames  Vehic. (A)  Vehic. (B)
01    59:52s  107669  69          112
02    59:59s  107874  81          126
03    60:00s  107893  128         101
In Figure 8 we show some examples of video
frames taken under a range of realistic conditions.
Figure 8: Example frames from the video sequences: (a) Sequence 01 and (b) Sequence 02, sunny conditions with shadows and occlusions; (c) Sequence 03, end of afternoon with a dark scene.
In Figure 9 we show the route identifications for the captured videos, and in Figure 10 we show the route identifications for the StMarc video, a public video used in (Jodoin et al., 2014). Vehicles entering the video frame through route B are more distant from the camera position and are occluded by a tree, which causes difficulties for foreground detection and, consequently, object tracking.
All video sequences are available to the scientific community at 4shared.com/folder/X9pXuxKD/TrafficDatabase.html, together with the ground-truth table for five-minute segments of each video. This will help future related work to perform tests and compare results.
4.2 Results
Experiments were carried out using two approaches. The first one used an original implementation of optical flow with Lucas-Kanade features; the results from this approach were used as the baseline. The second experiment used our modified version of optical flow. The results of both can be seen in Table 2, where Prec., Recall, and F1 respectively stand for Precision, Recall, and F1-score of the original optical flow implementation, and Prec. (Mod.), Recall (Mod.), and F1 (Mod.) stand for Precision, Recall, and F1-score of the modified optical flow tracking method. The column 'Gain (F1)' shows the gain in F1-score of the modified optical flow implementation over the baseline.
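For reference, the reported scores follow the standard definitions, and the 'Gain (F1)' column is consistent with the relative F1 improvement; this interpretation is ours, checked against the table:

$$\mathrm{F1} = \frac{2\,\mathrm{Prec}\cdot\mathrm{Recall}}{\mathrm{Prec}+\mathrm{Recall}}, \qquad \mathrm{Gain(F1)} = \frac{\mathrm{F1_{Mod}}-\mathrm{F1}}{\mathrm{F1}}\times 100\%$$

For S01/A-A, for example, (73.39 - 53.20)/53.20 is approximately 37.95%, matching the tabulated 37.98% up to rounding.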
Route A-A presents the best precision results in all three sequences, for the modified optical flow as well. Recall and F1 also have high values, indicating that the returned detections are relevant. Route B-B presents high precision values but lower recall. Such behavior may be correlated with the video complexity with regard to occlusions and illumination. Note that, for all sequences and both routes, the modified version achieves better results than the classical version of optical flow.
The same experiment was performed on the public StMarc video sequence; the results are presented in Table 3.
Figure 9: Route IDs.
Figure 10: Route IDs - StMarc.
VehicleTrackingandOrigin-destinationCountingSystemforUrbanEnvironment
605
Table 2: Results table - Original and Modified Optical Flow.
Seq./Route Prec. Recall F1 Prec. (Mod.) Recall (Mod.) F1 (Mod.) Gain (F1)
S01/A-A 59.52% 48.10% 53.20% 71.43% 75.47% 73.39% 37.98%
S01/B-B 73.33% 33.00% 45.51% 86.95% 58.25% 69.77% 53.28%
S02/A-A 70.83% 50.07% 59.13% 82.81% 75.71% 79.10% 33.78%
S02/B-B 62.10% 28.97% 39.49% 70.67% 50.96% 59.21% 49.95%
S03/A-A 62.38% 48.50% 54.54% 74.51% 74.51% 74.51% 36.60%
S03/B-B 58.97% 27.06% 37.09% 67.24% 47.56% 55.71% 50.19%
Table 3: Results table - StMarc video.

Route      Ground truth  System result  Accuracy
Route A-A  4             4              100.0%
Route B-B  4             4              100.0%
5 CONCLUSION
We presented a hybrid algorithm that combines optical flow, to track points along the video frames, with foreground detection, to reacquire points and update the tracking information as object sizes vary in the sequence. This mixed solution allows us to minimize problems caused by changes in object size along the video frames.
It is clear that good results with the Lucas-Kanade optical flow approach depend highly on the quality of the initial object segmentation. If the foreground detector fails, all subsequent steps will also fail, because pixel displacements will be calculated at the wrong positions. By using a hybrid solution that combines the optical flow results with the foreground extraction, the error is minimized and the system accuracy is increased.
ACKNOWLEDGEMENTS
This work was supported by Fapemig, CNPq and
Capes.
REFERENCES
Buch, N., Velastin, S., and Orwell, J. (2011). A review
of computer vision techniques for the analysis of ur-
ban traffic. Intelligent Transportation Systems, IEEE
Transactions on, 12(3):920–939.
Chen, T.-H., Lin, Y.-F., and Chen, T.-Y. (2007). Intelli-
gent vehicle counting method based on blob analysis
in traffic surveillance. In Innovative Computing, In-
formation and Control, 2007. ICICIC ’07. Second In-
ternational Conference on, pages 238–238.
Fawzi M. Al-Naima, H. A. H. (2012). Vehicle traffic congestion estimation based on RFID. volume 4:30.
Fleischer, P., Nelson, A., Sowah, R., and Bremang, A.
(2012). Design and development of gps/gsm based ve-
hicle tracking and alert system for commercial inter-
city buses. In Adaptive Science Technology (ICAST),
2012 IEEE 4th International Conference on, pages 1–
6.
Friedman, N. and Russell, S. (1997). Image segmentation
in video sequences: A probabilistic approach. In Pro-
ceedings of the Thirteenth Conference on Uncertainty
in Artificial Intelligence, UAI’97, pages 175–181, San
Francisco, CA, USA. Morgan Kaufmann Publishers
Inc.
Han, C. and Zhang, Q. (2008). Real-time detection of vehi-
cles for advanced traffic signal control. In Computer
and Electrical Engineering, 2008. ICCEE 2008. Inter-
national Conference on, pages 245–249.
Jinglei, Z. and Zhengguang, L. (2007). A vision-based road
surveillance system using improved background sub-
traction and region growing approach. In Software
Engineering, Artificial Intelligence, Networking, and
Parallel/Distributed Computing, 2007. SNPD 2007.
Eighth ACIS International Conference on, volume 3,
pages 819–822.
Jodoin, J.-P., Bilodeau, G.-A., and Saunier, N. (2014). Ur-
ban tracker: Multiple object tracking in urban mixed
traffic. In Applications of Computer Vision (WACV),
2014 IEEE Winter Conference on, pages 885–892.
Lee, S.-M. and Baik, H. (2006). Origin-destination (o-d)
trip table estimation using traffic movement counts
from vehicle tracking system at intersection. In IEEE
Industrial Electronics, IECON 2006 - 32nd Annual
Conference on, pages 3332–3337.
Peñate Sánchez, A., Quesada-Arencibia, A., and Travieso González, C. (2012). Real time vehicle recognition: A novel method for road detection. In Moreno-Díaz, R., Pichler, F., and Quesada-Arencibia, A., editors, Computer Aided Systems Theory - EUROCAST 2011, volume 6928 of Lecture Notes in Computer Science, pages 359-364. Springer Berlin Heidelberg.
Roberts, W., Watkins, L., Wu, D., and Li, J. (2008). Vehicle
tracking for urban surveillance. volume 6970, pages
69700U–69700U–8.
Shi, J. and Tomasi, C. (1994). Good features to track. In
Computer Vision and Pattern Recognition, 1994. Pro-
ceedings CVPR ’94., 1994 IEEE Computer Society
Conference on, pages 593–600.
Stauffer, C. and Grimson, W. E. L. (1999). Adaptive background mixture models for real-time tracking. In Computer Vision and Pattern Recognition, 1999. IEEE Computer Society Conference on, volume 2, pages 246-252.
Xue, K., Vela, P., Liu, Y., and Wang, Y. (2012). A modi-
fied klt multiple objects tracking framework based on
global segmentation and adaptive template. In Pattern
Recognition (ICPR), 2012 21st International Confer-
ence on, pages 3561–3564.
VehicleTrackingandOrigin-destinationCountingSystemforUrbanEnvironment
607