Lightweight Computer Vision Methods
for Traffic Flow Monitoring on Low Power Embedded Sensors
Massimo Magrini, Davide Moroni, Gabriele Pieri and Ovidio Salvetti
SILab, Institute of Information Science and Technologies - CNR, Via G. Moruzzi 1, 56124, Pisa, Italy
Keywords: Real-time Imaging, Embedded Systems, Intelligent Transport Systems (ITS).
Abstract: Nowadays, the pervasive monitoring of traffic flows in urban environments is a topic of great relevance, since the information that can be gathered may be exploited for a more efficient and sustainable mobility. In this paper, we address the use of smart cameras for assessing the level of service of roads and for detecting possible congestion early. In particular, we devise a lightweight method that is suitable for use on low-power and low-cost sensors, resulting in a scalable and sustainable approach to flow monitoring over large areas. We also present the current prototype of an ad hoc device we designed and report experimental results obtained during a field test.
1 INTRODUCTION
Thanks to computer vision techniques, fully automatic video and image analysis from traffic monitoring cameras is a fast-emerging field with a growing impact on Intelligent Transport Systems (ITS).
Indeed, the decreasing cost of hardware and, therefore, the increasing deployment of cameras and embedded systems have opened a wide application field for video analytics in both urban and highway scenarios. It can be envisaged that several monitoring objectives, such as congestion, traffic rule violation, and vehicle interaction, can be targeted using cameras that were originally installed for human operators (Buch et al., 2011).
On highways, systems for the detection and classification of vehicles have successfully used classical visual surveillance techniques, such as background estimation and motion tracking, for some time. Existing methodologies now perform well even in inclement weather and are operational 24/7. Conversely, the urban domain is less explored and more challenging, owing to higher traffic density, lower camera angles that lead to a high degree of occlusion, and a greater variety of street users. Methods from object categorization and 3-D modelling have inspired more advanced techniques to tackle these challenges. In addition, due to scalability issues and cost-effectiveness, urban traffic monitoring cannot be based exclusively on high-end acquisition and computing platforms; the emergence of embedded technologies and pervasive computing may alleviate this issue: it is indeed challenging yet definitely important to deploy pervasive and untethered technologies, such as Wireless Sensor Networks (WSN), for addressing urban traffic monitoring.
Based on these considerations, the aim of this paper is to introduce a scalable technology for supporting ITS-related problems in urban scenarios; in particular, we propose an embedded solution for the realization of a smart camera that can be used to detect, understand and analyse traffic-related situations and events thanks to on-board vision logic. Indeed, to suitably tackle scalability issues in the urban environment, we propose the use of a distributed, pervasive system consisting of a Smart Camera Network (SCN), a special kind of WSN in which each node is equipped with an image-sensing device. Clearly, gathering information from a network of scattered cameras, possibly covering a large area, is a common feature of many video surveillance and ambient intelligence systems. However, most classical solutions are based on a centralized approach: only sensing is distributed, while the actual video processing is accomplished in a single unit. In those configurations, the video streams from multiple cameras are encoded and conveyed (sometimes thanks to multiplexing technologies) to a central processing unit, which
decodes the streams and performs processing on each of them. With respect to those configurations, the need to introduce distributed intelligent systems is motivated by several requirements, namely (Remagnino et al., 2004):
Speed: in-network distributed processing is inherently parallel; in addition, the specialization of modules reduces the computational burden at the higher levels of the network. In this way, the role of the central server is relieved, and it might actually be omitted in a fully distributed architecture.
Bandwidth: in-node processing reduces the amount of transmitted data, since only information-rich parameters about the observed scene are transferred, rather than the redundant video data stream.
Redundancy: a distributed system may be re-configured in case of failure of some of its components, still keeping the overall functionality.
Autonomy: each node may process the images asynchronously and may react autonomously to the perceived changes in the scene.
In particular, these issues suggest moving part of the intelligence towards the camera nodes. In these nodes, artificial intelligence and computer vision algorithms are able to provide autonomy and adaptation to internal conditions (e.g. hardware and software failures) as well as to external conditions (e.g. changes in weather and lighting). It can be stated that in a SCN the nodes are not merely collectors of information from the sensors: they have to distill significant and compact descriptors of the scene from the bulky raw data contained in a video stream.
This naturally requires the solution of computer
vision problems such as change detection in image
sequences, object detection, object recognition,
tracking, and image fusion for multi-view analysis.
Indeed, no understanding of a scene may be accomplished without dealing with some of the above tasks. As is well known, for each of these problems there is an extensive corpus of already implemented methods provided by the computer vision and video surveillance communities. However, most of the techniques currently available are not suitable for use in a SCN, due to the high computational complexity of the algorithms or to excessively demanding memory requirements. Therefore, ad hoc algorithms should be designed for SCN, as we will explore in the next sections. In particular, after describing the possible role of SCN in urban scenarios, we present in Section 3 a sample application, namely the estimation of vehicular flows on a road, proposing a lightweight method suitable for embedded systems. Then, we introduce the sensor prototype we designed and developed in Section 4. In Section 5 we report the experimental results gathered during a field test, and we finally conclude the paper in Section 6.
2 SCN IN URBAN SCENARIOS
According to (Buch et al., 2011), there has been an
increased scope for the automatic analysis of urban
traffic activity. This is partially due to the growing number of cameras and other sensors, enhanced infrastructure and the consequent accessibility of data.
In addition, the advances in analytical techniques for
processing video streams together with increased
computing power have enabled new applications in
ITS. Indeed, video cameras have been deployed for
a long time for traffic and other monitoring
purposes, because they provide a rich information
source for human understanding. Video analytics
may now provide added value to cameras by
automatically extracting relevant information. This
way, computer vision and video analytics become
increasingly important for ITS.
In highway traffic scenarios, the use of cameras
is now widespread and existing commercial systems
have excellent performance. Cameras are tethered to ad hoc infrastructures, sometimes together with Variable Message Signs (VMS), Road Side Units (RSU) and other devices typical of the ITS domain. Traffic analysis is often performed remotely by using special broadband connections, encoding, multiplexing and transmission protocols to send the data to a central control room, where dedicated powerful hardware technologies are used to process multiple incoming video streams (Lopes et al., 2010). The usual monitoring scenario consists of the estimation of traffic flows, distinguished by lane and vehicle typology, together with more advanced analyses such as the detection of stopped vehicles, accidents and other anomalous events for safety, security and law enforcement purposes.
Conversely, traffic analysis in the urban environment appears to be much more challenging than on highways. In addition, several extra monitoring objectives can be supported, at least in principle, by the application of computer vision and pattern recognition techniques. For example, these include the detection of complex traffic violations (e.g. illegal turns, one-way streets, restricted lanes) (Guo et al., 2011; Wang et al., 2013), the identification of road users (e.g. vehicles, motorbikes and pedestrians) (Buch et al., 2010) and of their
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
664
interactions, understood as spatiotemporal relationships, either person-to-vehicle or vehicle-to-vehicle (Candamo et al., 2010). For these reasons, it is worthwhile to apply the wireless sensor network approach to the urban scenario.
Generally, we may identify four different scopes that
can be targeted thanks to video-surveillance based
systems, namely i) safety and security, ii) law
enforcement, iii) billing and iv) traffic monitoring
and management. Although in this paper we focus mostly on the latter, we give a brief overview of each of them.
Safety and security relate to the prevention and prompt notification both of traffic events proper and of roadside events typical of the urban environment. Law enforcement is based on the detection of unlawful acts and on their documentation to allow a fine to be issued. Besides well-known and established technologies, e.g. for traffic light violations, vision-based systems might allow for the identification of more complex behaviour, e.g. illegal turns or trespassing on a High Occupancy Vehicle (HOV) lane. Documentation of unlawful acts is usually performed by acquiring a number of images sufficient to represent the violation, combined with automatic number plate recognition (ANPR) for identifying the offending vehicle. ANPR is also a common component of video-based billing and tolling. Also in this case, there are a number of established technologies provided as commercial solutions by many vendors (Digital Recognition, 2014). A peculiarity of urban billing systems with respect to highways is the non-intrusiveness requirement: the normal vehicular flow cannot be altered, so free-flow tolling must be implemented. Technologies satisfying this requirement are already available and used in cities such as London, Stockholm and Singapore, but their cost prevents their massive deployment in medium-size or low-resource cities. Nevertheless, the availability of such billing technologies at a lower cost may pave the way to the collection of fine-grained analytics of vehicular flows, road usage and congestion, allowing for the implementation of adaptive Travel Demand Management (TDM) policies aimed at a more sustainable, effective and socially acceptable mobility in urban and metropolitan contexts.
Finally, traffic monitoring and management relates to the extraction of information from observed urban scenes that might be beneficial in several contexts. For instance, real-time vehicle counting might be used to assess the level of service on a road and to detect possible congestion. Such real-time information might then be used for traffic routing, either by providing suggestions directly to users (e.g. by VMS) or by letting a trip planner exploit these data to search for an optimal path. Finally, statistics on vehicular flows may be used to understand mobility patterns and help stakeholders improve urban mobility. Usually, vehicle counting is performed by inductive loops, which provide precise measurements and some vehicle classification. The major drawback of inductive loops is that they are very intrusive in the road surface and therefore require a rather long and expensive installation procedure. Furthermore, maintenance also requires intervention on the road pavement and is therefore not sustainable in most urban scenarios. Radar-based sensing systems are also used for vehicle counting and simple analytics, but in case of congestion their performance generally deteriorates. In recent years, there has been interest in video-based counting systems based on imaging devices, including embedded ones. Some solutions, such as (Traficam, 2014), are commercially available and provide vehicle counts over several lanes at an intersection. A version of Traficam working in the infrared spectrum is also available. Besides vehicle counting, traffic management can include the extraction of other flow parameters, e.g. discriminating the components of the flow generated by different vehicle classes (cars, trucks, buses, bikes and motorbikes) and assessing the transit speed of each detected vehicle.
From this brief survey of urban scenario applications, we might argue that pervasive technologies based on vision turn out to be of interest when i) there is some semantics to be understood that cannot be acquired solely on the basis of scalar sensors, ii) installing tethered technologies, such as intrusive sensors or high-end devices, is not possible or not sufficiently profitable, and iii) there is the need for a scalable architecture capable of covering a metropolitan area. Since computer vision is not application specific, an additional feature of a SCN is that it can be re-adapted to the changing urban environment and reconfigured even to support new scene understanding tasks simply by updating the vision logic hosted in each sensor. Conversely, scalar sensors (like inductive loops) and specific sensors like radar have no flexibility in providing information different from the one they were built for.
3 TRAFFIC FLOW ANALYSIS
In this Section, a sample ITS application based on computer vision over a SCN is reported. It regards the estimation of vehicular flows and is based on a lightweight computer vision pipeline that differs from the conventional one used on standard architectures.
More precisely, the analysis of traffic status and the estimation of the level of service are usually obtained by extracting information on the vehicular flows in terms of passed vehicles, their speed and typology. Conventional pipelines start with i) background subtraction and move forward to ii) vehicle detection, iii) vehicle classification, iv) vehicle tracking and v) final data extraction. On a SCN, instead, it is convenient to adopt a lightweight approach: only the data in Regions of Interest (RoI), where the presence of a vehicle is detected, are processed. On the basis of these detections, flow information is then derived without making explicit use of classical tracking algorithms.
3.1 Background Subtraction
More in detail, background subtraction is performed only on small quadrangular RoIs. Such a shape is sufficient for modelling physical rectangles under perspective skew. In this way, when only low vision angles are available (as is common in urban scenarios), it is possible to deal with a skewed scene even without performing direct image rectification, which can be computationally intensive on an embedded sensor. The quadrangular RoI can also be used to model lines on the image (i.e. a 1-pixel-thick line).
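As an illustration, a quadrangular RoI can be rasterized once, at configuration time, into a binary mask, so that per-frame processing only touches the pixels inside it. The following Python sketch is illustrative only, not the sensor firmware; it assumes numpy and OpenCV are available, and the frame size and corner coordinates are hypothetical:

    import numpy as np
    import cv2

    def make_roi_mask(frame_shape, corners):
        # Rasterize a quadrangular RoI, given by four corner points in
        # image coordinates, into a boolean mask. Since the mask already
        # follows the perspective skew, no per-frame rectification is needed.
        mask = np.zeros(frame_shape[:2], dtype=np.uint8)
        pts = np.array(corners, dtype=np.int32).reshape(-1, 1, 2)
        cv2.fillPoly(mask, [pts], 1)
        return mask.astype(bool)

    # Example: a skewed rectangle covering one lane in a 320x240 frame.
    roi_mask = make_roi_mask((240, 320),
                             [(60, 150), (180, 140), (200, 200), (40, 210)])
    roi_size = int(roi_mask.sum())  # total number of pixels in the RoI

A degenerate quadrangle whose corners lie on a single segment yields the 1-pixel-thick detection line mentioned above.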
On such RoIs, lightweight detection methods are used to classify a pixel as changed (in which case it is assigned to the foreground) or unchanged (in which case it is deemed to belong to the background). This decision is obtained by modelling the background, and several approaches are feasible. The simplest is straightforward frame differencing, in which the frame before the one being processed is taken as background: a pixel is considered changed if the frame difference value is bigger than a threshold. Frame differencing is one of the fastest methods but has some drawbacks in ITS applications; for instance, a pixel is considered changed twice: first when a vehicle enters and, second, when it exits from the pixel area. In addition, if a vehicle is homogeneous and is imaged in more than one frame, it might not be detected in the frames after the first. Another approach is given by a static background, taken as a fixed image without vehicles, possibly normalized to factor out illumination changes. Due to weather, shadow and light changes, the background should be updated to yield meaningful results in outdoor environments. However, strategies for background update might be complex; indeed, it should be guaranteed that the scene is free of vehicles when updating. To overcome these issues, algorithms featuring an adaptive background are used.
Indeed, this class of algorithms is the most robust for use in uncontrolled outdoor scenes. The background is constantly updated by fusing the old background model and the newly observed image. There are several ways of obtaining adaptation, with different levels of computational complexity. The simplest is to use an average image: the background is modelled as the average of the frames in a time window, computed online. A pixel is then considered changed if it differs by more than a threshold from the corresponding pixel in the average image; the threshold is uniform over all pixels. Instead of modelling just the average, it is possible to include the standard deviation of pixel intensities, thus using a statistical model of the background as a single Gaussian distribution. In this case, both the average and standard deviation images are computed by an online method on the basis of the frames already observed. In this way, instead of using a uniform threshold on the difference image, a constant threshold is used on the probability that the observed pixel is a sample drawn from the background distribution, which is modelled pixel by pixel as a Gaussian. Gaussian Mixture Models (GMM) are a generalization of the previous method: instead of modelling each pixel in the background image as a single Gaussian, a mixture of Gaussians is used. The number k of Gaussians in the mixture is a fixed parameter of the algorithm. When one of the Gaussians has a marginal contribution to the overall probability density function, it is disregarded and a new Gaussian is instantiated. GMM are known to be able to model changing backgrounds even in the presence of phenomena such as trembling shadows and tree foliage (Stauffer and Grimson, 1999); indeed, in those cases pixels clearly exhibit a multimodal distribution. However, GMM are computationally more intensive than a single Gaussian. Codebooks (Kim et al., 2004) are another adaptive background modelling technique, presenting computational advantages for real-time background modelling with respect to GMM. In this method, sample background values at each pixel are
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
666
quantized into codebooks, which represent a compressed form of the background model for a long image sequence. This allows capturing even complex structural background variation (e.g. due to shadows and trembling foliage) over a long period of time under limited memory.
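For illustration, the two extremes of this spectrum, frame differencing and the single-Gaussian adaptive model, can be sketched in a few lines of Python/numpy. This is a sketch only, not the code running on the sensor; the threshold values and the learning rate alpha are hypothetical tuning parameters:

    import numpy as np

    def changed_frame_diff(frame, prev_frame, thr=25):
        # Frame differencing: a pixel is changed if it differs from the
        # previous frame by more than a fixed threshold.
        diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
        return diff > thr

    class GaussianBackground:
        # Single-Gaussian adaptive background: per-pixel running estimates
        # of mean and variance, updated online with learning rate alpha.
        def __init__(self, first_frame, alpha=0.02, k=2.5):
            self.mean = first_frame.astype(np.float32)
            self.var = np.full(first_frame.shape, 15.0 ** 2, np.float32)
            self.alpha, self.k = alpha, k

        def classify(self, frame):
            # Foreground if the pixel deviates from the background mean
            # by more than k standard deviations: d^2 > k^2 * var.
            d = frame.astype(np.float32) - self.mean
            return d * d > (self.k ** 2) * self.var

        def update(self, frame):
            # Exponential moving averages of per-pixel mean and variance.
            d = frame.astype(np.float32) - self.mean
            self.mean += self.alpha * d
            self.var += self.alpha * (d * d - self.var)

Restricting all operations to the RoI mask keeps both the memory footprint and the per-frame cost proportional to the RoI size rather than to the full frame.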
Several ad hoc procedures can be envisaged starting from the methods just described. In particular, one important issue concerns the policy by which the background is updated. If a pixel is labelled as foreground in some frame, we might want this pixel not to contribute to updating the background, or to contribute to a lesser extent. Similarly, if we are dealing with a RoI, we might want to fully update the background only if no change has been detected in the RoI; if a change has been detected, instead, we may decide not to update any pixel in the background.
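A possible realization of this RoI-level policy, reusing the GaussianBackground sketch above (again purely illustrative; the 5% emptiness threshold is a hypothetical value):

    def process_roi(bg, frame, roi_mask, empty_thr=0.05):
        # Classify the RoI, but adapt the background only when the RoI
        # looks empty, so passing vehicles are not absorbed into the model.
        fg = bg.classify(frame) & roi_mask
        ratio = fg.sum() / roi_mask.sum()
        if ratio < empty_thr:   # no change detected: safe to update
            bg.update(frame)
        return fg, ratio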
3.2 Transit Detection
The transit detection procedure takes as input one or more RoIs for each lane, suitably segmented into foreground/background by the aforementioned methods. When processing the frame acquired at time t, the algorithm decides whether a vehicle occupies the RoI R_k or not. The decision is based on the ratio of changed pixels to the total number of pixels in R_k, i.e.:

    a_k(t) = #(changed pixels in R_k) / #(pixels in R_k)        (1)
Then a_k(t) is compared to a threshold θ in order to evaluate whether a vehicle is effectively passing over R_k. If a_k(t) > θ and at time t-1 no vehicle was detected, then a new transit event is generated. If a vehicle was instead already detected at time t-1, no new event is generated, but the time length of the last created event is incremented by one frame. When finally, at some time t+k, no vehicle is detected (i.e. a_k(t+k) < θ), the transit event is declared accomplished and no further updated. Assuming that the vehicle speed is uniform during the detection time, the number of frames k in which the vehicle has been observed is proportional to the vehicle length and inversely proportional to its speed. In the same way, it is possible to use two RoIs R_0 and R_1, lying on the same lane but translated by a distance δ, to estimate the vehicle speed (see Figure 1). Indeed, if there is a delay of Δ frames between the two detections, the vehicle speed can be estimated as v = δ/(Δ·τ), where τ is the frame period (the inverse of the frame rate). The vehicle length can in turn be estimated as l = v·k·τ.
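The event logic just described amounts to a small state machine per RoI. The sketch below is a simplified illustration with hypothetical parameter values, not the deployed code; it also derives the speed and length estimates from the two-RoI configuration:

    class TransitDetector:
        # Per-RoI transit events: an event starts when the changed-pixel
        # ratio a_k(t) rises above theta, grows while it stays above, and
        # is closed at the first frame in which it falls below theta.
        def __init__(self, theta=0.4):
            self.theta = theta
            self.active = False
            self.frames = 0    # duration (in frames) of the current event
            self.events = []   # durations of completed transit events

        def step(self, a_k):
            if a_k > self.theta:
                if not self.active:   # rising edge: new transit event
                    self.active, self.frames = True, 1
                else:                 # vehicle still over the RoI
                    self.frames += 1
            elif self.active:         # falling edge: event accomplished
                self.events.append(self.frames)
                self.active = False
            return self.active

    def speed_and_length(delta_m, delay_frames, duration_frames, fps):
        # Speed from the delay between detections on two RoIs a known
        # distance apart; length from how long one RoI stayed occupied.
        tau = 1.0 / fps                     # frame period tau
        v = delta_m / (delay_frames * tau)  # v = delta / (Delta * tau)
        l = v * duration_frames * tau       # l = v * k * tau
        return v, l

For instance, with the two RoIs δ = 5 m apart, a delay of Δ = 9 frames at 15 fps gives v = 5/(9·(1/15)) ≈ 8.3 m/s, i.e. about 30 km/h; each detected vehicle can then be assigned to a speed and size class simply by binning v and l.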
Clearly, the quality of these estimates varies greatly with respect to several factors, in particular a) the frame rate and b) the finite length of the RoIs. Indeed, the frame rate introduces a quantization error, which only allows the estimation of a speed range; therefore, the approach cannot be used to compute the instantaneous speed. As regards b), an ideal detection area would be a detection line of zero length; otherwise, a localization error affects any detection, i.e. it is not known exactly where the vehicle is inside the RoI at detection time. The use of a 1-pixel-thick RoI alleviates the problem, but it results in less robust detections. This problem introduces some issues in both vehicle length and speed computations, because in both formulas we use the nominal distance δ and not the precise (and unknown) distance between the detections. This is the drawback of not using a proper tracking algorithm in the pipeline, which would however require computational resources not usually available on embedded devices. Nevertheless, it is possible to provide a speed and size class for each vehicle. For each speed and size class, a counter is used to accumulate the number of detections. Temporal analysis of the counters is sufficient for estimating traffic typologies and average speed, and for analysing the level of service of the road, identifying possible congestion early.
Figure 1: RoI configuration for traffic flow analysis.
4 SENSOR PROTOTYPE
In this section, the design and development of a sensor node prototype based on SCN concepts is presented. This prototype is particularly suited for urban application scenarios; in particular, it is a sensor node with enough computational power to accomplish the computer vision tasks envisaged for urban scenarios, as described in the previous section. An important principle followed in the design of the prototype has been the use of low-cost technologies: the node uses low-cost sensors and electronic components, so that, once engineered, the device can be manufactured at low cost in large quantities. The
single sensor node has a main board that manages
both the vision tasks and the networking tasks thanks
to an integrated wireless communication module
(RF Transceiver).
Other components of the sensor node are given by the power supply system, which controls charging and permits choosing optimal energy saving policies. The power supply system includes the battery pack and a module for harvesting energy, e.g. through a photovoltaic panel (see Figure 2).

Figure 2: Architecture of the sensor node.
4.1 The Main Board
For the realization of the vision board, an embedded Linux architecture was selected in the design stage, as it provides enough computational power and ease of programming. A selection of ready-made Linux-based prototyping boards was evaluated with respect to computing power, flexibility/expandability, price/performance ratio and support. They were all found to share, as common disadvantages, high power consumption and the presence of electronic parts that are not useful for the tasks of a smart camera node.
It was therefore decided to design and realize a custom vision component by designing, printing and producing a new PCB. The new PCB (see Figure 3) was designed to offer maximum flexibility of use while maximizing the performance/consumption ratio. A good compromise was achieved by using a Freescale CPU based on the ARM architecture, equipped with an MMU and thus able to run operating systems like GNU/Linux.
This architecture has the advantage of integrating a Power Management Unit (PMU), in addition to numerous peripheral interfaces, thus minimizing the complexity of the board. In addition, the TQFP128 CPU package helped us minimize the layout complexity, since it was not necessary to use multilayer PCB technologies for routing; thus, the board can also be printed in a small number of instances. This choice brought the further benefit of reducing development costs: in fact, the CPU only needs an external SDRAM, a 24 MHz quartz oscillator and an inductor for the PMU. It has an average consumption, measured at the highest clock speed (454 MHz), of less than 500 mW.
The board has several communication interfaces, including an RS232 serial port for communication with the networking board, SPI, I2C and USB.
For radio communication, a transceiver compliant with IEEE 802.15.4 has been integrated, in line with modern approaches to the IoT. A suitable glue layer has been used to integrate the transceiver with the IPv6 stack, which also contains the 6LoWPAN header compression and adaptation layer for IEEE 802.15.4 links. Therefore, the operating system is well capable of supporting ETSI M2M communications over the SCN.
Figure 3: Design of the PCB and main features.
4.2 Sensor, Energy Harvesting and
Housing
For the integration of a camera sensor on the vision board, some specific requirements were defined in the design stage: ease of connection to the board itself and of management through it, and the capability of at least minimal performance in difficult visibility conditions, i.e. night vision. Thus, the minimal constraints were to
be compliant with the USB Video Class (UVC) and to offer either a removable IR filter or near-IR acquisition capability. Moreover, the selection of a low-cost device was an implicit requirement for the whole sensor node prototype.
The previously described boards and camera are housed in an IP66 enclosure. Another important component of the node is the power supply and energy harvesting system, which controls charging and permits choosing optimal energy saving policies. The power supply system includes the lead (Pb) acid battery pack and a module for harvesting energy through a photovoltaic panel.
In Figure 4, the general setup of a single node, with the electrical connections between the involved components, is shown.

Figure 4: General setup of a single node.
5 EXPERIMENTAL RESULTS
For traffic flow monitoring, the set-up consists of a small set of SCN nodes, which are in charge of observing and estimating dynamic real-time traffic-related information, in particular the traffic flow and the number and direction of the vehicles, as well as giving a rough estimate of the average speed of the cars in the traffic flow.
Two versions of the algorithm were implemented. The first uses frame differencing as the background subtraction method, obtaining a binary representation of the moving objects in the RoI. The second employs an adaptive background model based on a Gaussian distribution, using a weighted mixture of previous backgrounds: previous backgrounds are given a heavier weight when no event is occurring (i.e. no car in transit), and a light weight, or none at all, while a transit event is occurring.
Test sequences were acquired under real traffic conditions and then used for testing both algorithms. The ground truth for these sequences was 124 transited vehicles in total, with the following length subdivision:
- 11 with length between 0 and 2 metres,
- 98 with length between 2 and 5 metres,
- 15 with length of 5 metres or more.
A view from the sensor on the testing scenario is shown in Figure 5. Moreover, the algorithms yield a speed class estimate, but for this type of data no ground truth is available.
Figure 5: Sample frame from one of the test sequences (the RoIs are shown in black and white).
The total classification results are shown in the following table (percentages are relative to the 124 ground-truth vehicles):

                                  Ground-truth   Alg. 1 (frame diff.)   Alg. 2 (adaptive)
  Total transited vehicles             124               140                  121
  Correctly identified vehicles         -            124 (100%)           118 (95.2%)
  False positives                       -            16 (12.9%)             3 (2.4%)
The first algorithm, based on frame differencing, has a significant number of false positives but reaches a 100% identification rate, while the second, adaptive algorithm has an acceptable identification rate with a very low false positive rate. As a further step, the following two tables show the classification estimates for the speed and length classes for each of the implemented algorithms.
  Alg. 1 (frame diff.)   Speed <20 km/h   Speed 20-35 km/h   Speed >35 km/h   TOT
  Length 0-2 m                 10                 8                  2          20
  Length 2-5 m                 29                27                  8          64
  Length 5+ m                   0                10                 46          56
  TOT                          39                45                 56         140

  Alg. 2 (adaptive)      Speed <20 km/h   Speed 20-35 km/h   Speed >35 km/h   TOT
  Length 0-2 m                 25                 1                  1          27
  Length 2-5 m                 27                35                  3          65
  Length 5+ m                   8                15                  6          29
  TOT                          60                51                 10         121
For a correct evaluation of these tables, it has to be taken into account that the length estimates in the ground truth were made roughly, by sight, by an observer, while there is
no ground truth at all regarding the speeds. Furthermore, for the first algorithm, all the false positives were detected in the class of length 5 metres or more at the fastest speed, and have been identified as artefacts of the camera and its automatic balance and contrast settings. All these issues are under study, and a deeper analysis will provide more detailed results.
6 CONCLUSIONS
In this paper, we have presented technologies based on computer vision for supporting urban mobility, envisaging a number of applications of interest. Then, as a sample, we introduced a specially designed lightweight pipeline for traffic flow analysis that is suitable for embedded systems with constrained memory and computational power. This method has been tested on a prototype sensor that we designed and developed, and whose main features are also reported in this paper. The sensor, being low cost and equipped with a wireless transceiver, is a very good candidate for becoming the key ingredient of a scalable and pervasive smart camera network for the urban environment. Its functionality is proved by the set of experimental results collected in the field under realistic conditions. In the future, besides refining the procedure for vehicle characterization in terms of speed and size, we plan to extend the class of vision logics to address further applications to mobility.
ACKNOWLEDGEMENTS
This work has been partially supported by the POR CReO 2007-2013 Tuscany Project "SIMPLE" (Sicurezza ferroviaria e Infrastruttura per la Mobilità applicate ai Passaggi a LivEllo), the EU FP7 project "ICSI" (Intelligent Cooperative Sensing for Improved traffic efficiency) and the EU CIP project "MobiWallet" (Mobility and Transport Digital Wallet).
REFERENCES
Buch N., Orwell J. and Velastin S.A., 2010. Urban road user detection and classification using 3D wire frame models. IET Computer Vision, Vol. 4, Issue 2, pp. 105-116.
Buch N., Velastin S.A. and Orwell J., 2011. A review of computer vision techniques for the analysis of urban traffic. IEEE Trans. ITS, Vol. 12, No. 3, pp. 920-939.
Candamo J., Shreve M., Goldgof D., Sapper D. and Kasturi R., 2010. Understanding transit scenes: A survey on human behavior-recognition algorithms. IEEE Trans. ITS, Vol. 11, No. 1, pp. 206-224.
Digital Recognition, 2014. Available at: http://www.digital-recognition.com/ (last retrieved October 28, 2014).
Guo H., Wang Z., Yu B., Zhao H. and Yuan X., 2011. TripVista: Triple Perspective Visual Trajectory Analytics and Its Application on Microscopic Traffic Data at a Road Intersection. In PacificVis 2011, IEEE, pp. 163-170.
Kim K., Chalidabhongse T., Harwood D. and Davis L., 2004. Background modeling and subtraction by codebook construction. In IEEE ICIP 2004, pp. 2-5.
Lopes J., Bento J., Huang E., Antoniou C. and Ben-Akiva M., 2010. Traffic and mobility data collection for real-time applications. In IEEE Conf. ITSC, pp. 216-223.
Magrini M., Moroni D., Pieri G. and Salvetti O., 2012. Real time image analysis for infomobility. Lecture Notes in Computer Science, Vol. 7252, pp. 207-218.
Remagnino P., Shihab A.I. and Jones G.A., 2004. Distributed intelligence for multi-camera visual surveillance. Pattern Recognition, 37(4):675-689.
Stauffer C. and Grimson W.E., 1999. Adaptive background mixture models for real-time tracking. In Proc. CVPR 1999, Fort Collins, CO, USA, Vol. 2, pp. 246-252.
Traficam, 2014. Available at: http://www.traficam.com/ (last retrieved October 28, 2014).
Wang Z., Lu M., Yuan X., Zhang J. and Van De Wetering H., 2013. Visual Traffic Jam Analysis Based on Trajectory Data. IEEE Transactions on Visualization and Computer Graphics, Vol. 19, Issue 12, pp. 2159-2168.
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
670