mance characteristics. Once an overall anomaly level
is obtained, the mapping procedure described in Section 3
selects the next set of implementations. We discuss
methodology in Section 4, present results in Section 5, and
analyse the findings in Section 6. We conclude
with pointers to future work in Section 7.
We consider first the architectural background of
this work, and then place it in the context of other
parked vehicle detection work evaluated on the same
dataset. FPGAs and GPUs are two of the most com-
mon ways of improving performance of compute-
intensive signal processing algorithms. In general,
FPGAs offer a large speedup over reference imple-
mentations and draw low power because they in-
stantiate logic which closely matches the applica-
tion. However, their use of arbitrary-precision fixed-
point arithmetic can impact accuracy slightly, and
they require specialised knowledge to program, of-
ten incurring longer development times (Bacon et al.,
2013). In contrast, GPUs have a software program-
ming model, using large arrays of floating-point units
to process data quickly, at the expense of power con-
sumption. Their ubiquitous presence in PCs and lap-
tops has made uptake very common in recent years,
and the availability of Nvidia’s CUDA language for
both desktop and (in the near future) mobile and tablet
devices greatly increases the potential for pervasive
vision computing.
When building a system to perform a process-
ing task, we consider the characteristics of
FPGAs and GPUs and select the device which best suits
our application, for example prioritising fast runtime
over low power consumption. This has been done for
various smaller algorithms, such as colour correction
and 2-D convolution, as in (Cope et al., 2010). This
design-time selection can also be done at a low level
using an optimisation algorithm as shown by (Bouga-
nis et al., 2009), although this is usually more con-
cerned with FPGA area vs. latency tradeoffs. As an
example, Figure 1 shows design space exploration for power
and time characteristics of the implementation combinations
used in this paper, with a well-defined Pareto
curve at the left-hand side. To the authors’ knowl-
edge, little work has been done in the area of power-
aware algorithm scheduling at runtime; the most ap-
propriate is arguably (Yu and Prasanna, 2002), al-
though this does not consider more than two perfor-
mance characteristics or deal with changing optimisa-
tion priorities over time.
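The design-time selection described above can be sketched as a simple Pareto filter over measured (time, power) points. The numbers below are illustrative placeholders, not measurements from this paper:

```python
# Sketch of design-time selection: given (time, power) measurements for
# each implementation combination, keep only the Pareto-optimal ones,
# i.e. those not dominated on both processing time and power.

def pareto_front(points):
    """Return names of combinations for which no other point is at least
    as good on both axes and strictly better on at least one."""
    front = []
    for name, t, p in points:
        dominated = any(
            (t2 <= t and p2 <= p) and (t2 < t or p2 < p)
            for _, t2, p2 in points
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical (processing time ms, power W) measurements per combination:
combos = [
    ("ped-ggg", 60, 220),
    ("ped-ccc", 600, 185),
    ("car-gfg", 120, 210),
    ("car-ccc", 650, 190),   # dominated by ped-ccc: slower and more power
]
print(pareto_front(combos))
```

The surviving points trace the left-hand Pareto curve of Figure 1; a runtime scheduler would then pick among them according to the current optimisation priority.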
The UK Home Office supplies the i-LIDS
dataset (Home Office Centre for Applied Science and
Technology, 2011) used in Section 5, and this has been used
for anomaly detection by other researchers. (Albiol
et al., 2011) identify parked vehicles in this dataset’s
[Figure 1 scatter plot: processing time (ms) on the horizontal axis vs. system power consumption (W) on the vertical axis; points labelled ped-ggg, ped-cff, ped-gff, ped-gfg, ped-cfc, ped-ccc, car-ggg, car-cfc, car-gfg, car-ccc]
Figure 1: Design space exploration for power vs. time trade-offs
for all possible combinations of car, pedestrian and motion
detector implementations. Pedestrian implementations
are labelled by colour, and car implementations by shape.
PV3 scenario with precision and recall of 0.98 and
0.96 respectively. However, their approach is consid-
erably different from ours, in that they: (a) require
all restricted-parking lanes in the image to be manu-
ally labelled first, (b) only evaluate whether an object
in a lane is part of the background or not, and (c) do
not work in real-time or provide performance infor-
mation. In addition, their P and R results do not de-
note events, but rather the total fraction of time during
which the lane is obstructed. Due mainly to limita-
tions within our detectors, our accuracy results alone
do not improve upon this state-of-the-art, but given
the points noted above, we are arguably addressing
a different problem (anomaly detection under
power and time constraints) than Albiol et al.
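The distinction between event-level and time-fraction metrics can be made concrete with a small sketch (the counts are synthetic, not the i-LIDS results):

```python
# Frame-level P/R scores per-frame "lane obstructed?" decisions, as in
# Albiol et al.; event-level P/R scores whole detected parking events.

def precision_recall(tp, fp, fn):
    """Standard precision and recall from true/false positive and
    false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Frame level: decisions over individual frames of a sequence.
frame_p, frame_r = precision_recall(tp=470, fp=10, fn=20)

# Event level: of 10 true parking events, 8 detected, plus 1 false alarm.
event_p, event_r = precision_recall(tp=8, fp=1, fn=2)

print(f"frame-level P={frame_p:.2f} R={frame_r:.2f}")
print(f"event-level P={event_p:.2f} R={event_r:.2f}")
```

The same detector can thus score very differently under the two conventions, which is why the two sets of results are not directly comparable.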
2 SYSTEM COMPONENTS
A flow diagram for the high-level frame processing
framework is shown in Figure 2. It works on offline videos
and dynamically calculates the number of frames to
drop to obtain real-time performance at 25 FPS. (We
do not count video decode time or time for output im-
age markup and display as part of processing time.)
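A minimal sketch of the frame-dropping calculation, assuming a simple catch-up rule in which enough frames are skipped for processing to keep pace with the 25 FPS video (the exact rule used by the framework may differ):

```python
# If processing one frame takes longer than the 40 ms frame period at
# 25 FPS, skip enough subsequent frames that the pipeline keeps pace.
import math

FRAME_PERIOD_MS = 1000.0 / 25  # 40 ms per frame at 25 FPS

def frames_to_drop(processing_time_ms):
    """Frames to skip after processing one frame so that
    (1 + dropped) * FRAME_PERIOD_MS >= processing_time_ms."""
    if processing_time_ms <= FRAME_PERIOD_MS:
        return 0
    return math.ceil(processing_time_ms / FRAME_PERIOD_MS) - 1

print(frames_to_drop(35))  # within budget: nothing dropped
print(frames_to_drop(95))  # ~2.4 frame periods: two frames dropped
```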
The ‘detection algorithms’ block generates bounding
boxes with type information, which is used by the re-
maining algorithm stages. (An expanded version of
this step showing all combinations appears in Figure 4.)
This section gives details of each algorithm in
Figure 1, i.e. the detection algorithms run on the image, and the
method for calculating image anomaly level. The ob-
ject detectors and background subtractor were by far
the most computationally expensive algorithm stages,
so each of these had at least one accelerated version
available. We use a platform containing a 2.4 GHz
dual-core Xeon CPU, an Nvidia GeForce 560Ti GPU,
and a Xilinx ML605 board with an XC6VLX240T
FPGA, as shown in Figure 3. Implementation performance