ware is the use of intellectual property (IP) cores.
These cores hide the underlying complexity of hardware
peripherals and offer a more abstract interface to their
functionality. Most FPGA vendors offer IP cores for
frequently used peripherals such as Random Access Memory
(RAM) or networking hardware. However, the interface to
these cores is still complex, as they usually expose bus
interfaces (PLB, AXI, Wishbone, etc.). To access this kind
of core, the user has to implement a bus participant and
thus needs to know the bus specification in detail. This
is a non-trivial and highly time-consuming task.
In this paper we present a framework aimed at accelerating
the development of FPGA vision algorithms. The problems of
FPGA development stated above, especially the latter, are
overcome by a high level of peripheral hardware abstraction.
This abstraction is specifically tailored to the needs of
computer vision algorithms.
The typical use case of this framework is the creation of a
real-time-capable prototype system. In contrast to GPU-based
real-time prototypes, the implementation of the algorithms
is much closer to a production-ready state. This allows the
user to predict the overall cost and energy consumption of
the system very precisely. It is assumed that the algorithms
have already been evaluated using a non-real-time-capable
implementation. This reference implementation can then be
ported to the FPGA and deployed with little effort. The user
can concentrate on developing the vision algorithms and is
not forced to invest time in retrieving and passing on the
data.
The focus here is not to provide a uniformly high level of
abstraction that also covers the vision algorithm itself.
Instead, the framework allows the user to test and use
production-ready HDL implementations of algorithms without
any infrastructural development overhead.
2 RELATED WORK
An FPGA co-processor framework is presented in (Kalomiros
and Lygouras, 2008). Several vision algorithms are evaluated
using a commercial Simulink-to-HDL translator. Communication
with the host PC is done via USB, and the data flow is
organized by a soft processor. Being a co-processing system
with no direct access to the image data, its latency is
higher than that of a pre-processing system, but the
possible range of applications is broader.
A framework for verification of vision algorithms is
presented in (van der Wal et al., 2006). Conceptually
similar to the approach described here, they use image
pipelines to process the data. However, thanks to a
crosspoint switch, their processing entities can be
connected at run time, allowing high flexibility. This
flexibility is useful for hardwired Application-Specific
Integrated Circuits (ASICs), which cannot be reconfigured.
3 FUNDAMENTAL CONCEPTS
The framework itself consists of modules connected by
streams and a supervisor organizing the system configuration
and data flow. It is assumed that the platform on which the
framework runs consists of at least an FPGA, external RAM,
and a communication module to a workstation PC (Gigabit
Ethernet, PCI Express, etc.), all interconnected by the
system bus.
A module encapsulates an arbitrary function. In the simplest
form, a module operates on the data of its input streams and
delivers the result on one or several output streams. More
complex modules also interact, besides the streams, with
lower-level components such as bus interfaces or hardware
peripherals. To the user, however, this complexity is
hidden, as only the stream interfaces are visible. Modules
can be instantiated and connected as hardware description
language (HDL) entities in source files or, more
conveniently, via a graphical user interface (GUI) by
dragging them into the system to be built.
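The module/stream decomposition can be illustrated with a small software model. The sketch below is purely illustrative (the actual framework modules are HDL entities, and all names here are hypothetical); it shows how modules that expose only stream interfaces compose into a pipeline by connecting outputs to inputs.

```python
# Illustrative software model of the module/stream concept:
# each "module" consumes words from an input stream and
# yields words on an output stream. Names are hypothetical.

def threshold_module(in_stream, level=128):
    """Simplest form of a module: one input stream, one output stream."""
    for word in in_stream:
        yield 255 if word >= level else 0

def invert_module(in_stream):
    """Another single-input, single-output module."""
    for word in in_stream:
        yield 255 - word

# Connecting modules amounts to chaining their streams:
source = iter([10, 200, 130, 90])
pipeline = invert_module(threshold_module(source, level=128))
print(list(pipeline))  # -> [255, 0, 0, 255]
```

Because each module sees only its stream interfaces, modules can be rearranged or replaced without touching their internals, mirroring how the framework's HDL entities are wired together in source files or via the GUI.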
A stream is a unidirectional data-flow interface. Its most
common use is to transfer pixel values. Note that this
interface is kept as simple as possible: synchronization
happens only word-wise. All other synchronization
information has to be implicit, which means that the data
format of a stream has to be known a priori. This is
generally the case for image processing algorithms. The
implicit synchronization offers several advantages. First,
due to the fixed input format, the module can process the
data statically. This eases development and normally speeds
up the implementation. Second, the module can trust the
format of the input data; no error checking has to be done
on it. This leads to a better encapsulation of
functionality, as an exceptional state is handled inside a
module instead of being passed between two modules. One
prerequisite for this to work is that every module delivers
correctly synchronized data on its output streams. In most
cases this is easier to achieve than format error checking
on the input streams in the case of explicit
synchronization.
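The idea of implicit, a-priori synchronization can be sketched as follows (again a hypothetical software model, not the framework's API): both sides agree on the frame format before the stream starts, so the stream carries only pixel words, and structural boundaries such as row ends are derived statically rather than signaled in-band.

```python
# Sketch of implicit synchronization: the frame format is
# agreed a priori, so the stream contains pixel words only --
# no markers, no headers, no per-word format checks.
# All names are illustrative, not part of the framework.

FRAME_WIDTH, FRAME_HEIGHT = 4, 2  # format fixed at build time

def row_sum_module(pixel_stream):
    """Consumes exactly FRAME_WIDTH words per row and emits one
    sum per row; the row boundary is known statically."""
    row = []
    for word in pixel_stream:
        row.append(word)
        if len(row) == FRAME_WIDTH:  # boundary from a-priori format
            yield sum(row)
            row = []

pixels = iter([1, 2, 3, 4, 5, 6, 7, 8])  # one 4x2 frame, row-major
print(list(row_sum_module(pixels)))      # -> [10, 26]
```

Note that the module performs no validation of the incoming format; it simply trusts that upstream modules deliver correctly synchronized data, which is exactly the contract described above.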
The supervisor is a general-purpose soft processor with low
speed requirements. It organizes the data
PECCS 2012 - International Conference on Pervasive and Embedded Computing and Communication Systems