setup is well suited for investigations of parallelization techniques and data flow coordination. We propose a multithreaded vision system based on a high level of abstraction from hardware, operating system, and even lower-level vision tasks like morphological operations. This minimizes the overhead for communication tasks, as the amount of data transferred decreases in an abstract representation. Furthermore, the scalability of the system with the integration of multiple cores can be examined soundly by connecting different machines to the JAST system, each running a copy of the vision system (details in Section 4).
2 PARALLEL COMPUTATION
On an abstract level, two major parallelization scenarios may be identified: the distribution of processing tasks across multiple machines on the one hand, and the distribution of tasks on a single machine with multiple processors and/or cores on the other.
Many approaches employing the distributed scenario have been proposed; see (Choudhary and Patel, 1990) for an overview regarding CV or (Wallace et al., 1998) for a concrete implementation. However, with recent developments in the integration of multiple cores, the latter scenario also becomes more relevant. Thus there is an increasing demand for algorithms that fully exploit the parallel resources of a single PC. This is especially the case where computational power easily reaches its limits, e.g. in computer vision.
2.1 Communication
In parallel environments, one can generally apply either synchronous or asynchronous communication strategies for data exchange between processes or threads. Though robust, a synchronous approach can, due to its blocking nature, cause problems especially for real-time systems where immediate responses have to be guaranteed. For this case, asynchronous, non-blocking communication mechanisms (ACMs) have been proposed. With ACMs, information is dropped when capacities are exceeded, which is acceptable as long as the system does not block. Non-blocking algorithms can be further distinguished into lock-free and wait-free algorithms (Sundell and Tsigas, 2003). Lock-free implementations guarantee that at least one process continues at any time (with the risk of starvation). Wait-free implementations avoid starvation, as they guarantee completion of a task within a bounded number of steps (Herlihy, 1991).
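As an illustration of these non-blocking guarantees (the following sketch is ours and not taken from the cited work; all names and the C++ interface are illustrative), a single-producer/single-consumer ring buffer can be built from two atomic indices; when the buffer is full, the producer simply drops the new item instead of blocking:

// Illustrative sketch: a wait-free single-producer/single-consumer
// ring buffer that never blocks and drops data on overflow.
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

template <typename T, std::size_t Capacity>
class SpscDropQueue {
public:
    // Producer side: returns false (and drops the item) when full.
    bool try_push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % (Capacity + 1);
        if (next == tail_.load(std::memory_order_acquire))
            return false;                        // full: drop, never block
        buffer_[head] = item;
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side: returns an empty optional when no data is available.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return std::nullopt;                 // empty: caller continues
        T item = buffer_[tail];
        tail_.store((tail + 1) % (Capacity + 1), std::memory_order_release);
        return item;
    }

private:
    std::array<T, Capacity + 1> buffer_{};       // one slot kept free
    std::atomic<std::size_t> head_{0};           // written by the producer only
    std::atomic<std::size_t> tail_{0};           // written by the consumer only
};

Both try_push and try_pop complete in a bounded number of steps, i.e. this buffer falls into the wait-free category; dropping data on overflow corresponds to the ACM behavior described above.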
According to (Simpson, 2003), ACMs can be classified based on the destructiveness of data access. The classification of ACM protocols by (Yakovlev et al., 2001) distinguishes data accesses with respect to their overwriting and re-reading permissions. Manifold implementations of ACMs can be found for each of these classification schemes; common implementations use lock-free priority queues (Sundell and Tsigas, 2003) or employ FIFO buffers (Matsuda et al., 2004).
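To illustrate the overwriting case of this classification (again a sketch of ours with hypothetical names, not code from the cited implementations), a single-slot register can be realized with one atomic pointer exchange: the writer always succeeds and silently discards any unread value, while the reader destructively takes the most recent value.

// Illustrative sketch: an overwriting, destructive-read single-slot ACM.
#include <atomic>
#include <memory>
#include <optional>

template <typename T>
class LatestValueSlot {
public:
    // Writer: publish a new value, discarding any value not yet read.
    void publish(T value) {
        T* fresh = new T(std::move(value));
        delete slot_.exchange(fresh, std::memory_order_acq_rel);  // drop stale data
    }

    // Reader: destructively take the latest value, if any.
    std::optional<T> take() {
        std::unique_ptr<T> p(slot_.exchange(nullptr, std::memory_order_acq_rel));
        if (!p) return std::nullopt;
        return std::move(*p);
    }

    ~LatestValueSlot() { delete slot_.load(); }

private:
    std::atomic<T*> slot_{nullptr};
};

Both operations consist of a single atomic exchange and are therefore wait-free; the price is that unread values are lost, which is exactly the trade-off accepted by ACMs.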
2.2 Parallelization Techniques
According to (Culler et al., 1999), parallelization techniques are distinguished into data-domain and function-domain approaches. With function-domain parallelization, the overall computation process is divided into stages and each thread works on a separate stage. In contrast, with data-domain parallelization the data is partitioned and each partition requires the same computation, performed by identically designed threads (Chen et al., 2007). This distinction may be correct and worthwhile for low-level vision tasks like edge detection, but this paper will show that, on a higher level, a carefully modeled CV system does not require it. Moreover, a combined approach can be derived and, on the basis of an asynchronous data management, a system implementing both aspects can perform very well in practice.
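For concreteness, a data-domain example in the sense of the above definition (our own sketch with illustrative names, not code from the proposed system) partitions the rows of an image among identically designed worker threads, each applying the same thresholding operation to its partition:

// Illustrative sketch: data-domain parallelization of a thresholding step.
#include <algorithm>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

using Image = std::vector<std::uint8_t>;   // assumed flat grayscale buffer

void threshold_rows(Image& img, std::size_t width,
                    std::size_t row_begin, std::size_t row_end,
                    std::uint8_t level) {
    for (std::size_t r = row_begin; r < row_end; ++r)
        for (std::size_t c = 0; c < width; ++c)
            img[r * width + c] = img[r * width + c] > level ? 255 : 0;
}

void threshold_parallel(Image& img, std::size_t width, std::size_t height,
                        std::uint8_t level, unsigned num_threads) {
    std::vector<std::thread> workers;
    const std::size_t rows_per_thread = (height + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * rows_per_thread;
        const std::size_t end = std::min(height, begin + rows_per_thread);
        if (begin >= end) break;
        // Each worker runs the same code on a disjoint row partition.
        workers.emplace_back(threshold_rows, std::ref(img), width, begin, end, level);
    }
    for (auto& w : workers) w.join();
}

A function-domain counterpart would instead assign different stages, e.g. preprocessing and analysis, to different threads; the combined approach pursued here covers both cases.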
Aiming at this goal, we first have to deliberately design anchor points for distributed computation. Furthermore, the level of abstraction of the computational tasks matters in terms of parallelization. In order to avoid unnecessary communication overhead and to take full advantage of the multicore environment, we decided to model concurrent computation on a high level of abstraction. Therefore, we do not intend to parallelize primitive control structures specific to a programming language, such as for-loops. Instead, we try to identify major and subsequently minor tasks of computation (see Figure 2).
For function-domain parallelization, we assume that the division into well-defined functional submodules is feasible. In the processing layer of the proposed CV system this is obviously the case, as we can identify three major functional stages: Preprocessing, Analysis and Interpretation, and Postprocessing. Further refinement divides these stages into subtasks. Modules implementing a task independently pick a data partition (also called a data item below), analyze it, and write it back. In case new items are created during the analysis, these are also stored in the corresponding data management queue (see Section 3).
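The following sketch illustrates this pick, analyze, and write-back pattern (the types ItemQueue and DataItem as well as the member names are hypothetical; the actual queues belong to the data management layer of Section 3 and would be realized as non-blocking ACMs, whereas a mutex-guarded deque keeps the sketch short):

// Illustrative sketch: a functional module operating on data items.
#include <deque>
#include <mutex>
#include <optional>
#include <vector>

struct DataItem { int id = 0; /* region, features, ... (illustrative) */ };

// Placeholder for the data management queue of Section 3.
class ItemQueue {
public:
    void put(const DataItem& item) {
        std::lock_guard<std::mutex> lock(m_);
        items_.push_back(item);
    }
    std::optional<DataItem> try_take() {
        std::lock_guard<std::mutex> lock(m_);
        if (items_.empty()) return std::nullopt;
        DataItem item = items_.front();
        items_.pop_front();
        return item;
    }
private:
    std::deque<DataItem> items_;
    std::mutex m_;
};

// One functional module (e.g. a subtask of the Analysis stage): pick an
// item, analyze it, write it back, and store any newly created items.
class AnalysisModule {
public:
    AnalysisModule(ItemQueue& in, ItemQueue& out) : in_(in), out_(out) {}

    bool step() {                             // one scheduling step; false when idle
        std::optional<DataItem> item = in_.try_take();
        if (!item) return false;              // nothing available, do not block
        std::vector<DataItem> produced = analyze(*item);
        out_.put(*item);                      // write the analyzed item back
        for (const DataItem& d : produced)    // newly created items go to the queue too
            out_.put(d);
        return true;
    }

private:
    std::vector<DataItem> analyze(const DataItem&) { return {}; }  // task-specific
    ItemQueue& in_;
    ItemQueue& out_;
};

Because step() returns immediately when no item is available, a scheduler can run many such modules, of the same or of different functional stages, on a fixed pool of threads without any of them blocking the others.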
As the recognition process is decomposable in
the function-domain, we now have to achieve data-
domain parallelization in order to prove our claim.