terms, GVF and balloon energy. In this way, if there is
no GVF field, then the balloon force acts on the active
contour, although its influence over the active contour
decreases as the GVF force increases in magnitude.
The relative contribution of the balloon force and the
GVF force is controlled by a user parameter. The
optimum value of this parameter is obtained when the
pressure (balloon) force no longer has any influence
over the active contour.
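The sketch below gives a rough illustration of how such a superposition could be computed at one contour point. It adds the GVF vector to a balloon term directed along the contour normal. The weighting kappa / (1 + |v|^2), where kappa plays the role of the user parameter, is a hypothetical choice. It was picked only so that the balloon term acts fully where the GVF field vanishes and fades as the GVF magnitude grows. The paper does not give this formula, and the names `vec2` and `external_force` are our own.

```c
/* 2-D vector used for forces and normals (illustrative type). */
typedef struct { float x, y; } vec2;

/* Hypothetical blend of GVF and balloon forces at one contour point.
 * gvf:    GVF vector v sampled at the point
 * normal: outward unit normal of the contour at the point
 * kappa:  user parameter weighting the balloon (pressure) force
 * The balloon weight kappa / (1 + |v|^2) is an ASSUMED form: it equals
 * kappa where v = 0 and decreases as the GVF magnitude grows. */
vec2 external_force(vec2 gvf, vec2 normal, float kappa)
{
    float mag2 = gvf.x * gvf.x + gvf.y * gvf.y;   /* |v|^2, avoids sqrt */
    float w = kappa / (1.0f + mag2);              /* balloon weight     */
    vec2 f = { gvf.x + w * normal.x, gvf.y + w * normal.y };
    return f;
}
```

With kappa = 0.5 and no GVF field, the result is the pure balloon force (0.5, 0) along the normal (1, 0). With a GVF vector of (0, 3), the balloon contribution shrinks to 0.05.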
3 HARDWARE OVERVIEW
3.1 TMS320C60 EVM evaluation board
The choice of DSP-based hardware permits the development
of a powerful and highly flexible system. Owing to its
full programmability and high performance, the TMS320C6701
processor mounted on the TMS320C60 EVM is well suited to
researching and testing DSP algorithms. In the work
described in this paper, a personal computer was equipped
with the Code Composer Studio development environment,
which helps to build and debug embedded real-time DSP
applications. It provides tools for configuring, building,
debugging, tracing, and analyzing programs. Texas
Instruments DSPs provide on-chip emulation support that
enables Code Composer Studio to control program execution
and monitor real-time program activity. The heart of the
TMS320C60 EVM evaluation board is the Texas Instruments
TMS320C6701 processor. The C6701 is based on a VLIW
architecture, which allows it to execute up to eight
RISC-like instructions per clock cycle.
The two data paths of the C6701 extend the functionality
of the data paths of the C6201 with support for 64-bit
data and IEEE-754 32-bit single-precision and 64-bit
double-precision floating-point arithmetic. Each data
path includes a set of four execution units, a
general-purpose register file, and paths for moving data
between memory and registers. The four execution units in
each data path comprise two ALUs, a multiplier, and an
adder/subtractor that is used for address generation. The
ALUs support both integer and floating-point operations,
and the multipliers can perform 16x16-bit and 32x32-bit
integer multiplies as well as 32-bit and 64-bit
floating-point multiplies. The two register files each
contain sixteen 32-bit general-purpose registers. To
support 64-bit floating-point arithmetic, pairs of
adjacent registers can be used to hold 64-bit data. In
addition to the operations supported by the C6201, the
C6701 offers support for floating-point reciprocal and
reciprocal square root estimation, and for converting
data between fixed- and floating-point formats.
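Reciprocal and reciprocal square root estimates are only approximations, and they are normally refined with Newton-Raphson iterations. The sketch below shows the standard refinement steps in portable C. The seed arguments stand in for what the hardware estimate would supply. On the C6701, we understand these estimates to come from the RCPSP and RSQRSP instructions, with results accurate to about 8 bits. The function names are our own.

```c
/* Newton-Raphson refinement of a reciprocal estimate x0 ~ 1/v:
 *   x <- x * (2 - v * x)
 * Convergence is quadratic, so each step roughly doubles the number of
 * correct bits (about 8 -> 16 -> 32 starting from an 8-bit seed). */
float refine_recip(float v, float x0, int steps)
{
    float x = x0;
    for (int k = 0; k < steps; ++k)
        x = x * (2.0f - v * x);
    return x;
}

/* Newton-Raphson refinement of a reciprocal square root estimate
 * y0 ~ 1/sqrt(v):
 *   y <- y * (1.5 - 0.5 * v * y * y) */
float refine_rsqrt(float v, float y0, int steps)
{
    float y = y0;
    for (int k = 0; k < steps; ++k)
        y = y * (1.5f - 0.5f * v * y * y);
    return y;
}
```

For example, starting from the deliberately poor seed 0.3 for 1/3, three reciprocal steps bring the relative error from 10% down to about 1e-8.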
3.2 Implementation
Because the compiler generates reasonably efficient code,
the software running on the DSP can be written in C;
nevertheless, the performance of the application can be
maximized by using compiler options, intrinsic
instructions, and transformations into assembly code. Our
implementation follows this strategy.
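At the C level, the compiler can schedule a loop more aggressively once it is told facts it cannot prove on its own. The sketch below uses a hypothetical helper, `scale_forces`, which is not taken from our code. It uses C99 `restrict` to promise that the input and output arrays never overlap. It also uses the TI-specific `MUST_ITERATE` pragma to state a trip-count range, which lets the loop be unrolled and software-pipelined without a safety check. The bounds 8, 1024 and 4 are illustrative assumptions only. Compilers other than TI's ignore the pragma, possibly with a warning.

```c
/* Hypothetical helper: out[i] = step * in[i].
 * restrict (C99) promises that out and in never overlap, so the compiler
 * need not assume that a store to out[i] can change a later in[] value. */
void scale_forces(float *restrict out, const float *restrict in,
                  float step, int n)
{
    int i;
    /* TI C6000 compiler hint with ASSUMED bounds: at least 8 and at most
     * 1024 iterations, always a multiple of 4. Other compilers ignore it. */
#pragma MUST_ITERATE(8, 1024, 4)
    for (i = 0; i < n; ++i)
        out[i] = step * in[i];
}
```

If a caller violates the stated bounds or passes overlapping arrays, the behavior is undefined. These hints therefore belong only on loops whose properties are known from the algorithm.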
The active contour algorithm accounts for most of the
computational cost of the entire procedure. Therefore,
the algorithm was implemented in assembly language,
distributing its instructions over the four execution
units of each data path so that they execute in
parallel. This uses all the hardware resources of the
DSP and exploits the capabilities of the VLIW
architecture and of software pipelining. Because most of
the millions of instructions per second (MIPS) consumed
by DSP applications are spent in tight loops, it is
important for the application to make maximal use of all
the hardware resources in those loops. Fortunately, loops
have more inherent parallelism than non-looping code,
because multiple iterations of the same code execute with
limited dependency between iterations.
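The following sketch makes the point about loop-level parallelism concrete. It shows a simplified explicit update of one coordinate of a closed contour. This is a hypothetical scheme with assumed parameters `alpha` and `gamma`, not the exact update used in our implementation. Each iteration reads only the previous positions `x` and writes a separate array `xn`, so no iteration depends on another, and successive iterations can overlap in a software pipeline. Updating `x` in place would instead make iteration i+1 wait for the value written by iteration i.

```c
/* Hypothetical explicit (Jacobi-style) update of one coordinate of a
 * closed contour with n points:
 *   xn[i] = x[i] + gamma * (alpha * (x[i-1] - 2 x[i] + x[i+1]) + fx[i])
 * Indices wrap around because the contour is closed. Reading only x and
 * writing only xn means there is no loop-carried dependency. */
void snake_step(float *restrict xn, const float *restrict x,
                const float *restrict fx, float alpha, float gamma, int n)
{
    for (int i = 0; i < n; ++i) {
        int prev = (i == 0) ? n - 1 : i - 1;       /* wrap at start */
        int next = (i == n - 1) ? 0 : i + 1;       /* wrap at end   */
        float lap = x[prev] - 2.0f * x[i] + x[next];
        xn[i] = x[i] + gamma * (alpha * lap + fx[i]);
    }
}
```

With x = {0, 1, 0, 1}, alpha = 1, gamma = 0.5 and no external force, the update yields {1, 0, 1, 0}. Every output depends only on the old array.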
To maximize the efficiency of the code, the application
schedules as many instructions as possible in parallel.
To do so, the dependencies between instructions were
first determined. Figure 1 shows the assembly
instructions of the floating-point active contour loop
and the resources allocated to them. The symbol ||
marks instructions that execute in parallel, while the
symbol @ indicates which iteration of the loop each
instruction is working on in a given cycle. For example,
the rightmost column shows that, during a given cycle
inside the loop, the ADDSP instruction is adding data
for iteration n, the MPYSP instruction is multiplying
data for iteration n+2 (@@), and so on. No value can
live in a register for more than the number of cycles in
the loop; otherwise, iteration n+1 writes into the
register before iteration n has read it. This is the
live-too-long problem: no loop variable can live longer
than the iteration interval, because its consumer (the
child) would then read the value that its producer (the
parent) has already written for the next iteration. A
simple solution is to insert move (MV) instructions that
break the lifetime of the variable into smaller pieces,
each of which lives no longer than the iteration
interval.
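The scheduling pattern described for Figure 1 can be mimicked in plain C. The sketch below hand-pipelines a hypothetical loop y[i] = a[i]*b[i] + c[i] into three stages: load, multiply, and add. Each pass of the kernel adds for iteration i, multiplies for iteration i+1 and loads for iteration i+2, much like the @ annotations. The value c[i] is loaded in the first stage but consumed only in the third, so it would live too long. The copy `c2 = c1` plays the role of the MV instruction and splits its lifetime, so that every variable lives for a single kernel pass. This illustrates the idea only; it is not the assembly of Figure 1.

```c
#include <stddef.h>

/* Hand-pipelined y[i] = a[i] * b[i] + c[i]; requires n >= 2.
 * Inside the kernel, statements run in stage order add, mult, load, so
 * every variable is read before it is overwritten. This emulates the
 * parallel (||) execution of one VLIW kernel cycle. */
void madd_pipelined(const float *a, const float *b, const float *c,
                    float *y, size_t n)
{
    float a1, b1, c1;   /* stage 0 results: loaded operands          */
    float p, c2;        /* stage 1 results: product and copied c     */
    size_t i;

    /* Prologue: fill the pipeline. */
    a1 = a[0]; b1 = b[0]; c1 = c[0];            /* load  for iter 0  */
    p = a1 * b1; c2 = c1;                        /* mult  for iter 0  */
    a1 = a[1]; b1 = b[1]; c1 = c[1];            /* load  for iter 1  */

    /* Kernel: finish iteration i, multiply i+1, load i+2. */
    for (i = 0; i + 2 < n; ++i) {
        y[i] = p + c2;                           /* add   for i       */
        p = a1 * b1; c2 = c1;                    /* mult  for i+1, MV */
        a1 = a[i + 2]; b1 = b[i + 2]; c1 = c[i + 2]; /* load for i+2 */
    }

    /* Epilogue: drain the pipeline (here i == n - 2). */
    y[i] = p + c2;                               /* add   for n-2     */
    p = a1 * b1; c2 = c1;                        /* mult  for n-1     */
    y[i + 1] = p + c2;                           /* add   for n-1     */
}
```

The prologue and epilogue correspond to the pipeline fill and drain that a software-pipelined assembly loop also needs around its kernel.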
One way to implement if-then-else in assembly code is by
means of conditional instructions, predicated on one of
the five general-purpose registers that can serve as
condition registers (A1, A2, B0, B1 and B2). Conditional
instructions can handle both the true and the false cases
of an if-then-else statement. Branching is another
option; however, because each branch has five delay
slots, this method requires additional cycles.
Furthermore, branching
A DSP-BASED ACTIVE CONTOUR MODEL
427