EFFICIENT INVERSE KINEMATICS ALGORITHM BASED ON

CONFORMAL GEOMETRIC ALGEBRA

Using Reconﬁgurable Hardware

Dietmar Hildenbrand

Research Center of Excellence for Computer Graphics, University of Technology, Darmstadt, Germany

Holger Lange, Florian Stock, Andreas Koch

Embedded Systems and Applications Group, University of Technology, Darmstadt, Germany

Keywords:

Geometric algebra, geometric computing, computer animation, inverse kinematics, hardware acceleration,

reconﬁgurable hardware, runtime performance.

Abstract:

This paper presents a very efﬁcient approach for algorithms developed based on conformal geometric algebra

using reconﬁgurable hardware. We use the inverse kinematics of the arm of a virtual human as an example,

but we are convinced that this approach can be used in a wide ﬁeld of computer animation applications. We

describe the original algorithm on a very high geometrically intuitive level as well as the resulting optimized

algorithm based on symbolic calculations of a computer algebra system. The main focus then is to demonstrate

our approach for the hardware implementation of this algorithm leading to a very efﬁcient implementation.

1 INTRODUCTION

Based on an inverse kinematics application we show

that algorithms using the geometrically intuitive

mathematical language of conformal geometric alge-

bra can be implemented very efﬁciently using recon-

ﬁgurable hardware.

Our starting point is the inverse kinematics algorithm for the arm of a virtual character presented in (Hildenbrand et al., 2006). That paper introduces the principle of optimizing geometric algebra algorithms using the symbolic calculation feature of Maple. In section 3 we present the optimized algorithm resulting from this approach. In principle, we describe the algorithm in terms of the computation of the coefficients of the main data structure of conformal geometric algebra, the 32-dimensional multivector. Optimized algorithms of this kind use only basic arithmetic operations.

These properties provide an adequate basis for our

hardware implementation as described in section 4.

The optimized algorithm of section 3 uses only oper-

ations easily performable by hardware and the com-

putations of the coefﬁcients of the multivectors can be

executed in parallel. We use reconﬁgurable hardware

in order to have the chance to conﬁgure the hardware

acceleration depending on the application.

In a nutshell, the approach of this paper promises

to combine the easy development of algorithms based

on conformal geometric algebra with very efﬁcient

implementations in a wide ﬁeld of computer anima-

tion and computer graphics applications.

2 CONFORMAL GEOMETRIC

ALGEBRA

Blades are the basic computational elements and the

basic geometric entities of the geometric algebra. For

example, the 5D Conformal Geometric Algebra pro-

vides a great variety of basic geometric entities to

compute with. It consists of blades with grades 0, 1,

2, 3, 4 and 5, whereby a scalar is a 0-blade (blade of

grade 0). There exists only one element of grade ﬁve

in the Conformal Geometric Algebra. It is therefore

also called the pseudoscalar. A linear combination of blades of the same grade k is called a k-vector. So a bivector is a linear combination of blades with grade 2. Other k-vectors are vectors (grade 1), trivectors (grade 3) and quadvectors (grade 4). Furthermore, a linear combination of blades of different grades is called a multivector.

Multivectors are the general elements of a Geometric

Algebra. Table 2 lists all the 32 blades of Conformal


Hildenbrand D., Lange H., Stock F. and Koch A. (2008).

EFFICIENT INVERSE KINEMATICS ALGORITHM BASED ON CONFORMAL GEOMETRIC ALGEBRA - Using Reconﬁgurable Hardware.

In Proceedings of the Third International Conference on Computer Graphics Theory and Applications, pages 300-307

DOI: 10.5220/0001094603000307

 SciTePress

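Table 1: Representations of the conformal geometric entities.

entity       standard representation                     direct representation
Point        $P = x + \frac{1}{2}x^2 e_\infty + e_0$
Sphere       $s = P - \frac{1}{2}r^2 e_\infty$            $s^* = x_1 \wedge x_2 \wedge x_3 \wedge x_4$
Plane        $\pi = n + d e_\infty$                       $\pi^* = x_1 \wedge x_2 \wedge x_3 \wedge e_\infty$
Circle       $z = s_1 \wedge s_2$                         $z^* = x_1 \wedge x_2 \wedge x_3$
Line         $l = \pi_1 \wedge \pi_2$                     $l^* = x_1 \wedge x_2 \wedge e_\infty$
Point Pair   $P_p = s_1 \wedge s_2 \wedge s_3$            $P_p^* = x_1 \wedge x_2$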

Geometric Algebra. The indices indicate 1: scalar, 2..6: vector, 7..16: bivector, 17..26: trivector, 27..31: quadvector, 32: pseudoscalar.
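The index ranges above follow from counting the blades grade by grade. The following minimal sketch enumerates them, assuming the five basis vectors are taken in the order e1, e2, e3, einf, e0 (the ASCII names are illustrative only) and that blades are sorted by grade first:

```python
from itertools import combinations
from math import comb

# Basis vectors in the assumed order; the ASCII spellings are illustrative.
BASIS = ["e1", "e2", "e3", "einf", "e0"]

def blades():
    """Return all basis blades, ordered by grade and then by basis order."""
    result = []
    for grade in range(len(BASIS) + 1):
        for combo in combinations(BASIS, grade):
            result.append("^".join(combo) if combo else "1")
    return result

table = blades()
counts = [comb(5, k) for k in range(6)]  # number of blades per grade 0..5

print(len(table), counts)
```

The per-grade counts are the binomial coefficients of 5, which is why the ranges 1, 2..6, 7..16, 17..26, 27..31 and 32 have lengths 1, 5, 10, 10, 5 and 1.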

Table 1 presents the basic geometric entities of conformal geometric algebra: points, spheres, planes, circles, lines and point pairs. The $s_i$ represent different spheres and the $\pi_i$ different planes. The two representations are dual to each other. In order to switch between the two representations, the dual operator, which is indicated by '$*$', can be used. For example, in the standard representation a sphere is represented with the help of its center point $P$ and its radius $r$, while in the direct representation it is constructed by the outer product '$\wedge$' of four points $x_i$ that lie on the surface of the sphere ($x_1 \wedge x_2 \wedge x_3 \wedge x_4$). In standard representation the dual meaning of the outer product is the intersection of geometric entities. For example, a circle is defined by the intersection of two spheres $s_1 \wedge s_2$.

Quaternions are embedded in the Conformal Ge-

ometric Algebra in a very intuitive way. The main

observation is that an arbitrary line through the origin

represents the rotation axis for a quaternion if we use

the following deﬁnitions for the imaginary units

$$i = e_3 \wedge e_2, \tag{1}$$

$$j = e_1 \wedge e_3, \tag{2}$$

$$k = e_2 \wedge e_1. \tag{3}$$

A rotation by an angle $\phi$ around the line $L$, taken as the normalized rotation axis, can be computed as the following quaternion:

$$Q = \cos(\phi/2) + L \sin(\phi/2). \tag{4}$$

For example, if $L = i = e_3 \wedge e_2$, the resulting quaternion

$$Q = \cos(\phi/2) + i \sin(\phi/2) = \cos(\phi/2) + (e_3 \wedge e_2) \sin(\phi/2)$$

represents a rotation around the x-axis. For efﬁciency

reasons we use an approach to calculate quaternions

without the need of using trigonometric functions.
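The embedding of the imaginary units can be checked mechanically. The sketch below assumes the usual assignment $i = e_3 \wedge e_2$, $j = e_1 \wedge e_3$, $k = e_2 \wedge e_1$ and uses a small, purely illustrative geometric product on the three Euclidean basis vectors (the helper names are hypothetical, not part of any library):

```python
def blade_product(a, b):
    """Geometric product of two Euclidean basis blades given as bitmasks.

    Bit 0 stands for e1, bit 1 for e2, bit 2 for e3.
    Returns (sign, bitmask of the resulting blade).
    """
    sign = 1
    shifted = a >> 1
    while shifted:
        # Each set bit counted here is one swap needed to sort the factors.
        if bin(shifted & b).count("1") % 2:
            sign = -sign
        shifted >>= 1
    return sign, a ^ b

def gp(x, y):
    """Geometric product of multivectors stored as {bitmask: coefficient}."""
    out = {}
    for a, ca in x.items():
        for b, cb in y.items():
            s, m = blade_product(a, b)
            out[m] = out.get(m, 0) + s * ca * cb
    return {m: c for m, c in out.items() if c != 0}

e1, e2, e3 = {1: 1}, {2: 1}, {4: 1}

# For orthogonal vectors the outer product equals the geometric product.
i = gp(e3, e2)
j = gp(e1, e3)
k = gp(e2, e1)

print(gp(i, i), gp(j, j), gp(k, k), gp(i, j) == k)
```

With this assignment the three bivectors square to $-1$ and satisfy $ij = k$, which are the defining relations of the quaternion units.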

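Table 2: A multivector in the 5D Conformal Geometric Algebra is a linear combination of 32 blades. On hardware all its coefficients can be computed in parallel.

Index   blade
1       $1$
2       $e_1$
3       $e_2$
4       $e_3$
5       $e_\infty$
6       $e_0$
7       $e_1 \wedge e_2$
8       $e_1 \wedge e_3$
9       $e_1 \wedge e_\infty$
10      $e_1 \wedge e_0$
11      $e_2 \wedge e_3$
12      $e_2 \wedge e_\infty$
13      $e_2 \wedge e_0$
14      $e_3 \wedge e_\infty$
15      $e_3 \wedge e_0$
16      $e_\infty \wedge e_0$
17      $e_1 \wedge e_2 \wedge e_3$
18      $e_1 \wedge e_2 \wedge e_\infty$
19      $e_1 \wedge e_2 \wedge e_0$
20      $e_1 \wedge e_3 \wedge e_\infty$
21      $e_1 \wedge e_3 \wedge e_0$
22      $e_1 \wedge e_\infty \wedge e_0$
23      $e_2 \wedge e_3 \wedge e_\infty$
24      $e_2 \wedge e_3 \wedge e_0$
25      $e_2 \wedge e_\infty \wedge e_0$
26      $e_3 \wedge e_\infty \wedge e_0$
27      $e_1 \wedge e_2 \wedge e_3 \wedge e_\infty$
28      $e_1 \wedge e_2 \wedge e_3 \wedge e_0$
29      $e_1 \wedge e_2 \wedge e_\infty \wedge e_0$
30      $e_1 \wedge e_3 \wedge e_\infty \wedge e_0$
31      $e_2 \wedge e_3 \wedge e_\infty \wedge e_0$
32      $e_1 \wedge e_2 \wedge e_3 \wedge e_\infty \wedge e_0$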

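The angle between two lines or two planes $o_1$ and $o_2$ is defined as follows:

$$\cos(\theta) = \frac{o_1^* \cdot o_2^*}{|o_1^*|\,|o_2^*|}. \tag{5}$$

Together with the half-angle identities

$$\cos(\phi/2) = \pm\sqrt{\frac{1 + \cos(\phi)}{2}} \tag{6}$$

and

$$\sin(\phi/2) = \pm\sqrt{\frac{1 - \cos(\phi)}{2}}, \tag{7}$$

this leads to the formulas

$$\cos(\theta/2) = \pm\sqrt{\frac{1 + \frac{o_1^* \cdot o_2^*}{|o_1^*|\,|o_2^*|}}{2}} \tag{8}$$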


Table 3: Input/output parameters of the inverse kinematics algorithm.

parameter       meaning
$P_w$           target point of wrist
$\phi$          swivel angle
$d_1$, $d_2$    length of the forearm and the upper arm
$Q_s$           shoulder quaternion
$Q_e$           elbow quaternion

and

$$\sin(\theta/2) = \pm\sqrt{\frac{1 - \frac{o_1^* \cdot o_2^*}{|o_1^*|\,|o_2^*|}}{2}}. \tag{9}$$

The signs of these formulas depend on the applica-

tion.
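The half-angle formulas can be illustrated with a short sketch that obtains both quaternion coefficients without calling a trigonometric function. It uses plain Euclidean direction vectors as a stand-in for the dual line or plane representations, and it always picks the positive sign; both choices are assumptions made for this example only.

```python
import math

def half_angle_terms(u, v):
    """Return (cos(theta/2), sin(theta/2)) for the angle theta between u and v.

    Only additions, multiplications, one division and square roots are used;
    the positive root is chosen for both terms.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    c = dot / (norm_u * norm_v)          # cos(theta)
    c = max(-1.0, min(1.0, c))           # guard against rounding outside [-1, 1]
    return math.sqrt((1.0 + c) / 2.0), math.sqrt((1.0 - c) / 2.0)

cos_half, sin_half = half_angle_terms((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(cos_half, sin_half)
```

For two perpendicular directions the angle is 90 degrees, so both returned values equal the cosine and sine of 45 degrees.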

For the foundations of conformal geometric algebra and its application to kinematics please refer, for instance, to (Dorst et al., 2007), (Rosenhahn, 2003), (Bayro-Corrochano and Zamora-Esquivel, 2004), (Hildenbrand et al., 2005) and to the tutorials (Hildenbrand et al., 2004) and (Hildenbrand, 2005).

3 THE OPTIMIZED INVERSE

KINEMATICS ALGORITHM

In this section we present the optimized inverse kine-

matics algorithm for the arm of a virtual character as

described in (Hildenbrand et al., 2006). We especially

use its optimization approach based on Maple in or-

der to get the most elementary relationship between

the input and output parameters of the algorithm.

The goal of the inverse kinematics algorithm is to compute the quaternions $Q_s$ and $Q_e$ based on the target point $P_w$ and the parameters $\phi$, $d_1$, $d_2$ (see table 3).

3.1 Compute the Swivel Plane

First of all, we compute the swivel plane. According to (Tolani et al., 2000) we use the swivel angle $\phi$ as one free degree of redundancy. The swivel plane is the plane rotated by $\phi$ around the line $L_{SW}$ through the shoulder (at the origin) and $P_w$ (see figure 1).

The length of $L_{SW}$ can be computed as

$$|L_{SW}| = \sqrt{p_{wx}^2 + p_{wy}^2 + p_{wz}^2}. \tag{10}$$

The coefﬁcients of the swivel plane optimized by

Maple are:

Figure 1: Swivel plane.

swivel

=(2cos

sin

− p

|cos

)/|L

swivel

=(2cos

sin

+ p

|−

|cos

)/|L

swivel

−2sin

cos

+ p

)

(11)

3.2 The Elbow Point $P_e$

With the help of the two spheres $S_1 = P_w - \frac{1}{2} d_2^2 e_\infty$ and $S_2 = e_0 - \frac{1}{2} d_1^2 e_\infty$ with center points $P_w$ and $e_0$ and radii $d_2$ and $d_1$ we are able to compute the circle $Z_e = S_1 \wedge S_2$ determining all the possible locations of the elbow as the intersection of the spheres (see table 1). The intersection with the swivel plane delivers the point pair $P_p = Z_e \wedge \pi_{swivel}$.
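For intuition, the circle of possible elbow positions can also be obtained with ordinary Euclidean geometry. The sketch below is not the conformal computation used in the paper; it is a swapped-in textbook sphere-sphere intersection, and it assumes that `d1` is the shoulder-elbow distance, `d2` the elbow-wrist distance, and that the shoulder sits at the origin.

```python
import math

def elbow_circle(p_w, d1, d2):
    """Circle of possible elbow positions for a shoulder at the origin.

    p_w: wrist target (x, y, z); d1: shoulder-elbow length (assumed naming);
    d2: elbow-wrist length (assumed naming).
    Returns (center, radius); the circle lies in the plane normal to p_w.
    """
    dist = math.sqrt(sum(c * c for c in p_w))
    if dist == 0.0 or dist > d1 + d2 or dist < abs(d1 - d2):
        raise ValueError("target not reachable")
    # Distance from the shoulder to the circle center along the shoulder-wrist line.
    a = (d1 * d1 - d2 * d2 + dist * dist) / (2.0 * dist)
    radius = math.sqrt(max(0.0, d1 * d1 - a * a))
    center = tuple(a * c / dist for c in p_w)
    return center, radius

center, radius = elbow_circle((0.0, 0.0, 1.2), 1.0, 1.0)
print(center, radius)
```

Intersecting this circle with a plane through the shoulder-wrist line yields two candidate points, which corresponds to the point pair described above.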

Its coefﬁcients based on the optimizations of

Maple are as follows

swivel

+ p

+ d

−d

)

swivel

+ p

+ d

−d

)

swivel

+ p

+ d

−d

)

(1−d

)(p

swivel

− p

swivel

)

(1+ d

)(p

swivel

− p

swivel

)

(1−d

)(p

swivel

− p

swivel

)

(1+ d

)(p

swivel

− p

swivel

)

−1)(p

swivel

− p

swivel

)

(1+ d

)(p

swivel

− p

swivel

) (12)

GRAPP 2008 - International Conference on Computer Graphics Theory and Applications


Figure 2: Compute the elbow point.

We choose one of the two possible elbow points and compute the three components of the elbow point $p_{ex}$, $p_{ey}$, $p_{ez}$ from the point pair $PP$:

einf PP =(PP

−PP

)

+ (PP

−PP

)

+ (PP

−PP

)

tmp

= −PP

−PP

+ PP

−PP

+ PP

tmp

sqrt

√

tmp

=(PP

(PP

−PP

) + PP

(PP

−PP

+ tmp

sqrt

(PP

−PP

))/einf PP

=(PP

(PP

−PP

) + PP

(PP

−PP

)

+ tmp

sqrt

(PP

−PP

))/einf PP

=(PP

(PP

−PP

) + PP

(PP

−PP

)

+ tmp

sqrt

(PP

−PP

))/einf PP (13)

3.3 Calculate the Elbow Quaternion $Q_e$

The elbow angle $\theta_e$ is computed with the help of the line $L_{SE} = (e_0 \wedge P_e \wedge e_\infty)^*$ through the shoulder and the elbow and the line $L_{EW} = (P_e \wedge P_w \wedge e_\infty)^*$ through the elbow and the wrist. Based on these two lines we are able to compute the angle between them ($c_e = \cos(\theta_e) = \frac{L_{SE}^* \cdot L_{EW}^*}{|L_{SE}^*|\,|L_{EW}^*|}$) according to equation (5).

Now, we are able to compute the quaternion $Q_e = \cos(\theta_e/2) + \sin(\theta_e/2)\, i$ according to equation (1). It represents a rotation around the local x-axis with the angle $\theta_e$. The optimized version of this quaternion is $Q_e = \sqrt{\frac{1 + c_e}{2}} - \sqrt{\frac{1 - c_e}{2}}\, i$, according to (8) and (9).

This quaternion rotates the upper arm corresponding to the angle between the two yellow lines as shown in fig. 3.

The quaternion Q

of the rotation at the elbow

joint:

−p

−

1−

−p

i (14)

Figure 3: Use the elbow quaternion.

3.4 Rotate to the Elbow Position

At first we calculate the middle line $L_m$ through the origin within the same distance from the points $P_e$ and $P_z = d_1 e_3 + \frac{1}{2} d_1^2 e_\infty + e_0$. We will need this line $L_m$ in the next step in order to rotate around this line.

To compute $L_m$, we use the middle plane as the difference of the two points $P_z$ and $P_e$ ($\pi_m = P_z - P_e$) and the plane through the origin and the points $P_z$ and $P_e$ as $\pi_e^* = e_0 \wedge P_z \wedge P_e \wedge e_\infty$ and intersect them: $L_m = \pi_e \wedge \pi_m$.

In order to rotate the elbow towards our already computed point $P_e$ we have to rotate around the middle line of the previous step with angle $\pi$. This results in a quaternion identical with the normalized middle line ($Q_{12} = \frac{L_m}{|L_m|}$). Figure 4 shows this rotation from the z-axis $L_z$ to the elbow point with the help of the yellow middle line.

The result for the quaternion Q

optimized by

Maple can be computed as follows:

tmp

−2d

+ d

−2d

+ d

+ 2d

+ d

| =

√

tmp

− p

)

− p

)

+ p

)

(15)

3.5 Rotate to the Wrist Location

The angle $\theta_3$ (and the resulting quaternion $Q_3$) is computed with the help of the y-z-plane rotated by the quaternion $Q_{12}$ and the swivel plane. The plane in y and z direction (with normal vector $e_1$ and zero distance to the origin) is computed by $\pi_{yz} = e_1$. The rotated plane $\pi_{yz2}$ results in $\pi_{yz2} = Q_{12}\, \pi_{yz}\, \tilde{Q}_{12}$.

Figure 4: Rotate to the elbow position.

Figure 5: Rotate to the wrist location.

Based on these two planes we are able to compute the angle between them as $c_3 = \cos(\theta_3) = \frac{\pi_{yz2}^* \cdot \pi_{swivel}^*}{|\pi_{yz2}^*|\,|\pi_{swivel}^*|}$ according to equation (5) and we get the quaternion $Q_3 = \cos(\theta_3/2) + \sin(\theta_3/2)\, k$. It represents a rotation around the local z-axis with the angle $\theta_3$. The optimized version of this quaternion is

$$Q_3 = \sqrt{\frac{1 + c_3}{2}} \pm \sqrt{\frac{1 - c_3}{2}}\, k, \tag{16}$$

according to (8) and (9).

Note: the sign of this quaternion depends on which side of the plane $\pi_{yz2}$ the point $P_w$ is lying. This can be easily computed with the help of the inner product $\pi_{yz2} \cdot P_w$. This quaternion rotates the arm to the wrist location as shown in figure 5.

The rotation at the shoulder joint optimized by

Maple can be computed as follows:

=(π

swivel

+ q

−q

)−

swivel

+ q

swivel

))

swivel

+ π

swivel

+ π

swivel

sign =p

−q

+ q

)

sign =

sign

|sign|

scalar

1+ c

1−c

sign (17)

The resulting quaternion for the shoulder rotation can now be computed as the product $Q_s = Q_{12} Q_3$. The final result of $Q_s$ optimized by Maple can be computed as follows:

= −q

·q

scalar

+ q

·q

)i−

·q

−q

·q

scalar

) j+

·q

scalar

)k (18)

4 HARDWARE

IMPLEMENTATION

The previous section describes the inverse kinemat-

ics algorithm, originally developed based on confor-

mal geometric algebra, with the help of the equations

(10) to (18) using only basic arithmetic operations. In

principle they describe the computation of the compo-

nents of the 32-dimensional multivectors of the alge-

bra. Equation (12) for instance computes the 9 com-

ponents of a point pair. On hardware all these compu-

tations can be executed in parallel.

These equations provide an adequate basis for the

following hardware implementation.

4.1 The Hardware Platform

The accelerated hardware implementation targets a Field Programmable Gate Array (FPGA). These reconfigurable devices allow the implementation of custom circuits, but provide only limited resources. The Xilinx Virtex 2/4 series of FPGAs comprises an array of lookup-tables (LUTs) with one associated Flip-Flop (FF) per LUT. Each LUT can implement an arbitrary function of up to four independent 1-bit inputs.


Table 4: Comparison of required operations.

+, − ×

√

unoptimized 147 87 8 16

optimized 69 42 8 10

To fit the calculation on the FPGA and speed it up, we decided to use only fixed point calculations instead of floating point operations (which are possible, but not as efficient as fixed point). As a side effect of calculating in dedicated hardware, the cost of operations changes drastically: e.g. multiplication and addition need the same calculation time, whereas reciprocal value, division and square root are costly in both execution time and FPGA resources.

4.2 Preparatory Optimization

The equations described in the previous section were the starting point for further optimization. They were manually optimized to reduce the number of operations, especially the expensive ones. E.g. we changed the last six equations of (12) to

= PP

+ PP

= 2d

...

swivel

...

− p

...

swivel

...

)

−

= PP

−PP

= 2(p

...

swivel

...

− p

...

swivel

...

) (19)

m = 1,2,3

These six equations comprise the same number of (constant) multiplications and additions, so the calculations seem as expensive as before. However, now $PP^{+}_m$ and $PP^{-}_m$ share a common subexpression, so 2 multiplications and 1 addition are saved. Furthermore, the constant multiplication in $PP^{+}_m$ is now a constant multiplication with a power of 2, which is implemented in hardware efficiently through simply shifting wire connections by the exponent value. Since only pure wiring is involved, no additional resources are required for this kind of multiplication.
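Both savings can be sketched in a few lines of software. The example is only an illustration with hypothetical names and integer stand-ins for fixed-point values; it does not reproduce the actual coefficients of equation (19).

```python
def shared_terms(p_a, p_b, n_a, n_b, scale):
    """Compute one cross term once and reuse it for two results.

    All arguments are integers standing in for fixed-point values;
    the names are hypothetical and do not come from the paper.
    """
    common = p_a * n_b - p_b * n_a     # shared subexpression, computed once
    doubled = common << 1              # times 2: a pure shift, no multiplier
    scaled = (scale * common) << 1     # times 2*scale: one multiplier plus a shift
    return doubled, scaled

print(shared_terms(3, 5, 7, 11, 4))
```

In hardware the left shift corresponds to rewiring the bit lines by one position, so it costs neither logic nor time, while the shared cross term is built from a single pair of multipliers instead of two.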

In the following computation of the elbow points in (13), the frequent reference to pairs of $PP$ coefficients is clearly obvious. However, as the references occur only in a sum or difference, the new $PP^{+}_m$ and $PP^{-}_m$ can be used instead, thus eliminating an addition each time. Only $tmp$ does not refer directly to a sum or difference, but it can be expressed as

$$tmp = \ldots + PP^{+}_m PP^{-}_m + \ldots,$$

eliminating another 3 multiplications and 3 additions.

For a comparison of the required operations before and after optimization see table 4.

4.3 Fixed Point Conversion

The optimized equations from the previous section (4.2) were manually transformed into dataflow graphs. For conversion into a fixed point format suitable for efficient hardware realization, we need to define the smallest possible word length for every variable and intermediate result in these graphs (e.g., see figure 6 for the calculation of one elbow coordinate from equation (13)). While automatic conversions from floating point to fixed point exist (Han, 2006), we nevertheless preferred a manual translation, as automatic conversions are still limited (e.g. not all operations are supported). We used two approaches for minimizing the word length while retaining sufficient precision.

Analytic approach: The input parameters have a given precision and value range, which were propagated forward through the dataflow graphs. On the other hand, the expected precision and value range of the result were propagated backwards (e.g. the precision of a sum is the maximum of the precision of the summands). Figure 6 shows the resulting value ranges as annotations to the nodes.

Moreover, we can benefit from dedicated knowledge of the problem and inequalities to narrow the value range even further. Referring to the graph in figure 6, we know that the resulting point $p_e$ is the elbow. Hence, its distance from the shoulder is given by $d_1$, which also implies that $|p_{ex}| \le d_1$, $|p_{ey}| \le d_1$ and $|p_{ez}| \le d_1$ (narrowed value ranges are shown inside dashed frames at the nodes in figure 6).

Empiric approach: To verify the analytic results (and to obtain results where the analytic approach fails), all graphs were implemented in MATLAB. Then random valid values were supplied as inputs to the equations, which were subsequently calculated both with default MATLAB double precision (64 bit) floating point arithmetic and with the Fixed-Point Toolbox (The MathWorks, 2007). The Fixed-Point Toolbox allows exact specification of the number format used for fixed point arithmetic (sign bit, integer and fractional word length) for each variable.

The analyses show that it is sufﬁcient to use a total

word length of 32 bits or less.
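The empiric check can be mimicked in a few lines. The sketch below is not the MATLAB setup described above; it is a minimal Python stand-in that assumes a signed 32-bit word with 16 fractional bits (an illustrative split, not a value from the paper) and compares one fixed-point multiplication against double precision for random inputs.

```python
import random

def to_fixed(x, frac_bits):
    """Quantize a float to a fixed-point integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))

def from_fixed(a, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return a / (1 << frac_bits)

def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point integers and rescale (truncating shift)."""
    return (a * b) >> frac_bits

def fits(a, word_bits):
    """True if a is representable as a signed two's-complement word."""
    return -(1 << (word_bits - 1)) <= a < (1 << (word_bits - 1))

def max_error(samples, frac_bits, word_bits=32, seed=0):
    """Largest deviation from double precision over random inputs in [-2, 2]."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        x, y = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
        product = fixed_mul(to_fixed(x, frac_bits), to_fixed(y, frac_bits), frac_bits)
        assert fits(product, word_bits)
        worst = max(worst, abs(from_fixed(product, frac_bits) - x * y))
    return worst

print(max_error(1000, 16))
```

Repeating such a comparison for every node of a dataflow graph, and shrinking the fractional width until the error becomes unacceptable, is one way to arrive at a word length per variable.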

4.4 Hardware Realization

The dataﬂow graphs were implemented as a fully

spatial pipeline in the hardware description language

Verilog. Spatial parallelism means parallel execu-

tion of operations that can be performed simultane-

ously due to independent subexpressions. Pipelin-

ing further extends parallel execution to simultane-

ous calculation of sequentially dependent operations.


Figure 6: Dataﬂow graph for p

Figure 7: Pipeline schedule for p

(cf. ﬁgure 6).

To this end, the parallel-sequential dataﬂow graph is

partitioned into many sequential execution steps, also

known as pipeline stages. The results of the ﬁrst ex-

ecution step are used as inputs for the second step.

This scheme is repeated for all subsequent operation

steps. Hence, parallel execution for all operations is

exploited if possible.

As a benefit, the execution time of each pipeline stage is small compared with that of the whole sequential dataflow graph. The faster pipeline stages, executing in parallel, allow for a higher clock frequency of the hardware, thus increasing overall throughput, with a set of results delivered at each clock cycle. The latency is only experienced twice, at pipeline startup and shutdown, or, respectively, in single shot mode.

We chose to optimize the pipeline for maximum

throughput to demonstrate the potential of acceler-

ating the algorithm in hardware. Nevertheless other

optimization goals (minimum use of hardware re-

sources/area, low latency, low power consumption)

are possible as well, but not discussed here.

Table 5: Hardware mapping results.

Number of LUTs     ≈ 30000 (45% of 2VP70 max.)
Clock frequency    100 MHz
Throughput         1 set of results per clock cycle
Latency            ≈ 550 clock cycles

Many subexpression trees have different heights, which mostly correspond to differing numbers of pipeline stages from common input values to intermediate values that are later merged for subsequent calculations. To account for this difference, the pipeline was balanced with additional flip-flop stages on the shorter subtrees, hence providing intermediate results synchronously at such merge points. The exact pipeline timing (also known as schedule) for the elbow coordinate of figure 6 (cf. equation (13)) is shown in figure 7. The rectangles represent flip-flops which temporarily store the intermediate results of the 5 pipeline stages (separated by dashed horizontal lines). Note the balancing flip-flops for $einfPP$, $tmp_{sqrt}$ and $PP^{-}_m$.
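The balancing step can be sketched as a small scheduling routine. The following example assumes, for simplicity, that every operator occupies exactly one pipeline stage (real divider and square-root cores take many stages), and the graph and node names are hypothetical.

```python
def balance(graph, inputs):
    """Assign pipeline stages and count balancing registers per edge.

    graph:  {node: [operand nodes]} for every operator node.
    inputs: names of the primary inputs (stage 0).
    Returns (stage, delays), where delays[(src, dst)] is the number of
    extra flip-flop stages needed so that all operands of dst arrive together.
    """
    stage = {name: 0 for name in inputs}

    def depth(node):
        # An operator sits one stage after its latest operand.
        if node not in stage:
            stage[node] = 1 + max(depth(src) for src in graph[node])
        return stage[node]

    for node in graph:
        depth(node)

    delays = {
        (src, dst): stage[dst] - 1 - stage[src]
        for dst, operands in graph.items()
        for src in operands
    }
    return stage, delays

# Hypothetical three-operator graph: r = (a * b + c) - a
example = {"mul": ["a", "b"], "add": ["mul", "c"], "sub": ["add", "a"]}
stage, delays = balance(example, ["a", "b", "c"])
print(stage, delays)
```

In this example the input `c` needs one balancing register and the second use of `a` needs two, because their consumers sit deeper in the pipeline than the inputs themselves.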

In contrast to a fully spatial pipeline which is cus-

tomized for the speciﬁc application, general purpose

instruction-based execution models as employed in

CPUs or Digital Signal Processors (DSPs) only pro-

vide for a very limited exploitation of both spatial

and sequential (pipelining) parallelism. E.g., a CPU

pipeline has to be ﬂushed and reﬁlled on every (un-

predicted) branch in the control ﬂow of the software

code. What is more, even a superscalar CPU hardly

executes more than half a dozen instructions per clock

cycle (Hennessy and Patterson, 2007, chapter 3.6).

5 RESULTS

The Verilog HDL description of the dataflow graph was mapped to a Xilinx Virtex 2VP70 FPGA (Xilinx, 2005) using the Xilinx ISE tools. For synthesis, we generated dedicated divider and square-root cores with the Xilinx Core Generator (part of the ISE tools). Multiplications were automatically mapped to multiplier units, which are specialized, non-reconfigurable fast hardware blocks provided within the FPGA fabric. Table 5 shows the mapping and performance results for the complete dataflow graph.

Our test scenario was the computation of the motion of the arm of the virtual character based on 100 steps of inverse kinematics computations. To evaluate the hardware performance compared with a pure software implementation, 100 data sets residing in main memory, which represent target points in space, are converted to quaternions by both our fully spatial pipeline running at 100 MHz clock speed and an Intel Centrino M715 CPU running at 1.5 GHz. After conversion, the resulting data sets are written back to memory. Table 6 shows the accelerated HW and pure SW execution times.


Table 6: SW and HW execution times for 100 data sets.

               execution time
SW on CPU      860 µs
HW pipeline    6 µs (5 µs until first data set + 100 × 10 ns for the remaining sets)
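The hardware figure in table 6 is the sum of the pipeline fill time and one clock period per data set. The short calculation below reproduces it from the values reported in the tables; the variable names are chosen for this example only.

```python
CLOCK_HZ = 100_000_000                          # 100 MHz pipeline clock (table 5)
PERIOD_NS = 1_000_000_000 // CLOCK_HZ           # one clock period: 10 ns
FIRST_RESULT_NS = 5_000                         # time until first data set (table 6)
DATA_SETS = 100
SOFTWARE_NS = 860_000                           # software execution time (table 6)

hardware_ns = FIRST_RESULT_NS + DATA_SETS * PERIOD_NS
speedup = SOFTWARE_NS / hardware_ns

print(hardware_ns, round(speedup, 1))
```

Because the fill time is paid only once, the advantage of the pipeline grows with the number of data sets processed in one run.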

6 COMPARISON TO GPU

REALISATION

The fine-grained parallelism exploited by our FPGA realisation is very different from the coarser-grained one accessible when implementing the algorithm on a modern GPU. A typical example, such as the NVidia G80 architecture (NVIDIA Corp., 2007), can process just 128 operations in parallel, using a SIMD (small-vector) paradigm. In our approach, however, each of the 550 pipeline stages has 4...10 operators executing in parallel, leading to a total parallelism of thousands of operations. Additionally, the GPU processing elements have a fixed architecture and cannot be tailored to specific constants or precision requirements. Furthermore, the GPU computation has to be manually partitioned into units of parallel execution (so-called threads or warps). In our approach, the mathematical formulation itself directly determines the hardware structure of the accelerator, so no additional partitioning is required. While we have not implemented the algorithm on such a GPU yet, we expect the FPGA realisation (especially one using a more recent chip than the one in the current prototype) to be competitive with a modern GPU implementation despite the clock speed differences (550 MHz FPGA vs. 1350 MHz GPU). Such an evaluation is planned in a further refinement of this work.

7 CONCLUSIONS

We presented a way to implement algorithms based on conformal geometric algebra in hardware. After developing an algorithm in an easy and geometrically intuitive way on a very high level, we use the symbolic calculation functionality of Maple as a software optimization procedure. While this approach already leads to very efficient code, we presented an approach to further improve the performance using reconfigurable hardware. For 100 data sets this implementation turned out to be more than 100 times faster than the software implementation (see table 6). These results were shown based on an inverse kinematics algorithm, but we are convinced that this approach can be used very advantageously in many computer animation and computer graphics applications.

REFERENCES

Bayro-Corrochano, E. and Zamora-Esquivel, J. (2004). In-

verse kinematics, ﬁxation and grasping using confor-

mal geometric algebra. In IROS 2004, September

2004, Sendai, Japan.

Han, K. (2006). Automating Transformations from

Floating-point to Fixed-point for Implementing Dig-

ital Signal Processing Algorithms. PhD thesis, Dept.

of Electrical and Computer Engineering, The Univer-

sity of Texas at Austin.

Hennessy, J. L. and Patterson, D. A. (2007). Computer ar-

chitecture. Kaufmann [u.a.], Amsterdam [u.a.].

Hildenbrand, D. (2005). Geometric computing in computer

graphics using conformal geometric algebra. Comput-

ers & Graphics, 29(5):802–810.

Hildenbrand, D., Bayro-Corrochano, E., and Zamora-

Esquivel, J. (2005). Advanced geometric approach for

graphics and visual guided robot object manipulation.

In proceedings of ICRA conference, Barcelona, Spain.

Hildenbrand, D., Fontijne, D., Perwass, C., and Dorst, L.

(2004). Tutorial geometric algebra and its applica-

tion to computer graphics. In Eurographics confer-

ence Grenoble.

Hildenbrand, D., Fontijne, D., Wang, Y., Alexa, M., and

Dorst, L. (2006). Competitive runtime performance

for inverse kinematics algorithms using conformal ge-

ometric algebra. In Eurographics conference Vienna.

Dorst, L., Fontijne, D., and Mann, S. (2007). Geometric Algebra for Computer Science, An Object-Oriented Approach to Geometry. Morgan Kaufmann.

NVIDIA Corp. (2007). NVIDIA CUDA Compute Uniﬁed

Device Architecture Programming Guide Version 1.0.

NVIDIA Corp.

Rosenhahn, B. (2003). Pose Estimation Revisited. PhD thesis, Inst. f. Informatik u. Prakt. Mathematik der Christian-Albrechts-Universität zu Kiel.

The MathWorks (2007). Fixed-Point Toolbox 2, Reference.

The MathWorks.

Tolani, D., Goswami, A., and Badler, N. I. (2000). Real-

time inverse kinematics techniques for anthropomor-

phic limbs. Graphical Models, 62(5):353–388.

Xilinx (2005). Virtex-II Pro and Virtex-II Pro X Platform

FPGAs: Complete Data Sheet (DS083). Xilinx.
