COOPERATION OF CPU AND GPU PROGRAMS

FOR REAL-TIME 3D MAP BUILDING

Yonghyun Jo

, Hanyoung Jang

, Yeonho Kim

, Joon-Kee Cho

, Hyoung- Ki Lee

, Young Ik Eom

and JungHyun Han

College of Information and Communication, Korea University, Seoul, Republic of Korea

Samsung Electronics Co., Samsung Advanced Institute of Technology, Suwon, Republic of Korea

School of Information and Communications Engineering, Sungkyunkwan University, Suwon, Republic of Korea

Keywords:

GPU program, Map building, CUDA.

Abstract:

This paper presents how the CPU and GPU programs coordinate in the context of 3D map modeling for a

mobile home service robot. In this study, the representation of the environment is given as point clouds,

and each scan of point clouds is quite efﬁciently processed using the parallel processing capability of GPU.

Then, the result is read back to CPU for incrementally constructing the map. Due to the coordination between

the CPU and GPU, a 3D map can be built at real time. This paper presents the software architecture of the

CPU-GPU coordination, the GPU algorithm, and its performance gain.

1 INTRODUCTION

Recently, much attention has been directed to home

service robots such as cleaning robots while the mar-

kets for conventional industrial robots are saturating.

A home service robot requires a map, which is a spa-

tial model of the physical environment. In our study,

the environment is ﬁrst captured as point clouds, and

then a 3D map is incrementally constructed from the

successive inputs of the point clouds. Noting that a lot

of planar surfaces, such as walls and ﬂoors, exist in

a typical indoor environments, we propose to extract

the plane information and store it into the 3D map.

CPU

GPU

scanning

map building

point clouds

(in a texture)

micro-planes

mean filtering

micro-plane

extraction

Figure 1: CPU-GPU coordination.

The algorithm proposed in this paper utilizes both

the CPU and GPU (Graphics Processing Unit) for

building a map with planar features included. The co-

ordination between them is illustrated in Fig. 1. First

of all, the scene is scanned and the point clouds are

stored in a texture, which refers to a rectangular array

of texels. Then, the texture is passed to GPU. Two

tasks are done by the GPU: (1) a micro-plane is ex-

tracted for each point, and (2) the micro-planes are

smoothed using mean ﬁltering. Finally, the buffer

storing the micro-planes is read back to CPU, and the

CPU builds a 3D map which contains plane informa-

tion of the environment.

This paper presents the micro-plane extraction and

mean ﬁltering algorithms, and shows why they can be

efﬁciently implemented in GPU. (Presenting the map

building algorithm is beyond the scope of this paper.)

The performance of the GPU implementation is pre-

sented, and is also compared with the CPU implemen-

tation of the same algorithm.

2 MOBILE ROBOT AND

WORKING ENVIRONMENT

Fig. 2-(a) shows the experimental robot and a sec-

tion of the home environment used in the experiment.

The time-of-ﬂight camera mounted on the robot scans

the environment at 5 frames per second (fps). Con-

sequently, the complete process of scanning a scene,

registering it into the previous scans, and constructing

the 3D map is done in 200 msec. A scan’s resolution

is 176×144. The section of the home environment

shown in Fig. 2-(a) requires 5 scans, and Fig. 2-(b)

shows the result of rendering the 3D points in a scan.

(Each point of the scan is represented as a 3D point,

and therefore can be directly rendered.) Fig. 2-(c)

visualizes the ﬁnal 3D map representation, where the

planar features are colored in red. (The points that are

302

Jo Y., Jang H., Kim Y., Cho J., Lee H., Ik Eom Y. and Han J..

COOPERATION OF CPU AND GPU PROGRAMS FOR REAL-TIME 3D MAP BUILDING.

DOI: 10.5220/0003613203020304

In Proceedings of the 6th International Conference on Software and Database Technologies (ICSOFT-2011), pages 302-304

ISBN: 978-989-8425-77-5

 2011 SCITEPRESS (Science and Technology Publications, Lda.)

(a)

(b)

(c)

Figure 2: Indoor environment and its 3D map. (a) The robot

and a section of the environment. (b) The point clouds of

‘a’ scan for the section. (c) The 3D map representation with

the planar features included. (This is the output of the CPU

program.)

3 7 0

(b)

(a)

micro-plane

Figure 3: Micro-planes and normal computation. (a) A

micro-plane is presented by a point’s position and normal.

(b) A micro-plane’s normal is computed using the position

information of the neighboring points.

not included in the planar features are illustrated as

wireframe boxes containing the points.)

3 MICRO-PLANE EXTRACTION

In order to help the CPU detect the planar features of

the indoor environment, the GPU extracts the micro-

planes from the scanned points. For each point of a

scan, a micro-plane is extracted. As shown in Fig. 3-

(a), it is deﬁned as an inﬁnitesimal plane centered on

the point. To deﬁne its orientation, we have to com-

pute the normal. Fig. 3-(b) shows a block of 5×5

points in a scan. The normal of the center point (point

0) is computed as the cross product of the vector con-

necting points 1 and 3 and the one connecting points

2 and 4.

The GPU has a massively parallel processing ar-

chitecture such that hundreds of cores independently

operate on texels. A core is assigned to a texel to ex-

tract a micro-plane. The parallel architecture of the

GPU allows very efﬁcient manipulation of large bulks

of data, i.e., 176×144 texels in a scan.

Unfortunately, a lot of noises are generated in the

extracted micro-planes, i.e., the points sampled from

a real-world plane may have different normals. For

the purpose of de-noising, the GPU program uses a

mean ﬁlter (Boyle and Thomas, 1988). In the current

implementation, the ﬁltering process is iterated ﬁve

times. The parallel architecture of the GPU also ﬁts

to the ﬁltering algorithm.

The map building algorithm implemented by CPU

constructs a larger plane using a set of micro-planes.

The red-colored features shown in Fig. 2-(c) are the

planes extracted by the CPU algorithm. (Presenting

the map building algorithm is beyond the scope of this

paper.)

4 IMPLEMENTATION AND

EXPERIMENTAL RESULTS

GPU plays a key role in our study. This kind of

approach of solving non-graphics problems on GPU

is known as general purpose computation on GPU

(GPGPU). The micro-plane extraction and mean ﬁl-

tering algorithms are implemented in CUDA (Com-

pute Uniﬁed Device Architecture) (Nickolls et al.,

2008).

The algorithms for micro-plane extraction and

mean ﬁltering take about 5msec per 176×144-

resolution scan. For comparison, we also imple-

mented the same algorithms in CPU, and found they

take approximately 150msec, i.e., they are about 30

times slower than the GPU algorithm. Recall that our

goal is 5 fps map building, i.e., 200msec is assigned

to a scan. The CPU implementation makes it hard to

achieve the goal. It is because, in addition to micro-

plane extraction (and mean ﬁltering), the module for

‘map building’ shown in Fig. 1 should also be done

within 200msec. Unfortunately, ‘map building’ typi-

cally takes more time than micro-plane extraction and

mean ﬁltering.

Fig. 4 shows the experimental result for another

COOPERATION OF CPU AND GPU PROGRAMS FOR REAL-TIME 3D MAP BUILDING

303

(a)

(b)

Figure 4: Another test result. (a) Point clouds. (b) 3D map.

section of the home environment. Fig. 4-(a) is the

rendering result of the input point clouds from multi-

ple scans, and Fig. 4-(b) shows the 3D map where the

planar features are colored in red. The entire process

of scanning, micro-plane extraction, mean ﬁltering,

and map building takes about 70msec per scan.

5 CONCLUSIONS

A 3D map might be provided in advance to a mobile

robot, but it is not always possible especially for a dy-

namic environment. Therefore we need to construct

the map at real time. In our study, the home ser-

vice robot scans the environment ﬁve times per sec-

ond, and updates the map on the ﬂy. Such real-time

performance is achieved largely due to GPU imple-

mentation of the micro-plane extraction and mean ﬁl-

tering algorithms. The input point clouds are massive

(176×144 resolution) and independent of each other.

The parallel processing capability of many-core GPU

proves to satisfy the real-time constraint.

ACKNOWLEDGEMENTS

This research was supported by MKE, Korea un-

der ITRC NIPA-2011-(C1090-1121-0008) and the

National Research Foundation of Korea(NRF) grant

funded by the Korea government(MEST) (No. 2009-

0086684).

REFERENCES

Jang, H. and Han, J. (2008). Fast Collision Detection using

the A-Buffer. The Visual Computer, 24(7):659-667.

Boyle, R. D. and Thomas, R. C. (1988). Computer vision:

a ﬁrst course. Blackwell Scientiﬁc Publications, Ltd.,

Oxford, UK, UK.

Nickolls, J., Buck, I., Garland, M., and Skadron, K. (2008).

Scalable parallel programming with cuda. Queue,

6:40–53.

ICSOFT 2011 - 6th International Conference on Software and Data Technologies

304