the working robot in the currently observed image. The
area covered by the corresponding points can then be
extracted as the region of the working robot in the
image.
In this study, we propose a voting method for efficient
detection. The captured image is divided into several
square regions. The working robot is recognized if
there are matched regions in which the number of
corresponding points exceeds a threshold given in
advance. These matched regions are taken as the area
of the working robot in the image.
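The voting step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `vote_regions`, the cell size, and the threshold value are our assumptions, and the matched point coordinates are assumed to come from a prior descriptor-matching stage.

```python
import numpy as np

def vote_regions(matched_points, image_shape, cell_size=40, threshold=5):
    """Divide the image into square regions and count matched feature
    points per region.  Regions whose count exceeds the threshold are
    taken as the area occupied by the working robot."""
    h, w = image_shape
    rows = (h + cell_size - 1) // cell_size
    cols = (w + cell_size - 1) // cell_size
    votes = np.zeros((rows, cols), dtype=int)
    for x, y in matched_points:          # (x, y) pixel coordinates
        votes[int(y) // cell_size, int(x) // cell_size] += 1
    # Boolean mask of regions with more matches than the threshold
    return votes, votes > threshold
```

The robot is recognized if the returned mask contains any true cell; the union of true cells approximates the robot's image region.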
2.2 3-D Position Detection
The 3-D position of the working robot is calculated
using stereo vision. Even if the observing robot has
only one camera, it can move to change its position
so that two or more images of the working robot are
captured from different views. Corresponding feature
points are then obtained from the images taken at
different angles. We can apply stereo vision techniques
such as the 8-point algorithm (Shi and Tomasi, 1994)
to those corresponding points to compute the 3-D
position.
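As one concrete way to carry out this step, the sketch below recovers a 3-D point from two views by linear (DLT) triangulation, assuming the two camera projection matrices have already been estimated (e.g. after recovering the epipolar geometry with the 8-point algorithm); the function name and interface are illustrative, not taken from the paper.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3-D point from two views.
    P1, P2 : 3x4 camera projection matrices at the two observation
             positions; x1, x2 : corresponding image points (x, y)."""
    # Each image point contributes two linear constraints on the
    # homogeneous 3-D point X:  x * (P[2] @ X) - P[0] @ X = 0, etc.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # homogeneous -> Euclidean
```

In practice the projection matrices encode the observing robot's two camera poses, so odometry error at the second position directly affects the triangulated result.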
3 FEATURE POINT DETECTION
BASED ON SURF DESCRIPTOR
For fast detection of the feature points, we apply the
SURF (Speeded Up Robust Features) descriptor (Bay
et al., 2008). SURF is an improved version of the SIFT
descriptor (D.G. Lowe, 1999), designed for faster
computation.
The SIFT descriptor is capable of robustly detecting
feature points in an image. It can also describe the
quantities of the detected features robustly against
changes in scale, illumination, and image rotation. It
is, therefore, useful for object detection and
recognition.
The detection of SIFT features consists of extraction
of key points, localization, computation of orientation,
and description of the feature quantities. In the
extraction of key points, DoG (Difference of Gaussians)
is used to search for local extrema, detecting the
positions and scales of features. Some of these points
are then selected by the localization process. The
orientations of those points are then computed, and
their feature quantities are described. To describe the
feature quantities based on the orientation, the region
surrounding a feature point, divided into 4 × 4 blocks,
is rotated to the direction of the orientation. Making
a histogram over 8 directions for each block produces
a 128 (4 × 4 × 8)-dimensional feature vector, which
represents the quantity of the SIFT feature.

Figure 1: Experimental setup.
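The 4 × 4 × 8 descriptor construction can be sketched roughly as follows. This is a simplified illustration rather than the full SIFT algorithm (it omits Gaussian weighting and trilinear interpolation); the function name and the assumption of a pre-rotated 16 × 16 gradient patch are ours.

```python
import numpy as np

def sift_like_descriptor(orientations, magnitudes):
    """128-dimensional SIFT-style descriptor from a 16x16 patch of
    gradient orientations (radians, in [0, 2*pi)) and magnitudes,
    assumed already rotated to the keypoint's dominant orientation."""
    desc = []
    for by in range(4):                      # 4 x 4 grid of blocks
        for bx in range(4):
            block_o = orientations[4*by:4*by+4, 4*bx:4*bx+4]
            block_m = magnitudes[4*by:4*by+4, 4*bx:4*bx+4]
            # 8-bin orientation histogram weighted by gradient magnitude
            hist, _ = np.histogram(block_o, bins=8, range=(0, 2*np.pi),
                                   weights=block_m)
            desc.extend(hist)
    desc = np.asarray(desc)                  # 4 * 4 * 8 = 128 values
    return desc / (np.linalg.norm(desc) + 1e-12)   # normalize
```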
The SURF descriptor speeds up the above processes of
key-point extraction and feature description. In the
extraction of key points, SURF creates the response
image from the determinant of the Hessian, using box
filters instead of the Gaussian function. A box filter
is an approximation of the second-derivative filter of
a Gaussian. With box filters the filtering computation
becomes fast, because each filter consists of regions
of equal value, so its response can be obtained from
an integral image computed in advance. In the
description of feature quantities, the dimension of the
feature vector is reduced from 128 to 64 by dividing
the orientation of each block into 4 directions.
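The reason box filtering is fast can be illustrated with the integral image (summed-area table): once it is computed, the sum of pixel values over any rectangular box takes only four lookups, regardless of the box size. A minimal sketch (function names are ours):

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[:y+1, :x+1]."""
    return np.cumsum(np.cumsum(img, axis=0), axis=1)

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] using at most four lookups in the
    integral image, independent of the box size."""
    total = ii[y1 - 1, x1 - 1]
    if y0 > 0:
        total -= ii[y0 - 1, x1 - 1]
    if x0 > 0:
        total -= ii[y1 - 1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total
```

This constant-time box sum is what makes evaluating the box-filter approximation of the Hessian cheap at every position and scale.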
4 EXPERIMENTS
4.1 Experimental Setup
The method described above has been implemented on
two wheeled mobile robots, Pioneer P3-DX (Mobile
Robots Pioneer P3-DX, 2007), each 393 mm in width,
445 mm in length, and 237 mm in height.
One robot has a camera, Canon VC-C50i, which can
rotate in the pan and tilt directions and therefore
serves as the observing robot. A board computer,
Interface PCI-B02PA16W, was also mounted to process
the camera images during observation and to control
the movement of the robot. We used OpenCV to develop
the image-processing software for observation.
Fig. 1 shows the experimental setup. The observing
robot initially stays at P1 to observe and detect the
working robot, which does not have a camera, using
the method described above. The observing robot then
moves from P1 to P2 in Fig. 1 to observe the working
robot from a different visual angle, in order to obtain
corresponding points and calculate its position.