7.2 Material Recognition
The success ratio of the material recognition was 89%,
which is a favorable result. As shown in Table 4,
“wood” and “metal” were recognized well. “Wood” materials
have grain, and “metal” materials have specular highlights.
We suppose that such appearances serve as strong features
that distinguish these materials from the others. “Fabric”
also showed a moderate result; we estimate that its rough
surface is a good feature of the “fabric” materials.
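The per-material results in Table 4 and the overall 89% figure are simply ratios of correctly classified image clips. A minimal Python sketch of this bookkeeping, using invented toy label pairs rather than our actual evaluation data:

```python
from collections import Counter

# Hypothetical (true, predicted) material pairs; placeholders, not our data.
results = [("wood", "wood"), ("metal", "metal"), ("plastic", "metal"),
           ("fabric", "fabric"), ("wood", "wood"), ("plastic", "plastic")]

totals = Counter(true for true, _ in results)                   # clips per material
hits = Counter(true for true, pred in results if true == pred)  # correctly classified

for material in totals:
    print(f"{material}: {hits[material] / totals[material]:.0%}")  # per-material ratio

# Overall success ratio (the real experiment yielded 89%).
print(f"overall: {sum(hits.values()) / len(results):.0%}")
```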
On the other hand, “plastic” was very hard to recognize.
“Plastic” materials have specular highlights like
“metal”. We collected the learning samples of “plastic”
with a focus on such specular highlights; however, the
object image clips extracted by the object detection did
not contain enough specular highlights. This may be
improved by a more appropriate selection of learning
samples.
From the aspect of object location, overlapping objects
lowered the success ratio. For example, when objects
made of “metal”, “fabric”, and so on were placed on a
“table” whose surface is “plastic”, the material of the
“table” was not classified successfully. This is a natural
result, and we have to use pixel-level image segmentation
to solve this problem, as sketched below.
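A minimal sketch of that remedy, assuming a pixel-level instance mask were available from a segmentation stage that our current system does not yet have; `masked_clip` and `classify_material` are hypothetical names:

```python
import numpy as np

def masked_clip(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out every pixel that does not belong to the target object.

    image: H x W x 3 color image; mask: H x W boolean instance mask
    (assumed to come from a pixel-level segmentation stage).
    """
    clip = image.copy()
    clip[~mask] = 0  # suppress occluding objects and background
    return clip

# Usage sketch: judge the "table" from its own surface pixels only.
# table_clip = masked_clip(color_image, table_mask)
# material = classify_material(table_clip)  # hypothetical CNN call
```

With such masking, the “metal” and “fabric” objects lying on the “table” would no longer contaminate the clip from which the table’s material is judged.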
8 CONCLUSIONS
In this paper, we proposed a method to represent interactions
between virtual objects and real objects in an MR
scene more realistically than conventional MR technologies,
by recognizing the materials of objects in the
real space. First, an RGB-D camera grabs a color image
and a depth image. From the color image, our
system detects objects to obtain their positions in the
real space. Then, material recognition using deep
learning is performed on the objects’ image clips, and
3D meshes of the detected objects are constructed.
After that, the result of the material recognition is
reflected in each corresponding object’s 3D mesh.
Physical characteristics, such as the friction and
restitution coefficients and the contact sound, are added
to the 3D meshes during this process. By overlaying the
3D meshes on the real-world image, we obtain a more
realistic MR scene where not only can the virtual objects
interact with the real objects, but the motion of the
virtual objects also changes with the materials of the
real objects. Our method will be applicable to realizing
a more realistic MR world that can be used in many
fields, such as sports with virtual balls and simulation
with virtual objects.
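The overall flow can be summarized in the following Python sketch. Every function name is a placeholder for the corresponding stage of our system, and the physical parameter values are illustrative, not the calibrated ones we actually use:

```python
# Illustrative material-to-physics table (placeholder values).
MATERIAL_PHYSICS = {
    "wood":    {"friction": 0.4, "restitution": 0.5, "sound": "knock.wav"},
    "metal":   {"friction": 0.2, "restitution": 0.7, "sound": "clang.wav"},
    "fabric":  {"friction": 0.8, "restitution": 0.1, "sound": "thud.wav"},
    "plastic": {"friction": 0.3, "restitution": 0.6, "sound": "click.wav"},
}

def build_mr_scene(rgbd_camera):
    color, depth = rgbd_camera.grab()              # 1. grab color and depth images
    for box in detect_objects(color):              # 2. object detection (YOLO)
        clip = crop(color, box)                    #    extract the object's image clip
        material = recognize_material(clip)        # 3. material recognition (CNN)
        mesh = build_mesh(depth, box)              # 4. 3D mesh from the depth image
        mesh.physics = MATERIAL_PHYSICS[material]  # 5. attach physical characteristics
        add_to_physics_world(mesh)
    return overlay_virtual_objects(color)          # 6. composite the final MR scene
```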
Currently, our method recognizes only the kind of
material; it does not consider how the material is
processed. For example, metal has been assumed to
have a smooth surface, but some metal objects are
rough-machined. Similarly, varnished wood products
may have a smooth surface, while rough wooden objects
also exist. A more natural expression would become
possible by considering not only the material
but also how its surface is finished, as sketched below.
Additionally, we have to consider pixel-level object
recognition. We used YOLO (Redmon et al., 2015) for
object detection, and it calculates each object’s bounding
box. The bounding box often includes parts of other
objects, which affects material recognition. We will
focus on these points and continue to improve our method.
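As a sketch of the surface-finish extension, the physics lookup could be keyed on a (material, finish) pair instead of the material alone; all names and values below are hypothetical:

```python
# Hypothetical (material, finish) -> physics table; values are placeholders.
FINISH_PHYSICS = {
    ("metal", "polished"):  {"friction": 0.15, "restitution": 0.7},
    ("metal", "machined"):  {"friction": 0.45, "restitution": 0.6},
    ("wood",  "varnished"): {"friction": 0.25, "restitution": 0.5},
    ("wood",  "rough"):     {"friction": 0.60, "restitution": 0.4},
}

def physics_for(material: str, finish: str) -> dict:
    """Fall back to a generic per-material default when the finish is unknown."""
    return FINISH_PHYSICS.get((material, finish),
                              {"friction": 0.4, "restitution": 0.5})
```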
ACKNOWLEDGEMENTS
This work was supported by JSPS KAKENHI Grant
Number 17K01160.
REFERENCES
Inaba, M., Banno, A., Oishi, T., and Ikeuchi, K. (2012).
Achieving robust alignment for outdoor mixed reality
using 3d range data. In Proceedings of the 18th ACM
Symposium on Virtual Reality Software and Technol-
ogy, VRST ’12, pages 61–68, New York, NY, USA.
ACM.
Indyk, P., Motwani, R., Raghavan, P., and Vempala, S.
(1997). Locality-preserving hashing in multidimen-
sional spaces. In Proceedings of the Twenty-ninth
Annual ACM Symposium on Theory of Computing,
STOC ’97, pages 618–625, New York, NY, USA.
ACM.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J.,
Girshick, R., Guadarrama, S., and Darrell, T. (2014).
Caffe: Convolutional architecture for fast feature em-
bedding. arXiv preprint arXiv:1408.5093.
Kakuta, T., Oishi, T., and Ikeuchi, K. (2008). Fast shad-
ing and shadowing of virtual objects using shadow-
ing planes in mixed reality. The Journal of the Insti-
tute of Image Information and Television Engineers,
62(5):788–795.
Redmon, J., Divvala, S. K., Girshick, R. B., and Farhadi, A.
(2015). You only look once: Unified, real-time object
detection. CoRR, abs/1506.02640.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., Erhan, D., Vanhoucke, V., and Rabi-
novich, A. (2014). Going deeper with convolutions.
In Large Scale Visual Recognition Challenge 2014,
ILSVRC ’14.