REAL WORLD SENSORIZATION AND VIRTUALIZATION FOR
OBSERVING HUMAN ACTIVITIES
Koji Kitamura
Tokyo University of Science
2641 Yamazaki, Noda-shi, Chiba 278-8510, JAPAN
Yoshifumi Nishida, Makoto Kimura
Digital Human Research Center, National Institute of Advanced Industrial Science and Technology (AIST)
2-41-6 Aomi, Koto-ku, Tokyo 135-0064, JAPAN
CREST, JST (Japan Science and Technology Agency)
Hiroshi Mizoguchi
Tokyo University of Science
2641 Yamazaki, Noda-shi, Chiba 278-8510, JAPAN
Keywords:
Human Behavior Detection, Ubiquitous Computing, Sensorization, Distributed Sensor.
Abstract:
This paper describes a method for robustly detecting and efficiently recognizing daily human behavior in the real world. The proposed method involves real world sensorization for robustly observing human behavior using ultrasonic 3D tags, a kind of ultrasonic location system; real world virtualization for creating a virtual environment by modeling the 3D shapes of real objects with a stereovision system; and virtual sensorization of the virtualized objects for quickly registering human activities that involve handling objects in the real world and for efficiently recognizing target human activities. As for real world sensorization, this paper describes algorithms for robustly estimating the 3D positions of objects that a human handles. This paper also describes a method for real world virtualization and virtual sensorization using the ultrasonic 3D tag system and a stereovision system.
1 INTRODUCTION
The observation of human activities in the real world
makes it possible to input personal information into
a computer without any conscious operation of an in-
terface. Human-centered applications based on im-
plicit input of human information require the facility
to observe and recognize activities as a basis. This
paper describes a method for realizing a function for
robustly and efficiently detecting daily human activity
events in the real world.
There are two problems in realizing and utilizing a function for recognizing human activity in the real world: the robust observation of a human activity pattern, and the efficient recognition of the meaning of an activity from the observed pattern. Without solving the first problem, a human activity pattern to be analyzed cannot be obtained. Without tackling the second problem, it is impossible to guarantee that recognition completes within the time frame demanded by the application.
As a method for efficient recognition of activity, the
idea of object-based activity recognition has been pro-
posed (Mizoguchi et al., 1996). In theory, the activity
of handling objects in an environment such as an of-
fice or home can be recognized based on the motion
of the objects. However, when applying the method to
real environments, it is difficult to even achieve an ad-
equate level of object recognition, which is the basis
of the method.
Separating the problems of object recognition and
activity recognition is becoming increasingly realis-
tic with the progress in ubiquitous computing technology such as microcomputers, sensors, and wireless networks. It has now become possible to
resolve object recognition into the problems of sen-
sorizing objects and tagging the objects with identifi-
cation codes (IDs), and to address activity recognition
separately through the development of applied tech-
nology.
As for robust observation of human activity, this
paper describes a method for "sensorizing objects in
the real world" using a special device. The present au-
thors have developed a three-dimensional ultrasonic
location and tagging system, an ultrasonic 3D tagging
system, for that purpose. In terms of cost and robust-
ness against environmental noise, the ultrasonic sys-
tem is superior to other location techniques such as
visual, tactile, and magnetic systems. A number of ul-
trasonic location systems have already been proposed
or commercialized (Hopper et al., 1999; Shih et al.,
2001). The system presented in the present paper is
developed specifically to address the issue of robust-
ness and accuracy in real time when a person handles
objects having ultrasonic 3D tags.
As for efficient recognition of target activity, this paper describes a method for "creating virtual objects" and "virtually sensorizing the virtualized objects". It is important that the created virtual environment extract the essential features of the real world, so that it can eliminate unnecessary processing while maintaining its association with the target phenomena of the real world. The method enables a user to quickly and interactively register, on a computer, the target activity to be recognized.
This paper is organized as follows. The next sec-
tion describes the method for real world sensoriza-
tion using the ultrasonic 3D tagging system. The de-
veloped ultrasonic 3D tagging system is introduced
briefly. Algorithms for robustly measuring 3D posi-
tions of the objects handled by a person and exper-
imental results are shown. Section 3 describes the
method for creating virtual objects and virtually sen-
sorizing the virtual objects using the ultrasonic 3D
tagging system and a stereovision system.
2 REAL WORLD SENSORIZATION FOR ROBUST DETECTION OF HUMAN ACTIVITY
2.1 Ultrasonic 3D tag
The ultrasonic 3D tagging system developed by the
authors (Nishida et al., 2003) consists of an ultrasonic
receiving section, an ultrasonic transmitting section,
a time-of-flight measuring section, a network section,
and a personal computer. The ultrasonic receiving
section receives ultrasonic pulses emitted from the
ultrasonic transmitter and amplifies the received sig-
nal. The time-of-flight measuring section records the
travel time of the signal from transmission to recep-
tion. The network section synchronizes the system
and collects time-of-flight data from the ultrasonic re-
ceiving section. The positions of objects are calcu-
lated based on more than three time-of-flight results.
The sampling frequency of the proposed system is 50 Hz. The system can keep the sampling frequency as high as 50 Hz when the number of target transmitters is less than three or four (Hori et al., 2003). A user of the system can attach ultrasonic receivers at arbitrary positions on ceilings or walls and can easily calibrate the receivers' positions using a portable calibration device.
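To make the distance computation concrete, the following is a minimal sketch, not taken from the paper, of how a time-of-flight reading becomes the distance used for position calculation; the temperature-corrected speed of sound is a standard acoustics approximation, and the function name is illustrative.

```python
# Minimal sketch (not the paper's implementation): converting an
# ultrasonic time-of-flight reading into a distance for multilateration.

def tof_to_distance(tof_seconds, temp_celsius=20.0):
    """Convert a time-of-flight reading to a distance in meters."""
    speed_of_sound = 331.3 + 0.606 * temp_celsius  # m/s, linear approximation
    return speed_of_sound * tof_seconds

# A pulse taking 5 ms to reach a receiver travelled about 1.72 m.
print(tof_to_distance(0.005))
```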
Figure 1 shows the experimental system for evaluating a function for robust detection of human activity. The experimental results are shown later. The upper part of the figure shows a tiny type, a small type, and a long-life battery type of ultrasonic 3D tag, together with objects fitted with ultrasonic 3D tags.
Figure 1: Ultrasonic 3D tags (tiny type: 12×12×20 mm; small type: 28×20×17 mm; long-life battery type: 65×44×20 mm) and the sensorized environment
The room was 3.5 × 3.5 × 2.7 m in size, and was
fitted with 307 ultrasonic receivers embedded in the
wall and ceiling. Tags were attached to various ob-
jects, including a cup and a stapler. Some objects
were fitted with two transmitters.
2.2 Multilateration method 1: linearization of the minimization problem
Trilateration or multilateration algorithms have been proposed in the field of aerospace (Ho, 1993; Manolakis, 1996). This paper presents multilateration algorithms applicable to the more general case in which multiple ultrasonic receivers are placed at arbitrary positions. Using distance data $l_i$, $l_j$ and the receiver positions $(x_i, y_i, z_i)$, $(x_j, y_j, z_j)$, we obtain the following spherical equations for the possible position of the target.

$$(x_i - x)^2 + (y_i - y)^2 + (z_i - z)^2 = l_i^2, \qquad (1)$$

$$(x_j - x)^2 + (y_j - y)^2 + (z_j - z)^2 = l_j^2. \qquad (2)$$
By subtracting Eq. (2) from Eq. (1), we obtain an equation for the plane of intersection between the spheres.
$$2(x_j - x_i)x + 2(y_j - y_i)y + 2(z_j - z_i)z = l_i^2 - l_j^2 - x_i^2 - y_i^2 - z_i^2 + x_j^2 + y_j^2 + z_j^2 \qquad (3)$$
By inputting pairs of (i, j) into the above equation,
we obtain simultaneous linear equations, as expressed
by
$$AP = B, \qquad (4)$$

where

$$P = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \qquad (5)$$

$$A = \begin{pmatrix} 2(x_0 - x_1) & 2(y_0 - y_1) & 2(z_0 - z_1) \\ 2(x_0 - x_2) & 2(y_0 - y_2) & 2(z_0 - z_2) \\ 2(x_0 - x_3) & 2(y_0 - y_3) & 2(z_0 - z_3) \end{pmatrix}, \qquad (6)$$

$$B = \begin{pmatrix} l_1^2 - l_0^2 - x_1^2 - y_1^2 - z_1^2 + x_0^2 + y_0^2 + z_0^2 \\ l_2^2 - l_0^2 - x_2^2 - y_2^2 - z_2^2 + x_0^2 + y_0^2 + z_0^2 \\ l_3^2 - l_0^2 - x_3^2 - y_3^2 - z_3^2 + x_0^2 + y_0^2 + z_0^2 \\ \vdots \end{pmatrix}. \qquad (7)$$
The position $(\hat{x}, \hat{y}, \hat{z})$ can then be calculated by a least-squares method as follows.

$$P = (A^T A)^{-1} A^T B. \qquad (8)$$

This method minimizes the sum of squared distances between the estimated position and the planes expressed by Eq. (3). In actual usage, the rank of matrix $A$ must be considered.
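The following is a minimal Python sketch of this linearized least-squares estimator (Eqs. (3) to (8)); the function name is ours, and using numpy's lstsq, which also copes with the rank-deficient case mentioned above, is our choice rather than the authors' implementation.

```python
import numpy as np

def multilaterate_lsq(receivers, distances):
    """Estimate a 3D position from receiver positions (n x 3 array) and
    measured distances (length-n array) via the linear system AP = B.
    Receiver 0 serves as the common reference, pairing (0, k) for k >= 1."""
    x0, l0 = receivers[0], distances[0]
    xk, lk = receivers[1:], distances[1:]
    A = 2.0 * (x0 - xk)                                         # rows of Eq. (6)
    B = lk**2 - l0**2 - np.sum(xk**2, axis=1) + np.sum(x0**2)   # entries of Eq. (7)
    # Least-squares solution P = (A^T A)^{-1} A^T B, Eq. (8); lstsq also
    # handles a rank-deficient A gracefully.
    P, *_ = np.linalg.lstsq(A, B, rcond=None)
    return P
```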
2.3 Multilateration method 2: robust estimation by RANSAC
Data sampled by the ultrasonic tagging system is
easily contaminated by outliers due to reflections.
Method 1 above is unable to estimate the 3D po-
sition with high accuracy if sampled data includes
outliers deviating from a normal distribution. In the
field of computer vision, robust estimation methods
that are effective for sampled data including outliers
have already been developed. In this work, the ran-
dom sample consensus (RANSAC) (Rousseeuw and
Leroy, 1987; Fishler and Bolles, 1981) estimator is
adopted to eliminate the undesirable effects of out-
liers. The procedure is as follows.
1. Randomly select three distances measured by three receivers (the jth trial).

2. Calculate the candidate position $(x_{cj}, y_{cj}, z_{cj})$ by trilateration.

3. Calculate the error $\varepsilon_{cji}$ for all receivers $(i = 0, 1, \ldots, n)$ by Eq. (9), and find the median $\varepsilon_{mj}$ of $\varepsilon_{cji}$.

4. Repeat steps 1 to 3 as necessary to find the combination of measurements giving the minimum error, and adopt the corresponding 3D position.
$$\varepsilon_{cji} = \left| l_i - \sqrt{(x_i - x_{cj})^2 + (y_i - y_{cj})^2 + (z_i - z_{cj})^2} \right| \qquad (9)$$

$$\varepsilon_{mj} = \mathrm{med}_i \, |\varepsilon_{cji}| \qquad (10)$$

$$(\hat{x}, \hat{y}, \hat{z}) = \arg\min_j \, \varepsilon_{mj} \qquad (11)$$
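Below is a minimal sketch of this robust estimator, with two deliberate simplifications that are ours rather than the paper's: it samples four receivers so it can reuse multilaterate_lsq() from above instead of the paper's three-receiver closed-form trilateration, and the number of trials is a fixed parameter rather than adaptive.

```python
import numpy as np

def multilaterate_ransac(receivers, distances, n_trials=100, seed=None):
    """RANSAC-style estimate following Eqs. (9)-(11): solve from random
    subsets and keep the candidate whose median residual over all
    receivers is smallest."""
    rng = np.random.default_rng(seed)
    best_pos, best_med = None, np.inf
    for _ in range(n_trials):
        # Steps 1-2: candidate position from a small random subset
        # (four receivers here, to reuse the linearized solver above).
        idx = rng.choice(len(receivers), size=4, replace=False)
        cand = multilaterate_lsq(receivers[idx], distances[idx])
        # Step 3, Eq. (9): residual |l_i - ||x_i - cand||| for every
        # receiver, then its median, Eq. (10).
        resid = np.abs(distances - np.linalg.norm(receivers - cand, axis=1))
        med = np.median(resid)
        if med < best_med:  # Step 4, Eq. (11): keep the minimum-median candidate
            best_med, best_pos = med, cand
    return best_pos
```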
2.4 Robustness to occlusion
As in other measuring techniques such as vision-
based methods, it is necessary to increase the num-
ber of sensors to solve the problem of sensor occlu-
sion, where the line of sight to the target object is ob-
structed by other objects such as walls or room occu-
pants. In the present tagging system, the problem of
occlusion occurs often when a person moves or oper-
ates an object. These situations give rise to two separate problems: a decrease in the number of usable sensors for the target, and an increase in reflections due to obstruction and movement. As one of the most typical situations in which occlusion occurs, this section focuses on occlusion caused by a hand.
Figure 2 shows how the error increases and the number of usable sensors decreases as a hand approaches an object fitted with an ultrasonic transmitter, for the least-squares and RANSAC methods. Although the error increases significantly with both methods when the hand approaches the object, the RANSAC method is much less affected than the least-squares method. This demonstrates that the proportion of outliers increases when occlusion occurs, and that RANSAC is more robust in this situation because it can mitigate the effect of such outliers.
Figure 2: Accuracy of the ultrasonic tagging system when occlusion due to a hand occurs (error [mm] and number of valid sensors versus distance of the hand from the object [cm], for the least-squares and RANSAC methods)
2.5 Experimental results: robust
detection of human activity
Figure 3 shows the measured trajectory for a person
moving a cup to a chair, the floor, and a desk. The
figure demonstrates that the system can robustly measure the positions of the objects in most parts of the room regardless of occlusion by a hand or body. In the current system, the sampling frequency is about 50 Hz. Basically, this frequency decreases to 50/n Hz when n objects are being monitored, although the system can keep the sampling frequency as high as 50 Hz when the number of target transmitters is less than three or four (Hori et al., 2003). However, it is possible to maintain a high sampling frequency by dynamically selecting which transmitters to track, as sketched below. For example, a transmitter can be attached to a person's wrist, and the system can select the transmitters in the vicinity of the wrist to be tracked, thereby reducing the number of transmitters that need to be tracked at one time and maintaining the highest sampling frequency possible.
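As an illustration of this scheduling idea, one might pick the object tags nearest the wrist tag; the selection policy, radius, and cap below are our assumptions, not the paper's.

```python
import numpy as np

def select_active_tags(wrist_pos, tag_positions, radius=0.5, max_tags=3):
    """Return indices of object tags within `radius` meters of the wrist
    tag, capped at `max_tags` so that the overall sampling rate stays
    near 50 / max_tags Hz. Policy, radius, and cap are illustrative."""
    dists = np.linalg.norm(tag_positions - wrist_pos, axis=1)
    nearest_first = np.argsort(dists)
    return [int(i) for i in nearest_first if dists[i] <= radius][:max_tags]
```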
Figure 3: Robust detection of human activity (measured trajectories of tagged objects: chair, cup, trash, cell phone, documents, stapler)
3 VIRTUAL SENSORIZATION FOR QUICK REGISTRATION AND EFFICIENT RECOGNITION OF HUMAN ACTIVITY
3.1 Virtual Sensorization
This section describes a method for virtualizing real objects and virtually sensorizing the virtualized objects in order to efficiently recognize human activities. Virtualizing real objects makes it possible to extract the essential geometric features of real objects by simplifying their 3D shapes. The 3D shape simplification is performed using a stereovision system fitted with ultrasonic 3D tags, in combination with interactive software. The software abstracts the shapes of objects in the real world as simple two-dimensional shapes such as lines, circles, or polygons.
The virtual sensorization of virtualized objects makes it possible to extract the essential physical phenomena among the real objects that relate to target activity events. In order to describe real world events when a person handles the objects, the software abstracts the function of objects as simple phenomena such as touch, detach, or rotation. The software adopts the concept of virtual sensors and effectors to allow the user to define the function of the objects easily through simple mouse operations. For example, to define the activity "put a cup on the desk", the user simplifies the cup and the desk as simple two-dimensional models of a circle and a rectangle using the photo-modeling function of the software. Using a function for editing virtual sensors, the user then adds a "touch" virtual sensor to the model of the desk, and adds a "bar" effector to the model of the cup. Details of real object virtualization, virtual sensorization of virtualized objects, registration of target activity, and real time detection and recognition of the target activity are described in the following.
3.2 Virtual Sensorization Procedure
Step A: Real object virtualization. Figure 7 shows examples of simplified 3D shape models of objects such as a tissue box, a cup, a desk, and a stapler. The cup is expressed as a circle and the desk as a rectangle. The simplification is performed using a stereovision system in combination with the photo-modeling function (Fig. 6) of the software.

There is a problem with the photo-modeling function of stereovision: it is difficult to keep a target object to be modeled within the stereovision system's sight. To solve this problem, the authors developed a stereovision system fitted with multiple ultrasonic 3D tags. We call this system the "UltraVision". Since the UltraVision can track its own position and posture, the user can move it freely while creating simplified 3D shape models, and the system can integrate the created models into the world coordinate system. The concrete process for integrating models is described in the following.
Assume that the UltraVision is initially placed at position P1 and then moves to position P2. The UltraVision carries a stereovision system and ultrasonic 3D tags. There are two coordinate systems, U1 and C1, as shown in Fig. 4. U1 denotes the local coordinate system whose origin is the position of a tag attached to the UltraVision placed at position P1, and C1 denotes the local coordinate system of the stereovision. Coordinate systems U2 and C2 are defined similarly for position P2.

Since the relative location between the stereovision and the tags does not change even when the UltraVision moves, the transformation matrices $M_{c1u1}$ and $M_{c2u2}$ are constant:

$$M_{c1u1} = M_{c2u2} = M_{cu}. \qquad (12)$$
If $M_{cu}$ is known, we can transform the local coordinate values $P_{c1}$ and $P_{c2}$ to the world coordinate value
$P_w$ using the following equations.

$$P_w = M_{u1w} \cdot M_{cu} \cdot P_{c1} \qquad (13)$$

$$P_w = M_{u2w} \cdot M_{cu} \cdot P_{c2} \qquad (14)$$
Note that $M_{u1w}$ and $M_{u2w}$ can be calculated using the positions of the multiple tags attached to the UltraVision after it moves.
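The following sketch applies Eq. (13) using homogeneous 4x4 transforms; the placeholder matrices and the sample point are illustrative only, since the actual values come from the tag system and from a camera-to-tag calibration.

```python
import numpy as np

def transform_point(M, p):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    q = M @ np.append(p, 1.0)
    return q[:3] / q[3]

# Placeholder values (the real M_u1w would come from the ultrasonic tags
# on the UltraVision, and M_cu from a one-time camera-to-tag calibration).
M_u1w = np.eye(4)                 # tag frame U1 -> world frame W
M_cu = np.eye(4)                  # stereovision frame C -> tag frame U
p_c1 = np.array([0.1, 0.2, 1.5])  # a model vertex in the camera frame C1

p_w = transform_point(M_u1w @ M_cu, p_c1)  # Eq. (13): P_w = M_u1w * M_cu * P_c1
```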
Figure 5 shows an example of actually modeling a large room using the UltraVision based on this process.
Figure 4: Coordinate conversion in the UltraVision system (world coordinate system W; stereovision coordinate systems C1, C2; coordinate systems U1, U2 of the UltraVision's tags at positions P1, P2; a point P is mapped to world coordinates Pw)
Figure 5: UltraVision (a stereoscopic camera fitted with ultrasonic 3D tags) for virtualizing objects, and an example of virtualized objects (chairs, table, desk, TV, stapler, cup)
Step B: Virtual sensorization of virtualized objects. The software creates a model of an object's function by attaching virtual sensors and effectors to the model created in Step A. Virtual sensors and effectors are provided in advance by the software and act, on the computer, as sensors and as effectors that affect those sensors. The current system has an "angle sensor" for detecting rotation, a "bar effector" to represent touch, and a "touch sensor" for detecting touch. In the right part of Fig. 8, red indicates a virtual bar effector, and green indicates a virtual touch sensor. Using simple mouse operations, it is possible to add virtual sensors/effectors to the 3D shape model.
+OCIGUHTQOUVGTGQUEQRKEECOGTC
5RGEKHKPIEJCTCEVGTKUVKERQKPVU
Figure 6: Photo-modeling by stereovision system
Figure 7: Real object virtualization
Step C: Associating virtual object sensors with human activity events. Human activity can be described using the output of the virtual sensors created in Step B. In Fig. 9, red indicates that the cup touches the desk, and blue indicates that it does not. By creating a table describing the relationship between the output of the virtual sensors and the target events, the system can output symbolic information such as "put a cup on the desk" when the states of the virtual sensors change.
Figure 8: Creating a model of a physical object's function using virtual sensors/effectors (the bar effector turns red when it is touching the touch sensor)

Figure 9: Associating the output of virtual sensors with a target activity event (e.g., "You put the cup on the white table" when the bar effector touches the touch sensor)

Step D: Real time detection and recognition of human activity events. When the software receives the position data of the ultrasonic 3D tags, it can detect the target events using the virtual sensors and the table defined in Steps A to C, as shown in Fig. 10.
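As a minimal sketch of Steps B to D (the class, the event table contents, the touch tolerance, and the tag_stream() stub are all illustrative assumptions, not the authors' software), a touch sensor can be modeled as a rectangular surface whose state change against a tagged object fires a symbolic event.

```python
import numpy as np

class TouchSensor:
    """A virtual touch sensor: a horizontal rectangle (e.g. a desktop)."""
    def __init__(self, x_range, y_range, z_surface, tol=0.03):
        self.x_range, self.y_range = x_range, y_range
        self.z_surface, self.tol = z_surface, tol

    def touching(self, p):
        """True when point p (the bar effector tip) lies on the surface."""
        return (self.x_range[0] <= p[0] <= self.x_range[1]
                and self.y_range[0] <= p[1] <= self.y_range[1]
                and abs(p[2] - self.z_surface) <= self.tol)

def tag_stream():
    """Hypothetical stand-in for the ultrasonic 3D tag system: the cup
    is held in the air and then set down on the desk."""
    yield (0.5, 0.3, 0.90)   # cup held above the desk
    yield (0.5, 0.3, 0.71)   # cup placed on the desk surface

# Step C: table associating virtual sensor state changes with events.
EVENT_TABLE = {("desk", True): "put the cup on the desk",
               ("desk", False): "took the cup off the desk"}

# Step D: feed tag positions through the virtual sensor in real time.
desk = TouchSensor(x_range=(0.0, 1.2), y_range=(0.0, 0.7), z_surface=0.70)
touched = False
for p in tag_stream():               # 3D positions of the cup's tag (50 Hz)
    now = desk.touching(np.asarray(p))
    if now != touched:               # a state change fires a symbolic event
        print(EVENT_TABLE[("desk", now)])
        touched = now
```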
Figure 10: Real time detection and recognition of human activity by virtual object sensors (3D positions from the ultrasonic 3D tag system drive the virtual sensors' touch states, which are mapped to human activity events by referring to the table that associates virtual sensor outputs with activity events)
4 CONCLUSION
This paper described a method for robustly detecting human activity in the real world and a method for quickly registering and efficiently recognizing target activity.
The robust detection of human activity is performed by sensorizing objects in the real world using an ultrasonic 3D tagging system, a kind of ultrasonic location sensor. In order to estimate the 3D position with high accuracy and robustness to occlusion, the authors propose two estimation methods, one based on a least-squares approach and one based on RANSAC. The results of experiments, conducted using 48 receivers in the ceiling of a room with dimensions of 3.5 × 3.5 × 2.7 m, show that it is possible to improve the accuracy and robustness to occlusion by increasing the number of ultrasonic receivers and by adopting a robust estimator such as RANSAC to estimate the 3D position based on redundant distance data.
The efficient recognition of human activity involves a method for creating virtual objects using the ultrasonic 3D tagging system and a stereovision system, and a method for virtually sensorizing the created virtual objects interactively on a computer. To verify the effectiveness of this function, the authors used a stereovision system with ultrasonic 3D tags and interactive software to register activities such as "put a cup on the desk" and "staple documents" by creating simplified 3D shape models of ten objects, including a TV, a desk, a cup, a chair, a box, and a stapler.
Further development of the system will include re-
finement of the method for measuring the 3D position
with higher accuracy and resolution, and development
of a systematic method for defining and recognizing
human activity based on the tagging data and data
from other sensor systems.
REFERENCES
Fishler, M. and Bolles, R. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24:381–395.
Ho, K. (1993). Solution and performance analysis of geolocation by TDOA. IEEE Transactions on Aerospace and Electronic Systems, 29(4):1311–1322.
Hopper, A., Steggles, P., Ward, A., and Webster, P. (1999). The anatomy of a context-aware application. In Proceedings of the 5th Annual International Conference on Mobile Computing and Networking (MobiCom 99), pages 59–68.
Hori, T., Nishida, Y., Kanade, T., and Akiyama, K. (2003).
Improving sampling rate with multiplexed ultrasonic
emitters. In Proceedings of 2003 IEEE International
Conference on Systems, Man and Cybernetics, pages
4522–4527.
Manolakis, D. (1996). Efficient solution and performance analysis of 3-D position estimation by trilateration. IEEE Transactions on Aerospace and Electronic Systems, 32(4):1239–1248.
Mizoguchi, H., Sato, T., and Ishikawa, T. (1996). Robotic
office room to support office work by human be-
havior understanding function with networked ma-
chines. IEEE/ASME Transactions on Mechatronics,
1(3):237–244.
Nishida, Y., Aizawa, H., Hori, T., Hoffman, N., Kanade, T.,
and Kakikura, M. (2003). 3-d ultrasonic tagging sys-
tem for observing human activity. In Proceedings of
IEEE International Conference on Intelligent Robots
and Systems, pages 785–791.
Rousseeuw, P. and Leroy, A. (1987). Robust Regression and
Outlier Detection. Wiley, New York.
Shih, S., Minami, M., Morikawa, H., and Aoyama, T.
(2001). An implementation and evaluation of indoor
ultrasonic tracking system. In Proceedings of the 2001
IEICE Domestic General Conference.