Avatar2Avatar: Augmenting the Mutual Visual Communication between

Co-located Real and Virtual Environments

Robin Horst

1,2,3

, Sebastian Alberternst

, Jan Sutter

, Philipp Slusallek

, Uwe Kloos

and Ralf D

orner

German Research Center for Artiﬁcial Intelligence (DFKI), Saarbr

ucken, Germany

Reutlingen University of Applied Sciences, Reutlingen, Germany

RheinMain University of Applied Sciences, Wiesbaden, Germany

uwe.kloos@reutlingen-university.de

Keywords:

Mixed Reality, Video Avatar, Multi-user Environments, Low-cost, Knowledge Communication.

Abstract:

Virtual Reality (VR) technology has the potential to support knowledge communication in several sectors.

Still, when educators make use of immersive VR technology in favor of presenting their knowledge, their

audience within the same room may not be able to see them anymore due to wearing head-mounted displays

(HMDs). In this paper, we propose the Avatar2Avatar system and design, which augments the visual aspect

during such a knowledge presentation. Avatar2Avatar enables users to see both a realistic representation

of their respective counterpart and the virtual environment at the same time. We point out several design

aspects of such a system and address design challenges and possibilities that arose during implementation.

We speciﬁcally explore opportunities of a system design for integrating 2D video-avatars in existing room-

scale VR setups. An additional user study indicates a positive impact concerning spatial presence when using

Avatar2Avatar.

1 INTRODUCTION

One of the core aspects which connects areas of hu-

man mediated knowledge communication is the mu-

tual communication of humans. At that, the visual

aspect is the most crucial one because about 55% of

a communication is transported visually (Mehrabian

and Ferris, 1967).

When it comes to the usage of Virtual Environ-

ments (VEs), most VR setups constrain the commu-

nication due to wearing a closed VR HMD. This

leads to a lack of a mutual representation that can be

beneﬁcial for co-located knowledge communication

(Bronack et al., 2008). As a consequence of using

such technology immersed learners cannot see non-

immersed educators even if they are located in the

same room. Thereby, the pedagogical presence, that

is of importance relating to speciﬁc learning method-

ologies (Rodgers and Raider-Roth, 2006; Bronack

et al., 2008; Anderson et al., 2001), is constrained.

An example is a class-room setting, where teachers

integrate HMD VR technology in their course. They

cannot be seen by the learners, even though they are

in the same room.

By referring to the model of visual awareness that

Benford et al. (1994) proposed concerning collabo-

rative VEs and applying it to our application space,

no focus is provided for the immersed learners. Their

nimbus (the space in which users can be seen by oth-

ers), however, is available for non-immersed educa-

tors, so that they visually can perceive the immersed

learners. But these non-immersed educators instead

cannot get insights in the VE that the immersed learn-

ers act in.

In this paper, we contribute a novel collaborative

Mixed Reality (MR) system and its design aspects,

called Avatar2Avatar. This system indicates to equal-

ize the described information discrepancy and to aug-

ment the mutual awareness to enhance pedagogical

presence. We show the feasibility, discuss design

challenges and illustrate the integration into an ex-

isting room scale VR setup. We focus on utilizing

low-cost VR technology so that e.g. costly hardware

or procedures as complex device-calibration are ex-

cluded in advance. We also draw conclusions about

the actual presence relating to our system, measured

after Schubert et al. (2001).

Although it is possible to enhances similar co-

presence environments with 3D representations in

real-time (e.g. (Sousa et al., 2017; Gugenheimer et al.,

Horst, R., Alberternst, S., Sutter, J., Slusallek, P., Kloos, U. and Dörner, R.

Avatar2Avatar: Augmenting the Mutual Visual Communication between Co-located Real and Virtual Environments.

DOI: 10.5220/0007311800890096

In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2019), pages 89-96

ISBN: 978-989-758-354-4

Figure 1: The setting of Avatar2Avatar: (a) virtual ﬁrst-person point of view; (b) RGB-D camera attached to an immersed

learner for capturing co-located educators; (c) RGB-D camera to capture all collaborators in the setting; (d) overview screen

showing the virtual scene and realistic avatar representations; (e) co-located educator with tracker.

2017) we indicate that 2D video-avatar representa-

tions still provided enough visual and spatial clues

to support visual communication and increase spatial

presence within our learning use-case.

As the foundation of our work, we subdivide the

overall system design into two sub-system concepts

which work as distributed components – one for each

class of users (immersed learners and non-immersed

educators):

1. Virtual POV – The immersed learners’ POV

get augmented by a captured overlay image of

the segmented collaborating non-immersed edu-

cators. This image is integrated into the VE as a

2D texture and then consequently rendered in the

HMD (Fig. 1 (a)). A head-mounted RGB+depth

camera prototype is utilized for this purpose.

2. Scene Overview – An overview perspective of

the virtual scene and image-textures of the im-

mersed learners and the non-immersed educators

are composed and visualized on a large screen.

This screen is placed within the room where both

user-classes are co-located in. As a consequence,

educators can orient themselves with respect to

the virtual scene and visually communicate with

the immersed learners (Fig. 1 (d)).

2 RELATED WORK

Pioneering work by Benford et al. (1994) and Ben-

ford and Fahl

en (1993) proposes concepts and tax-

onomies that mostly aim for considering co-operative

VEs with regard to tele-presence. They are applied

in several systems. Especially in MASSIVE (Green-

halgh and Benford, 1995) the authors show how dif-

ferently users interact with mutual users with respect

to the avatar representation and the degree of immer-

sion. While users with graphical representations of

others keep a personal distance, users with text-only

presentation of the scene are perceiving space com-

pletely different. They lack notions of this natural

personal space.

Billinghurst and Kato (1999) introduce a co-

located MR system that provides a three-dimensional

browser. The system offers one single degree of im-

mersion for all users and does not support immersive

VEs. The presented augmented reality interface of-

fers users to collaboratively browse web pages by us-

ing natural voice and gestures. This paper shows that

with a rising degree of immersion and the tendency

towards augmented virtuality (Milgram et al., 1995),

the amount of visual perception of co-located users

quickly decreases. The nimbus of users stays unaf-

fected, but their focus gets diminished. Even recent

system designs for co-located or tele-presence collab-

orations, as One Reality (Roo and Hachet, 2017), il-

lustrate similar issues. The need of a system design

that could extend existing VEs and provide mutual

visual representation for users with asymmetrical dif-

ferent immersion is indicated here. In particular re-

lating to multi-user environments that support more

than one level of immersion this is not a trivial task,

since multiple hardware setups must be incorporated

in such system designs.

Our system generally relates to realistic avatar

representations. There already exist several concepts

and systems that deal with realistic human avatars.

Each of them uses a different approach due to their

area of application. Huang and Kallmann (2009) pro-

pose a system which focuses on realistic motions for

avatars. Lok et al. (2014) propose a system that cre-

ates the realism aspect of avatars through high physi-

cality. There also exists work (Kotranza et al., 2009)

which introduces a system to augment an immersive

VE by providing haptic aspects. Here, non-verbal

communication beneﬁts from the realistic tangibility

of a virtual human avatar. In contrast to the above

we integrate the realism-aspect of our avatars through

real-time captured video textures of the co-present

HUCAPP 2019 - 3rd International Conference on Human Computer Interaction Theory and Applications

Figure 2: Compositional structure of the overall system design. The yellow shape depicts the system’s boundary, the red and

the green respectively a sub-system.

persons.

Current VR systems for collaborative environ-

ments similar to ours (Sousa et al., 2017; Gugen-

heimer et al., 2017) propose processes to integrate

real-time 3D aspects, as well. Gugenheimer et al.

(2017) on the one hand do not focus on avatars, but

on the interactions that are provided for the differ-

ent user-classes. Sousa et al. (2017) on the other

hand show opportunities and use-cases for both 3D

and 2D representations from video-streams for tele-

communication purposes. In the latter, a toolkit is

proposed that models displays as planar rectangular

static surfaces within the virtual room. Our work,

however, focuses on these limitations of the 2D avatar

integrations in VEs and proposes a system design for

ﬂexible and non-static video-avatars.

Further work (Garau et al., 2003; Latoschik et al.,

2017; Roth et al., 2016) points out the needs for non-

verbal communication beneﬁts for realistic avatars.

Therefore, we refer to work considering low-cost

technologies, like Kinect cameras or HTC Vive sys-

tems, which capture video textures and integrate them

into VEs (Slater et al., 2010; Nordby et al., 2016; Re-

genbrecht et al., 2017; Zhu et al., 2016; Lee et al.,

2006; Barmpoutis, 2013). Work by Lee et al. (2006)

which is, however, comparable to ours integrates 2D

avatars visible for only one class of users (the non-

immersed educators). The integration of such avatars

into the immersive part of the system (immersed

learners) or into an existing VR system is not men-

tioned in these examples.

3 AVATAR2AVATAR SYSTEM

DESIGN AND PROTOTYPE

Our system is used to visualize realistic avatar rep-

resentations of all users and the VE (Fig. 1 (d)) so

that collaborating non-immersed educators can see

it. Simultaneously, pictorial representations of non-

immersed educators are integrated in the immersive

VE. The immersed learners therefore are enabled to

perceive their collaborators while wearing the closed

HMD (Fig. 1 (a)).

The Avatar2Avatar system design is composition-

ally modeled in Fig. 2 using fundamental modeling

concepts (Kn

opfel et al., 2005). Here, we see how the

overall system (yellow boundary) is designed to be

run on two machines – each with a sub-system (green

and red boundary) which communicate over a local

network. The system is built from six components

(Fig. 2) which are described in detail in the next sec-

tions:

1. Synchronized Scene

2. Overview RGB+D

3. Overview Compositor

4. VR Scene

5. POV RGB+D

6. POV Compositor

All of the above modules can be classiﬁed into ei-

ther capturing the virtual scene (components 1 and

4), extracting people’s image (components 2 and 5) or

composing these with the virtual scene (components 3

and 6).

Avatar2Avatar: Augmenting the Mutual Visual Communication between Co-located Real and Virtual Environments

3.1 Capturing the Virtual Scene

We utilize compositing techniques to create two sepa-

rate textures: On the one hand, a full screen texture of

user representations and the virtual scene, to be drawn

on the external overview screen and on the other hand

a texture of non-immersed educators to be integrated

into the VE for the immersed learners. Therefore, we

incorporate color and depth information from the VE.

The modules VR Scene and Synchronized Scene ( 2)

provide this image-based data. To capture the neces-

sary information from the virtual scenes we point out

three tasks to be performed:

Camera Alignment – Since we compose real and

virtual footage, there must be an alignment of the

real and the virtual camera. For matching the cam-

eras’ rotation and position we use tracking provided

by the tracking system that is part of an HTC Vive

VR setup. Other parameters are approximated by ad-

justing the virtual camera so that no further optical

calibration is needed for the physical counterpart. Ex-

amples are the radial distortion of the optical lens and

extrinsic/intrinsic camera parameters (like the ﬁeld of

view).

Color Image Acquisition – While we propose

a design that is based on low-cost technologies, we

make use of the game engine Unity, which is appro-

priate for the concept. A major challenge when ac-

quiring visual information from game engines is the

interference with the game loop. In this context, an

asynchronous real time texture read-back from the al-

ready frequented GPU is compulsory. The rendering

must be performed independently of the game loop so

that a continuous and consistent image stream can be

assured.

Depth Acquisition – In contrast to simple color

image acquisition using virtual cameras, the function

of acquiring depth information must be provided, too.

To solve this issue a custom shader can be used to

write depth information into a separate texture. This

texture can asynchronously be read from the GPU the

same way the color image texture is read back. The

game loop which runs the existing VE remains unaf-

fected by this integration into our system design.

3.2 Capturing and Extracting Peoples’

Textures

To augment the mutual visual communication be-

tween real and virtual environments we propose to

integrate 2D images of peoples as video textures into

VEs that have different degree of immersion. For pro-

viding these realistic avatars we rely on real texture

representations which must be extracted from camera-

captured 2D images beforehand. This functionality

is provided by the Overview RGB+D and the POV

RGB+D modules ( 2).

For capturing RGB+D image resources, two Mi-

crosoft Kinects (1920x1080 RGB; 512x424 D; 30

fps) are utilized in the proposed system. A major

difference in terms of processing captured images

for either the overview screen or the virtual POV is

the segmentation of the peoples’ textures. For the

non-moving camera which captures the whole scene

(Fig. 1 (c) and Fig. 2 RGB+D Camera 2) the Kinect

API’s functionality can be used for segmentation.

The process that provides the textures for integra-

tion into the VE in contrast cannot rely on such APIs’

functionality. None offers the extraction of people

in images for a constantly moving camera, as it is

for simulating a ﬁrst person POV. This segmentation

therefore is handled during the compositing step it-

self.

3.3 Composing People and the Virtual

Scene

For compositing we differentiate between two mod-

ules, the Overview Compositor and the POV Com-

positor. They are responsible for composing the im-

age for the overview screen and the virtual ﬁrst per-

son POV representation respectively. Both composi-

tor modules receive similar data from the Kinects and

the virtual scene to perform a depth compositing in a

similar way as proposed by Zhu et al. (2016). Since

there already exists work which proposes straightfor-

ward solutions to use consumer-oriented VR technol-

ogy to visualize immersed people and their VE on an

external screen (Zhu et al., 2016), we will focus on the

novel aspects of composing video textures into im-

mersive VEs:

Integrating Two Compositing Systems – Our

system integrates the two compositing modules sim-

ilar to a client-server architecture. The ﬁnal image

for the external screen is processed and rendered di-

rectly on the external screen. The texture for the VE

in contrast must be processed in the compositor and

then sent back to Unity, where additional composit-

ing steps are processed.

Extracting People based on Room-scale VR

Technology – To handle the ﬁrst-person POV sim-

ulation challenge, mentioned in the previous sec-

tion, we cannot use a common method as for the

overview screen (e.g. provided by the Kinect API).

In Avatar2Avatar we propose to segment them by

combining a virtual bounding box (BB) (Fig. 3 (a)).

The depth information is furthermore used to per-

form a ﬁrst occlusion calculation by testing the depth

HUCAPP 2019 - 3rd International Conference on Human Computer Interaction Theory and Applications

Figure 3: Footage (a)-(d) for the compositor of the virtual POV (e) and footage (f)-(i) for the compositor of the overview screen

(j). a) shows the virtual bounding box of the educator (blue shirt). It is used with the color image (b), the corresponding depth

image (c) and the depth of the scene (d) to compose the texture of the educator into the view of the immersed learner (checked

shirt; (e)). A Kinect segmentation work-ﬂow is incorporated in our system ((g), (h)) to compose textures of both persons with

the virtual scene ((f), (i)) and render them together on an external screen (j).

of virtual objects to the captured depth of the non-

immersed educators.

Composing Realistic Avatar Textures Into a VE

– While the avatar texture here is already segmented

and occlusions with virtual objects are realistically

calculated (for counteracting false positive occlusion,

Tab. 1), it still must be integrated into the VE. There-

fore, we draw a 2D plane with dimensions of the tex-

ture at the position of the non-immersed educator and

apply the texture to it. As we capture the texture from

a ﬁrst person POV of the immersed learner, we must

ensure that this texture is projected on a plane orthog-

onally to the virtual viewing direction, as well.

Another challenge is the projection of a three-

dimensional human body onto a single depth value.

We propose to track the non-immersed educator so

that the avatar texture plane can be positioned in the

corresponding position within the VE. Since we only

tested for occlusions from virtual objects it is hy-

pothetically possible that extremities of the learners

could reach out from this single depth point. These

extremities could then the other way around occlude

a virtual object instead. This cannot be processed be-

forehand due to the three-dimensionality of the virtual

scene. As a consequence, we propose to offset the

plane about the length of half an arm span towards

the immersed user. This is a trade-off between the

correct placement of the texture and what we refer to

as false negative texture occlusion (Tab. 1).

Table 1: Classiﬁcation of the occlusion problem into

false/true negative/positive.

Pixel is in

front of the

object

Pixel is behind

the object

Pixel is drawn

in front of the

object

true positive false positive

Pixel is drawn

behind the ob-

ject

false negative true negative

4 EVALUATION

Compared to absence of mutual visual representation,

related work indicates that an existing and realistic

representation will improve the mutual visual com-

munication (see section 2. Since we integrate existing

VEs into our system, the impact of our concept on the

presence of immersed users is of interest. A negative

impact could negate the advantage of the augmented

communication at the expense of the presence.

The use-case of our study was set within a col-

laborative training scenario, where a non-immersed

trainer had to familiarize a trainee with a construc-

tion environment in the automotive section. New con-

struction procedures were to explain. A collaboration

with a robot assistant should be utilized by the im-

mersed trainee to solve the construction task.

Avatar2Avatar: Augmenting the Mutual Visual Communication between Co-located Real and Virtual Environments

4.1 User Study

The study involved 12 paid, voluntary participants

(7 male, aged 23 to 35 with Ø 26,5 and SD 3,34). The

procedure was based on the approaches in (MacKen-

zie, 2012). Participants were welcomed, ﬁlled out

demographics, then were asked to interactively ex-

plore the VE in cooperation with the experimenter and

ﬁnally were interviewed and ﬁlled out a post-study

questionnaire.

The design of the study included a random distri-

bution of the participants into two groups (between-

group design) - the experimental group which used

the proposed enhancement of the visual communica-

tion and the control group which only used common

VR technology (an unextended HTC Vive setup). The

presence of the participants was measured for being

the dependent variable. The IGROUP Presence Ques-

tionnaire (IPQ

) was used. It subdivides the presence

into three units: spatial presence, involvement and ex-

perienced realism.

For analyzing the results of the experiment, we

conducted a two sample t-test, by assuming normal-

distribution on the data. Tests on separate aspects of

presence revealed a phenomenon concerning the spa-

tial presence. Three out of ﬁve questions regarding

the spatial presence factor of the IPQ show a signiﬁ-

cant difference in favor to the score of the experimen-

tal group for questions Sp2 (p-value ¡ 0,00001) and

Sp3 (p-value = 0,021028) with p ¡ 0,05 and Sp5 (p-

value = 0,076108) with p ¡ 0,10 (Fig. 4).

Observations and user statements have indicated

four phenomena: Five out of the six experimental

group participants mentioned that the Kinect rig was

uncomfortable to wear, particularly due to the weight

of the rig which was mounted at the head. Two partic-

ipants mentioned the infrequent cut off of the extrem-

ities of the avatars. One participant unexpectedly sig-

naled that it would be comfortable if the video avatar

visualization could be turned on and off by him- or

herself, depending whether help was needed or not.

All six participants of the experimental group explic-

itly expressed that they perceived the video avatar rep-

resentation of the co-located educator helpful.

4.2 Discussion

The results indicate an improvement of the spatial

presence of the participants that were in the experi-

mental group. Most critical of the qualitatively mea-

sured issues (cut off limbs and rig weight) can be at-

tributed to the speciﬁcs of the Kinect camera. This is

http://www.igroup.org/pq/ipq/index.php (June 18,

2018)

a critical aspect of the design, because other cameras

that deliver similar content (matched color and depth

content) have as well similar speciﬁcations. Since we

target a low-cost concept we exclude to use profes-

sional hardware to solve this challenge. A software-

based solution for addressing the differing aspect ratio

and resolution of the Kinect and the HTC Vive, for ex-

ample as a separate module within the system design

that handles multiple cameras, however, is appropri-

ate. The camera itself could alternatively be mounted

at the torso of participants so that the weight of the

camera will not be perceived as focused as on the

head.

As we chose to represent users within the 3D VE

as a 2D textured plane, there surely are alternative

representations that could be of signiﬁcance. The

study, however, indicates, that 2D video avatars of-

fered enough visual and spatial clues for all of our par-

ticipants to perceive corresponding information that

is necessary for mutual visual communication. Since

one participant remarked that the optionality of the

video avatar visualization would be helpful, this phe-

nomenon should as well be considered in future sys-

tem designs. Questionnaire comments and qualitative

observations during the experiment indicate that over-

all, participants were satisﬁed with Avatar2Avatar and

found it particularly helpful for the given collabora-

tive knowledge communication task.

During the implementation of the system there ap-

peared several issues that had to be solved. Since

the camera that simulates the POV is attached above

the real POV (eyes) of the user, there are challenges

of transforming the virtual human texture/objects into

the correct angle and position. This is negligible if

objects are farther away but attract more visual atten-

tion from close up. Furthermore, there exist obvious

visual quality issues, like artifacts, in the avatar tex-

ture (Fig. 3 (e) and (j)). These can be attributed to the

relatively low depth resolution of the Kinect, which is

only 512x424 pixels. While the majority of artifacts

could be eliminated using dilate and erode algorithms

for edge cleaning, the artifacts are accepted in favor

to ensure a stable 30 frames per second rate and low

latency of the compositors. Some artifacts also ex-

ist because the statistical model of the Kinect API’s

human recognition is not arranged to recognize users

with an HMD or even a Kinect rig on the head.

Another issue that exists due to Kinect speciﬁcs is

that the virtual POV augmentation cannot cover the

entire resolution of the HMD. It is calibrated to a part

in the upper middle of the view. Dependent on the

position, extremities of persons could be cut off the

texture. The reason for this is the different aspect ra-

tio of the Kinect image (1920x1080 pixels ∼ 16:9)

HUCAPP 2019 - 3rd International Conference on Human Computer Interaction Theory and Applications

Figure 4: Three items of the IPQ show signiﬁcant difference in favor to score of the experimental group.

and the Vive HMD (one eye with 1080x1200 pixels

∼ 9:10). As usual, a small part is not at all in the view

of a single eye due to parallax of two eyes.

Finally, jittering of the virtual bounding box can

be seen which results in partially cutting body parts of

the human texture within the VE. This jittering occurs

due to interfering infrared rays of the two Kinects,

since they use the same wave length as the HTC Vive

Lighthouses. Depending on the distance to each other

and the orientation, more or less jittering appears.

5 CONCLUSION AND FUTURE

WORK

In this paper we proposed the Avatar2Avatar system

and design, which augments the mutual visual com-

munication between co-located real and virtual envi-

ronments. Non-immersed educators are involved in

the VE by providing an overview about the virtual

scene and all users of the system, including them-

selves. These non-immersed educators therefore get

immersed on a low level which creates a base context

for mutual awareness. In terms of the visual aware-

ness model we provide a focus into the real environ-

ment for immersed learners that utilizes the nimbus

(Benford et al., 1994) – as mentioned in section 1

– of co-located non-immersed educators. Educators

get their already existing focus widened so that they

can simultaneously see both the real persons and the

virtual environment.

While we focused our work speciﬁcally on knowl-

edge communication between users within one phys-

ical room, it is of major interest for us to transfer our

system design to the ﬁeld of telepresence. Therefore,

there will be several changes necessary regarding the

hardware. Especially the simulation of the dynamic

virtual ﬁrst person POV will be challenging. Further

calibration of all devices would be necessary (as e.g.

in (Beck and Froehlich, 2017)), but which is at the

expense of the low-cost and consumer-oriented as-

pect of the setup. In the second room there must be a

hardware construction that moves the camera accord-

ingly to the immersed users POV in the ﬁrst room and

vice versa. Prototyping such a construction and fur-

thermore restricting the set-up to low-cost consumer

hardware could have a signiﬁcant impact on knowl-

edge communication sectors, like e-Learning or dis-

tance learning.

ACKNOWLEDGEMENTS

The work is supported by the Federal Ministry of Ed-

ucation and Research of Germany in the project Hybr-

iT (Funding number:01IS16026A). The work further-

more is minor supported by the Federal Ministry of

Education and Research of Germany in the project In-

novative Hochschule (Funding number: 03IHS071).

REFERENCES

Anderson, T., Liam, R., Garrison, D. R., and Archer, W.

(2001). Assessing teaching presence in a computer

conferencing context.

Barmpoutis, A. (2013). Tensor body: Real-time recon-

struction of the human body and avatar synthesis from

rgb-d. IEEE transactions on cybernetics, 43(5):1347–

1356.

Beck, S. and Froehlich, B. (2017). Sweeping-based vol-

umetric calibration and registration of multiple rgbd-

sensors for 3d capturing systems. In Virtual Reality

(VR), 2017 IEEE, pages 167–176. IEEE.

Benford, S., Bowers, J., Fahl

en, L. E., and Greenhalgh, C.

(1994). Managing mutual awareness in collaborative

virtual environments. In Virtual Reality Software and

Technology, pages 223–236. World Scientiﬁc.

Avatar2Avatar: Augmenting the Mutual Visual Communication between Co-located Real and Virtual Environments

Benford, S. and Fahl

en, L. (1993). A spatial model of in-

teraction in large virtual environments. In Proceed-

ings of the Third European Conference on Computer-

Supported Cooperative Work 13–17 September 1993,

Milan, Italy ECSCW93, pages 109–124. Springer.

Billinghurst, M. and Kato, H. (1999). Collaborative mixed

reality. In Proceedings of the First International Sym-

posium on Mixed Reality, pages 261–284.

Bronack, S., Sanders, R., Cheney, A., Riedl, R., Tashner, J.,

and Matzen, N. (2008). Presence pedagogy: Teaching

and learning in a 3d virtual immersive world. Inter-

national journal of teaching and learning in higher

education, 20(1):59–69.

Garau, M., Slater, M., Vinayagamoorthy, V., Brogni, A.,

Steed, A., and Sasse, M. A. (2003). The impact of

avatar realism and eye gaze control on perceived qual-

ity of communication in a shared immersive virtual

environment. In Proceedings of the SIGCHI confer-

ence on Human factors in computing systems, pages

529–536. ACM.

Greenhalgh, C. and Benford, S. (1995). Massive: a col-

laborative virtual environment for teleconferencing.

ACM Transactions on Computer-Human Interaction

(TOCHI), 2(3):239–261.

Gugenheimer, J., Stemasov, E., Frommel, J., and Rukzio,

E. (2017). Sharevr: Enabling co-located experi-

ences for virtual reality between hmd and non-hmd

users. In Proceedings of the 2017 CHI Conference on

Human Factors in Computing Systems, pages 4021–

4033. ACM.

Huang, Y. and Kallmann, M. (2009). Interactive demon-

stration of pointing gestures for virtual trainers. In

International Conference on Human-Computer Inter-

action, pages 178–187. Springer.

opfel, A., Gr

one, B., and Tabeling, P. (2005). Funda-

mental modeling concepts. Effective Communication

of IT Systems, England.

Kotranza, A., Lok, B., Pugh, C. M., and Lind, D. S. (2009).

Virtual humans that touch back: enhancing nonverbal

communication with virtual humans through bidirec-

tional touch. In Virtual Reality Conference, 2009. VR

2009. IEEE, pages 175–178. IEEE.

Latoschik, M. E., Roth, D., Gall, D., Achenbach, J., Walte-

mate, T., and Botsch, M. (2017). The effect of avatar

realism in immersive social virtual realities. In Pro-

ceedings of the 23rd ACM Symposium on Virtual Re-

ality Software and Technology, page 39. ACM.

Lee, S.-Y., Ahn, S. C., Kim, H.-G., and Lim, M. (2006).

Real-time 3d video avatar in mixed reality: An imple-

mentation for immersive telecommunication. Simula-

tion & gaming, 37(4):491–506.

Lok, B., Chuah, J. H., Robb, A., Cordar, A., Lampotang,

S., Wendling, A., and White, C. (2014). Mixed-reality

humans for team training. IEEE Computer Graphics

and Applications, 34(3):72–75.

MacKenzie, I. S. (2012). Human-computer interaction: An

empirical research perspective. Newnes.

Mehrabian, A. and Ferris, S. R. (1967). Inference of atti-

tudes from nonverbal communication in two channels.

Journal of consulting psychology, 31(3):248.

Milgram, P., Takemura, H., Utsumi, A., and Kishino, F.

(1995). Augmented reality: A class of displays on the

reality-virtuality continuum. In Telemanipulator and

telepresence technologies, volume 2351, pages 282–

293. International Society for Optics and Photonics.

Nordby, K., Gernez, E., and Børresen, S. (2016). Efﬁcient

use of virtual and mixed reality in conceptual design

of maritime work places. In 15th International Con-

ference on Computer and IT Applications in the Mar-

itime Industries-COMPIT’16.

Regenbrecht, H., Meng, K., Reepen, A., Beck, S., and Lan-

glotz, T. (2017). Mixed voxel reality: Presence and

embodiment in low ﬁdelity, visually coherent, mixed

reality environments. In Mixed and Augmented Real-

ity (ISMAR), 2017 IEEE International Symposium on,

pages 90–99. IEEE.

Rodgers, C. R. and Raider-Roth, M. B. (2006). Presence in

teaching. Teachers and Teaching: theory and practice,

12(3):265–287.

Roo, J. S. and Hachet, M. (2017). One reality: Augmenting

how the physical world is experienced by combining

multiple mixed reality modalities. In Proceedings of

the 30th Annual ACM Symposium on User Interface

Software and Technology, pages 787–795. ACM.

Roth, D., Lugrin, J.-L., Galakhov, D., Hofmann, A., Bente,

G., Latoschik, M. E., and Fuhrmann, A. (2016).

Avatar realism and social interaction quality in vir-

tual reality. In Virtual Reality (VR), 2016 IEEE, pages

277–278. IEEE.

Schubert, T., Friedmann, F., and Regenbrecht, H. (2001).

The experience of presence: Factor analytic insights.

Presence: Teleoperators & Virtual Environments,

10(3):266–281.

Slater, M., Spanlang, B., Sanchez-Vives, M. V., and Blanke,

O. (2010). First person experience of body transfer in

virtual reality. PloS one, 5(5):e10564.

Sousa, M., Mendes, D., Anjos, R. K. D., Medeiros, D.,

Ferreira, A., Raposo, A., Pereira, J. M., and Jorge, J.

(2017). Creepy tracker toolkit for context-aware inter-

faces. In Proceedings of the 2017 ACM International

Conference on Interactive Surfaces and Spaces, pages

191–200. ACM.

Zhu, Y., Zhu, K., Fu, Q., Chen, X., Gong, H., and Yu,

J. (2016). Save: shared augmented virtual environ-

ment for real-time mixed reality applications. In Pro-

ceedings of the 15th ACM SIGGRAPH Conference

on Virtual-Reality Continuum and Its Applications in

Industry-Volume 1, pages 13–21. ACM.

HUCAPP 2019 - 3rd International Conference on Human Computer Interaction Theory and Applications