A KINECT-BASED AUGMENTED REALITY SYSTEM FOR
INDIVIDUALS WITH AUTISM SPECTRUM DISORDERS
Xavier Casas, Gerardo Herrera, Inmaculada Coma and Marcos Fernández
Institute of Robotics, University of Valencia, Valencia, Spain
Keywords: Augmented Reality, Motion Capture Systems, Autism Spectrum Disorders.
Abstract: In this paper an Augmented Reality system for teaching key developmental abilities to individuals with
ASD is described. The system has been designed as an augmented mirror where users see themselves in
a mirror world augmented with virtual objects. Regarding the content, the tool is designed with the aim of
facilitating the acquisition of certain skills in children with ASD. From a technical point of view, two
problems had to be solved to develop the tool. The first was how to capture the user's movements, without
invasive wearable devices, in order to animate the user's virtual avatar; a Kinect device has been chosen for
this purpose. The second was how to mix real objects located at different depths into the augmented virtual
scene; an OSG shader has been developed, the details of which are given in this paper.
1 INTRODUCTION
Autism is a neuro-developmental disorder, and
individuals with Autism Spectrum Disorder (ASD)
have specific difficulties developing verbal and non-
verbal communication. They also have difficulties
socializing, a limited capacity for understanding
mental states, and a lack of flexibility of thinking
and behaving, especially when “symbolic play” is
involved.
While Virtual Reality has proven effective for some people with autism (Strickland, 1997; Herrera,
2008), there are still more severely affected individuals who are unable to generalize their
learning to new situations. Augmented Reality, in contrast, is directly anchored in reality, mixing
aspects of the real world with computer-generated information; it therefore does not require as
much capacity for abstraction as VR, and people with autism who lack this abstraction capacity
could benefit from its use (Herrera, 2006).
With regard to AR systems, researchers have proposed solutions in many different application
domains. Some of them include applications intended for young children (Berry, 2006;
Kerawalla, 2006), but few of them focus on children with disabilities (Richard, 2007).
In this paper we present “Pictogram room”, a
project developed by the Orange Foundation that
intends to apply the latest technological advances of
Augmented Reality to benefit people with autism.
The goal of the project is to help the child to improve critical abilities for his or her development.
More specifically, the aim is to teach the individual with autism self-awareness, body schema and
postures, communication and imitation by means of an AR system.
Compared to other AR systems for individuals
with ASD, this application takes the users’ body
movements as a method for interaction. Thus,
moving their own body, children can play with the
AR system. The purpose of the application is to
teach the individual with ASD to develop his/her
body schema and, after this, to go through a series of
playful educational activities related to body
postures, communication and imitation. These
games show pictorial representations of various
elements, including avatars representing the child
and the teacher.
Therefore, our first task was to study existing MoCap technologies used to control virtual avatars
with a user's movements, looking for the one that best fits our application, as described in Section
2. Section 3 describes the system modules, how the augmented scene is created from Kinect data,
and the implementation of an OSG shader that mixes virtual and real objects at different depths.
Finally, testing results and conclusions are presented.
2 STATE OF THE ART AND
REQUIREMENTS
MoCap systems for animating virtual actors have traditionally been used in Virtual Reality
applications; with the growth of Augmented Reality (AR), these systems are now also used to let
a user in the real world interact with virtual objects in the augmented scene.
One of the earliest examples found is the ALIVE system (Maes, 1997), where a video camera
captures images of a person both to detect the user's movements and to integrate the real image
into a virtual world. This can be considered a precursor of AR, where video images are integrated
with 3D objects (such as a dog) that are activated by the user's movements.
Several AR systems make use of markers (usually geometric figures) to detect positions
(Mulloni, 2009). However, these systems do not provide enough accuracy to drive a virtual
character and all of its joints. Sometimes optical systems with markers (Dorfmüller, 1999) are
used to detect positions; in other cases, more complex systems based on ultrasound or inertial
sensors are used, which detect movements over a wide operating range (Foxlin, 1998; Vlasic,
2007).
Regarding our system requirements, it is necessary to track the positions of different body
parts in order to move an avatar representing the user. Moreover, taking into account the intended
users, who frequently experience sensory difficulties (Bogdashina, 2003), it is better that they do
not have to wear complex devices. Thus, mechanical, electromagnetic, and other invasive devices
were discarded from the beginning.
The release of the PrimeSense OpenNI framework for programming the Kinect in December
2010 opened up new possibilities for us. Kinect incorporates an RGB camera (640x480 pixels at
30 Hz) and a depth sensor, which makes it especially suitable for an AR system where images
from the real world are needed. The depth sensor provides distance information, which is useful
for placing objects correctly in the augmented scene.
Some researchers have started to use it in their applications to track people (Kimber, 2011),
but there are no reports of this device being used in Augmented Reality applications to control
avatars.
The PrimeSense OpenNI framework (OpenNI, 2011) provides information about the positions
and orientations of a number of skeleton joints, which can be used to control our virtual puppet.
The first tests performed with this system were satisfactory regarding motion capture. However,
it has a drawback: calibration requires the user to remain still for a few seconds in a certain
posture in front of the camera. As training the individual with autism to match postures is one of
the final educational objectives, it makes no sense to require that he/she already be able to copy a
posture in order to enter the game. With the release of Microsoft's Kinect SDK the calibration
problem was solved, which led us to choose it for our system.
Thus, we have a system that captures user positions under varying lighting conditions, with
no dress requirements or markers.
3 SYSTEM DESCRIPTION
After comparing the pros and cons of different motion capture devices, a Pictogram Room
based on Kinect was built. The system consists of a visualization screen measuring 3 x 2 meters, a
projection or retro-projection system (depending on the room where it is to be installed), a PC, a
Kinect device and speakers. Kinect is equipped with two cameras, an infrared camera and a video
camera, both with a 640x480 pixel resolution and a capture rate of 30 fps. Using Kinect it is
therefore possible to obtain not only a standard video stream but also a stream of depth images.
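The paper does not reproduce its initialization code; as an illustration only, a minimal sketch of how the two streams might be opened with the Kinect for Windows SDK 1.x C++ API follows (the variable names are ours, not the project's):

    #include <windows.h>
    #include <NuiApi.h>   // Kinect for Windows SDK 1.x

    // Minimal sketch: open the 640x480 colour and depth streams plus
    // skeleton tracking, roughly as an application like this one might.
    bool initKinect(HANDLE& colorStream, HANDLE& depthStream)
    {
        if (FAILED(NuiInitialize(NUI_INITIALIZE_FLAG_USES_COLOR |
                                 NUI_INITIALIZE_FLAG_USES_DEPTH |
                                 NUI_INITIALIZE_FLAG_USES_SKELETON)))
            return false;

        // Colour stream: 640x480 RGB at 30 fps.
        if (FAILED(NuiImageStreamOpen(NUI_IMAGE_TYPE_COLOR,
                                      NUI_IMAGE_RESOLUTION_640x480,
                                      0, 2, NULL, &colorStream)))
            return false;

        // Depth stream: per-pixel distance map, used later for occlusion.
        if (FAILED(NuiImageStreamOpen(NUI_IMAGE_TYPE_DEPTH,
                                      NUI_IMAGE_RESOLUTION_640x480,
                                      0, 2, NULL, &depthStream)))
            return false;

        // Skeleton tracking needs no posture calibration in this SDK.
        return SUCCEEDED(NuiSkeletonTrackingEnable(NULL, 0));
    }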
On the screen, images captured by Kinect are
displayed mixed with virtual information, creating
for users an augmented mirror where they can see
themselves integrated onto the augmented scene.
The system is designed to be used by users
playing with two different roles: child and educator.
Each user is represented by a virtual puppet colored
differently. Both users are located next to each other
and they are tracked in the same space.
The task of the educator is to select exercises and
activities that will be developed by the child, and
after that to give him/her appropriate explanations.
To achieve this, the teacher is provided with a menu
system displayed on the screen which can be
accessed by using the hand as a pointer. Once an
exercise is selected, the child’s and teacher’s
movements are captured and used as the interaction
interface with the system.
This system has been implemented by creating a set of subsystems that deal with different
tasks. On the one hand, the input system allows activities to be chosen and captures the user's
actions and movements. Depending on the activity and the user's actions, the output system
creates an augmented environment by integrating images from
the real world, video, sound and virtual elements.
Finally, there is educational software to edit and
manage exercises. Let’s look at these three
subsystems in more detail.
3.1 Input System
The input system's task is to collect information, and it can work in two different modes. On the
one hand, edit mode is intended to configure the system and to allow the educator to select
educational activities. On the other hand, in play mode the child carries out the proposed
activities. In both modes of operation, there can be one or two users whose actions are detected
by the system.
Kinect provides information regarding the positions of a number of the user's joints. These
positions are captured and transformed into positions and rotations applied to the virtual
skeletons.
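As a minimal sketch of this per-frame update, assuming the Kinect SDK 1.x skeleton API (the Avatar type and its setJointPosition method are hypothetical stand-ins for the application's own avatar interface):

    #include <NuiApi.h>  // Kinect for Windows SDK 1.x

    struct Avatar {
        // Hypothetical stand-in for the application's avatar interface.
        void setJointPosition(int user, int joint,
                              float x, float y, float z) { /* ... */ }
    };

    // Read one skeleton frame and forward every tracked joint position
    // (metres, camera space) to the corresponding virtual skeleton.
    void updateAvatar(Avatar& avatar)
    {
        NUI_SKELETON_FRAME frame;
        if (FAILED(NuiSkeletonGetNextFrame(0, &frame)))
            return;

        NuiTransformSmooth(&frame, NULL);  // built-in jitter filtering

        for (int i = 0; i < NUI_SKELETON_COUNT; ++i)
        {
            const NUI_SKELETON_DATA& s = frame.SkeletonData[i];
            if (s.eTrackingState != NUI_SKELETON_TRACKED)
                continue;

            for (int j = 0; j < NUI_SKELETON_POSITION_COUNT; ++j)
            {
                const Vector4& p = s.SkeletonPositions[j];
                avatar.setJointPosition(i, j, p.x, p.y, p.z);
            }
        }
    }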
In edit mode, Kinect provides the hand position
which serves as an input pointer to select menus and
push buttons.
In play mode, the user interacts with the system by moving his/her body, and uses it to
advance through the series of educational activities.
3.2 Output System
This module is responsible for creating an
augmented scene, taking input data coming from the
capture devices and control messages sent by the
exercise module.
In the augmented environment, data coming from several sources are integrated: video images
from the real world, virtual puppets representing the users, virtual objects, recorded videos and
music. Thus, users can see themselves in an augmented mirror projected in front of them, and
they can interact with virtual objects.
Virtual avatars are controlled by means of the joint positions acquired by the input system. As
one of the system's goals is to teach children to understand pictograms, these avatars have been
designed as stick figures, and they are painted in the virtual world over the user images. The
augmented scene is created with OpenSceneGraph (OSG).
Depending on the exercise, two-dimensional pictograms or even three-dimensional objects
can appear in the AR scene, and they can be located at different positions, even behind the user.
With OSG it is easy to draw video images with virtual objects displayed over the video.
Nevertheless, if a virtual object must be placed at
a certain depth, two things are needed: the z-position
of the users in the real world, and a mechanism to
mix the real and virtual objects while maintaining
the depth where they are located.
The problem of obtaining z-positions of users
and objects in the real world is solved by Kinect. In
addition to video images Kinect provides a depth
map, which is a matrix where each pixel’s depth is
stored. With this map it is possible to create a
drawing process maintaining the proper occlusion
between real and virtual elements through the z-
buffer.
The steps in this process (Figure 1) are as follows (a shader sketch is given after the list):
1. Draw the image captured by the camera. This
image reflects the real scene.
2. With the depth information obtained by the Kinect camera, it is possible to compute the
z-buffer value for each pixel of the real scene. This information allows real objects to be
placed at their proper position within the virtual world. To optimize the process a shader
has been implemented, so these operations are performed on the graphics card, improving
processing speed.
3. Virtual objects are drawn and mixed with the
real ones. To achieve this, a virtual camera must
be configured with the same parameters as the
real one. At this point, when drawing a virtual
object, the z-buffer data written previously
allows only those parts of the virtual objects that
are in front of the real ones to be drawn.
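The paper does not reproduce the shader source. A minimal sketch of the idea in OSG, assuming the Kinect depth map is uploaded each frame as a texture storing eye-space distance in metres on texture unit 0, and that zNear/zFar are the virtual camera's clipping planes, could look like this:

    #include <osg/Program>
    #include <osg/Shader>
    #include <osg/ColorMask>
    #include <osg/StateSet>
    #include <osg/Uniform>

    // Fragment shader sketch: convert the Kinect distance (metres) into
    // the non-linear window-space depth expected by the z-buffer.
    static const char* kDepthWriteFrag =
        "uniform sampler2D kinectDepth;                                \n"
        "uniform float zNear;                                          \n"
        "uniform float zFar;                                           \n"
        "void main()                                                   \n"
        "{                                                             \n"
        "    float d = texture2D(kinectDepth, gl_TexCoord[0].st).r;    \n"
        "    // Standard perspective mapping: eye distance -> NDC.     \n"
        "    float zNdc = (zFar + zNear - 2.0 * zNear * zFar / d)      \n"
        "               / (zFar - zNear);                              \n"
        "    gl_FragDepth = 0.5 * zNdc + 0.5;  // NDC [-1,1] -> [0,1]  \n"
        "    gl_FragColor = vec4(0.0);         // colour is masked     \n"
        "}                                                             \n";

    // Attach the shader to a full-screen quad drawn after the video
    // image: only the z-buffer is written, so virtual objects drawn
    // afterwards are correctly occluded by real ones.
    void setupDepthPass(osg::StateSet* ss)
    {
        osg::Program* program = new osg::Program;
        program->addShader(new osg::Shader(osg::Shader::FRAGMENT,
                                           kDepthWriteFrag));
        ss->setAttributeAndModes(program);
        // Disable colour writes: step 1 already drew the camera image.
        ss->setAttributeAndModes(new osg::ColorMask(false, false,
                                                    false, false));
        ss->addUniform(new osg::Uniform("kinectDepth", 0));
        ss->addUniform(new osg::Uniform("zNear", 0.5f));  // example planes
        ss->addUniform(new osg::Uniform("zFar", 10.0f));
    }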
Music Engine
Music therapy has proved very useful in training individuals with autism (Braithwaite, 1998).
For this reason, in addition to the visual information, a music engine that generates musical
sounds during the exercises has been incorporated. This music engine can generate MIDI data in
real time. In addition, stored MIDI files can also be modulated, so children's favorite songs can be
loaded. With the information provided by Kinect about position and momentum, the tempo and
volume of a song can be modulated.
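As a rough sketch of this kind of modulation (the paper does not specify its MIDI back end; the Windows multimedia API is assumed here, and the mapping range is an illustrative guess):

    #include <windows.h>
    #include <mmsystem.h>  // MIDI output; link against winmm.lib
    #include <algorithm>

    // Hedged sketch: map the tracked hand height to MIDI channel volume.
    // The 0..2 metre range is illustrative, not the project's constant.
    void modulateVolume(HMIDIOUT midi, float handHeightMetres)
    {
        float t = std::min(std::max(handHeightMetres / 2.0f, 0.0f), 1.0f);
        int volume = static_cast<int>(127.0f * t);
        // Control Change (0xB0) on channel 0, controller 7 = volume.
        DWORD msg = 0xB0 | (7 << 8) | (volume << 16);
        midiOutShortMsg(midi, msg);
    }

    int main()
    {
        HMIDIOUT midi;
        if (midiOutOpen(&midi, MIDI_MAPPER, 0, 0, CALLBACK_NULL)
                != MMSYSERR_NOERROR)
            return 1;
        modulateVolume(midi, 1.4f);  // hand at 1.4 m -> fairly loud
        midiOutClose(midi);
        return 0;
    }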
3.3 Educational Software
To configure the activities, an exercise management module has been created. This module
contains a set of classes representing the different elements involved: virtual puppets, video
images, sounds, virtual objects and pictograms, all of which are described in an XML file.
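The paper does not show the XML schema; a purely hypothetical fragment, with illustrative element and attribute names, gives an idea of what such an exercise description might look like:

    <!-- Hypothetical exercise description; element and attribute names
         are illustrative, not the project's actual schema. -->
    <exercise id="touch-head" section="body">
      <puppet user="child" color="blue"/>
      <puppet user="teacher" color="green"/>
      <pictogram image="head.png" targetJoint="head"/>
      <sound file="reward.mid" trigger="onTouch"/>
    </exercise>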
The exercises are shown to the teacher through
an interface where he/she can select the activities
(see Figure 2).
Figure 1: Scene composition using depth information.
Figure 2: Activities selection menus.
The activities are designed with the aim of facilitating the acquisition of certain skills in
children with ASD. The contents are classified into four sections of 20 activities each, two of
which (body and postures) are included in the first public version of our software.
In the learning-about-the-body section, the activities are intended to develop the
correspondence between oneself and the schematic avatar figure, to identify one's own image, to
discriminate between oneself and others, and to identify different body parts. This has been
implemented by overlaying the virtual avatar on the user's real image, with a different color for
each user.
Some of the activities designed involve users
taking pictures of themselves and using these
pictures later in the activity. Others consist of users
touching pictograms that appear on the screen with
different body parts (head, arms) (see Figure 3a).
Other activities are designed around performing body movements. These movements are
accompanied by musical notes dynamically generated by the MIDI engine, or by videos or lights
appearing in the augmented scene, always in a playful way.
The body postures section includes activities to train the user to match postures. For the
design of these activities, in addition to the puppet representing the user, a shadow that guides the
user has been created. This shadow is a grey-colored stick figure, and in some exercises it guides
the user's movements (see Figure 3b).
Figure 3: (a) Two users touching pictograms. (b) Shadow guiding the user.
At other times, instead of a full avatar, only certain parts of the body are represented in the
AR scene. This strategy is used to focus attention on those parts of the body. Some activities also
make use of clipped images with a body outline in a posture that the user must imitate (see
Figure 4).
Figure 4: Clipped images.
3.4 Interacting with the AR World
The system supports two users: the teacher and the
student, who are represented in the augmented world
by different virtual puppets. The activities are
designed to be played by a child following the
teacher’s instructions and in some cases in a
collaborative way. Moreover, users can enter and
leave the scene at any time, being detected by the
system without previous calibration.
In edit mode the teacher is responsible for
selecting exercises to be executed. This can be done
by means of a 2D menu system created with
OpenSceneGraph and displayed on the screen (see
Figure 2). To select menu options the teacher uses
his/her hand to move a pointer represented with a
hand icon. A button or menu option is activated
when the hand is placed on it for a few seconds. This
interaction with the 2D menu can also be done with
a standard mouse.
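A minimal sketch of this dwell-based activation follows; the two-second threshold and the Button type are illustrative assumptions, not the project's actual values:

    // Hypothetical 2D button; coordinates in screen space.
    struct Button { float x0, y0, x1, y1; };

    // Dwell-based selection: a button fires once the hand pointer has
    // stayed inside it continuously for 'dwellSeconds'.
    class DwellSelector {
    public:
        explicit DwellSelector(float dwellSeconds = 2.0f)
            : dwell_(dwellSeconds), elapsed_(0.0f) {}

        // Call once per frame with the hand position and time step.
        // Returns true on the frame the button is activated.
        bool update(const Button& b, float handX, float handY, float dt)
        {
            bool inside = handX >= b.x0 && handX <= b.x1 &&
                          handY >= b.y0 && handY <= b.y1;
            elapsed_ = inside ? elapsed_ + dt : 0.0f;  // reset on exit
            if (elapsed_ >= dwell_) {
                elapsed_ = 0.0f;  // require a fresh dwell next time
                return true;
            }
            return false;
        }

    private:
        float dwell_;
        float elapsed_;
    };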
Regarding the interaction in play mode, the
exercises are designed to use body movements as a
form of interaction. Thus, there are exercises that
involve activities such as moving to different areas
of the room, peering into the holes of the images or
moving one’s body into the indicated positions.
In this mode there are also two options to
visualize the augmented world. One is showing the
video images with the virtual objects superimposed
on the display, and the other is eliminating the
background and the user images and showing only
the virtual figures and objects.
4 EVALUATION
In addition to functional testing, a set of assessment tests with users has been carried out.
The first tests consisted of a system assessment with 22 typically developing children, without
autism, aged between 3 and 4 years.
For these tests, the system was installed in a
school using a projection screen, a computer,
speakers and a Kinect device. At the time the tests were conducted, the Microsoft SDK had not
yet been released, so the tests were done with OpenNI. As described in Section 2, this library
requires a previous calibration, with the user remaining still in a certain posture for several
seconds (see Figure 5).
Figure 5: (a) System calibration. (b) Two users with their puppets.
Thus, as a first step, the children were asked to
stay still for a few seconds in the calibration
position. Then, the teacher gave explanations about
the activities to be performed. Each child played for
around 15 minutes, and he/she performed four
different activities related to movements, touching
objects or imitating postures.
When an activity had been completed the
educator noted whether the child did it without
support (3 points), with verbal (2 points) or physical
(1 point) assistance from the educator, or was not
able to do it (zero points). These tests with typically developing children are useful to obtain
reference values for each activity. With this normative information, each child with autism who
uses the application can be compared against these reference values in order to get an idea of
which skills are more developed and which need improvement.
During the development of the activities the only problem that arose was the loss of
calibration when a child left the tracked area and re-entered. With regard to the activities, all the
children were able to do them, although sometimes further explanations were necessary.
Following this, tests were performed in the same school with children with ASD. The tests
were conducted with 5 children. One of them did not do any of the activities and could not even
manage the posture calibration. Another child managed to do the calibration and began to play,
but after a few minutes he stopped doing the activity and would not continue. The remaining
three calibrated and played, some showing more skill than others. One of them played like the
typically developing children (see Figure 6).
In future studies, more tests will be carried out with a greater number of children with autism.
Moreover, all the activities will be tested to find out whether they are properly designed or
whether some aspects must be changed.
With regard to the calibration problems, as mentioned above, these have been solved by using
the Microsoft SDK, which does not require calibration (i.e. it is not necessary for the child to
match a given posture before entering the game).
Figure 6: System evaluation.
5 CONCLUSIONS
In this paper we have presented a system intended to
improve the development of children with autism.
The system uses Augmented Reality technologies.
The users are represented by means of virtual stick figures, animated by a motion capture
system that meets the two main requirements: it is a low-cost system, and it does not require users
to wear any device.
After initial steps to test the feasibility of the system in a CAVE, the motion capture was
developed based on vision algorithms for motion detection. Finally, the release of the Kinect
SDK gave us a tool appropriate to our needs. From the technical point of view, the problem of
integrating objects at different depths in the virtual world has been solved by programming an
OSG shader that uses the depth map provided by Kinect.
With regard to the contents, a wide range of
activities have been developed. These activities
include images, videos, sounds and virtual objects
integrated in the augmented world where children
can play.
Therefore, the feasibility of using Kinect to develop this type of application aimed at children
with autism has been demonstrated.
REFERENCES
Berry, R., Makino, M., Hikawa, N., Suzuki, M., Inoue, N., 2006. Tunes on the table. Multimedia
Systems, 11(3), 280-289.
Bogdashina, O., 2003. Sensory Perceptual Issues in Autism and Asperger Syndrome: Different
Sensory Experiences, Different Perceptual Worlds. Jessica Kingsley Publishers: London, UK.
Braithwaite, M., Sigafoos, J., 1998. Effects of social versus musical antecedents on
communication responsiveness in five children with developmental disabilities. Journal of Music
Therapy, 35(2), 88-104.
Dorfmüller, K., 1999. Robust tracking for augmented reality using retroreflective markers.
Computers & Graphics, 23(6), 795-800.
Foxlin, E., Harrington, M., Pfeifer, G., 1998. Constellation: A Wide-Range Wireless Tracking
System for Augmented Reality and Virtual Set Applications. Proceedings of SIGGRAPH '98.
Vlasic D. et al., 2007. Practical motion capture in
everyday surroundings. ACM Transactions on
Graphics, Vol. 26, 3.
Herrera, G., Jordan, R., Gimeno, J., 2006. Exploring the advantages of Augmented Reality for
Intervention in ASD. Proceedings of the World Autism Congress, South Africa.
Herrera, G., Alcantud, F., Jordan, R., Blanquer, A., Labajo, G., 2008. Development of symbolic
play through the use of virtual reality tools in children with autistic spectrum disorders. Autism,
SAGE Publications and The National Autistic Society, 12(2), 143-157.
Kerawalla, L., Luckin, R., Seljeflot, S., Woolard, A., 2006. Making it real: exploring the potential
of augmented reality for teaching primary school science. Virtual Reality, 10, 163-174.
Kimber, D., Vaughan, J., Rieffel, E., 2011. Augmented Perception through Mirror Worlds.
Proceedings of the Augmented Human Conference (AH '11).
Maes, P., Darrell, T., Blumberg, B., Pentland, A., 1997.
The ALIVE system: wireless, full-body interaction
with autonomous agents. Multimedia Systems 5: 105–
112.
Mulloni, A., 2009. Indoor Positioning and Navigation with Camera Phones. IEEE Pervasive
Computing, 8(2), 22-31.
OpenNI. http://www.openni.org/ Accessed: 2011.
Richard, E., 2007. Augmented Reality for Rehabilitation of Cognitive Disabled Children: A
Preliminary Study. Virtual Rehabilitation, pp. 102-108.
Strickland, D., 1997. Virtual reality for the treatment of autism. In Virtual Reality in
Neuro-Psycho-Physiology, G. Riva, Ed., IOS Press, Chapter 5, pp. 81-86.