A GAME PLAYING ROBOT THAT CAN LEARN A TACTICAL
KNOWLEDGE THROUGH INTERACTING WITH A HUMAN
Raafat Mahmoud, Atsushi Ueno and Shoji Tatsumi
Department of Physical Electronics & Informatics, Osaka City University, Osaka 558-8585, Japan
Keywords: Humanoid robot, Game strategy, Learning from observation, Structured interview, Long-term memory,
Sensory memory, Working memory.
Abstract: We propose a new approach for teaching a humanoid robot a task online, without pre-set data provided in advance. In our approach, the human acts both as a collaborator and as a teacher. The proposed approach enables the humanoid robot to learn a task through a multi-component interactive architecture. The components are designed with respect to the human methodology of learning a task through empirical interactions. For efficient performance, the components are isolated within one single API. Our approach can be divided into five main roles: perception, representation, state/knowledge updating, decision making and expression. An empirical experiment for the proposed approach was conducted by teaching Fujitsu's humanoid robot "Hoap-3" an X-O game strategy, and its results are explained. Important components such as observation, structured interview, knowledge integration and decision making are described for teaching the robot the game strategy while conducting the experiment.
1 INTRODUCTION
Learning from the environment through interaction is a skill well mastered by human beings. Humans adapt their learning algorithms according to the task they wish to learn. For example, learning how to drive a car is different from learning how to play chess. If we learn chess through an empirical teaching class, the teacher and the collaborator are one and the same person. The learner plays following naive game theories at the early stages of the learning procedure. In order to improve these naive theories, the collaborator interrupts the game events in various forms according to the situation. On the other hand, the learner needs to understand the context of such interrupted situations in order to update his knowledge and make use of it whenever needed. Therefore, the learner starts expressing his misunderstanding through various multi-modal interactions. The interaction between the learner and his teacher improves the learner's level of understanding of these situations. In these cases, the learner's brain processes these situations and stores certain information about them, which improves the learner's naive theories of the game. Theoretically, based on much cognitive research, human learning is assumed to be the "storage of automated schema in the long-term memory of the human brain" (Sweller, J., 2006). A schema is a chunk of multiple individual units of memory that are linked into a system of understanding (Bransford, J., Brown, A., 2001). However, the learner's brain performs many processes on the input data at different places, such as short-term memory and working memory (Baddeley, A. D., 1996), and then certain extracted data are stored in long-term memory. Short-term memory is assumed to be the place where any aspect of the world is experienced. Working memory is the place where thinking gets done; it is actually more a brain function than a location. Working memory is dual coded, with one buffer for the storage of verbal/text elements and a second buffer for visual/spatial elements (Marois, R., 2005). The main function of long-term memory is storing the learned data as a schema so that it can be used whenever needed, without the need to learn the subject from the first steps again. The main phases of the learner's brain in learning such a task are observation of the task sequences, interviewing about what is not understood, recording the data concluded from observation and interviewing, process tracking, and making use of the recorded data to support hypothetical scenarios for making decisions. In this paper, we describe an architecture through which a human teacher teaches the humanoid robot "Hoap-3" the game strategy through an interaction algorithm which humans follow in order
to learn a task for the first time. Many other architectures for teaching a robot by demonstration have been introduced (Kuniyoshi, Y., Inaba, M. and Inoue, H., 1994)(Voyles, R. and Khosla, P., 1998). However, such approaches use demonstrations in order to optimize a predefined goal, and their interactive behaviours follow human-machine interaction (Reeves, B. & Nass, C., 1996) rather than human-human interaction, and thus do not break out of the strict paradigm that robots follow. A tutelage and socially guided approach for teaching the humanoid robot "Leo" a task (Lockerd, A., Breazeal, C., 2004) was proposed, where the machine learning problem is framed as a collaborative dialogue between the human teacher and the robot learner; however, every task has a specific single goal. In our approach, making decisions is based on the accumulative learning that "Hoap-3" gains during interaction, together with adaptive selection behaviour for each new situation in order to achieve individual goals based on the accumulated learning information. As an architecture for learning and interacting in human-robot domains and for task learning through imitation and human-robot interaction (Nicolescu, M. N., Mataric, M. J., 2001), a behaviour-based (Werger, B. B., 2000) interactive architecture applied to a Pioneer 2-DX mobile robot was proposed. In these approaches the behaviours are mainly built from two components, abstract behaviours and primitive behaviours. However, these two architectures are not suitable and flexible enough to be applied to teaching a robot various tasks through interaction. This method has also been applied in various forms to robot learning for different single tasks, such as hexapod walking (Maes, P. and Brooks, R. A., 1990) and box-pushing (Mahadevan, S. and Connell, J., 1991). Many other single-task navigation and human-robot instructive navigation approaches (Lauria, S., Bugmann, G., 2002) have been proposed. In our proposed architecture there is no data provided in advance, and the goals of a task are taught during interaction.
Figure 1: The robot interacts with its human teacher.
Moreover, the interactive behaviours resemble those of humans while learning. In this paper, the main features of our architecture and the developed behaviours are explained in the following sections. Following this section, the internal system structure is explained. Then the decision-making process is explained. In the last two sections, the testing of our architecture and the results of an experiment are explained. This is followed by a discussion of our architecture.
2 OUR ARCHITECTURE
In our approach we use the upper torso of a humanoid robot, "Hoap-3", which has a total of 28 degrees of freedom (DOFs): 6 in each arm, 6 in each leg, 3 in the head, and 1 in the body (see figure 1).
In order to provide interactive learning behaviour, the architecture must be flexible. This improves the internal processing strategy between the architecture components, which enables the robot to recognize and identify its environment correctly, which in turn improves the efficiency of the mapping between the robot's expression components. This flexible system provides appropriate interactive behaviours in response to changes in its environment. To achieve such an aim, we designed an architecture composed of multiple components within a single API root layer (see figure 2).
Our architecture has components requiring information from the system, such as environment handling, knowledge updating and expressions, and other components which provide information to the system, such as streaming information from the environment through a vision sensor. In addition, it has intermediary components that provide the necessary information within the system. In our design we isolated these components from each other. A proper combination of these components can perform fairly specialized behaviours. The next subsection describes and explains such interactive behaviours.
2.1 Interactive Expressions
The expressions performed by the robot must be influenced by the task with which the robot is interacting and, of course, by the final internal information resulting from processing the task events. However, the behaviours performed by the robot are mainly low-level behaviours.
Figure 2: Hoap-3 mind-map.
Main_Loop()                      // where the work gets done
{
    Do_Processing(): feature detection/observation,
        confirmation/representation, state/knowledge,
        decision-making, etc.
}
-------------------------
Void Vision_sensor_component()
{
    while (true)
        stream into the main processing loop
}
-------------------------
Motor_expression(source, target)
{
    while (true)
    {
        get_from_main_process_loop(source, target)
        set trajectory
        perform low-level interface
    }
}
Figure 3: Main components of our architecture.
The low-level robotic behaviours mainly include processing the streamed signals from the sensors and performing the low-level interface to the robot actuators. A proper combination of such individual low-level behaviours enables the robot to produce a higher-level behaviour. The robot's expressions may be motor-expressions M or utterance-expressions U, as in equation (1):

M = {m_1, m_2, m_3, ..., m_i}
U = {u_1, u_2, u_3, ..., u_i}    (1)
The motor-expression component provides the mapping from high-level commands to low-level motor commands that are physically realized as high-level behaviours while executing a task. It consists of a collection of trajectory motor-angle algorithms m_i, one for each step of the task, expressed in low-level parameters that allow a task execution to be carried out. The utterance-expression is a collection of individual spoken words u_i at each step of the task. A proper combination of the individual words u_i produces high-level interactive behaviours while interacting with the robot. In our architecture, since the other components have knowledge of the contextual state information, the developer responsible for utterance-expressions does not need to worry about this contextual information.
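As an illustration only, the collections M and U and their composition into a higher-level behaviour might be organized as in the following Python sketch; the primitive names, joint angles and example utterance are invented placeholders, not values taken from the paper.

# Illustrative sketch of the motor-expression collection M and the
# utterance-expression collection U; names and angles are placeholders.
motor_expressions = {                      # M = {m_1, m_2, ...}
    "reach": [("shoulder", 30), ("elbow", 45)],
    "grip":  [("gripper", 1)],
    "place": [("shoulder", 10), ("elbow", 20), ("gripper", 0)],
}
utterance_words = ["I", "think", "I", "can", "win"]   # U = {u_1, u_2, ...}

def move_piece():
    """Compose low-level primitives into the high-level 'move a piece' behaviour."""
    return (motor_expressions["reach"]
            + motor_expressions["grip"]
            + motor_expressions["place"])

def say(words):
    """Combine individual words u_i into one interactive utterance."""
    return " ".join(words)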
In the approach of learning and playing a game such as the X-O game, the robot must have the capability of moving an object from one place to another. The source and the target of an object are specified by the decision-making process. However, the source of an object in one situation may become a target in another situation. Therefore, in order to avoid duplicating the software code and the inverse-kinematics calculations, a software plug-in is used to
override the system and handle such a conflict by controlling the robot gripper. Mapping and selecting the motor-expressions M or utterance-expressions U is done by the decision-making process (explained later), based on the information stored in the long-term memory; the conclusion also depends on the environment events computed by the perception component. In the next section we describe the internal components' strategy in processing the information while teaching the robot a task.
3 SYSTEM ARCHITECTURE
In this section, to show a theoretical instantiation of the architecture and the procedure of interacting with the robot, we refer to an X-O game played between a human partner and "Hoap-3". The X-O game consists of a 3*3 square board and six game pieces: three for the human partner (square parts h, indexed as 2 during processing) and three for the robot (round parts r, indexed as 15 during processing). The human plays first, and then the robot plays (see figure 1). Winning is achieved when one of the players assembles a complete line (row or column). The human partner should play only one of his own game pieces at his turn and then prompt the robot to play; the same holds for the robot: it should play only one of its game pieces and then prompt its human partner to play. We divided the X-O game into stages: when one of the players wins, or the robot learns a new idea, a new stage starts from the home position all over again. While playing, the robot indexes every step played by the indexer (see figure 2) and records it as a variable named no_of_steps_played; it also indexes every stage and records it as a variable named no_of_stage. A new stage starts when no_of_steps_played is reset to one, and no_of_stage is increased by one as the game goes on. In the approach of teaching a humanoid robot the game strategy, we aim to teach the robot a high-level behaviour performed by the human partner. The final information from the low-level signals of the vision sensors results in a matrix Χ(h,r) that includes the human game pieces h and the robot game pieces r at every game step, as shown in figure 4.
2  2  2
0  0  0
15 15 15
Figure 4: The final matrix resulting from vision-sensor processing (home-position matrix).
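As a concrete illustration of this representation, a minimal Python sketch (assuming NumPy and illustrative variable names; the paper does not give code for this) might look as follows.

import numpy as np

# Human pieces are indexed as 2, robot pieces as 15, empty cells as 0.
HUMAN_PIECE, ROBOT_PIECE, EMPTY = 2, 15, 0

# Home-position matrix X(h, r), as shown in Figure 4.
home_position = np.array([[2, 2, 2],
                          [0, 0, 0],
                          [15, 15, 15]])

no_of_steps_played = 1   # reset to 1 whenever a new stage starts
no_of_stage = 1          # incremented as the game goes on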
While playing, this matrix X is stored as archive data Φ, as in equation (2) (see figure 2):

Φ = {X_(1,1), X_(2,1), X_(3,1), ..., X_N},    (2)
where N is the index number (no_of_steps_played, no_of_stage). Another place to store the matrix X is the sensory memory, as in the human brain; however, in the sensory memory register λ we store only a single piece of data X, replaced at every game step, as expressed in equation (3):

λ = X_n,    (3)

where X is the piece of data that denotes the game matrix and n denotes the index coordinate (no_of_steps_played - 1, no_of_stage) of the game step.
At every game step, the human teacher performs an action δ. These actions are either general actions or actions with a specific purpose ğ, as equation (4) shows:

δ = {δ|ğ_1^N, δ|ğ_2^N, δ|ğ_3^N, ..., δ|ğ_i^N}    (4)

In the preceding equation, δ denotes an action made by the human teacher during task playing and teaching (a general action), and δ|ğ denotes an action such that a specific purpose ğ is achieved at game step N. In our task we have specific goals, such as teaching the robot a winning or defence movement (is_winning or is_defence) for the human teacher or for the robot (my_winning, my_defence), as expressed in equation (5), which states that every special action δ has a specific purpose ğ_N at index N:

ğ|_{i,N} = {is_winning, my_winning, is_defence, my_defence}    (5)
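The following short sketch illustrates one possible realization of the memory structures in equations (2)-(5); all names (archive, sensory_register, purpose_flags) are illustrative assumptions rather than the authors' code.

# Sketch of the archive Φ, the sensory register λ and the purpose flags ğ.
# Φ keeps every observed matrix X keyed by (no_of_steps_played, no_of_stage);
# λ holds only the single matrix of the previous step and is replaced each turn.
archive = {}              # Φ: {(no_of_steps_played, no_of_stage): X}
sensory_register = None   # λ: the single matrix X_n of the previous step

def store_step(X, no_of_steps_played, no_of_stage):
    """Archive the current matrix and shift the previous one into λ."""
    global sensory_register
    sensory_register = archive.get((no_of_steps_played - 1, no_of_stage))
    archive[(no_of_steps_played, no_of_stage)] = X

# Purpose flags ğ attached to a special teaching action δ (equation (5)).
purpose_flags = {"is_winning": 0, "my_winning": 0,
                 "is_defence": 0, "my_defence": 0}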
3.1 Observation of Human Behaviour
In many proposed approaches (Brian, S., Gonzalez, J., 2008), templates are provided in advance in order to assist the system in recognizing the context of an action. In another proposed approach (Mahmoud, R. A., Ueno, A., Tatsumi, S., 2008), knowledge data are provided in advance. However, in our architecture we extract the individual low-level behaviour context, which leads the robot to high-level behaviour learning, by applying the following algorithm.
We start from low-level processing, in which the robot identifies the game-piece coordinates according to the 2-D camera frame and obtains the game matrix Χ(h,r). This contains the three pieces of the human teacher h and the three pieces of the robot r. The robot should have the ability to recognize the high-level behaviour performed by the human teacher. To do so, the observation component in our architecture performs the following processes.
First, the human game pieces are replaced with zeros, which results in a matrix X|_N-"Hoap-3"-Part that includes
only the robot game pieces, as follows:

X|_N-"Hoap-3"-Part (h|=0, r)

Then the robot's game pieces are replaced with ones, so that X becomes a logical matrix X|_N-"Hoap-3"-Part, which includes only the spatial coordinates of the robot's game pieces at step N, as follows:

X|_N-"Hoap-3"-Part (h|=0, r|=1)
On the other hand, the same processes are performed on the same matrix Χ(h,r) for the teacher's game pieces, producing a logical matrix X|_N-Teacher-Part which includes the spatial coordinates of the teacher's game pieces at the same step N, as follows:

X|_N-Teacher-Part (h|=1, r|=0)
The same processes are also performed on the data λ = X_n in equation (3), resulting in two matrices: the first is a logical matrix X|_n-"Hoap-3"-Part, which includes only the spatial coordinates of the robot's game pieces at step n, and the second is a logical matrix X|_n-Teacher-Part, which includes only the spatial coordinates of the teacher's game pieces at the same step n, as follows:

X|_n-"Hoap-3"-Part (h|=0, r|=1)
and
X|_n-Teacher-Part (h|=1, r|=0)
In order to obtain the context of the low-level behaviour performed on the task, the data in X_N and X_n are compared in a special manner using an exclusive-or (EX-OR) logic operation, as in equations (6) and (7):

Ð|_Teacher-Part = X|_N-Teacher-Part (h|=1, r|=0) EX-OR X|_n-Teacher-Part (h|=1, r|=0)    (6)
and
Ð|_"Hoap-3"-Part = X|_N-"Hoap-3"-Part (h|=0, r|=1) EX-OR X|_n-"Hoap-3"-Part (h|=0, r|=1)    (7)
The resultant data of this procedure are called the observation data β, as shown in equation (8):

β = < Ð|_Teacher-Part , Ð|_"Hoap-3"-Part >    (8)

The resultant information from the observation is one of three directive statuses. Status one, <status = no piece has been moved>, if

β = <0, 0>

Status two, <status = the robot's game piece has been moved>, if

β = <0, 1>

And finally status three, <status = the "User-Teacher" piece has been moved>, if

β = <1, 0>
In addition to this, the observation component in our architecture is able to identify the spatial coordinates of the game piece which has been moved. These bundles of data are submitted to the decision-making process, as explained in the next section, in which the appropriate action is selected.
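Below is a minimal sketch of how the observation procedure in equations (6)-(8) could be realized; the helper names and the use of NumPy are our own assumptions, not the paper's implementation.

import numpy as np

def observe(X_N, X_n, human_piece=2, robot_piece=15):
    """Return the observation data beta = <D_teacher, D_hoap3> and moved cells."""
    teacher_N = (X_N == human_piece)   # X|N-Teacher-Part  (h=1, r=0)
    teacher_n = (X_n == human_piece)   # X|n-Teacher-Part
    hoap3_N = (X_N == robot_piece)     # X|N-"Hoap-3"-Part (h=0, r=1)
    hoap3_n = (X_n == robot_piece)     # X|n-"Hoap-3"-Part

    d_teacher = np.logical_xor(teacher_N, teacher_n)   # equation (6)
    d_hoap3 = np.logical_xor(hoap3_N, hoap3_n)         # equation (7)

    beta = (int(d_teacher.any()), int(d_hoap3.any()))  # equation (8)
    moved_cells = np.argwhere(d_teacher | d_hoap3)     # spatial coordinates of the move
    return beta, moved_cells

# beta == (0, 0): no piece moved; (0, 1): the robot's piece was moved;
# (1, 0): the teacher's piece was moved.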
4 DECISION MAKING
PROCEDURES
During the interaction procedure, the human teacher sometimes plays random steps. In this case, the information β concluded from the observation process is sent to the decision-making process Ψ, as in equation (9):

Ψ(X, Φ, β, δ, τ, σ, ρ)_N = ∪_{q=1}^{i} B    (9)
In the preceding equation, the decision-making main frame Ψ produces a number q of individual behaviours B orchestrated by the robot at any step N while interacting with the human teacher. A behaviour B is a combination of motor-expressions M and/or utterance-expressions U:

B = < M, U >
In order to orchestrate a suitable behaviour in response to the interaction situation, the decision-making process Ψ subscribes to the information β resulting from the observation component, and it also subscribes to the archive Φ by performing a process tracking procedure τ, shown in equation (10):

τ = X_N ∩ Φ    (10)
The information resulting from the process tracking τ provides Ψ with what is needed to build a hypothetical scenario σ, which is the result of the union of the process tracking information τ and the knowledge data ρ, as shown in equation (11). This enables the robot to predict and decide the new step of the task:

σ = τ ∪ ρ    (11)
The knowledge ρ, which the robot obtains through interacting with its human teacher, is as follows:

ρ = {X|_m|δ_1, X|_m|δ_2, X|_m|δ_3, ..., X|_m|δ_i},    (12)
where X is the matrix resulting from the interaction with the human teacher at the i-th situation, in which the human teacher teaches the robot a specific action δ. This matrix is stored in the long-term memory; m is the index (no_of_steps_played, no_of_stage, max_no_of_steps_played), where max_no_of_steps_played is the last step played for the same stage index no_of_stage.
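A rough sketch of how the process tracking τ of equation (10) and the hypothetical scenario σ of equation (11) could be realized follows. The dictionary keys mirror the indices defined above; reading the union in equation (11) as "knowledge entries belonging to the tracked stages" is our interpretation for illustration, not the paper's stated implementation.

import numpy as np

def process_tracking(X_N, archive):
    """tau = X_N ∩ Φ: archive indices whose stored matrix equals the present X_N."""
    return [idx for idx, X in archive.items() if np.array_equal(X, X_N)]

def hypothetical_scenario(tau, knowledge):
    """sigma = tau ∪ rho: taught situations (knowledge entries) in the tracked stages."""
    tracked_stages = {stage for (_, stage) in tau}
    # knowledge keys m = (no_of_steps_played, no_of_stage, max_no_of_steps_played)
    return {m: entry for m, entry in knowledge.items() if m[1] in tracked_stages}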
As explained, the robot is able to interact with the human while learning the task through different
expressions. Let us consider the following
interactive situations that occurred while teaching
the robot the X-O game.
Figure 5: Teaching "Hoap-3" how to achieve winning.
5 TESTING AND EVALUATING OUR ARCHITECTURE
In order to evaluate the proposed architecture, we performed an experiment in which various interactive situations took place. Among them is a situation in which a winning chance is available for the robot, as in figure 5-a. However, as there is no data provided in advance, the robot is not able to recognize it. The human teacher, at his playing turn, performs an interrupting step by moving the robot's game piece instead of one of his own pieces to set the winning row, as in figure 5-b, and then prompts the robot to play. The robot applies low-level identification, starting from analysing the data streams from the vision sensors, and obtains the resultant matrix Χ(h,r), which is stored as archive data Φ (see equation (2)). On the other hand, a single piece of data λ (see equation (3) and figure 2) is kept, which in the present situation is the matrix in figure 5-a. After applying the observation algorithm of equations (6) and (7), which leads to the higher-level observation β of equation (8), the following β is obtained:
β = <0, 1>
This information is submitted to the decision-making procedure Ψ, which orchestrates a number q of individual behaviours B, such as moving the robot's upper torso together with its arm through the motor-expressions M; in addition, Ψ orchestrates utterance-expressions U, as in figure 2.
The utterance-expression provides the necessary information as

ğ_N = {ğ_N(is_winning) = 1, ğ_N(my_winning) = 1, ğ_N(is_defence) = 0, ğ_N(my_defence) = 0},

which clarifies the purpose of the human teacher's action δ.
Figure 6: Teaching "Hoap-3" how to make a defence step.
As the structured interview shows, the robot asks the human teacher to reset the game set in order to start a new stage. On the other hand, the knowledge ρ of the robot must be updated. Therefore, the structured interview result ğ_N and the two matrices of figures 5-a and 5-b are stored in a different register, as long-term memory, as knowledge data ρ at index m (see figure 2). Note that the index N is turned into m, which holds the index N in addition to storing no_of_steps_played as max_no_of_steps_played, which is useful whenever the robot uses the knowledge data ρ for making a decision.
Another situation, which is a chance to teach the robot how to make a defence step, is shown in figure 6. For the present situation the same process, starting from Χ(h,r) and ending with updating the knowledge ρ, takes place. The only difference is the structured interview resultant data ğ_N, which is as follows:
ğ_N = {ğ_N(is_winning) = 0, ğ_N(my_winning) = 0, ğ_N(is_defence) = 1, ğ_N(my_defence) = 1}
This informs the robot of the high-level context of the teacher's action δ. An update of the knowledge data ρ is also performed.
In a different situation, the human teacher aims to teach the robot the form of his own winning, as shown in figure 7-a. The human plays and takes the available winning chance as in figure 7-b, and then prompts the robot to start playing. A new playing stage should now start, due to the win achieved by the human partner. As there is no data provided in advance, the robot does not recognize this; it starts to play randomly, perhaps as in figure 7-c, and then prompts its human partner to play. In order to teach "Hoap-3" how the human's win is achieved, the human partner does not move any of the game pieces and then prompts the robot to play. In this situation the high-level observation β results as follows:
β = <0, 0>
Figure 7: Teaching "Hoap-3" how human winning is
achieved.
This means that none of the game pieces has been moved. This data is submitted to the decision-making main frame Ψ, which orchestrates a new structured interview based on the real-time interaction, as figure 7 shows. The same procedure is followed by the robot; however, the resultant data from the structured interview ğ_N differs from the previous two situations, as follows:
ğ_N = {ğ_N(is_winning) = 1, ğ_N(my_winning) = 0, ğ_N(is_defence) = 0, ğ_N(my_defence) = 0}
Another difference in this situation is that the data X|_m submitted to the knowledge updating ρ includes the matrix of figure 5-a.
During the interaction procedure, the human teacher sometimes plays random steps. In this case the observation data is obtained as follows:

β = <1, 0>
This informs "Hoap-3" that the movement made by its human teacher is a regular step. In this case the decision-making process Ψ performs a procedure different from the previously explained situations.
The first procedure is performing process tracking τ by matching the present matrix X_N with the archive data Φ, as explained in equation (10) (see figure 2). If τ = <empty>, the robot plays randomly (see figure 2). However, if τ ≠ <empty>, the robot unites the resultant data from τ with the knowledge data ρ.
The knowledge data ρ includes the high-level context of every interactive action δ|ğ_N made by the teacher. Based on this union, the robot performs a hypothetical scenario σ in order to make a rational choice. However, if the process tracking result is τ > 1, then the main priority of the hypothetical scenario σ is given to choosing the knowledge
ğ_N = {ğ_N(is_winning) = 1, ğ_N(my_winning) = 1, ğ_N(is_defence) = 0, ğ_N(my_defence) = 0}
Table 1: Statistics of teaching experiment.
Sample space (10 stages each) | "Hoap-3" winning achievement | "Hoap-3" interviewing its human teacher
First   | 0 | 10
Second  | 1 | 9
Third   | 4 | 6
Fourth  | 4 | 6
Fifth   | 4 | 6
Sixth   | 1 | 9
Seventh | 3 | 7
Eighth  | 5 | 5
Ninth   | 8 | 2
Tenth   | 9 | 1
If the hypothetical scenario σ > 1, then the robot's final decision Ψ is made by choosing an action that resembles the stage with the minimum difference between max_no_of_steps_played and no_of_steps_played of X_N, at which its main priority is achieved:

∆|_min = max_no_of_steps_played - no_of_steps_played
The second priority is given to
ğ_N = {ğ_N(is_winning) = 0, ğ_N(my_winning) = 0, ğ_N(is_defence) = 1, ğ_N(my_defence) = 1}
Also in this case, if the hypothetical scenario σ > 1, the final decision Ψ of "Hoap-3" is made by choosing the action that resembles the stage with the minimum difference between max_no_of_steps_played and no_of_steps_played of X_N.
From these priority combinations, the robot is able to select the only rational choice, and then the robot says, for example:

Hoap-3: I think I can win.

Among the individual behaviours B (see equation (9)) which the robot performs, various motor expressions are made, such as upper-torso, hip, head and arm movements. These expressions imply, and improve, human-human-like behaviour.
6 RESULTS
In order to show the efficiency of our proposed architecture, we performed an experimental test composed of 100 stages, whose sample space is shown in Table 1. A new stage occurs whenever the robot learns a new idea about winning or defence, for itself or for the human, and also whenever the robot itself performs one of the winning cases it has been taught.

[Graph: learning rate — the robot's winning achievement plotted over the stage samples.]

The results in the table, which are also plotted in the graph, show that the rate of winning achieved by the robot increases gradually, which indicates that the robot's learning level rises as the number of interactive stages increases. This is a clue to the improvement of the robot's knowledge of the game strategy.
7 DISCUSSION
We now reflect on some design issues of our robot architecture from two perspectives: component design and communication of information between components.
7.1 Information Generation
An important requirement is the need to build an approach that is able to generate new, valuable information based on, and resulting from, the information already available. For example, in the X-O game, the observation component is able to detect the spatial position of the moved game piece with respect to the camera frame in 2-D. This coordinate information is processed by the position component, transformed into 3-D, and transferred to knowledge updating, allowing "Hoap-3" to use it when executing knowledge-based decisions.
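As a hedged illustration of such a 2-D to 3-D transformation, the following sketch back-projects a pixel detection onto an assumed known board plane; the intrinsic matrix and depth value are invented placeholders, not calibration data from the paper.

import numpy as np

def pixel_to_board_3d(u, v, K, board_depth):
    """Back-project pixel (u, v) onto the plane z = board_depth (camera frame)."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # viewing-ray direction
    return ray * (board_depth / ray[2])              # scale the ray to the board plane

K = np.array([[500.0, 0.0, 160.0],                   # placeholder intrinsics
              [0.0, 500.0, 120.0],
              [0.0, 0.0, 1.0]])
point_3d = pixel_to_board_3d(200, 150, K, board_depth=0.35)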
7.2 Information Flow
In order to improve the overall system responsiveness, we have found a one-to-many information flow structure very useful: information is produced by one component and published to the system, where other components process it for their own purposes. For example, during the X-O game, the human partner performs interruptive movements in the game, and the observation component detects these interruptive events. The resultant detected information is published to the rest of the system. Simultaneously, the published information is handled by the other components. The decision-making process uses this information in order to decide the proper choice of wording for the structured interviews. Meanwhile, the detected information, in addition to the resultant interviewing flags, is used to update the knowledge of "Hoap-3".
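A minimal publish/subscribe sketch of this one-to-many flow is given below; the Blackboard class and topic names are illustrative, not the actual implementation.

from collections import defaultdict

class Blackboard:
    """One-to-many information flow: one publisher, many subscribed components."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, data):
        for callback in self.subscribers[topic]:
            callback(data)            # each component processes the same data

bus = Blackboard()
bus.subscribe("observation", lambda beta: print("decision making received", beta))
bus.subscribe("observation", lambda beta: print("knowledge updating received", beta))
bus.publish("observation", (1, 0))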
REFERENCES
Sweller, J., 2006. Visualization and instructional design. In 3rd Australasian Conference on Interactive Entertainment, Vol. 207, pp. 91-95.
Bransford, J., Brown, A., & Cocking, R., 2001. How People Learn: Brain, Mind, Experience, and School. Expanded edition. National Academy Press: Washington, DC. p. 33.
Baddeley, A. D., 1996. Human Memory: Theory and Practice. Hove: Psychology Press.
Marois, R., 2005. Two-timing attention. In Nature Neuroscience, pp. 1285-1286.
Kuniyoshi, Y., Inaba, M., and Inoue, H., 1994. Learning by watching: Extracting reusable task knowledge from visual observation of human performance. In IEEE Transactions on Robotics and Automation, Vol. 10, pp. 799-822.
Voyles, R. and Khosla, P., 1998. A multi-agent system for programming robotic agents by human demonstration. In Proceedings of the AI and Manufacturing Research Planning Workshop.
Reeves, B. & Nass, C., 1996. The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places. Cambridge, UK: Cambridge University Press.
Lockerd, A., Breazeal, C., 2004. Tutelage and socially guided robot learning. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Vol. 4, pp. 3475-3480.
Nicolescu, M. N., Mataric, M. J., 2001. Learning and interacting in human-robot domains. In IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, Vol. 31, No. 5, pp. 419-430.
Werger, B. B., 2000. Ayllu: Distributed port-arbitrated behaviour-based control. In Proc. of the 5th Intl. Symp. on Distributed Autonomous Robotic Systems, Knoxville, TN, pp. 25-34.
Maes, P. and Brooks, R. A., 1990. Learning to coordinate behaviours. In Proc. AAAI, Boston, MA, pp. 796-802.
Mahadevan, S. and Connell, J., 1991. Scaling reinforcement learning to robotics by exploiting the subsumption architecture. In Proc. Eighth Int. Workshop on Machine Learning, pp. 328-337.
Lauria, S., Bugmann, G., 2002. Mobile robot programming using natural language. Robotics and Autonomous Systems, 38(3-4):171-181.
Brian, S., Gonzalez, J., 2008. Discovery of high-level behaviour from observation of human performance in a strategic game. In IEEE Transactions on Systems, Man and Cybernetics (Systems and Humans), Vol. 38, No. 3, pp. 855-874.
Mahmoud, R. A., Ueno, A., Tatsumi, S., 2008. A game playing robot that can learn from experience. In HSI'08, Conference on Human System Interactions, pp. 440-445.