DIALOGUE-BASED MANAGEMENT OF USER FEEDBACK IN AN
AUTONOMOUS PREFERENCE LEARNING SYSTEM
Juan Manuel Lucas-Cuesta, Javier Ferreiros
Speech Technology Group, Universidad Politécnica de Madrid, Spain
Asier Aztiria
University of Mondragon, Spain
Juan Carlos Augusto, Michael McTear
School of Computing and Mathematics, University of Ulster at Jordanstown, Northern Ireland, UK
Keywords:
Spoken dialogue systems, Autonomous preference learning, User feedback, Human-computer interaction.
Abstract:
We present an enhanced method for user feedback in an autonomous learning system that includes a spoken
dialogue system to manage the interactions between the users and the system. By means of a rule-based
natural language understanding module and a state-based dialogue manager we allow the users to update the
preferences learnt by the system from the data obtained from different sensors. The design of the dialogue
together with the storage of context information (the previous dialogue turns and the current state of the
dialogue) ensures highly natural interactions, reducing the number of dialogue turns and making it possible to
use complex linguistic constructions instead of isolated commands.
1 INTRODUCTION
Ambient Intelligence (AmI), defined as ‘a digital en-
vironment that proactively, but sensibly, supports peo-
ple in their daily lives’ (Augusto, 2007), is the fo-
cus of intensive research within the Computer Science
community, as AmI systems provide significant op-
portunities for computers to improve the standards of
life for some segments of our society. Applications of
AmI are nowadays being developed in a variety of ar-
eas: automotive, healthcare, etc. (Cook et al., 2009).
All those applications have a common overar-
ching aim of serving people unobtrusively (Weiser,
1991; Aghajan et al., 2009). Spoken dialogue systems
(McTear, 2004) are particularly well suited to cover
this requirement as they allow humans to communi-
cate with a system in a natural way, without demand-
ing technical training. This provides a much more
natural interaction mode than, for example, forcing
users to use a PDA, a keyboard, a touch screen, remote controls, or other physical devices.
(This work has been partially supported by the Spanish Ministry of Education under the Program of University Personnel Training (FPU), reference AP2007-00463.)
A natural interaction also implies the ability of the
system to adapt to different users. It is thus important
that the system can identify the habits and preferences
of each user, and learn this information to modify its
behaviour in order to offer each user the functionali-
ties that are most appropriate. The learning of these usage patterns has to be carried out in a way that is transparent to the user. However, state-of-the-art learning algorithms may produce associations that are not entirely appropriate for the user. It is therefore important to couple such algorithms with a system that allows the user to fine-tune the learnt preferences.
1.1 Preference Learning
Learning the habits and preferences of each user is an
essential feature in any AmI system. There exist dif-
ferent examples in the literature regarding this learn-
ing task. One of the first approaches concerned the
use of Artificial Neural Networks for inferring rules
for smart homes (Begg and Hassan, 2006). Other
techniques, such as Fuzzy-Logic (Hagras et al., 2004)
and Case-Based Reasoning (Sadeh et al., 2005), have
also been used but, as Müller pointed out (Müller, 2004), ‘the overall dilemma remains: there does not
seem to be a system that learns quickly, is highly ac-
curate, is nearly domain independent, does this from
few examples with literally no bias, and delivers a
user model that is understandable and contains break-
ing news about the user’s characteristics’.
1.2 User Feedback
It is important that the user gets information regard-
ing the patterns learnt by the system, in such a way
that he/she can accept the correct ones and refine or
remove the wrong ones. To offer the user that information, the system needs an interaction interface. It is also important that the interaction between the user and the system takes place as intuitively as possible for the human. In that sense, it is a highly desirable goal
to allow users to interact with the system by speak-
ing with it, since speech is the most natural way of
communication between humans.
A speech-based system needs to understand what
the users say and to provide spoken answers to them.
It also has to gather the actions that users want to carry
out and deliver them to the proper set of actuators. We
will refer to a system that can perform all these tasks
as a Spoken Dialogue System (SDS, (McTear, 2004)).
To offer the user a flexible and natural interaction,
it is possible to design the dialogue manager (i.e. the
core of the SDS) as a finite state machine (FSM) that
offers the users the different actions to perform fol-
lowing a given command. However, more interesting
approaches offer the user more initiative when inter-
acting with the system. For instance, a Bayesian Networks (BN) approach (Fernández et al., 2005) makes the system able to deal with situations involving incomplete information, such as ellipsis or anaphora, thus allowing the user to speak in a highly natural and friendly way.
Our approach of developing a complete spoken di-
alogue system makes use of an FSM strategy, enhanc-
ing the vocabulary that the recognizer is able to han-
dle. However, although we provide the users with the
ability to talk in a more natural way, the initiative of
the dialogue still relies on the system, which controls
the dialogue flow. To address this, we offer the user
the chance to skip several of the states, if he/she has
enough experience in interacting with the system.
1.3 Autonomous Learning System
Our approach aims at obtaining user patterns from
sensor data. For that we have developed a system,
called PUBS (Patterns of User Behaviour System)
(Aztiria et al., 2008), which discovers user’s common
behaviours and habits from data recorded by sensors.
PUBS is made up of three different modules: a Language (L_PUBS) to represent the learnt patterns in a clear and unambiguous way; an Algorithm (A_PUBS) that discovers patterns in the data collected by sensors; and an interaction system (I_PUBS) which allows the users to have a basic interaction with the system.
Our original HCI interface, I_PUBS, was developed using Sphinx-4 (Walker et al., 2004) for the Automatic Speech Recognition (ASR) task, and FreeTTS (Walker et al., 2002) to provide speech-based feedback to the user. This system had two main drawbacks. On the one hand, the user had to know the way PUBS stored and represented the information. On the other hand, the system-initiative approach for managing the speech interaction only allowed the user to talk in a very restricted way, using isolated words (e.g. accept, reject, or closed answers) or simple commands, such as device names or numbers.
Given that we want to provide the users with a
more natural and flexible speech interaction, our aim
is to develop a dialogue-based interaction system,
substituting the baseline keyword-based approach but
reusing the recognition and synthesis tools. The new interaction system, which will be referred to as I2_PUBS, is presented in the next section.
2 DIALOGUE-BASED USER
FEEDBACK
The spoken dialogue system we have developed, I2_PUBS, makes use of the Sphinx-4 recognition software and the FreeTTS synthesizer, as stated above. Our work therefore focuses on designing the natural language understanding (NLU) module and the dialogue manager (DM).
Although we have kept the ASR module, we have modified the language models it uses, allowing it to handle more complex sentences. We have collected a set of sentences from the application domain (i.e. the control of the patterns learnt by the system), and used them to train an n-gram language model.
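As a rough illustration of this training step (a simplified sketch with made-up sentences, not the actual training code or corpus), bigram counts can be collected from the domain sentences as follows; the real model additionally requires smoothing and conversion to the format expected by the recognizer:

import java.util.HashMap;
import java.util.Map;

// Minimal sketch: collecting bigram counts from domain sentences.
public class BigramCounter {
    private final Map<String, Integer> bigramCounts = new HashMap<>();
    private final Map<String, Integer> unigramCounts = new HashMap<>();

    public void addSentence(String sentence) {
        String[] words = ("<s> " + sentence.toLowerCase() + " </s>").split("\\s+");
        for (int i = 0; i < words.length; i++) {
            unigramCounts.merge(words[i], 1, Integer::sum);
            if (i > 0) {
                bigramCounts.merge(words[i - 1] + " " + words[i], 1, Integer::sum);
            }
        }
    }

    // Maximum-likelihood bigram probability P(w2 | w1), without smoothing.
    public double probability(String w1, String w2) {
        int joint = bigramCounts.getOrDefault(w1 + " " + w2, 0);
        int prior = unigramCounts.getOrDefault(w1, 0);
        return prior == 0 ? 0.0 : (double) joint / prior;
    }

    public static void main(String[] args) {
        BigramCounter lm = new BigramCounter();
        lm.addSentence("I want to listen to every pattern of Lamp A");
        lm.addSentence("accept it");
        System.out.println(lm.probability("to", "listen"));
    }
}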
Each of the new modules works as follows. Once
the ASR has recognized the input utterance (i.e. iden-
tified the words which form the utterance), the NLU
extracts the semantics of those words. We will refer to
this semantics as the dialogue concepts. Using these
concepts, the DM obtains the user intentions, that is,
the commands that the user has said to the system. If
this information is enough to carry out the task the
user has requested, the system will perform it. Oth-
erwise, it will store this information, together with
the current dialogue state, in the Context Information
Memory, and will ask the user for the remaining data.
2.1 Natural Language Understanding
The NLU module extracts the semantics from the
recognized utterance. By semantics we refer to the
‘meaning’, related to the application domain, of the
different words or phrases which form the utterance.
We represent the semantic information as dialogue
concepts. Each concept consists of an attribute (a
concept identifier) and the value that the user has as-
signed to the attribute. In our domain, the values can be literals (e.g. the name of a room), numbers (e.g. the value measured by a sensor) or binary variables (e.g. whether the user has requested some information).
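For illustration, a dialogue concept can be represented as a simple attribute-value structure such as the following sketch (class and attribute names are our own, not the internal representation of I2_PUBS):

// Sketch of a dialogue concept as an attribute-value pair.
public class DialogueConcept {
    public enum ValueType { LITERAL, NUMBER, BINARY }

    private final String attribute;   // concept identifier, e.g. "LOCATION"
    private final ValueType type;     // kind of value the attribute takes
    private final Object value;       // String, Double or Boolean

    public DialogueConcept(String attribute, ValueType type, Object value) {
        this.attribute = attribute;
        this.type = type;
        this.value = value;
    }

    @Override
    public String toString() {
        return attribute + " = " + value;
    }

    public static void main(String[] args) {
        // Examples corresponding to the three value types mentioned above.
        System.out.println(new DialogueConcept("LOCATION", ValueType.LITERAL, "Bedroom"));
        System.out.println(new DialogueConcept("SENSOR_VALUE", ValueType.NUMBER, 21.5));
        System.out.println(new DialogueConcept("INFO_REQUESTED", ValueType.BINARY, true));
    }
}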
The NLU module follows a two-step procedure.
First of all, each recognized word is categorized ac-
cording to different criteria. Once every word is cat-
egorized, the understanding step converts the catego-
rized words into dialogue concepts. Both the catego-
rization and the understanding submodules perform
their tasks using a set of rules defined by an expert.
We have defined a set of 20 different categories in-
cluding a GARBAGE tag, which is used to label those
words that do not carry semantic information.
A given word can be labeled with more than one
category, depending on its context (i.e., the closest
words in the utterance). We have developed a bottom-
up strategy to categorize the recognized utterance. We first categorize the words that admit a single category (e.g. numbers, rooms, or days). Then the analysis is extended to the neighbourhood of each word and the
already-tagged words. The combination of words or
categories can thus generate new categories.
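The following sketch illustrates this bottom-up strategy with a toy lexicon and a single combination rule; the category names and rules shown here are illustrative, whereas the real module relies on the expert-defined rule set described above:

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the bottom-up categorization step.
public class Categorizer {
    private static final Map<String, String> LEXICON = new LinkedHashMap<>();
    static {
        LEXICON.put("bedroom", "ROOM");
        LEXICON.put("seven", "NUMBER");
        LEXICON.put("monday", "DAY");
        LEXICON.put("lamp", "DEVICE");
    }

    public List<String> categorize(String[] words) {
        List<String> tags = new ArrayList<>();
        // Step 1: words with a single, unambiguous category (numbers, rooms, days...).
        for (String w : words) {
            tags.add(LEXICON.getOrDefault(w.toLowerCase(), "GARBAGE"));
        }
        // Step 2: extend the analysis to neighbouring tags; combinations of words
        // or categories may generate new categories (one illustrative rule shown).
        for (int i = 0; i + 1 < tags.size(); i++) {
            if (tags.get(i).equals("DEVICE") && tags.get(i + 1).equals("GARBAGE")) {
                tags.set(i + 1, "DEVICE_NAME");   // e.g. "lamp" followed by "A"
            }
        }
        return tags;
    }
}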
Once all the words of the recognized utterance
have been uniquely labeled, the understanding mod-
ule applies a set of expert-defined rules to assign the
dialogue concepts (attributes and values) to the sen-
tence. We have defined 16 different concepts, which
are extracted by applying a specific subset of rules.
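As an illustration of the understanding step, a few toy rules mapping categorized words to attribute-value pairs could look as follows (the rules and concept names are examples of our own, not the 16 concepts actually defined):

import java.util.ArrayList;
import java.util.List;

// Sketch of the understanding step: rules turn categorized words into
// dialogue concepts, represented here simply as {attribute, value} pairs.
public class Understanding {

    public List<String[]> extractConcepts(String[] words, List<String> tags) {
        List<String[]> concepts = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            switch (tags.get(i)) {
                case "ROOM":
                    concepts.add(new String[] {"LOCATION", words[i]});
                    break;
                case "NUMBER":
                    concepts.add(new String[] {"PATTERN_INDEX", words[i]});
                    break;
                case "DEVICE_NAME":
                    if (i > 0) {
                        // combine the preceding device word with its name, e.g. "Lamp" + "A"
                        concepts.add(new String[] {"DEVICE", words[i - 1] + " " + words[i]});
                    }
                    break;
                default:
                    break;   // GARBAGE and other tags carry no semantics here
            }
        }
        return concepts;
    }
}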
2.2 Dialogue Manager
The dialogue manager that we have developed for I2_PUBS is a finite state machine (FSM). Depending on the current state and using the information the user provides (which the understanding module has extracted), the system decides which action the user wants to fulfill. The system tries to perform this action, also using the information stored in a context memory, provided that all this information is enough to fulfill the required action. Otherwise, the system will ask the user for the information it needs.
As the architecture of the system is based on an FSM, most of the initiative belongs to the system, as in I_PUBS. However, our system allows the user a certain degree of control over the dialogue. For instance, the user does not need to follow each step of the dialogue. If the users have enough knowledge about the system, they can skip several states, making the dialogue more flexible. Furthermore, the users have the
chance to stop the current dialogue if they perceive
some wrong behaviour of the system. The state ma-
chine of the dialogue manager is shown in Fig. 1.
The full process of the dialogue management can
be viewed as the following algorithm:
state := BEGINNING;
synthesize(greeting);                           // welcome message
while (state != END) do
    if (recognition_expected) then
        recognize();                            // ASR: obtain the word sequence
        understand(output_recognition);         // NLU: extract dialogue concepts
    end if;
    dialogue_management(output_understanding);  // DM: decide next action and state
    synthesize(output);                         // TTS: spoken feedback to the user
end while;
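The FSM backbone can be sketched, for illustration, as a small Java class; the state names follow Fig. 1 and the description below, but the transition logic is reduced to a few representative decisions and is not the actual I2_PUBS implementation:

// Minimal sketch of the FSM backbone of the dialogue manager.
public class DialogueManagerSketch {

    enum State {
        BEGINNING, WAITING_PATTERN, ACTION_PATTERN, CHECK_PATTERNS,
        MODIFY_PATTERN, MODIFY_EVENT, MODIFY_CONDITION, MODIFY_ACTION,
        CHECK_FIELDS, END
    }

    private State state = State.BEGINNING;

    // The user intention would normally be derived from the dialogue concepts.
    public State nextState(String intention) {
        // A cancellation request interrupts the dialogue from any state.
        if (intention.equals("CANCEL")) {
            state = State.END;
            return state;
        }
        switch (state) {
            case BEGINNING:
                state = State.WAITING_PATTERN; break;        // user asked for information
            case WAITING_PATTERN:
                state = State.ACTION_PATTERN; break;         // a pattern has been selected
            case ACTION_PATTERN:
                state = intention.equals("MODIFY")
                        ? State.MODIFY_PATTERN : State.CHECK_PATTERNS; break;
            case CHECK_PATTERNS:
                state = intention.equals("MORE_PATTERNS")
                        ? State.ACTION_PATTERN : State.END; break;
            default:
                state = State.CHECK_FIELDS; break;           // simplified for the sketch
        }
        return state;
    }
}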
At the beginning of the interaction, the system
synthesizes a welcome and sets the state to BEGIN-
NING. If the user requests some information, the sys-
tem returns the number of devices and the number of
patterns learnt for each device. Now the user can ask
for a specific pattern or for a set of them. Depending
on that, the system will update the dialogue memory
with the patterns the user has requested. In any case,
the next state of the dialogue is ACTION PATTERN.
For each pattern the user asks for, the system syn-
thesizes all the information related to it (that is, its
event, condition and action). The user can then per-
form an action over the pattern (i.e. accept, reject
or refine it). Accepting a pattern means that the user
finds it useful, so it is stored to be used proactively. A rejection implies that the pattern is not useful, so it is deleted. If the user wants to refine a pattern, this means that the pattern is useful for the user, but it needs to be tuned according to the user's desires.
In the first two cases, the system updates the
state of the pattern, confirming the accepted patterns
and deleting the rejected ones. As the process for
the current pattern has finished, in the next state
(CHECK PATTERNS) the DM will check whether
there are more patterns to be shown to the user.
When the user wants to modify a pattern, the sys-
tem (in the state MODIFY PATTERN) asks the user
which of the three pattern fields he/she wants to mod-
ify. The user can ask for a single field, or for any
combination of them (i.e. the event and the condition,
or even the three fields at the same time).
Figure 1: State machine of the Dialogue Manager.
Each field has a specific main state (MODIFY EVENT, MODIFY CONDITION, and MODIFY ACTION), which controls the different elements
that form the field. If the user wants to modify the
event, he/she can modify the device which triggers
the event, or the location in which he/she has to be
detected. If the user decides to refine the condition,
he/she can act over the attribute of the condition, or
the value this attribute takes. For modifying the ac-
tion, the user can select the device to act over, the ac-
tion to perform (to switch on/off the selected device),
or the moment in which the action will take place.
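For illustration, the three pattern fields and their modifiable elements could be represented by a structure such as the following sketch (an assumption of ours, not the system's internal representation):

// Sketch of a learnt pattern with its three fields and the elements the
// user can modify in each of them.
public class Pattern {

    static class Event {
        String device;     // device that triggers the event
        String location;   // location in which the user has to be detected
    }

    static class Condition {
        String attribute;  // e.g. "time"
        String value;      // value the attribute takes, e.g. "earlier than seven a m"
    }

    static class Action {
        String device;     // device to act on
        String operation;  // "ON" or "OFF"
        String moment;     // when the action will take place
    }

    Event event = new Event();
    Condition condition = new Condition();
    Action action = new Action();
}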
For each element of each field, once the user has
modified the desired setting, the system starts a con-
firmation step, providing spoken feedback to the user
to ensure that the recognition and understanding steps
worked properly. If the user confirms his/her se-
lection, the system will update the state of the pat-
tern. Furthermore, the dialogue manager will check (in the dialogue memory) whether any field of the current pattern remains to be modified. If so, the system will repeat the previous steps for the new field (in the state CHECK FIELDS). Otherwise, the procedure for the pattern is finished. The system sets CHECK PATTERNS as the new state, and checks whether the user asked for more patterns, in which case it starts the dialogue steps again for the new
pattern. When the user has evaluated all the desired
patterns, the system moves to the END state and says
goodbye to the user, thus finishing the interaction.
If the user detects some wrong behaviour of the
system (a misrecognition of his/her utterance, or
an incorrect understanding, which leads to an unexpected state), he/she can abort the current dialogue by asking the system to cancel the spoken interaction. From any state, if the user wants to cancel the dialogue,
the system interrupts the normal dialogue flow by di-
rectly reaching the END state of the FSM. In this sit-
uation, the system informs the user about what the
problem has been, and finishes the interaction.
The system has a context memory for storing the
information related to every dialogue turn, as well as
the current dialogue state. Using this information, the
system can decide which action to take and which
state to reach. The dialogue memory thus acts as a short-term memory (i.e. it only takes into account
the information relevant for the pattern presented to
the user). Therefore, the information stored in the
context memory includes the device the pattern be-
longs to, the index of the pattern (provided that there
is more than one pattern learnt for this device), and the
current values of the three fields of the pattern (event,
condition and action). The system can easily check
which of these fields the user has already modified,
and which ones remain to be modified.
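For illustration, the context memory could be sketched as follows (field and method names are our own; the actual data structures of I2_PUBS are not shown in this paper):

import java.util.EnumSet;
import java.util.Set;

// Sketch of the short-term context memory: it keeps the device, the pattern
// index, the current dialogue state and the values of the three pattern
// fields, and records which fields have already been modified.
public class ContextMemory {

    enum Field { EVENT, CONDITION, ACTION }

    String device;                 // device the pattern belongs to
    int patternIndex;              // index of the pattern for that device
    String currentState;           // current dialogue state
    String eventValue, conditionValue, actionValue;

    private final Set<Field> modified  = EnumSet.noneOf(Field.class);
    private final Set<Field> requested = EnumSet.noneOf(Field.class);

    void requestModification(Field f) { requested.add(f); }

    void markModified(Field f)        { modified.add(f); }

    // True if some field the user asked to refine has not been modified yet.
    boolean fieldsRemaining() {
        Set<Field> pending = EnumSet.copyOf(requested);
        pending.removeAll(modified);
        return !pending.isEmpty();
    }
}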
Table 1: Initial dialogue turns (each system turn is followed, in brackets, by the next dialogue state and the contents of the context memory).

S. Hello, welcome to the feedback system. How can I help you?
   [Next state: BEGINNING. Context memory: empty]
U. How many patterns have you learnt?
S. There are three devices. Device Lamp A has five different patterns. Device Lamp B has two different patterns. Device Lamp C has two different patterns. Which pattern of which device do you want to listen to?
   [Next state: WAITING PATTERN]
U. I want to listen to every pattern of Lamp A.
S. Pattern Lamp A: user is in V A, Condition 0: time is earlier than seven a m twelve, then turn on Lamp A. What do you want to do with this pattern?
   [Next state: ACTION PATTERN. Context memory: device: Lamp A, pattern: 0]
U. Accept it.
   [Next state: REMAINING PATTERNS]
S. Pattern Lamp A. (...)
   [Next state: ACTION PATTERN. Context memory: device: Lamp A, pattern: 1]
3 INTERACTING WITH THE
SYSTEM
The examples we present next show different dia-
logue turns of both the user (U.) and the system (S.),
as well as the next state of the dialogue manager, and
the information stored in the context memory.
Table 1 shows the beginning of the interaction
with the system. Once the system synthesizes the
greeting message, the user requests some information.
This turn makes the system wait for information about
a specific pattern, or a set of patterns.
The user can ask not only for a single pattern, but
also for every pattern of a device, or even for every
pattern of all the devices. The context memory will
store the pattern presented and its associated device.
Once a pattern has been presented to the user,
he/she has to decide what to do with it. If the user
accepts or rejects it, the system confirms or deletes
the pattern. Then the system checks if there are more
patterns to show to the user. In this case, it updates
the context memory to present the next pattern.
Table 2 shows a situation in which the user wants
to refine a pattern. He/she can select any field of the
pattern, or a combination of them. In any case, the
system updates the context memory.
At any time, the system tries to offer the user the
information he/she would need to fulfill the current
action. This way, when the user wants to modify a de-
vice, the system shows a list of the available devices,
avoiding an additional dialogue turn in which the user would probably ask the system about the available devices.
We can thus complete the dialogue with fewer turns.
4 CONCLUSIONS
We propose an enhanced speech interface to allow
user feedback in a learning system by means of a
SDS. Our new approach not only deals with isolated
commands, but also allows speakers to use more nat-
ural linguistic constructions, achieving more dynamic
and user-friendly human-machine dialogues.
The inclusion of a context database which stores
the information regarding previous dialogue turns en-
ables the system to tackle incomplete utterances. This
way the naturalness of the dialogues is increased.
Currently we are designing a framework to evalu-
ate the performance of I2_PUBS. Our aim is to develop
a set of feasible scenarios in which a user interacts
with the system fulfilling different actions regarding
the patterns learnt (selecting and asking for different patterns; accepting, refining, or deleting them; and so on).
We can assess the quality of our approach by an-
alyzing both objective and subjective metrics. Among
the objective ones, the most interesting metrics are
the recognition rates and the concept confidence, to-
gether with several dialogue variables: the number of
dialogue turns the system needs to fulfill a specific ac-
tion, or the number of turns in which the system needs
to ask the context memory for previous information.
The subjective metrics can be estimated by means of
a questionnaire in which we ask about different aspects
of the system, such as how natural the dialogue is per-
Table 2: Example of pattern modification (same conventions as Table 1).

S. Pattern Lamp A. (...) What do you want to do with this pattern?
   [Next state: ACTION PATTERN. Context memory: device: Lamp A, pattern: 0]
U. Modify it.
S. What do you want to modify in this pattern?
   [Next state: MODIFY PATTERN]
U. Could I modify the event and the action?
S. User location was V A. Which location do you want to be the new one? Possible user locations are V A, V Bedroom.
   [Next state: MODIFY EVENT, then MODIFY LOC. Context memory: field: event]
U. Please set V Bedroom as new user location.
S. Do you want to confirm V Bedroom as new user location?
   [Next state: CONFIRM LOC]
U. Yes.
S. The action carried out was turn on the Lamp A. What do you want to modify in the current action? You can modify the device and the action.
   [Next state: CHECK FIELDS, then MODIFY ACTION. Context memory: field: action]
U. I would like to modify the device and the action.
S. Possible devices are Lamp A, Lamp B, Lamp C. Which device and action do you want to set?
   [Next state: MODIFY ACTION]
U. Switch on Lamp B.
S. Do you want to confirm On as new action and Lamp B as new device?
   [Next state: CONFIRM ACTION]
U. Yes.
S. Thank you for your collaboration.
   [Next state: END. Context memory: empty]
ceived, the ease of interaction with the system, or the
advantages over the original approach.
Using user-related information (i.e. knowledge about the identity of the user whose behaviour led to the learning of each pattern) would be a highly attractive extension. This information would allow the system to show each user only those patterns related to him/her.
We could also modify on-line the behaviour of the di-
alogue system, adapting it to each user’s preferences.
Neither I_PUBS nor I2_PUBS can know beforehand the devices present in the environment, so the grammars for the recognizer should be dynamically created and loaded. A simple way of doing this consists of mixing a static
grammar and different dynamic ones (Lucas-Cuesta
et al., 2009). The flexibility of Sphinx-4 should ensure good recognition performance as long as the vocabulary matches the dynamic grammar.
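For illustration only, the following sketch builds a JSGF-style grammar text that mixes a fixed command rule with a device rule generated at run time; the actual mixing mechanism is the one described in (Lucas-Cuesta et al., 2009), and loading the result into the recognizer is not shown:

import java.util.List;

// Illustrative sketch: combining a static command rule with a dynamically
// generated rule for the device names discovered at runtime.
public class DynamicGrammarBuilder {

    public String build(List<String> deviceNames) {
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n");
        sb.append("grammar feedback;\n");
        // Static part: commands that never change.
        sb.append("public <command> = accept it | reject it | modify it "
                + "| how many patterns have you learnt;\n");
        // Dynamic part: device names are only known at runtime.
        sb.append("public <device> = ").append(String.join(" | ", deviceNames)).append(";\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(new DynamicGrammarBuilder()
                .build(List.of("Lamp A", "Lamp B", "Lamp C")));
    }
}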
REFERENCES
Aghajan, H., Delgado, R. L.-C., and Augusto, J. (2009).
Human-Centric Interfaces for Ambient Intelligence.
Academic Press, Elsevier.
Augusto, J. C. (2007). Ambient Intelligence: The Con-
fluence of Pervasive Computing and Artificial Intel-
ligence. In Schuster, A., editor, Intelligent Computer
Everywhere, pages 213–234. Springer Verlag.
Aztiria, A., Augusto, J. C., and Izaguirre, A. (2008). Au-
tonomous Learning of User’s Preferences Improved
through User Feedback. In Gottfried, B. and Aghajan,
H. K., editors, BMI, volume 396 of CEUR Workshop
Proceedings, pages 72–86. CEUR-WS.org.
Begg, R. and Hassan, R. (2006). Artificial Neural Networks
in Smart Homes. Designing Smart Homes: The Role
of Artificial Intelligence, pages 146–164.
Cook, D. J., Augusto, J. C., and Jakkula, V. R. (2009).
Ambient Intelligence: applications in society and op-
portunities for AI. Pervasive and Mobile Computing,
5:277–298.
Fernández, F., Ferreiros, J., Sama, V., Montero, J. M., Segundo, R. S., and Macías-Guarasa, J. (2005). Speech
interface for controlling a Hi-Fi audio system based
on a Bayesian Belief Networks approach for dialog
modeling. In Proc. INTERSPEECH, Lisbon, Portu-
gal, September 2005, pages 3421–3424.
Hagras, H., Callaghan, V., Golley, M., Clarke, G., and
Pounds-Cornish, A. (2004). Creating an ambient-
intelligence environment using embedded agents.
IEEE Intelligent Systems, 19(6):12–20.
Lucas-Cuesta, J. M., Fernández, F., and Ferreiros, J.
(2009). Using Dialogue-Based Dynamic Language
Models for Improving Speech Recognition. In Pro-
ceedings of the 10th Annual Conference of the Inter-
national Speech Communication Association (Inter-
speech), Brighton, UK, pages 2471–2474.
McTear, M. (2004). Spoken Dialogue Technology: Toward
the Conversational User Interface. Springer Verlag.
Müller, M. E. (2004). Can user models be learned at all?
Inherent problems in machine learning for user mod-
eling. The Knowledge Engineering Review, pages 61–
88.
Sadeh, N., Gandon, F., and Kwon, O. (2005). Ambient Intelligence: The MyCampus experience. Technical Report CMU-ISRI-05-123, Carnegie Mellon University.
Walker, W., Lamere, P., and Kwok, P. (2002). FreeTTS - A Performance Case Study. Technical report, Sun Microsystems Laboratories (TR-2002-114).
Walker, W., Lamere, P., Kwok, P., Raj, B., Singh, R., Gouvea, E., Wolf, P., and Woelfel, J. (2004). Sphinx-4: A Flexible Open Source Framework for Speech Recognition. Technical report, Sun Microsystems Laboratories (TR-2004-139).
Weiser, M. (1991). The computer for the 21st century. Sci-
entific American, 265(3):94–104.