Asking the Right Questions:
Task Hierarchy Predictive Traversal Mechanisms for
Mixed Initiative Dialog Management
Juan M. Huerta
IBM T J Watson Research Center, 1101 Kitchawan Road,
Yorktown Heights NY, 10598
Abstract. This paper describes an approach for building conversational appli-
cations that dynamically adjust to the user’s level of expertise based on the
user’s responses. In our method, the Dialog Manager interacts with the applica-
tion user through a mechanism that adjusts the prompts presented to the user
based on a hierarchical model of the domain, the recent interaction history, and
the known complexity of the domain itself. The goal is to present a conversa-
tional modality for experienced or confident users, and a simpler Directed Dia-
log experience for the more inexperienced users, and to dynamically identify
these levels of expertise from the user’s utterances. Our method uses a task hi-
erarchy as a representation of the domain and follows a feedback control sys-
tem framework to traverse of this tree. We illustrate these mechanisms with a
simple sample domain based on a car rental application
1 Introduction
Voice-based applications are becoming more prevalent in our lives due mainly to
advances in fundamental pattern recognition technology, computer system middle-
ware, and other supporting technologies. Today, we can find voice-based telephony
applications aimed at automating call centers, enabling automated transactions for e-
commerce and supporting basic customer care management. Whether voice or text
based (e.g., [1,7]), conversational technology is emerging as an important type of
interface when interacting with enterprise information systems. Examples of architec-
tures implementing telephony-based conversational applications are described in
[9,10].
In some domains, a simpler Directed Dialog interaction constitutes perhaps a more desirable choice. In a Directed Dialog modality, the application drives the
interaction by asking specific questions. Directed Dialog applications are better suited
for clearly structured domains, for resolving very specific ambiguities, or for interact-
ing with naïve users. VXML and SALT [8] are examples of typical architectures
useful for implementing Directed Dialog applications.
In the Mixed Initiative modality, the application is able to switch between free
form interaction (i.e., fully conversational) and directed dialog depending on the
needs and expertise of the user and complexity of the active subtask. It is the respon-
sibility of the application (the Dialog Manager, specifically) to determine the nature
of the interaction presented to the user at any given time. Examples of mechanisms useful for implementing Mixed Initiative applications can be found in [6, 12].
In this paper we propose a method to dynamically adapt the user interface of a
conversational application based on a predictive traversal mechanism of the task
hierarchy tree. Combined with this domain hierarchy traversal, we incorporate a
mechanism to apply scaffolded prompting in order to allow self-revealing help and
prompt rewording. Combined, these approaches enable a system to “Ask the Right
Questions” based on the difficulty of the task and the experience of the user, hence
driving the interaction at its most natural pace and level of complexity.
This paper is organized as follows: we present a brief background on types of
domains or tasks (structured vs. unstructured) and discuss their representational diffi-
culties. We then review techniques to represent domains and their prompts, specifi-
cally the work previously proposed by Hochberg et al. describing a method to define structured domains as task hierarchies. We then review the work by Gorin et al. on Spoken Dialog as a Feedback Control System. We then propose a method to base the traversal of the domain hierarchy on a feedback control mechanism that adapts the modality of the interaction with the user. Finally, we give examples of this technique.
2 Structured and Unstructured Conversational Domains
In certain types of applications, the information that is needed from the user in or-
der to execute a transaction is contained in a clearly defined set of attributes. The
relationship between them is also well defined and structured and typically well un-
derstood by the application user. Examples of these domains are Travel Reservation
and Car Rental. When using voice as the interaction modality, these applications are
easily implemented using Directed Dialog and the frameworks that support it (e.g.,
Voice XML). For the Graphical User Interface it is easy to implement these applica-
tions using forms-based markup (e.g., XForms [14]).
It is also possible to implement this type of application with more conversational
interfaces. The assumption is that in a single turn of a dialog, a user might, on his or her own initiative, include more than a single token of information, making the overall interaction
more efficient. An example of this would be a user spontaneously providing a cluster
of information, like all the pickup information pertinent to a car rental transaction.
On the other side of the spectrum are applications where the domain is less structured (e.g., the "How May I Help You" system [7]), where utterances relating the same information could be worded in a variety of ways and might be embedded in utterances containing additional spurious information. A speech application should be able
to deal with these scenarios by incorporating statistical recognizers and parsers that
accept less restricted ways of speaking, and represent any extracted information in
canonical ways prior to processing it.
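To make this concrete, the sketch below illustrates one possible canonicalization step; the attribute names and the toy rules are purely hypothetical and are not part of any specific system described in this paper.

# Hypothetical sketch: map loosely worded input to canonical attribute-value
# pairs before the dialog manager processes it (toy rules for illustration).
import datetime
import re

def canonicalize(utterance):
    """Extract a few car-rental attributes from free-form text."""
    today = datetime.date.today()
    attributes = {}
    text = utterance.lower()
    if "tomorrow" in text:
        attributes["pickup_date"] = str(today + datetime.timedelta(days=1))
    if "noon" in text:
        attributes["pickup_time"] = "12:00"
    match = re.search(r"\bin (\w+)", text)
    if match:
        attributes["pickup_location"] = match.group(1).capitalize()
    return attributes

print(canonicalize("I'd like to rent a car tomorrow at noon in Boston"))
# -> {'pickup_date': '...', 'pickup_time': '12:00', 'pickup_location': 'Boston'}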
3 Tree-Based Hierarchical Representation of Structured Domains
Hochberg et al described in [4] a method that uses task hierarchies to provide a ba-
sic representation of the domain to the dialog system. The task hierarchy of a domain
is a tree in which the leaves are associated with specific canonical values of domain
attributes. In the framework proposed by Hochberg et al. (called HOT), the root of the
hierarchy specifies general methods for managing the dialog, while the intermediate
nodes perform actions based on their children or other intermediate nodes. This is a
parsimonious representation of the domain that, in the HOT framework, allows implementing five specific relationships among the tasks (i.e., the intermediate nodes) in a hierarchy.
In contrast, other techniques have been proposed to represent a domain, utilizing forms (e.g., Papineni [10]), schemas, etc. There are also other graph-based representations of semantic domains, for example Wright et al. [11] and Huerta [5].
Fig. 1. A partial task hierarchy of a Car Rental application (Root: Car Rental at L_{1,1}; Actions at L2: New Reservation, Cancel, Change PIN; Topics at L3 under New Reservation: Pickup, Return, Vehicle Type; Entities at L4 under Pickup: location, date, time). Colored nodes correspond to terminal nodes.
For the purpose of illustration, Figure 1 shows an example of a task hierarchy for a
Car Rental application. For the remainder of the paper we refer to the notation de-
scribed in this section. The nodes are labeled as L_{i,j}, where i is the level of the node and j is its place within that level. In this example, the terminal nodes are in Level 4 (i=4). In this figure, only the "Pickup" node at level L3 has been extended to terminal nodes. The terminal nodes are represented as colored nodes. In a Speech Recognition based application, such nodes can be recognized and identified by grammars, language models, or named entity detectors. The name of each node contains the value of the attribute named at the first node of each level (i.e., Action is the L2 attribute, Topic is L3, and Entity is L4). Based on the framework employed by the Dialog Manager to traverse the application, the task hierarchy representation is utilized to determine the prompts presented to the user as specific questions. These
questions can be triggered by non-terminal nodes (broad questions) or by leaves or
terminal nodes (narrow or specific questions). Even at a single node, different word-
ings can be employed at different times, depending on the situation (e.g., a rewording
of the initial prompt might be useful if the user needs help). These mechanisms re-
lated to automatic prompt extensions are generally described as prompt scaffolding
[13]. In the next subsection we describe how to incorporate prompt scaffolding into
the task hierarchy representation.
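To make the representation concrete, the sketch below encodes the hierarchy of Figure 1 as a small tree structure; the TaskNode class and its field names are our own illustration and are not part of the HOT framework.

# Minimal sketch of the Car Rental task hierarchy of Figure 1. Node labels
# follow the L_{i,j} convention; terminal nodes carry the attribute they fill.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TaskNode:
    label: str                       # e.g. "L3,1"
    name: str                        # e.g. "Pickup"
    attribute: Optional[str] = None  # set only on terminal nodes
    value: Optional[str] = None      # filled by semantic interpretation
    children: List["TaskNode"] = field(default_factory=list)

    def is_terminal(self):
        return not self.children

pickup = TaskNode("L3,1", "Pickup", children=[
    TaskNode("L4,1", "Pickup location", attribute="pckp_loca"),
    TaskNode("L4,2", "Pickup date", attribute="pckp_date"),
    TaskNode("L4,3", "Pickup time", attribute="pckp_time"),
])

root = TaskNode("L1,1", "Car Rental", children=[
    TaskNode("L2,1", "New Reservation", children=[
        pickup,
        TaskNode("L3,2", "Return"),
        TaskNode("L3,3", "Vehicle Type"),
    ]),
    TaskNode("L2,2", "Cancel"),
    TaskNode("L2,3", "Change PIN"),
])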
3.1 Prompt Scaffolding in a Task Hierarchy
When traversing the task hierarchy tree, at any given terminal or non-terminal
node a user might need to be presented with prompt rewording, further prompt ex-
planation, or help. Prompt Scaffolding is a common technique in Voice User Inter-
face Design used to introduce successive levels of complexity in help and guidance
according to the user's responses. Figure 2 shows an example of scaffolded prompts for a voice interface associated with a connected subset of the nodes of the task hierarchy presented in Figure 1. A prompt L^k_{i,j} is associated with the node L_{i,j} in its k-th re-prompting.
In the same figure, prompt L^3_{3,1} corresponds to a template prompt which is utilized when one of the three attributes for level L_{3,1} has been obtained and a prompt for the remaining two is produced. In this prompt, <attname1> and <attname2> are the names of
the missing attributes. In a typical Dialog Manager, if there is a required valid input
from the user, the system will attempt initial prompting and then subsequent second,
third or further prompts until the user generates a valid response. Prompt scaffolding
is a simple way to deal with naïve users or difficult spots in an application regardless
of interaction modality (Directed Dialog, Mixed Initiative or Conversational).
Fig. 2. Sample scaffolded prompts for a subset of the nodes of the task hierarchy in
Figure 1
L^1_{1,1}: How may I help you?
L^2_{1,1}: Please choose one of the following: new reservation, cancel reservation or change personal information.
L^1_{2,1}: New Reservation. I need some information about this rental.
L^1_{3,1}: Please tell me the pickup information.
L^2_{3,1}: Say a location, date and time for picking up the car. For example "Tomorrow in Boston".
L^3_{3,1}: What are the <attname1> and <attname2> for this rental?
L^1_{4,1}: Where do you want to pick up the car?
L^2_{4,1}: Please say a pickup location, for example "Boston Airport".
L^3_{4,1}: If you know the pickup location say it now, for example "Boston Airport". Or say "transfer" or "help" for assistance.
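As an informal sketch of how the scaffolded prompts of Figure 2 might be stored and selected, assuming a simple table keyed by node label and re-prompt count (the helper below is hypothetical, not part of the original framework):

# Hypothetical prompt store: (node_label, k) -> k-th prompt for that node.
SCAFFOLDED_PROMPTS = {
    ("L4,1", 1): "Where do you want to pick up the car?",
    ("L4,1", 2): 'Please say a pickup location, for example "Boston Airport".',
    ("L4,1", 3): 'If you know the pickup location say it now, for example '
                 '"Boston Airport". Or say "transfer" or "help" for assistance.',
}

def next_prompt(node_label, attempt):
    """Return the prompt for the attempt-th try, falling back to the deepest
    level of scaffolding defined for the node."""
    while attempt > 1 and (node_label, attempt) not in SCAFFOLDED_PROMPTS:
        attempt -= 1
    return SCAFFOLDED_PROMPTS[(node_label, attempt)]

print(next_prompt("L4,1", 2))  # second, more explicit wording
print(next_prompt("L4,1", 5))  # beyond the last level: reuse the deepest prompt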
4 Spoken Dialog Management as a Feedback Control System
A dialog manager (DM) is responsible for driving the interaction with the user in a
conversational application. In a mixed initiative system, the dialog manager is re-
sponsible for reacting to the user’s utterances using an appropriate level of complex-
ity. In other words, the DM’s prompts need to be adjusted to the level of sophistica-
tion shown by the user. One can frame such interactions as Feedback Control Sys-
tems (FCS) in which the DM constantly computes and acts upon a measured error or
deviate between the user's answers and the DM's own expectations for such answers.
In this section, we describe first the work by Gorin et al. [2] on utilizing FCS in
speech understanding applications, then we present our proposal for an FCS-based
approach to DM.
4.1 Dialog-Based Interaction as a Feedback Control System
In [2] Gorin described a way to frame the interaction between a human and a ma-
chine as a feedback control scheme. In his approach, the speech application con-
stantly tracks the likelihood of each action C_k in the space of possible actions, given an observed sentence s_l:

a_k(s_l) = log P(C_k | s_l)
At each point of the dialog, the best action to be taken by the system is determined by the accumulation vector A_l, which is obtained by adding the vector containing the weights (or likelihoods) of the interpretation of the observed sentence, a(s_l), to the previous sentence's accumulation vector A_{l-1} and the error feedback vector e(s_l):

A_l = β_l A_{l-1} + (1 - β_l) a(s_l) + e(s_l)
The error feedback term in the above equation can reflect confidence in the user's input, the user's feedback, focus on specific action parameters, or combine the user's and environmental feedback in a multisensory environment, for example [3].
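To make the update concrete, the sketch below implements the accumulation step as reconstructed above; the uniform weighting and the toy numbers are assumptions for illustration only.

# Sketch of the accumulation step A_l = b*A_{l-1} + (1-b)*a(s_l) + e(s_l).
import numpy as np

def accumulate(prev_A, a_sl, e_sl, beta=0.5):
    """One feedback-control step: blend prior evidence with the new sentence
    likelihoods and add the error-feedback term."""
    return beta * prev_A + (1.0 - beta) * a_sl + e_sl

a_sl = np.log(np.array([0.7, 0.2, 0.1]))  # a_k(s_l) = log P(C_k | s_l)
prev_A = np.zeros(3)                      # no accumulated evidence yet
e_sl = np.zeros(3)                        # neutral error feedback
A = accumulate(prev_A, a_sl, e_sl)
best_action = int(np.argmax(A))           # action with the highest score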
4.2 Mixed-Initiative Dialog Management as an FCS
We introduce here a Mixed Initiative Dialog Manager based on the FCS framework. Our manager aims to identify the state the application should move to and the interaction modality it should use, based on the interaction history and the deviates of the user's input from the system's expectations. Figure 3 below depicts our proposed dialog manager organization. The Dialog Manager's functions are split into three boxes: (I) DM Semantic Interpretation, (II) DM Comparison, and (III) DM Parameter Determination & Prompt Generation. Variables of this system are: action, context node, expected user proficiency, observed user proficiency, input, prompt, and control differential. These variables are explained below.
Fig. 3. Block diagram of an FCS-based Mixed Initiative Dialog Manager
The process starts when the system observes the user’s input and based on the ex-
isting current context it executes a specific action. We assume that the system main-
tains at each moment an expected level of user performance based on previously
obtained user responses. The DM executes a comparison between this expected level
of response and the actual observed response and computes the difference of these
two parameters (we assume that the system is able to map the complexity of a user’s
utterance into a numeric quantity). The difference or deviate is sent to the DM-Parameter Determination & Prompt Generation block, which generates a new prompt and establishes a
new context depending on the predicted level of proficiency of the user after deter-
mining which attribute or set of attributes should be obtained next from the user.
Ideally the DM will toggle between conversational and directed dialog modalities
based on the predicted user’s performance.
In contrast to the approach described by Gorin in [2], our approach does not try to
directly find a most-likely action based on the speaker’s utterance and topic or action
likelihoods. Instead, the DM tries to populate the attributes of the domain’s data
model and assign them values in an efficient way. To do this the system constantly
tracks the context of the dialog and establishes the best modality of interaction. Also,
in contrast to Gorin’s approach, this system does not follow a probabilistic (or maxi-
mum likelihood) formulation: the values of the system’s variables are measurements
of the observations and no probabilistic model is assumed. This is true not only for
the DM-Semantic Interpretation block, but also for the DM-Parameter Determination
& Prompt Generation block. In the next section we will tie this basic idea to a Do-
main Hierarchy model of the task.
5 Domain Hierarchy Predictive Traversal for Mixed Initiative
In essence, the goal of a domain hierarchy-based Dialog Manager is to aid the tra-
versal process of the domain hierarchy tree. When sufficient terminal nodes to exe-
cute a query or transaction are collected, the DM triggers the appropriate query or
transaction. In a directed dialog application, the tree is traversed as a predefined
sequence of terminal nodes (for example, in VXML the FIA (form interpretation algorithm) is responsible for traversing the forms of the application and does so in a deterministic way). The intermediate nodes serve to establish a hierarchy of leaves (e.g., in a VXML document the terminal nodes are the fields and the intermediate
nodes the forms). In a mixed initiative task the dialog manager can be thought of as traversing the tree spanning both terminal nodes and non-terminal nodes depending on the user's input, allowing in this way for the collection of multiple tokens through a single utterance (i.e., the user might take the initiative to include in the input more than one token, or information not pertaining to the active node). We explain here how a dialog manager can promote higher levels of interaction by avoiding terminal nodes when a certain level of expertise or control by the user is inferred in an FCS type of framework.
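As an illustrative sketch of this multi-token collection, and reusing the hypothetical TaskNode tree introduced in section 3, filling several leaves from a single interpreted utterance might look as follows:

# Hypothetical sketch: fill any terminal nodes under the current context node
# for which the semantic interpreter extracted a value; return how many were
# filled, which later serves as a crude proxy for the user's proficiency.
def populate_leaves(context, interpreted):
    filled = 0
    stack = [context]
    while stack:
        node = stack.pop()
        if (node.is_terminal() and node.attribute in interpreted
                and node.value is None):
            node.value = interpreted[node.attribute]
            filled += 1
        stack.extend(node.children)
    return filled

# e.g. a conversational user volunteering the whole pickup cluster at once:
filled = populate_leaves(pickup, {"pckp_loca": "PGH",
                                  "pckp_date": "today+1",
                                  "pckp_time": "noon"})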
5.1 FCS-Based Predictive Tree Traversal Algorithm
We now describe a simple method that combines the concept of FCS applied to dialog management described in section 4.2 with the task hierarchy DM discussed earlier in this section. In essence, the purpose of the DM, as we have mentioned, is to both present the context (that in our case is associated with a node in the domain tree) and determine the prompt to be presented to the user. The resulting utterance will be used to: (a) populate pertinent leaves of the tree and (b) judge the level of proficiency of the user in the application, updating the context and the interaction level with the user accordingly.
This process is iterated until a sufficient set of leaves is populated. We now present a basic algorithm which implements a DM that operates on a domain tree and, based on an FCS framework, traverses the domain in an adaptive way, toggling between Directed Dialog and Mixed Initiative:
1- Start: Initialize the context to the root node. Assume an initial neutral DM policy.
2- Play prompt based on context.
3- Receive user's input. Perform DM-Semantic Interpretation and populate relevant leaves.
4- Perform DM-Comparison of expected vs. observed user proficiency; adjust the deviate parameter.
5- DM-Parameter Determination: Adjust DM policy, update context.
6- If a sufficient set of leaves is populated then stop, otherwise go to 2.
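A minimal sketch of how steps 1-6 could be organized in code is given below; it reuses the hypothetical TaskNode tree and populate_leaves helper from the earlier sketches, and the callbacks play_prompt, listen and interpret stand in for the prompt player, recognizer and semantic interpreter of a real system. It is meant only to show the control flow, not a complete dialog manager.

def leaves(node):
    """All terminal nodes under `node`."""
    if node.is_terminal():
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

def parent_of(root, node):
    """Parent of `node` within the tree rooted at `root`."""
    for child in root.children:
        if node in child.children:
            return child
        found = parent_of(child, node)
        if found:
            return found
    return root if node in root.children else None

def next_context(root, modality):
    """DD: prompt the first empty leaf directly; otherwise prompt its parent
    so a broader, multi-attribute question can be asked."""
    empty = [l for l in leaves(root) if l.attribute and l.value is None]
    if not empty:
        return root
    return empty[0] if modality == "DD" else parent_of(root, empty[0])

def run_dialog(root, play_prompt, listen, interpret):
    context, modality, expected = root, "neutral", 1                  # step 1
    while any(l.value is None for l in leaves(root) if l.attribute):  # step 6
        play_prompt(context, modality)                                # step 2
        interpreted = interpret(listen(), context)                    # step 3
        observed = populate_leaves(context, interpreted)
        deviate = (observed > expected) - (observed < expected)       # step 4: -1/0/+1
        if deviate > 0:                                               # step 5
            modality = "MI"
        elif deviate < 0:
            modality = "DD"
        context, expected = next_context(root, modality), max(observed, 1)
    return root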
The algorithm above is a simple implementation of the process taking place in Figure 3. Because the first prompt is always the same, the initial DM strategy is set to "neutral", which is not a third DM policy but instead denotes the undefined state of the DM policy. After the first user utterance is observed, the DM can establish the proficiency and determine whether directed dialog or mixed initiative should be used. The value of the deviate parameter can be set according to the following policy:
if observed_proficiency > expected_proficiency
    then deviate = +1
else if observed_proficiency = expected_proficiency
    then deviate = 0
else
    deviate = -1
end.
Therefore, instead of allowing the state variable deviate, which tracks the error, to have a continuous value, it is only permitted to take 3 values: {+1, 0, -1}, which loosely correspond to better than expected, as expected, and worse than expected answers, respectively.
The policy to determine whether the observed proficiency is larger than, equal to, or smaller than the expected proficiency is described in the following policy table:

observed_proficiency > expected_proficiency:
    The system is in Directed Dialog and the system captures two or more leaf (terminal node) values.
observed_proficiency = expected_proficiency:
    The system is in Directed Dialog and there is only one leaf filled, or the system is in Mixed Initiative and all prompted leaves are filled.
observed_proficiency < expected_proficiency:
    The number of leaves filled by the DM-Semantic Interpretation block under the prompted context is either one, zero, or less than the number of leaves prompted in the last interaction.
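Under one reading of the policy table (and using the number of leaves filled as the proficiency proxy, which is our assumption), the comparison can be written as a small function; the names below are illustrative only.

# Sketch of the observed-vs-expected comparison that sets the deviate value.
def deviate_from_policy(modality, leaves_filled, leaves_prompted):
    """Return +1, 0 or -1 following the policy table above."""
    if modality == "DD" and leaves_filled >= 2:
        return +1                    # better than expected
    if modality == "DD" and leaves_filled == 1:
        return 0                     # as expected in Directed Dialog
    if modality == "MI" and leaves_prompted > 0 and leaves_filled >= leaves_prompted:
        return 0                     # all prompted leaves were filled
    return -1                        # fewer leaves filled than prompted, or none

assert deviate_from_policy("DD", 2, 1) == +1
assert deviate_from_policy("MI", 3, 3) == 0
assert deviate_from_policy("MI", 1, 3) == -1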
The algorithm described in this section can be combined with a prompt scaffolding strategy of the type discussed previously. Similarly to the Dialog Manager, a Prompt Manager strategy can be incorporated into the FCS algorithm described above. To do this, an additional error variable that tracks the response to various levels of verbosity can be added. For simplicity, we avoid introducing such a variable here.
6 Examples
In this section we provide examples of the FCS-based Predictive Tree traversal al-
gorithm described in Section 5.1. We illustrate the parameter value setting portion of
the algorithm in the context of dialog interactions. We refer to the Car Rental domain
illustrated in Figures 1 and 2. Tables 1 and 2 below show interactions between a User (U) and a System (S) that implements the algorithm described in section 5.1 on the Car Reservation subdomain introduced previously in this paper. Column 1 shows each turn's system prompt and user response, and the other columns show the values the dialog manager components assign to system variables.
Table 1 illustrates the scenario where the user is generally proficient in the domain
and traverses the domain primarily in mixed initiative modality. At the third re-
sponse, however, the user returns fewer than the expected number of attributes and the
deviate variable is set to -1. This switches the conversation into Directed Dialog
which produces a fourth dialog turn with a query specific to the missing attribute. Table 2 illustrates an interaction where the user mostly interacts in directed dialog. In the last
turn the user provides a multiple-attribute utterance, allowing for the completion of
the query in that utterance. Not described is the execution of the transaction.
Table 1. Example of an interaction with a Predictive Tree Traversal-based DM. Each turn lists the system prompt (S) or user response (U), followed in brackets by the context, the leaf semantic interpretation, and the DM system parameters.

S: Welcome. How may I help you?  [context=L^1_{1,1}]
U: I'd like to rent a car in Pittsburgh's airport.  [action=rent, pckp_loca=PGH; modality=-, deviate=+1]
S: What are the pickup date and pickup time for this rental?  [context=L^3_{3,1}]
U: Tomorrow at noon.  [pckp_date=today+1, pckp_time=noon; modality=MI, deviate=0]
S: Please tell me the return information.  [context=L^1_{4,2}]
U: Same location, the morning after.  [ret_location=PGH, ret_date=today+2; modality=MI, deviate=-1]
S: What is the return time for this rental?  [context=L^1_{4,1}]
U: Ten A.M.  [return_loca=PGH; modality=DD, deviate=0]
S: I will reserve a car for tomorrow noon in Pittsburgh International Airport.  [context=completed_tree]

Table 2. Example of an interaction with a Predictive Tree Traversal-based DM (columns as in Table 1).

S: Welcome. How may I help you?  [context=L^1_{1,1}]
U: [silence]  [modality=-, deviate=+1]
S: Please choose one of the following: new reservation, cancel, or information.  [context=L^2_{1,0}]
U: New Reservation.  [action=new; modality=DD, deviate=0]
S: New Reservation. Let me ask you some information about this reservation.  [context=L^1_{3,2}]
U: It's for Pittsburgh international airport.  [pckp_loca=PGH; modality=DD, deviate=0]
S: What are the pickup date and pickup time for this rental?  [context=L^3_{4,1}]
U: Tomorrow noon. And the return is the day after, same location, at noon.  [pckup_time=noon, ret_loca=PGH, ret_date=today+2, ret_time=noon; modality=DD, deviate=+1]
S: I will reserve a car for tomorrow noon in Pittsburgh International Airport.  [context=completed_tree]
7 Conclusions
We presented in this paper a novel algorithm intended to implement a Mixed Initia-
tive Dialog Manager for conversational applications. Our technique uses the task
hierarchy description of a conversational domain. Prompts for the Directed Dialog
modality are incorporated into the terminal nodes of this hierarchy. Prompt scaffold-
ing, aimed at supporting self-revealing help, can be easily integrated into this framework. Thus we demonstrated that a tree representation of the domain is versatile, as it easily supports both mixed initiative and directed dialog.
Furthermore, we presented a mechanism that, based on the task hierarchy representation, implements an adaptive Mixed Initiative interaction. A Mixed Initiative dialog is the result of simultaneously enabling Directed Dialog and multi-token interaction. Based on a feedback control framework, our method predicts the interaction modality that best fits the user's abilities in every given part of a transaction, based on the immediate history of the dialog and other parameters of the interaction. We illustrated this mechanism with a basic car rental application. The combination of a predictive framework (FCS) and a tree representation of the hierarchical domain constitutes a concise and flexible framework for developing Mixed Initiative systems.
References
1. AT&T “Need Help? Ask Allie ™” Web Application.
2. Gorin, A. L., "Spoken dialog as a feedback control system," Proc. ESCA Workshop on Spoken Dialog Systems, Vigsø, Denmark, May 1995, pp. 173-176.
3. Gorin, A. L., Levinson, S. E. and Sankar, A., "An experiment in spoken language acquisition," IEEE Trans. Speech and Audio, vol. 2, no. 1, Part 2, Jan. 1994, pp. 224-240.
4. Hochberg J., Kambhatla N., Roukos S., "A Flexible Framework for Developing Mixed-Initiative Dialog Systems", 3rd SIGdial Workshop on Discourse and Dialogue, 2002.
5. Huerta J. M., "Graph based representations and techniques for NLU Application Development", Proc. ICASSP 2003, Hong Kong.
6. Louwerse M., Graesser A., Olney A., "Good Computational Manners", AAAI Symposium Technical Report, November 15-17, 2002, North Falmouth, Massachusetts.
7. Gorin A. L., Riccardi G. and Wright J. H., "How May I Help You?", Speech Communication, vol. 23, pp. 113-127, 1997.
8. W3C, VXML 2.0 http://www.w3.org/TR/voicexml20/
9. The MIT Galaxy System http://www.sls.csail.mit.edu/GALAXY.html
10. Luo X., Papineni K., "IBM DARPA Communicator V 1.0", DARPA Communicator Principal Investigators Meeting, Philadelphia, USA, 2000.
11. Wright J., Gorin A., Abella A., "Spoken Language Understanding within Dialogs Using a Graphical Model of Task Structure", Proc. ICSLP, Sydney, 1998.
12. Meng, Wai and Pieraccini, "The Use of Belief Networks for Mixed Initiative Dialog Modeling", Proc. ICSLP 2000, Beijing, China.
13. Guzdial, M. Software-realized Scaffolding to Facilitate Programming for Science Learn-
ing. Interactive Learning Environments, Vol. 4, No. 1, 1995, 1-44.
14. W3C, “XForms” http://www.w3.org/MarkUp/Forms/