2 POMDP-BASED INTELLIGENT TUTORING SYSTEMS
Intelligent tutoring systems have been developed as
useful teaching aids in areas including mathematics
(Woolf, 2009), physics (VanLehn et al., 2010), medi-
cal science (Woolf, 2009), and many others (Cheung
et al., 2003). Numerous students have benefited from
one-to-one, adaptive tutoring offered by ITSs.
Adaptive tutoring is teaching in which the teacher chooses optimal teaching actions based on information about the student's knowledge state. It is therefore important for an ITS to store and trace this state information.
The major components of an ITS are a student model, a tutoring model, and a domain model. The student model stores and traces information about student states. In each tutoring step, the tutoring agent accesses this model for information about the student's current state, consults the tutoring model with that information to obtain a tutoring strategy, and then, based on the strategy, retrieves from the domain model the knowledge to teach.
When states are completely observable to the tu-
toring agent, we can use a Markov decision process
(MDP) to model adaptive tutoring. An MDP models a decision-making process in which the agent knows exactly what the current state is and can choose among the actions available in each state to maximize rewards. However, in adaptive tutoring, student states are not always completely observable, so the MDP model has limitations when applied to building ITSs. The partially observable Markov decision process (POMDP) model, an extension of the MDP, may be more suitable.
The major parts of a POMDP include a set of states, a set of actions, a set of observations, a reward function, and a policy. In each decision step, the agent is in some state. The decision is to choose an action that is available in that state and maximizes the reward; such an action is referred to as the optimal action. When the agent does not know exactly what the current state is, it infers information about the state from the current observation and represents this information as a belief, a probability distribution giving, for each state, the probability that the agent is in it. Based on the belief, the agent uses the optimal policy to choose the optimal action. As mentioned, the computation of the optimal policy is referred to as POMDP-solving.
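For concreteness, one common way of writing this down (notation varies across the literature) is as a tuple $\langle S, A, \Omega, T, Z, R \rangle$, where $S$, $A$, and $\Omega$ are the sets of states, actions, and observations, $T(s' \mid s, a)$ is the probability of moving to state $s'$ after taking action $a$ in state $s$, $Z(o \mid s', a)$ is the probability of observing $o$ when action $a$ leads to state $s'$, and $R(s, a)$ is the immediate reward. A belief $b$ assigns a probability $b(s)$ to each state $s$, with $\sum_{s \in S} b(s) = 1$, and a policy $\pi$ maps beliefs to actions, $a = \pi(b)$.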
We can build an ITS by casting its components onto a POMDP: the student model is mapped to the state space, with each POMDP state representing a student knowledge state, and the tutoring model is implemented as the policy, which is a function that maps beliefs to actions.
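As an illustration only, with all names hypothetical rather than taken from any of the cited systems, this mapping can be sketched roughly as follows, the student model being carried by the belief over knowledge states and the tutoring model by a policy function:

from typing import Callable, Dict, List

KnowledgeState = str                  # label of one student knowledge state
Action = str                          # a tutoring action, e.g. "review" or "advance"
Belief = Dict[KnowledgeState, float]  # probability per state, summing to 1

# Student model: the state space and the agent's current belief over it.
states: List[KnowledgeState] = ["novice", "partial", "mastered"]
belief: Belief = {"novice": 0.5, "partial": 0.4, "mastered": 0.1}

# Tutoring model: a policy mapping beliefs to actions.
Policy = Callable[[Belief], Action]

def policy(b: Belief) -> Action:
    # Hypothetical rule: keep reviewing until mastery is sufficiently likely.
    return "review" if b["mastered"] < 0.8 else "advance"

# Domain model: the content retrieved for the chosen action.
domain: Dict[Action, str] = {
    "review": "practice problems on the current topic",
    "advance": "introductory material for the next topic",
}

action = policy(belief)
print(action, "->", domain[action])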
At any point in a tutoring process, the agent is in a state that represents the current knowledge state of the student. The agent does not have exact information about this state, but maintains a belief over the states. Based on the belief, the agent chooses and takes a tutoring action. The action causes the agent to enter a new state, where it receives a new observation. The agent then updates its belief based on the previous belief, the action just taken, and the new observation, and starts the next step of tutoring.
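Writing $T(s' \mid s, a)$ for the state transition probability and $Z(o \mid s', a)$ for the observation probability, this update is the standard Bayesian belief update: if the agent holds belief $b$, takes action $a$, and then observes $o$, the new belief is
$$ b'(s') = \frac{Z(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s)}{\Pr(o \mid b, a)}, $$
where the denominator $\Pr(o \mid b, a) = \sum_{s'} Z(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)$ simply normalizes $b'$ to sum to 1.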
Since the 1980s, researchers have applied the POMDP model to handle uncertainty in intelligent tutoring, and have developed POMDP-based ITSs that teach
in different areas (Cassandra, 1998; Williams et al.,
2005; Williams and Young, 2007; Theocharous
et al., 2009; Rafferty et al., 2011; Chinaei et al.,
2012; Folsom-Kovarik et al., 2013). In these systems, POMDPs were used to model student states and to customize and optimize teaching. In a commonly used structure, a student state had a Boolean attribute for each content item of the subject, the actions available to the tutoring agent were various types of teaching techniques, and the observations were the results of tests given periodically. Researchers agreed that the computational complexity of POMDP-solving in ITSs was a major difficulty in developing practical systems (Cassandra, 1998; Rafferty et al., 2011; Folsom-Kovarik et al., 2013).
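To make the source of this complexity concrete, the following small sketch (the content items are hypothetical, not drawn from any particular cited system) enumerates the states of such a Boolean-attribute student model; with n content items there are 2^n states, so the state space, and with it the cost of solving the POMDP, grows exponentially:

from itertools import product

# Hypothetical content items of a subject; each carries a Boolean "known" flag.
contents = ["fractions", "decimals", "percentages", "ratios"]

# One student state is one truth assignment over all content items.
states = [dict(zip(contents, flags))
          for flags in product([False, True], repeat=len(contents))]

print(len(states))  # 2 ** len(contents) = 16 states for only four content items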
3 RELATED WORK
Since the early years of POMDP research, developing efficient algorithms for POMDP-solving has been a major topic (Braziunas, 2003). In the following, we first review work on efficient algorithms for “general” POMDP problems, and then work on building POMDP-based ITSs.
The method of policy trees is a practical one for POMDP-solving (Kaelbling et al., 1998). In this method, solving a POMDP amounts to evaluating a set of policy trees and choosing the optimal one. In a policy tree, nodes are labeled with actions and edges are labeled with observations; after an action is taken, the actions possible at the next decision step are those at the nodes reached from it by the edges labeled with the observations. Each policy tree is associated with a value function. To choose an optimal tree, the value functions of a set of trees are evaluated. Policy tree value functions and their evaluation will be discussed in more detail in the next section. As will be seen, the number of policy trees and the cost of evaluating individual trees grow exponentially. To achieve better efficiency, researchers have developed algorithms, some of which were related to the