proposed adaptation schemes, we defined a utility function to reflect the efficiency of each chosen question. Specifically, we considered the following adaptation schemes: (i) Q-learning (Sutton and Barto, 2005; Russell and Norvig, 1995; Martin and Arroyo, 2004); (ii) Virtual Learning (Vriend, 1997); (iii) Temporal Reasoning (Beck, Woolf and Beal, 2000); (iv) DVRL (Azoulay-Schwartz et al., 2013); (v) Bayesian Inference (Conitzer and Garera, 2006); and (vi) a Gittins-based method (Gittins, 1989). Most of these algorithms have not previously been applied to ITS; the exception is Q-learning, as discussed in Section 2.
As this is groundwork research to identify the best adaptation scheme, at this stage of the study we developed an artificial environment by simulating students whose abilities are drawn from a normal distribution.
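As a rough sketch of how such a simulated population can be generated (the parameter values and the logistic answer model below are our own illustrative assumptions, not values taken from this study):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Illustrative parameters (assumptions, not the study's values).
ABILITY_MEAN, ABILITY_STD = 0.0, 1.0
N_STUDENTS = 1000

# One latent ability per simulated student, drawn from a normal distribution.
abilities = rng.normal(ABILITY_MEAN, ABILITY_STD, size=N_STUDENTS)

def p_correct(ability, difficulty):
    """Probability that a simulated student answers a question of the
    given difficulty correctly; the logistic link is an assumption."""
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))
```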
The performance of the various proposed methods was benchmarked against the optimal performance of the system, assuming the user model (abilities) is completely known to the ITS. The results show that the method that outperformed the others in most of the environments we considered is based on Bayesian inference, which achieved more than 90% of the optimal performance.
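To give a concrete flavor of this approach ahead of the detailed description in Section 5, the following is a minimal sketch of a Bayesian belief update over a discretized ability level; the logistic answer model and the grid of candidate levels are illustrative assumptions, not the algorithm as specified later:

```python
import numpy as np

def posterior_update(belief, levels, difficulty, correct):
    """One Bayesian update of the belief over a discretized ability,
    assuming a logistic answer model (an illustrative assumption)."""
    p = 1.0 / (1.0 + np.exp(-(levels - difficulty)))  # P(correct | level)
    likelihood = p if correct else 1.0 - p
    post = belief * likelihood
    return post / post.sum()

levels = np.linspace(-3.0, 3.0, 13)               # candidate ability levels
belief = np.full(len(levels), 1.0 / len(levels))  # uniform prior
belief = posterior_update(belief, levels, difficulty=0.5, correct=True)
```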
This paper is organized as follows. In Section 2 we review the state-of-the-art methods used for choosing questions in ITS. In Section 3 we present the ITS model, including the utility function used in this study. In Section 4 we describe the various adaptation schemes, and in Section 5 we provide a detailed description of the Bayesian inference algorithm. In Section 6 we describe the construction of the artificial environment used for the evaluations, and in Section 7 we present the simulation results. Finally, we conclude and discuss directions for future work in Section 8.
2 RELATED WORK
It is well known that a student learns much better through one-on-one tutoring than through common classroom teaching. An Intelligent Tutoring System (ITS) is one of the best technology-based instances of one-on-one teaching (Woolf, 2009). A student who is supposed to learn a certain topic by means of an ITS is assumed to do so by solving problems posed by the ITS.
The ITS evaluates a given answer by comparing it to the predefined answer in its knowledge base. The system keeps track of the user's actions and correspondingly builds and constantly updates its student model. Moreover, it identifies the topics that need more training and selects the next question accordingly.
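This interaction cycle can be summarized in a toy loop; every class, method, and variable name below is a hypothetical placeholder for illustration, not a component of any specific system:

```python
import random

class SimpleITS:
    """Toy stand-in for the cycle described above (hypothetical names)."""
    def __init__(self, knowledge_base):
        self.knowledge_base = knowledge_base  # question -> correct answer
        self.student_model = {}               # question -> history of results

    def next_question(self):
        # Prefer the topic with the fewest recorded successes so far.
        return min(self.knowledge_base,
                   key=lambda q: sum(self.student_model.get(q, [])))

    def record(self, question, correct):
        self.student_model.setdefault(question, []).append(int(correct))

def tutoring_loop(its, answer_fn, n_steps=10):
    """Ask, grade against the knowledge base, update the model, repeat."""
    for _ in range(n_steps):
        q = its.next_question()
        correct = (answer_fn(q) == its.knowledge_base[q])
        its.record(q, correct)

its = SimpleITS({"q1": "a", "q2": "b", "q3": "c"})
tutoring_loop(its, answer_fn=lambda q: random.choice(["a", "b", "c"]))
```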
In this paper, we consider how the student model should be represented, how it should be used to determine the student's subsequent goals, and how it should be updated according to the student's results. In the current section, we survey several ITS systems that also contain a learning process to adapt to the student's abilities.
Martin and Arroyo (Martin and Arroyo, 2004)
used Reinforcement Learning agents to dynamically
customize ITS systems to the student. The system
clusters students into learning levels, and chooses
appropriate hints for each student. The student’s
level is updated based on the answers they enter or
the hints they ask for. Their best results were obtained with the ε-greedy agent (ε = 0.1). Following Martin and Arroyo, we also used a Q-learning algorithm in which the probability of trying a non-optimal level was fixed at 0.1.
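A minimal, stateless (bandit-style) sketch of this selection rule follows; ε = 0.1 matches the text, while the learning rate and the simplified one-state update are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

N_LEVELS = 5    # candidate question-difficulty levels (actions)
EPSILON = 0.1   # probability of trying a non-optimal level, as in the text
ALPHA = 0.1     # learning rate (our assumption)

q_values = np.zeros(N_LEVELS)

def choose_level():
    """ε-greedy: explore a random level with probability ε, otherwise
    exploit the level with the highest current Q-value."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_LEVELS))
    return int(np.argmax(q_values))

def update(level, reward):
    """Bandit-style Q-update toward the observed utility of the question."""
    q_values[level] += ALPHA * (reward - q_values[level])
```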
Iglesias et al. (Iglesias et al., 2008) proposed a
knowledge representation based on RL that allows
the ITS system to adapt the tutoring to students’
needs. The system uses the experience previously
acquired from interactions with other students with
similar learning characteristics. In contrast to Iglesias et al., in our work the learning process is carried out individually, in order to learn each student's level.
Malpani et al. (Malpani, Ravindran and Murthy,
2009) present a Personalized Intelligent Tutoring
System that uses Reinforcement Learning techniques
to learn teaching rules and provide instructions to
students based on their needs. They used RL to teach
the tutor the optimal way of presenting the instruc-
tions to students. Their RL approach has two components, the Critic and the Actor: the Critic follows Q-learning, and the Actor follows a Policy Gradient approach with parameters representing the preferences for choosing actions.
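A compact sketch of such an Actor-Critic structure is given below; the softmax parameterization, step sizes, and single-state setting are our illustrative assumptions, and the exact formulation of Malpani et al. may differ:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

ALPHA_Q, ALPHA_PI = 0.1, 0.05  # step sizes (our assumptions)

def softmax(theta):
    """Softmax policy over the Actor's preference parameters."""
    e = np.exp(theta - theta.max())
    return e / e.sum()

def actor_critic_step(theta, q, reward_fn):
    """One interaction: the Critic tracks Q-values, the Actor adjusts
    its preferences in the direction of the policy gradient."""
    pi = softmax(theta)
    action = rng.choice(len(theta), p=pi)
    reward = reward_fn(action)
    # Critic: Q-learning-style update of the chosen action's value.
    q[action] += ALPHA_Q * (reward - q[action])
    # Actor: gradient of log pi(action), scaled by the Critic's estimate.
    grad = -pi
    grad[action] += 1.0
    theta += ALPHA_PI * q[action] * grad  # in-place update of the array

theta, q = np.zeros(4), np.zeros(4)  # e.g., four ways to present an instruction
actor_critic_step(theta, q, reward_fn=lambda a: 1.0 if a == 2 else 0.0)
```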
Sarma and Ravindran (Sarma and Ravindran, 2007) developed an ITS system that uses RL to teach autistic students, who are unable to communicate well with others. The ITS aimed to teach pattern-classification problems, in which the student has to classify a given pattern (question). The system was validated with an ANN standing in for the student, rather than by teaching real children. The pedagogical module in (Sarma and Ravindran, 2007) selects the appropriate action for teaching students by updating Q-values.
Finally, Beck et al. (Beck, Woolf and Beal, 2000) constructed a learning agent that models the student's behavior in the ITS. Rather than focusing on