own cognitive performance) lead to better manual se-
lection of appropriate learning resources. Although
Baker (1989) argues that self-assessments are skewed, they still capture an overall trend.
The search queries for retrieving learning objects
are primarily constructed by the learners themselves
in search of new information. In practice, this means that learners will frequently default to formulating search queries that yield simplified tutorial-style
resources. This is understandable considering the in-
formation overload and the temptation of ‘sticking to
what you know’, but it does create challenges for self-
directed learning approaches. Ideally, the quality and difficulty of resources would improve as a learner's level of understanding increases. However, short-lived fact-finding queries are both effective and easy, which may keep the learner contained within a community of beginners instead of slowly migrating towards a community of experts. The amount of effort required by learners
to construct search queries for high quality resources
which support self-directed learning may prove to be
too cumbersome to maintain in the long term.
We would therefore like to automatically steer
learners towards resources that are both relevant and
slightly challenging such that they go beyond fact-
finding and move towards increased understanding of
the domain. This approach, however, requires an accurate and up-to-date model of a learner's subjects of interest and an estimate of the current level of understanding of each subject. It is likely that the learner
will be unable to provide much detail on the concep-
tual decomposition of the difficulties he or she en-
countered when trying to understand certain learning
objects. Moreover, requesting too much additional in-
formation from a learner is likely to disrupt the exist-
ing workflow, which in turn creates additional barriers to adoption of this approach. Fortunately, present-day interaction with social networks and search engines allows us to acquire a huge number of simple learner self-assessments. Each individual self-
assessment by itself may be skewed or wrong, but
generalizing from a larger collection will yield stable
trends. Naturally, these trends will change over time as the learner progresses, which means that older self-assessments should be properly discounted.
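Generalizing from many noisy assessments while discounting older ones can be sketched as an exponentially-weighted average. The `half_life_days` parameter and the (rating, timestamp) tuple format below are illustrative assumptions, not part of the original approach:

```python
import time

def discounted_trend(assessments, half_life_days=30.0, now=None):
    """Aggregate noisy 1-5 self-assessments into a single trend value,
    exponentially discounting older assessments so that recent
    progress dominates the estimate."""
    now = now if now is not None else time.time()
    weighted_sum = 0.0
    weight_total = 0.0
    for rating, timestamp in assessments:
        age_days = (now - timestamp) / 86400.0
        # The weight of an assessment halves every `half_life_days` days.
        weight = 0.5 ** (age_days / half_life_days)
        weighted_sum += rating * weight
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None

# Example: three ratings, the oldest given 60 days ago.
now = time.time()
history = [(5, now - 60 * 86400), (3, now - 10 * 86400), (2, now)]
print(round(discounted_trend(history, half_life_days=30.0, now=now), 2))
```

With this weighting, the recent "too easy" ratings dominate the old "too difficult" one, so the trend sits well below the plain average.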
The aggregation of self-assessments needs to integrate well within a learner's existing workflow and
should be simple and easy to use. A suitable candidate is the 5-star rating process already familiar to learners on the Internet, which can be repurposed to capture a simple summary of a learner's assessment of a learning object. The advantage of using this type of simple and unspecific feedback is that
it takes very little effort on the learner’s side, which
increases the chances of the learner actually providing
enough feedback. The feedback could, for example, be a simple Likert scale ranging from 1 (too easy) through 3 (just right) to 5 (too difficult).
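Such a scale can be turned into a steering signal in a few lines of code. The threshold values and the helper name `steer` below are illustrative assumptions, not part of the proposed system:

```python
from statistics import mean

def steer(ratings):
    """Decide how to steer recommendations from a batch of 1-5 ratings
    on the scale 1 (too easy) .. 3 (just right) .. 5 (too difficult).
    The +-0.5 thresholds around 'just right' are an assumption."""
    avg = mean(ratings)
    if avg < 2.5:
        return "recommend harder resources"
    if avg > 3.5:
        return "recommend easier resources"
    return "difficulty is about right"

print(steer([1, 2, 2]))  # -> recommend harder resources
```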
The approach assumes that a learner is able to
judge whether a specific learning resource is too com-
plicated, but is unable to explain why. Only a maxi-
mally simplistic self-assessment is required from the
learner that can be provided with a single mouse-click
for each resource. Taking such a minimalist approach
with respect to the feedback providedby learners min-
imizes the amount of additional effort required from
learners which increases the likelihood of learners
providing a large number of such resource feedbacks.
A machine learning approach allows us to analyze large amounts of data from each learner with little effort, automatically finding complex patterns in the collected feedback and building a model that links topics of interest to subjective levels of understanding. The system can then use this model to predict the most likely self-assessment of a new resource for a particular learner. This
model, which can be automatically learned from the
self-assessments, can provide feedback which sup-
ports learners in their search for appropriate learn-
ing materials or can be used to recommend new re-
sources. The approach is largely data-driven and only
relies on the assumption that there is some level of
consistency in the learner-provided feedback.
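A minimal sketch of this idea, assuming the topic proportions of each rated document are already available (the two-topic toy data, the learning rate, and the plain gradient-descent training loop are all illustrative assumptions; any off-the-shelf regression or classification model could take their place):

```python
def train_rating_model(samples, lr=0.1, epochs=500):
    """Fit a linear model mapping a document's topic-proportion vector
    to the learner's most likely 1-5 self-assessment.
    A deliberately simple stand-in for the machine-learning step."""
    n_topics = len(samples[0][0])
    weights = [0.0] * n_topics
    bias = 3.0  # start from "just right"
    for _ in range(epochs):
        for topics, rating in samples:
            pred = bias + sum(w * t for w, t in zip(weights, topics))
            err = pred - rating
            # Per-sample gradient step on the squared error.
            bias -= lr * err
            weights = [w - lr * err * t for w, t in zip(weights, topics)]
    return weights, bias

def predict(model, topics):
    weights, bias = model
    raw = bias + sum(w * t for w, t in zip(weights, topics))
    return min(5.0, max(1.0, raw))  # clamp to the 1-5 scale

# Toy data: topic 0 is well understood (rated easy), topic 1 is not.
samples = [([1.0, 0.0], 1), ([0.0, 1.0], 5), ([0.5, 0.5], 3)]
model = train_rating_model(samples)
print(round(predict(model, [0.8, 0.2]), 1))
```

A document dominated by the well-understood topic is predicted to be rated near "too easy", which is exactly the signal needed to steer the learner towards more challenging material.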
The rating of a resource provided by the learner reflects two aspects of the document: (1) the way the information in the document is presented and structured (sentence length, clarity of language, ...), and (2) the information in the document itself, i.e. the topics it covers.
At present we are not addressing (1), which, although
important, is about readability measures (Crossley
et al., 2007). Incorporating a readability measure will
allow the system to differentiate between text read-
ability and conceptual complexity.
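As a rough illustration of what such a readability measure could look like, the classic Flesch Reading Ease score can be computed from sentence, word, and syllable counts. The vowel-group syllable counter below is a crude heuristic, so the scores are only indicative:

```python
import re

def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text.
    Uses the standard formula with a crude syllable heuristic."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

easy = "The cat sat. The dog ran. It was fun."
hard = ("Institutional epistemological considerations "
        "necessitate interdisciplinary contextualization.")
print(flesch_reading_ease(easy) > flesch_reading_ease(hard))  # True
```

Combining such a surface-level score with the topic-based model would let the system attribute a low rating either to dense writing or to unfamiliar concepts.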
4 DECOMPOSING LEARNING OBJECTS
The learners provide feedback at the document level,
and not separately for each of the individual subjects
covered in a particular document. In order to de-
termine a learner’s current level of understanding, it
is necessary to identify which subjects (topics) are
present in each document and what their relative pro-
portion is. Latent Dirichlet Allocation (LDA) (Blei et al., 2003) can be used to infer the distribution of