ON THE USE OF AN ON-LINE FREE-TEXT SCORING
SYSTEM INDIVIDUALLY OR COLLABORATIVELY
Diana Pérez-Marín, Ismael Pascual-Nieto and Pilar Rodríguez
Computer Science Department, Universidad Autónoma de Madrid
Francisco Tomás y Valiente, 11, 28049, Madrid, Spain
Keywords: Computer Assisted Assessment, Collaborative learning, Natural Language interface, e-Learning.
Abstract: Willow is an adaptive web-based application that allows students to review course material. The system
analyzes the students’ free-text answers providing immediate feedback to the students. In the past, Willow
has been used by individuals working alone. However, the trend of improving learning performance by
allowing students to cooperate inspired us to develop a collaborative version of Willow. Our hypothesis was
that students working together can reach understanding of ideas better than working individually with
Willow. Therefore, in this paper, we explore the collaborative use of the system. We describe from a
computer-science perspective, the minimum changes that have to be done to the system in order to permit a
collaborative review. Furthermore, we provide the preliminary results of an experiment in which 22 students
were given the possibility of using the individual or collaborative version of Willow.
1 INTRODUCTION
Automatic assessment of open-ended questions has
been studied since the sixties (Page, 1966). In spite
of the critics that the idea of automatically
generating students’ free-text answers has received
(Hearst, 2000), the progress made in the Natural
Language Processing (NLP) field, has made possible
the uptake of computer-based free-text scoring.
The goal of many of the currently available free-
text scoring systems is not to replace the teachers,
but to complement them. Automatic free-text scorers
can serve as double-checkers of the human scores, or
to provide more training to the students before their
exams (Valenti et al., 2003).
The core idea is usually to compare the student
answer to a set of correct answers provided by the
teachers, or other type of reference material (books,
Internet, etc.). The more similar they are, the higher
the score provided by the system is. Nevertheless,
given that the goal is not to replace the teacher score,
but to train the students to pass the final exam giving
them more possibilities to review, it is usually only
an orientative score.
In fact, typical feedback pages of free-text
scoring systems include not only the numerical
score, but also comments indicating the strong and
weak aspects of the answer (according to the
comparison performed by the system between the
student answer and the teachers’ correct answers).
Some free-text scoring systems that include
these possibilities are: E-tester (Guetl et al., 2005),
the extension of Didalect (Hermet & Szpakowicz,
2006), or SPEBC (Aguilar & Kaijiri, 2007).
However, these systems have traditionally been
designed to be only used individually.
Given the trend in e-learning systems to foster
collaborative work, we think that it could also be
interesting to have automatic collaborative free-text
scoring systems.
In this paper, we present how the Willow free-
text scoring system (Pérez-Marín et al., 2006), that
we have developed, can be used not only
individually but collaboratively. Our hypothesis is
that students working together can reach
understanding of ideas better than working
individually with Willow.
To test this hypothesis, we performed an
experiment with 22 volunteer non-technical students
in a lab using the system, 12 working individually,
and 10 in small groups.
We then analyzed the logs gathered to find out
how the students differed in their use of the system.
It has been noted that although in collaborative use
the students are able to answer less questions, this is
because they take longer to answer each question,
501
PÃl’rez-Marà n D., Pascual-Nieto I. and Rodrà guez P.
ON THE USE OF AN ON-LINE FREE-TEXT SCORING SYSTEM INDIVIDUALLY OR COLLABORATIVELY.
DOI: 10.5220/0001820305010506
In Proceedings of the Fifth International Conference on Web Information Systems and Technologies (WEBIST 2009), page
ISBN: 978-989-8111-81-4
Copyright
c
2009 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
and we could observe that this extra time was spent
discussing the concepts involved in the question,
which is beneficial and, thus supports our initial
hypothesis.
This paper is organized as follows: Section 2
focuses on the benefits of collaborative work;
Section 3 describes the automatic and adaptive
individual free-text scoring system; Section 4 details
the procedure to transform the individual version of
Willow to its collaborative version from a computer-
science perspective; Section 5 reports the
experiment performed and the results found; and,
finally Section 6 provides the main conclusions of
the paper together with some lines of future work.
2 BENEFITS OF
COLLABORATIVE LEARNING
As defined by Gokhale (1995), collaborative
learning refers to an instruction method in which
students at various performance levels work together
in small groups toward a common goal.
According to Vygotsky (1978), students are
capable of performing at higher intellectual levels
when they are asked to work collaboratively than
when they work individually.
Moreover, students have declared that the
computer environment facilitates collaboration, and
a research study carried out with 48 technical
university students have concluded that collaborative
learning fosters the development of critical thinking
through discussion, clarification of ideas, and
evaluation of others' ideas (Cicognani, 2000).
There is also evidence that cooperative teams
retain information longer than students who work
individually (Johnson & Johnson, 1986); and, that
the shared learning gives students an opportunity to
engage in discussion, and take responsibility for the
others and their own learning (Smith et al., 2005).
It can also be highlighted the benefits of using
distributed student models (Puntambekar et al.,
2003; Zapata-Rivera & Greer, 2001; Rueda et al.,
2004), which have been extensively studied when
the models are represented using concept maps,
according to the principles of the Ausubel’s
Meaningful Learning Theory (Ausubel, 1963).
Some applications that are able to manage group
student models represented using concept maps are:
COMPASS (Puntambekar et al., 2003), ConceptLab
(Zapata-Rivera & Greer, 2001), and DynMap+
(Rueda et al., 2004).
COMPASS (Puntambekar et al., 2003) is an
Adaptive Educational Hypermedia System. It
supports the assessment as well as the learning
process. Each concept is displayed on a separate
page, and the user is allowed to navigate between
the concepts using textual hyperlinks.
COMPASS student models are very simple only
based on their navigation behavior: which concepts
they have visited and in what order. These student
models can represent individuals or groups.
ConceptLab (Zapata-Rivera and Greer, 2001) is
a knowledge construction and navigation system that
uses XML-based concept maps to represent the
student’s view of the domain. It has three main
goals: to assess the student’s knowledge (it can be
done by comparing different maps visually or
through queries), to determine problems in the
learning process of a student or a group of them and
to promote reflection among a group of students in a
topic. The student model is based on a bayesian
network and a concept map.
According to the authors, the concept map has
been included as part of the student model in order
to facilitate sharing of knowledge among students
and assessment of students’ knowledge by teachers.
The concept maps are collaboratively built by the
students who can be helped by a guide concept map.
By clicking on a particular concept, it is possible to
access a variety of links, added by the teacher or
classmates, related to the concept of interest.
The knowledge built with ConceptLab can be
represented with the VisMod system (Zapata-Rivera,
2004). VisMod allows students and teachers to
experiment with the creation of Bayesian what-if
scenarios; providing not only a visualization tool,
but also an interactive tool for inspection of and
reflection on Bayesian student models.
DynMap+ (Rueda et al., 2004) is a graphical tool
to display the student model as a concept map.
Students introduce the concept map in the computer
using the Concept Map Editor provided. DynMap+
can show models not only of individuals but also of
groups. Both are overlay models that can be shown
to students and instructors.
The general purpose of showing the map to
instructors is to provide them with a view of the
knowledge and evolution of the students. The
general purpose of showing the map to students is to
foster reflective thinking about their own learning.
WEBIST 2009 - 5th International Conference on Web Information Systems and Technologies
502
Figure 1: A snapshot of Willow’s interface.
Figure 2: A snapshot of Willow’s collaborative interface.
3 WILLOW
Willow is a free-text Adaptive Computer Assisted
Assessment (ACAA) system (Pérez-Marín, 2007).
That is, it is able to automatically and adaptively
assess students’ answers written both in Spanish and
English languages. See Figure 1 for a snapshot of
Willow’s interface.
As can be seen, Willow’s interface follows a
dialogue metaphor in which Willow is represented
by an owl, and the student can choose one avatar
from several available to represent himself/herself.
ON THE USE OF AN ON-LINE FREE-TEXT SCORING SYSTEM INDIVIDUALLY OR COLLABORATIVELY
503
The question asked by Willow is one of the set
previously introduced by the teachers according to
the student’s level (as assessed by the system). For
this reason, the teachers also need to provide a level
of difficulty for each question: easy, medium or
difficult.
Additionally, students are promoted or demoted
a level based on their answers to a set of questions,
and thus teachers need to specify the percentage of
questions answered correctly or incorrectly which
lead to the student being promoted or demoted a
level of difficulty.
The system does not make assumptions as to the
student’s level of knowledge, and thus the first time
a student logs into the system for a particular lesson,
s/he is presented with questions of low difficulty.
When the student correctly answers the specified
percentage of questions, s/he is promoted to a higher
level of difficulty. Similarly, if the student fails a
sufficient proportion of the questions, s/he is
demoted to a lower level, and will thus receive
easier questions.
By keeping questions at a level the student can
handle, without being too easy, the level of
engagement of the student with the system is
maximized.
Whenever a student answers a question in
Willow, s/he is presented with immediate feedback.
If s/he passes the question, s/he is shown the
feedback page, and the question is recorded as
correctly answered. Otherwise, the system tries to
help the student to pass the question with a set of
clarification questions. If, even with this help, the
student is not able to pass, the feedback page is
displayed, and the question is marked to be asked
later.
Furthermore, Willow keeps track in the students’
answers of a set of concepts indicated by the
teachers to automatically generate each student
individual concept map, and the class concept map
as shown in Figure 2. The procedure is described in
detail in Pérez-Marín (2007).
4 FROM INDIVIDUAL TO
COLLABORATIVE
The individual version of Willow asks each student
to create an account the first time that s/he logs into
the system. That way, Willow is able to keep track
of how each student answers the questions, to
automatically update each student model and to
choose the most suitable question for him or her.
Therefore, the first change to allow the
collaborative use of Willow (and, in general of any
free-text scoring system) is to allow the creation of a
group account. That is, an account that represents
not only one student but a group of them. All
members of the group can have the same user and
password. Groups can be formed using:
Self-selection: students themselves choose the
members of the group. Thus, they have to
introduce their names into the computer when
registering within the same group account.
Random assignment: the free-text scoring system
randomly chooses the number of students
indicated to create the group account. Students
can be notified by an automatically generated
mail of the names of their group mates.
Criterion-based selection: a criterion to group the
students is decided by the teacher and introduced
into the free-text scoring system. That way, the
system only chooses the students who meet the
criterion. As before, students can be notified by
mail of their group mates.
In Willow, we have implemented the self-
selection option. It is because we did not want to
randomly group the students as we consider too
important the choice of group mates to leave it at
random decision. On the other hand, we did not want
to ask the teachers to think about a criterion to group
the students, and they did not provide one
voluntarily.
A minimum and maximum numbers of students
per group should also be established. We have fixed
the minimum as 2 to permit pairs, and the maximum
as four to maximize the possibility that all students
contribute to the discussion with their own ideas.
Moreover, students can be given the possibility
of choosing to work simultaneously, or at different
times. In any case, even if they work at different
times, given that the system stores the questions
already passed, and the previously given answers,
students can read the answers of their mates, and
look at the feedback generated for them.
When the students work together
simultaneously, the interface should be distributed
so that it permits all students to read together the
question. As can be seen in Figure 1, only one
question is presented each time (in the case of
Willow it is not a change from the individual use of
the system as the interface was already like that to
focus all the students on the same question).
It is advisable to record in a log how long the
students take to answer the question, until they ask
for feedback to find out the time they have devoted
WEBIST 2009 - 5th International Conference on Web Information Systems and Technologies
504
to discuss the most adequate answer to the question
(it cannot be guaranteed, when the students are at
home, that they have not devoted that time to any
other task, but in general it is a good indicator).
It is expected that all of the students contribute
with their answers. In Willow, we have not
implemented any penalization factor in case that the
teacher observes that one of the students has not
participated, because it was the first time that the
individual version was being transformed to a
collaborative version. On the other hand, a role
management procedure has been used in the free-
text scoring system.
This procedure assigns a role to each member of
the group so that all of the students have an specific
work to do. For instance, one student can be the
reader of the question provided by the system, a
second student can be in charge of typing the agreed
answer, a third student can read aloud the feedback
generated by the system, and a fourth student can act
as a moderator and coordinator of the group.
That way, none of the students has the same role
than any other student. Provided that there are not
enough students to create 4-student groups, then the
same student can do more than one of the previously
mentioned tasks. The roles should be changed each
time that they log into the system. Similarly, if in the
previous session, two roles were assigned to the
same student, it should be implemented a restriction
that prevents the same student to take so many roles
in the next session.
5 THE EXPERIMENT
In the 2007-2008 academic year, we asked a group
of 45 students of the English Studies degree at our
university to voluntarily use Willow to review their
Pragmatics course, in their own time.
The teachers of the subject introduced 4 lessons
with 49 questions and an average of 3 correct
answers per question. They encouraged their
students to use the system as a means of reviewing
the course. In total, 22 students (49%) agreed to take
part in the experiment.
First of all, before they began using the system,
the students were asked whether they prefer to
review the course alone or in group. Of the 22
students, 12 (55%) opted for individual use while 10
worked in groups (2 groups of 3, and 2 of 2). To
avoid influencing the manner of use of the system,
only a brief (5 minute) introduction to Willow was
given.
Although Willow is an on-line system, students
were asked to work during one class (50 minutes) in
the lab, because it was the first time that the
collaborative possibility was used, and we wanted to
directly observe the students.
In fact, observing the students using the system
was very useful as we noted that students working
collaboratively got more involved than students
working individually.
Students who were in a group discussed how to
answer each question and, upon receiving feedback,
they discussed it among themselves.
Moreover, as Willow has a log system that
registers how long each student or group of students
have been working on each question, and how many
questions have answered, we could analyze the logs
gathered to contrast what we have perceived with
the data registered by the computer.
Willow’s logs confirmed our observations.
Students working collaboratively answered fewer
questions than students working alone. In fact, the
average number of questions answered by individual
students was 25, while the average number of
questions answered by the group students was 8.
This is because the groups have indeed spent
more time on each question. While a student
working individually on average has taken around
two minutes to answer a question and review the
feedback, students working collaboratively have
spent on average 6 minutes. In fact, in one case, a
group spent 11 minutes just on one question.
6 CONCLUSIONS AND FUTURE
WORK
This paper has contrasted collaborative and
individual use of an automatic and adaptive free-text
scoring system. Our goal was to discover whether
students review better working individually or
collaboratively with an on-line free-text scoring
system.
We devised the list of changes necessary to
implement the collaborative version of Willow. All
of them were quite simple to apply: to permit the
creation of group account, to establish a minimum
and maximum number of students per group, and to
check that only one question is shown at each time.
Previous experiments with the system have all
involved individual use. In the present experiment,
with students from a Pragmatics course at our
university during the 2007-2008 academic year, both
collaborative and individual use was allowed. 22
ON THE USE OF AN ON-LINE FREE-TEXT SCORING SYSTEM INDIVIDUALLY OR COLLABORATIVELY
505
students participated in a session during one of their
lessons in a computer lab. 12 students worked
individually and 10 worked in groups of 2 or 3.
We observed that students working alone spent
less time on each question than students working in
groups and thus, they could complete more
questions.
On the other hand, it was also noticed that the
collaborative use of the system increased the level of
reflection given to the construction of each answer,
and also led to discussion of the generated feedback
reports. These results support our initial hypothesis
that collaborative use of Willow is beneficial.
As future work, we plan to repeat the experiment
with more students in order to do a more complete
evaluation. We would also like to analyze whether
the collaborative versus individual use of Willow
has an impact in the final exam scores.
Provided that the results achieved are
satisfactory, our plan is to devise a new extension of
Willow, which allows new communication
possibilities between groups of students. For
instance, an on-line chat or forum to talk about how
to answer a certain question, or whether they agree
with the automatic feedback provided by the system.
ACKNOWLEDGEMENTS
This work has been sponsored by Spanish Ministry
of Science and Technology, project TIN2007-64718.
REFERENCES
Aguilar, G., Kaijiri, K., 2007. Design overview of an
adaptive computer-based assessment system.
Interactive Educational Multimedia, 116-130.
Ausubel, D. (1963), The Psychology of meaningful verbal
learning, New York: Grune and Stratton.
Cicognani, A., 2000. Concept Mapping as a Collaborative
Tool for Enhanced Online Learning, Educational
Technology & Society 3(3), 150-158.
Gokhale, A., 1995. Collaborative learning enhances cri-
tical thinking, Journal of Technology Education 7(1).
Guetl, C., Dreher, H., Williams, R. 2005. E-tester: A
computer-based tool for autogenerated question and
answer assessment. E-Learn, AACE.
Hearst, M., 2000. The debate on automated essay grading.
IEEE Intelligent Systems 5(15), 22-37.
Hermet, M., Szpakowicz, S., 2006. Symbolic assessment
of free text answers in a second language tutoring
system. Proceedings of the 10th Computer Assisted
Assessment conference, Loughborough University.
Johnson, R., Johnson, D., 1986. Action research:
Cooperative learning in the science classroom. Science
and Children 24, 31-32.
McCafferty, S.; Jacobs, G. & Iddings, A. 2006,
Cooperative Learning and Second Language
Teaching, Cambridge University Press.
Page, E., 1966. The imminence of grading essays by
computer. Phi Delta Kappan 47(1), 238-243.
Pérez-Marín, D., Alfonseca, E., Rodríguez, P. & Pascual-
Nieto, I. (2006). Willow: Automatic and adaptive
assessment of students free-text answers, in
Proceedings of the 22nd International Conference of
the Spanish Society for the Natural Language
Processing (SEPLN), 367-368.
Pérez-Marín, D. 2007. Adaptive Computer Assisted
Assessment of free-text students’ answers: an
approach to automatically generate students’
conceptual models, PhD thesis, Universidad
Autónoma de Madrid, Spain.
Puntambekar, S.; Stylianou, A. & Hubscher, R. 2003.
Improving Navigation and Learning in Hypertext
Environments With Navigable Concept Maps,
Human-Computer Interaction 18(4), 395--428.
Smith, K.; Sheppard, S.; Johnson, D. & Johnson, R.
(2005), 'Pedagogies of Engagement: Classroom-Based
Practices', Journal of Engineering Education 94(1),
87-101.
Rueda, U.; Larrañaga, M.; Elorriaga, J. & Arruarte, A.
2004, Validating DynMap as a mechanism to visualize
the student's evolution through the learning process,
Lecture notes in computer science 3320, 864-866.
Valenti, S., Neri, F., Cucchiarelli, A., 2003. An Overview
of Current Research on Automated Essay Grading.
Journal of Information Technology Education 2.
Vygotsky, L., 1978. Mind in society: The development of
higher psychological processes, Cambridge: Harvard
University Press.
Zapata-Rivera, J. & Greer, J. 2001. Externalising Learner
Modelling Representations, Proceedings of Workshop
on External Representations of AIED: Multiple Forms
and Multiple Roles, 71-76.
Zapata-Rivera, J. 2004. Interacting with Inspectable
Bayesian Student Models, International Journal of
Artificial Intelligence in Education 14(2), 127-163.
WEBIST 2009 - 5th International Conference on Web Information Systems and Technologies
506