vantage, positive result Group C’s advantage) and the
“win rate” (Group P: 12, Group C: 20) was not sta-
tistically significant (sign test, p = 0.22). Further,
the proportion of solutions that passed all of the 11
tests given in the exam was practically equal for both
groups (Group P: 54.4%, Group C: 56.5%). Even
after a rigorous investigation, we were not able to identify a reason for the differing results between the Spring and Autumn experiments. Obviously, as the two course instances had different teachers, there may have been some differences in teaching emphases. However, despite lengthy discussions between the teachers, no clear causes were found.
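For illustration, the reported p-value can be reproduced with an exact two-sided sign test under the assumption that the 12 and 20 “wins” correspond to 32 non-tied pairs (ties, if any, excluded). The sketch below is written in Java; the class and variable names are ours and not part of the study materials.

    import java.math.BigDecimal;
    import java.math.BigInteger;
    import java.math.MathContext;

    // A minimal sketch of the two-sided sign test reported above.
    // Under the null hypothesis, the number of wins of either group
    // follows Binomial(32, 0.5).
    public class SignTestSketch {

        // Exact binomial coefficient C(n, k).
        static BigInteger choose(int n, int k) {
            BigInteger result = BigInteger.ONE;
            for (int i = 1; i <= k; i++) {
                result = result.multiply(BigInteger.valueOf(n - k + i))
                               .divide(BigInteger.valueOf(i));
            }
            return result;
        }

        public static void main(String[] args) {
            int n = 12 + 20;          // assumed number of non-tied pairs
            int k = Math.min(12, 20); // wins of the group with fewer wins

            // P(X <= k) for X ~ Binomial(n, 0.5), computed exactly.
            BigInteger tail = BigInteger.ZERO;
            for (int i = 0; i <= k; i++) {
                tail = tail.add(choose(n, i));
            }
            BigDecimal oneSided = new BigDecimal(tail).divide(
                    new BigDecimal(BigInteger.valueOf(2).pow(n)), MathContext.DECIMAL64);
            double twoSided = 2 * oneSided.doubleValue();

            System.out.printf("two-sided sign test p = %.2f%n", twoSided); // prints 0.22
        }
    }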
4.3.2 Solutions Get Better after Switching to the Computer
In an earlier Rainfall study by Lakanen et al. (2015),
it was hypothesized that if the students had the op-
portunity to write their code using the computer, they
would be less prone to certain errors. In Figures 2 and
3, the success rates for the selected five tests (tests T1,
T2, T3, T5, and T31) are displayed. These five tests stress the important features of the problem. Four of the five tests were presented to the students in the exam, while the fifth test (T31) was a “regular” average case. The purpose of these bar charts is to highlight how each group (Group P, Group C) succeeded in the process of moving from Assignment 1 to Assignment 2. The striped bar presents the test success rate for the paper-based phase of Assignment 1. The white bars in each column present the pass rates after finishing the computer-based part of Assignment 1, and the black bars in each group present the pass rates after completing Assignment 2. Note that the pass percentages cannot be higher than the share of compiling solutions in each group.
As anticipated, the solutions written on paper passed the fewest tests in almost every group. The pass rates then increased, with some rare exceptions, as the students reworked their solutions on the computer and used the test suite. Note that the two white bars in each column represent a similar situation after Assignment 1 from the students’ point of view, and are therefore comparable between Group P and Group C. The black bars are correspondingly comparable. The two experiments are kept separate due to the previously noted disparity in test passing rates and some changes in the study setup (see Section 3.2). Note again that the computer-based phase of Assignment 1 was better justified in the Autumn exam, which unavoidably improved the Autumn passing rates.
Further, the figures only display the average increase (or decrease), not how many students improved and how many worsened their solution. For almost every test, there were students whose reworked solution failed a test in A2 that it had passed in A1; in most cases this was because A2 specified that the lower limit should be returned in the case of zero rainfall. At worst, this was the case for 25% of the students in test T11. Yet, since even more students newly passed the tests, the total difference in passing rates remained positive.
It is also somewhat surprising that providing the
test suite produced only a moderate increase in the
success rate. On average, Group P’s solutions for Assignment 2 (A2) passed 15.7 percentage points (Spring) and 4.8 percentage points (Autumn) more tests than the solutions from the computer-based phase of Assignment 1 (A1c). Likewise, Group C’s test passing rates for A2 increased on average by 10.0 percentage points (Spring) and 7.5 percentage points (Autumn) compared to A1c. Thus, writing one’s own tests produced nearly as good solutions as the test suite provided by the teachers.
Some of the potential increase was diminished by the slight change in the assignment text: A1 did not define the return value in the case of a zero count of rainfall days (i.e., an empty array), and, as exceptions were not a central learning objective, many students returned a constant zero in their solution, which was an acceptable return value according to our test suite. In earlier empirical Rainfall studies, not defining what to do in this corner case seemed to be the rule rather than the exception. However, in A2, students were specifically instructed to return the value of the lower limit parameter in this corner case, which was also checked by our test suite. Still, many students did not pay heed to this instruction, which resulted in a deterioration in the pass rates of the test cases related to corner cases, such as a zero-day count.
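To make the corner case concrete, the following minimal Java sketch shows the behaviour A2 asked for; the method name, its signature, and the assumption that only values within the given limits are counted are illustrative and not taken from the actual exam hand-out.

    // Illustrative sketch of the A2 corner case (names and semantics assumed).
    public class RainfallSketch {

        static double average(int[] rainfall, int lowerLimit, int upperLimit) {
            double sum = 0.0;
            int count = 0;
            for (int value : rainfall) {
                // Assumption: only values within the given limits are counted.
                if (value >= lowerLimit && value <= upperLimit) {
                    sum += value;
                    count++;
                }
            }
            if (count == 0) {
                // A2 explicitly asks for the lower limit in this corner case;
                // many students instead kept returning a constant zero,
                // which A1’s test suite had accepted.
                return lowerLimit;
            }
            return sum / count;
        }

        public static void main(String[] args) {
            // Zero rainfall days: A2 expects the lower limit, not 0.
            System.out.println(average(new int[0], 5, 100)); // prints 5.0
        }
    }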
4.3.3 Detailed Analysis of the Five Selected Tests
Next, we take a look at the five selected tests in more
detail. Test T1 was based on the example data pre-
sented in the Spring exam. Group P in the Spring
exam managed to pass many more tests than Group
P in the Autumn exam. This is explained, at least
partly, by the change in the type of data input, which
was changed from double array to int array. Thus, in
the Spring exam, it was a natural choice to sum the
input into a floating point variable. However, in the
Autumn exam, many summed the input into an int
variable, and made the division without casting the
variables (sum or count) into a floating point. As stu-
dents proceeded to the computer-based phase, this er-
ror was mostly noticed and corrected.
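The error can be condensed into a few lines of Java; the data below is made up and serves only to illustrate the truncation, not the actual exam input.

    public class IntDivisionSketch {
        public static void main(String[] args) {
            int[] rainfall = {1, 2, 2}; // int input, as in the Autumn exam

            int sum = 0;
            for (int value : rainfall) {
                sum += value;
            }

            // Summing into an int and dividing by an int truncates the average.
            double wrong = sum / rainfall.length;          // yields 1.0
            // Casting either operand to double before dividing fixes this.
            double right = (double) sum / rainfall.length; // yields 1.666...

            System.out.println(wrong + " vs. " + right);
        }
    }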
In tests T2 and T3, the lower and upper limits were
changed, so the students had to take into account the