at Bergen University College, Norway. The class
consists of approximately 100 students. Entry
requirements for the program are fairly strict, so the
students are generally well qualified and highly
motivated.
Although not done deliberately for research
purposes, it is fortunate for comparison that the
course has been virtually identical for at least three
years, apart from the flipped classroom experiment.
This includes the student group, the teacher, the
curriculum, and the form, difficulty, and grading of
the exam. Hence, we will argue that the flipped
classroom experiment is likely to be an important
factor behind any significant changes.
Our analysis is based on three different
datasets. The first dataset consists of the
results from a specific survey among the students
regarding the “flipped classroom” experiment (see
Appendix 1 for a translated version of the survey).
Students were asked to complete this survey
electronically and anonymously after the course and
the exam; 45 students did so. Although a higher
response rate would have been beneficial, we still
think the results are fairly reliable. Each respondent
was asked to rate 14 different statements, and only
two of the 45 × 14 = 630 values are missing.
Moreover, most students supplemented their
quantitative responses with sensible and interesting
written comments. These are
not directly included in our analysis, but the overall
impression is very well aligned with the quantitative
results.
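As a minimal sketch of how such per-statement averages can be computed while skipping the two missing values (the response matrix below is hypothetical dummy data, not the actual survey responses):

    import numpy as np

    # Hypothetical stand-in for the real data: 45 respondents x 14
    # statements, rated 1-5, with the two missing answers stored as NaN.
    rng = np.random.default_rng(0)
    responses = rng.integers(1, 6, size=(45, 14)).astype(float)
    responses[3, 7] = np.nan
    responses[22, 1] = np.nan

    # Average score per statement, ignoring missing values.
    statement_means = np.nanmean(responses, axis=0)
    for i, mean in enumerate(statement_means, start=1):
        print(f"Statement {i:2d}: mean = {mean:.2f}")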
Our second dataset is based on general student
evaluations of the course over a period of three years
– 2012, 2013 and 2014. These surveys are done
every year after classes are finished. The survey
consists of a number of different questions about
each course, the general learning environment, etc.,
but for our analysis we have singled out a single
question:
“How was your learning outcome from the
lectures in the course Strategic Management?”
(Scale from 1 to 5).
The “flipped classroom” experiment was
conducted in 2014, so we have results from two
years without the experiment and one year with the
experiment. As mentioned above, the course has
otherwise not changed, so the results should be
comparable. Again, the surveys were completed
electronically and anonymously, after all lectures
but before the final exam, in order to prevent any
(dis)satisfaction with the exam from influencing the
results. The response rates were similar to that of
the experiment-specific survey, with 46, 38, and 50
students completing the surveys in 2012, 2013, and
2014, respectively.
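To illustrate how such year-to-year ratings might be compared, a simple sketch using means and a Welch t-test follows; the rating vectors are hypothetical placeholders, not the actual evaluation data:

    import numpy as np
    from scipy import stats

    # Hypothetical 1-5 learning-outcome ratings (placeholders for the
    # real data): 46, 38 and 50 respondents in 2012, 2013 and 2014.
    ratings = {
        2012: np.array([3, 4, 3, 2, 4] * 9 + [3]),        # 46 responses
        2013: np.array([3, 3, 4, 2, 3] * 7 + [4, 3, 3]),  # 38 responses
        2014: np.array([4, 4, 5, 3, 4] * 10),             # 50 responses
    }

    for year, r in ratings.items():
        print(f"{year}: mean = {r.mean():.2f} (n = {len(r)})")

    # Welch's t-test: the 2014 (flipped classroom) ratings against
    # the pooled 2012-2013 ratings.
    before = np.concatenate([ratings[2012], ratings[2013]])
    t, p = stats.ttest_ind(ratings[2014], before, equal_var=False)
    print(f"t = {t:.2f}, p = {p:.3f}")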
The third and final dataset simply consists of
historical exam results. For administrative purposes,
the Strategic Management grade is merged with
another grade before being published as the official
grade for a larger course. The grades we have studied are
the unpublished, separate grades for Strategic
Management, i.e., the course area where the
experiment was conducted. We use the grade
distribution (A-F) as our measure of grades, and we
have grades from around 100 students each year.
There will of course always be fluctuations in exam
results, depending on both the exam itself and the
student group.
Hence, we think that this dataset is the least
reliable source of the three. However, the exam form
has remained the same, the student quality should be
fairly similar, and the exam has been set and
graded by the same person each year, aiming for the
same level across the three years. So we will argue
that although they are not conclusive on their own,
differences in grades could provide some useful
information as part of a larger picture.
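As an illustration of how such grade differences could be assessed, the A-F distributions of two years can be compared with a standard chi-square test; the counts below are hypothetical, not the actual exam results:

    from scipy.stats import chi2_contingency

    # Hypothetical A-F grade counts for two cohorts of roughly
    # 100 students each.
    grades_2013 = [8, 18, 30, 25, 14, 5]   # A, B, C, D, E, F
    grades_2014 = [12, 24, 32, 20, 9, 3]

    # A small p-value would indicate that the two distributions
    # differ by more than chance fluctuation alone.
    chi2, p, dof, expected = chi2_contingency([grades_2013, grades_2014])
    print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")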
More fundamentally, grades also seem a better
measure of what we are really interested in. It can
be (and is) debated whether grades are a good
reflection of actual learning outcomes, but we
assume that there is at least a clear positive
correlation, such that a student who has learned
more in a course will, on average, get a better grade
in that course. The fundamental purpose of any
pedagogical or technological experiment should be
to improve learning, not merely to introduce
methods that students like.¹ Thus, it is natural to
include grade patterns, if possible, when evaluating
such experiments.

¹ Student evaluations of experiments can be important for
further implementations; this factor is discussed more
thoroughly in the conclusion.
3 RESULTS
In this section, we will present the key findings from
the three datasets. First, consider the survey where
students specifically evaluate the experiment.

Table 1: Students' evaluation of the flipped classroom
experience.

The table above presents the main results.
Overall, the numbers suggest that the experiment
was relatively successful, with most averages
between 3.2 and 4.2. First, a few comments on the
low scores for statements 1) and 6).