Question 2: How to construct and deliver
personalized interventions via peer-to-peer off-line
communications?
Due to the lack of face-to-face interaction in
online courses, it is difficult to track student
involvement and to detect early declines in their
performance through direct communication, as
instructors typically do in a classroom setting.
Fortunately, students usually leave many digital
footprints whenever they take courses, participate
in online forum discussions, submit homework, read
online slides, etc. Such digital footprints are
valuable information that helps instructors
understand student behaviours and make meaningful
interpretations and predictions from them. However,
because online courses are normally large classes,
manually analysing such footprint data would impose
a huge workload on instructors. In addition, the
highly heterogeneous student body makes any manual
analysis highly nontrivial, because conclusions drawn
for an individual student may or may not apply to
others, for example, students from a different major.
3 STATE OF THE ART
Schools offering fully online, hybrid and web-
enhanced degree programs have seen substantial
growth over the past ten years, and all signs indicate
that this rapid growth will continue (“How
Prevalent is Online Learning”, 2017). In addition,
Massive Open Online Courses (MOOCs) offer a wide
range of online educational programs from leading
universities (Combs & Mesko, 2015). One clear
advantage of an online course is that logs can provide
clues about learner experiences in relation to ease of
course navigation and perceived value of content
(Robyn, 2013). On the other hand, the flaws of
MOOCs have been eagerly dissected: high dropout
rates, limited social interaction, heavy reliance on
instructivist teaching, poor results for
underrepresented student populations, and so on
(Bunk et al., 2015). For example, a program
introduced by San Jose State University and Udacity
to run remedial courses in popular subjects ended in
a failure rate of up to 71% (Devlin, 2013).
Despite this, the amount of data generated by
online courses is skyrocketing. Researchers and
developers of online learning systems have begun to
explore analogous techniques for gaining insights
from learners’ activities online (U.S. Department of
Education, 2012).
Educational Data Mining (EDM) has emerged as an
independent research area in recent years (Baker et
al., 2010). Its main research focuses include student
behaviour modelling, student performance modelling
and assessment. Bayes' theorem, Hidden Markov
Models and decision trees are among the most popular
methods applied in this research (Pena-Ayala, 2014).
Methods such as Collaborative Filtering (CF)
(Ning, Desrosiers, & Karypis, 2015) and Matrix
Factorization (MF) (Koren, Bell, & Volinsky, 2009)
have attracted increasing attention in EDM
applications because they handle sparse data well in
ranking, prediction or classification tasks, and such
sparse data is particularly common in EDM. For
example, Sweeney et al. (2015, 2016) adopted
established methods, including singular value
decomposition (SVD), SVD-kNN and Factorization
Machines (FM), to predict next-term performance.
Polyzou and Karypis (2013) addressed
the future course grade prediction problem with three
approaches: course-specific regression, student-
specific regression and course-specific matrix
factorization. Moreover, neighborhood-based CF is
one of the most popular methods in EDM. Many
existing approaches (Ray & Sharma, 2011;
Bydzovska, 2015; Denley, 2013) predict grades
based on student similarity; that is, they first
identify similar students and use their grades to
estimate the grades of the students of interest.
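The neighborhood-based idea described above (find similar students, then use their grades) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any cited work: grades are held in a students × courses NumPy array with `np.nan` for courses not yet taken, similarity is an assumed inverse-RMS-distance measure over commonly completed courses, and the estimate is a similarity-weighted average of the k most similar students' grades. The function name `predict_grade` and the example data are hypothetical.

```python
import numpy as np


def predict_grade(grades: np.ndarray, student: int, course: int, k: int = 2) -> float:
    """Estimate grades[student, course] from the k most similar students.

    grades: students x courses array; np.nan marks a course not (yet) taken.
    Similarity (an assumption): 1 / (1 + RMS grade difference) over courses
    both students have completed, excluding the course being predicted.
    """
    target = grades[student]
    neighbours = []  # (similarity, that peer's grade in `course`)
    for other in range(grades.shape[0]):
        # Skip the student of interest and peers without a grade for `course`.
        if other == student or np.isnan(grades[other, course]):
            continue
        shared = ~np.isnan(target) & ~np.isnan(grades[other])
        shared[course] = False
        if not shared.any():
            continue  # no common history, so similarity is undefined
        rms = np.sqrt(np.mean((target[shared] - grades[other, shared]) ** 2))
        neighbours.append((1.0 / (1.0 + rms), grades[other, course]))
    if not neighbours:
        # Fallback: the course average (nan if nobody has a grade for it).
        return float(np.nanmean(grades[:, course]))
    top = sorted(neighbours, key=lambda n: n[0], reverse=True)[:k]
    sims = np.array([s for s, _ in top])
    vals = np.array([g for _, g in top])
    return float(sims @ vals / sims.sum())  # similarity-weighted average


# Hypothetical grade-point data: student 0 has not yet taken course 2.
grades = np.array([
    [3.7, 3.3, np.nan],
    [3.7, 3.3, 4.0],   # identical history to student 0
    [2.0, 2.3, 2.0],
    [3.3, 3.0, 3.7],
])
print(round(predict_grade(grades, student=0, course=2), 2))  # 3.87
```

The inverse-distance similarity is one simple choice that stays positive, so the weighted average is always well defined; cosine or Pearson similarity are common alternatives, and the cited works differ in which measure they use.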
To capture how student behaviour changes over time,
various dynamic models have been developed in EDM.
Sun et al. (2012, 2014) modelled
student preference change using a state space model
on latent student factors, and estimated student
factors over time using noncausal Kalman filters.
Similarly, Chua et al. (2013) combined Linear
Dynamical Systems (LDS) with Non-negative Matrix
Factorization (NMF) to model student dynamics.
Zhang et al. (2014) learned an explicit transition
matrix over the latent factors of each student, and
solved for the student and course latent factors and
the transition matrices within a Bayesian framework.
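To make the state-space idea concrete, the sketch below tracks one student's latent factor vector u_t across terms with a linear-Gaussian model: a forward Kalman filter followed by a Rauch-Tung-Striebel (RTS) smoother, which is one standard noncausal variant. It is a simplified illustration, not the formulation of Sun et al.: it assumes the course factor vector for each term is already known, exactly one grade is observed per term, and the transition matrix A and noise levels Q and R are given rather than learned. The function `kalman_track` and the example values are hypothetical.

```python
import numpy as np


def kalman_track(grades, course_factors, A, Q, R, mu0, P0):
    """Filter and smooth a student's latent factors u_t across terms.

    Assumed linear-Gaussian model (a simplification for illustration):
        u_t     = A u_{t-1} + w_t,   w_t ~ N(0, Q)
        grade_t = v_t . u_t + e_t,   e_t ~ N(0, R)
    where v_t = course_factors[t] is treated as known and u_0 ~ N(mu0, P0).
    Returns (filtered means, smoothed means), each of shape (T, d).
    """
    T, d = len(grades), len(mu0)
    m_pred, P_pred = np.zeros((T, d)), np.zeros((T, d, d))
    m_filt, P_filt = np.zeros((T, d)), np.zeros((T, d, d))
    m, P = np.asarray(mu0, float), np.asarray(P0, float)
    for t in range(T):
        if t > 0:  # predict step: propagate the previous estimate through A
            m, P = A @ m, A @ P @ A.T + Q
        m_pred[t], P_pred[t] = m, P
        h = course_factors[t]
        s = h @ P @ h + R              # innovation (prediction-error) variance
        K = P @ h / s                  # Kalman gain
        m = m + K * (grades[t] - h @ m)
        P = P - np.outer(K, h @ P)
        m_filt[t], P_filt[t] = m, P
    # RTS backward pass: later grades refine the estimates for earlier terms.
    m_smooth, P_smooth = m_filt.copy(), P_filt.copy()
    for t in range(T - 2, -1, -1):
        # Smoother gain G = P_filt[t] A^T P_pred[t+1]^{-1}, via a linear solve.
        G = np.linalg.solve(P_pred[t + 1], A @ P_filt[t]).T
        m_smooth[t] = m_filt[t] + G @ (m_smooth[t + 1] - m_pred[t + 1])
        P_smooth[t] = P_filt[t] + G @ (P_smooth[t + 1] - P_pred[t + 1]) @ G.T
    return m_filt, m_smooth


# Hypothetical example: true factors u = [1, 2] stay fixed over three terms.
V = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # course factors per term
y = V @ np.array([1.0, 2.0])                         # noise-free grades
filt, smooth = kalman_track(y, V, A=np.eye(2), Q=1e-6 * np.eye(2), R=1e-4,
                            mu0=np.zeros(2), P0=10.0 * np.eye(2))
print(filt[0].round(2), smooth[0].round(2))  # [1. 0.] [1. 2.]
```

In this example the first course only reveals the first latent dimension, so the filtered estimate for term 0 says nothing about the second dimension, while the smoother, which also uses later grades, recovers it. This is the practical benefit of a noncausal estimator when a student's full history is available.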
4 METHODOLOGY
To answer Question 1, we argue that applying DL and
ML tools to analyse the digital footprints of a
carefully chosen online course would make a good
pilot study. We believe it is necessary to focus in
particular on the following information: 1) the time
students spend reading slides and watching course
videos, 2) how often students log into the learning
system, 3) how often students participate in online
forum discussions and how much time they spend
there, 4) their
interactions with other students on the forum through