fication is focused on the differences in coding style instead of the structure of the task. Still, a certain complexity is needed in the sense that it must be possible to express one's own coding style. Tasks that require only simple output of data or pure 'data container' classes should be avoided.
The usefulness of the features depends on the data set, and it is not possible to construct a reduced feature set that is useful for all data sets. It is, however, possible and useful to reduce the features after they have been computed for the whole training set by removing features with low variance and zero mutual information. Mutual information quantifies the statistical dependence between two data sets, in this case between each column (feature) of the training set and the label vector y (Ross, 2014).
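To make this concrete, the following is a minimal sketch of such a reduction step, assuming the features are held in a NumPy matrix X (one row per submission, one column per feature) and the author labels in a vector y; scikit-learn's mutual_info_classif implements the estimator of Ross (2014). The threshold value and the function name reduce_features are illustrative, not part of the application described here.

    from sklearn.feature_selection import VarianceThreshold, mutual_info_classif

    def reduce_features(X, y, variance_eps=1e-8):
        # Drop features whose variance over the training set is (near) zero.
        X_reduced = VarianceThreshold(threshold=variance_eps).fit_transform(X)
        # Estimate the mutual information between each remaining feature and
        # the label vector y (Ross, 2014) and drop features with zero estimate.
        mi = mutual_info_classif(X_reduced, y, random_state=0)
        return X_reduced[:, mi > 0.0]

Note that the resulting column selection must be stored and re-applied to any file that is classified later, since the classifier is trained on the reduced feature set.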
Finally, it should be noted that all students must
agree that their tasks will be used for fraud detection.
This agreement can be obtained when students upload
their work.
Further research is necessary to rule out cases in which a student is assumed to be the author of a file although they are not, i.e. cases of cheating that the application fails to detect. If the training set is large, such a misattribution is unlikely under the assumption that a file whose real author is not in the training set is attributed to one of the classes at random. However, the attribution will most likely not be random, especially if the students know how the cheating detection works: they might deliberately obtain code from a person whose coding style they believe resembles their own.
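As a rough illustration of the random-attribution assumption: with n students in the training set, the probability that a file written by an outside author is attributed to one particular student is 1/n, e.g. only 1% for a course of n = 100. A deliberate imitation of a specific classmate's style is, of course, not bounded by this estimate.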
The next step is the integration of the cheating detection into an e-assessment framework. This also includes taking into account measures that prevent a cheating attempt while the exam takes place. Ever since the first electronic examinations, attempts have been made to secure the computers on which the exam is taken and to prevent the use of forbidden software and the internet. This is relatively easy for workstations that are under the control of the examiners and whose configuration is known and homogeneous (Wyer and Eisenbach, 2001).
REFERENCES
Aleksić, V. and Ivanović, M. (2016). Introductory programming subject in European higher education. Informatics in Education, 15(2):163–182.
Caliskan-Islam, A., Harang, R., Liu, A., Narayanan, A.,
Voss, C., Yamaguchi, F., and Greenstadt, R. (2015).
De-anonymizing programmers via code stylometry. In
24th USENIX Security Symposium (USENIX Security
15), pages 255–270, Washington, D.C. USENIX As-
sociation.
Heintz, A. (2017). Cheating at Digital Exams. Master’s the-
sis, Norwegian University of Science and Technology,
Norway.
Malpohl, G., Prechelt, L., and Philippsen, M. (2002). Finding plagiarisms among a set of programs with JPlag. JUCS - Journal of Universal Computer Science, 8(11).
Noguera, I., Guerrero-Roldán, A.-E., and Rodríguez, M. E. (2017). Assuring authorship and authentication across the e-assessment process. In Technology Enhanced Assessment, pages 86–92. Springer International Publishing.
Opgen-Rhein, J., Küppers, B., and Schroeder, U. (2018). An application to discover cheating in digital exams. In Proceedings of the 18th Koli Calling International Conference on Computing Education Research, Koli Calling '18, pages 20:1–20:5, New York, NY, USA. ACM.
Peltola, P., Kangas, V., Pirttinen, N., Nygren, H., and
Leinonen, J. (2017). Identification based on typing
patterns between programming and free text. In Pro-
ceedings of the 17th Koli Calling International Con-
ference on Computing Education Research, Koli Call-
ing ’17, pages 163–167, New York, NY, USA. ACM.
Ross, B. C. (2014). Mutual information between discrete
and continuous data sets. PLoS ONE, 9(2):e87357.
Schleimer, S., Wilkerson, D. S., and Aiken, A. (2003). Win-
nowing: Local algorithms for document fingerprint-
ing. In Proceedings of the 2003 ACM SIGMOD Inter-
national Conference on Management of Data, SIG-
MOD ’03, pages 76–85, New York, NY, USA. ACM.
Sheard, J. and Dick, M. (2012). Directions and dimen-
sions in managing cheating and plagiarism of it stu-
dents. In Proceedings of the Fourteenth Australasian
Computing Education Conference - Volume 123, ACE
'12, pages 177–186, Darlinghurst, Australia. Australian Computer Society, Inc.
Siegfried, R. M., Siegfried, J. P., and Alexandro, G. (2016). A longitudinal analysis of the Reid list of first programming languages. Information Systems Education Journal, 14(6):47–54.
Williams, K. M., Nathanson, C., and Paulhus, D. L. (2010).
Identifying and profiling scholastic cheaters: Their
personality, cognitive ability, and motivation. Jour-
nal of Experimental Psychology: Applied, 16(3):293–
307.
Wyer, M. and Eisenbach, S. (2001). Lexis: An exam invigilation system. In Proceedings of the Fifteenth Systems Administration Conference (LISA XV), page 199, Berkeley, CA. USENIX Association.
Zobel, J. (2004). "Uni cheats racket": A case study in plagiarism investigation. In Proceedings of the Sixth Australasian Conference on Computing Education - Volume 30, ACE '04, pages 357–365, Darlinghurst, Australia. Australian Computer Society, Inc.