status, and disability. Beyond this, features such as whether a student is aiming for a bachelor of business or a bachelor of applied science are used. Therefore, the prediction here is largely independent of the students' behavior during their study. The CART approach used
in (Kovačić, 2010) was based on 453 data records
and reached an overall correct classification rate of 60.5%. In (Jishan et al., 2015), the goal is a prediction model for the final grade. The data set used contains 181 instances from a course titled Numerical Analysis at North South University, Dhaka, Bangladesh. The highest accuracy, about 75%, was reached using artificial neural networks and Naive Bayes classification.
2 KNOWLEDGE DISCOVERY IN EXAM DATA BASES AND USE CASES
We use Knowledge Discovery in Data Bases according to (Fayyad et al., 1996) and apply this concept to the demands of exam data bases, especially in Germany or in states with similar restrictions concerning data protection and informational self-determination. Germany has quite strict laws concerning which data is processed, where, and by whom. For example, aspects such as social background or migration background are not covered for data collection and processing by (Law of the FRG, 2016). Therefore, the suggested framework tries to be as careful as possible concerning this issue and shows how it is still possible to provide a tool for student counseling based on exam data records within these limitations.
2.1 Suggested Framework
Because of the discussed limitations, we developed the following processing workflow. As figure 1 illustrates, the starting point is the data base of the examination office of a university, which contains all data about a student that the university holds. To keep the data as secure as possible, in a first step the data of the relevant features is anonymized and copied to a new data base. This is a fully automatic process that can be performed on the computer system of the examination office on a regular schedule. Therefore, the risk of stolen data or illegal use is the same as before: at this point, the suggested process does not involve any new party or environment. The features, as well as the necessary preprocessing of the data, are discussed in more detail in section 2.2. The last preprocessing step yields a training set from which a prediction system, e.g. a multilayer perceptron (MLP), is trained. The important aspect is that most software systems based on machine learning algorithms, once they have been trained, can act independently of the training data base. For example, in artificial neural networks such as the multilayer perceptron, the knowledge is compressed into the weights $w_{ij}$ of each layer, which amount to a few matrices of double-precision values. Concerning data security, a single record of the training data base cannot be read directly from these matrices. Therefore, a trained system can be distributed, e.g. among other institutions of a university, without conflicting with data protection requirements.
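The two properties this workflow relies on can be illustrated with a minimal sketch: only the relevant features are copied (identifying fields are dropped), and the deployable predictor consists of nothing but per-layer weight matrices and bias vectors applied to one input vector. The field names, the network shape, and the sigmoid activation are hypothetical choices for illustration, not taken from the paper, and the random weights merely stand in for trained ones.

```python
import numpy as np

# Hypothetical identifying fields that stay in the examination office data base.
IDENTIFYING_FIELDS = {"student_id", "name", "birth_date", "address"}
# Hypothetical relevant features copied to the anonymized data base.
RELEVANT_FEATURES = ["credits_sem1", "credits_sem2", "mean_grade", "failed_exams"]


def anonymize(records):
    """Copy only the relevant features; every identifying field is dropped."""
    assert not IDENTIFYING_FIELDS & set(RELEVANT_FEATURES)  # sanity check
    return [{f: r[f] for f in RELEVANT_FEATURES} for r in records]


def to_matrix(rows):
    """Turn anonymized rows into the input matrix X (one row per student)."""
    return np.array([[row[f] for f in RELEVANT_FEATURES] for row in rows], dtype=float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mlp_predict(weights, biases, x):
    """Forward pass: the deployed model is only the per-layer weight
    matrices (entries w_ij) and bias vectors, with no training records."""
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(a @ w + b)
    return a


# Usage: anonymize one record, then predict with stand-in weights.
records = [
    {"student_id": 4711, "name": "Jane Doe", "birth_date": "1995-05-01",
     "address": "Main St. 1", "credits_sem1": 30, "credits_sem2": 24,
     "mean_grade": 2.3, "failed_exams": 1},
]
anonymized = anonymize(records)
X = to_matrix(anonymized)

rng = np.random.default_rng(0)
# Random stand-ins for trained weights; the training itself is omitted here.
weights = [rng.normal(size=(4, 3)), rng.normal(size=(3, 1))]
biases = [np.zeros(3), np.zeros(1)]
p_success = mlp_predict(weights, biases, X)  # shape (1, 1), value in (0, 1)
```

Only `weights` and `biases` would need to be shipped to another institution; the anonymized records are not required at prediction time, although the input vector of the student being rated still is.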
Although the trained software unit itself no longer contains personal data of the students from the training set, it still needs the input data vector of the student whose study success it is to predict. Thus, there are at least two applications for this software system:
1. Use as personal advisor or alarm system for the students themselves. If a student signs up for the alarm system every exam period, the student's behavior can be rated and the student may receive feedback in the form of a traffic light rating. Red would mean that the student should consider seeking help, e.g. at a student counseling office. Green means everything is fine, and yellow is in between. For the student, yellow might mean watching his or her own steps carefully and considering what was different compared to the last green semester. If the last exam period was red, yellow could mean that the student is on the right track and things are getting better.
2. Use as tool for the counseling offices. In the same way as a single student can use it, it can also act as a second opinion for professional counselors. This is always possible, because a counselor can access the student's data during counseling.
If the current law allows it, it would of course be possible, e.g., to process the data of all students a counselor is responsible for and to look for candidates who might need additional support. In countries in which this is not automatically permitted by law, there is in general the option to ask students for permission when they enroll at the university. Because this will be non-obligatory, only a subset of students will be covered by this usage scenario.
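Both applications can be sketched as follows, assuming the predictor returns a probability of study success in [0, 1]. The paper does not specify how predictions map to colors, so the thresholds (0.4 and 0.7), the `consented` flag, the record keys, and all function names are hypothetical; the lambda used below is only a stand-in for a trained predictor.

```python
from enum import Enum


class Light(Enum):
    RED = "red"
    YELLOW = "yellow"
    GREEN = "green"


# Hypothetical thresholds on the predicted probability of study success.
RED_BELOW = 0.4
GREEN_FROM = 0.7
_ORDER = {Light.RED: 0, Light.YELLOW: 1, Light.GREEN: 2}


def traffic_light(p_success: float) -> Light:
    """Map a predicted success probability to a traffic light rating."""
    if p_success < RED_BELOW:
        return Light.RED
    if p_success >= GREEN_FROM:
        return Light.GREEN
    return Light.YELLOW


def trend(previous: Light, current: Light) -> str:
    """Interpret the current rating relative to the last exam period,
    e.g. yellow after red reads as 'improving'."""
    if _ORDER[current] > _ORDER[previous]:
        return "improving"
    if _ORDER[current] < _ORDER[previous]:
        return "worsening"
    return "unchanged"


def screen_candidates(students, predict):
    """Counselor use: rate only students who gave permission and
    return the identifiers of those rated red."""
    flagged = []
    for s in students:
        if not s.get("consented", False):
            continue  # no permission, so the record is not processed
        if traffic_light(predict(s["features"])) is Light.RED:
            flagged.append(s["student"])
    return flagged


# Usage with a stand-in predictor that returns the feature value itself.
students = [
    {"student": "s1", "consented": True, "features": 0.2},
    {"student": "s2", "consented": False, "features": 0.1},
    {"student": "s3", "consented": True, "features": 0.9},
]
flagged = screen_candidates(students, predict=lambda f: f)  # ["s1"]
```

Student `s2` would also be rated red, but is skipped because no permission was given, which mirrors the restriction to a consenting subset of students.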
2.2 Practical and Data Quality Issues
After the introduction of the bachelor and master degree programs in association with the Bologna Process, the universities in Germany have built up a wide range of different degree courses. All of them use
the European Credit Transfer and Accumulation Sys-
KDIR 2016 - 8th International Conference on Knowledge Discovery and Information Retrieval