ε     δ     |I|
1     0.05  8
1     0.1   8
1     0.3   8
0.95  0.05  13
0.95  0.1   11
0.95  0.3   9
0.90  0.05  13
0.90  0.1   10
0.90  0.3   8
0.85  0.05  14
0.85  0.1   12
0.85  0.3   7
Figure 2: Results for inconsistency detection for various ε
and δ in the PALMA dataset.
of lecturers. However, providing results for different
combinations of values of δ and ε (as in Figures 1
and 2) allows teachers to gain better insight into the
evaluation process of their lectures.
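Such a sweep over threshold combinations can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: it assumes that a pair of evaluations counts as inconsistent when the review similarity is at least ε and the absolute grade difference is at least δ. The names `count_inconsistent` and `sweep`, as well as the toy `pairs` data, are hypothetical.

```python
from itertools import product

def count_inconsistent(pairs, eps, delta):
    """Count pairs whose reviews are similar (>= eps) but grades differ (>= delta).

    ASSUMPTION: this thresholding rule is illustrative, not the paper's formal model.
    """
    return sum(1 for sim, grade_diff in pairs if sim >= eps and grade_diff >= delta)

def sweep(pairs, eps_values, delta_values):
    """Return {(eps, delta): |I|} for every combination of thresholds."""
    return {
        (eps, delta): count_inconsistent(pairs, eps, delta)
        for eps, delta in product(eps_values, delta_values)
    }

# Toy data (made up): (review similarity, absolute grade difference) per pair.
pairs = [(1.0, 0.4), (1.0, 0.0), (0.96, 0.1), (0.9, 0.05), (0.5, 0.8)]

table = sweep(pairs, eps_values=[1.0, 0.95, 0.9], delta_values=[0.05, 0.1, 0.3])
for (eps, delta), size in sorted(table.items(), reverse=True):
    print(f"eps={eps:<5} delta={delta:<5} |I|={size}")
```

Reporting the whole grid rather than a single setting lets a teacher see how sensitive the number of flagged pairs is to each threshold.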
4 CONCLUSIONS
Teachers should evaluate students’ solutions consistently;
however, this is not always the case. We proposed a
simple and easy-to-implement solution for detecting
inconsistencies in the evaluation process, that is, cases
where the textual reviews of two solutions provided for
the same task are very similar but the numerical grades
differ. Since, to the best of our knowledge, our work
is the first to deal with this issue, we also introduced
a formal model of the inconsistency detection problem.
Experiments on two real-world datasets show that
inconsistent evaluations can be found even on a small
scale.
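The idea of flagging similar reviews with differing grades can be illustrated with a short sketch. It is not the authors' implementation: it assumes whitespace tokenization, smoothed tf-idf weights, cosine similarity, grades normalized to [0, 1], and the thresholds ε (minimum similarity) and δ (minimum grade difference). All function names and the sample data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_vectors(texts):
    """Build sparse tf-idf vectors (dicts) from whitespace-tokenized texts."""
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    df = Counter(term for d in docs for term in d)  # document frequency
    # Smoothed idf (an assumption) keeps terms shared by all reviews non-zero.
    idf = {term: math.log((1 + n) / (1 + f)) + 1 for term, f in df.items()}
    return [{term: tf * idf[term] for term, tf in d.items()} for d in docs]

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def find_inconsistent(evaluations, eps, delta):
    """evaluations: list of (review_text, grade) pairs for a single task.

    Returns index pairs whose reviews are similar (>= eps)
    while their grades differ by at least delta.
    """
    vectors = tfidf_vectors([text for text, _ in evaluations])
    flagged = []
    for i, j in combinations(range(len(evaluations)), 2):
        similar = cosine(vectors[i], vectors[j]) >= eps
        differ = abs(evaluations[i][1] - evaluations[j][1]) >= delta
        if similar and differ:
            flagged.append((i, j))
    return flagged

# Made-up sample: two identical reviews with clearly different grades.
evaluations = [
    ("missing input validation", 0.9),
    ("missing input validation", 0.5),
    ("correct and well documented", 1.0),
]
print(find_inconsistent(evaluations, eps=0.95, delta=0.3))  # [(0, 1)]
```

With ε set to exactly 1, floating-point rounding in the cosine computation may call for a small tolerance in the comparison.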
We shared our findings with the colleagues who
provided us with the datasets, as well as with some
of our other colleagues at our university. Positive
feedback from these teachers indicates that the
introduced approach to evaluation inconsistency
detection is helpful in the teaching process and worth
further investigation.
Our further research will focus on the relationship
between assessment methods and the learning
outcomes of students, as well as on utilizing different
feature extraction methods (Petz et al., 2012;
Holzinger et al., 2012) in our approach.
ACKNOWLEDGEMENTS
This work was supported by the grants VEGA
1/0832/12 and VVGS-PF-2012-22 at the Pavol Jozef
Šafárik University in Košice, Slovakia. We would like
to thank our colleagues František Galčík for providing
us with the PAC dataset, and Ľubomír Šnajder and Ján
Guniš for providing us with the PALMA dataset.
REFERENCES
Banta, T. W., Jones, E. A., and Black, K. E. (2009).
Designing Effective Assessment: Principles and Profiles
of Good Practice. John Wiley and Sons, 2nd edition.
Beck, H. P., Rorrer-Woody, S., and Pierce, L. G. (1991).
The relations of learning and grade orientations to
academic performance. Teaching of Psychology,
18:35–37.
Carell, S. E. and West, J. E. (2010). Random assignment of
students to professors. Journal of Political Economy.
Holzinger, A., Yildirim, P., Geier, M., and Simonic, K.-M.
(2012). Quality-based knowledge discovery from
medical text on the Web: Example of computational
methods in Web intelligence. Springer.
Jindal, N. and Liu, B. (2008). Opinion spam and analysis.
In Proceedings of the First ACM International
Conference on Web Search and Data Mining. ACM,
New York, USA.
Kohn, A. (1999). From degrading to de-grading. Rev. ed.
Boston: Houghton Mifflin.
Koren, Y., Bell, R., and Volinsky, C. (2009). Matrix
factorization techniques for recommender systems.
Computer, 42(8):30–37.
Milton, O. (2009). Making Sense of College Grades: Why
the Grading System Does Not Work and What Can be
Done About It. San Francisco: Jossey-Bass.
Milton, O., Pollio, H. R., and Eison, J. A. (1986). Making
Sense of College Grades. San Francisco: Jossey-Bass.
Petz, G., Karpowicz, M., Fürschuß, H., Auinger, A., Winkler,
S., Schaller, S., and Holzinger, A. (2012). On text
preprocessing for opinion mining outside of laboratory
environments. In Active Media Technology, pages
618–629. Springer.
Ramos, J. (2003). Using tf-idf to determine word relevance
in document queries.
Robertson, S. (2004). Understanding inverse document
frequency: On theoretical arguments for idf. Journal
of Documentation, 60.
Rockoff, J. and Speroni, C. (2010). Subjective and
objective evaluations of teacher effectiveness. Labour
Economics, 18(5):687–696.
Spärck Jones, K. (1972). A statistical interpretation of term
specificity and its application in retrieval.
Suskie, L. (2009). Assessing Student Learning: A common
sense guide. Malden: Jossey-Bass A Wiley Imprint,
San Francisco, 2nd edition.
Tan, P.-N., Steinbach, M., and Kumar, V. (2005).
Introduction to Data Mining. Addison-Wesley.
Walvoord, B. E. and Anderson, V. J. (2009). Effective
Grading: A Tool for Learning and Assessment in College.
Paperback, 2nd edition.
Wu, H. C., Luk, R. W. P., Wong, K. F., and Kwok, K. L.
(2008). Interpreting tf-idf term weights as making
relevance decisions. ACM Trans. Inf. Syst.,
26(3):13:1–13:37.
Detection of Inconsistencies in Student Evaluations