Table 16: Keyword analysis for page 24 from chapter nine after clustering.
Chapter: 9, Page: 24, Cosine: 0.625
keywords  select  where  group by  group  having  !      min
count     2       1      2         2      1       2      2
tf        1       0.5    1         1      0.5     1      1
idf       0.347   0.484  1.109     1.051  1.301   1.352  1.556
tf*idf    0.347   0.242  1.109     1.051  0.651   1.352  1.556
are regularly used in the English language without any SQL context. Since the lecture slides are written in English, it is difficult to count only those occurrences of these keywords that are actually used in an SQL context. This is a challenge for our recommendation process when such a keyword rarely appears in the corpus: it then receives a high idf weight and therefore has a strong influence on the topic assigned to the page. On a page populated with many SQL keywords, the negative effect of one incorrectly recognized keyword may be offset by the tf*idf values of the other keywords. If a page contains only a few keywords, however, a word such as IN or AND that is used outside any SQL context can mislead the recommendation process for that page.
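The weighting in Table 16 and the effect described above can be illustrated with a short sketch. The table's values are consistent with tf being each keyword's count divided by the page's maximum keyword count, and tf*idf being that value multiplied by the idf row; the idf values are copied from the table rather than recomputed, since the corpus size is not given here. The second part uses plain cosine similarity on hypothetical vectors: the sparse page, the exercise vectors (set equal to each clean page), the misrecognized keyword `in`, and its weight of 2.0 are all assumptions for illustration, as are the helper names. This is not the authors' implementation.

```python
import math

# Values transcribed from Table 16 (chapter 9, page 24).
KEYWORDS = ["select", "where", "group by", "group", "having", "!", "min"]
COUNTS = [2, 1, 2, 2, 1, 2, 2]
IDF = [0.347, 0.484, 1.109, 1.051, 1.301, 1.352, 1.556]


def tf_weights(counts):
    """Normalize raw keyword counts by the page's most frequent keyword."""
    top = max(counts)
    return [c / top for c in counts]


def tf_idf(counts, idf):
    """Multiply each normalized tf by the keyword's idf weight."""
    return [t * w for t, w in zip(tf_weights(counts), idf)]


def cosine(a, b):
    """Cosine similarity of two sparse keyword vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def with_spurious(vec, keyword="in", weight=2.0):
    """Copy vec and add one misrecognized keyword (hypothetical weight)."""
    noisy = dict(vec)
    noisy[keyword] = weight
    return noisy


# Keyword-rich page: the tf*idf vector from Table 16.
rich_page = dict(zip(KEYWORDS, tf_idf(COUNTS, IDF)))
# Hypothetical sparse page that only mentions SELECT and WHERE.
sparse_page = {"select": 0.347, "where": 0.484}

# Hypothetical exercise vectors equal to each clean page, so the clean
# similarity is 1.0 and any drop comes from the spurious keyword alone.
rich_sim = cosine(rich_page, with_spurious(rich_page))
sparse_sim = cosine(sparse_page, with_spurious(sparse_page))
print(f"keyword-rich page: {rich_sim:.3f}, sparse page: {sparse_sim:.3f}")
```

With these assumed numbers, the keyword-rich page keeps a similarity of roughly 0.80 after the spurious keyword is added, while the sparse page drops to roughly 0.29, which mirrors the mitigation argument in the paragraph above.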
7 CONCLUSION
This research aims to improve students’ performance by reducing unsystematic trial-and-error behavior while they work on online SQL exercise tasks in our SQLValidator. To this end, we have implemented a strategy in which suitable slides from the lecture materials are mapped to the respective SQL exercise tasks and recommended to students as hints while they work on a task. We have described, evaluated, and further optimized this strategy via join detection and clustering. As shown in the evaluation section, our implementation reaches a precision of 0.767 and an F_β score (β = 0.5) of 0.505, thus justifying our strategy. The next stage in this recommender-system track is to assess the impact of the recommendations on students’ engagement and SQL skill acquisition. Students also tend to share solution code. Although the distribution of solutions among students cannot be stopped, as it is also part of learning, a future direction is to implement a feature that discourages plagiarism.
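For reference, the F_β score combines precision P and recall R as F_β = (1 + β²)·P·R / (β²·P + R); with β = 0.5, precision is weighted more heavily than recall. The sketch below is a minimal, hypothetical helper (the name `f_beta` is ours); the recall behind the reported 0.505 is not restated in this section, so no paper values are plugged in.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score; beta < 1 favors precision, beta > 1 favors recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative inputs only: swapping P and R shows the precision bias of beta = 0.5.
print(f_beta(0.8, 0.4), f_beta(0.4, 0.8))
```

With β = 1 the same function reduces to the usual F1 score, the harmonic mean of precision and recall.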
ACKNOWLEDGMENTS
This work was supported by the German Federal
Ministry of Education and Research [grant number
16DHB 3008].
DATA 2022 - 11th International Conference on Data Science, Technology and Applications