A Preprocessing Design Scheme for Sequential Pattern Analysis of a Student Database

R. Campagni, D. Merlini, M. C. Verri

Abstract

In a data mining project developed on a relational database, a significant effort is often needed to construct the data set for the analysis. The database usually contains a series of normalized tables that need to be joined, aggregated and processed appropriately to build the data set. This process generates various SQL queries that are written independently of each other, in a disordered manner. As a result, the database grows with tables and views that are not present at the conceptual level, which can cause problems for the further development of the database. In this paper we consider a typical database containing data about students, courses and exams, and illustrate some SQL transformations that build a data set for a sequential pattern analysis, possibly combined with clustering and classification. In particular, we introduce in the student database some interesting patterns representing the relationship between the exams taken by students in various periods and the career of each student. This is achieved by introducing a particular encoding of the career of a student. The resulting table can be analyzed with clustering and classification algorithms. We present a case study following this organization.
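As a rough illustration of the kind of transformation described above, the following sketch joins a normalized exams table with a courses table and groups each student's exams by period, giving one ordered sequence per student. The schema (`courses`, `exams`), the column names, the integer `period` field and the sample rows are all hypothetical, chosen only for this example; the paper's actual tables and its career encoding are not reproduced here.

```python
import sqlite3
from collections import defaultdict


def build_sequences(conn):
    """Return {student_id: [courses of period 1, courses of period 2, ...]}.

    Each inner list holds the course names of the exams a student passed
    in one period; the outer list is ordered by period.
    """
    rows = conn.execute(
        """
        SELECT e.student_id, e.period, c.name
        FROM exams AS e
        JOIN courses AS c ON c.course_id = e.course_id
        ORDER BY e.student_id, e.period, c.name
        """
    ).fetchall()

    by_student = defaultdict(dict)
    for student_id, period, course in rows:
        by_student[student_id].setdefault(period, []).append(course)

    # Order the periods explicitly so the sequence does not depend on row order.
    return {
        student: [periods[p] for p in sorted(periods)]
        for student, periods in by_student.items()
    }


def demo():
    """Build a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE courses (course_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE exams (
            student_id INTEGER,
            course_id  INTEGER REFERENCES courses(course_id),
            period     INTEGER,
            grade      INTEGER
        );
        INSERT INTO courses VALUES
            (10, 'Algebra'), (11, 'Programming'), (12, 'Databases');
        INSERT INTO exams VALUES
            (1, 10, 1, 28), (1, 11, 1, 30), (1, 12, 2, 27),
            (2, 11, 1, 25), (2, 10, 3, 24);
        """
    )
    try:
        return build_sequences(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    for student, sequence in demo().items():
        print(student, sequence)
```

Keeping the join and grouping in one named function, rather than in scattered ad hoc queries and views, is one way to avoid the uncontrolled growth of derived tables that the abstract warns about.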


Paper Citation


in Harvard Style

Campagni R., Merlini D. and Verri M. (2016). A Preprocessing Design Scheme for Sequential Pattern Analysis of a Student Database. In Proceedings of the 8th International Conference on Computer Supported Education - Volume 2: CSEDU, ISBN 978-989-758-179-3, pages 99-106. DOI: 10.5220/0005789600990106


in Bibtex Style

@conference{csedu16,
author={R. Campagni and D. Merlini and M. C. Verri},
title={A Preprocessing Design Scheme for Sequential Pattern Analysis of a Student Database},
booktitle={Proceedings of the 8th International Conference on Computer Supported Education - Volume 2: CSEDU},
year={2016},
pages={99-106},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005789600990106},
isbn={978-989-758-179-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Conference on Computer Supported Education - Volume 2: CSEDU
TI - A Preprocessing Design Scheme for Sequential Pattern Analysis of a Student Database
SN - 978-989-758-179-3
AU - Campagni R.
AU - Merlini D.
AU - Verri M.
PY - 2016
SP - 99
EP - 106
DO - 10.5220/0005789600990106