DISTRIBUTED SYSTEM FOR DISCOVERING SIMILAR DOCUMENTS

Jan Kasprzak, Michal Brandejs, Miroslav Kripač, Pavel Šmerk

Abstract

One drawback of e-learning methods such as Web-based submission and evaluation of students’ papers and essays is that they make it easier for students to plagiarize the work of other people. In this paper we present a computer-based system for discovering similar documents, which has been in use at Masaryk University in Brno since August 2006 and which will also be used in the forthcoming Czech national archive of graduate theses. We focus on practical aspects of the system: achieving near real-time response to newly imported documents, and the computational feasibility of handling large sets of documents on commodity hardware. We also discuss the possibilities and problems of parallelizing the system to run on a distributed cluster of computers.



Paper Citation


in Harvard Style

Kasprzak J., Brandejs M., Kripač M. and Šmerk P. (2008). DISTRIBUTED SYSTEM FOR DISCOVERING SIMILAR DOCUMENTS. In Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8111-36-4, pages 437-440. DOI: 10.5220/0001687604370440


in Bibtex Style

@conference{iceis08,
author={Jan Kasprzak and Michal Brandejs and Miroslav Kripač and Pavel Šmerk},
title={DISTRIBUTED SYSTEM FOR DISCOVERING SIMILAR DOCUMENTS},
booktitle={Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2008},
pages={437--440},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001687604370440},
isbn={978-989-8111-36-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - DISTRIBUTED SYSTEM FOR DISCOVERING SIMILAR DOCUMENTS
SN - 978-989-8111-36-4
AU - Kasprzak J.
AU - Brandejs M.
AU - Kripač M.
AU - Šmerk P.
PY - 2008
SP - 437
EP - 440
DO - 10.5220/0001687604370440