matically change the generated operations and the execution
profile, thus being as hard as implementing a solution from
scratch.
7 CONCLUSION
This paper has proposed a robust approach to source code
plagiarism detection that combines static information from
software instructions with dynamic behavior from acquired
profiles and traces. The semantic analysis techniques were
effective in detecting plagiarized solutions in a real case
study scenario. Despite a few false positive cases, X9 detected
every plagiarized solution, including the heavily obfuscated
ones.
Source code similarity comparison is a challenging task, and
the proposed approach leaves plenty of room for improvement,
such as: experiments with a larger set of labeled solutions,
including ones written in interpreted and scripting languages,
which would improve the evaluation of X9; in the semantic
analysis, ignoring known algorithms from lectures and mandatory
library or system calls, which would avoid false positive (FP)
cases; and the release of the X9 tool for public use as a web
service, allowing further feedback to guide its development.
ICEIS 2018 - 20th International Conference on Enterprise Information Systems