JSIMIL - A Java Bytecode Clone Detector

Luis Quesada, Fernando Berzal, Juan Carlos Cubero


We present JSimil, a code clone detector that uses a novel algorithm to detect similarities in sets of Java programs at the bytecode level. The proposed technique emphasizes scalability and efficiency. It also supports customization through profiles that allow the user to specify matching rules, system behavior, pruning thresholds, and output details. Experimental results reveal that JSimil outperforms existing systems. It is even able to spot similarities when complex code obfuscation techniques have been applied.


Paper Citation

in Harvard Style

Quesada L., Berzal F. and Cubero J. C. (2010). JSIMIL - A Java Bytecode Clone Detector. In Proceedings of the 5th International Conference on Software and Data Technologies - Volume 2: ICSOFT, ISBN 978-989-8425-23-2, pages 333-336. DOI: 10.5220/0003013403330336

in Bibtex Style

author={Luis Quesada and Fernando Berzal and Juan Carlos Cubero},
title={JSIMIL - A Java Bytecode Clone Detector},
booktitle={Proceedings of the 5th International Conference on Software and Data Technologies - Volume 2: ICSOFT},

in EndNote Style

JO - Proceedings of the 5th International Conference on Software and Data Technologies - Volume 2: ICSOFT
TI - JSIMIL - A Java Bytecode Clone Detector
SN - 978-989-8425-23-2
AU - Quesada L.
AU - Berzal F.
AU - Cubero J. C.
PY - 2010
SP - 333
EP - 336
DO - 10.5220/0003013403330336