Georgia Frantzeskou, Stefanos Gritzalis, Stephen G. MacDonell


Cybercrime has increased in severity and frequency in recent years, and it has consequently become a major concern for companies, universities and organizations. The anonymity offered by the Internet makes it difficult to trace the identity of criminals. One field of study that has contributed to tracing criminals is authorship analysis of e-mails, messages and programs. This paper presents a study of source code authorship analysis. The aim of research in this area is to identify the author of a particular piece of code by examining its programming style characteristics. Borrowing extensively from the existing fields of linguistics and software metrics, this field investigates various aspects of computer program authorship. Source code authorship analysis could be applied in cases of cyber attacks, plagiarism and computer fraud. In this paper we present the set of tools and techniques used to achieve the goal of authorship identification, a review of the research efforts in the area and a new taxonomy of source code authorship analysis.
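To make the idea of "programming style characteristics" concrete, the following is a minimal sketch, not the method of this paper or of any work it reviews. It assumes a handful of illustrative layout and naming metrics (blank-line ratio, comment ratio, tab indentation, identifier length, underscore use) and attributes a sample to the known author whose metric vector is nearest in Euclidean distance. The function names `style_features`, `distance` and `closest_author` are hypothetical, and the metrics are deliberately left unscaled for brevity.

```python
import math
import re


def style_features(source):
    """Return a small vector of illustrative (assumed) style metrics."""
    lines = source.splitlines()
    non_blank = [ln for ln in lines if ln.strip()]
    total = max(len(lines), 1)
    # Naive tokenisation: keywords are counted as identifiers too.
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    n_ids = max(len(identifiers), 1)
    return [
        # Share of lines that are blank.
        (len(lines) - len(non_blank)) / total,
        # Share of lines containing '#' (naive: ignores string literals).
        sum("#" in ln for ln in lines) / total,
        # Share of non-blank lines that start with a tab.
        sum(ln.startswith("\t") for ln in non_blank) / max(len(non_blank), 1),
        # Mean identifier length (unscaled, so it dominates the distance).
        sum(len(name) for name in identifiers) / n_ids,
        # Share of identifiers that contain an underscore.
        sum("_" in name for name in identifiers) / n_ids,
    ]


def distance(a, b):
    """Euclidean distance between two equal-length metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_author(sample, known):
    """Return the key of `known` whose source is stylistically nearest."""
    target = style_features(sample)
    return min(
        known,
        key=lambda author: distance(target, style_features(known[author])),
    )


if __name__ == "__main__":
    known = {
        "alice": "def add_numbers(first_value, second_value):\n"
                 "    # sum\n"
                 "    return first_value + second_value\n",
        "bob": "def f(a,b):\n\treturn a+b\n\n\n",
    }
    print(closest_author("def g(x,y):\n\treturn x*y\n\n\n", known))
```

A realistic system would use many more metrics, normalise them, and replace the nearest-neighbour step with a statistical or machine-learning classifier; the sketch only shows the overall shape of metric extraction followed by comparison.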



Paper Citation

in Harvard Style

Frantzeskou G., Gritzalis S. and MacDonell S. G. (2004). SOURCE CODE AUTHORSHIP ANALYSIS FOR SUPPORTING THE CYBERCRIME INVESTIGATION PROCESS. In Proceedings of the First International Conference on E-Business and Telecommunication Networks - Volume 2: ICETE, ISBN 972-8865-15-5, pages 85-92. DOI: 10.5220/0001390300850092
