Authors:
Thomas Karanikiotis, Kyriakos C. Chatzidimitriou and Andreas L. Symeonidis
Affiliation:
Dept. of Electrical and Computer Eng., Aristotle University of Thessaloniki, Thessaloniki, Greece
Keyword(s):
Source Code Formatting, Code Style, Source Code Readability, LSTM, SVM One-Class.
Abstract:
Source code readability and comprehensibility are continuously gaining interest, due to the wide adoption of component-based software development and the (re)use of software residing in code hosting platforms. Consistent code styling and formatting across a project tend to improve readability, yet most code formatting approaches rely on a set of rules defined by experts, which aspire to model a commonly accepted formatting. This approach relies on the experts’ own judgment, is time-consuming, and does not take into account the way a team develops software. As a result, it can become too intrusive and, in many cases, is not adopted. In this work we present an automated mechanism that, given a set of source code files, can be trained to recognize the formatting style used across a project and identify deviations from it, in a completely unsupervised manner. First, source code is transformed into small meaningful pieces, called tokens, which are used to train the models of our mechanism in order to predict the probability of a token being wrongly positioned. Preliminary evaluation on various axes indicates that our approach can effectively detect formatting deviations from the project’s code styling and provide actionable recommendations to the developer.
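The abstract only outlines the pipeline: tokenize the source, then train models that score how likely each token's position is. As a rough illustration, the Python sketch below assumes a hypothetical token vocabulary (`<WSn>` for a whitespace run of width n, `<NL>` for a newline, `ID`/`NUM` for abstracted identifiers and numbers) and replaces the paper's LSTM and One-Class SVM models with a much simpler conditional-frequency model. For every pair of adjacent code tokens it counts which whitespace the project uses between them, and flags gaps whose estimated probability falls below an arbitrary threshold of 0.2. The names `tokenize`, `formatting_events` and `FormattingModel` are illustrative only, not the authors' implementation.

```python
import re
from collections import Counter, defaultdict

# Hypothetical token vocabulary (an assumption, not the paper's exact scheme):
# whitespace runs become <WSn> (n = width), newlines become <NL>,
# identifiers/numbers are abstracted to ID/NUM, keywords and symbols are kept.
TOKEN_RE = re.compile(r"\n|[ \t]+|\w+|[^\w\s]")
KEYWORDS = {"if", "else", "for", "while", "return"}


def tokenize(source):
    """Turn source text into a sequence of code and formatting tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        text = match.group()
        if text == "\n":
            tokens.append("<NL>")
        elif text.isspace():
            tokens.append("<WS%d>" % len(text.expandtabs(4)))
        elif re.fullmatch(r"\w+", text):
            if text in KEYWORDS:
                tokens.append(text)
            else:
                tokens.append("NUM" if text.isdigit() else "ID")
        else:
            tokens.append(text)  # single punctuation/operator character
    return tokens


def formatting_events(tokens):
    """Pair each code token with the formatting that precedes it.

    The context is (previous code token, current code token); the target is
    the concatenated formatting tokens between them, or <NONE> if adjacent.
    """
    events, gap, prev = [], [], "<BOS>"
    for tok in tokens:
        if tok == "<NL>" or tok.startswith("<WS"):
            gap.append(tok)
        else:
            events.append(((prev, tok), "".join(gap) or "<NONE>"))
            prev, gap = tok, []
    return events


class FormattingModel:
    """Conditional-frequency stand-in for the paper's learned models (assumption)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, sources):
        # Learn the project's style purely from its own files (no labels).
        for source in sources:
            for context, ws in formatting_events(tokenize(source)):
                self.counts[context][ws] += 1
        return self

    def probability(self, context, ws):
        seen = self.counts.get(context)
        if not seen:
            return None  # unseen context: no evidence about the project style
        return seen[ws] / sum(seen.values())

    def deviations(self, source, threshold=0.2):
        """Return (event index, context, whitespace, probability) for unlikely gaps."""
        flagged = []
        for i, (context, ws) in enumerate(formatting_events(tokenize(source))):
            p = self.probability(context, ws)
            if p is not None and p < threshold:
                flagged.append((i, context, ws, p))
        return flagged


if __name__ == "__main__":
    project = [
        "if (x) {\n    return y;\n}\n",
        "if (a) {\n    return b;\n}\n",
    ]
    model = FormattingModel().fit(project)
    for finding in model.deviations("if(x){\n    return y;\n}\n"):
        print(finding)
```

Trained on the two project snippets, the model flags the missing space between `if` and `(` and between `)` and `{` in the third snippet, since the project never leaves those gaps empty. A learned sequence model such as an LSTM would condition on far more than a single token pair, which is the main limitation of this simplification.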