pects, as the projects’ domain?
To answer these questions, we detect trends in quality
measures over six versions of six projects, and
compare the trends appearing in migrated and
non-migrated projects (RQ1) and in the different
domains (RQ2). The trend analysis was performed
with the Mann-Kendall test, applied to
each quality indicator. Our results suggest
that, also taking into account some peculiarities of the
extracted data, no significant difference exists in trends
across the migration/non-migration and domain factors.
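As an illustration of the kind of trend test mentioned above, the following is a minimal sketch of the Mann-Kendall test in Python. It is not the implementation used in the study: the function name mann_kendall is hypothetical, the normal approximation with tie and continuity corrections is an assumed variant, and the input series (the Checkstyle LOC values from Table 1) only stands in for a quality indicator. With only six observations per project the normal approximation is coarse, so exact tables of the S statistic may be preferable in practice.

```python
import math


def mann_kendall(values):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Assumed variant: normal approximation with tie correction and
    continuity correction (not necessarily the one used in the paper).
    """
    n = len(values)

    # S = number of increasing pairs minus number of decreasing pairs (i < j).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S, reduced for every group of tied values.
    tie_counts = {}
    for v in values:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    var_s = n * (n - 1) * (2 * n + 5)
    for t in tie_counts.values():
        var_s -= t * (t - 1) * (2 * t + 5)
    var_s /= 18.0

    # Continuity-corrected standard score; S == 0 means no trend at all.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Illustrative input only: LOC of the six Checkstyle versions in Table 1.
checkstyle_loc = [23045, 23369, 23416, 23891, 25727, 28056]
s, z, p = mann_kendall(checkstyle_loc)
print(f"S = {s}, Z = {z:.3f}, p = {p:.4f}")
```

A positive S with a small p-value indicates a monotonically increasing trend; a series is usually reported as trendless when p exceeds the chosen significance level.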
The rest of the paper is organized as follows.
Section 2 introduces the related work. Section 3 describes
the experiment setup. Section 4 presents our main results.
Section 5 deals with the threats to validity. Section 6
concludes and outlines future developments.
2 RELATED WORK
Several recent works in the literature address
quality analysis related to GitHub.
In (Jarczyk et al., 2014), the authors investigate if
there are any significant correlations between project
quality and the characteristics of the team members
using GitHub. To this aim, they defined two metrics,
one reflecting project popularity, and one reflecting
the quality of support offered by team members to
users, obtained using survival analysis techniques
applied to issues reported by users.
In (Yu et al., 2014), the authors analyze the suit-
ability of GitHub to support distributed software de-
velopment. They review different kinds of version
control systems and study the dynamics of GitHub,
i.e., its ability and scalability to process different re-
quests and to provide different services to different
GitHub projects and users.
In (Vasilescu et al., 2014), the authors found em-
pirical evidence of continuous integration in a social-
coding world from GitHub. They discovered that in
projects older than two years and projects with not too
many contributors, pull requests are much more likely
to result in successful builds than direct commits.
In (Vendome et al., 2015), the authors performed
an empirical study to quantitatively and qualitatively
investigate when and why developers change software
licenses. They identify license changes in 1,731,828
commits, representing the entire history of 16,221
Java projects hosted on GitHub.
In (Alexandre Decan et al., 2016), the authors
explore how the use of GitHub influences the R
ecosystem, for the distribution of packages and for
inter-repository package dependency management.
They show that many R packages hosted on GitHub
have inter-repository dependency problems prohibiting
their automatic installation.

Table 1: Summary characteristics of the systems.

System      Version    Date        CF  NOP    NOC      LOC

CODE ANALYSIS
Checkstyle¹: Coding standard checker
            5.3        2010-10-19  SF   22    360   23,045
            5.5        2011-11-05  SF   22    364   23,369
            5.6        2012-09-18  SF   22    364   23,416
            5.7        2014-02-03  GH   22    368   23,891
            5.8        2014-10-05  GH   22    380   25,727
            6.14.1     2015-12-30  GH   24    482   28,056
Classycle²: Dependency analyser and checker
            1.1.1      2005-06-05  SF    7     82    3,936
            1.3.1      2007-05-12  SF    7     87    4,503
            1.3.3      2008-05-24  SF    7     92    4,658
            1.4        2011-04-10  SF    7    100    5,007
            1.4.1      2012-09-10  SF    7    101    5,057
            1.4.2      2014-11-01  SF    7    104    5,117

WEB CRAWLER
Heritrix³: Extensible, web-scale, qualitative, web-crawler
            2.0.2      2008-11-08  SF   14    168   14,791
            3.0.0      2009-12-05  SF   17    184   15,272
            3.1.0      2011-10-21  SF   17    175   15,226
            3.1.1      2012-08-08  GH   17    175   15,366
            3.2.0      2014-01-11  GH   18    184   15,367
            Master     2016-01-21  GH   18    181   15,318
Web Harvest⁴: Web data extraction tool
            0.26       2006-09-28  SF   11    104    4,090
            0.261      2006-10-12  SF   11    104    4,090
            0.3        2006-10-27  SF   11    104    4,126
            0.5        2007-01-16  SF   12    108    4,416
            1.0        2007-10-17  SF   15    248   10,977
            2.0        2010-02-17  SF   18    355   19,090

ORM
Hibernate⁵: Idiomatic persistence for Java and DB
            3.0.1      2007-06-29  SF   51    969   69,557
            3.5.0      2010-03-31  SF  166  2,448  107,972
            3.6.0 CR2  2010-09-29  SF  173  2,677  132,295
            3.6.0 F    2010-10-13  GH  173  2,678  132,751
            3.6.2 F    2011-03-10  GH  173  2,699  133,771
            5.0.5      2015-12-03  GH  189  3,148  213,094
OrmLite⁶: Lightweight Object Relational Mapping
            4.22       2011-05-19  SF   11    156    8,527
            4.29       2011-10-25  SF   11    161   10,287
            4.37       2012-03-21  SF   11    160   11,241
            4.41       2012-06-06  SF   11    167   11,729
            4.48       2013-12-16  SF   11    177   12,699
            4.49 S     2015-02-18  SF   11    193   13,307

Legend: GH: GitHub; SF: SourceForge.

¹ http://checkstyle.sourceforge.net/
² http://classycle.sourceforge.net/
³ https://webarchive.jira.com/wiki/display/Heritrix
⁴ http://web-harvest.sourceforge.net/
⁵ http://hibernate.org/orm/
⁶ http://ormlite.com/
3 EXPERIMENT SETUP
For this experiment, we selected six projects from
three domains, reported in Table 1: Code Analysis,
Web Crawler, ORM. The table also reports the
analyzed versions (and their respective dates), the
corresponding code forge (CF), and the number of packages
(NOP), classes (NOC) and lines of code (LOC).
Moreover, the projects were chosen to be compile-ready
and to ship with binaries, to avoid errors during the
analysis performed by the tools. The projects are written
in Java and the analyzed versions are available at their
respective code forge.
ENASE 2017 - 12th International Conference on Evaluation of Novel Approaches to Software Engineering