Identifying Logical Dependencies from Co-Changing Classes
Adelina Diana Stana and Ioana Şora
Department of Computer and Information Technology, Politehnica University Timisoara, Romania
Keywords:
Software Evolution, Logical Dependencies, Structural Dependencies.
Abstract:
Emerging software engineering approaches support the idea that logical dependencies should be included next
to structural dependencies in general methods and tools for dependency management. However, logical de-
pendencies are still hard to identify, as not all co-changes during the system evolution represent true logical
dependencies. Our work identifies a set of factors that can be used to filter the recordings of class co-changes
in order to find valid logical dependencies. In order to find the characteristics of logical dependencies, we
analyze the quantitative relationships between the sets of logical and structural dependencies and their inter-
section and differences. We present results obtained through an experimental study on a set of 27 open source
software projects written in Java and C# with their historical evolutions which sum up to over 70000 com-
mit transactions. Identifying valid logical dependencies from co-changing classes will enhance dependency
models used in various software analysis activities.
1 INTRODUCTION
Coupling reflects the degree of interdependence be-
tween different software modules, being a measure of
how closely connected they are. Coupling should be
low in order to ensure the testability, reusability, and
evolvability properties of modules. The traditional
approach to coupling was to quantify the structural
dependencies or interactions between modules,
both of which can be determined by source code analysis.
The state of the art has found that modules may
present different kinds and degrees of interdepen-
dence, even if no structural dependencies can be
found by analyzing the source code. Gall (Gall
et al., 1998) identified as logical coupling between
two modules the fact that these modules repeatedly
change together during the historical evolution of the
software system. This can be an indicator of a logical
dependency between these modules.
The concepts of logical coupling and logical de-
pendencies were first used in different analysis tasks,
all related to changes: for software change impact
analysis (Ren et al., 2005), for identifying the po-
tential ripple effects caused by software changes dur-
ing software maintenance and evolution (Oliva and
Gerosa, 2015), (Oliva and Gerosa, 2011), (Poshy-
vanyk et al., 2009), (Kagdi et al., 2010) or for their
link to defects (Wiese et al., 2015), (Zimmermann
et al., 2004).
The current trend recommends that general depen-
dency management methods and tools should also in-
clude logical dependencies besides the structural de-
pendencies (Oliva and Gerosa, 2011), (Ajienka and
Capiluppi, 2017). Different applications based on
dependency analysis could be improved if, beyond
structural dependencies, they also take into account
the hidden non-structural dependencies. For example,
works which investigate different methods for architectural
reconstruction (Şora et al., 2010), (Şora, 2013), (Şora, 2015),
all of them based on the information provided by structural
dependencies, could enrich their dependency models by taking into account
also logical dependencies. However, a thorough sur-
vey (Ducasse and Pollet, 2009) shows that historical
information has been rarely used in architectural re-
construction. Another survey (Shtern and Tzerpos,
2012) mentions one possible explanation why historical
information has rarely been used in architectural
reconstruction: the size of the extracted information.
One problem is the size of the extraction process,
which has to analyze many versions from the
historical evolution of the system. Another problem
is the large number of pairs of classes which record
co-changes and how they relate to the number of pairs
of classes with structural dependencies. Logical
dependencies should integrate harmoniously with structural
dependencies in a unitary dependency model:
valid logical dependencies should not be omitted from
the dependency model, but structural dependencies
should not be engulfed by questionable logical dependencies
generated by casual co-changes. Thus, in order to add
logical dependencies alongside structural dependencies
in dependency models, class co-changes must be filtered
until only a reduced but relevant set of valid logical
dependencies remains.

Stana, A. and Şora, I.
Identifying Logical Dependencies from Co-Changing Classes.
DOI: 10.5220/0007758104860493
In Proceedings of the 14th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2019), pages 486-493
ISBN: 978-989-758-375-9
Copyright © 2019 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved

In the next section we analyze the state of the art
results for determining logical dependencies from the
point of view of their quantitative relationship with
structural dependencies. Starting from this analysis,
in Section 3 we identify a set of factors that can be
used to filter the recordings of class co-changes such
that valid logical dependencies are identified and we
formulate the research questions. In order to answer
these research questions, we have built a tool that ex-
tracts structural and logical dependencies in differ-
ent scenarios. We have analyzed several open-source
software systems of different sizes with our tool, ob-
taining the experimental results presented in Section
4. Section 5 discusses the experimental results and
answers the research questions.
2 STATE OF THE ART
Several studies have investigated quantitative aspects
of logical dependencies and their interplay with
structural dependencies. Oliva and Gerosa (Oliva
and Gerosa, 2011), (Oliva and Gerosa, 2015) have
found first that the set of co-changed classes was
much larger compared to the set of structurally cou-
pled classes. They identified structural and logical de-
pendencies from 150000 revisions from the Apache
Software Foundation SVN repository. Also they con-
cluded that in at least 91% of the cases, logical depen-
dencies involve files that are not structurally related.
This implies that not all of the change dependencies
are related to structural dependencies and there could
be other reasons for software artifacts to be change
dependent.
Ajienka and Capiluppi also studied the interplay
between logical and structural coupling of software
classes. In (Ajienka and Capiluppi, 2017) they per-
form experiments on 79 open source systems: for
each system, they determine the sets of structural
dependencies, the set of logical dependencies and
the intersections of these sets. They quantify the
overlapping or intersection of these sets, coming to
the conclusion that not all co-changed class pairs
(classes with logical dependencies) are also linked
by structural dependencies. One other interesting aspect,
which has not been investigated by the authors in
(Ajienka and Capiluppi, 2017), is the total number of
logical dependencies relative to the total number of
structural dependencies of a software system. However,
they provide the raw data of their measurements,
and we calculated the ratio between the number of
logical dependencies and the number of structural
dependencies for all the projects analyzed by them: the
resulting average ratio is 12. This means that, using their
method of detecting logical dependencies for a system,
the number of logical dependencies exceeds the number
of structural dependencies by one order of magnitude.
We consider that such a large number of
logical dependencies needs additional filtering.
Another kind of non-structural dependencies are
the semantic or conceptual dependencies (Poshy-
vanyk et al., 2009), (Kagdi et al., 2010). Seman-
tic coupling is given by the degree to which the
identifiers and comments from different classes are
similar to each other. Semantic coupling could be
an indicator for logical dependencies, as studied by
Ajienka et al in (Ajienka et al., 2018). The exper-
iments showed that a large number of co-evolving
classes do not present semantic coupling, adding to
the earlier research which showed that a large number
of co-evolving classes do not present structural coupling.
All these experimental findings raise the question
whether it is a legitimate approach to accept all
co-evolving classes as logical coupling.
Changes made to two components in the same
commit do not necessarily indicate the co-evolution
of the two. These changes could be completely un-
related. The study (Yu, 2007) acknowledges the fact
that evolutionary coupling could also be determined
accidentally by two components changing in the same
commit (independent evolution, as it is called) and
this will bring noise to the measurement of evolution-
ary coupling.
Zimmermann et al (Zimmermann et al., 2004) in-
troduced data mining techniques to obtain association
rules from version histories. The mined association
rules have a probabilistic interpretation based on the
amount of evidence in the transactions they are de-
rived from. This amount of evidence is determined
by two measures: support and confidence. They de-
veloped a tool to predict future or missing changes.
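
The two measures can be illustrated with a minimal sketch. It assumes that each commit transaction is given as a list of changed file names, that support is taken as the number of transactions in which both files change, and that confidence is that count divided by the number of transactions in which the antecedent file changes. The function name `mine_pair_rules` and the sample data are hypothetical and are not taken from the tool of Zimmermann et al.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions):
    """Return {(a, b): (support, confidence)} for every rule a -> b.

    support    = number of transactions in which a and b change together
    confidence = support / number of transactions in which a changes
    """
    item_count = Counter()
    pair_count = Counter()
    for files in transactions:
        unique = sorted(set(files))          # ignore duplicates within a commit
        item_count.update(unique)
        pair_count.update(combinations(unique, 2))

    rules = {}
    for (a, b), together in pair_count.items():
        rules[(a, b)] = (together, together / item_count[a])
        rules[(b, a)] = (together, together / item_count[b])
    return rules


# Hypothetical version history with four commit transactions.
transactions = [
    ["A.java", "B.java"],
    ["A.java", "B.java", "C.java"],
    ["A.java"],
    ["B.java", "C.java"],
]
rules = mine_pair_rules(transactions)
print(rules[("A.java", "B.java")])  # support 2, confidence 2/3
print(rules[("C.java", "B.java")])  # support 2, confidence 1.0
```

In this toy history, C.java never changes without B.java, so the rule C.java -> B.java has full confidence even though its support is only 2.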
In order to add logical dependencies alongside structural
dependencies as inputs for methods and tools
for dependency management and analysis, class co-changes
must be filtered until only a reduced but
relevant set of valid logical dependencies remains.
3 RESEARCH QUESTIONS
In this work, we explore several ways of filtering logical
dependencies. We identify the following factors that
could be used to filter logical dependencies: the maximum
size of commit transactions which are accepted
to generate logical dependencies, the minimum number
of occurrences for a co-change to be considered
a logical dependency, and whether changes in comments
are accepted as a source of logical dependencies.
We will address the following research questions:
Question 1. Which is the most frequent size for a
commit transaction?
Motivation: We calculate the size of a commit
transaction as the total number of source code files
that have changed. Even though version control
best practices encourage developers to commit
often, which implies small commit transactions,
the size of a commit transaction also depends on the
developers' culture. We think that finding the most
frequent size of a commit transaction could help in
setting ranges for what counts as a normal-sized commit
transaction for these systems, and in selecting the target
group of commit transactions from which we can extract
logical dependencies.
Question 2. Is it necessary to set a threshold on
the size of commit transactions which are considered
to generate valid logical dependencies?
Motivation: A big commit transaction can indicate
that a merge with another branch or a folder renaming
has been made. In this case, a series of irrelevant
logical dependencies can be introduced, since
not all the files are updated at the same time for a
development reason. Different works have chosen fixed
threshold values for the maximum number of files accepted
in a commit. Capiluppi and Ajienka, in their
works (Ajienka and Capiluppi, 2017), (Ajienka et al.,
2018), only take into consideration commits with fewer
than 10 source code files changed when building the logical
dependencies. The research of Beck and Diehl (Beck
and Diehl, 2011) only takes into consideration transactions
with up to 25 files. The research (Oliva and
Gerosa, 2011) also provided a quantitative analysis of
the number of files per revision: based on the analysis
of 40,518 revisions, the mean value obtained for the
number of files in a revision is 6 files. However, the standard
deviation value shows that the dispersion is high.
Based on all these considerations, we will experiment
with different threshold values for the maximum size
of commit transactions which are accepted to generate
logical dependencies.
Question 3. Can considering changes which are only
in comments as valid lead to additional logical
dependencies? How many logical dependencies are
introduced by considering comment changes as valid
changes, and to what extent can this influence the
analysis?
Motivation: Not all the commits that have source
code files changed include real code changes; some
of them contain only comment changes. We consider
that there is probably no logical dependency between
two classes that change at the same time only through
comment changes. It could be that someone is adding
implementation documentation or copyright or ownership
information. Some studies have not considered
this aspect, so we will analyse the impact of counting
or not counting changes in comments towards
logical dependencies.
Question 4. How many occurrences of a logical
dependency are needed to consider it a valid logical
dependency?
Motivation: One occurrence of a logical dependency
between two classes can be a valid logical dependency,
but can also be a coincidence. Taking into
consideration only logical dependencies with multiple
occurrences as valid dependencies can lead to more
accurate logical dependencies and more accurate results.
On the other hand, if the project studied has
a relatively small number of commits, the probability
of finding multiple updates of the same classes at the
same time can be small, so filtering by the number
of occurrences can lead to filtering out all the logical
dependencies extracted. Given that we will
study multiple projects of different sizes and numbers
of commits, we will also analyze the impact of this
filtering on different projects.
Question 5. How does filtering affect the overlap
between structural and logical dependencies?
Motivation: Traditional software engineering considers
coupling as the cause for co-changes, thus logical
and structural dependencies should present a very
large overlap. However, in (Oliva and Gerosa, 2011)
and (Ajienka and Capiluppi, 2017) it has been ex-
perimentally determined that a very large number of
logical dependencies are outside the intersection with
structural dependencies. We will investigate the influ-
ence of different filtering degrees on the intersections
between logical and structural dependencies.
4 EXPERIMENTAL RESULTS
We have analyzed a set of open-source projects found
on GitHub¹ (Kalliamvakou et al., 2016) in order to extract
the structural and logical dependencies between
classes. Table 1 enumerates all the systems studied.
The first column assigns the project IDs; the second column
shows the project name; the third column shows the number
of entities (classes and interfaces) extracted; the fourth
column shows the number of most recent commits analyzed
from the active branch of each project; and the
fifth column shows the language in which the project
was developed.
¹ http://github.com/

Table 1: Summary of open source projects studied.
ID  Project            Nr. of entities  Nr. of commits  Language
1   bluecove           586              894             Java
2   aima-java          987              818             Java
3   powermock          1084             893             Java
4   restfb             783              1188            Java
5   rxjava             2673             2468            Java
6   metro-jax-ws       1103             2222            Java
7   mockito            1409             1572            Java
8   grizzly            1592             3122            Java
9   shipkit            242              1483            Java
10  OpenClinica        1653             3749            Java
11  robolectric        2050             5029            Java
12  aeron              541              5101            Java
13  antlr4             1381             3449            Java
14  mcidasv            805              3668            Java
15  ShareX             919              2505            C#
16  aspnetboilerplate  2353             1615            C#
17  orleans            3485             3353            C#
18  cli                767              2397            C#
19  cake               2250             1853            C#
20  Avalonia           1677             2445            C#
21  EntityFramework    7107             2443            C#
22  jellyfin           2179             4065            C#
23  PowerShell         861              2033            C#
24  WeiXinMPSDK        2029             2723            C#
25  ArchiSteamFarm     117              2181            C#
26  VisualStudio       1016             4417            C#
27  CppSharp           259              3882            C#

In a first experiment, we determined the commit
sizes cs for all commit transactions for all projects and
grouped them into 4 categories: small transactions
(ST), when cs ≤ 5; medium transactions (MT), when
5 < cs ≤ 10; large transactions (LT), when 10 < cs ≤ 20;
and very large transactions (VLT), when 20 < cs.
Also, we counted how many logical dependencies are
generated by transactions from each category. The
results are presented in Tables 2 and 3 as percent
distributions.
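
The grouping into the four size categories can be sketched as follows. This is only an illustration of the category boundaries stated above; the function names and the sample commit sizes are hypothetical and do not come from our tool.

```python
from collections import Counter


def size_category(cs):
    """Map a commit size (number of changed source files) to its category."""
    if cs <= 5:
        return "ST"    # small transaction
    if cs <= 10:
        return "MT"    # medium transaction
    if cs <= 20:
        return "LT"    # large transaction
    return "VLT"       # very large transaction


def size_distribution(commit_sizes):
    """Percentage of commit transactions falling into each size category."""
    categories = ("ST", "MT", "LT", "VLT")
    total = len(commit_sizes)
    if total == 0:
        return {cat: 0.0 for cat in categories}
    counts = Counter(size_category(cs) for cs in commit_sizes)
    return {cat: 100.0 * counts[cat] / total for cat in categories}


# Hypothetical commit sizes for one project.
sizes = [1, 2, 3, 5, 6, 10, 11, 20, 21, 150]
print(size_distribution(sizes))
# {'ST': 40.0, 'MT': 20.0, 'LT': 20.0, 'VLT': 20.0}
```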
In the main series of experiments, for each system,
we extracted its structural dependencies, its logical
dependencies and determined the overlap between the
two dependencies sets, in various experimental condi-
tions.
One variable experimental condition is whether
changes located in comments contribute towards log-
ical dependencies. This condition distinguishes be-
tween two different cases:
• with comments: a change in source code files is
  counted towards a logical dependency, even if the
  change is inside comments in all files
• without comments: commits that changed source
  code files only by editing comments are ignored
  as logical dependencies
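
One possible way to recognise a comment-only change is sketched below. It is a rough approximation and not the procedure implemented in our tool. It assumes Java/C#-style `//` and `/* */` comments, removes them with a regular expression that skips string and character literals, and compares the remaining code after collapsing whitespace. As a consequence, whitespace-only edits are also treated as non-code changes, and a comment removed from the middle of an expression may be reported as a code change. All names are hypothetical.

```python
import re

# String and char literals are matched first so that "//" or "/*" inside them
# is not mistaken for a comment; only the last two alternatives are comments.
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'      # string literal (kept)
    r"|'(?:\\.|[^'\\])*'"     # character literal (kept)
    r"|//[^\n]*"              # line comment (dropped)
    r"|/\*.*?\*/",            # block comment (dropped)
    re.DOTALL,
)


def strip_comments(source):
    """Replace every comment with a single space and keep literals untouched."""
    def replace(match):
        text = match.group(0)
        return " " if text.startswith("/") else text
    return _TOKEN.sub(replace, source)


def normalized_code(source):
    """Code without comments, with runs of whitespace collapsed."""
    return " ".join(strip_comments(source).split())


def is_comment_only_change(old_source, new_source):
    """True if the two versions differ, but not in their code."""
    return (old_source != new_source
            and normalized_code(old_source) == normalized_code(new_source))


old = 'class A {\n  int x = 1; // counter\n  String u = "http://e.org";\n}\n'
new = ('class A {\n  /* Copyright */\n  int x = 1; // the counter\n'
       '  String u = "http://e.org";\n}\n')
print(is_comment_only_change(old, new))                         # True
print(is_comment_only_change(old, old.replace("= 1", "= 2")))   # False
```

In the "without comments" case, a commit would be ignored when every changed source file in it satisfies `is_comment_only_change`.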
Table 2: The percent distribution of commit transactions in
4 categories according to their size.
ID   ST     MT     LT     VLT
1    82.55  10.85  4.14   2.46
2    71.39  13.08  7.82   7.7
3    73.91  13.33  6.27   6.49
4    84.51  8.5    3.11   3.87
5    75.2   11.26  5.92   7.62
6    87.8   6.35   2.57   3.29
7    78.18  11.96  5.73   4.13
8    79.63  9.67   5.77   4.93
9    83.82  9.58   4.18   2.43
10   82.58  9.66   5.31   2.45
11   82.96  8.55   4.89   3.6
12   87.69  8.51   2.96   0.84
13   81.19  8.23   5.54   5.03
14   96.7   1.94   0.71   0.65
15   89.27  7.11   2.17   1.45
16   77.28  12.76  5.51   4.46
17   70.3   12.53  9.48   7.69
18   73.93  12.27  6.63   7.18
19   69.99  14.41  6.91   8.69
20   68.79  10.1   7.44   13.66
21   60.66  17.63  10.04  11.66
22   73.97  12.63  6.94   6.47
23   83.13  6.64   4.18   6.05
24   79.43  8.56   5.66   6.35
25   94.54  3.62   1.1    0.73
26   76.21  9.74   5.84   8.22
27   86.17  8.53   4.12   1.18
Avg  79.7   9.93   5.22   5.16

In all cases, we varied the following threshold values:
• commit size (cs): the maximum size of commit
  transactions which are accepted to generate logical
  dependencies. The values for this threshold
  were 5, 10, 20 and no threshold (infinity).
• number of occurrences (occ): the minimum number
  of repeated occurrences for a co-change to be
  counted as a logical dependency. The values for this
  threshold were 1, 2, 3 and 4.
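
The effect of the two thresholds can be sketched as follows. This is a minimal illustration, not our tool: it assumes that each commit transaction is already available as a list of changed class names (one class per source file), and that a co-change is an unordered pair of classes. The function and parameter names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def logical_dependencies(commits, max_commit_size=None, min_occurrences=1):
    """Return the set of class pairs accepted as logical dependencies.

    max_commit_size : cs threshold; commits changing more classes are skipped
                      (None means no threshold)
    min_occurrences : occ threshold; a pair must co-change at least this often
    """
    co_changes = Counter()
    for changed in commits:
        classes = sorted(set(changed))
        if max_commit_size is not None and len(classes) > max_commit_size:
            continue  # too large: likely a merge, rename or similar bulk change
        co_changes.update(combinations(classes, 2))
    return {pair for pair, n in co_changes.items() if n >= min_occurrences}


# Hypothetical history: three small commits and one commit touching 7 classes.
commits = [
    ["A", "B"],
    ["A", "B", "C"],
    ["A", "B"],
    ["D", "E", "F", "G", "H", "I", "J"],
]
print(len(logical_dependencies(commits)))                      # 24
print(len(logical_dependencies(commits, max_commit_size=5)))   # 3
print(logical_dependencies(commits, max_commit_size=5, min_occurrences=2))
# {('A', 'B')}
```

Even in this toy history, the single large commit contributes 21 of the 24 unfiltered pairs, which is the kind of effect the cs threshold is meant to remove.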
Table 3: The percent distribution of logical dependencies
generated by commit transactions from each size category.
ID   ST     MT     LT     VLT
1    9.70   2.61   4.12   83.57
2    1.50   1.87   3.59   93.03
3    3.75   5.02   5.97   85.26
4    31.40  7.64   8.84   52.12
5    1.01   3.92   4.67   90.41
6    0.37   0.22   0.47   98.94
7    1.48   1.86   2.48   94.18
8    1.44   2.01   3.73   92.82
9    6.77   7.88   11.99  73.36
10   12.53  17.77  21.59  48.10
11   6.80   8.25   13.74  71.22
12   22.09  21.73  20.51  35.67
13   10.46  20.48  8.08   60.98
14   1.90   0.90   1.29   95.91
15   1.14   1.25   1.86   95.76
16   1.89   2.47   3.12   92.52
17   2.13   2.19   5.25   90.44
18   1.77   3.66   6.51   88.06
19   0.59   0.68   1.57   97.17
20   0.41   0.73   1.42   97.45
21   1.50   1.22   37.85  59.43
22   2.00   4.12   5.95   87.92
23   1.02   1.22   0.94   96.82
24   0.71   0.74   1.63   96.91
25   37.86  16.51  11.12  34.50
26   2.86   3.22   6.79   87.13
27   23.43  21.56  28.28  26.73
Avg  6.98   5.99   8.27   78.76

The six tables below present the synthesis of our
experiments. We have computed the following values:
• the mean ratio of the number of logical dependencies
  (LD) to the number of structural dependencies (SD)
• the mean percentage of structural dependencies
  that are also logical dependencies (calculated
  as the number of overlaps divided by the number
  of structural dependencies)
• the mean percentage of logical dependencies that
  are also structural dependencies (calculated as
  the number of overlaps divided by the number of
  logical dependencies)
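
For a single system, the three values can be computed from the two sets of class pairs as sketched below. The sketch assumes that both kinds of dependencies are represented as unordered pairs of class names and that both sets are non-empty; the function name and the sample pairs are hypothetical.

```python
def overlap_metrics(structural, logical):
    """Ratio LD/SD and the two overlap percentages for one system."""
    sd = {frozenset(pair) for pair in structural}   # unordered class pairs
    ld = {frozenset(pair) for pair in logical}
    both = sd & ld                                  # pairs with both kinds
    return {
        "ld_to_sd_ratio": len(ld) / len(sd),
        "pct_sd_also_ld": 100.0 * len(both) / len(sd),
        "pct_ld_also_sd": 100.0 * len(both) / len(ld),
    }


# Hypothetical system: 4 structural and 8 logical dependencies, 2 in common.
structural = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
logical = [("B", "A"), ("A", "C"), ("A", "D"), ("B", "C"),
           ("X", "Y"), ("E", "F"), ("A", "E"), ("C", "E")]
print(overlap_metrics(structural, logical))
# {'ld_to_sd_ratio': 2.0, 'pct_sd_also_ld': 50.0, 'pct_ld_also_sd': 25.0}
```

The same overlap of 2 pairs yields two different percentages because each one is relative to a different set, which is why both are reported separately.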
In all six tables (Tables 4 to 9), the columns correspond
to the values used for the commit size threshold cs, while
the rows correspond to the values for the number of occurrences
threshold occ. The tables contain median values
obtained for experiments done under all combinations
of the two threshold values, on all test systems.
In all tables, the upper right corner corresponds to
the most relaxed filtering conditions, while the lower
left corner corresponds to the most restrictive filtering
conditions.
Table 4: Ratio of number of LD to number of SD, case with
comments.
         cs ≤ 5  cs ≤ 10  cs ≤ 20  cs < ∞
occ ≥ 1  3.39    5.67     9.00     80.31
occ ≥ 2  2.24    3.47     5.02     60.14
occ ≥ 3  1.04    2.53     3.52     44.68
occ ≥ 4  0.90    2.16     2.88     33.47

Table 5: Ratio of number of LD to number of SD, case
without comments.
         cs ≤ 5  cs ≤ 10  cs ≤ 20  cs < ∞
occ ≥ 1  3.24    5.33     7.90     67.16
occ ≥ 2  1.35    3.27     4.72     47.39
occ ≥ 3  1.00    1.67     2.49     32.39
occ ≥ 4  0.43    1.26     1.93     22.15

Table 6: Percentage of SD that are also LD, case with
comments.
         cs ≤ 5  cs ≤ 10  cs ≤ 20  cs < ∞
occ ≥ 1  19.75   29.86    39.29    76.59
occ ≥ 2  12.50   20.20    27.68    66.11
occ ≥ 3  8.49    14.22    19.94    55.99
occ ≥ 4  6.58    10.95    15.76    47.12

Table 7: Percentage of SD that are also LD, case without
comments.
         cs ≤ 5  cs ≤ 10  cs ≤ 20  cs < ∞
occ ≥ 1  18.88   28.47    37.44    71.12
occ ≥ 2  11.87   19.03    25.93    59.58
occ ≥ 3  8.00    13.09    18.15    48.65
occ ≥ 4  5.85    9.94     14.27    39.07

Table 8: Percentage of LD that are also SD, case with
comments.
         cs ≤ 5  cs ≤ 10  cs ≤ 20  cs < ∞
occ ≥ 1  12.02   8.86     6.72     1.79
occ ≥ 2  15.05   11.71    9.38     2.21
occ ≥ 3  17.45   13.97    11.57    2.86
occ ≥ 4  18.96   15.28    12.94    3.67

Table 9: Percentage of LD that are also SD, case without
comments.
         cs ≤ 5  cs ≤ 10  cs ≤ 20  cs < ∞
occ ≥ 1  12.05   9.02     6.98     1.93
occ ≥ 2  15.08   12.03    9.66     2.42
occ ≥ 3  17.78   14.37    12.24    3.28
occ ≥ 4  19.22   15.59    13.30    4.21

5 DISCUSSION
This section uses the experimental results to answer
the research questions outlined in section 3.
Question 1. Which is the most frequent size for a
commit transaction?
Table 2 presents the size distribution for commit
transactions as percentages of the total number
of commits for each system presented in Table 1. The
small commit transactions (with at most 5 source
code files) represent on average 79.7% of the total
number of transactions. On the opposite side are
the very large commit transactions (with more than 20
source code files), which represent on average
5.16% of the total number of transactions.
Based on these results we can say that the vast majority
of the commit transactions have no more than 5
source code files.
Question 2. Is it necessary to set a threshold on
the size of commit transactions which are considered
to generate valid logical dependencies?
Logical dependencies are generated for all pairs of classes
which have changed in the same commit transaction.
The number of logical dependencies generated from a
commit transaction is proportional to the square of
the number of participating classes. Table 3 presents
how many logical dependencies are extracted from
commit transactions of different sizes. Based on the
results from Table 3 and Table 2 we see that the commit
transactions with at most 5 files, which are the
most frequent type of commit, produce on average
only 6.98% of the total logical dependencies extracted
from the systems. On the other hand, a small number
of very large commits (those with more than 20 source
code files) can lead to a vast number of logical
dependencies. But very large commit transactions can
be caused by merging development branches into the
main branch. In this case the very large commit transaction
is actually the sum of many other commit transactions
made in a different branch, so we cannot
consider it a single commit, and we certainly
cannot consider the logical dependencies extracted from it
as valid logical dependencies. So a threshold to filter this
kind of commit transaction is required.
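
The quadratic growth mentioned above follows from counting unordered pairs: a commit that changes n classes yields n(n-1)/2 candidate pairs. A short sketch, with a hypothetical helper name, makes the scale concrete.

```python
from math import comb


def pairs_in_commit(n_classes):
    """Number of unordered class pairs co-changing in one commit: n(n-1)/2."""
    return comb(n_classes, 2)


for n in (2, 5, 10, 20, 50, 100):
    print(n, pairs_in_commit(n))
# 2 1
# 5 10
# 10 45
# 20 190
# 50 1225
# 100 4950
```

A single commit with 100 changed classes therefore produces as many candidate pairs as 495 commits with 5 changed classes each.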
Based on the results presented in Tables 4 and 5,
the number of changed files taken into consideration
has an important influence on the ratio of the number
of logical dependencies to the number of structural
dependencies. If no threshold is set for the number
of files in a commit (the cases in the last column
of Tables 4 and 5), then the logical dependencies
outnumber the structural dependencies by
a factor of up to 80. The maximum factor is measured
in the case when no filtering is done on the number
of occurrences (first row). In this case, we cannot
talk about logical dependencies, but about classes that
happened to change together once, for various
reasons. The number of pairs of classes that happened to
change together once is up to 80 times bigger
than the number of pairs of classes presenting structural
dependencies.
When filtering is done according to conditions on
the number of occurrences, we observe in Tables 4
and 5 that the values on the last column still do not
fall below 20. This number is still too big to accept
for logical dependencies. It is clear that it is necessary
to put a threshold on the number of files accepted in a
commit in order to filter out noise.
If we refer to the overlap between structural and
logical dependencies, we can see in Tables 6 and 7
that the percentage of structural dependencies which
are also logical dependencies is also affected by
setting a threshold on the number of files accepted in
a commit. Setting a threshold leads to a smaller number
of logical dependencies overall, and this in turn
reduces the number of structural dependencies
that are also logical dependencies. However,
we can see that the percentage of dependencies in the
overlap decreases much more slowly than the total number
of logical dependencies. For example, when setting
the cs threshold at 10, we see in Table 4 that the total
number of logical dependencies decreases approximately 14
times compared with no threshold (from 80.31 to 5.67).
At the same time, we can see in Table 6 that the overlap
between the logical and structural dependencies decreases
less, only approximately 2.6 times (from 76.59 to 29.86).
This supports the view that the logical
dependencies filtered out were not true dependencies.
It is clear that setting a threshold on the maximum
number of files accepted in a commit is essential for
the quality of finding true logical dependencies.
Question 3. Can considering changes only in comments
as valid lead to additional logical dependencies?
How many logical dependencies are introduced
by considering comment changes as valid
changes, and to what extent can this influence the
analysis?
In order to assess the influence of comments, we
compare pairwise Tables 4 and 5, Tables 6 and 7,
and Tables 8 and 9. We observe that, although there
are some differences between pairs of measurements
done in similar conditions with and without comments,
the differences are not significant.
In the case of the ratio of the number of logical
dependencies to the number of structural dependencies,
from Tables 4 and 5 we can see that the maximum
difference is for the values in the
first row, last column. Without comments, the value
of the ratio is 67.16, compared to the value with comments,
which is 80.31. The decrease of about 13 represents
roughly 16% of the value with comments. In the case of
the percentage of structural dependencies that are also
logical dependencies, from Tables 6 and 7, we can see that
the largest differences are again found in the last column.
In its first row, without comments, the overlap
is 71.12%, compared to the value with comments, which
is 76.59%. The decrease of about 5.5 percentage points
represents roughly 7% of the value with comments.
We notice that the differences
between the two cases are small.
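
The comparison of the first-row, last-column values can be recomputed directly from Tables 4 to 7. The sketch below reports both the absolute difference and the decrease relative to the value with comments; the helper name is hypothetical.

```python
def decrease(with_comments, without_comments):
    """Absolute difference and decrease relative to the value with comments."""
    absolute = with_comments - without_comments
    relative_pct = 100.0 * absolute / with_comments
    return absolute, relative_pct


# Ratio LD/SD, no cs threshold, occ >= 1 (Tables 4 and 5).
ratio_abs, ratio_rel = decrease(80.31, 67.16)
# Percentage of SD that are also LD, same cell (Tables 6 and 7).
overlap_abs, overlap_rel = decrease(76.59, 71.12)

print(round(ratio_abs, 2), round(ratio_rel, 1))      # 13.15 16.4
print(round(overlap_abs, 2), round(overlap_rel, 1))  # 5.47 7.1
```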
Question 4. How many occurrences of a logical
dependency are needed to consider it a valid logical
dependency?
If we look at consecutive rows in Table 4 or in Table 5,
corresponding to increased threshold values for
the number of occurrences, we can roughly say that
increasing the occurrence threshold by 1 while maintaining
the other conditions reduces the
total number of logical dependencies by one third.
In order to find the appropriate level of filtering
out false logical dependencies, we assume as a rule of
thumb that the number of logical dependencies should
not be bigger than the number of structural dependencies.
Choosing the most restrictive combination of
thresholds (a commit size threshold of 5 files combined
with an occurrence threshold of 4) leads to a
number of logical dependencies which comes close to
the number of structural dependencies.
Question 5. How does filtering affect the overlap
between structural and logical dependencies?
The overlap between structural and logical depen-
dencies is given by the number of pairs of classes that
have both structural and logical dependencies. We
evaluate this overlap as a percentage relative to the
number of structural dependencies in Tables 6 and 7,
respectively as a percentage relative to the number of
logical dependencies in Tables 8 and 9.
A first observation from Tables 6 and 7 is that not
all pairs of classes with structural dependencies co-change.
The biggest value for the percentage of structural
dependencies that are also logical dependencies
is 76.59%, obtained in the case when no filtering is
applied.
From Tables 8 and 9 we notice that the percent-
age of logical dependencies which are also structural
is always low to very low. This means that most co-
changes are recorded between classes that have no
structural dependencies to each other.
6 FUTURE WORK
We consider that, in the future, the extracted logical
dependencies can be validated by using them
to enhance dependency models used by different applications
such as architectural reconstruction (Şora
et al., 2010), (Şora, 2015), and evaluating the positive
impact on their results.
In this work we have extracted logical dependen-
cies from all the revisions of the system, and structural
dependencies from the last revision of the system. In
future work we will take into account also structural
dependencies from all the revisions of the system, in
order to filter out the old logical dependencies. Some
logical dependencies may have been also structural in
previous revisions of the system but not in the current
one. Another way to investigate this problem could
be to study the trend of occurrences of co-changes: if
co-changes between a pair of classes used to happen
more often in the remote past than in the more recent
past, it may be a sign that the problem causing the
logical coupling has been removed in the meantime.
7 CONCLUSION
In this work we experimentally define methods to
filter co-changing classes so that only valid logical
dependencies are retained.
Our experiments show that the most important
factors which affect the quality of logical dependen-
cies are: the maximum size of commit transactions
which are accepted to generate logical dependencies,
and the minimum number of repeated occurrences for
a co-change to be counted as logical dependency.
We conclude that it is important to put a threshold
on the maximum size of commit transactions which
are accepted to generate logical dependencies. Only
small commit transactions (changing up to 5 source
code files) can be reliably used for introducing logical
dependencies. We have also determined that small
commit transactions are the most frequent kind of
transactions, representing on average 80% of all commit
transactions. Under these conditions, we have
determined that increasing the threshold for the minimum
number of repeated occurrences for a co-change
to be counted as a logical dependency significantly
reduces the number of logical dependencies. On
average, increasing the threshold for repeated occurrences
by 1 halves the number
of logical dependencies. A value of 4 for the
threshold for repeated occurrences, combined with
the condition of accepting only small commit trans-
actions, already keeps the number of logical depen-
dencies in the same range as the number of structural
dependencies. Future work will investigate further
the issue of repeated occurrences, analyzing also their
trend in time.
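The two filters summarized above can be illustrated with a minimal sketch. It is not the implementation used in the study: the function logical_dependencies and its parameter names are hypothetical, each commit is assumed to be given as the list of classes it changed, commit size is assumed to be measured in those entries, and the occurrence threshold is assumed to be inclusive.

```python
from collections import Counter
from itertools import combinations


def logical_dependencies(commits, max_commit_size=5, min_occurrences=4):
    """Return the class pairs that co-change at least min_occurrences
    times in commits changing at most max_commit_size classes."""
    counts = Counter()
    for changed in commits:
        classes = sorted(set(changed))
        # Filter 1: ignore large commit transactions.
        if len(classes) > max_commit_size:
            continue
        # Every unordered pair of co-changed classes is one co-change.
        counts.update(combinations(classes, 2))
    # Filter 2: keep only co-changes that are repeated often enough.
    return {pair for pair, n in counts.items() if n >= min_occurrences}


# Illustrative history: A and B co-change 4 times, A and C only 3 times,
# and one commit touching 6 classes is repeated 10 times.
history = ([["B", "A"]] * 4 + [["A", "C"]] * 3
           + [["A", "B", "C", "D", "E", "F"]] * 10)
print(logical_dependencies(history))  # {('A', 'B')}
```

With the defaults shown, the large commits contribute nothing and only the pair (A, B) reaches the occurrence threshold; lowering min_occurrences to 3 would also admit (A, C).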
The analysis of the experimental data shows that
logical dependencies are distinct from structural
dependencies. Even after filtering, a very large
percentage of logical dependencies are between classes
without structural dependencies. This leads to the
conclusion that including logical dependencies alongside
structural dependencies in dependency models has
the potential to improve analysis applications based
on dependency models.
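The comparison between the two sets of dependencies can be expressed with plain set operations. The sketch below is illustrative only: the function share_without_structural is hypothetical, and both kinds of dependencies are assumed to be treated as unordered class pairs, which ignores the direction of structural dependencies.

```python
def share_without_structural(logical, structural):
    """Fraction of logical dependencies that have no structural
    dependency between the same two classes (hypothetical helper)."""
    # Assumption: direction is ignored, so ('A', 'B') equals ('B', 'A').
    logical = {frozenset(pair) for pair in logical}
    structural = {frozenset(pair) for pair in structural}
    if not logical:
        return 0.0
    return len(logical - structural) / len(logical)


# Illustrative data: only the pair A, B is also structurally dependent.
logical_pairs = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}
structural_pairs = {("B", "A")}
print(share_without_structural(logical_pairs, structural_pairs))  # 0.75
```

A value close to 1 corresponds to the situation described above, in which most logical dependencies connect classes that have no structural dependency.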
ENASE 2019 - 14th International Conference on Evaluation of Novel Approaches to Software Engineering
ACKNOWLEDGEMENTS
This work was partially supported by a grant
of the Romanian National Authority for Scientific
Research and Innovation, CNCS/CCCDI UEFIS-
CDI, project number PN-III-P2-2.1-PED-2016-0999,
within PNCDI III.
REFERENCES
Ajienka, N. and Capiluppi, A. (2017). Understanding the
interplay between the logical and structural coupling
of software classes. Journal of Systems and Software,
134:120–137.
Ajienka, N., Capiluppi, A., and Counsell, S. (2018). An em-
pirical study on the interplay between semantic cou-
pling and co-change of software classes. Empirical
Software Engineering, 23(3):1791–1825.
Beck, F. and Diehl, S. (2011). On the congruence of mod-
ularity and code coupling. In Proceedings of the 19th
ACM SIGSOFT Symposium and the 13th European
Conference on Foundations of Software Engineering,
ESEC/FSE ’11, pages 354–364, New York, NY, USA.
ACM.
Șora, I. (2015). Helping program comprehension of large
software systems by identifying their most important
classes. In Evaluation of Novel Approaches to Soft-
ware Engineering - 10th International Conference,
ENASE 2015, Barcelona, Spain, April 29-30, 2015,
Revised Selected Papers, pages 122–140. Springer In-
ternational Publishing.
Șora, I., Glodean, G., and Gligor, M. (2010). Software
architecture reconstruction: An approach based
on combining graph clustering and partitioning. In
Computational Cybernetics and Technical Informatics
(ICCC-CONTI), 2010 International Joint Conference
on, pages 259–264.
Ducasse, S. and Pollet, D. (2009). Software architecture
reconstruction: A process-oriented taxonomy. IEEE
Transactions on Software Engineering, 35(4):573–
591.
Gall, H., Hajek, K., and Jazayeri, M. (1998). Detection of
logical coupling based on product release history. In
Proceedings of the International Conference on Soft-
ware Maintenance, ICSM ’98, pages 190–, Washing-
ton, DC, USA. IEEE Computer Society.
Kagdi, H., Gethers, M., Poshyvanyk, D., and Collard, M. L.
(2010). Blending conceptual and evolutionary cou-
plings to support change impact analysis in source
code. In 2010 17th Working Conference on Reverse
Engineering, pages 119–128.
Kalliamvakou, E., Gousios, G., Blincoe, K., Singer, L., Ger-
man, D. M., and Damian, D. (2016). An in-depth
study of the promises and perils of mining github. Em-
pirical Software Engineering, 21(5):2035–2071.
Oliva, G. A. and Gerosa, M. A. (2011). On the interplay
between structural and logical dependencies in open-
source software. In Proceedings of the 2011 25th
Brazilian Symposium on Software Engineering, SBES
’11, pages 144–153, Washington, DC, USA. IEEE
Computer Society.
Oliva, G. A. and Gerosa, M. A. (2015). Experience
report: How do structural dependencies influence
change propagation? an empirical study. In 26th
IEEE International Symposium on Software Reliability
Engineering, ISSRE 2015, Gaithersburg, MD,
USA, November 2-5, 2015, pages 250–260.
Poshyvanyk, D., Marcus, A., Ferenc, R., and Gyimóthy, T.
(2009). Using information retrieval based coupling
measures for impact analysis. Empirical Software En-
gineering, 14(1):5–32.
Ren, X., Ryder, B. G., Stoerzer, M., and Tip, F. (2005).
Chianti: a change impact analysis tool for java pro-
grams. In Proceedings. 27th International Conference
on Software Engineering, 2005. ICSE 2005., pages
664–665.
Shtern, M. and Tzerpos, V. (2012). Clustering method-
ologies for software engineering. Adv. Soft. Eng.,
2012:1:1–1:1.
Sora, I. (2013). Software architecture reconstruction
through clustering: Finding the right similarity fac-
tors. In Proceedings of the 1st International Work-
shop in Software Evolution and Modernization - Vol-
ume 1: SEM, (ENASE 2013), pages 45–54. INSTICC,
SciTePress.
Wiese, I. S., Kuroda, R. T., Re, R., Oliva, G. A., and Gerosa,
M. A. (2015). An empirical study of the relation be-
tween strong change coupling and defects using his-
tory and social metrics in the apache aries project. In
Damiani, E., Frati, F., Riehle, D., and Wasserman,
A. I., editors, Open Source Systems: Adoption and Im-
pact, pages 3–12, Cham. Springer International Pub-
lishing.
Yu, L. (2007). Understanding component co-evolution with
a study on linux. Empirical Software Engineering,
12(2):123–141.
Zimmermann, T., Weisgerber, P., Diehl, S., and Zeller, A.
(2004). Mining version histories to guide software
changes. In Proceedings of the 26th International
Conference on Software Engineering, ICSE ’04, pages
563–572, Washington, DC, USA. IEEE Computer So-
ciety.