Identifying Logical Dependencies from Co-Changing Classes

Adelina Diana Stana and Ioana Șora

Department of Computer and Information Technology, Politehnica University Timisoara, Romania

Keywords:

Software Evolution, Logical Dependencies, Structural Dependencies.

Abstract:

Emerging software engineering approaches support the idea that logical dependencies should be included next

to structural dependencies in general methods and tools for dependency management. However, logical de-

pendencies are still hard to identify, as not all co-changes during the system evolution represent true logical

dependencies. Our work identiﬁes a set of factors that can be used to ﬁlter the recordings of class co-changes

in order to ﬁnd valid logical dependencies. In order to ﬁnd the characteristics of logical dependencies, we

analyze the quantitative relationships between the sets of logical and structural dependencies and their inter-

section and differences. We present results obtained through an experimental study on a set of 27 open source

software projects written in Java and C# with their historical evolutions which sum up to over 70000 com-

mit transactions. Identifying valid logical dependencies from co-changing classes will enhance dependency

models used in various software analysis activities.

1 INTRODUCTION

Coupling reﬂects the degree of interdependence be-

tween different software modules, being a measure of

how closely connected they are. Coupling should be

low in order to ensure the testability, reusability, and

evolvability properties of modules. The traditional

approach on coupling was to quantify the structural

dependencies or interactions between modules, which

both can be determined by source code analysis.

The state of the art has found that modules may

present different kinds and degrees of interdepen-

dence, even if no structural dependencies can be

found by analyzing the source code. Gall (Gall

et al., 1998) identiﬁed as logical coupling between

two modules the fact that these modules repeatedly

change together during the historical evolution of the

software system. This can be an indicator of a logical

dependency between these modules.

The concepts of logical coupling and logical de-

pendencies were ﬁrst used in different analysis tasks,

all related to changes: for software change impact

analysis (Ren et al., 2005), for identifying the po-

tential ripple effects caused by software changes dur-

ing software maintenance and evolution (Oliva and

Gerosa, 2015), (Oliva and Gerosa, 2011), (Poshy-

vanyk et al., 2009), (Kagdi et al., 2010) or for their

link to defects (Wiese et al., 2015), (Zimmermann

et al., 2004).

The current trend recommends that general depen-

dency management methods and tools should also in-

clude logical dependencies besides the structural de-

pendencies (Oliva and Gerosa, 2011), (Ajienka and

Capiluppi, 2017). Different applications based on

dependency analysis could be improved if, beyond

structural dependencies, they also take into account

the hidden non-structural dependencies. For exam-

ple, works which investigate different methods for ar-

chitectural reconstruction (Șora et al., 2010), (Șora,

2013), (Șora, 2015), all of them based on the informa-

tion provided by structural dependencies, could en-

rich their dependency models by taking into account

also logical dependencies. However, a thorough sur-

vey (Ducasse and Pollet, 2009) shows that historical

information has been rarely used in architectural re-

construction. Another survey (Shtern and Tzerpos,

2012) mentions one possible explanation why histor-

ical information has been rarely used in architec-

tural reconstruction: the size of the extracted infor-

mation. One problem is the size of the extraction pro-

cess, which has to analyze many versions from the

historical evolution of the system. Another problem

is the large number of pairs of classes which record co-

changes and how they relate to the number of pairs

of classes with structural dependencies. Logical de-

pendencies should integrate harmoniously with struc-

tural dependencies in a unitary dependency model:

valid logical dependencies should not be omitted from

the dependency model, but structural dependencies

should not be engulfed by questionable logical depen-

dencies generated by casual co-changes. Thus, in or-

der to add logical dependencies besides structural de-

pendencies in dependency models, class co-changes

must be filtered until only a reduced but

relevant set of valid logical dependencies remains.

Stana, A. and Șora, I.

Identifying Logical Dependencies from Co-Changing Classes.

DOI: 10.5220/0007758104860493

In Proceedings of the 14th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2019), pages 486-493

ISBN: 978-989-758-375-9

In the next section we analyze the state of the art

results for determining logical dependencies from the

point of view of their quantitative relationship with

structural dependencies. Starting from this analysis,

in Section 3 we identify a set of factors that can be

used to ﬁlter the recordings of class co-changes such

that valid logical dependencies are identiﬁed and we

formulate the research questions. In order to answer

these research questions, we have built a tool that ex-

tracts structural and logical dependencies in differ-

ent scenarios. We have analyzed several open-source

software systems of different sizes with our tool, ob-

taining the experimental results presented in Section

4. Section 5 discusses the experimental results and

answers the research questions.

2 STATE OF THE ART

Several studies have investigated quantitative as-

pects of logical dependencies and their interplay with

structural dependencies. Oliva and Gerosa (Oliva

and Gerosa, 2011), (Oliva and Gerosa, 2015) have

found ﬁrst that the set of co-changed classes was

much larger compared to the set of structurally cou-

pled classes. They identiﬁed structural and logical de-

pendencies from 150000 revisions from the Apache

Software Foundation SVN repository. Also they con-

cluded that in at least 91% of the cases, logical depen-

dencies involve ﬁles that are not structurally related.

This implies that not all of the change dependencies

are related to structural dependencies and there could

be other reasons for software artifacts to be change

dependent.

Ajienka and Capiluppi also studied the interplay

between logical and structural coupling of software

classes. In (Ajienka and Capiluppi, 2017) they per-

form experiments on 79 open source systems: for

each system, they determine the sets of structural

dependencies, the set of logical dependencies and

the intersections of these sets. They quantify the

overlapping or intersection of these sets, coming to

the conclusion that not all co-changed class pairs

(classes with logical dependencies) are also linked

by structural dependencies. One other interesting as-

pect which has not been investigated by the authors in

(Ajienka and Capiluppi, 2017) is the total number of

logical dependencies relative to the total number of

structural dependencies of a software system. However,

they provide the raw data of their measurements,

and we calculated the ratio between the number of

logical dependencies and the number of structural

dependencies for all the projects they analyzed: the

average ratio is 12. This means that, using their

method of detecting logical dependencies for a sys-

tem, the number of logical dependencies outnumbers

by one order of magnitude the number of structural

dependencies. We consider that such a big number of

logical dependencies needs additional ﬁltering.

Another kind of non-structural dependencies are

the semantic or conceptual dependencies (Poshy-

vanyk et al., 2009), (Kagdi et al., 2010). Seman-

tic coupling is given by the degree to which the

identiﬁers and comments from different classes are

similar to each other. Semantic coupling could be

an indicator for logical dependencies, as studied by

Ajienka et al in (Ajienka et al., 2018). The exper-

iments showed that a large number of co-evolving

classes do not present semantic coupling, adding to

the earlier research which showed that a large number

of co-evolving classes do not present structural cou-

pling. All these experimental findings raise the ques-

tion whether it is a legitimate approach to accept all

co-evolving classes as logical coupling.

Changes made to two components in the same

commit do not necessarily indicate the co-evolution

of the two. These changes could be completely un-

related. The study (Yu, 2007) acknowledges the fact

that evolutionary coupling could also be determined

accidentally by two components changing in the same

commit (independent evolution, as it is called) and

this will bring noise to the measurement of evolution-

ary coupling.

Zimmermann et al. (Zimmermann et al., 2004) in-

troduced data mining techniques to obtain association

rules from version histories. The mined association

rules have a probabilistic interpretation based on the

amount of evidence in the transactions they are de-

rived from. This amount of evidence is determined

by two measures: support and conﬁdence. They de-

veloped a tool to predict future or missing changes.
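To make the two measures concrete, the following is a minimal sketch (not the tool of Zimmermann et al.) of how support and confidence could be computed for pairwise rules of the form "when a changes, b also changes". It assumes the common definitions: support is the number of transactions that contain both files, and confidence is the support divided by the number of transactions that contain the left-hand file. The function name `mine_rules`, the threshold values and the sample history are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def mine_rules(transactions, min_support=2, min_confidence=0.5):
    """Return sorted rules (lhs, rhs, support, confidence) for 'lhs changed => rhs changed'."""
    single = Counter()  # number of transactions in which each file changed
    pair = Counter()    # number of transactions in which both files of a pair changed
    for files in transactions:
        files = set(files)
        single.update(files)
        pair.update(combinations(sorted(files), 2))

    rules = []
    for (a, b), support in pair.items():
        # Each co-changing pair yields two candidate rules: a => b and b => a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / single[lhs]
            if support >= min_support and confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical version history: each set is one commit transaction.
history = [
    {"A.java", "B.java"},
    {"A.java", "B.java", "C.java"},
    {"A.java"},
    {"B.java", "C.java"},
]
rules = mine_rules(history)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} => {rhs}: support={support}, confidence={confidence:.2f}")
```

In this toy history, A.java and C.java change together only once, so no rule between them passes the support threshold, which is the same kind of filtering by repeated occurrences that is studied later in this paper.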

In order to add logical dependencies besides struc-

tural dependencies as inputs for methods and tools

for dependency management and analysis, class co-

changes must be filtered until only a re-

duced but relevant set of valid logical dependencies remains.

3 RESEARCH QUESTIONS

In this work, we explore several ways of ﬁltering log-

ical dependencies. We identify the following factors that

could be used to ﬁlter logical dependencies: the max-

imum size of commit transactions which are accepted

to generate logical dependencies, the minimum num-

ber of occurrences for a co-change to be considered

a logical dependency, and accepting changes in com-

ments as a source of logical dependencies.

We will address the following research questions:

Question 1. Which is the most frequent size for a

commit transaction?

Motivation: We calculate the size for a commit

transaction as the total number of source code ﬁles

that have changed. Even though version control best

practices encourage developers to commit often,

which implies small commit transactions, the size of

a commit transaction also depends on the developers'

culture. We think that finding the most frequent size

of a commit transaction could help in setting ranges

for what counts as a normal-size commit transaction

for the systems, and in choosing a target group of

commit transactions from which we can extract logical

dependencies.

Question 2. Is it necessary to set a threshold on

the size of commit transactions which are considered

to generate valid logical dependencies?

Motivation: A big commit transaction can indi-

cate that a merge with another branch or a folder re-

naming has been made. In this case, a series of ir-

relevant logical dependencies can be introduced since

not all the files are updated at the same time for a

development reason. Different works have chosen fixed

threshold values for the maximum number of files

accepted in a commit. Capiluppi and Ajienka, in their

works (Ajienka and Capiluppi, 2017), (Ajienka et al.,

2018), only take into consideration commits with fewer

than 10 changed source code files when building the

logical dependencies. The research of Beck and Diehl (Beck

and Diehl, 2011) only takes into consideration

transactions with up to 25 files. The research (Oliva and

Gerosa, 2011) also provided a quantitative analysis of

the number of files per revision: based on the analysis

of 40,518 revisions, the mean value obtained for the

number of files in a revision is 6 files. However, the

standard deviation shows that the dispersion is high.

Based on all these considerations, we will experiment

with different threshold values for the maximum size

of commit transactions which are accepted to gener-

ate logical dependencies.

Question 3. Can considering changes that are only

in comments as valid lead to additional logical

dependencies? How many logical dependencies are

introduced by considering comment changes as valid

changes, and to what extent can this influence the

analysis?

Motivation: Not all the commits that change source

code files include real code changes; some

of them contain only comment changes. We consider

that there is probably no logical dependency between

two classes that change at the same time only through

comment changes. It could be that someone is adding

implementation documentation or copyright or

ownership information. Some studies have not considered

this aspect, so we will analyze the impact of considering

or not considering changes in comments as a source of valid

logical dependencies.

Question 4. How many occurrences of a logical

dependency are needed to consider it a valid logical

dependency?

Motivation: One occurrence of a logical depen-

dency between two classes can be a valid logical de-

pendency, but can also be a coincidence. Taking into

consideration only logical dependencies with multiple

occurrences as valid dependencies can lead to more

accurate logical dependencies and more accurate re-

sults. On the other hand, if the project studied has

a relatively small number of commits, the probability

of finding multiple updates of the same classes at the

same time can be small, so filtering by the number

of occurrences can lead to filtering out all the logical

dependencies extracted. Given that we will

study multiple projects of different sizes and numbers

of commits, we will also analyze the impact of this

filtering on different projects.

Question 5. How does ﬁltering affect the overlap

between structural and logical dependencies?

Motivation: Traditional software engineering con-

siders coupling as the cause for co-changes, thus logi-

cal and structural dependencies should present a very

large overlap. However, in (Oliva and Gerosa, 2011)

and (Ajienka and Capiluppi, 2017) it has been ex-

perimentally determined that a very large number of

logical dependencies are outside the intersection with

structural dependencies. We will investigate the inﬂu-

ence of different ﬁltering degrees on the intersections

between logical and structural dependencies.

4 EXPERIMENTAL RESULTS

We have analyzed a set of open-source projects found

on GitHub (http://github.com/) (Kalliamvakou et al., 2016)

in order to extract the structural and logical dependencies

between classes. Table 1 enumerates all the systems studied.

The first column gives the project ID; the second column

shows the project name; the third column shows the number

of entities (classes and interfaces) extracted; the fourth

column shows the number of most recent commits analyzed

from the active branch of each project; and the

fifth column shows the language in which the project

was developed.

Table 1: Summary of open source projects studied.

ID  Project            Nr. of entities  Nr. of commits  Language
1   bluecove           586              894             Java
2   aima-java          987              818             Java
3   powermock          1084             893             Java
4   restfb             783              1188            Java
5   rxjava             2673             2468            Java
6   metro-jax-ws       1103             2222            Java
7   mockito            1409             1572            Java
8   grizzly            1592             3122            Java
9   shipkit            242              1483            Java
10  OpenClinica        1653             3749            Java
11  robolectric        2050             5029            Java
12  aeron              541              5101            Java
13  antlr4             1381             3449            Java
14  mcidasv            805              3668            Java
15  ShareX             919              2505            C#
16  aspnetboilerplate  2353             1615            C#
17  orleans            3485             3353            C#
18  cli                767              2397            C#
19  cake               2250             1853            C#
20  Avalonia           1677             2445            C#
21  EntityFramework    7107             2443            C#
22  jellyfin           2179             4065            C#
23  PowerShell         861              2033            C#
24  WeiXinMPSDK        2029             2723            C#
25  ArchiSteamFarm     117              2181            C#
26  VisualStudio       1016             4417            C#
27  CppSharp           259              3882            C#

In a first experiment, we determined the commit

size cs of every commit transaction in every project and

grouped the transactions into 4 categories: small transactions

(ST), when cs ≤ 5; medium transactions (MT), when

5 < cs ≤ 10; large transactions (LT), when 10 < cs ≤

20; and very large transactions (VLT), when 20 < cs.

We also counted how many logical dependencies are

generated by transactions from each category. The

results are presented in Tables 2 and 3 as percent

distributions.
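The grouping of commit transactions by size can be sketched as follows. This is a minimal illustration of the four categories defined above, not the authors' tool; the function names and the sample list of commit sizes are assumptions.

```python
from collections import Counter

CATEGORIES = ("ST", "MT", "LT", "VLT")


def size_category(cs):
    """Map a commit size (number of changed source code files) to its category."""
    if cs <= 5:
        return "ST"   # small transaction
    if cs <= 10:
        return "MT"   # medium transaction
    if cs <= 20:
        return "LT"   # large transaction
    return "VLT"      # very large transaction


def percent_distribution(commit_sizes):
    """Return the percentage of commits that fall into each size category."""
    counts = Counter(size_category(cs) for cs in commit_sizes)
    total = len(commit_sizes)
    return {cat: 100.0 * counts[cat] / total for cat in CATEGORIES}


# Hypothetical commit sizes for one project.
sizes = [1, 2, 5, 6, 10, 11, 20, 21, 3, 4]
dist = percent_distribution(sizes)
print(dist)
```

Applied to the real commit sizes of one project, such a function would produce one row of Table 2.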

In the main series of experiments, for each system,

we extracted its structural dependencies, its logical

dependencies and determined the overlap between the

two dependencies sets, in various experimental condi-

tions.

One variable experimental condition is whether

changes located in comments contribute towards log-

ical dependencies. This condition distinguishes be-

tween two different cases:

• with comments: a change in source code ﬁles is

counted towards a logical dependency, even if the

change is inside comments in all ﬁles

• without comments: commits that changed source

code ﬁles only by editing comments are ignored

as logical dependencies
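One possible way to recognize comment-only changes is to strip comments from the old and new versions of each changed file and compare what remains. The sketch below is only an illustration under simplifying assumptions: it uses a naive regular expression for Java/C# style comments, it does not handle comment markers inside string literals, and it ignores whitespace differences. All function names are hypothetical and do not describe the authors' tool.

```python
import re

# Naive pattern: block comments /* ... */ and line comments // ...
# Assumption: comment markers inside string literals are not handled.
_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def strip_comments(source):
    """Remove comments and collapse whitespace so only code tokens remain."""
    without_comments = _COMMENT.sub("", source)
    return " ".join(without_comments.split())


def is_comment_only_change(old_source, new_source):
    """True if the file changed but its code, ignoring comments and whitespace, did not."""
    return (old_source != new_source
            and strip_comments(old_source) == strip_comments(new_source))


def is_comment_only_commit(changed_files):
    """changed_files: list of (old_source, new_source) pairs, one per changed file."""
    return bool(changed_files) and all(
        is_comment_only_change(old, new) for old, new in changed_files
    )


old = "class A { int x; } // old note\n"
new = "class A { int x; } /* new\n note */\n"
print(is_comment_only_change(old, new))                       # comment edit only
print(is_comment_only_change(old, "class A { long x; }\n"))   # real code change
```

In the "without comments" case, commits for which `is_comment_only_commit` holds would be skipped when collecting co-changes.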

Table 2: The percent distribution of commit transactions in 4 categories according to their size.

ID   ST     MT     LT     VLT
1    82.55  10.85  4.14   2.46
2    71.39  13.08  7.82   7.7
3    73.91  13.33  6.27   6.49
4    84.51  8.5    3.11   3.87
5    75.2   11.26  5.92   7.62
6    87.8   6.35   2.57   3.29
7    78.18  11.96  5.73   4.13
8    79.63  9.67   5.77   4.93
9    83.82  9.58   4.18   2.43
10   82.58  9.66   5.31   2.45
11   82.96  8.55   4.89   3.6
12   87.69  8.51   2.96   0.84
13   81.19  8.23   5.54   5.03
14   96.7   1.94   0.71   0.65
15   89.27  7.11   2.17   1.45
16   77.28  12.76  5.51   4.46
17   70.3   12.53  9.48   7.69
18   73.93  12.27  6.63   7.18
19   69.99  14.41  6.91   8.69
20   68.79  10.1   7.44   13.66
21   60.66  17.63  10.04  11.66
22   73.97  12.63  6.94   6.47
23   83.13  6.64   4.18   6.05
24   79.43  8.56   5.66   6.35
25   94.54  3.62   1.1    0.73
26   76.21  9.74   5.84   8.22
27   86.17  8.53   4.12   1.18
Avg  79.7   9.93   5.22   5.16

In all cases, we varied the following threshold values:

• commit size (cs): the maximum size of commit

transactions which are accepted to generate log-

ical dependencies. The values for this threshold

were 5, 10, 20 and no threshold (inﬁnity).

• number of occurrences (occ): the minimum num-

ber of repeated occurrences for a co-change to be

counted as logical dependency. The values for this

threshold were 1, 2, 3 and 4.
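The effect of the two thresholds can be sketched as follows. This is a minimal illustration, not the authors' tool: it assumes that each commit is given as the set of classes it changed (one class per changed file), that a logical dependency is an unordered pair of classes, and that the parameter names `max_commit_size` and `min_occurrences` stand for the cs and occ thresholds.

```python
from collections import Counter
from itertools import combinations


def logical_dependencies(commits, max_commit_size=None, min_occurrences=1):
    """Return the set of class pairs accepted as logical dependencies.

    commits: iterable of collections of class names changed together.
    max_commit_size: cs threshold; None means no threshold (infinity).
    min_occurrences: occ threshold on the number of co-changes of a pair.
    """
    co_changes = Counter()
    for classes in commits:
        classes = set(classes)
        if max_commit_size is not None and len(classes) > max_commit_size:
            continue  # commit too large: it generates no logical dependencies
        co_changes.update(combinations(sorted(classes), 2))
    return {pair for pair, n in co_changes.items() if n >= min_occurrences}


# Hypothetical history with one large commit touching six classes.
commits = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "B", "C", "D", "E", "F"},
]
all_pairs = logical_dependencies(commits)
small_only = logical_dependencies(commits, max_commit_size=5)
repeated = logical_dependencies(commits, max_commit_size=5, min_occurrences=2)
print(len(all_pairs), len(small_only), sorted(repeated))
```

Even in this tiny example, the single six-class commit is responsible for most of the unfiltered pairs, while the combination of both thresholds keeps only the pair that repeatedly changes together in small commits.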

Table 3: The percent distribution of logical dependencies generated by commit transactions from each size category.

ID   ST     MT     LT     VLT
1    9.70   2.61   4.12   83.57
2    1.50   1.87   3.59   93.03
3    3.75   5.02   5.97   85.26
4    31.40  7.64   8.84   52.12
5    1.01   3.92   4.67   90.41
6    0.37   0.22   0.47   98.94
7    1.48   1.86   2.48   94.18
8    1.44   2.01   3.73   92.82
9    6.77   7.88   11.99  73.36
10   12.53  17.77  21.59  48.10
11   6.80   8.25   13.74  71.22
12   22.09  21.73  20.51  35.67
13   10.46  20.48  8.08   60.98
14   1.90   0.90   1.29   95.91
15   1.14   1.25   1.86   95.76
16   1.89   2.47   3.12   92.52
17   2.13   2.19   5.25   90.44
18   1.77   3.66   6.51   88.06
19   0.59   0.68   1.57   97.17
20   0.41   0.73   1.42   97.45
21   1.50   1.22   37.85  59.43
22   2.00   4.12   5.95   87.92
23   1.02   1.22   0.94   96.82
24   0.71   0.74   1.63   96.91
25   37.86  16.51  11.12  34.50
26   2.86   3.22   6.79   87.13
27   23.43  21.56  28.28  26.73
Avg  6.98   5.99   8.27   78.76

The six tables below present the synthesis of our

experiments. We have computed the following values:

• the mean ratio of the number of logical dependencies

(LD) to the number of structural dependencies (SD)

• the mean percentage of structural dependencies

that are also logical dependencies (calculated

as the number of overlaps divided by the number

of structural dependencies)

• the mean percentage of logical dependencies that

are also structural dependencies (calculated as

the number of overlaps divided by the number of

logical dependencies)
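For a single system, the three values described in this section can be computed as in the sketch below. It assumes that both kinds of dependencies are reduced to unordered pairs of class names (structural dependencies are normally directed, so this is a simplification), and the function name and sample data are illustrative only.

```python
def overlap_measures(structural, logical):
    """Compute the LD/SD ratio and the two overlap percentages for one system."""
    sd = {frozenset(pair) for pair in structural}
    ld = {frozenset(pair) for pair in logical}
    if not sd or not ld:
        raise ValueError("both dependency sets must be non-empty")
    both = sd & ld  # pairs that are structural and logical at the same time
    return {
        "ld_to_sd_ratio": len(ld) / len(sd),
        "pct_sd_also_ld": 100.0 * len(both) / len(sd),
        "pct_ld_also_sd": 100.0 * len(both) / len(ld),
    }


# Hypothetical dependency sets for a small system.
structural = [("A", "B"), ("B", "C")]
logical = [("B", "A"), ("A", "C"), ("C", "D"), ("D", "E")]
measures = overlap_measures(structural, logical)
print(measures)
```

The values reported in the result tables would then be obtained by aggregating these per-system measures over all 27 systems for each combination of thresholds.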

In all six tables (Tables 4 to 9), the columns correspond to

the values used for the commit size threshold cs, while

the rows correspond to the values of the number of occur-

rences threshold occ. The tables contain median val-

ues obtained for experiments done under all combina-

tions of the two threshold values, on all test systems.

In all tables, the upper right corner corresponds to

the most relaxed ﬁltering conditions, while the lower

left corner corresponds to the most restrictive ﬁltering

conditions.

Table 4: Ratio of number of LD to number of SD, case with comments.

          cs ≤ 5   cs ≤ 10   cs ≤ 20   cs < ∞
occ ≥ 1   3.39     5.67      9.00      80.31
occ ≥ 2   2.24     3.47      5.02      60.14
occ ≥ 3   1.04     2.53      3.52      44.68
occ ≥ 4   0.90     2.16      2.88      33.47

Table 5: Ratio of number of LD to number of SD, case without comments.

          cs ≤ 5   cs ≤ 10   cs ≤ 20   cs < ∞
occ ≥ 1   3.24     5.33      7.90      67.16
occ ≥ 2   1.35     3.27      4.72      47.39
occ ≥ 3   1.00     1.67      2.49      32.39
occ ≥ 4   0.43     1.26      1.93      22.15

Table 6: Percentage of SD that are also LD, case with comments.

          cs ≤ 5   cs ≤ 10   cs ≤ 20   cs < ∞
occ ≥ 1   19.75    29.86     39.29     76.59
occ ≥ 2   12.50    20.20     27.68     66.11
occ ≥ 3   8.49     14.22     19.94     55.99
occ ≥ 4   6.58     10.95     15.76     47.12

Table 7: Percentage of SD that are also LD, case without comments.

          cs ≤ 5   cs ≤ 10   cs ≤ 20   cs < ∞
occ ≥ 1   18.88    28.47     37.44     71.12
occ ≥ 2   11.87    19.03     25.93     59.58
occ ≥ 3   8.00     13.09     18.15     48.65
occ ≥ 4   5.85     9.94      14.27     39.07

Table 8: Percentage of LD that are also SD, case with comments.

          cs ≤ 5   cs ≤ 10   cs ≤ 20   cs < ∞
occ ≥ 1   12.02    8.86      6.72      1.79
occ ≥ 2   15.05    11.71     9.38      2.21
occ ≥ 3   17.45    13.97     11.57     2.86
occ ≥ 4   18.96    15.28     12.94     3.67

Table 9: Percentage of LD that are also SD, case without comments.

          cs ≤ 5   cs ≤ 10   cs ≤ 20   cs < ∞
occ ≥ 1   12.05    9.02      6.98      1.93
occ ≥ 2   15.08    12.03     9.66      2.42
occ ≥ 3   17.78    14.37     12.24     3.28
occ ≥ 4   19.22    15.59     13.30     4.21

5 DISCUSSION

This section uses the experimental results to answer

the research questions outlined in section 3.

Question 1. Which is the most frequent size for a

commit transaction?

Table 2 presents the size distribution for commit

transactions in percentage relative to the total number

of commits for each system presented in Table 1. The

small commit transactions (with at most 5 source

code files) represent on average 79.7% of the to-

tal number of transactions. At the opposite end are

the very large commit transactions (with more than 20

source code files), which represent on average

5.16% of the total number of transactions.

Based on these results we can say that the vast ma-

jority of the commit transactions have no more than 5

source code ﬁles.

Question 2. Is it necessary to set a threshold on

the size of commit transactions which are consid-

ered to generate valid logical dependencies? Logi-

cal dependencies are generated for all pairs of classes

which have changed in the same commit transaction.

The number of logical dependencies generated from a

commit transaction grows with the square of

the number of participating classes. Table 3 presents

how many logical dependencies are extracted from

commit transactions of different sizes. Based on the

results from Table 3 and Table 2 we see that the com-

mit transactions with at most 5 files, which are the

most frequent type of commit, produce on average

only 6.98% of the total logical dependencies extracted

from the systems. On the other hand, a small number

of very large commits (those with more than 20 source

code files) can lead to a vast number of logical

dependencies. But very large commit transactions can

be caused by merging development branches into the

main branch. In this case, the very large commit

transaction is actually the sum of many other commit

transactions made in a different branch; we cannot

consider them as one single commit, and we certainly

cannot consider the logical dependencies extracted from it as

valid logical dependencies. So a threshold to filter out this

kind of commit transaction is required.
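The quadratic growth mentioned above follows from counting unordered pairs: a commit that changes n classes yields n(n-1)/2 candidate pairs. The short sketch below, with a hypothetical helper name, shows how quickly this number grows with the commit size.

```python
from math import comb


def pairs_from_commit(n_classes):
    """Number of unordered class pairs generated by one commit of n classes."""
    return comb(n_classes, 2)  # equals n * (n - 1) / 2


for n in (2, 5, 10, 20, 100):
    print(n, pairs_from_commit(n))
```

A single commit with 100 changed classes generates 4950 pairs, as many as 495 separate commits of 5 classes each (10 pairs per commit), which is consistent with very large transactions dominating Table 3 although they are rare in Table 2.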

Based on the results presented in Tables 4 and 5,

the number of changed ﬁles taken into consideration

has an important inﬂuence over the ratio of the num-

ber of logical dependencies to the number of struc-

tural dependencies. If no threshold is set for the num-

ber of ﬁles in a commit (the cases in the last column

in Tables 4 and 5), then the number of logical

dependencies exceeds the number of structural dependencies by

a factor of up to 80. The maximum factor is measured

in the case when no filtering is done on the number

of occurrences (first row). In this case, we cannot

talk about logical dependencies, but about classes that

happened to change together once, for various

reasons. The number of pairs of classes that happen to

change together once is up to 80 times larger

than the number of pairs of classes presenting

structural dependencies.

When ﬁltering is done according to conditions on

the number of occurrences, we observe in Tables 4

and 5 that the values on the last column still do not

fall below 20. This number is still too big to accept

for logical dependencies. It is clear that it is necessary

to put a threshold on the number of ﬁles accepted in a

commit in order to ﬁlter out noise.

If we refer to the overlap between structural and

logical dependencies, we can see in Tables 6 and 7

that the percentage of structural dependencies which

are also logical dependencies is as well affected by

setting a threshold on the number of ﬁles accepted in

a commit. Setting a threshold leads to a smaller number

of logical dependencies overall, and this in turn

reduces the number of structural dependencies

that are also logical dependencies. However,

we can see that the percentage of dependencies in the

overlap decreases much more slowly than the total number

of logical dependencies. For example, when setting

the cs threshold at 10 (first row, occ ≥ 1), we see in Table 4

that the ratio of logical to structural dependencies decreases

approximately 14 times compared with no threshold (from 80.31 to 5.67).

At the same time, we can see in Table 6 that the overlap

between the logical and structural dependencies decreases less, only

approximately 2.6 times (from 76.59 to 29.86). This indicates that most

of the logical dependencies filtered out were not true dependencies.

It is clear that setting a threshold on the maximum

number of ﬁles accepted in a commit is essential for

the quality of ﬁnding true logical dependencies.
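The comparison made in this answer can be checked with simple arithmetic on the first-row (occ ≥ 1) values of Tables 4 and 6. The sketch below uses only values reported in those tables; the variable names are illustrative. Note that the table entries are aggregated over all systems, so the quotients are indicative rather than exact per-system reductions.

```python
# Values from Table 4 (ratio LD/SD, with comments), row occ >= 1.
ld_sd_no_threshold = 80.31   # cs < infinity
ld_sd_cs10 = 5.67            # cs <= 10

# Values from Table 6 (percentage of SD that are also LD, with comments), row occ >= 1.
overlap_no_threshold = 76.59  # cs < infinity
overlap_cs10 = 29.86          # cs <= 10

ld_reduction = ld_sd_no_threshold / ld_sd_cs10
overlap_reduction = overlap_no_threshold / overlap_cs10

print(f"LD/SD ratio shrinks by a factor of {ld_reduction:.1f}")
print(f"SD-LD overlap shrinks by a factor of {overlap_reduction:.1f}")
```

The number of logical dependencies drops by more than an order of magnitude, while the share of structural dependencies confirmed by co-changes drops by less than a factor of three.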

Question 3. Can considering changes that are only in

comments as valid lead to additional logical

dependencies? How many logical dependencies are

introduced by considering comment changes as valid

changes, and to what extent can this influence the

analysis?

In order to assess the inﬂuence of comments, we

compare pairwise Tables 4 and 5, Tables 6 and 7

and Tables 8 and 9. We observe that, although there

are some differences between pairs of measurements

done in similar conditions with and without com-

ments, the differences are not signiﬁcant.

In the case of the ratio of the number of logical

dependencies to the number of structural dependencies,

from Tables 4 and 5 we can see that the maximum

absolute difference is for the values in the

first row, last column. Without comments, the value

of the ratio is 67.2, compared to the value with

comments, which is 80.3. The decrease is about 13.2, which

represents roughly 16% of the value with comments.

In the case of the percentage of structural dependencies

that are also logical dependencies, from Tables 6 and 7,

the values in the first row, last column are 71.1

without comments and 76.6 with comments.

The decrease is about 5.5 percentage points,

roughly 7% of the value with comments.

We notice that the differences

between the two cases are small.

Question 4. How many occurrences of a logical

dependency are needed to consider it a valid logical

dependency?

If we look at consecutive rows in Table 4 or in Ta-

ble 5, corresponding to increased threshold values for

the number of occurrences, we can roughly say that

increasing the occurrence threshold by 1 while main-

taining the other conditions reduces the

total number of logical dependencies by one third.

In order to ﬁnd the appropriate level of ﬁltering

out false logical dependencies, we assume as a rule of

thumb that the number of logical dependencies should

not be larger than the number of structural dependen-

cies. Choosing the most restrictive combination of

thresholds (a commit size threshold of 5 ﬁles com-

bined with an occurrence threshold of 4) leads to a

number of logical dependencies which comes near to

the number of structural dependencies.

Question 5. How does ﬁltering affect the overlap

between structural and logical dependencies?

The overlap between structural and logical depen-

dencies is given by the number of pairs of classes that

have both structural and logical dependencies. We

evaluate this overlap as a percentage relative to the

number of structural dependencies in Tables 6 and 7,

respectively as a percentage relative to the number of

logical dependencies in Tables 8 and 9.

A ﬁrst observation from Tables 6 and 7 is that not

all pairs of classes with structural dependencies co-

change. The largest value for the percentage of structural

dependencies that are also logical dependencies

is 76.6%, obtained when no filtering is

done.

From Tables 8 and 9 we notice that the percent-

age of logical dependencies which are also structural

is always low to very low. This means that most co-

changes are recorded between classes that have no

structural dependencies to each other.

6 FUTURE WORK

We consider that, in the future, extracted logical

dependencies can be validated by using them

to enhance dependency models used by different

applications, such as architectural reconstruction (Șora

et al., 2010), (Șora, 2015), and by evaluating the

impact on their results.

In this work we have extracted logical dependen-

cies from all the revisions of the system, and structural

dependencies from the last revision of the system. In

future work we will take into account also structural

dependencies from all the revisions of the system, in

order to ﬁlter out the old logical dependencies. Some

logical dependencies may have been also structural in

previous revisions of the system but not in the current

one. Another way to investigate this problem could

be to study the trend of occurrences of co-changes: if

co-changes between a pair of classes used to happen

more often in the remote past than in the more recent

past, it may be a sign that the problem causing the

logical coupling has been removed in the meantime.
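As a rough illustration of this idea, the sketch below compares how often a pair of classes co-changed in the older part of the history with how often it co-changed in the more recent part. It is only one hypothetical way to express such a trend: the split of the history into two halves, the function names and the rule used in `has_faded` are all assumptions, not part of the presented work.

```python
def co_change_trend(commits, pair, split_index=None):
    """Return (older_count, recent_count) of co-changes for a pair of classes.

    commits: list of sets of class names, ordered from oldest to newest.
    split_index: position separating "remote past" from "recent past";
                 defaults to the middle of the history.
    """
    a, b = pair
    if split_index is None:
        split_index = len(commits) // 2
    hits = [a in classes and b in classes for classes in commits]
    return sum(hits[:split_index]), sum(hits[split_index:])


def has_faded(commits, pair):
    """True if the pair co-changed more often in the older half than in the recent half."""
    older, recent = co_change_trend(commits, pair)
    return older > recent


# Hypothetical history, oldest commit first.
commits = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"B", "C"}]
print(co_change_trend(commits, ("A", "B")), has_faded(commits, ("A", "B")))
print(co_change_trend(commits, ("A", "C")), has_faded(commits, ("A", "C")))
```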

7 CONCLUSION

In this work we experimentally define methods to

identify the valid logical dependencies among co-

changing classes.

Our experiments show that the most important

factors which affect the quality of logical dependen-

cies are: the maximum size of commit transactions

which are accepted to generate logical dependencies,

and the minimum number of repeated occurrences for

a co-change to be counted as logical dependency.

We conclude that it is important to put a threshold

on the maximum size of commit transactions which

are accepted to generate logical dependencies. Only

small commit transactions (changing up to 5 source

code ﬁles) can be reliably used for introducing logi-

cal dependencies. We have also determined that small

commit transactions are the most frequent kind of

transactions, representing on average 80% of all commit

transactions. Under these conditions, we have

determined that increasing the threshold for the minimum

number of repeated occurrences for a co-change

to be counted as a logical dependency significantly

reduces the number of logical dependencies. On

average, increasing the threshold for repeated

occurrences by 1 halves the number

of logical dependencies. A value of 4 for the

threshold for repeated occurrences, combined with

the condition of accepting only small commit trans-

actions, already keeps the number of logical depen-

dencies in the same range as the number of structural

dependencies. Future work will investigate further

the issue of repeated occurrences, analyzing also their

trend in time.
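The two filters summarized above can be illustrated with a minimal sketch. It assumes that each commit transaction is given as the list of classes (or source files) it changed; the function name and the input format are hypothetical, and only the default values 5 and 4 come from the text.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Sequence, Set, Tuple


def extract_logical_dependencies(
    commits: Iterable[Sequence[str]],
    max_commit_size: int = 5,
    min_occurrences: int = 4,
) -> Set[Tuple[str, str]]:
    """Return the class pairs that co-change often enough in small commits."""
    pair_counts: Counter = Counter()
    for changed in commits:
        classes = sorted(set(changed))
        # Skip commits that cannot form a pair or that exceed the size threshold.
        if len(classes) < 2 or len(classes) > max_commit_size:
            continue
        # Count every unordered pair of classes changed in this commit.
        pair_counts.update(combinations(classes, 2))
    return {pair for pair, n in pair_counts.items() if n >= min_occurrences}


# Hypothetical history: A and B co-change in four small commits.
history = [
    ["A", "B"],
    ["A", "B", "C"],
    ["B", "A"],
    ["A", "B"],
    ["A", "C"],
    ["A", "B", "C", "D", "E", "F"],  # 6 items: ignored as too large
]
print(extract_logical_dependencies(history))  # {('A', 'B')}
```

Lowering `min_occurrences` or raising `max_commit_size` in this sketch admits more pairs, which mirrors the sensitivity to both thresholds reported above.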

The analysis of the experimental data shows that logical
dependencies are distinct from structural dependencies. Even after
filtering, a very large percentage of logical dependencies are
between classes without structural dependencies. This leads to the
conclusion that including logical dependencies alongside structural
dependencies in dependency models has the potential to improve
analysis applications based on these models.
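The comparison between the two sets of dependencies can be sketched as below. The sketch treats both kinds of dependencies as unordered class pairs, which is a simplifying assumption, and the function name and the sample pairs are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def overlap_report(logical: Iterable[Pair], structural: Iterable[Pair]) -> Dict[str, float]:
    """Count logical dependencies with and without a structural counterpart."""
    # Direction is ignored: (A, B) and (B, A) denote the same pair.
    logical_set = {frozenset(p) for p in logical}
    structural_set = {frozenset(p) for p in structural}
    both = logical_set & structural_set
    only_logical = logical_set - structural_set
    percent = 100.0 * len(only_logical) / len(logical_set) if logical_set else 0.0
    return {
        "both": len(both),
        "only_logical": len(only_logical),
        "percent_only_logical": percent,
    }


# Hypothetical data: only the pair A-B is also a structural dependency.
logical_deps = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "E")]
structural_deps = [("B", "A"), ("E", "F")]
print(overlap_report(logical_deps, structural_deps))
# {'both': 1, 'only_logical': 3, 'percent_only_logical': 75.0}
```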

ENASE 2019 - 14th International Conference on Evaluation of Novel Approaches to Software Engineering


ACKNOWLEDGEMENTS

This work was partially supported by a grant of the Romanian
National Authority for Scientific Research and Innovation,
CNCS/CCCDI UEFISCDI, project number PN-III-P2-2.1-PED-2016-0999,
within PNCDI III.

REFERENCES

Ajienka, N. and Capiluppi, A. (2017). Understanding the

interplay between the logical and structural coupling

of software classes. Journal of Systems and Software,

134:120–137.

Ajienka, N., Capiluppi, A., and Counsell, S. (2018). An em-

pirical study on the interplay between semantic cou-

pling and co-change of software classes. Empirical

Software Engineering, 23(3):1791–1825.

Beck, F. and Diehl, S. (2011). On the congruence of mod-

ularity and code coupling. In Proceedings of the 19th

ACM SIGSOFT Symposium and the 13th European

Conference on Foundations of Software Engineering,

ESEC/FSE ’11, pages 354–364, New York, NY, USA.

ACM.

Șora, I. (2015). Helping program comprehension of large

software systems by identifying their most important

classes. In Evaluation of Novel Approaches to Soft-

ware Engineering - 10th International Conference,

ENASE 2015, Barcelona, Spain, April 29-30, 2015,

Revised Selected Papers, pages 122–140. Springer In-

ternational Publishing.

Șora, I., Glodean, G., and Gligor, M. (2010). Software

architecture reconstruction: An approach based

on combining graph clustering and partitioning. In

Computational Cybernetics and Technical Informatics

(ICCC-CONTI), 2010 International Joint Conference

on, pages 259–264.

Ducasse, S. and Pollet, D. (2009). Software architecture

reconstruction: A process-oriented taxonomy. IEEE

Transactions on Software Engineering, 35(4):573–

591.

Gall, H., Hajek, K., and Jazayeri, M. (1998). Detection of

logical coupling based on product release history. In

Proceedings of the International Conference on Soft-

ware Maintenance, ICSM ’98, pages 190–, Washing-

ton, DC, USA. IEEE Computer Society.

Kagdi, H., Gethers, M., Poshyvanyk, D., and Collard, M. L.

(2010). Blending conceptual and evolutionary cou-

plings to support change impact analysis in source

code. In 2010 17th Working Conference on Reverse

Engineering, pages 119–128.

Kalliamvakou, E., Gousios, G., Blincoe, K., Singer, L., Ger-

man, D. M., and Damian, D. (2016). An in-depth

study of the promises and perils of mining GitHub. Empirical

Software Engineering, 21(5):2035–2071.

Oliva, G. A. and Gerosa, M. A. (2011). On the interplay

between structural and logical dependencies in open-

source software. In Proceedings of the 2011 25th

Brazilian Symposium on Software Engineering, SBES

’11, pages 144–153, Washington, DC, USA. IEEE

Computer Society.

Oliva, G. A. and Gerosa, M. A. (2015). Experience

report: How do structural dependencies inﬂuence

change propagation? an empirical study. In 26th

IEEE International Symposium on Software Reliability

Engineering, ISSRE 2015, Gaithersburg, MD,

USA, November 2-5, 2015, pages 250–260.

Poshyvanyk, D., Marcus, A., Ferenc, R., and Gyimóthy, T.

(2009). Using information retrieval based coupling

measures for impact analysis. Empirical Software En-

gineering, 14(1):5–32.

Ren, X., Ryder, B. G., Stoerzer, M., and Tip, F. (2005).

Chianti: a change impact analysis tool for Java programs.

In Proceedings. 27th International Conference

on Software Engineering, 2005. ICSE 2005., pages

664–665.

Shtern, M. and Tzerpos, V. (2012). Clustering method-

ologies for software engineering. Adv. Soft. Eng.,

2012:1:1–1:1.

Șora, I. (2013). Software architecture reconstruction

through clustering: Finding the right similarity fac-

tors. In Proceedings of the 1st International Work-

shop in Software Evolution and Modernization - Vol-

ume 1: SEM, (ENASE 2013), pages 45–54. INSTICC,

SciTePress.

Wiese, I. S., Kuroda, R. T., Re, R., Oliva, G. A., and Gerosa,

M. A. (2015). An empirical study of the relation between

strong change coupling and defects using history

and social metrics in the Apache Aries project. In

Damiani, E., Frati, F., Riehle, D., and Wasserman,

A. I., editors, Open Source Systems: Adoption and Im-

pact, pages 3–12, Cham. Springer International Pub-

lishing.

Yu, L. (2007). Understanding component co-evolution with

a study on Linux. Empirical Software Engineering,

12(2):123–141.

Zimmermann, T., Weisgerber, P., Diehl, S., and Zeller, A.

(2004). Mining version histories to guide software

changes. In Proceedings of the 26th International

Conference on Software Engineering, ICSE ’04, pages

563–572, Washington, DC, USA. IEEE Computer So-

ciety.
