Complexity
The first step of Live-NJ is to run NJ. The running time of NJ is $O(n^3)$. After that, each one of the $O(n)$ steps of Live-NJ takes $O(n^2)$ time to calculate distances in $T'$, constant time to verify if a leaf satisfies the condition stated by Theorem 2 (or, if $b, c < a$), and $O(n^2)$ time to apply Equation 1 to calculate $Q'$. Thus, the running time of Live-NJ is $O(n^3)$.
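Written as a single sum, using only the costs stated above (the initial NJ run, followed by $O(n)$ steps that each compute distances, test a leaf, and apply Equation 1), the bound reads:

$$
\underbrace{O(n^3)}_{\text{initial NJ run}} + O(n) \cdot \bigl( O(n^2) + O(1) + O(n^2) \bigr) = O(n^3) + O(n^3) = O(n^3).
$$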
4 RESULTS AND DISCUSSION
In this section we present some preliminary validation of Live-NJ, comparing its performance with that of NJ as nonadditivity increases. By performance we mean the ability to minimize the score $Q(M,d)$, according to Equation 1.
The experiments used sets of nonadditive matrices, grouped according to three parameters: the number of objects, the index of nonadditivity (explained below), and the percentage of live internal nodes. Two datasets were built: with the first we assessed the performance of NJ while increasing the number of objects and the nonadditivity index; with the second we assessed the performance of Live-NJ while increasing all three parameters.
Index of nonadditivity
A distance matrix $M$ is additive only if its set of objects satisfies the properties of a metric space and also the 4-point condition (4PC) (Setubal and Meidanis, 1997). In particular, in a metric space, any triple of objects $i, j, k$ satisfies $M_{ij} \le M_{ik} + M_{kj}$. This is the well-known triangle inequality. 4PC states that, given any quadruple of objects, we can label them $i$, $j$, $k$ and $l$ such that $M_{ij} + M_{kl} = M_{ik} + M_{jl} \ge M_{il} + M_{jk}$.
Let $M$ be a distance matrix. Let $\alpha'$ be the number of triples of $M$ not satisfying the triangular inequality and $\beta'$ be the number of quadruples not satisfying 4PC. We define the index of nonadditivity $I_N$ of $M$ as $I_N = (\alpha'/\alpha + \beta'/\beta)/2$, where $\alpha$ and $\beta$ are the total number of triples and quadruples of $M$, respectively. Notice that $0 \le I_N \le 1$.
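The definition above can be illustrated with a brute-force sketch. This is not the authors' implementation: the function name `nonadditivity_index` and the tolerance `tol` are hypothetical, and the sketch assumes that a triple counts as violating when any of its three triangle inequalities fails, and that a quadruple satisfies 4PC exactly when its two largest pairwise sums are equal.

```python
from itertools import combinations
from math import comb


def nonadditivity_index(M, tol=1e-9):
    """Return (alpha'/alpha + beta'/beta) / 2 for a symmetric matrix M."""
    n = len(M)
    if n < 4:
        raise ValueError("need at least 4 objects so that quadruples exist")

    # alpha': triples for which at least one triangle inequality fails
    # (assumption: one failing ordering is enough to count the triple).
    bad_triples = 0
    for i, j, k in combinations(range(n), 3):
        a, b, c = M[i][j], M[i][k], M[j][k]
        if a > b + c + tol or b > a + c + tol or c > a + b + tol:
            bad_triples += 1

    # beta': quadruples whose two largest pairwise sums differ,
    # i.e. no labelling i, j, k, l satisfies 4PC.
    bad_quads = 0
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted((M[i][j] + M[k][l], M[i][k] + M[j][l], M[i][l] + M[j][k]))
        if abs(sums[2] - sums[1]) > tol:
            bad_quads += 1

    # alpha = C(n, 3) and beta = C(n, 4) are the total counts.
    return (bad_triples / comb(n, 3) + bad_quads / comb(n, 4)) / 2


# Distances in a star tree with four leaves and unit edges: additive.
star = [[0 if r == c else 2 for c in range(4)] for r in range(4)]
print(nonadditivity_index(star))  # 0.0

# Break one entry: 2 of 4 triples and the single quadruple now fail.
star[0][1] = star[1][0] = 5
print(nonadditivity_index(star))  # 0.75
```

The enumeration costs $O(n^4)$ time because of the quadruples, which is acceptable for matrices with up to 100 objects but is chosen here for clarity rather than speed.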
Performance assessment
The dataset built to assess the performance of NJ consists of sets of nonadditive matrices, grouped according to their number $N$ of objects, $N = 10, 20, \ldots, 100$, and their $I_N$, in the ranges $(0, 0.25]$, $(0.25, 0.5]$, $(0.5, 0.75]$, $(0.75, 1]$. For each value of $N$ and each range of $I_N$, a bucket of 100 matrices was built.
Each input matrix is generated by first producing a random tree and then generating a matrix from the tree. By construction, such a matrix is additive. The matrix is then disturbed, basically by choosing a random triple $i, j, k$ and setting $M_{ij} = M_{ik} + M_{kj} + \delta$. This alteration changes $\alpha'$ and possibly changes $\beta'$, consequently modifying the nonadditivity index $I_N$. For these experiments we used $\delta = 1$.
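A single disturbance step of the kind described above could look like the following minimal sketch. The function name `disturb` and the `rng` parameter are hypothetical, and mirroring the new value into `M[j][i]` to keep the matrix symmetric is an assumption, since the text does not say how symmetry is handled. How many such steps are applied to reach a given range of $I_N$ is also not stated, so the sketch performs exactly one.

```python
import random


def disturb(M, delta=1, rng=random):
    """Pick a random triple (i, j, k) and set M[i][j] = M[i][k] + M[k][j] + delta.

    M is modified in place; the chosen triple is returned for inspection.
    """
    # Three distinct objects, drawn uniformly without replacement.
    i, j, k = rng.sample(range(len(M)), 3)
    M[i][j] = M[i][k] + M[k][j] + delta
    # Assumption: the symmetric entry is updated as well.
    M[j][i] = M[i][j]
    return i, j, k


# Usage: disturb an additive matrix (star tree with five leaves, unit edges).
M = [[0 if r == c else 2 for c in range(5)] for r in range(5)]
i, j, k = disturb(M, delta=1, rng=random.Random(0))
# The chosen triple now violates the triangle inequality by exactly delta.
print(M[i][j] - (M[i][k] + M[k][j]))  # 1
```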
As expected, the higher $I_N$ is, the worse NJ performs. Figure 10 shows the variation of $Q(M,d)$ for the trees built by NJ as $I_N$ increases, for 10, 50 and 100 objects. Another highlight is that NJ scores tend to increase faster as $N$ grows.
Figure 10: NJ scores given $I_N$, for $N$ = 10, 50 and 100 objects.
To evaluate Live-NJ we used the same method as in the evaluation of NJ, but added another parameter to the construction of the dataset: the percentage of live internal nodes. So, besides $N$ and $I_N$, we also used the percentage $P = 20\%, 40\%, 60\%, 80\%$ of live internal nodes over the number of leaves. This time the additive matrices were generated from random trees containing $N = 10, 20, \ldots, 100$ leaves plus $P$ percent (over $N$) of live internal nodes. Thus, for each $N = 10, 20, \ldots, 100$, each $I_N$ range $(0, 0.25]$, $(0.25, 0.5]$, $(0.5, 0.75]$, $(0.75, 1]$, and each $P = 20\%, 40\%, 60\%, 80\%$ over $N$, a bucket of 100 nonadditive matrices was built.
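To make the size of this second dataset concrete, the parameter grid can be enumerated as in the sketch below. All names are hypothetical; the only inputs are the values of $N$, the $I_N$ ranges, the values of $P$, and the bucket size of 100 given in the text.

```python
from itertools import product

N_VALUES = range(10, 101, 10)  # N = 10, 20, ..., 100
# Each pair (low, high) stands for the half-open range (low, high].
IN_RANGES = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
P_VALUES = [20, 40, 60, 80]  # percent of live internal nodes over N
MATRICES_PER_BUCKET = 100


def live_internal_nodes(n, p):
    """Number of live internal nodes when P percent is taken over N leaves."""
    return n * p // 100


# One bucket per combination of N, I_N range and P.
buckets = list(product(N_VALUES, IN_RANGES, P_VALUES))
print(len(buckets))                        # 160
print(len(buckets) * MATRICES_PER_BUCKET)  # 16000
print(live_internal_nodes(10, 20))         # 2
```

Because every $N$ is a multiple of 10 and every $P$ a multiple of 20, the number of live internal nodes is always an integer, so the integer division loses nothing.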
The results for $N$ = 10, 50 and 100 are shown in Figures 11, 12 and 13, respectively. Each figure shows the Live-NJ scores for all values of $P$. Over the same intervals of $I_N$, Live-NJ performs better than NJ, even with higher percentages of live internal nodes.
BIOINFORMATICS 2017 - 8th International Conference on Bioinformatics Models, Methods and Algorithms