When the biological sequences are highly similar, the number of
diagonals, and of consistent diagonals in particular, becomes very
large, which increases the execution time. After a few iterations that
delete inconsistent diagonals, the set of consistent diagonals becomes
large, so the graph becomes very large too. This slows down graph
procedures such as "Inconsistent path", because of the number of paths
that must be tested before the right one is found.
6.1 Comparison with DIALIGN 2.2
We made two implementations of our approach. The first one is called
DiaWay (Diagonal Way). In this implementation, when we extract a
diagonal we also extract all its sub-diagonals. If a diagonal d_i is
inconsistent, we delete it completely, even if only a few of its
residues are inconsistent with the current alignment; its
sub-diagonals still have a chance to be aligned.
Our second implementation is called NewDiaWay. In this implementation
we do not extract all sub-diagonals. If some residues of a diagonal
are inconsistent, we directly cut out the consistent part (the
consistent residues) and consider it as a new diagonal.
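The difference between the two strategies can be illustrated with a minimal sketch. It is not the code of DiaWay or NewDiaWay: the `Diagonal` class, the function names, and the per-residue predicate `is_consistent` are hypothetical names introduced here, and the sketch assumes that consistency with the current alignment can be queried for each residue pair. It also assumes that every maximal consistent run may be kept as a new diagonal.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Diagonal:
    """Gap-free pairing of seq1[start1 + k] with seq2[start2 + k], 0 <= k < length."""
    start1: int
    start2: int
    length: int


def keep_or_drop(diag: Diagonal,
                 is_consistent: Callable[[int, int], bool]) -> List[Diagonal]:
    """All-or-nothing handling: one inconsistent residue pair discards the diagonal."""
    pairs = ((diag.start1 + k, diag.start2 + k) for k in range(diag.length))
    return [diag] if all(is_consistent(i, j) for i, j in pairs) else []


def cut_consistent_parts(diag: Diagonal,
                         is_consistent: Callable[[int, int], bool],
                         min_length: int = 1) -> List[Diagonal]:
    """Cutting strategy: keep each maximal run of consistent residue pairs."""
    parts: List[Diagonal] = []
    run_start: Optional[int] = None  # offset at which the current run began
    # Iterate one step past the end so that a trailing run is closed too.
    for k in range(diag.length + 1):
        ok = k < diag.length and is_consistent(diag.start1 + k, diag.start2 + k)
        if ok and run_start is None:
            run_start = k
        elif not ok and run_start is not None:
            run_len = k - run_start
            if run_len >= min_length:
                parts.append(Diagonal(diag.start1 + run_start,
                                      diag.start2 + run_start,
                                      run_len))
            run_start = None
    return parts


if __name__ == "__main__":
    d = Diagonal(start1=10, start2=20, length=6)
    blocked = {(12, 22)}  # hypothetical residue pair that conflicts with the alignment
    print(keep_or_drop(d, lambda i, j: (i, j) not in blocked))
    print(cut_consistent_parts(d, lambda i, j: (i, j) not in blocked))
```

With the all-or-nothing rule, the consistent residues can only be recovered if the sub-diagonals were extracted beforehand, which is what enlarges the set of diagonals; the cutting rule recovers them directly from the rejected diagonal.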
We compared our two implementations with DIALIGN 2.2 (Morgenstern,
1999), whose code is available at
http://bibiserv2.cebitec.uni-bielefeld.de/dialign/.
The execution time depends on the number of biological sequences and
on their average length; see Table 2.
Table 2: Execution time for DIALIGN 2.2, DiaWay (DW), and NewDiaWay
(NDW). |S| is the number of sequences and S̄ their average length.

|S|    S̄      NDW     DW       DIALIGN 2.2
3      530    63      421      1312
4      250    31      125      766
5      200    47      125      *
6      119    32      1203     1750
7      195    125     422      1640
8      294    437     1453     4062
9      325    1047    6828     6297
10     476    3204    14782    16094
11     190    875     3656     3281
The * in Table 2 means that the application crashed and produced no
result for this example.
• The time to align 4 and 5 sequences is smaller than the time to
align 3 sequences, because their average length (250 and 200) is
much smaller than that of the 3-sequence set (530).
• DiaWay (DW) has a better execution time than DIALIGN 2.2 for small
sets of sequences with a relatively small average length. For
larger numbers of sequences its execution time degrades badly,
because DiaWay extracts all the sub-diagonals of each diagonal, so
the set of diagonals becomes huge and the execution time grows
accordingly.
• The execution time of our NewDiaWay (NDW) implementation is much
smaller than that of the other two methods, DIALIGN 2.2 and DiaWay
(DW).
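The observations above can be checked directly against the figures of Table 2. The following sketch only recomputes ratios from the published values; the names `TABLE_2` and `speedups` are introduced here for illustration, and `None` stands for the run on which DIALIGN 2.2 crashed.

```python
# Rows of Table 2: (number of sequences, average length, NDW, DW, DIALIGN 2.2).
# None marks the example on which DIALIGN 2.2 crashed.
TABLE_2 = [
    (3, 530, 63, 421, 1312),
    (4, 250, 31, 125, 766),
    (5, 200, 47, 125, None),
    (6, 119, 32, 1203, 1750),
    (7, 195, 125, 422, 1640),
    (8, 294, 437, 1453, 4062),
    (9, 325, 1047, 6828, 6297),
    (10, 476, 3204, 14782, 16094),
    (11, 190, 875, 3656, 3281),
]


def speedups(rows):
    """Map each sequence count to (DW / NDW, DIALIGN / NDW or None)."""
    result = {}
    for n_seq, _avg_len, ndw, dw, dialign in rows:
        vs_dialign = dialign / ndw if dialign is not None else None
        result[n_seq] = (dw / ndw, vs_dialign)
    return result


if __name__ == "__main__":
    for n_seq, (vs_dw, vs_dialign) in speedups(TABLE_2).items():
        dialign_text = "crash" if vs_dialign is None else f"{vs_dialign:.1f}x"
        print(f"|S|={n_seq:2d}  DW/NDW={vs_dw:.1f}x  DIALIGN/NDW={dialign_text}")
```

A ratio above 1 means that NDW was faster on that row.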
In the next test (see Table 3), we take just two biological sequences
and vary their length in order to see the influence of sequence
length on the execution time.
Table 3: Execution time in milliseconds according to sequence length.

|S|    S̄       NDW     DW         DIALIGN 2.2
2      2000    203     57640      6796
2      3000    437     172359     14843
2      4032    812     392703     25718
2      5000    1235    710984     38828
2      7499    2828    2223297    81281
In this table we can see that the execution time of our NDW method is
much smaller than that of DIALIGN 2.2, by a factor of about 29 to 34
depending on the sequence length.
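To give a rough idea of how each method scales with sequence length, one can estimate an exponent p such that the time grows like length^p, using only the first and last rows of Table 3. This two-point estimate is a simplification introduced here, not an analysis made in the experiments, and the names `TABLE_3` and `growth_exponent` are hypothetical.

```python
import math

# Rows of Table 3: (sequence length, NDW, DW, DIALIGN 2.2), times in milliseconds.
TABLE_3 = [
    (2000, 203, 57640, 6796),
    (3000, 437, 172359, 14843),
    (4032, 812, 392703, 25718),
    (5000, 1235, 710984, 38828),
    (7499, 2828, 2223297, 81281),
]


def growth_exponent(lengths, times):
    """Two-point estimate of p in time ~ length**p (first row against last row)."""
    return math.log(times[-1] / times[0]) / math.log(lengths[-1] / lengths[0])


if __name__ == "__main__":
    lengths = [row[0] for row in TABLE_3]
    for name, col in (("NDW", 1), ("DW", 2), ("DIALIGN 2.2", 3)):
        times = [row[col] for row in TABLE_3]
        print(f"{name}: p = {growth_exponent(lengths, times):.2f}")
```

On these five rows the estimate is close to 2 for NDW and DIALIGN 2.2 and closer to 3 for DW; with so few measurements this is only indicative.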
7 CONCLUSION
In this paper we have presented an efficient and robust algorithm for
handling the diagonal consistency concept in the DIALIGN method. The
improvement concerns the phase in which the set of consistent
diagonals is constructed, and the way these diagonals are then
aligned. We used graph-theoretic modelling to represent the positions
of the diagonals and to solve the consistency problem. Unlike
previous approaches, which show that a diagonal is consistent, our
approach does the opposite: it proves that a diagonal is inconsistent
with the diagonals already tested and accepted as consistent. We made
two implementations of our approach, DiaWay and NewDiaWay, and the
results show that our implementations, NewDiaWay in particular, have
a very good execution time compared with DIALIGN 2.2.
REFERENCES
Abdeddaïm, S. (1997). On incremental computation of transitive
closure and greedy alignment. In Apostolico, A. and Hein, J.,
editors, Combinatorial Pattern Matching, volume 1264 of Lecture Notes
in Computer Science, pages 167–179. Springer Berlin Heidelberg.
BIOINFORMATICS 2015 - International Conference on Bioinformatics Models, Methods and Algorithms