plexity of single algorithms.
Since the gain of BNDM-EDS over SopANG clearly depends
on the variability of the processed EDS, we included
another experiment comparing both algorithms on
synthetic data with different probabilities of a
degenerate segment. Figure 3 depicts the achieved
search time in seconds. The probability of a degenerate
segment in the individual files ranges from 0.01 to 0.5.
Each file contains 1,600,000 bases, and we searched for
a randomly chosen pattern of length m = 16. The results
show that, starting from a degenerate-segment
probability of 0.05, the length of the processed
elements (of the solid segments) is shorter than the
pattern length m = 16. This implies that the BNDM part
of BNDM-EDS is not utilized for most of the processed
elements.
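The effect described above can be illustrated with a small simulation. The sketch below is not the generator used in the experiment: it assumes a simplified model in which every position independently starts a degenerate segment with probability p, and the solid segments are the runs of positions in between. The function names `solid_segment_lengths` and `fraction_shorter_than` are hypothetical. Under this assumed model, the share of solid segments shorter than m = 16 should already exceed one half at p = 0.05.

```python
import random


def solid_segment_lengths(n_bases, p_degenerate, rng):
    """Return the lengths of solid runs among n_bases positions.

    Assumed model (not taken from the paper): each position independently
    starts a degenerate segment with probability p_degenerate; the solid
    segments are the non-empty runs of positions between them.
    """
    lengths = []
    run = 0
    for _ in range(n_bases):
        if rng.random() < p_degenerate:
            if run > 0:
                lengths.append(run)
            run = 0
        else:
            run += 1
    if run > 0:
        lengths.append(run)
    return lengths


def fraction_shorter_than(lengths, m):
    """Fraction of segments too short to hold a full window of length m."""
    if not lengths:
        return 0.0
    return sum(1 for length in lengths if length < m) / len(lengths)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs are comparable
    m = 16
    for p in (0.01, 0.05, 0.1, 0.5):
        lengths = solid_segment_lengths(1_600_000, p, rng)
        share = fraction_shorter_than(lengths, m)
        print(f"p = {p}: share of solid segments shorter than m = {share:.3f}")
```

A backward matching window of length m cannot be placed in a solid segment shorter than m, so the printed share is a rough proxy for how often the BNDM part would go unused under this model.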
5 CONCLUSION AND FUTURE WORK
We proposed the BNDM-EDS algorithm, which is the first
backward pattern matching algorithm for Elastic
Degenerate Strings. Its worst-case time is bounded by
$O(Nm\lceil m/w \rceil)$. Moreover, for small patterns
($m \le w$), BNDM-EDS achieves the average time complexity
$O\left(\frac{N(1-v)\log_\sigma m}{m}\right) + O(Nvm(1-v)) + O(Nv\beta\delta)$,
which is optimal. Our goal was to design a simple
(easy-to-implement) algorithm achieving fast search
times on real-world data. This expectation was confirmed
by experiments on a real-world data set. BNDM-EDS proves
its superiority especially on data with a lower
degenerate rate and for longer patterns.
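As a small worked instance of the worst-case bound stated above, the factor $\lceil m/w \rceil$ counts how many machine words are needed to hold a bit vector of length m. The word size w = 64 used here is an assumption made for illustration; it is not stated in this section.

```latex
% Assumption: machine word size w = 64 (illustrative only).
\[
  m = 16,\ w = 64:\qquad
  \left\lceil \frac{m}{w} \right\rceil = \left\lceil \frac{16}{64} \right\rceil = 1
  \quad\Longrightarrow\quad
  O\!\left(Nm\left\lceil \frac{m}{w} \right\rceil\right) = O(Nm),
\]
\[
  m = 100,\ w = 64:\qquad
  \left\lceil \frac{m}{w} \right\rceil = \left\lceil \frac{100}{64} \right\rceil = 2 .
\]
```

For every pattern with $m \le w$, including the pattern length m = 16 used in the experiments, the factor equals 1 and the worst-case bound reduces to $O(Nm)$.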
In future work, we plan to extend BNDM-EDS to the protein
alphabet. Furthermore, we want to focus on processing
more general variants of Elastic Degenerate Strings,
such as recursive Elastic Degenerate Strings or colored
Elastic Degenerate Strings (which allow elements to be
mapped to individuals of the sequenced population).
ACKNOWLEDGEMENTS
The research was partially supported by the Czech
Science Foundation as project No. 19-20759S.
BIOINFORMATICS 2021 - 12th International Conference on Bioinformatics Models, Methods and Algorithms