onds and sublinear average time complexity. This was
demonstrated on real pangenomic data from the 1000 Genomes
Project (Consortium, 2011).
In future work, we plan to design search algorithms for other
forms of pangenomic data. This includes algorithms optimized
for so-called elastic-degenerate strings, another way of
representing the genomic data of a population of the same
species.
ACKNOWLEDGEMENTS
The research was partially supported by the OP
VVV project Research Center for Informatics, no.
CZ.02.1.01/0.0/0.0/16_019/0000765.
REFERENCES
Baeza-Yates, R. A. (1992). Text retrieval: Theory and prac-
tice. In 12th IFIP World Computer Congress, vol-
ume I, pages 465–476. Elsevier Science.
Bernardini, G., Pisanti, N., Pissis, S. P., and Rosone, G.
(2017). Pattern matching on elastic-degenerate text
with errors. In Fici, G., Sciortino, M., and Ven-
turini, R., editors, String Processing and Information
Retrieval, pages 74–90, Cham. Springer International
Publishing.
Boyer, R. S. and Moore, J. S. (1977). A fast string searching
algorithm. Commun. ACM, 20(10):762–772.
Cisłak, A., Grabowski, S., and Holub, J. (2018). SOPanG:
online text searching over a pan-genome. Bioinfor-
matics, page bty506.
Consortium, The 1000 Genomes Project (2011). A map of
human genome variation from population-scale sequencing.
Nature, 473:544. Corrigendum.
Consortium, The UK10K (2015). The UK10K project identifies
rare variants in health and disease. Nature, 526:82.
Crochemore, M., Iliopoulos, C. S., Kundu, R., Mohamed,
M., and Vayani, F. (2015). Linear algorithm for
conservative degenerate pattern matching. CoRR,
abs/1506.04559.
Crochemore, M. and Rytter, W. (1994). Text Algorithms.
Oxford University Press, Inc., New York, NY, USA.
Dömölki, B. (1964). An algorithm for syntactical analy-
sis. Computational Linguistics, 3:29–46. Hungarian
Academy of Science, Budapest.
Grossi, R., Iliopoulos, C. S., Liu, C., Pisanti, N., Pissis,
S. P., Retha, A., Rosone, G., Vayani, F., and Versari, L.
(2017). On-line pattern matching on similar texts. In
28th Symposium on Combinatorial Pattern Matching,
CPM 2017, July 4-6, 2017, Warsaw, Poland, pages
9:1–9:14.
Holub, J., Smyth, W., and Wang, S. (2008). Fast pattern-
matching on indeterminate strings. Journal of Dis-
crete Algorithms, 6(1):37–50. Selected papers from
AWOCA 2005.
Horspool, R. N. (1980). Practical fast searching in strings.
Software: Practice and Experience, 10(6):501–506.
Iliopoulos, C. S., Kundu, R., and Pissis, S. P. (2017). Ef-
ficient pattern matching in elastic-degenerate texts.
In Drewes, F., Martín-Vide, C., and Truthe, B., ed-
itors, Language and Automata Theory and Applica-
tions, pages 131–142, Cham. Springer International
Publishing.
Iliopoulos, C. S., Mouchard, L., and Rahman, M. S. (2008).
A new approach to pattern matching in degenerate
DNA/RNA sequences and distributed pattern match-
ing. Mathematics in Computer Science, 1(4):557–569.
Knuth, D. E., Morris, J. H., and Pratt, V. R. (1977). Fast
Pattern Matching in Strings. SIAM Journal on Com-
puting, 6(2):323–350.
Manber, U. (1997). A text compression scheme that allows
fast searching directly in the compressed file. ACM
Trans. Inf. Syst., 15(2):124–136.
Marschall, T. (2018). Computational pan-genomics: status,
promises and challenges. Briefings in Bioinformatics,
19(1):118–135.
Navarro, G. and Raffinot, M. (1998). A bit-parallel ap-
proach to suffix automata: Fast extended string match-
ing. In Proceedings of the 9th Annual Symposium
on Combinatorial Pattern Matching, CPM ’98, pages
14–33, London, UK. Springer-Verlag.
Navarro, G. and Raffinot, M. (2002). Flexible Pattern
Matching in Strings: Practical On-Line Search Algorithms
for Texts and Biological Sequences. Cambridge University
Press.
Procházka, P. and Holub, J. (2017). Byte-aligned pattern
matching in encoded genomic sequences. In 17th
Int. Workshop on Algorithms in Bioinformatics, WABI
2017, August 21-23, 2017, Boston, MA, USA, pages
20:1–20:13.
Puglisi, S. J., Smyth, W. F., and Turpin, A. (2006). Inverted
Files Versus Suffix Arrays for Locating Patterns in Pri-
mary Memory, pages 122–133. Springer Berlin Hei-
delberg, Berlin, Heidelberg.
Sunday, D. M. (1990). A very fast substring search algo-
rithm. Commun. ACM, 33(8):132–142.
Wu, S. and Manber, U. (1992). Agrep – a fast approximate
pattern-matching tool. In Proc. of USENIX Techni-
cal Conference, pages 153–162.