
statistics with multi-genome references. In Data Com-
pression Conference (DCC), pages 193–202. IEEE.
Büchler, T., Olbrich, J., and Ohlebusch, E. (2023).
Efficient short read mapping to a pangenome that is
represented by a graph of ED strings. Bioinformatics,
39(5):btad320.
Burrows, M. and Wheeler, D. (1994). A block-sorting loss-
less data compression algorithm. Research Report
124, Digital Systems Research Center.
Chain, P. et al. (2003). An applications-focused review of
comparative genomics tools: Capabilities, limitations
and future challenges. Briefings in Bioinformatics,
4(2):105–123.
Cormen, T., Leiserson, C., and Rivest, R. (1990). Introduc-
tion to Algorithms. MIT Press, Cambridge, MA.
Gog, S. et al. (2014). From theory to practice: Plug and
play with succinct data structures. In Experimental
Algorithms (SEA), pages 326–337. Springer.
Höhl, M., Kurtz, S., and Ohlebusch, E. (2002).
Efficient multiple genome alignment. Bioinformatics,
18:S312–S320.
Jacobson, G. and Vo, K. (1992). Heaviest increas-
ing/common subsequence problems. In Combinato-
rial Pattern Matching (CPM), pages 52–66. Springer.
Kasai, T. et al. (2001). Linear-time longest-common-prefix
computation in suffix arrays and its applications. In
Combinatorial Pattern Matching (CPM), pages 181–
192. Springer.
Li, H. (2021). New strategies to improve minimap2 align-
ment accuracy. Bioinformatics, 37(23):4572–4574.
Liao, W. et al. (2023). A draft human pangenome reference.
Nature, 617(7960):312–324.
Logsdon, G. et al. (2024). The variation and evolution of
complete human centromeres. Nature, 629:136–145.
Louza, F., Gog, S., and Telles, G. (2017). Inducing enhanced
suffix arrays for string collections. Theoretical
Computer Science, 678:22–39.
Nakamura, T. et al. (2018). Parallelization of MAFFT for
large-scale multiple sequence alignments. Bioinfor-
matics, 34(14):2490–2492.
Ohlebusch, E. and Kurtz, S. (2008). Space efficient com-
putation of rare maximal exact matches between mul-
tiple sequences. Journal of Computational Biology,
15(4):357–377.
Olbrich, J., Ohlebusch, E., and Büchler, T. (2024). Generic
non-recursive suffix array construction. ACM
Transactions on Algorithms, 20(2):18.
Porubsky, D. et al. (2021). Fully phased human genome as-
sembly without parental data using single-cell strand
sequencing and long reads. Nature Biotechnology,
39(3):302–308.
Puglisi, S., Smyth, W., and Turpin, A. (2007). A taxonomy
of suffix array construction algorithms. ACM Comput-
ing Surveys, 39(2):Article 4.
Tang, F. et al. (2022). HAlign 3: fast multiple alignment of
ultra-large numbers of similar DNA/RNA sequences.
Molecular Biology and Evolution, 39(8):msac166.
Tettelin, H. et al. (2005). Genome analysis of multiple
pathogenic isolates of Streptococcus agalactiae:
implications for the microbial "pan-genome".
Proceedings of the National Academy of Sciences of the United
States of America, 102(39):13950–13955.
The 1000 Genomes Project Consortium (2015). A global
reference for human genetic variation. Nature,
526(7571):68–74.
Treangen, T. et al. (2014). The Harvest suite for rapid core-
genome alignment and visualization of thousands of
intraspecific microbial genomes. Genome Biology,
15(11):524.
Zhang, P. et al. (2024). FMAlign2: a novel fast multiple
nucleotide sequence alignment method for ultralong
datasets. Bioinformatics, 40(1):btae014.
APPENDIX
A Enumeration of LCP-Intervals
Kasai et al. (2001) presented a linear-time algorithm
that simulates the bottom-up traversal of a suffix tree
using a suffix array and its LCP-array (which, given
the suffix array, can be constructed in linear time).
The following algorithm is a slight modification of
their algorithm TraverseWithArray, cf. Abouelhoda
et al. (2004). It computes all lcp-intervals of the
LCP-array with the help of a stack. The elements
on the stack are lcp-intervals represented by tuples
⟨lcp, lb, rb⟩, where lcp is the lcp-value of the
interval, lb is its left boundary, and rb is its right
boundary. In Algorithm 1, push (pushes an element onto the
stack) and pop (removes the topmost element from the
stack and returns it) are the usual stack operations,
while top provides a pointer to the topmost element
of the stack. Furthermore, ⊥ stands for an undefined
value. We assume that array indexing starts at 1 and
that LCP[1] = −1 = LCP[n + 1].
Function Enumerate(LCP):
    push(⟨0, 1, ⊥⟩);
    for k = 2 → n + 1 do
        lb ← k − 1;
        while LCP[k] < top().lcp do
            top().rb ← k − 1;
            interval ← pop();
            report(interval);
            lb ← interval.lb;
        end
        if LCP[k] > top().lcp then
            push(⟨LCP[k], lb, ⊥⟩);
        end
    end
Algorithm 1: Given the LCP-array of a string of length n,
this algorithm enumerates all lcp-intervals.
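Algorithm 1 can be tried out with the following minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (naive_suffix_array, kasai_lcp, enumerate_lcp_intervals) are hypothetical, the suffix array is built naively for brevity, and the LCP-array is computed in the style of Kasai et al. (2001). To mirror the 1-based indexing assumed above, index 0 of the Python list is left unused. The explicit empty-stack checks are an addition of this sketch: the pseudocode calls top() once more after the root interval has been popped at k = n + 1, which a concrete stack has to guard against.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int, Optional[int]]  # (lcp, lb, rb), 1-based boundaries


def naive_suffix_array(s: str) -> List[int]:
    """Suffix array as 0-based start positions; quadratic, for illustration only."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def kasai_lcp(s: str, sa: List[int]) -> List[int]:
    """LCP-array in linear time, laid out as in the text:
    lcp[1] = lcp[n + 1] = -1, and lcp[k] (2 <= k <= n) is the length of the
    longest common prefix of the suffixes at ranks k - 1 and k (1-based).
    Index 0 is unused padding."""
    n = len(s)
    rank = [0] * n
    for r, p in enumerate(sa):
        rank[p] = r
    lcp = [-1] * (n + 2)
    h = 0
    for p in range(n):
        r = rank[p]
        if r > 0:
            q = sa[r - 1]  # start of the lexicographic predecessor
            while p + h < n and q + h < n and s[p + h] == s[q + h]:
                h += 1
            lcp[r + 1] = h  # 0-based rank r corresponds to 1-based index r + 1
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def enumerate_lcp_intervals(lcp: List[int]) -> List[Interval]:
    """Stack-based enumeration of all lcp-intervals, following Algorithm 1."""
    n = len(lcp) - 2
    stack: List[List[Optional[int]]] = [[0, 1, None]]  # None plays the role of the undefined value
    reported: List[Interval] = []
    for k in range(2, n + 2):
        lb = k - 1
        # Guard against the empty stack (assumption of this sketch, see lead-in).
        while stack and lcp[k] < stack[-1][0]:
            stack[-1][2] = k - 1
            interval = stack.pop()
            reported.append((interval[0], interval[1], interval[2]))
            lb = interval[1]
        if stack and lcp[k] > stack[-1][0]:
            stack.append([lcp[k], lb, None])
    return reported


if __name__ == "__main__":
    text = "banana"
    sa = naive_suffix_array(text)
    lcp = kasai_lcp(text, sa)
    print(lcp[1:])                       # [-1, 1, 3, 0, 0, 2, -1]
    print(enumerate_lcp_intervals(lcp))  # [(3, 2, 3), (1, 1, 3), (2, 5, 6), (0, 1, 6)]
```

For the string banana (without a sentinel character), the reported tuples correspond to the repeats ana, a, and na, followed by the root interval ⟨0, 1, 6⟩. Child intervals are always reported before the intervals that enclose them, which is the bottom-up order described above.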
BIOINFORMATICS 2025 - 16th International Conference on Bioinformatics Models, Methods and Algorithms