and Lonardi, 2016), and we have shown that a significant
speedup can be obtained, thus compensating for the slowdown
introduced by the use of spaced seeds.
The approaches that we presented rely on a greedy
preprocessing step, similar to the one adopted by ISSH.
Possible future extensions could focus on improving
this preprocessing: instead of using a greedy policy
to select the group of previously computed
hashes from which to extract the positions to reuse,
it could be beneficial to investigate global optimization
schemes in order to make the computation even
faster.
REFERENCES
Apostolico, A., Guerra, C., Landau, G. M., and Pizzi,
C. (2016). Sequence similarity measures based on
bounded Hamming distance. Theoretical Computer
Science, 638:76–90.
Zielezinski, A., Vinga, S., Almeida, J., et al. (2017).
Alignment-free sequence comparison: benefits, applications,
and tools. Genome Biology, 18:186.
Břinda, K., Sykulski, M., and Kucherov, G. (2015). Spaced
seeds improve k-mer-based metagenomic classifica-
tion. Bioinformatics, 31(22):3584.
Dencker, T., Leimeister, C.-A., Gerth, M., Bleidorn, C.,
Snir, S., and Morgenstern, B. (2019). ‘Multi-SpaM’:
a maximum-likelihood approach to phylogeny recon-
struction using multiple spaced-word matches and
quartet trees. NAR Genomics and Bioinformatics,
2(1):lqz013.
Girotto, S., Comin, M., and Pizzi, C. (2017a). Fast spaced
seed hashing. In Proceedings of the 17th Workshop
on Algorithms in Bioinformatics (WABI), volume 88
of Leibniz International Proceedings in Informatics,
pages 7:1–7:14.
Girotto, S., Comin, M., and Pizzi, C. (2017b). Metage-
nomic reads binning with spaced seeds. Theoretical
Computer Science, 698:88–99.
Girotto, S., Comin, M., and Pizzi, C. (2018a). Efficient
computation of spaced seed hashing with block index-
ing. BMC Bioinformatics, 19(15):441.
Girotto, S., Comin, M., and Pizzi, C. (2018b). FSH: fast
spaced seed hashing exploiting adjacent hashes. Al-
gorithms for Molecular Biology, 13(1):8.
Hahn, L., Leimeister, C.-A., Ounit, R., Lonardi, S.,
and Morgenstern, B. (2016). Rasbhari: Optimiz-
ing spaced seeds for database searching, read map-
ping and alignment-free sequence comparison. PLOS
Computational Biology, 12(10):1–18.
Harris, R. S. (2007). Improved Pairwise Alignment of Genomic
DNA. PhD thesis, University Park, PA, USA.
Kucherov, G., Noé, L., and Roytberg, M. A. (2006). A unifying
framework for seed sensitivity and its application
to subset seeds. Journal of Bioinformatics and
Computational Biology, 4(2):553–569.
Leimeister, C. and Morgenstern, B. (2014). Kmacs: the
k-mismatch average common substring approach to
alignment-free sequence comparison. Bioinformatics,
30(14):2000–2008.
Leimeister, C.-A., Boden, M., Horwege, S., Lindner, S., and
Morgenstern, B. (2014). Fast alignment-free sequence
comparison using spaced-word frequencies. Bioinfor-
matics, 30(14):1991.
Li, Y. and Ilie, L. (2017). SPRINT: ultrafast protein–protein
interaction prediction of the entire human interactome.
BMC Bioinformatics, 18:485.
Ma, B., Tromp, J., and Li, M. (2002). PatternHunter: faster
and more sensitive homology search. Bioinformatics,
18(3):440.
Marçais, G., Solomon, B., Patro, R., and Kingsford, C.
(2019). Sketching and sublinear data structures in ge-
nomics. Annual Review of Biomedical Data Science,
2(1):93–118.
Mohamadi, H., Chu, J., Vandervalk, B. P., and Birol, I.
(2016). ntHash: recursive nucleotide hashing. Bioin-
formatics, page btw397.
Noé, L. and Martin, D. E. K. (2014). A coverage criterion
for spaced seeds and its applications to support vector
machine string kernels and k-mer distances. Journal
of Computational Biology, 21(12):947–963.
Onodera, T. and Shibuya, T. (2013). The gapped spectrum
kernel for support vector machines. In Proceedings
of the 9th Conference on Machine Learning and Data
Mining in Pattern Recognition, MLDM’13, pages 1–
15. Springer-Verlag.
Ounit, R. and Lonardi, S. (2016). Higher classification
sensitivity of short metagenomic reads with CLARK-S.
Bioinformatics, 32(24):3823.
Ounit, R., Wanamaker, S., Close, T. J., and Lonardi, S.
(2015). CLARK: fast and accurate classification of
metagenomic and genomic sequences using discrim-
inative k-mers. BMC Genomics, 16(1):1–13.
Petrucci, E., Noé, L., Pizzi, C., and Comin, M. (2020). Iterative
spaced seed hashing: Closing the gap between
spaced seed hashing and k-mer hashing. Journal of
Computational Biology, 27(2):223–233.
Rumble, S. M., Lacroute, P., Dalca, A. V., Fiume, M.,
Sidow, A., and Brudno, M. (2009). SHRiMP: Accurate
mapping of short color-space reads. PLOS Computa-
tional Biology, 5(5):1–11.
Röhling, S., Linne, A., Schellhorn, J., Hosseini, M.,
Dencker, T., and Morgenstern, B. (2020). The number
of k-mer matches between two DNA sequences as
a function of k and applications to estimate phylogenetic
distances. PLoS One, 15.
Wood, D., Lu, J., and Langmead, B. (2019). Improved
metagenomic analysis with Kraken 2. Genome Biology,
20:257.
Wood, D. E. and Salzberg, S. L. (2014). Kraken: ultra-
fast metagenomic sequence classification using exact
alignments. Genome Biology, 15:R46.
BIOINFORMATICS 2023 - 14th International Conference on Bioinformatics Models, Methods and Algorithms