
seed generates two binary seeds: the #-seed requires
both R and M bits, while the @-seed uses only R bits.
The final “signature” number is formed by concatenating
the values obtained from both seeds. Alternatively, the
full 32-bit blocks can be combined first, followed by
the incomplete blocks.
4 CONCLUSION
We have developed algorithms to calculate hash values
for spaced seeds and genetic sequences. These algo-
rithms are designed to leverage SIMD instructions so
that each hash value is formed with as few operations
as possible. We started with a straightforward method
for compacting strings with gaps, which relies on
shifting and masking operations.
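A minimal sketch of such a shift-and-mask compaction for a binary spaced seed is given below. It is not the code generated by comBiTeS; it assumes the seed is written as a string of '1' (care) and '0' (don't care), with bit i of the packed word corresponding to position i, and the names `seed_runs` and `compact_by_runs` are hypothetical. Each run of consecutive 1s in the seed is moved into place with one shift, one AND and one OR.

```python
def seed_runs(seed):
    """Split a binary seed ('1' = care, '0' = don't care) into runs of
    consecutive 1s. Each run becomes (shift, mask): `shift` moves the run
    down to its place in the compacted word, `mask` keeps only its bits."""
    runs, k, i = [], 0, 0  # k = number of care positions already placed
    while i < len(seed):
        if seed[i] == "1":
            start = i
            while i < len(seed) and seed[i] == "1":
                i += 1
            length = i - start
            # start >= k always, so the shift is never negative.
            runs.append((start - k, ((1 << length) - 1) << k))
            k += length
        else:
            i += 1
    return runs


def compact_by_runs(word, runs):
    """Compact `word` with one shift, one AND and one OR per run."""
    out = 0
    for shift, mask in runs:
        out |= (word >> shift) & mask
    return out
```

For the seed `1101011`, `seed_runs` returns `[(0, 0b11), (1, 0b100), (2, 0b11000)]`, so compaction takes three shift/AND/OR steps although the seed has weight 5. In this scheme the cost grows with the number of runs rather than with the weight of the seed, and every step maps onto ordinary shift and bitwise instructions that also exist in SIMD form.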
Public code to generate these functions is available
at https://github.com/vtman/comBiTeS. Examples of
code that uses these functions to pre-align reads are at
https://github.com/vtman/perlotSeeds, and the results
of applying it to real data are reported in Titarenko and
Titarenko (2024).
The next step is to profile the developed code against
existing alignment solutions. We will also investigate
more advanced shuffling techniques and data-interleaving
operations.
REFERENCES
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and
Lipman, D. J. (1990). Basic local alignment search
tool. Journal of Molecular Biology, 215(3):403–410.
Baichoo, S. and Ouzounis, C. A. (2017). Computational
complexity of algorithms for sequence comparison,
short-read assembly and genome alignment. Biosystems,
156-157:72–85.
Brejová, B., Brown, D. G., and Vinař, T. (2005). Vector
seeds: an extension to spaced seeds. Journal of Computer
and System Sciences, 70(3):364–380.
Buhler, J. (2001). Efficient large-scale sequence comparison
by locality-sensitive hashing. Bioinformatics,
17(5):419–428.
Delcher, A. L., Kasif, S., Fleischmann, R. D., Peterson,
J., White, O., and Salzberg, S. L. (1999). Alignment
of whole genomes. Nucleic Acids Research,
27(11):2369–2376.
Feng, D., Johnson, M., and Doolittle, R. (1985). Aligning
amino acid sequences: Comparison of commonly
used methods. Journal of Molecular Evolution,
21(2):112–125.
Firtina, C., Park, J., Alser, M., Kim, J. S., Cali, D.,
Shahroodi, T., Ghiasi, N., Singh, G., Kanellopoulos,
K., Alkan, C., and Mutlu, O. (2023). BLEND: a
fast, memory-efficient and accurate mechanism to find
fuzzy seed matches in genome analysis. NAR Genomics
and Bioinformatics, 5(1):lqad004.
Girotto, S., Comin, M., and Pizzi, C. (2018). Efficient computation
of spaced seed hashing with block indexing.
BMC Bioinformatics, 19(15):441.
Gotoh, O. (1982). An improved algorithm for matching
biological sequences. Journal of Molecular Biology,
162(3):705–708.
Graur, D. and Li, W.-H. (2000). Fundamentals of Molecular
Evolution. Sinauer, Sunderland, MA, 2 edition.
Hamming, R. W. (1950). Error detecting and error correcting
codes. The Bell System Technical Journal,
29(2):147–160.
Intel (2023). Intel intrinsics guide.
www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html.
Li, H. and Durbin, R. (2009). Fast and accurate short read
alignment with Burrows–Wheeler transform. Bioinformatics,
25(14):1754–1760.
Ma, B., Tromp, J., and Li, M. (2002). PatternHunter: faster
and more sensitive homology search. Bioinformatics,
18(3):440–445.
Mak, D., Gelfand, Y., and Benson, G. (2006). Indel seeds
for homology search. Bioinformatics, 22(14):e341–
e349.
Myers, E. W. and Miller, W. (1988). Optimal alignments in
linear space. Bioinformatics, 4(1):11–17.
Needleman, S. B. and Wunsch, C. D. (1970). A general
method applicable to the search for similarities
in the amino acid sequence of two proteins. Journal
of Molecular Biology, 48(3):443–453.
Noé, L. and Kucherov, G. (2004). Improved hit criteria for
DNA local alignment. BMC Bioinformatics, 5(1):149.
Pearson, W. R. and Lipman, D. J. (1988). Improved tools
for biological sequence comparison. Proc. Natl. Acad.
Sci. USA, 85(8):2444–2448.
Sessions, S. (2013). Genome size. In Maloy, S. and Hughes,
K., editors, Brenner’s Encyclopedia of Genetics (Second
Edition), pages 301–305. Academic Press, San
Diego, 2 edition.
Smith, T. F. and Waterman, M. S. (1981). Identification of
common molecular subsequences. Journal of Molecular
Biology, 147(1):195–197.
Titarenko, V. and Titarenko, S. (2023). PerFSeeB: Designing
long high-weight single spaced seeds for full sensitivity
alignment with a given number of mismatches.
BMC Bioinformatics, 24:396.
Titarenko, V. and Titarenko, S. (2024). Examples of sequence
alignment with contiguous, binary and ternary
seeds. 10.5281/zenodo.10645042.
Waterman, M., Smith, T., and Beyer, W. (1976). Some biological
sequence metrics. Advances in Mathematics,
20(3):367–387.
Wilbur, W. J. and Lipman, D. J. (1983). Rapid similarity
searches of nucleic acid and protein data banks. Proc.
Natl. Acad. Sci. USA, 80(3):726–730.
Xu, J., Brown, D., Li, M., and Ma, B. (2006). Optimizing
multiple spaced seeds for homology search. Journal
of Computational Biology, 13(7):1355–1368.
BIOINFORMATICS 2025 - 16th International Conference on Bioinformatics Models, Methods and Algorithms