USTAR2: Fast and Succinct Representation of k-mer Sets Using De Bruijn Graphs
Enrico Rossignolo, Matteo Comin
2024
Abstract
A fundamental operation in computational genomics is the reduction of input sequences to their constituent k-mers. Space-efficient representations of k-mer sets are important for the scalability of bioinformatics analyses. One prevalent strategy transforms the set of k-mers into a de Bruijn graph and then derives a compact representation of this graph by finding the smallest path cover. In this article, we introduce USTAR2, a novel algorithm for the compression of k-mers. USTAR2 exploits node connectivity in the de Bruijn graph to select paths more efficiently when constructing the path cover. We performed a series of tests on the compression of real read datasets and compared USTAR2 with several other tools. USTAR2 achieved the best compression, required less memory, and was considerably faster (up to 96x). The code of USTAR2 is available at https://github.com/CominLab/USTAR2.
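To make the path-cover idea concrete, the following is a minimal Python sketch, not the USTAR2 algorithm itself. It assumes a node-centric de Bruijn graph in which each k-mer is a node and an edge joins two k-mers that overlap by k-1 characters, it ignores reverse complements, and it picks seeds in plain lexicographic order instead of using any connectivity-based heuristic. The function names `greedy_path_cover` and `spelled_kmers` are hypothetical and chosen only for illustration.

```python
def greedy_path_cover(kmers):
    """Cover a k-mer set with vertex-disjoint paths of its de Bruijn graph.

    Returns one string per path; every input k-mer occurs in exactly one
    string. Assumes all k-mers share the same length k >= 2 and use the
    alphabet ACGT. Reverse complements are ignored (simplifying assumption).
    """
    unvisited = set(kmers)
    strings = []
    for seed in sorted(kmers):  # naive seed order, not a connectivity heuristic
        if seed not in unvisited:
            continue
        unvisited.remove(seed)
        k = len(seed)
        path = seed
        # Extend to the right while an unvisited successor exists.
        extended = True
        while extended:
            extended = False
            suffix = path[-(k - 1):]
            for c in "ACGT":
                if suffix + c in unvisited:
                    unvisited.remove(suffix + c)
                    path += c
                    extended = True
                    break
        # Extend to the left while an unvisited predecessor exists.
        extended = True
        while extended:
            extended = False
            prefix = path[:k - 1]
            for c in "ACGT":
                if c + prefix in unvisited:
                    unvisited.remove(c + prefix)
                    path = c + path
                    extended = True
                    break
        strings.append(path)
    return strings


def spelled_kmers(strings, k):
    """List every k-mer spelled by the given strings, with repetitions."""
    return [s[i:i + k] for s in strings for i in range(len(s) - k + 1)]


example = {"ACG", "CGT", "GTA", "TAC", "GGG"}
cover = greedy_path_cover(example)
print(cover, sum(len(s) for s in cover), "characters vs", 3 * len(example))
```

Each path of n k-mers is stored in n + k - 1 characters instead of n * k, so fewer and longer paths give a smaller representation. This is why the choice of seeds and extensions matters for compression.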
Paper Citation
in Harvard Style
Rossignolo E. and Comin M. (2024). USTAR2: Fast and Succinct Representation of k-mer Sets Using De Bruijn Graphs. In Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS; ISBN 978-989-758-688-0, SciTePress, pages 368-378. DOI: 10.5220/0012423100003657
in Bibtex Style
@conference{bioinformatics24,
author={Enrico Rossignolo and Matteo Comin},
title={USTAR2: Fast and Succinct Representation of k-mer Sets Using De Bruijn Graphs},
booktitle={Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS},
year={2024},
pages={368-378},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012423100003657},
isbn={978-989-758-688-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS
TI - USTAR2: Fast and Succinct Representation of k-mer Sets Using De Bruijn Graphs
SN - 978-989-758-688-0
AU - Rossignolo E.
AU - Comin M.
PY - 2024
SP - 368
EP - 378
DO - 10.5220/0012423100003657
PB - SciTePress