similar to that of the standard pipeline, in line with
VarGeno. However, GenoLight is up to 8 times faster
than the standard pipeline, while requiring a limited
amount of RAM, and it can be executed on a standard
laptop, unlike the other mapping-free tools. As a
future direction of investigation, it would be interesting
to extend GenoLight to detect other types of genetic
variation, such as insertions and deletions.
REFERENCES
(2021). 100,000 genomes pilot on rare-disease diagnosis
in health care — preliminary report. New England
Journal of Medicine, 385(20):1868–1880.
Andreace, F., Pizzi, C., and Comin, M. (2021a). Metaprob
2: Improving unsupervised metagenomic binning
with efficient reads assembly using minimizers. In
Jha, S. K., Măndoiu, I., Rajasekaran, S., Skums, P.,
and Zelikovsky, A., editors, Computational Advances
in Bio and Medical Sciences, pages 15–25, Cham.
Springer International Publishing.
Andreace, F., Pizzi, C., and Comin, M. (2021b). Metaprob
2: Metagenomic reads binning based on assembly
using minimizers and k-mers statistics. Journal of
Computational Biology, 28(11):1052–1062. PMID:
34448593.
Břinda, K., Baym, M. H., and Kucherov, G. (2021). Simplitigs
as an efficient and scalable representation of de
Bruijn graphs. Genome Biology, 22.
Brandt, D. Y. C., Aguiar, V. R. C., Bitarello, B. D., Nunes,
K., Goudet, J., and Meyer, D. (2015). Mapping
Bias Overestimates Reference Allele Frequencies at
the HLA Genes in the 1000 Genomes Project Phase I
Data. G3: Genes|Genomes|Genetics, 5(5):931–941.
Denti, L., Previtali, M., Bernardini, G., Schönhuth, A.,
and Bonizzoni, P. (2019). MALVA: Genotyping
by mapping-free allele detection of known variants.
iScience, 18:20–27.
Ferragina, P. and Manzini, G. (2000). Opportunistic data
structures with applications. In Proceedings 41st
Annual Symposium on Foundations of Computer
Science, pages 390–398.
Günther, T. and Nettelblad, C. (2019). The presence and
impact of reference bias on population genomic
studies of prehistoric human populations. PLOS Genetics,
15(7):1–20.
Langmead, B. and Salzberg, S. L. (2012). Fast gapped-read
alignment with Bowtie 2. Nature Methods, 9:357–359.
Li, H. (2011). A statistical framework for SNP calling,
mutation discovery, association mapping and population
genetical parameter estimation from sequencing data.
Bioinformatics, 27(21):2987–2993.
Li, H. and Durbin, R. (2010). Fast and accurate long-read
alignment with Burrows–Wheeler transform.
Bioinformatics, 26(5):589–595.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J.,
Homer, N., Marth, G., Abecasis, G., and Durbin, R.
(2009). The Sequence Alignment/Map format and
SAMtools. Bioinformatics, 25:2078–2079.
Marchiori, D. and Comin, M. (2017). Skraken: Fast and
sensitive classification of short metagenomic reads
based on filtering uninformative k-mers. In
BIOINFORMATICS 2017 - 8th International Conference
on Bioinformatics Models, Methods and Algorithms,
Proceedings; Part of 10th International Joint
Conference on Biomedical Engineering Systems and
Technologies, BIOSTEC 2017, volume 3, pages 59–67.
Marco-Sola, S., Sammeth, M., Guigó, R., and Ribeca, P.
(2012). The GEM mapper: fast, accurate and
versatile alignment by filtration. Nature Methods,
9:1185–1188.
McKenna, A. et al. (2010). The Genome Analysis
Toolkit: a MapReduce framework for analyzing
next-generation DNA sequencing data. Genome Research,
20:1297–1303.
Monsu, M. and Comin, M. (2021). Fast alignment of
reads to a variation graph with application to SNP
detection. Journal of Integrative Bioinformatics, page
20210032.
Pastinen, T. et al. (2000). A system for specific,
high-throughput genotyping by allele-specific
primer extension on microarrays. Genome Research,
10(7):1031–1042.
1000 Genomes Project (2008). IGSR and the 1000 Genomes Project.
Qian, J. and Comin, M. (2019). Metacon: Unsupervised
clustering of metagenomic contigs with probabilistic
k-mers statistics and coverage. BMC Bioinformatics,
20(367).
Qian, J., Marchiori, D., and Comin, M. (2018). Fast and
sensitive classification of short metagenomic reads
with Skraken. In Peixoto, N., Silveira, M., Ali, H. H.,
Maciel, C., and van den Broek, E. L., editors,
Biomedical Engineering Systems and Technologies, pages
212–226, Cham. Springer International Publishing.
Rahman, A. and Medvedev, P. (2020). Representation
of k-mer sets using spectrum-preserving string sets.
bioRxiv.
Salavati, M., Bush, S. J., Palma-Vera, S., McCulloch, M.
E. B., Hume, D. A., and Clark, E. L. (2019).
Elimination of reference mapping bias reveals robust immune
related allele-specific expression in crossbred sheep.
Frontiers in Genetics, 10:863.
Shajii, A., Yorukoglu, D., Yu, Y. W., and Berger, B. (2016).
Fast genotyping of known SNPs through approximate
k-mer matching. Bioinformatics, 32:538–544.
Sherry, S. T., Ward, M.-H., Kholodov, M., Baker, J., Phan,
L., Smigielski, E. M., and Sirotkin, K. (2001). dbSNP:
the NCBI database of genetic variation. Nucleic Acids
Research, 29(1):308–311.
Shibuya, Y. and Comin, M. (2019a). Better quality score
compression through sequence-based quality
smoothing. BMC Bioinformatics, 20.
Shibuya, Y. and Comin, M. (2019b). Indexing k-mers in
linear space for quality value compression. Journal of
Bioinformatics and Computational Biology, 17(5).
Siragusa, E., Weese, D., and Reinert, K. (2013). Fast and
accurate read mapping with approximate seeds and mul-
Efficient k-mer Indexing with Application to Mapping-free SNP Genotyping