
ing values without the need for imputation, which for
now limits the length of the windows.
Finally, users might find themselves limited by the
Gurobi license. The academic license is free but allows
at most three instances of Gurobi to run at the same
time, which limits the multi-threading potential. Attempts
to use the free CBC solver led to a decrease in
performance.
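One way to stay within such a cap while still using more worker threads is to guard each solver call with a counting semaphore. The following is only a minimal sketch, not strainMiner's actual code: `solve_window` is a hypothetical stand-in for one per-window optimization, `solve_all` and `n_threads` are illustrative names, and it assumes that each concurrent solve counts as one Gurobi instance.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Cap taken from the academic-license limit described above.
MAX_SOLVER_INSTANCES = 3

_slots = threading.BoundedSemaphore(MAX_SOLVER_INSTANCES)
_lock = threading.Lock()
_running = 0
peak_running = 0  # highest number of simultaneous solves observed


def solve_window(window_id):
    """Hypothetical stand-in for one solver call on one window."""
    global _running, peak_running
    with _slots:  # blocks while three solves are already in flight
        with _lock:
            _running += 1
            peak_running = max(peak_running, _running)
        time.sleep(0.01)  # placeholder for the real optimization work
        with _lock:
            _running -= 1
    return window_id


def solve_all(window_ids, n_threads=8):
    """Run many windows on n_threads threads without exceeding the cap."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(solve_window, window_ids))


results = solve_all(range(12))
```

Threads beyond the third simply wait for a free slot, so the non-solver parts of a pipeline could still run in parallel while the solver itself stays within the license limit.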
SOFTWARE AVAILABILITY
strainMiner is freely available on GitHub at
github.com/RolandFaure/strainMiner, under the
Affero GPL3 license.
ACKNOWLEDGEMENTS
We wish to thank Dominique Lavenier, who formu-
lated the first version of the optimization problem.
Many thanks to Riccardo Vicedomini for his help
in pre-reviewing the article.
ChatGPT was used to correct and reformulate the
writing of the article.
REFERENCES
Albanese, D. and Donati, C. (2017). Strain profiling and
epidemiology of bacterial species from metagenomic
sequencing. Nature Communications, 8.
Baaijens, J., Aabidine, A., Rivals, E., and Schönhuth, A.
(2017). De novo assembly of viral quasispecies using
overlap graphs. Genome Research.
Bansal, V. (2022). Hapcut2: A method for phasing genomes
using experimental sequence data. Methods in molec-
ular biology, 2590:139–147.
Bertrand, D., Shaw, J., Kalathiyappan, M., Ng, A. H. Q.,
Kumar, M. S., Li, C., Dvornicic, M., Soldo, J. P.,
Koh, J. Y., Tong, C., Ng, O. T., Barkham, T., Young,
B., Marimuthu, K., Chng, K. R., Sikic, M., and Na-
garajan, N. (2019). Hybrid metagenomic assembly
enables high-resolution analysis of resistance deter-
minants and mobile elements in human microbiomes.
Nature Biotechnology, 37(8):937–944.
Bickhart, D., Kolmogorov, M., Tseng, E., Portik, D., Ko-
robeynikov, A., Tolstoganov, I., Uritskiy, G., Liachko,
I., Sullivan, S., Shin, S., Zorea, A., Pascal, V., Panke-
Buisse, K., Medema, M., Mizrahi, I., Pevzner, P., and
Smith, T. (2022). Generating lineage-resolved, com-
plete metagenome-assembled genomes from complex
microbial communities. Nature Biotechnology, 40.
Cleary, B., Brito, I., Huang, K., Gevers, D., Shea, T.,
Young, S., and Alm, E. (2015). Detection of low-
abundance bacterial strains in metagenomic datasets
by eigengenome partitioning. Nature biotechnology,
33.
Faure, R., Flot, J.-F., and Lavenier, D. (2023). Hairsplit-
ter: separating strains in metagenome assemblies with
long reads. In Proceedings of JOBIM 2023, pages
124–131.
Fedarko, M., Kolmogorov, M., and Pevzner, P. (2022). An-
alyzing rare mutations in metagenomes assembled us-
ing long and accurate reads. Genome research, 32.
Feng, X., Cheng, H., Portik, D., and Li, H. (2022).
Metagenome assembly of high-fidelity long reads
with hifiasm-meta. Nature Methods, 19:1–4.
Feng, Z., Clemente, J., Wong, B., and Schadt, E. (2021).
Detecting and phasing minor single-nucleotide vari-
ants from long-read sequencing data. Nature Commu-
nications, 12:3032.
Fix, E. and Hodges, J. L. (1989). Discriminatory analy-
sis. nonparametric discrimination: Consistency prop-
erties. International Statistical Review / Revue Inter-
nationale de Statistique, 57(3):238–247.
Frank, C., Werber, D., Cramer, J. P., Askar, M., Faber, M.,
an der Heiden, M., Bernard, H., Fruth, A., Prager,
R., Spode, A., Wadl, M., Zoufaly, A., Jordan, S.,
Kemper, M. J., Follin, P., Müller, L., King, L. A.,
Rosner, B., Buchholz, U., Stark, K., and Krause, G.
(2011). Epidemic profile of shiga-toxin–producing
escherichia coli o104:h4 outbreak in germany. New
England Journal of Medicine, 365(19):1771–1780.
PMID: 21696328.
Gurobi Optimization, LLC (2023). Gurobi [computer soft-
ware]. gurobi.com.
Kang, X., Luo, X., and Schönhuth, A. (2022). StrainXpress:
strain aware metagenome assembly from short reads.
Nucleic Acids Research, 50(17):e101–e101.
Kazantseva, E., Donmez, A., Pop, M., and Kolmogorov, M.
(2023). stRainy: assembly-based metagenomic strain
phasing using long reads. preprint, Bioinformatics.
Kolmogorov, M., Bickhart, D. M., Behsaz, B., Gurevich,
A., Rayko, M., Shin, S. B., Kuhn, K., Yuan, J., Pole-
vikov, E., Smith, T. P. L., and Pevzner, P. A. (2020).
metaFlye: scalable long-read metagenome assembly
using repeat graphs. Nature Methods, 17(11):1103–
1110.
Koren, S., Walenz, B. P., Berlin, K., Miller, J. R., Bergman,
N. H., and Phillippy, A. M. (2017). Canu: scalable
and accurate long-read assembly via adaptive k-mer
weighting and repeat separation. Genome Research,
27(5):722–736.
Mapleson, D., Accinelli, G., Kettleborough, G., Wright, J.,
and Clavijo, B. (2016). Kat: A k-mer analysis toolkit
to quality control ngs datasets and genome assemblies.
Bioinformatics (Oxford, England), 33.
Mikheenko, A., Saveliev, V., and Gurevich, A. (2015).
Metaquast: Evaluation of metagenome assemblies.
Bioinformatics, 32:btv697.
Peeters, R. (2003). The maximum edge biclique problem is
NP-complete. 131:651–654.
Quince, C., Delmont, T. O., Raguideau, S., Alneberg, J.,
Darling, A. E., Collins, G., and Eren, A. M. (2017).
Assembling Close Strains in Metagenome Assemblies Using Discrete Optimization