Indexing k-mers in Linear-space for Quality Value Compression

Yoshihiro Shibuya; Matteo Comin

Topics: Algorithms and Software Tools; Next Generation Sequencing; Sequence Analysis

In Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOSTEC, pages 21-29, 2019, Prague, Czech Republic

Authors: Yoshihiro Shibuya ¹,² and Matteo Comin ¹

Affiliations: ¹ Department of Information Engineering, University of Padua, via Gradenigo 6B, Padua, Italy; ² Laboratoire d’Informatique Gaspard-Monge (LIGM), University Paris-Est Marne-la-Vallée, Bâtiment Copernic - 5, bd Descartes, Champs sur Marne, France

Keyword(s): k-mers, Indexing, Quality Score, Read Compression.

Abstract: Many bioinformatics tools rely heavily on k-mer dictionaries to describe the composition of sequences and to allow for faster reference-free algorithms or look-ups. Unfortunately, naive k-mer dictionaries are very memory inefficient, requiring very large amounts of storage space to save each k-mer. This problem is generally worsened by the need for an index for fast queries. In this work we discuss how to build an indexed linear reference containing a set of input k-mers, and its application to the compression of quality scores in FASTQ files. Most of the entropy of sequencing data lies in the quality scores, and thus they are difficult to compress. Here, we present an application to improve the compressibility of quality values while preserving the information needed for SNP calling. We show how a dictionary of significant k-mers, obtained from SNP databases and multiple genomes, can be indexed in linear space and used to improve the compression of quality values. Availability: the software is freely available at https://github.com/yhhshb/yalff.
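To make the space argument in the abstract concrete, here is a minimal Python sketch of the general idea of packing a k-mer set into one linear string. It is not taken from the yalff code base: the function names `build_linear_reference` and `contains`, the `$` separator, and the greedy chaining strategy are all assumptions made for illustration, and Python's built-in substring search only stands in for a real full-text index.

```python
def build_linear_reference(kmers, sep="$"):
    """Greedily chain k-mers that overlap by k-1 characters into one string.

    Illustrative only: a real tool would use a more careful construction
    and a compressed full-text index instead of plain substring search.
    """
    unique = sorted(set(kmers))          # sorted for deterministic output
    if not unique:
        return ""
    k = len(unique[0])
    if k < 2 or any(len(km) != k for km in unique):
        raise ValueError("all k-mers must share the same length k >= 2")

    # Map each (k-1)-prefix to the unused k-mers that start with it.
    by_prefix = {}
    for km in unique:
        by_prefix.setdefault(km[:k - 1], set()).add(km)

    used = set()
    chains = []
    for start in unique:
        if start in used:
            continue
        used.add(start)
        by_prefix[start[:k - 1]].discard(start)
        chain = start
        # Extend the chain while an unused k-mer overlaps its last k-1 chars.
        while True:
            candidates = by_prefix.get(chain[-(k - 1):], set())
            if not candidates:
                break
            nxt = min(candidates)
            candidates.discard(nxt)
            used.add(nxt)
            chain += nxt[-1]             # each extension costs one character
        chains.append(chain)
    # The separator prevents spurious k-mers across chain boundaries.
    return sep.join(chains)


def contains(reference, kmer):
    """Membership query; substring search stands in for an index look-up."""
    return kmer in reference


kmers = ["ACG", "CGT", "GTA", "TAC", "GGG"]
reference = build_linear_reference(kmers)
naive_size = sum(len(km) for km in kmers)
```

In this toy input the five 3-mers take 15 characters when stored one by one, while the chained string is shorter because each overlapping k-mer adds only one character. Every length-k window that does not cross a separator is one of the input k-mers, so the plain substring query gives no false positives here.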

CC BY-NC-ND 4.0

Paper citation in several formats:

Shibuya, Y. and Comin, M. (2019). Indexing k-mers in Linear-space for Quality Value Compression. In Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - BIOINFORMATICS; ISBN 978-989-758-353-7; ISSN 2184-4305, SciTePress, pages 21-29. DOI: 10.5220/0007369100210029

@conference{bioinformatics19,
author={Yoshihiro Shibuya and Matteo Comin},
title={Indexing k-mers in Linear-space for Quality Value Compression},
booktitle={Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - BIOINFORMATICS},
year={2019},
pages={21-29},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007369100210029},
isbn={978-989-758-353-7},
issn={2184-4305},
}

TY - CONF

JO - Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019) - BIOINFORMATICS
TI - Indexing k-mers in Linear-space for Quality Value Compression
SN - 978-989-758-353-7
IS - 2184-4305
AU - Shibuya, Y.
AU - Comin, M.
PY - 2019
SP - 21
EP - 29
DO - 10.5220/0007369100210029
PB - SciTePress