Efficient Hashing of Multiple Spaced Seeds with Application

Eleonora Mian; Enrico Petrucci; Cinzia Pizzi; Matteo Comin

Topics: Algorithms and Software Tools; Sequence Analysis

In Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOSTEC, 155-162, 2023, Lisbon, Portugal

Authors: Eleonora Mian; Enrico Petrucci; Cinzia Pizzi; Matteo Comin

Affiliation: Department of Information Engineering, University of Padova, Padova, 35131, Italy

Keyword(s): k-Mers, Gapped q-Gram, Multiple Spaced Seeds, Efficient Hashing.

Abstract: Alignment-free analysis of sequences has enabled high-throughput processing of sequencing data in many bioinformatics pipelines. Hashing k-mers is a common operation across many alignment-free applications, and it is widely used for indexing, querying, and rapid similarity search. Recently, spaced seeds, a special type of pattern that accounts for errors or mutations, have come to be routinely used instead of k-mers. Spaced seeds improve sensitivity with respect to k-mers in many applications; however, hashing spaced seeds substantially increases the computational time. Moreover, if multiple spaced seeds are used, accuracy can increase further at the cost of running time. In this paper we address the problem of efficient multiple spaced seed hashing. The proposed algorithms exploit the similarity of adjacent spaced seed hash values in an input sequence in order to efficiently compute the next hashes. We report results on several tests which show that our methods significantly outperform the previously proposed algorithms, with a speedup that can reach 20x. We also apply these efficient spaced seed hashing algorithms to an application in the field of metagenomics, the classification of reads performed by Clark-S (Ounit and Lonardi, 2016), and we show that a significant speedup can be obtained, thus resolving the slowdown introduced by the use of multiple spaced seeds. Code available at: https://github.com/CominLab/MISSH.
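To make concrete what is being hashed, here is a minimal sketch of the naive baseline: each window is hashed from scratch, keeping only the positions where the seed mask holds a `1`. This is not the authors' algorithm and not the MISSH API. The 2-bit encoding, the `0`/`1` mask notation, and the function names `spaced_seed_hash`, `hash_all`, and `hash_multiple` are assumptions made for illustration only.

```python
# Illustrative 2-bit nucleotide encoding (an assumption, not taken from the paper).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def spaced_seed_hash(seq, pos, seed):
    """Hash the window starting at pos, using only positions where seed has '1'."""
    h = 0
    shift = 0
    for offset, flag in enumerate(seed):
        if flag == "1":  # "care" position: contributes 2 bits to the hash
            h |= ENCODE[seq[pos + offset]] << shift
            shift += 2
        # '0' positions are "don't care" and are skipped
    return h


def hash_all(seq, seed):
    """Naively hash every window of seq; each hash is recomputed from scratch."""
    return [spaced_seed_hash(seq, i, seed) for i in range(len(seq) - len(seed) + 1)]


def hash_multiple(seq, seeds):
    """Apply hash_all once per seed; the cost grows linearly with the number of seeds."""
    return {seed: hash_all(seq, seed) for seed in seeds}


if __name__ == "__main__":
    print(hash_all("ACGTAC", "1101"))
    print(hash_multiple("ACGTAC", ["1101", "1011"]))
```

Adjacent windows share most of their symbols, so this baseline repeats a lot of work. That repeated work is the redundancy the abstract says the proposed algorithms exploit.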

CC BY-NC-ND 4.0

Paper citation in several formats:

Mian, E., Petrucci, E., Pizzi, C. and Comin, M. (2023). Efficient Hashing of Multiple Spaced Seeds with Application. In Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - BIOINFORMATICS; ISBN 978-989-758-631-6; ISSN 2184-4305, SciTePress, pages 155-162. DOI: 10.5220/0011632900003414

@conference{bioinformatics23,
author={Eleonora Mian and Enrico Petrucci and Cinzia Pizzi and Matteo Comin},
title={Efficient Hashing of Multiple Spaced Seeds with Application},
booktitle={Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - BIOINFORMATICS},
year={2023},
pages={155-162},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011632900003414},
isbn={978-989-758-631-6},
issn={2184-4305},
}

TY - CONF

JO - Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - BIOINFORMATICS
TI - Efficient Hashing of Multiple Spaced Seeds with Application
SN - 978-989-758-631-6
IS - 2184-4305
AU - Mian, E.
AU - Petrucci, E.
AU - Pizzi, C.
AU - Comin, M.
PY - 2023
SP - 155
EP - 162
DO - 10.5220/0011632900003414
PB - SciTePress