Authors: Sebastian Wandelt and Ulf Leser

Affiliation: Humboldt-Universität zu Berlin, Germany

Keyword(s): Genome Compression, Referential Compression, String Search.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; BioInformatics & Pattern Discovery; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Methodologies and Technologies; Operational Research; Optimization; Symbolic Systems

Abstract: Background: Improved sequencing techniques have led to large amounts of biological sequence data. One of the challenges in managing sequence data is efficient storage. Recently, referential compression schemes, which store only the differences between a to-be-compressed input and a known reference sequence, have gained a lot of interest in this field. However, so far sequences always have to be decompressed prior to analysis. There is a need for algorithms that work on compressed data directly, avoiding costly decompression. Summary: In our work, we address this problem by proposing an algorithm for exact string search over compressed data. The algorithm works directly on referentially compressed genome sequences, without needing an index for each genome and using only partial decompression. Results: Our string search algorithm for referentially compressed genomes performs exact string matching for large sets of genomes faster than using an index structure, e.g. suffix trees, for each genome, especially for short queries. We think that this is an important step towards space- and runtime-efficient management of large biological data sets.
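
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the entry layout (`("ref", start, length)` and `("raw", text)`), the greedy `compress` routine, and all function names are assumptions made for illustration only. The sketch finds the query in the reference once, reuses those hits inside every copied block, and decompresses only the few bases around block boundaries.

```python
from bisect import bisect_left, bisect_right

# Hypothetical entry layout (an assumption, not taken from the paper):
#   ("ref", start, length)  copy reference[start:start + length]
#   ("raw", text)           literal bases that are stored verbatim

def compress(reference, genome, min_match=4):
    """Greedy referential compression; simple and slow, for illustration."""
    entries, i, raw_start = [], 0, 0
    while i < len(genome):
        length, start = min_match, -1
        if i + length <= len(genome):
            start = reference.find(genome[i:i + length])
        if start == -1:
            i += 1  # no seed match here: this base joins the raw run
            continue
        # Extend the seed match for as long as both sequences agree.
        while (i + length < len(genome) and start + length < len(reference)
               and genome[i + length] == reference[start + length]):
            length += 1
        if raw_start < i:
            entries.append(("raw", genome[raw_start:i]))
        entries.append(("ref", start, length))
        i += length
        raw_start = i
    if raw_start < len(genome):
        entries.append(("raw", genome[raw_start:]))
    return entries

def decompress(reference, entries):
    """Full decompression; used here only to check the search results."""
    return "".join(reference[e[1]:e[1] + e[2]] if e[0] == "ref" else e[1]
                   for e in entries)

def find_all(text, pattern):
    """All (possibly overlapping) start positions of pattern in text."""
    hits, pos = [], text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits

def edges(reference, entry, k):
    """Partially decompress an entry: its first k bases, last k bases, length."""
    if entry[0] == "raw":
        text = entry[1]
        return text[:k], text[max(0, len(text) - k):], len(text)
    _, start, length = entry
    end = start + length
    return reference[start:min(end, start + k)], reference[max(start, end - k):end], length

def search(reference, ref_hits, entries, pattern):
    """Exact matches of pattern in the compressed genome, as sorted positions.

    ref_hits must be the sorted match positions of pattern in the reference;
    they are computed once and shared by every genome compressed against it.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m, k = len(pattern), len(pattern) - 1
    hits, offset, tail = [], 0, ""  # tail: last k bases seen so far
    for entry in entries:
        head, last, length = edges(reference, entry, k)
        # 1) Matches that start in earlier entries and end in this one.
        window = tail + head
        hits += [offset - len(tail) + p for p in find_all(window, pattern)]
        # 2) Matches that lie completely inside this entry.
        if entry[0] == "ref":
            start = entry[1]
            lo = bisect_left(ref_hits, start)
            hi = bisect_right(ref_hits, start + length - m)
            hits += [offset + h - start for h in ref_hits[lo:hi]]
        else:
            hits += [offset + p for p in find_all(entry[1], pattern)]
        joined = tail + last
        tail = joined[max(0, len(joined) - k):]
        offset += length
    return sorted(hits)
```

In this sketch, `find_all(reference, pattern)` would be called once per query, and `search` would then be called for each compressed genome with the same `ref_hits`. The per-genome work depends on the number of entries and reference hits rather than on the genome length, which is the intuition behind searching without full decompression.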

CC BY-NC-ND 4.0

Paper citation in several formats:
Wandelt, S. and Leser, U. (2012). String Searching in Referentially Compressed Genomes. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2012) - KDIR; ISBN 978-989-8565-29-7; ISSN 2184-3228, SciTePress, pages 95-102. DOI: 10.5220/0004143400950102

@conference{kdir12,
author={Sebastian Wandelt and Ulf Leser},
title={String Searching in Referentially Compressed Genomes},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2012) - KDIR},
year={2012},
pages={95-102},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004143400950102},
isbn={978-989-8565-29-7},
issn={2184-3228},
}

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2012) - KDIR
TI - String Searching in Referentially Compressed Genomes
SN - 978-989-8565-29-7
IS - 2184-3228
AU - Wandelt, S.
AU - Leser, U.
PY - 2012
SP - 95
EP - 102
DO - 10.5220/0004143400950102
PB - SciTePress