Generating Features using Burrows Wheeler Transformation for Biological Sequence Classification

Karthik Tangirala; Doina Caragea


Topics: Algorithms and Software Tools; Data Mining and Machine Learning; Pattern Recognition, Clustering and Classification

In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOSTEC 2014) - BIOINFORMATICS, pages 196-203, 2014, ESEO, Angers, Loire Valley, France

Authors: Karthik Tangirala and Doina Caragea

Affiliation: Kansas State University, United States

Keyword(s): Burrows Wheeler Transformation, Machine Learning, Supervised Learning, Feature Selection, Dimensionality Reduction, Biological Sequence Classification.

Related Ontology Subjects/Areas/Topics: Algorithms and Software Tools; Bioinformatics; Biomedical Engineering; Data Mining and Machine Learning; Pattern Recognition, Clustering and Classification

Abstract: Recent advances in the biological sciences have made large amounts of sequence data (both DNA and protein sequences) available. The annotation of biological sequence data can be approached using machine learning techniques. Such techniques require that the input data be represented as a vector of features. In the absence of biologically known features, a common approach is to generate k-mers using a sliding window. A larger k value usually results in better features; however, the number of k-mer features is exponential in k, and many of the k-mers are not informative. Feature selection techniques can be used to identify the most informative features, but they are computationally expensive when applied to the set of all k-mers, especially over the space of variable-length k-mers (which presumably better capture the information in the data). Instead of working with all k-mers, we propose to generate features using an approach based on the Burrows Wheeler Transformation (BWT). Our approach generates variable-length k-mers that represent a small subset of all k-mers. Experimental results on both DNA (alternative splicing prediction) and protein (protein localization) sequences show that BWT features combined with feature selection result in models that are better than models learned directly from k-mers. This shows that the BWT-based approach to feature generation can be used to obtain informative variable-length features for DNA and protein prediction problems.
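The abstract contrasts fixed-length sliding-window k-mers with features derived from the Burrows Wheeler Transformation. The Python sketch below illustrates both ingredients under stated assumptions. `kmer_counts` performs sliding-window k-mer extraction (over the DNA alphabet there are 4^k possible k-mers, which is why the feature space grows exponentially in k). `bwt` is a naive textbook Burrows Wheeler Transformation built by sorting all rotations. `repeated_substrings` is a hypothetical helper, not the authors' published procedure: it only shows how adjacent rows of the sorted rotation matrix expose substrings that occur at least twice, which is one plausible source of variable-length candidate features. All function names are illustrative.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers using a sliding window of width k and step 1."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def sorted_rotations(seq: str, sentinel: str = "$") -> list:
    """All cyclic rotations of seq + sentinel, sorted lexicographically.

    The sentinel is assumed to be absent from seq and to sort before every
    sequence character ('$' precedes uppercase letters in ASCII).
    """
    s = seq + sentinel
    return sorted(s[i:] + s[:i] for i in range(len(s)))


def bwt(seq: str, sentinel: str = "$") -> str:
    """Naive Burrows Wheeler Transformation: the last column of the sorted
    rotation matrix. O(n^2 log n), fine for toy inputs; practical tools
    derive it from a suffix array instead."""
    return "".join(rot[-1] for rot in sorted_rotations(seq, sentinel))


def repeated_substrings(seq: str, min_len: int = 2, sentinel: str = "$") -> set:
    """HYPOTHETICAL illustration, not the paper's algorithm.

    Adjacent rows of the sorted rotation matrix correspond to suffixes of seq
    that share a common prefix. Any common prefix that stops before the
    sentinel occurs at least twice in seq, so it is a candidate
    variable-length k-mer. Only the longest common prefix of each adjacent
    pair is kept (shorter prefixes of it are not listed separately).
    """
    rotations = sorted_rotations(seq, sentinel)
    found = set()
    for a, b in zip(rotations, rotations[1:]):
        n = 0
        # Extend the shared prefix, but never across the sentinel.
        while n < len(a) and a[n] == b[n] and a[n] != sentinel:
            n += 1
        if n >= min_len:
            found.add(a[:n])
    return found


if __name__ == "__main__":
    dna = "ACGTACG"
    print(kmer_counts(dna, 3))          # ACG occurs twice, the other 3-mers once
    print(bwt(dna))                     # GT$AACCG
    print(repeated_substrings(dna, 2))  # {'ACG', 'CG'} (set order may vary)
```

Note how the transformed string groups identical characters into runs (`AA`, `CC`); this clustering of repeated contexts is what makes the BWT useful for compression and indexing, and it is the kind of repeat structure that a BWT-based feature generator can exploit instead of enumerating every possible k-mer.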

CC BY-NC-ND 4.0


Paper citation in several formats:

Tangirala, K. and Caragea, D. (2014). Generating Features using Burrows Wheeler Transformation for Biological Sequence Classification. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOSTEC 2014) - BIOINFORMATICS; ISBN 978-989-758-012-3; ISSN 2184-4305, SciTePress, pages 196-203. DOI: 10.5220/0004806201960203

@conference{bioinformatics14,
author={Karthik Tangirala and Doina Caragea},
title={Generating Features using Burrows Wheeler Transformation for Biological Sequence Classification},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOSTEC 2014) - BIOINFORMATICS},
year={2014},
pages={196-203},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004806201960203},
isbn={978-989-758-012-3},
issn={2184-4305},
}

TY  - CONF
JO  - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOSTEC 2014) - BIOINFORMATICS
TI  - Generating Features using Burrows Wheeler Transformation for Biological Sequence Classification
SN  - 978-989-758-012-3
IS  - 2184-4305
AU  - Tangirala, K.
AU  - Caragea, D.
PY  - 2014
SP  - 196
EP  - 203
DO  - 10.5220/0004806201960203
PB  - SciTePress
ER  - 