Machine Learning Studies of Non-coding RNAs based on Artificially Constructed Training Data

Mirele C. S. F. Costa, João Victor A. Oliveira, Waldeyr M. C. da Silva, Rituparno Sen, Jörg Fallmann, Peter F. Stadler, Maria Emília M. T. Walter

2021

Abstract

Machine learning (ML) methods are often used to identify members of non-coding RNA classes such as microRNAs or snoRNAs. However, ML methods have not been successfully used for homology search tasks. A systematic evaluation of ML in homology search requires large, controlled test sets with known ground truth, and thus methods to construct large, realistic artificial data sets. Here we describe a method for producing arbitrarily large and diverse sets of snoRNA sequences based on artificial evolution. These sets are then used to evaluate supervised ML methods (Support Vector Machine, Artificial Neural Network, and Random Forest) for snoRNA detection in a chordate genome. Our results indicate that ML approaches can indeed be competitive for homology search as well.


Paper Citation


in Harvard Style

Costa M., Oliveira J., da Silva W., Sen R., Fallmann J., Stadler P. and Walter M. (2021). Machine Learning Studies of Non-coding RNAs based on Artificially Constructed Training Data. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 3: BIOINFORMATICS; ISBN 978-989-758-490-9, SciTePress, pages 176-183. DOI: 10.5220/0010346000002865


in Bibtex Style

@conference{bioinformatics21,
author={Mirele C. S. F. Costa and João Victor A. Oliveira and Waldeyr M. C. da Silva and Rituparno Sen and Jörg Fallmann and Peter F. Stadler and Maria Emília M. T. Walter},
title={Machine Learning Studies of Non-coding RNAs based on Artificially Constructed Training Data},
booktitle={Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 3: BIOINFORMATICS},
year={2021},
pages={176-183},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010346000002865},
isbn={978-989-758-490-9},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 3: BIOINFORMATICS
TI - Machine Learning Studies of Non-coding RNAs based on Artificially Constructed Training Data
SN - 978-989-758-490-9
AU - Costa M.
AU - Oliveira J.
AU - da Silva W.
AU - Sen R.
AU - Fallmann J.
AU - Stadler P.
AU - Walter M.
PY - 2021
SP - 176
EP - 183
DO - 10.5220/0010346000002865
PB - SciTePress