Machine Learning Studies of Non-coding RNAs based on Artificially Constructed Training Data
Mirele C. S. F. Costa, João Victor A. Oliveira, Waldeyr M. C. da Silva, Rituparno Sen, Jörg Fallmann, Peter F. Stadler, Maria Emília M. T. Walter
2021
Abstract
Machine learning (ML) methods are often used to identify members of non-coding RNA classes such as microRNAs or snoRNAs. However, ML methods have not been successfully used for homology search tasks. A systematic evaluation of ML in homology search requires large, controlled test sets with known ground truth, and thus methods to construct large, realistic artificial data sets. Here we describe a method for producing sets of arbitrarily large and diverse snoRNA sequences based on artificial evolution. These are then used to evaluate supervised ML methods (Support Vector Machine, Artificial Neural Network, and Random Forest) for snoRNA detection in a chordate genome. Our results indicate that ML approaches can indeed be competitive for homology search as well.
Paper Citation
in Harvard Style
Costa M., Oliveira J., da Silva W., Sen R., Fallmann J., Stadler P. and Walter M. (2021). Machine Learning Studies of Non-coding RNAs based on Artificially Constructed Training Data. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 3: BIOINFORMATICS; ISBN 978-989-758-490-9, SciTePress, pages 176-183. DOI: 10.5220/0010346000002865
in Bibtex Style
@conference{bioinformatics21,
author={Mirele C. S. F. Costa and João Victor A. Oliveira and Waldeyr M. C. da Silva and Rituparno Sen and Jörg Fallmann and Peter F. Stadler and Maria Emília M. T. Walter},
title={Machine Learning Studies of Non-coding RNAs based on Artificially Constructed Training Data},
booktitle={Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 3: BIOINFORMATICS},
year={2021},
pages={176--183},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010346000002865},
isbn={978-989-758-490-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 3: BIOINFORMATICS
TI - Machine Learning Studies of Non-coding RNAs based on Artificially Constructed Training Data
SN - 978-989-758-490-9
AU - Costa M.
AU - Oliveira J.
AU - da Silva W.
AU - Sen R.
AU - Fallmann J.
AU - Stadler P.
AU - Walter M.
PY - 2021
SP - 176
EP - 183
DO - 10.5220/0010346000002865
PB - SciTePress