RUDEUS: A Machine Learning Classification System to Study DNA-Binding Proteins

David Medina-Ortiz, Gabriel Cabas-Mora, Iván Moya, Nicole Soto-García, Roberto Uribe-Paredes

2024

Abstract

DNA-binding proteins play crucial roles in biological processes such as replication, transcription, packaging, and chromatin remodeling. Their study has gained importance across scientific fields, with computational biology complementing traditional methods. While machine learning has advanced bioinformatics, generalizable pipelines for identifying DNA-binding proteins and their specific interactions remain scarce. We present RUDEUS, a Python library with hierarchical classification models to identify DNA-binding proteins and distinguish between single- and double-stranded DNA interactions. RUDEUS integrates protein language models, supervised learning, and Bayesian optimization, achieving 95% precision in DNA-binding identification and 89% accuracy in distinguishing interaction types. The library also includes tools for annotating unknown sequences and validating DNA-protein interactions through molecular docking. RUDEUS delivers competitive performance and is easily integrated into protein engineering workflows. It is available under the MIT License, with the source code and models available on the GitHub repository https://github.com/ProteinEngineering-PESB2/RUDEUS.
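The hierarchical scheme in the abstract routes each sequence through two decisions: first DNA-binding versus non-binding, then, only for predicted binders, single- versus double-stranded DNA. The sketch below illustrates that routing under stated assumptions and is not the RUDEUS API. The names `NearestCentroid` and `HierarchicalClassifier` are hypothetical, the 2-D vectors are toy stand-ins for protein language model embeddings, and a simple nearest-centroid rule replaces the Bayesian-optimized supervised models the library actually trains.

```python
from dataclasses import dataclass, field
from math import dist
from statistics import fmean
from typing import Dict, List, Optional, Sequence


@dataclass
class NearestCentroid:
    """Hypothetical stand-in classifier: picks the label whose mean vector is closest."""
    centroids: Dict[str, List[float]] = field(default_factory=dict)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "NearestCentroid":
        by_label: Dict[str, List[Sequence[float]]] = {}
        for vec, label in zip(X, y):
            by_label.setdefault(label, []).append(vec)
        # Per-dimension mean of all training vectors that share a label.
        self.centroids = {
            label: [fmean(col) for col in zip(*vecs)]
            for label, vecs in by_label.items()
        }
        return self

    def predict_one(self, x: Sequence[float]) -> str:
        # Euclidean distance to each class centroid; the smallest wins.
        return min(self.centroids, key=lambda label: dist(x, self.centroids[label]))


@dataclass
class HierarchicalClassifier:
    """Stage 1: binding vs non-binding. Stage 2 (binders only): ssDNA vs dsDNA."""
    binding: NearestCentroid = field(default_factory=NearestCentroid)
    strand: NearestCentroid = field(default_factory=NearestCentroid)

    def fit(
        self,
        X: Sequence[Sequence[float]],
        binding_labels: Sequence[str],
        strand_labels: Sequence[Optional[str]],
    ) -> "HierarchicalClassifier":
        self.binding.fit(X, binding_labels)
        # Stage 2 sees only sequences that actually bind DNA.
        binders = [
            (x, s)
            for x, b, s in zip(X, binding_labels, strand_labels)
            if b == "binding"
        ]
        self.strand.fit([x for x, _ in binders], [s for _, s in binders])
        return self

    def predict_one(self, x: Sequence[float]) -> str:
        if self.binding.predict_one(x) != "binding":
            return "non-binding"
        return self.strand.predict_one(x)


# Toy 2-D vectors (illustrative only, not real embeddings or results).
X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.8], [5.0, 9.0], [5.1, 9.2]]
binding_labels = ["non-binding", "non-binding", "binding", "binding", "binding", "binding"]
strand_labels = [None, None, "dsDNA", "dsDNA", "ssDNA", "ssDNA"]

clf = HierarchicalClassifier().fit(X, binding_labels, strand_labels)
print(clf.predict_one([5.0, 9.1]))  # -> ssDNA
```

Training the second stage only on known binders mirrors the hierarchy: the strand-type model never has to learn what a non-binder looks like, and a sequence rejected at stage 1 is never assigned a strand type.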

Paper Citation


in Harvard Style

Medina-Ortiz D., Cabas-Mora G., Moya I., Soto-García N. and Uribe-Paredes R. (2024). RUDEUS: A Machine Learning Classification System to Study DNA-Binding Proteins. In Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR; ISBN 978-989-758-716-0, SciTePress, pages 302-310. DOI: 10.5220/0012946500003838


in Bibtex Style

@conference{kdir24,
author={David Medina-Ortiz and Gabriel Cabas-Mora and Iván Moya and Nicole Soto-García and Roberto Uribe-Paredes},
title={RUDEUS: A Machine Learning Classification System to Study DNA-Binding Proteins},
booktitle={Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR},
year={2024},
pages={302-310},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012946500003838},
isbn={978-989-758-716-0},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR
TI  - RUDEUS: A Machine Learning Classification System to Study DNA-Binding Proteins
SN  - 978-989-758-716-0
AU  - Medina-Ortiz D.
AU  - Cabas-Mora G.
AU  - Moya I.
AU  - Soto-García N.
AU  - Uribe-Paredes R.
PY  - 2024
SP  - 302
EP  - 310
DO  - 10.5220/0012946500003838
PB  - SciTePress
ER  - 