Authors:
Ahtisham Fazeel¹,²; Areeb Agha²; Andreas Dengel¹,² and Sheraz Ahmed¹
Affiliations:
¹ German Research Center for Artificial Intelligence, Kaiserslautern, Germany; ² Department of Computer Science, Technical University of Kaiserslautern, Germany
Keyword(s):
Nucleosome Position, DNA, Genomics, Language Models, Transformers, BERT, Masked Language Modeling, Transfer Learning.
Abstract:
Nucleosomes are complexes of histone proteins and DNA in which the DNA is wrapped around the histones to achieve compaction. Nucleosome positioning is associated with various biological processes, such as DNA replication, gene regulation, and DNA repair, and its dysregulation can lead to diseases such as sepsis and tumors. Since nucleosome positioning can be determined only to a limited extent in wet-lab experiments, various artificial intelligence-based methods have been proposed to identify nucleosome positions. Existing predictors/tools do not provide consistent performance, especially when evaluated on 12 publicly available benchmark datasets. To address this limitation, this study proposes a nucleosome positioning predictor, NP-BERT. NP-BERT is extensively evaluated in different settings on 12 publicly available datasets from 4 different species. Evaluation results show that NP-BERT performs strongly on all datasets, outperforms state-of-the-art methods on 8 of the 12 datasets, and achieves comparable performance on 2 further datasets. The code and datasets used in this study are available at https://github.com/FAhtisham/Nucleosome-position-prediction.