Authors:
Malik Yousef 1; Walid Khaleifa 2 and Tugba Onal-Suzek 3, 4
Affiliations:
1 Department of Community Information Systems, Zefat Academic College, Zefat, Israel
2 Computer Science, The College of Sakhnin, Sakhnin, Israel
3 Department of Computer Engineering, Mugla Sitki Kocman University, Mugla, Turkey
4 Bioinformatics Graduate Program, Mugla Sitki Kocman University, Mugla, Turkey
Keyword(s):
ncRNA, Machine Learning, Differentiate Reliable ncRNA-ncRNA Interactions, k-mer ncRNA Categorization.
Abstract:
A recent catalogue of the human transcriptome, the CHESS database, assembled from RNA sequencing experiments as part of the Genotype-Tissue Expression (GTEx) Project, reported more non-coding RNA genes (21,856) than protein-coding genes (21,306), revealing an unexpectedly vast amount of transcriptional noise (Pertea et al., 2018). In this study, we introduce a workflow coded in KNIME that computationally distinguishes less reliable ncRNA-ncRNA interaction sites, which contain fewer experimentally validated binding sites, from interaction sites with more experimental validation. Duplex structure and k-mer features of the experimentally verified ncRNA-ncRNA binding sites were used as input to the classification workflow. In our analysis, we observed that although duplex structure features had no positive effect on the success rate of the classification, ~80% success could be achieved in categorizing the confidence of the ncRNA-ncRNA binding sites using the k-mer features alone. This result confirms the classification performance obtained for miRNA-mRNA targets using only k-mer features in our previous study (Yousef et al., 2018).
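As an illustration of the k-mer features mentioned in the abstract, the following Python sketch turns a binding-site sequence into a fixed-length vector of normalized k-mer frequencies. It is not the authors' KNIME workflow: the function name `kmer_features`, the choice of k, the ACGU alphabet, and the frequency normalization are all assumptions made for the example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; DNA-style "T" is mapped to "U" below


def kmer_features(seq, k=3):
    """Return normalized k-mer frequencies for one sequence (hypothetical helper)."""
    seq = seq.upper().replace("T", "U")
    # Enumerate all 4**k possible k-mers so every sequence yields the same feature set.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k in the sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing symbols outside the alphabet (e.g. "N") are ignored.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


# Example: 2-mer features of a short, made-up sequence.
features = kmer_features("AUGGCUAUGG", k=2)
```

Vectors of this kind, one per binding site, could then be passed to any standard classifier; which classifier and which values of k were used in the study is not stated in the abstract.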