Authors:
Jonny A. Uribe (1); Julián D. Arias-Londoño (1) and Alexandre Perera-Lluna (2)
Affiliations:
(1) Universidad de Antioquia, Colombia; (2) Universitat Politècnica de Catalunya, Spain
Keyword(s):
Intrinsically Disordered Proteins, Intrinsically Disordered Regions, Entropy Measures, Kullback-Leibler Divergence, Dihedral Torsion Angles, Ramachandran Plot, Conditional Random Fields.
Abstract:
This paper addresses the problem of order/disorder prediction in protein sequences using alignment-free methods.
The proposed approach is based on a set of 11 information theory measures estimated from the distribution
of the dihedral torsion angles in the amino acid chain. The aim is to characterize the energetically allowed
regions for amino acids in the protein structures, as a way of measuring the rigidity/flexibility of every amino
acid in the chain, and the effect of such rigidity on the disorder propensity. The features are estimated from
empirical Ramachandran Plots obtained using the Protein Geometry Database. The proposed features are used
in conjunction with well-established features in the state of the art for disorder prediction. The classification
is performed using two different strategies: one based on conventional supervised methods and the other one
based on structural learning. The performance is evaluated in terms of AUC (Area Under the ROC Curve), and
three suitable performance metrics for unbalanced classification problems. The results show that the proposed
scheme using conventional supervised methods achieves results similar to those of well-known alignment-free
methods for disorder prediction. Moreover, the scheme based on structural learning outperforms
all the methods evaluated, including three alignment-based methods.
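
As an illustration of the kind of measure the abstract refers to, the sketch below estimates a Shannon entropy and a Kullback-Leibler divergence from an empirical Ramachandran histogram of (phi, psi) angle pairs. It is only a minimal sketch under stated assumptions: the function names, the 12x12 binning, and the toy angle values are hypothetical, and the abstract does not specify which 11 measures the authors use or how they are estimated from the Protein Geometry Database.

```python
import math
from collections import Counter


def ramachandran_histogram(angles, bins=12):
    """Normalized 2D histogram of (phi, psi) pairs given in degrees.

    Angles are wrapped into [-180, 180). The bin count is an
    illustrative assumption, not a value taken from the paper.
    """
    if not angles:
        raise ValueError("at least one (phi, psi) pair is required")
    width = 360.0 / bins
    counts = Counter()
    for phi, psi in angles:
        i = int(((phi + 180.0) % 360.0) // width)
        j = int(((psi + 180.0) % 360.0) // width)
        counts[(i, j)] += 1
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(bins)] for i in range(bins)]


def shannon_entropy(p):
    """Shannon entropy in bits; low values suggest a rigid residue."""
    return -sum(x * math.log2(x) for row in p for x in row if x > 0)


def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) in bits; eps guards against empty bins in q."""
    return sum(
        x * math.log2(x / max(y, eps))
        for row_p, row_q in zip(p, q)
        for x, y in zip(row_p, row_q)
        if x > 0
    )


# Toy data (hypothetical): one residue confined to a single region,
# another spread over four distinct regions of the plot.
rigid = ramachandran_histogram([(-60, -45)] * 4)
flexible = ramachandran_histogram(
    [(-60, -45), (-120, 130), (60, 45), (150, -150)]
)
```

In this toy setting the confined distribution has lower entropy than the spread one, which is the intuition behind using such measures as a proxy for rigidity/flexibility.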