Using a Random Forest Classifier to Find Nuclear Export Signals in Proteins of Arabidopsis thaliana

Claudia Rubiano, Thomas Merkle, Tim W. Nattkemper


This paper presents a new computational strategy, based on a random forest classifier, for predicting Nuclear Export Signals (NESs) in proteins of the model plant Arabidopsis thaliana. NESs are amino acid sequences that enable a protein to interact with a nuclear export receptor and thereby be exported from the nucleus to the cytoplasm. The proposed classifier uses two kinds of features: the NES sequence, expressed as the score obtained from an HMM profile, and the physicochemical properties of the amino acid residues, expressed as amino acid index values. Around 5000 proteins from the complete set of Arabidopsis protein sequences were predicted to contain NESs. A small group of these proteins was tested experimentally for the actual presence of an NES. Of the 13 proteins tested, 11 interacted with the receptor Exportin 1 (XPO1a) from Arabidopsis in yeast two-hybrid assays, which indicates that they contain NESs. This experimental validation of nuclear export activity in a selected group of proteins indicates the potential usefulness of the tool. From a biological perspective, the nuclear export activity observed in these proteins strongly suggests that nucleo-cytoplasmic partitioning could be involved in regulating their functions.
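
The two kinds of features described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each candidate NES is a fixed-length segment, that the HMM profile score has already been computed elsewhere, and that the amino acid index contributes one value per residue. The Kyte-Doolittle hydropathy scale stands in for whichever amino acid indices were actually used, and the function names `index_features` and `build_feature_vector` are hypothetical.

```python
# Kyte-Doolittle hydropathy scale, used here as one example of an
# amino acid index (an assumption; the indices actually used may differ).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def index_features(segment, index=KYTE_DOOLITTLE):
    """Map each residue of a candidate segment to its index value."""
    return [index[residue] for residue in segment.upper()]


def build_feature_vector(segment, hmm_score, index=KYTE_DOOLITTLE):
    """Concatenate a precomputed HMM profile score with per-residue index values.

    hmm_score is assumed to come from an external HMM profile search;
    it is not computed here.
    """
    return [hmm_score] + index_features(segment, index)


# Illustrative leucine-rich segment and a made-up HMM score.
vector = build_feature_vector("LALKLAGLDI", 12.5)
print(len(vector))  # one HMM score plus ten per-residue values
```

Vectors built this way, together with labels for known NES and non-NES segments, could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`.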


